Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
However, I am still -1 on the solution proposed by the PEP. I still think that migrating to datetime use is a better way to go, rather than a proliferation of the data types used to represent timestamps, along with an API to specify the type of data returned.
Let's look at each item in the PEP's rationale for discarding the use of datetimes:
Oh, I forgot to mention my main concern about datetime: many functions returning timestamps have an undefined starting point (and no timezone information), and so cannot be converted to datetime:
- time.clock(), time.wallclock(), time.monotonic(), time.clock_gettime() (except for CLOCK_REALTIME)
- time.clock_getres()
- signal.get/setitimer()
- os.wait3(), os.wait4(), resource.getrusage()
- etc.
Allowing the datetime.datetime type for just a few functions (like time.time()) but not the others (raising an exception instead) is not an acceptable solution.
I'm looking at a use case from my flufl.lock library:
return datetime.datetime.fromtimestamp(
    os.stat(self._lockfile).st_mtime)
Keep your code, but just add a timestamp=decimal.Decimal argument to os.stat() to get high-resolution timestamps! (Well, you would at least avoid the loss of precision if datetime is not improved to support nanoseconds.)
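For concreteness, a minimal sketch of what the flufl.lock code above might look like under the PEP's proposed "timestamp" parameter. This is an illustration only: the parameter is what the PEP proposes, not an existing API in any released Python.

    import os
    import datetime
    import decimal

    def lock_mtime(lockfile):
        # Sketch: assumes PEP 410's proposed "timestamp" parameter on os.stat().
        st = os.stat(lockfile, timestamp=decimal.Decimal)
        # st.st_mtime is then a decimal.Decimal carrying the nanosecond digits;
        # converting to datetime still goes through float, losing precision
        # until datetime itself supports nanoseconds.
        return datetime.datetime.fromtimestamp(float(st.st_mtime))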
* datetime.datetime has ordering issues with daylight saving time (DST) in the duplicate hour of switching from DST to normal time.
Sure, but only for timezone-ful datetimes, right?
I don't know enough about this topic to answer. Martin von Löwis should answer this question!
* datetime.datetime is not as well integrated as Epoch timestamps; some functions don't accept this type as input. For example, os.utime() expects a tuple of Epoch timestamps.
So, by implication, Decimal is better integrated by virtue of its ability to be coerced to floats and other numeric stack types?
Yes. decimal.Decimal is already supported by all functions accepting float (i.e. all functions expecting timestamps).
Will users ever have to explicitly convert Decimal types to use other APIs?
Sorry, I don't understand. What do you mean?
It bothers me that the PEP is proposing that users will now have to be prepared to handle yet another (and potentially *many* more) data types coming from what are essentially datetime-like APIs.
Users only get decimal.Decimal if they ask explicitly for decimal.Decimal. By default, they will still get float. Most users don't care about nanoseconds :-) If a library chooses to return Decimal instead of float, that's a change in the library's API unrelated to the PEP.
If it really is impossible or suboptimal to build high resolution datetimes and timedeltas, and to use them in these APIs, then at the very least, the PEP needs a stronger rationale for why this is.
IMO supporting nanoseconds in datetime and timedelta is an orthogonal issue. And yes, the PEP should maybe give better arguments against datetime :-) I will update the PEP to mention the starting-point issue.
In any case, thanks for your work in this (and so many other!) areas.
You're welcome :)
On Tue, Feb 14, 2012 at 4:33 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
However, I am still -1 on the solution proposed by the PEP. I still think that migrating to datetime use is a better way to go, rather than a proliferation of the data types used to represent timestamps, along with an API to specify the type of data returned.
Let's look at each item in the PEP's rationale for discarding the use of datetimes:
Oh, I forgot to mention my main concern about datetime: many functions returning timestamps have an undefined starting point (and no timezone information), and so cannot be converted to datetime:
- time.clock(), time.wallclock(), time.monotonic(), time.clock_gettime() (except for CLOCK_REALTIME)
- time.clock_getres()
- signal.get/setitimer()
- os.wait3(), os.wait4(), resource.getrusage()
- etc.
Allowing the datetime.datetime type for just a few functions (like time.time()) but not the others (raising an exception instead) is not an acceptable solution.
A datetime module based approach would need to either use a mix of datetime.datetime() (when returning an absolute time) and datetime.timedelta() (when returning a time relative to an unknown starting point), or else just always return datetime.timedelta (even when we know the epoch and could theoretically make the time absolute).

In the former case, it may be appropriate to adopt a boolean flag API design, and the "I want high precision time" request marker would just be "datetime=True". You'd then get back either datetime.datetime() or datetime.timedelta() as appropriate for the specific API. In the latter case, the design would be identical to the current PEP, only with "datetime.timedelta" in place of "decimal.Decimal".

The challenge relative to the current PEP is that any APIs that wanted to *accept* either of these as a timestamp would need to do some specific work to avoid failing with a TypeError. For timedelta values, we'd have to define a way to easily extract the full precision timestamp as a number (total_seconds() currently returns a float, and hence can't handle nanosecond resolutions), as well as improving interoperability with algorithms that expect a floating point value. If handed a datetime value, you need to know the correct epoch value, do the subtraction, then extract the full precision timestamp from the resulting timedelta object.

To make a datetime module based counter-proposal acceptable, it would need to be something along the following lines:
- to avoid roundtripping problems, only return timedelta() (even for cases where we know the epoch and could theoretically return datetime instead)
- implement __int__ and __float__ on timedelta (where the latter is just "self.total_seconds()" and the former "int(self.total_seconds())")

It may also take some fancy footwork to avoid a circular dependency between time and datetime while supporting this (Victor allowed this in an earlier version of his patch, but he did it by accepting datetime.datetime and datetime.timedelta directly as arguments to the affected APIs). That's a relatively minor implementation concern, though (at worst it would require factoring out a support module used by both datetime and time).

The big problem is that datetime and timedelta pose a huge problem for compatibility with existing third party APIs that accept timestamp values. This is in stark contrast to what happens with decimal.Decimal: coercion to float() or int() will potentially lose precision, but still basically works. While addition and subtraction of floats will fail, addition and subtraction of integers works fine. To avoid losing precision, it's sufficient to just avoid the coercion.

I think the outline above really illustrates why the *raw* data type for timestamps should just be a number, not a higher level semantic type like timedelta or datetime. Eventually, you want to be able to express a timestamp as a number of seconds relative to a particular epoch. To do that, you want a number. Originally we used ints; then, to support microsecond resolution, we used floats. The natural progression to support arbitrary resolutions is to decimal.Decimal. Then, the higher level APIs can be defined in *terms* of that high precision number.

Would it be nice if there was a PyPI module that provided APIs that converted the raw timestamps in stat objects and other OS level APIs into datetime() and timedelta() objects as appropriate? Perhaps, although I'm not sure it's necessary.
But are those types low-level enough to be suitable for the *OS* interface definition? I don't think so - we really just want a number to express "seconds since a particular time" that plays fairly nicely with other numbers, not anything fancier than that.

Notice that PEP 410 as it stands can be used to *solve* the problem of how to extract the full precision timestamp from a timedelta object as a number: timedelta.total_seconds() can be updated to accept a "timestamp" argument, just like the other time related APIs already mentioned in the PEP. Then "delta.total_seconds(timestamp=decimal.Decimal)" will get you a full precision timestamp. If PEP 410 was instead defined in *terms* of timedelta, it would need to come up with a *different* solution for this.

Also, by using decimal.Decimal, we open up the possibility of, at some point in the future, switching to returning high precision values by default. (There are at least two prerequisites for that, though: incorporation of cdecimal into CPython, and implicit promotion of floats to decimal values in binary operations without losing data. We've already started down that path by accepting floating point values directly in the Decimal constructor.) No such migration path for the default behaviour presents itself for an API based on datetime or timedelta (unless we consider making timedelta behave a *lot* more like a number than it does now).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
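A minimal sketch of the __int__/__float__ idea from Nick's second bullet above. This is purely illustrative: the stdlib timedelta defines no such coercions, and total_seconds() returns a float, so the sketch is still limited to microsecond precision.

    import datetime

    class NumericTimedelta(datetime.timedelta):
        # Sketch: make a timedelta coerce to a number of seconds,
        # as suggested in the counter-proposal above.
        def __float__(self):
            return self.total_seconds()
        def __int__(self):
            return int(self.total_seconds())

    td = NumericTimedelta(seconds=3, microseconds=500000)
    print(float(td))  # 3.5
    print(int(td))    # 3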
A datetime module based approach would need to either use a mix of datetime.datetime() (when returning an absolute time) and datetime.timedelta() (when returning a time relative to an unknown starting point),
Returning a different type depending on the function would be surprising and confusing. time.clock_gettime(CLOCK_REALTIME) would return datetime.datetime, whereas time.clock_gettime(CLOCK_MONOTONIC) would return datetime.timedelta? Or time.clock_gettime(CLOCK_REALTIME) would return datetime.timedelta whereas time.time() would return datetime.datetime? What would be the logic?
or else just always return datetime.timedelta (even when we know the epoch and could theoretically make the time absolute).
datetime.timedelta is similar to decimal.Decimal, but I don't want to support both; one is enough. I prefer Decimal because it is simpler and "compatible" with float.
In the former case, it may be appropriate to adopt a boolean flag API design and the "I want high precision time" request marker would just be "datetime=True". You'd then get back either datetime.datetime() or datetime.timedelta() as appropriate for the specific API.
A boolean flag has a problem with the import of the decimal module: time.time(decimal=True) would need an implicit ("hidden") import of the decimal module. Another argument present in the PEP: "The boolean argument API was rejected because it is not "pythonic". Changing the return type with a parameter value is preferred over a boolean parameter (a flag)." http://www.python.org/dev/peps/pep-0410/#add-a-boolean-argument
If handed a datetime value, you need to know the correct epoch value, do the subtraction, then extract the full precision timestamp from the resulting timedelta object.
datetime.datetime doesn't have a .totimestamp() method. If I remember correctly, time.mktime(datetime.datetime.timetuple()) has issues with timezones and DST.
- implement __int__ and __float__ on timedelta (where the latter is just "self.total_seconds()" and the former "int(self.total_seconds())")
It looks like a hack. Why would float(timedelta) return seconds? Why not minutes or nanoseconds? I prefer an unambiguous and explicit .toseconds() method.
The big problem is that datetime and timedelta pose a huge problem for compatibility with existing third party APIs that accept timestamp values.
I just think that datetime and timedelta are overkill and have more drawbacks than advantages. FYI, when I implemented datetime support, it was just implemented by calling datetime.datetime.fromtimestamp(). The user can make an explicit call to this function, and to datetime.timedelta(seconds=ts) for timedelta.
This is in stark contrast to what happens with decimal.Decimal: coercion to float() or int() will potentially lose precision, but still basically works. While addition and subtraction of floats will fail, addition and subtraction of integers works fine. To avoid losing precision, it's sufficient to just avoid the coercion.
Why would you want to mix Decimal and float? If you ask explicitly for Decimal timestamps, you should use Decimal everywhere, or you lose the advantages of this type (and may get a TypeError).
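A quick check of the coercion behaviour Nick describes above (Python 3.2+ semantics; a sketch to illustrate the discussion, not part of the PEP):

    from decimal import Decimal

    d = Decimal("1329337800.123456789")  # a nanosecond-resolution timestamp
    print(float(d))   # coercion to float works, at the cost of precision
    print(d + 1)      # mixing with ints works fine
    try:
        d + 1.0       # mixing with floats raises TypeError
    except TypeError as exc:
        print("float arithmetic fails:", exc)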
I think the outline above really illustrates why the *raw* data type for timestamps should just be a number, not a higher level semantic type like timedelta or datetime. Eventually, you want to be able to express a timestamp as a number of seconds relative to a particular epoch. To do that, you want a number. Originally we used ints; then, to support microsecond resolution, we used floats. The natural progression to support arbitrary resolutions is to decimal.Decimal.
Yep.
Then, the higher level APIs can be defined in *terms* of that high precision number. Would it be nice if there was a PyPI module that provided APIs that converted the raw timestamps in stat objects and other OS level APIs into datetime() and timedelta() objects as appropriate?
Do you really need a module to call datetime.datetime.fromtimestamp(ts) and datetime.timedelta(seconds=ts)?
timedelta.total_seconds() can be updated to accept a "timestamp" argument
Yes, it would be consistent with the other changes introduced by the PEP.
Also, by using decimal.Decimal, we open up the possibility of, at some point in the future, switching to returning high precision values by default
I don't think that is necessary. Few people need this precision, and float will always be faster than Decimal because float is implemented in *hardware* (the FPU). I read somewhere that IBM plans to implement decimal floats in their CPUs, but I suppose that it will also have a "small" size like 64 bits, whereas 64 bits is not enough for nanosecond resolution (the same issue as binary floats).
implicit promotion of floats to decimal values in binary operations without losing data
I don't think that such a change would be accepted. You should ask Stefan Krah or Mark Dickinson :-)

I completed the datetime, timedelta and boolean flag sections of PEP 410.

Victor
FWIW, I'm with Barry on this; doing more with the datetime types seems preferable to introducing yet more different stuff to date/time handling. On Mon, Feb 13, 2012 at 19:33, Victor Stinner <victor.stinner@gmail.com> wrote:
Oh, I forgot to mention my main concern about datetime: many functions returning timestamps have an undefined starting point (and no timezone information), and so cannot be converted to datetime:
- time.clock(), time.wallclock(), time.monotonic(), time.clock_gettime() (except for CLOCK_REALTIME)
- time.clock_getres()
- signal.get/setitimer()
- os.wait3(), os.wait4(), resource.getrusage()
- etc.
Allowing the datetime.datetime type for just a few functions (like time.time()) but not the others (raising an exception instead) is not an acceptable solution.
It seems fairly simple to suggest that the functions with an undefined starting point could return a timedelta instead of a datetime?
* datetime.datetime has ordering issues with daylight saving time (DST) in the duplicate hour of switching from DST to normal time.
Sure, but only for timezone-ful datetimes, right?
I don't know enough about this topic to answer. Martin von Löwis should answer this question!
Yes, this should only be an issue for dates with timezones.
* datetime.datetime is not as well integrated as Epoch timestamps; some functions don't accept this type as input. For example, os.utime() expects a tuple of Epoch timestamps.
So, by implication, Decimal is better integrated by virtue of its ability to be coerced to floats and other numeric stack types?
Yes. decimal.Decimal is already supported by all functions accepting float (i.e. all functions expecting timestamps).
I suppose something like os.utime() could be changed to also accept datetimes.
If it really is impossible or suboptimal to build high resolution datetimes and timedeltas, and to use them in these APIs, then at the very least, the PEP needs a stronger rationale for why this is.
IMO supporting nanoseconds in datetime and timedelta is an orthogonal issue.
Not if you use it to cast them aside for this issue. ;) Cheers, Dirkjan
IMO supporting nanoseconds in datetime and timedelta is an orthogonal issue.
Not if you use it to cast them aside for this issue. ;)
Hum, yes. I meant that even if we don't keep datetime as a supported type for time.time(), we can still patch the type to make it support nanosecond resolution. Victor
On Feb 13, 2012, at 07:33 PM, Victor Stinner wrote:
Oh, I forgot to mention my main concern about datetime: many functions returning timestamps have an undefined starting point (and no timezone information), and so cannot be converted to datetime:
- time.clock(), time.wallclock(), time.monotonic(), time.clock_gettime() (except for CLOCK_REALTIME)
- time.clock_getres()
- signal.get/setitimer()
- os.wait3(), os.wait4(), resource.getrusage()
- etc.
That's not strictly true though, is it? E.g. clock_gettime() returns the number of seconds since the Epoch, which is a well-defined start time at least on *nix systems. So clearly those types of functions could return datetimes. I'm fairly certain that between those types of functions and timedeltas you could have most of the bases covered. -Barry
2012/2/14 Barry Warsaw <barry@python.org>:
On Feb 13, 2012, at 07:33 PM, Victor Stinner wrote:
Oh, I forgot to mention my main concern about datetime: many functions returning timestamps have an undefined starting point (and no timezone information), and so cannot be converted to datetime:
- time.clock(), time.wallclock(), time.monotonic(), time.clock_gettime() (except for CLOCK_REALTIME)
- time.clock_getres()
- signal.get/setitimer()
- os.wait3(), os.wait4(), resource.getrusage()
- etc.
That's not strictly true though, is it? E.g. clock_gettime() returns the number of seconds since the Epoch, which is a well-defined start time at least on *nix systems.
I mentioned the exception: time.clock_gettime(CLOCK_REALTIME) returns an Epoch timestamp, but all the other clocks supported by clock_gettime() have an unspecified starting point:
- CLOCK_MONOTONIC
- CLOCK_MONOTONIC_RAW
- CLOCK_PROCESS_CPUTIME_ID
- CLOCK_THREAD_CPUTIME_ID
So clearly those types of functions could return datetimes.
What? What would be the starting point for all these functions? It would be surprising to get a datetime for CLOCK_PROCESS_CPUTIME_ID for example.
I'm fairly certain that between those types of functions and timedeltas you could have most of the bases covered.
Ah, the timedelta case is different. But I already replied to Nick in this thread about timedelta. You can also
(Oops, I sent my email by mistake, here is the end of my email)
(...) Ah, the timedelta case is different. But I already replied to Nick in this thread about timedelta. You can also see the arguments against timedelta in PEP 410.

Victor
I think I will just state my reasoning one last time and then leave it to the BDFL or BDFOP to make the final decision.

Victor on IRC says that there is not much difference between Decimal and timedelta, and this may be true from an implementation point of view. From a cognitive point of view, I think they're miles apart. Ultimately, I wish ints and floats weren't used for time-y things, and only datetimes (for values with well-defined starting points, including the epoch) and timedeltas (for values with no starting point) were used.

We obviously can't eliminate the APIs that return and accept ints and floats, most of which we inherited from C, but we can avoid making it worse by extending them to also accept Decimals. I think it would be valuable work to correct any deficiencies in datetimes and timedeltas so that they can be used in all time-y APIs, with whatever resolution is necessary.

My primary concern with the PEP is adding to users' confusion when they have to handle (at least) 5 different types[*] that represent time in Python.

Cheers, -Barry

[*] int, float, Decimal, datetime, timedelta; are there others?
On Wed, Feb 15, 2012 at 8:29 AM, Barry Warsaw <barry@python.org> wrote:
My primary concern with the PEP is adding to users' confusion when they have to handle (at least) 5 different types[*] that represent time in Python.
My key question to those advocating the use of timedelta instead of Decimal: What should timedelta.total_seconds() return to avoid losing nanosecond precision? How should this be requested when calling the API?

The core "timestamp" abstraction is "just a number" that (in context) represents a certain number of seconds. decimal.Decimal qualifies. datetime.timedelta doesn't - it's a higher level construct that makes the semantic context explicit (and currently refuses to interoperate with other values that are just numbers).

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
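To make the precision question concrete: total_seconds() returns a float, so even a stored microsecond can vanish for large enough deltas (and nanoseconds cannot be stored in a timedelta at all, since its resolution stops at microseconds). A small check:

    import datetime

    # 2**34 seconds is roughly 544 years; at that magnitude the spacing
    # between adjacent doubles exceeds two microseconds, so the stored
    # microsecond is rounded away.
    td = datetime.timedelta(seconds=2**34, microseconds=1)
    print(td.total_seconds() == 2**34)  # True: the microsecond is gone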
On Tue, Feb 14, 2012 at 4:23 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Wed, Feb 15, 2012 at 8:29 AM, Barry Warsaw <barry@python.org> wrote:
My primary concern with the PEP is adding to users' confusion when they have to handle (at least) 5 different types[*] that represent time in Python.
My key question to those advocating the use of timedelta instead of Decimal:
What should timedelta.total_seconds() return to avoid losing nanosecond precision? How should this be requested when calling the API?
It should return a float as it does today. Add a timedelta.total_nanoseconds() call for people wanting high precision as a raw number and remind people of the precision limits of total_seconds() in the docs. -gps
On Tue, Feb 14, 2012 at 5:13 PM, Gregory P. Smith <greg@krypto.org> wrote:
On Tue, Feb 14, 2012 at 4:23 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Wed, Feb 15, 2012 at 8:29 AM, Barry Warsaw <barry@python.org> wrote:
My primary concern with the PEP is adding to users' confusion when they have to handle (at least) 5 different types[*] that represent time in Python.
My key question to those advocating the use of timedelta instead of Decimal:
What should timedelta.total_seconds() return to avoid losing nanosecond precision? How should this be requested when calling the API?
It should return a float as it does today. Add a timedelta.total_nanoseconds() call for people wanting high precision as a raw number and remind people of the precision limits of total_seconds() in the docs.
total_nanoseconds() would return an int() in case that wasn't obvious.
-gps
On Feb 15, 2012, at 10:23 AM, Nick Coghlan wrote:
What should timedelta.total_seconds() return to avoid losing nanosecond precision? How should this be requested when calling the API?
See, I have no problem having this method return a Decimal for high precision values. This preserves the valuable abstraction of timedeltas, but also provides a useful method for interoperability.
The core "timestamp" abstraction is "just a number" that (in context) represents a certain number of seconds. decimal.Decimal qualifies. datetime.timedelta doesn't - it's a higher level construct that makes the semantic context explicit (and currently refuses to interoperate with other values that are just numbers).
Right, but I think Python should promote the abstraction as the way to manipulate time-y data. Interoperability is an important principle to maintain, but IMO the right way to do that is to improve datetime and timedelta so that lower-level values can be extracted from, and added to, the higher-level abstract types. I think there are quite a few opportunities for improving the interoperability of datetime and timedelta, but that shouldn't be confused with bypassing them. Cheers, -Barry
On Tue, Feb 14, 2012 at 2:29 PM, Barry Warsaw <barry@python.org> wrote:
I think I will just state my reasoning one last time and then leave it to the BDFL or BDFOP to make the final decision.
Victor on IRC says that there is not much difference between Decimal and timedelta, and this may be true from an implementation point of view. From a cognitive point of view, I think they're miles apart. Ultimately, I wish ints and floats weren't used for time-y things, and only datetimes (for values with well-defined starting points, including the epoch) and timedeltas (for values with no starting point) were used.
We obviously can't eliminate the APIs that return and accept ints and floats, most of which we inherited from C, but we can avoid making it worse by extending them to also accept Decimals. I think it would be valuable work to correct any deficiencies in datetimes and timedeltas so that they can be used in all time-y APIs, with whatever resolution is necessary.
My primary concern with the PEP is adding to users' confusion when they have to handle (at least) 5 different types[*] that represent time in Python.
+1
Am 14.02.2012 23:29, schrieb Barry Warsaw:
I think I will just state my reasoning one last time and then leave it to the BDFL or BDFOP to make the final decision.
I'd like to remind people what the original point of the PEP process was: to avoid going around in circles in discussions. To achieve this, the PEP author is supposed to record all objections in the PEP, even if he disagrees (and may state rebuttals for each objection that people brought up). So, Victor: please record all objections in a separate section of the PEP, rather than just rebutting them in the PEP (as is currently the case).
My primary concern with the PEP is adding to users' confusion when they have to handle (at least) 5 different types[*] that represent time in Python.
I agree with Barry here (despite having voiced support for using Decimal before): datetime.datetime *is* the right data type to represent time stamps. If it means that it needs to be improved before it can be used in practice, then so be it - improve it.

I think improving datetime needs to go in two directions:

a) arbitrary-precision second fractions. My motivation for proposing/supporting Decimal was that it can support arbitrary precision, unlike any of the alternatives (except for using numerator/denominator pairs). So just adding nanosecond resolution to datetime is not enough: it needs to support arbitrary decimal fractions (it doesn't need to support non-decimal fractions, IMO).

b) distinction between universal time and local time. This distinction is currently blurred; there should be a prominent API to determine whether a point-in-time is meant as universal time or local time. In the terminology of the datetime documentation, there needs to be builtin support for "aware" (rather than "naive") UTC time, even if UTC is the only timezone that comes with Python.

Regards, Martin
On Wed, Feb 15, 2012 at 10:11, "Martin v. Löwis" <martin@v.loewis.de> wrote:
My primary concern with the PEP is adding to users' confusion when they have to handle (at least) 5 different types[*] that represent time in Python.
I agree with Barry here (despite having voiced support for using Decimal before): datetime.datetime *is* the right data type to represent time stamps. If it means that it needs to be improved before it can be used in practice, then so be it - improve it.
I think improving datetime needs to go in two directions:

a) arbitrary-precision second fractions. My motivation for proposing/supporting Decimal was that it can support arbitrary precision, unlike any of the alternatives (except for using numerator/denominator pairs). So just adding nanosecond resolution to datetime is not enough: it needs to support arbitrary decimal fractions (it doesn't need to support non-decimal fractions, IMO).

b) distinction between universal time and local time. This distinction is currently blurred; there should be a prominent API to determine whether a point-in-time is meant as universal time or local time. In the terminology of the datetime documentation, there needs to be builtin support for "aware" (rather than "naive") UTC time, even if UTC is the only timezone that comes with Python.
+1. And adding stuff to datetime to make it easier to get a unix timestamp out (as proposed by Victor before, IIRC) would also be a good thing in my book. I really want to be able to handle all my date+time needs without ever importing time or calendar. Cheers, Dirkjan
On Wed, Feb 15, 2012 at 7:11 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
I agree with Barry here (despite having voiced support for using Decimal before): datetime.datetime *is* the right data type to represent time stamps. If it means that it needs to be improved before it can be used in practice, then so be it - improve it.
By contrast, I think the only remotely viable choices for arbitrary precision low level timestamp APIs are decimal.Decimal and datetime.timedelta. The "unknown epoch" problem makes it impossible to consistently produce datetime.datetime objects, and an API that inconsistently returned either datetime.datetime or datetime.timedelta for operations that currently consistently return float objects would just be annoying.

However, I still think that decimal.Decimal is the right choice. There's nothing wrong with layering APIs, and the core concept of a timestamp is simply a number representing a certain number of seconds. We already have a data type that lets us represent a numeric value to arbitrary precision: decimal.Decimal.

Instead of trying to hoist all those APIs up to a higher semantic level, I'd prefer to just leave them as they are now: dealing with numbers (originally ints, then floats to support microseconds, now decimal.Decimal to support nanoseconds and any future increases in precision). If the higher level semantic API is incomplete, then we should *complete it* instead of trying to mash the two different layers together indiscriminately.
I think improving datetime needs to go in two directions:

a) arbitrary-precision second fractions. My motivation for proposing/supporting Decimal was that it can support arbitrary precision, unlike any of the alternatives (except for using numerator/denominator pairs). So just adding nanosecond resolution to datetime is not enough: it needs to support arbitrary decimal fractions (it doesn't need to support non-decimal fractions, IMO).
If our core timestamp representation is decimal.Decimal, this is trivial to implement for both datetime and timedelta - just store the seconds component as a decimal.Decimal instance. If not, we'd have to come up with some other way of obtaining arbitrary precision numeric storage (which seems rather wasteful). Even if we end up going down the datetime.timedelta path for the os module APIs, that's still the way I would want to go - arranging for timedelta.total_seconds() to return a Decimal value, rather than some other clumsy alternative like a separate total_nanoseconds() function that returns a large integer.
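A minimal sketch of that Decimal-returning variant, computed from timedelta's own integer fields (an illustration of the idea, not the stdlib implementation):

    import datetime
    from decimal import Decimal

    def total_seconds_decimal(td):
        # Exact: timedelta stores days/seconds/microseconds as integers,
        # so no precision is lost on the way to Decimal.
        return (Decimal(td.days) * 86400 + td.seconds
                + Decimal(td.microseconds).scaleb(-6))

    td = datetime.timedelta(days=1, microseconds=1)
    print(total_seconds_decimal(td))  # 86400.000001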
b) distinction between universal time and local time. This distinction is currently blurred; there should be prominent API to determine whether a point-in-time is meant as universal time or local time. In terminology of the datetime documentation, there needs to be builtin support for "aware" (rather than "naive") UTC time, even if that's the only timezone that comes with Python.
As of 3.2, the datetime module already has full support for arbitrary fixed offsets from UTC, including datetime.timezone.utc (i.e. UTC+0), which allows timezone aware UTC. For 3.2+, you should only need a third party library like pytz if you want to support named timezones (including daylight savings changes). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
I agree with Barry here (despite having voiced support for using Decimal before): datetime.datetime *is* the right data type to represent time stamps. If it means that it needs to be improved before it can be used in practice, then so be it - improve it.
Maybe I missed the answer, but how do you handle timestamps with an unspecified starting point, like os.times() or time.clock()? Should we leave these functions unchanged? My motivation for PEP 410 is to provide nanosecond resolution for time.clock_gettime(time.CLOCK_MONOTONIC) and time.clock_gettime(time.CLOCK_REALTIME). Victor
PEP author Victor asked (in http://mail.python.org/pipermail/python-dev/2012-February/116499.html):
Maybe I missed the answer, but how do you handle timestamps with an unspecified starting point, like os.times() or time.clock()? Should we leave these functions unchanged?
If *all* you know is that it is monotonic, then you can't -- but then you don't really have resolution either, as the clock may well speed up or slow down. If you do have resolution, and the only problem is that you don't know what the epoch was, then you can figure that out well enough by (once per type, per process) comparing it to something that does have an epoch, like time.gmtime().

-jJ

--
If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ
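A sketch of that calibration trick, assuming the time.monotonic() discussed earlier in the thread is available; the result is only as accurate as the single comparison used to compute the offset:

    import time

    # Estimate the unspecified epoch of the monotonic clock once per
    # process by comparing it to the wall clock.
    _epoch_offset = time.time() - time.monotonic()

    def monotonic_since_epoch():
        # Map the monotonic clock onto the Epoch, to roughly the
        # accuracy of the one-time calibration above.
        return _epoch_offset + time.monotonic()

    print(monotonic_since_epoch())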
I just came to this thread. Having read the good arguments on both sides, I keep wondering why anybody would care about nanosecond precision in timestamps. Unless you're in charge of managing one of the few atomic reference clocks in the world, your clock is not going to tell time that accurately. (Hey, we don't even admit the existence of leap seconds in most places -- not that I mind. :-)

What purpose is there to recording timestamps in nanoseconds? For clocks that start when the process starts running, float *is* (basically) good enough. For measuring e.g. file access times, there is no way that the actual time is known with anything like that precision (even if it is *recorded* as a number of milliseconds -- that's a different issue).

Maybe it's okay to wait a few years on this, until either 128-bit floats are more common or cDecimal becomes the default floating point type? In the meantime, for clock freaks we can have a few specialized APIs that return times in nanoseconds as a (long) integer.

-- --Guido van Rossum (python.org/~guido)
On Wed, 15 Feb 2012 08:39:45 -0800 Guido van Rossum <guido@python.org> wrote:
What purpose is there to recording timestamps in nanoseconds? For clocks that start when the process starts running, float *is* (basically) good enough. For measuring e.g. file access times, there is no way that the actual time is known with anything like that precision (even if it is *recorded* as a number of milliseconds -- that's a different issue).
The number one use case, as far as I understand, is to have bit-identical file modification timestamps where it can matter. I agree that the rest is anecdotal. Regards Antoine.
On Wed, Feb 15, 2012 at 8:47 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 15 Feb 2012 08:39:45 -0800 Guido van Rossum <guido@python.org> wrote:
What purpose is there to recording timestamps in nanoseconds? For clocks that start when the process starts running, float *is* (basically) good enough. For measuring e.g. file access times, there is no way that the actual time is known with anything like that precision (even if it is *recorded* as a number of milliseconds -- that's a different issue).
The number one use case, as far as I understand, is to have bit-identical file modification timestamps where it can matter.
So that can be solved by adding extra fields st_{a,c,m}time_ns and an extra os.utime_ns() call. Only the rare tool for making 100% faithful backups of filesystems and the like would care. -- --Guido van Rossum (python.org/~guido)
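As a sketch, the backup use case under that counter-proposal might look like the following. Note that st_atime_ns, st_mtime_ns and os.utime_ns() are hypothetical names here, taken from the suggestion above, not existing APIs:

    import os
    import shutil

    def copy_with_exact_times(src, dst):
        shutil.copyfile(src, dst)
        st = os.stat(src)
        # Hypothetical integer-nanosecond fields and call, enabling a
        # bit-identical copy of the timestamps.
        os.utime_ns(dst, (st.st_atime_ns, st.st_mtime_ns))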
2012/2/15 Guido van Rossum <guido@python.org>:
I just came to this thread. Having read the good arguments on both sides, I keep wondering why anybody would care about nanosecond precision in timestamps.
Python 3.3 exposes C functions that return a timespec structure. This structure contains a timestamp with a resolution of 1 nanosecond, whereas the timeval structure only has a resolution of 1 microsecond. Examples of C functions -> Python functions:
- timeval: gettimeofday() -> time.time()
- timespec: clock_gettime() -> time.clock_gettime()
- timespec: stat() -> os.stat()
- etc.

If we keep float, Python would have worse precision than C just because it uses an inappropriate type (C uses two integers in timeval).

Linux has supported nanosecond timestamps since Linux 2.6; Windows has supported 100 ns resolution since Windows 2000 or maybe before. It doesn't mean that the Windows system clock is accurate: in practice, it's hard to get something better than 1 ms :-) But you may use QueryPerformanceCounter() if you need better precision; it is used by time.clock() for example.
For measuring e.g. file access times, there is no way that the actual time is known with anything like that precision (even if it is *recorded* as a number of milliseconds -- that's a different issue).
If you need a real world example, here is an extract from http://en.wikipedia.org/wiki/Ext4:

"Improved timestamps: As computers become faster in general and as Linux becomes used more for mission-critical applications, the granularity of second-based timestamps becomes insufficient. To solve this, ext4 provides timestamps measured in nanoseconds. (...)"

So nanosecond resolution is needed to check if a file is newer than another. Such a test is common in build programs like make or scons.

Filesystem resolutions:
- ext4: 1 ns
- btrfs: 1 ns
- NTFS: 100 ns
- FAT32: 2 sec (yeah!)

Victor
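For scale: a 2012 Epoch timestamp stored as a C double has a granularity of roughly 238 ns, so a one-nanosecond mtime difference simply disappears when st_mtime is a float:

    x = 1330000000.0      # an Epoch timestamp from February 2012
    print(x + 1e-9 == x)  # True: the extra nanosecond is rounded away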
On Wed, 15 Feb 2012 18:23:55 +0100 Victor Stinner <victor.stinner@gmail.com> wrote:
Linux has supported nanosecond timestamps since Linux 2.6; Windows has supported 100 ns resolution since Windows 2000 or maybe before. It doesn't mean that the Windows system clock is accurate: in practice, it's hard to get something better than 1 ms :-)
Well, do you think the Linux system clock is nanosecond-accurate? A nanosecond is what it takes to execute a couple of CPU instructions. Even on a real-time operating system, your nanosecond-precise measurement is already obsolete when it starts being processed by the higher-level application. A single cache miss in the CPU will make the precision worthless. And in a higher-level language like Python, the execution times of individual instructions are not specified or stable, so the resolution brings you nothing.
"Improved timestamps As computers become faster in general and as Linux becomes used more for mission-critical applications, the granularity of second-based timestamps becomes insufficient. To solve this, ext4 provides timestamps measured in nanoseconds. (...)"
This is a fallacy. Just because ext4 is able to *store* nanoseconds timestamps doesn't mean the timestamps are accurate up to that point.
Such a test is common in build programs like make or scons.
scons is written in Python and its authors have not complained, AFAIK, about timestamp precision. Regards Antoine.
Linux has supported nanosecond timestamps since Linux 2.6; Windows has supported 100 ns resolution since Windows 2000 or maybe before. It doesn't mean that the Windows system clock is accurate: in practice, it's hard to get something better than 1 ms :-)
Well, do you think the Linux system clock is nanosecond-accurate?
Test the following C program:

------------
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv, char **arge)
{
    struct timespec tps, tpe;
    if ((clock_gettime(CLOCK_REALTIME, &tps) != 0)
        || (clock_gettime(CLOCK_REALTIME, &tpe) != 0)) {
        perror("clock_gettime");
        return -1;
    }
    /* cast tv_sec (time_t) for printing; tv_nsec is a long */
    printf("%ld s, %ld ns\n",
           (long)(tpe.tv_sec - tps.tv_sec),
           tpe.tv_nsec - tps.tv_nsec);
    return 0;
}
------------

Compile it using gcc time.c -o time -lrt.

It gives me differences smaller than 1000 ns on Ubuntu 11.10 and an Intel Core i5 @ 3.33 GHz:

$ ./a.out
0 s, 781 ns
$ ./a.out
0 s, 785 ns
$ ./a.out
0 s, 798 ns
$ ./a.out
0 s, 818 ns
$ ./a.out
0 s, 270 ns

Victor
Le mercredi 15 février 2012 à 18:58 +0100, Victor Stinner a écrit :
It gives me differences smaller than 1000 ns on Ubuntu 11.10 and an Intel Core i5 @ 3.33 GHz:
$ ./a.out
0 s, 781 ns
$ ./a.out
0 s, 785 ns
$ ./a.out
0 s, 798 ns
$ ./a.out
0 s, 818 ns
$ ./a.out
0 s, 270 ns
What is it supposed to prove exactly? There is a difference between being able to *represent* nanoseconds and being able to *measure* them; let alone give a precise meaning to them. (and ironically, floating-point numbers are precise enough to represent these numbers unambiguously) Regards Antoine.
Am 15.02.2012 19:10, schrieb Antoine Pitrou:
Le mercredi 15 février 2012 à 18:58 +0100, Victor Stinner a écrit :
It gives me differences smaller than 1000 ns on Ubuntu 11.10 and an Intel Core i5 @ 3.33 GHz:
$ ./a.out
0 s, 781 ns
$ ./a.out
0 s, 785 ns
$ ./a.out
0 s, 798 ns
$ ./a.out
0 s, 818 ns
$ ./a.out
0 s, 270 ns
What is it supposed to prove exactly? There is a difference between being able to *represent* nanoseconds and being able to *measure* them; let alone give a precise meaning to them.
Linux *actually* is able to measure time with nanosecond precision, even though it is not able to keep its clock synchronized to UTC with nanosecond accuracy.

The way Linux does that is to use the time-stamping counter of the processor (the rdtsc instruction), which (originally) counts one unit per CPU clock. I believe current processors count slightly differently (e.g. through the APIC), but still: you get a resolution within the clock frequency of the CPU quartz. With the quartz in Victor's machine, a single clock takes 0.3ns, so three of them make a nanosecond. As the quartz may not be entirely accurate (and also as the CPU frequency may change), you have to measure the clock rate against an external time source, but Linux has implemented algorithms for that. On my system, dmesg shows:

[ 2.236894] Refined TSC clocksource calibration: 2793.000 MHz.
[ 2.236900] Switching to clocksource tsc

Regards, Martin
On Wed, 15 Feb 2012 20:56:26 +0100 "Martin v. Löwis" <martin@v.loewis.de> wrote:
With the quartz in Victor's machine, a single clock takes 0.3ns, so three of them make a nanosecond. As the quartz may not be entirely accurate (and also as the CPU frequency may change) you have to measure the clock rate against an external time source, but Linux has implemented algorithms for that. On my system, dmesg shows
[ 2.236894] Refined TSC clocksource calibration: 2793.000 MHz.
[ 2.236900] Switching to clocksource tsc
But that's still not meaningful. By the time clock_gettime() returns, an unpredictable number of nanoseconds have elapsed, and even more when returning to the Python evaluation loop. So the nanosecond precision is just an illusion, and a float should really be enough to represent durations for any task where Python is suitable as a language. Regards Antoine.
Antoine Pitrou wrote:
On Wed, 15 Feb 2012 20:56:26 +0100 "Martin v. Löwis" <martin@v.loewis.de> wrote:
With the quartz in Victor's machine, a single clock takes 0.3ns, so three of them make a nanosecond. As the quartz may not be entirely accurate (and also as the CPU frequency may change) you have to measure the clock rate against an external time source, but Linux has implemented algorithms for that. On my system, dmesg shows
[ 2.236894] Refined TSC clocksource calibration: 2793.000 MHz.
[ 2.236900] Switching to clocksource tsc
But that's still not meaningful. By the time clock_gettime() returns, an unpredictable number of nanoseconds have elapsed, and even more when returning to the Python evaluation loop.
So the nanosecond precision is just an illusion, and a float should really be enough to represent durations for any task where Python is suitable as a language.
I reckon PyPy might be able to call clock_gettime() in a tight loop almost as frequently as the C program (although not with the overhead of converting to a decimal). Cheers, Mark.
2012/2/15 Mark Shannon <mark@hotpy.org>:
I reckon PyPy might be able to call clock_gettime() in a tight loop almost as frequently as the C program (although not with the overhead of converting to a decimal).
The nanosecond resolution is just as meaningless in C. -- Regards, Benjamin
Am 15.02.2012 21:06, schrieb Antoine Pitrou:
On Wed, 15 Feb 2012 20:56:26 +0100 "Martin v. Löwis" <martin@v.loewis.de> wrote:
With the quartz in Victor's machine, a single clock takes 0.3ns, so three of them make a nanosecond. As the quartz may not be entirely accurate (and also as the CPU frequency may change) you have to measure the clock rate against an external time source, but Linux has implemented algorithms for that. On my system, dmesg shows
[ 2.236894] Refined TSC clocksource calibration: 2793.000 MHz.
[ 2.236900] Switching to clocksource tsc
But that's still not meaningful. By the time clock_gettime() returns, an unpredictable number of nanoseconds have elapsed, and even more when returning to the Python evaluation loop.
This is not exactly true: while the current time won't be what was returned when using it, it is certainly possible to predict how long it takes to return from a system call. So the result is not accurate, but it is meaningful. If you are formally arguing that uncertain events may happen, such as the scheduler interrupting the thread: this is true for any clock reading; the actual time may be many milliseconds off by the time it is used. That is no reason to return to second resolution.
So the nanosecond precision is just an illusion, and a float should really be enough to represent durations for any task where Python is suitable as a language.
I agree with that statement - I was just refuting your claim that Linux cannot do nanosecond measurements.

Please do recognize the point I made to Guido: despite us three agreeing that a float is good enough for time stamps, people will continue to submit patches and ask for new features until we give in. One way to delay that by several years could be to reject the PEP in a way that makes it clear that not only is the specific approach rejected, but also any approach using anything other than floats.

Regards, Martin
The way Linux does that is to use the time-stamping counter of the processor (the rdtsc instruction), which (originally) counts one unit per CPU clock. I believe current processors count slightly differently (e.g. through the APIC), but still: you get a resolution within the clock frequency of the CPU quartz.
Linux has an internal clocksource API supporting different hardware:
- PIT (Intel 8253 chipset): configurable frequency between 8.2 Hz and 1.2 MHz
- PMTMR (power management timer): ACPI clock with a frequency of 3.5 MHz
- TSC (Time Stamp Counter): frequency of your CPU
- HPET (High Precision Event Timer): frequency of at least 10 MHz (14.3 MHz on my computer)

Linux has an algorithm to choose the best clock depending on its performance and accuracy. Most clocks have a frequency higher than 1 MHz, and so a resolution smaller than 1 us, even if the clock is not really accurate. I suppose that you can plug in specialized hardware, like an atomic clock or a GPS receiver, for better accuracy.

Victor
So using floats we can match 100ns precision, right? On Wed, Feb 15, 2012 at 9:58 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
Linux has supported nanosecond timestamps since Linux 2.6; Windows has supported 100 ns resolution since Windows 2000 or maybe before. It doesn't mean that the Windows system clock is accurate: in practice, it's hard to get something better than 1 ms :-)
Well, do you think the Linux system clock is nanosecond-accurate?
Test the following C program:

------------
#include <stdio.h>
#include <time.h>

int main(int argc, char **argv, char **arge)
{
    struct timespec tps, tpe;
    if ((clock_gettime(CLOCK_REALTIME, &tps) != 0)
        || (clock_gettime(CLOCK_REALTIME, &tpe) != 0)) {
        perror("clock_gettime");
        return -1;
    }
    printf("%ld s, %ld ns\n",
           (long)(tpe.tv_sec - tps.tv_sec),
           tpe.tv_nsec - tps.tv_nsec);
    return 0;
}
------------

Compile it using gcc time.c -o time -lrt.

It gives me differences smaller than 1000 ns on Ubuntu 11.10 and an Intel Core i5 @ 3.33 GHz:
$ ./a.out
0 s, 781 ns
$ ./a.out
0 s, 785 ns
$ ./a.out
0 s, 798 ns
$ ./a.out
0 s, 818 ns
$ ./a.out
0 s, 270 ns
Victor
-- --Guido van Rossum (python.org/~guido)
2012/2/15 Guido van Rossum <guido@python.org>:
So using floats we can match 100ns precision, right?
Nope, not to store an Epoch timestamp newer than January 1987:
>>> x = 2**29; (x + 1e-7) != x  # no loss of precision
True
>>> x = 2**30; (x + 1e-7) != x  # loses precision
False
>>> print(datetime.timedelta(seconds=2**29))
6213 days, 18:48:32
>>> print(datetime.datetime.fromtimestamp(2**29))
1987-01-05 19:48:32
Victor
On Wed, Feb 15, 2012 at 9:23 AM, Victor Stinner <victor.stinner@gmail.com> wrote:
2012/2/15 Guido van Rossum <guido@python.org>:
I just came to this thread. Having read the good arguments on both sides, I keep wondering why anybody would care about nanosecond precision in timestamps.
Python 3.3 exposes C functions that return a timespec structure. This structure contains a timestamp with a resolution of 1 nanosecond, whereas the timeval structure only has a resolution of 1 microsecond. Examples of C functions -> Python functions:
- timeval: gettimeofday() -> time.time()
- timespec: clock_gettime() -> time.clock_gettime()
- timespec: stat() -> os.stat()
- etc.
If we keep float, Python would have worse precision than C just because it uses an inappropriate type (C uses two integers in timeval).
Linux has supported nanosecond timestamps since Linux 2.6; Windows has supported 100 ns resolution since Windows 2000 or maybe before. It doesn't mean that the Windows system clock is accurate: in practice, it's hard to get something better than 1 ms :-) But you may use QueryPerformanceCounter() if you need better precision; it is used by time.clock() for example.
For measuring e.g. file access times, there is no way that the actual time is known with anything like that precision (even if it is *recorded* as a number of milliseconds -- that's a different issue).
If you need a real world example, here is an extract from http://en.wikipedia.org/wiki/Ext4:
"Improved timestamps As computers become faster in general and as Linux becomes used more for mission-critical applications, the granularity of second-based timestamps becomes insufficient. To solve this, ext4 provides timestamps measured in nanoseconds. (...)"
So nanosecond resolution is needed to check if a file is newer than another. Such a test is common in build programs like make or scons.
Filesystem resolutions:
- ext4: 1 ns
- btrfs: 1 ns
- NTFS: 100 ns
- FAT32: 2 sec (yeah!)
This does not explain why microseconds aren't good enough. It seems none of the clocks involved can actually measure even relative time intervals more accurately than 100 ns, and I expect that kernels don't actually keep their clock more accurate than milliseconds. (They may increment it by 1 microsecond approximately every microsecond, or even by 1 ns roughly every ns, but that doesn't fool me into believing all those digits of precision. I betcha that over, say, an hour, even time deltas aren't more accurate than a microsecond, due to inevitable fluctuations in clock speed.)

It seems the argument goes simply "because Linux chose to go all the way to nanoseconds we must support nanoseconds" -- and Linux probably chose nanoseconds because that's what fits in 32 bits and there wasn't anything else to do with those bits.

*Apart* from the specific use case of making an exact copy of a directory tree that can be verified by other tools that simply compare the nanosecond times for equality, I don't see any reason for complicating so many APIs to preserve the fake precision. As far as simply comparing whether one file is newer than another for tools like make/scons, I bet that it's in practice impossible to read a file and create another in less than a microsecond. (I actually doubt that you can do it faster than a millisecond, but for my argument I don't need that.)

-- --Guido van Rossum (python.org/~guido)
*Apart* from the specific use case of making an exact copy of a directory tree that can be verified by other tools that simply compare the nanosecond times for equality, I don't see any reason for complicating so many APIs to preserve the fake precision. As far as simply comparing whether one file is newer than another for tools like make/scons, I bet that it's in practice impossible to read a file and create another in less than a microsecond. (I actually doubt that you can do it faster than a millisecond, but for my argument I don't need that.)
But this leads to the issue with specialized APIs just for nanoseconds (such as the one you just proposed): people will use them *just because they are there*.

It's like the byte-oriented APIs for file names: most applications won't need them, either because the file names convert into character strings just fine, or because the emulation that we (now) provide will fall back to some nearly-accurate representation. Still, just because we have the byte APIs, people use them, only to then find out that they don't work on Windows, so they write very complicated code to make their code 100% correct.

The same will happen with a specialized API for nanosecond time stamps: people will be told to use it because it might matter, and not knowing for sure that it won't matter to them, they will use it. Therefore, I feel that we must not introduce such specialized APIs.

Not supporting ns timestamps is something I can readily agree to. However, contributors won't agree to that, and will insist that these be added (and keep writing patches to do so) until it does get added. Some of them are core contributors, so there is no easy way to stop them :-)

Regards, Martin
On Wed, Feb 15, 2012 at 11:38 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
*Apart* from the specific use case of making an exact copy of a directory tree that can be verified by other tools that simply compare the nanosecond times for equality, I don't see any reason for complicating so many APIs to preserve the fake precision. As far as simply comparing whether one file is newer than another for tools like make/scons, I bet that it's in practice impossible to read a file and create another in less than a microsecond. (I actually doubt that you can do it faster than a millisecond, but for my argument I don't need that.)
But this leads to the issue with specialized APIs just for nanoseconds (such as the one you just proposed): people will use them *just because they are there*.
It's like the byte-oriented APIs for file names: most applications won't need them, either because the file names convert into character strings just fine, or because the emulation that we (now) provide will fall back to some nearly-accurate representation. Still, just because we have the byte APIs, people use them, only to then find out that they don't work on Windows, so they write very complicated code to make their code 100% correct.
The same will happen with a specialized API for nanosecond time stamps: people will be told to use it because it might matter, and not knowing for sure that it won't matter to them, they will use it.
Therefore, I feel that we must not introduce such specialized APIs.
You have a point, but it applies just as much to the proposal in the PEP -- floats and Decimal are often not quite compatible, but people will pass type=Decimal to the clock and stat functions just because they can. The problems with mixing floats and Decimal are probably just as nasty as those with mixing bytes and str. At least if people are mixing nanoseconds (integers) and seconds (floats), they will quickly notice results that are a billion times off.
Not supporting ns timestamps is something I can readily agree to.
Me too.
However, contributors won't agree to that, and will insist that these be added (and keep writing patches to do so) until it does get added. Some of them are core contributors, so there is no easy way to stop them :-)
Actually I think a rejected PEP would be an excellent way to stop this. Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns(). -- --Guido van Rossum (python.org/~guido)
Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns().
I'm -1 on that, because it will make people write complicated code. Regards, Martin
2012/2/16 "Martin v. Löwis" <martin@v.loewis.de>:
Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns().
I'm -1 on that, because it will make people write complicated code.
Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(), os.futimens(), signal.sigtimedwait(), etc. These functions expect a (seconds: int, nanoseconds: int) tuple. We have to decide before the Python 3.3 release whether this API is fine as-is, or whether it should be changed. After the release, it will be more difficult to change the API.

If os.utimensat() expects a tuple, it would be nice to have a function getting the time as a tuple, like the C language has the clock_gettime() function to get a timestamp as a timespec structure.

During the discussion, many developers wanted a type allowing arithmetic operations like t2-t1 to compute a delta, or t+delta to "set" a timezone. It is possible to do arithmetic on a tuple (see the sketch after this message), but it is not practical, and I don't like a type with a fixed resolution (in some cases you need millisecond, microsecond or 100 ns resolution).

If you consider that the float loss of precision is not an issue for nanoseconds, we should use float for os.utimensat(), os.futimens() and signal.sigtimedwait(), just for consistency.

Victor
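A minimal sketch of tuple arithmetic on (seconds, nanoseconds) pairs, illustrating why it works but is clumsy compared to a real number type:

    def add_timespec(a, b):
        # Add two (seconds, nanoseconds) tuples, carrying overflow from
        # the nanosecond field into the second field.
        sec = a[0] + b[0]
        nsec = a[1] + b[1]
        return (sec + nsec // 10**9, nsec % 10**9)

    print(add_timespec((1, 999999999), (0, 2)))  # (2, 1)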
Am 16.02.2012 10:51, schrieb Victor Stinner:
2012/2/16 "Martin v. Löwis" <martin@v.loewis.de>:
Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns().
I'm -1 on that, because it will make people write complicated code.
Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(), os.futimens(), signal.sigtimedwait(), etc. These functions expect a (seconds: int, nanoseconds: int) tuple.
I'm -1 on adding these APIs, also. Since Python 3.3 is not released yet, it's not too late to revert them.
If you consider that float's loss of precision is not an issue for nanoseconds, we should use float for os.utimensat(), os.futimens() and signal.sigtimedwait(), just for consistency.
I'm wondering what use cases utimensat and futimens have that are not covered by utime/utimes (except for the higher resolution). Keeping the "ns" in the name but not doing nanoseconds would be bad, IMO. For sigtimedwait, accepting float is indeed the right thing to do. In the long run, we should see whether using 128-bit floats is feasible. Regards, Martin
Am 16.02.2012 11:14, schrieb "Martin v. Löwis":
Am 16.02.2012 10:51, schrieb Victor Stinner:
2012/2/16 "Martin v. Löwis" <martin@v.loewis.de>:
Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns().
I'm -1 on that, because it will make people write complicated code.
Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(), os.futimens(), signal.sigtimedwait(), etc. These functions expect a (seconds: int, nanoseconds: int) tuple.
I'm -1 on adding these APIs, also. Since Python 3.3 is not released yet, it's not too late to revert them.
+1. Georg
Georg Brandl wrote:
Am 16.02.2012 11:14, schrieb "Martin v. Löwis":
Am 16.02.2012 10:51, schrieb Victor Stinner:
2012/2/16 "Martin v. Löwis" <martin@v.loewis.de>:
Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns().

I'm -1 on that, because it will make people write complicated code.

Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(), os.futimens(), signal.sigtimedwait(), etc. These functions expect a (seconds: int, nanoseconds: int) tuple.

I'm -1 on adding these APIs, also. Since Python 3.3 is not released yet, it's not too late to revert them.
+1.
Sorry, is that +1 on the revert, or +1 on the APIs? -- Steven
Am 17.02.2012 10:28, schrieb Steven D'Aprano:
Georg Brandl wrote:
Am 16.02.2012 11:14, schrieb "Martin v. Löwis":
Am 16.02.2012 10:51, schrieb Victor Stinner:
2012/2/16 "Martin v. Löwis" <martin@v.loewis.de>:
Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns().

I'm -1 on that, because it will make people write complicated code.

Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(), os.futimens(), signal.sigtimedwait(), etc. These functions expect a (seconds: int, nanoseconds: int) tuple.

I'm -1 on adding these APIs, also. Since Python 3.3 is not released yet, it's not too late to revert them.
+1.
Sorry, is that +1 on the revert, or +1 on the APIs?
It's on what Martin said; you're right, it was a bit too ambiguous even for a RM :) Georg
On 02/16/2012 02:14 AM, "Martin v. Löwis" wrote:
Am 16.02.2012 10:51, schrieb Victor Stinner:
2012/2/16 "Martin v. Löwis"<martin@v.loewis.de>:
Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns().

I'm -1 on that, because it will make people write complicated code.

Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(), os.futimens(), signal.sigtimedwait(), etc. These functions expect a (seconds: int, nanoseconds: int) tuple.

I'm -1 on adding these APIs, also. Since Python 3.3 is not released yet, it's not too late to revert them.
+1. I also think they should be removed in favor of adding support for a nanosecond-friendly representation to the existing APIs (os.utime, etc). Python is not C, we don't need three functions that do the same thing but take different representations as their arguments. /arry
On 16/02/12 06:43, Guido van Rossum wrote:
This does not explain why microseconds aren't good enough. It seems none of the clocks involved can actually measure even relative time intervals more accurately than 100 ns, and I expect that kernels don't actually keep their clock more accurate than milliseconds.
I gather that modern x86 CPUs have a counter that keeps track of time down to a nanosecond or so by counting clock cycles. In principle it seems like a kernel should be able to make use of it in conjunction with other timekeeping hardware to produce nanosecond-resolution timestamps. Whether any existing kernel actually does that is another matter. It probably isn't worth the bother for things like file timestamps, where the time taken to execute the system call that modifies the file is likely to be several orders of magnitude larger. Until we have computers with terahertz clocks and gigahertz disk drives, it seems like a rather theoretical issue. And it doesn't look like Mr. Moore is going to give us anything like that any time soon. -- Greg
On Wed, Feb 15, 2012 at 6:06 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 16/02/12 06:43, Guido van Rossum wrote:
This does not explain why microseconds aren't good enough. It seems none of the clocks involved can actually measure even relative time intervals more accurately than 100 ns, and I expect that kernels don't actually keep their clock more accurate than milliseconds.
I gather that modern x86 CPUs have a counter that keeps track of time down to a nanosecond or so by counting clock cycles. In principle it seems like a kernel should be able to make use of it in conjunction with other timekeeping hardware to produce nanosecond-resolution timestamps.
Whether any existing kernel actually does that is another matter. It probably isn't worth the bother for things like file timestamps, where the time taken to execute the system call that modifies the file is likely to be several orders of magnitude larger.
Ironically, file timestamps are likely the only place where it matters. Read the rest of the thread. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum <guido@python.org> writes:
On Wed, Feb 15, 2012 at 6:06 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
It probably isn't worth the bother for things like file timestamps, where the time taken to execute the system call that modifies the file is likely to be several orders of magnitude larger.
Ironically, file timestamps are likely the only place where it matters. Read the rest of the thread.
And log message timestamps. The *two* only places where it matters, file timestamps and log messages. And communication protocols. The *three* only places – I'll come in again. -- \ “Why should I care about posterity? What's posterity ever done | `\ for me?” —Groucho Marx | _o__) | Ben Finney
On 02/15/2012 09:43 AM, Guido van Rossum wrote:
*Apart* from the specific use case of making an exact copy of a directory tree that can be verified by other tools that simply compare the nanosecond times for equality,
A data point on this specific use case. The following code throws its assert ~90% of the time in Python 3.2.2 on a modern Linux machine (assuming "foo" exists and "bar" does not):

import shutil
import os

shutil.copy2("foo", "bar")
assert os.stat("foo").st_mtime == os.stat("bar").st_mtime

The problem is with os.utime. IIUC stat() on Linux added nanosecond atime/mtime support back in 2.5. But the corresponding utime() functions to write nanosecond atime/mtime didn't appear until relatively recently--and Python 3.2 doesn't use them. With stat_float_times turned on, os.stat effectively reads with ~100-nanosecond precision, but os.utime still only writes with microsecond precision.

I fixed this in trunk last September (issue 12904); os.utime now preserves all the precision that Python currently conveys.

One way of looking at it: in Python 3.2 it's already pretty bad and almost nobody is complaining. (There's me, I guess, but I scratched my itch.)

/arry
On Wed, Feb 15, 2012 at 7:28 PM, Larry Hastings <larry@hastings.org> wrote:
On 02/15/2012 09:43 AM, Guido van Rossum wrote:
*Apart* from the specific use case of making an exact copy of a directory tree that can be verified by other tools that simply compare the nanosecond times for equality,
A data point on this specific use case. The following code throws its assert ~90% of the time in Python 3.2.2 on a modern Linux machine (assuming "foo" exists and "bar" does not):
import shutil
import os

shutil.copy2("foo", "bar")
assert os.stat("foo").st_mtime == os.stat("bar").st_mtime
The problem is with os.utime. IIUC stat() on Linux added nanosecond atime/mtime support back in 2.5. But the corresponding utime() functions to write nanosecond atime/mtime didn't appear until relatively recently--and Python 3.2 doesn't use them. With stat_float_times turned on, os.stat effectively reads with ~100-nanosecond precision, but os.utime still only writes with microsecond precision. I fixed this in trunk last September (issue 12904); os.utime now preserves all the precision that Python currently conveys.
One way of looking at it: in Python 3.2 it's already pretty bad and almost nobody is complaining. (There's me, I guess, but I scratched my itch.)
So, essentially you fixed this particular issue without having to do anything as drastic as the proposed PEP... -- --Guido van Rossum (python.org/~guido)
On 02/15/2012 08:12 PM, Guido van Rossum wrote:
On Wed, Feb 15, 2012 at 7:28 PM, Larry Hastings<larry@hastings.org> wrote:
I fixed this in trunk last September (issue 12904); os.utime now preserves all the precision that Python currently conveys.

So, essentially you fixed this particular issue without having to do anything as drastic as the proposed PEP...
I wouldn't say that. The underlying representation is still nanoseconds, and Python only preserves roughly hundred-nanosecond precision. My patch only ensures that reading and writing atime/mtime looks consistent to Python programs using the os module. Any code that examined the nanosecond-precise values from stat()--written in Python or any other language--would notice the values didn't match.

I'm definitely +1 for extending Python to represent nanosecond precision ctime/atime/mtime, but doing so in a way that permits seamlessly adding more precision down the road when the Linux kernel hackers get bored again and add femtosecond resolution. (And then presumably attosecond resolution four years later.) I haven't read 410 yet so I have no opinion on it.

I wrote a patch last year that adds new Decimal ctime/mtime/atime fields to the output of stat, but it's a horrific performance regression (os.stat is 10x slower) and the reviewers were ambivalent so I've let it rot. Anyway I now agree that we should improve the precision of datetime objects and use those instead of Decimal. (But not timedeltas--ctime/mtime/atime are absolute times, not deltas.)

/arry
A data point on this specific use case. The following code throws its assert ~90% of the time in Python 3.2.2 on a modern Linux machine (assuming "foo" exists and "bar" does not):
import shutil
import os

shutil.copy2("foo", "bar")
assert os.stat("foo").st_mtime == os.stat("bar").st_mtime
It works because Python uses float for utime() and for stat(). But this assertion may fail if another program checks file timestamps without losing precision (because of float), e.g. a program written in C that compares st_*time and st_*time_ns fields.
I fixed this in trunk last September (issue 12904); os.utime now preserves all the precision that Python currently conveys.
Let's try in an ext4 filesystem:

$ ~/prog/python/timestamp/python
Python 3.3.0a0 (default:35d6cc531800+, Feb 16 2012, 13:32:56)
>>> import decimal, os, shutil, time
>>> open("test", "x").close()
>>> shutil.copy2("test", "test2")
>>> os.stat("test", timestamp=decimal.Decimal).st_mtime
Decimal('1329395871.874886224')
>>> os.stat("test2", timestamp=decimal.Decimal).st_mtime
Decimal('1329395871.873350282')
>>> os.stat("test2", timestamp=decimal.Decimal).st_mtime - os.stat("test", timestamp=decimal.Decimal).st_mtime
Decimal('-0.001535942')
So shutil.copy2() failed to copy the timestamp: test2 is 1 ms older than test... Let's try with a program not written in Python: GNU make. The makefile:

---------
test2: test
        @echo "Copy test into test2"
        @~/prog/python/default/python -c 'import shutil; shutil.copy2("test", "test2")'
test:
        @echo "Create test"
        @touch test
clean:
        rm -f test test2
---------

First try:

$ make clean
rm -f test test2
$ make
Create test
Copy test into test2
$ make
Copy test into test2

=> test2 is always older than test and so is always "regenerated".

Second try:

$ make clean
rm -f test test2
$ make
Create test
Copy test into test2
$ make
make: `test2' is up to date.

=> oh, here test2 is newer or has the exact same modification time, so there is no need to rebuild it.

Victor
On Thu, 16 Feb 2012 13:46:18 +0100 Victor Stinner <victor.stinner@gmail.com> wrote:
Let's try in an ext4 filesystem:
$ ~/prog/python/timestamp/python
Python 3.3.0a0 (default:35d6cc531800+, Feb 16 2012, 13:32:56)
>>> import decimal, os, shutil, time
>>> open("test", "x").close()
>>> shutil.copy2("test", "test2")
>>> os.stat("test", timestamp=decimal.Decimal).st_mtime
Decimal('1329395871.874886224')
>>> os.stat("test2", timestamp=decimal.Decimal).st_mtime
Decimal('1329395871.873350282')
This looks fishy. Floating-point numbers are precise enough to represent the difference between these two numbers:
>>> f = 1329395871.874886224
>>> f.hex()
'0x1.3cf3e27f7fe23p+30'
>>> g = 1329395871.873350282
>>> g.hex()
'0x1.3cf3e27f7e4f9p+30'
If I run your snippet and inspect modification times using `stat`, the difference is much smaller (around 10 ns, not 1 ms):

$ stat test | \grep Modify
Modify: 2012-02-16 13:51:25.643597139 +0100
$ stat test2 | \grep Modify
Modify: 2012-02-16 13:51:25.643597126 +0100

In other words, you should check your PEP implementation for bugs.

Regards

Antoine.
If I run your snippet and inspect modification times using `stat`, the difference is much smaller (around 10 ns, not 1 ms):
$ stat test | \grep Modify
Modify: 2012-02-16 13:51:25.643597139 +0100
$ stat test2 | \grep Modify
Modify: 2012-02-16 13:51:25.643597126 +0100
The loss of precision is not constant: it depends on the timestamp value. Another example using the stat program:

------------
import decimal, os, shutil, time

try:
    os.unlink("test")
except OSError:
    pass
try:
    os.unlink("test2")
except OSError:
    pass

open("test", "x").close()
shutil.copy2("test", "test2")
print(os.stat("test", timestamp=decimal.Decimal).st_mtime)
print(os.stat("test2", timestamp=decimal.Decimal).st_mtime)
print(os.stat("test2", timestamp=decimal.Decimal).st_mtime - os.stat("test", timestamp=decimal.Decimal).st_mtime)
os.system("stat test|grep ^Mod")
os.system("stat test2|grep ^Mod")
------------

Outputs:

------------
$ ./python x.py
1329398229.918858600
1329398229.918208829
-0.000649771
Modify: 2012-02-16 14:17:09.918858600 +0100
Modify: 2012-02-16 14:17:09.918208829 +0100
$ ./python x.py
1329398230.862858588
1329398230.861343658
-0.001514930
Modify: 2012-02-16 14:17:10.862858588 +0100
Modify: 2012-02-16 14:17:10.861343658 +0100
$ ./python x.py
1329398232.450858570
1329398232.450067044
-0.000791526
Modify: 2012-02-16 14:17:12.450858570 +0100
Modify: 2012-02-16 14:17:12.450067044 +0100
$ ./python x.py
1329398233.090858561
1329398233.090853761
-0.000004800
Modify: 2012-02-16 14:17:13.090858561 +0100
Modify: 2012-02-16 14:17:13.090853761 +0100
------------

The loss of precision is between 1 ms and 4 µs. Decimal timestamps display exactly the same value as the stat program: I don't see any bug in this example.

Victor

PS: Don't try os.utime(Decimal) with my patch: the conversion from Decimal to _PyTime_t still uses float internally (I know about this issue, it should be fixed in my patch) and so loses precision ;-)
Le jeudi 16 février 2012 à 14:20 +0100, Victor Stinner a écrit :
If I run your snippet and inspect modification times using `stat`, the difference is much smaller (around 10 ns, not 1 ms):
$ stat test | \grep Modify
Modify: 2012-02-16 13:51:25.643597139 +0100
$ stat test2 | \grep Modify
Modify: 2012-02-16 13:51:25.643597126 +0100
The loss of precision is not constant: it depends on the timestamp value.
Well, I've tried several times and I can't reproduce a 1 ms difference.
The loss of precision is between 1 ms and 4 us.
It still looks fishy to me. IEEE doubles have a 52-bit mantissa. Since the integral part of a timestamp takes 32 bits or less, there are still 20 bits left for the fractional part, which allows for at least a 1 µs precision (2**20 ~= 10**6). A 1 ms precision loss looks like a bug.

Regards

Antoine.
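One can check the spacing between adjacent doubles at this magnitude directly (a quick sanity check; double_gap is an ad-hoc helper, not a library function):

import struct

def double_gap(x):
    """Distance from x to the next representable double above it."""
    bits = struct.unpack('<q', struct.pack('<d', x))[0]
    nxt = struct.unpack('<d', struct.pack('<q', bits + 1))[0]
    return nxt - x

t = 1329395871.874886
print(double_gap(t))   # ~2.4e-07: doubles near 2**30 seconds are ~0.24 µs apart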
$ stat test | \grep Modify
Modify: 2012-02-16 13:51:25.643597139 +0100
$ stat test2 | \grep Modify
Modify: 2012-02-16 13:51:25.643597126 +0100
The loss of precision is not constant: it depends on the timestamp value.
Well, I've tried several times and I can't reproduce a 1 ms difference.
The loss of precision is between 1 ms and 4 us.
It still looks fishy to me. IEEE doubles have a 52-bit mantissa. Since the integral part of a timestamp takes 32 bits or less, there are still 20 bits left for the fractional part, which allows for at least a 1 µs precision (2**20 ~= 10**6). A 1 ms precision loss looks like a bug.
Oh... It was an important bug in my function used to change the denominator of a timestamp. I tried to work around integer overflow, but I introduced a bug. I changed my patch to use PyLong, which has no integer overflow issue. Fixed example:
open("test", "x").close() import shutil shutil.copy2("test", "test2") [94386 refs] print(os.stat("test", datetime.datetime).st_mtime) 2012-02-16 21:58:30.835062+00:00 print(os.stat("test2", datetime.datetime).st_mtime) 2012-02-16 21:58:30.835062+00:00 print(os.stat("test", decimal.Decimal).st_mtime) 1329429510.835061686 print(os.stat("test2", decimal.Decimal).st_mtime) 1329429510.835061789 os.stat("test2", decimal.Decimal).st_mtime - os.stat("test", decimal.Decimal).st_mtime Decimal('1.03E-7')
So the difference is only 0.1 us (100 ns). It doesn't change anything about the Makefile issue: if timestamps differ by a single nanosecond, they are seen as different by make (or by any other program comparing the timestamps of two files with nanosecond precision).

Victor
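For reference, the residual ~100 ns is consistent with what one round trip through a C double costs at this magnitude (a quick check):

from decimal import Decimal

ns = Decimal('1329429510.835061686')   # nanosecond mtime read from the filesystem
via_float = Decimal(float(ns))         # exact value of the nearest IEEE double
print(via_float - ns)                  # on the order of 1E-7, matching the delta above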
On Thu, Feb 16, 2012 at 2:04 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
It doesn't change anything about the Makefile issue: if timestamps differ by a single nanosecond, they are seen as different by make (or by any other program comparing the timestamps of two files with nanosecond precision).
But make doesn't compare timestamps for equality -- it compares for newer. That shouldn't be so critical, since if there is an *actual* causal link between file A and B, the difference in timestamps should always be much larger than 100 ns. -- --Guido van Rossum (python.org/~guido)
2012/2/16 Guido van Rossum <guido@python.org>:
On Thu, Feb 16, 2012 at 2:04 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
It doesn't change anything about the Makefile issue: if timestamps differ by a single nanosecond, they are seen as different by make (or by any other program comparing the timestamps of two files with nanosecond precision).
But make doesn't compare timestamps for equality -- it compares for newer. That shouldn't be so critical, since if there is an *actual* causal link between file A and B, the difference in timestamps should always be much larger than 100 ns.
The problem is that shutil.copy2() sometimes produces an *older* timestamp :-/ As shown in my previous email: in such a case, make will always rebuild the second file instead of building it only once.

Example with two consecutive runs:

$ ./python diff.py
1329432426.650957952
1329432426.650958061
1.09E-7

$ ./python diff.py
1329432427.854957910
1329432427.854957819
-9.1E-8

Victor
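The diff.py script itself is not shown in the thread; judging from the earlier example, it was presumably something along these lines (a hypothetical reconstruction, requiring the PEP 410 timestamp argument):

import decimal, os, shutil

for name in ("test", "test2"):
    try:
        os.unlink(name)
    except OSError:
        pass

open("test", "x").close()
shutil.copy2("test", "test2")
t1 = os.stat("test", timestamp=decimal.Decimal).st_mtime
t2 = os.stat("test2", timestamp=decimal.Decimal).st_mtime
print(t1)
print(t2)
print(t2 - t1)   # sometimes positive, sometimes negative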
On Thu, Feb 16, 2012 at 2:48 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
2012/2/16 Guido van Rossum <guido@python.org>:
On Thu, Feb 16, 2012 at 2:04 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
It doesn't change anything about the Makefile issue: if timestamps differ by a single nanosecond, they are seen as different by make (or by any other program comparing the timestamps of two files with nanosecond precision).
But make doesn't compare timestamps for equality -- it compares for newer. That shouldn't be so critical, since if there is an *actual* causal link between file A and B, the difference in timestamps should always be much larger than 100 ns.
The problem is that shutil.copy2() sometimes produces an *older* timestamp :-/ As shown in my previous email: in such a case, make will always rebuild the second file instead of building it only once.
Example with two consecutive runs:
$ ./python diff.py
1329432426.650957952
1329432426.650958061
1.09E-7
$ ./python diff.py
1329432427.854957910
1329432427.854957819
-9.1E-8
Have you been able to reproduce this with an actual Makefile? What's the scenario? I'm thinking of a Makefile like this:

a:
        cp /dev/null a

b: a
        cp a b

Now say a doesn't exist and we run "make b". This will create a and then b. I can't believe that the difference between the mtimes of a and b is so small that if you copy the directory containing Makefile, a and b using a Python tool that reproduces mtimes only with usec accuracy you'll end up with a directory where a is newer than b. What am I missing? -- --Guido van Rossum (python.org/~guido)
The problem is that shutil.copy2() sometimes produces an *older* timestamp :-/ (...)
Have you been able to reproduce this with an actual Makefile? What's the scenario?
Hum. I asked the Internet who uses shutil.copy2() and I found an "old" issue (Decimal('43462967.173053') seconds ago):

Python issue #10148: st_mtime differs after shutil.copy2 (October 2010)
"When copying a file with shutil.copy2() between two ext4 filesystems on 64-bit Linux, the mtime of the destination file is different after the copy. It appears as if the resolution is slightly different, so the mtime is truncated slightly. (...)"

I don't know if it is a "theoretical" or "practical" issue. Then I found:

Python issue #11941: Support st_atim, st_mtim and st_ctim attributes in os.stat_result
"They would expose relevant functionality from libc's stat() and provide better precision than floating-point-based st_atime, st_mtime and st_ctime attributes."

Which is connected to the issue that motivated me to write the PEP:

Python issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution
"Support for such precision is available at the least on 2.6 Linux kernels."
"This is important for example with the tarfile module with the pax tar format. The POSIX tar standard[3] mandates storing the mtime in the extended header (if it is not an integer) with as much precision as is available in the underlying file system, and likewise to restore this time properly upon extraction. Currently this is not possible."
"The mailbox module would benefit from having this precision available."

For the tarfile use case, we need at least a way to get the modification time with a nanosecond resolution *and* to set the modification time with a nanosecond resolution. We just need to decide which type is the best for this use case, which is the purpose of the PEP 410 :-)

Another use case of nanosecond timestamps is profilers (and maybe benchmark tools). The profiler itself may be implemented in a different language than Python. For example, DTrace uses nanosecond timestamps.

--

Other examples.

Debian bug #627460: (gcp) Expose nanoseconds in python (15 May 2011)
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=627460

Debian bug #626787: (gcp) gcp: timestamp is not always copied exact
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626787
"When copying a (large) file from HDD to USB the files timestamp is not copied exact. It seems to work fine with smaller files (up to 1Gig), I couldn't spot the time-diff on these files."
("gcp is a grid enabled version of the scp copy command.")

fuse-python supports nanosecond resolution: they chose to mimic the C API using:

class Timespec(FuseStruct):
    """
    Cf. struct timespec in time.h:
    http://www.opengroup.org/onlinepubs/009695399/basedefs/time.h.html
    """
    def __init__(self, name=None, **kw):
        self.tv_sec = None
        self.tv_nsec = None
        kw['name'] = name
        FuseStruct.__init__(self, **kw)

Python issue #9079: Make gettimeofday available in time module
"... exposes gettimeofday as time.gettimeofday() returning (sec, usec) pair"

The Oracle database supports timestamps with a nanosecond resolution.

A related article about Ruby:
http://marcricblog.blogspot.com/2010/04/who-cares-about-nanosecond.html
"Files are uploaded in groups (fifteen maximum). It was important to know the order on which files have been upload. Depending on the size of the files and users’ internet broadband capacity, some files could be uploaded in the same second."

And a last one for the fun:

"This Week in Python Stupidity: os.stat, os.utime and Sub-Second Timestamps" (November 15, 2009)
http://ciaranm.wordpress.com/2009/11/15/this-week-in-python-stupidity-os-sta...
"Yup, that’s right, Python’s underlying type for floats is an IEEE 754 double, which is only good for about sixteen decimal digits. With ten digits before the decimal point, that leaves six for sub-second resolutions, which is three short of the range required to preserve POSIX nanosecond-resolution timestamps. With dates after the year 2300 or so, that leaves only five accurate digits, which isn’t even enough to deal with microseconds correctly. Brilliant." "Python does have a half-assed fixed point type. Not sure why they don’t use it more." Victor
So, make is unaffected. In my first post on this subject I already noted that the only real use case is making a directory or filesystem copy and then verifying that the copy is identical using native tools that compare times with nsec precision. At least one of the bugs you quote is about the current 1-second granularity, which is already addressed by using floats (up to ~usec precision). The fs copy use case should be pretty rare, and I would be okay with a separate lower-level API that uses a long to represent nanoseconds (though MvL doesn't like that either). Using (seconds, nsec) tuples is silly though. --Guido
-- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
if there is an *actual* causal link between file A and B, the difference in timestamps should always be much larger than 100 ns.
And if there isn't a causal link, simultaneity is relative anyway. To Fred sitting at his computer, file A might have been created before file B, but to George running from the other end of the building in response to an urgent bug report, it could be the other way around. So to be *really* accurate, the API needs a way for the caller to indicate a frame of reference. -- Greg
Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Guido van Rossum wrote:
if there is an *actual* causal link between file A and B, the difference in timestamps should always be much larger than 100 ns.
And if there isn't a causal link, simultaneity is relative anyway. To Fred sitting at his computer, file A might have been created before file B, but to George running from the other end of the building in response to an urgent bug report, it could be the other way around.
Does that change if Fred and George are separated in the building by twenty floors? -- \ “Kill myself? Killing myself is the last thing I'd ever do.” | `\ —Homer, _The Simpsons_ | _o__) | Ben Finney
On Wed, Feb 15, 2012 at 11:39 AM, Guido van Rossum <guido@python.org> wrote:
Maybe it's okay to wait a few years on this, until either 128-bit floats are more common or cDecimal becomes the default floating point type?
+1
Maybe it's okay to wait a few years on this, until either 128-bit floats are more common or cDecimal becomes the default floating point type? In the mean time for clock freaks we can have a few specialized APIs that return times in nanoseconds as a (long) integer.
I don't think that the default float type really matters here. If I understood correctly, the major issue with Decimal is that Decimal is not fully "compatible" with float: Decimal+float raises a TypeError.

Can't we improve the compatibility between Decimal and float, e.g. by allowing Decimal+float? Decimal (base 10) + float (base 2) may lose precision, and this issue matters in some use cases. So we still need a way to warn the user about loss of precision. We may add a global flag to allow Decimal+float and turn it on by default. Developers concerned about loss of precision can just turn the flag off at startup. Something like what we did in Python 2: allow str+unicode, and only switch to unicode when unicode was mature enough and well accepted :-)

--

I have some questions about 128-bit float and Decimal.

Currently, there is only one hardware platform supporting "IEEE 754-2008 the 128-bit base-2": the IBM S/390, which is quite rare (at least on the desktop :-)). Should we expect more CPUs supporting this type in the (near) future?

GCC, ICC and Clang implement this type in software, but there are license issues. At least with GCC, which uses MPFR: the library is distributed under the GNU LGPL license, which is not compatible with the Python license. I didn't check Clang and ICC.

I don't think that we can use 128-bit float by default before it is commonly available in hardware, because arithmetic in software is usually slower. We also support platforms with a compiler not supporting 128-bit float, e.g. Windows with Visual Studio 2008.

Floating point in base 2 also has an issue with timestamps using a 10^k resolution: such timestamps cannot be represented exactly in base 2, because 5 is coprime to 2 (10=2*5). The loss of precision is smaller than 10^-9 (a nanosecond) with 128-bit float (for Epoch timestamps), but it would be more "natural" to use base 10. System calls and functions of the C standard library use types with 10^k resolution:

- 1 (time_t): time(), mktime(), localtime(), sleep(), ...
- 10^-3 (int): poll()
- 10^-6 (timeval, useconds_t): select(), gettimeofday(), usleep(), ...
- 10^-9 (timespec): nanosleep(), utimensat(), clock_gettime(), ...

decimal and cdecimal (_decimal) have the same performance issue, so I don't expect them to become the standard float type. But Decimal is able to store a timestamp with a 10^k resolution exactly.

There are also IEEE 754 floating point types in base 10: decimal floating point (DFP), in 32, 64 and 128 bits. The IBM System z9, System z10 and POWER6 CPUs support these types in hardware. We may support this format in a specific module, or maybe use it to speed up the Python decimal module. But it is the same issue here: such hardware is also rare, so we cannot use it by default or rely on it.

Victor
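The base-2 representability point is easy to check (a two-line demonstration):

from decimal import Decimal

# One microsecond (10**-6) has no exact base-2 representation, so the
# double nearest to it differs from the true value:
print(Decimal(1e-6) == Decimal('0.000001'))   # False
print(Decimal(1e-6))   # the exact (long) digit string of the nearest double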
On Fri, Feb 17, 2012 at 9:33 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
Maybe it's okay to wait a few years on this, until either 128-bit floats are more common or cDecimal becomes the default floating point type? In the mean time for clock freaks we can have a few specialized APIs that return times in nanoseconds as a (long) integer.
I don't think that the default float type really matters here. If I understood correctly, the major issue with Decimal is that Decimal is not fully "compatible" with float: Decimal+float raises a TypeError.
Can't we improve the compatibility between Decimal and float, e.g. by allowing Decimal+float? Decimal (base 10) + float (base 2) may lose precision, and this issue matters in some use cases. So we still need a way to warn the user about loss of precision. We may add a global flag to allow Decimal+float and turn it on by default. Developers concerned about loss of precision can just turn the flag off at startup. Something like what we did in Python 2: allow str+unicode, and only switch to unicode when unicode was mature enough and well accepted :-)
Disallowing implicit binary float and Decimal interoperability was a deliberate design decision in the original Decimal PEP, in large part to discourage use of binary floats in applications where exact Decimal values are required. While this has been relaxed slightly to allow the exact explicit conversion of a binary float value to its full binary precision Decimal equivalent, the original rationale against implicit interoperability still seems valid (see http://www.python.org/dev/peps/pep-0327/#id17).

OTOH, people have long had to cope with the fact that integer+float interoperability runs the risk of triggering OverflowError if the integer is too large - it seems to me that the signalling behaviour of implicit promotions from float to Decimal could be adequately controlled with the Inexact flag on the Decimal context.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
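A sketch of both points: the explicit float-to-Decimal conversion is exact, and the decimal context's Inexact trap can turn precision loss into an exception:

from decimal import Decimal, Inexact, getcontext

# Explicit conversion captures the float's full binary value exactly:
print(Decimal(0.1))   # not 0.1, but the exact value of the nearest double

# Trapping Inexact makes precision loss an exception instead of silence:
ctx = getcontext().copy()
ctx.traps[Inexact] = True
try:
    ctx.divide(Decimal(1), Decimal(3))   # 1/3 has no finite decimal expansion
except Inexact:
    print("inexact result signalled")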
Victor Stinner <victor.stinner@gmail.com> wrote:
Can't we improve the compatibility between Decimal and float, e.g. by allowing Decimal+float? Decimal (base 10) + float (base 2) may lose precision, and this issue matters in some use cases. So we still need a way to warn the user about loss of precision.
I think this should be discussed in a separate thread. It's getting slightly difficult to follow all the issues raised here.
decimal and cdecimal (_decimal) have the same performance issue, so I don't expect them to become the standard float type.
Well, _decimal in tight loops is about 2 times slower than float. There are areas where _decimal is actually faster than float, e.g. in the cdecimal repository printing and formatting seem to be significantly faster:

$ cat format.py
import time
from decimal import Decimal

d = Decimal("7.928137192")
f = 7.928137192
out = open("/dev/null", "w")

start = time.time()
for i in range(1000000):
    out.write("%s\n" % d)
end = time.time()
print("Decimal: ", end-start)

start = time.time()
for i in range(1000000):
    out.write("%s\n" % f)
end = time.time()
print("float: ", end-start)

start = time.time()
for i in range(1000000):
    out.write("{:020,.30}\n".format(d))
end = time.time()
print("Decimal: ", end-start)

start = time.time()
for i in range(1000000):
    out.write("{:020,.30}\n".format(f))
end = time.time()
print("float: ", end-start)

$ ./python format.py
Decimal:  0.8835508823394775
float:  1.3872010707855225
Decimal:  2.1346139907836914
float:  3.154278039932251

So it would make sense to profile the exact application in order to determine the suitability of _decimal for timestamps.
There are also IEEE 754 for floating point types in base 10: decimal floating point (DFP), in 32, 64 and 128 bits. IBM System z9, System z10 and POWER6 CPU support these types in hardware. We may support this format in a specific module, or maybe use it to speedup the Python decimal module.
Apart from the rarity of these systems, decimal.py is arbitrary precision. If I restricted _decimal to DECIMAL64, I could probably speed it up further.

All that said, personally I wouldn't have problems with a chunked representation that includes nanoseconds, thus avoiding the decimal/float discussion entirely. I'm also a happy user of: http://cr.yp.to/libtai/tai64.html#tai64n

Stefan Krah
PEP author Victor asked (in http://mail.python.org/pipermail/python-dev/2012-February/116499.html):
Maybe I missed the answer, but how do you handle timestamp with an unspecified starting point like os.times() or time.clock()? Should we leave these function unchanged?
If *all* you know is that it is monotonic, then you can't -- but then you don't really have resolution either, as the clock may well speed up or slow down.
If you do have resolution, and the only problem is that you don't know what the epoch was, then you can figure that out well enough by (once per type per process) comparing it to something that does have an epoch, like time.gmtime().
Hum, I suppose that you can expect that time.time() - time.monotonic() is constant or evolves very slowly. time.monotonic() should return a number of seconds.

But you are right, monotonic clocks are usually less accurate. On Windows, QueryPerformanceCounter() is less accurate than GetSystemTimeAsFileTime(), for example: http://msdn.microsoft.com/en-us/magazine/cc163996.aspx (read the "The Issue of Frequency" section)

The documentation of time.monotonic() (a function added in Python 3.3) should maybe mention the seconds unit and the accuracy issue.

Victor
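A sketch of the epoch-estimation idea discussed above (assuming the offset between the two clocks stays roughly constant for the life of the process):

import time

# Estimate the fixed offset between the monotonic clock (unspecified epoch)
# and the wall clock, then reuse it to place later readings on the timeline.
offset = time.time() - time.monotonic()
reading = time.monotonic()        # some later monotonic reading
wall = reading + offset           # approximate Epoch timestamp
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(wall)))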
2012/2/15 "Martin v. Löwis" <martin@v.loewis.de>:
I agree with Barry here (despite having voiced support for using Decimal before): datetime.datetime *is* the right data type to represent time stamps. If it means that it needs to be improved before it can be used in practice, then so be it - improve it.
Decimal and datetime.datetime are not necessarily exclusive options. Using the API proposed in the PEP, we can add the Decimal type today, then improve the datetime.datetime API, and finally also add the datetime.datetime type. Such a compromise would solve the unspecified starting date issue: an exception would be raised if the timestamp has an unspecified starting point. In such a case, you can still get the timestamp as a Decimal object with nanosecond resolution.

Or we may add support for datetime and Decimal today, even if datetime only supports microseconds, and improve datetime later to support nanoseconds. It looks like there are use cases for both Decimal and datetime; both are useful. At least, datetime has a nice object API related to time, whereas Decimal requires functions from other modules. I don't know yet if one type is enough to handle all use cases.

I wrote a patch to demonstrate that my internal API can be extended (to store more information for new types like datetime.datetime) so that new types can be added later, without touching the public API (func(timestamp=type)). See timestamp_datetime.patch attached to issue #13882 (the patch is now outdated, I can update it if you would like). For example:

- time.time() would support float, Decimal and datetime
- os.times() would support float and Decimal (but not datetime)

Victor
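A rough sketch of how such an extensible timestamp=type argument could dispatch internally (an illustration only, not the actual patch):

import datetime
import decimal

def make_timestamp(sec, nsec, timestamp=float):
    """Convert a (seconds, nanoseconds) pair into the requested type."""
    if timestamp is float:
        return sec + nsec * 1e-9                        # may lose precision
    if timestamp is decimal.Decimal:
        return decimal.Decimal(sec) + decimal.Decimal(nsec).scaleb(-9)
    if timestamp is datetime.datetime:                  # microsecond resolution only
        base = datetime.datetime.fromtimestamp(sec, datetime.timezone.utc)
        return base + datetime.timedelta(microseconds=nsec // 1000)
    raise ValueError("unsupported timestamp type: %r" % (timestamp,))

print(make_timestamp(1329429510, 835061686, decimal.Decimal))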
I'd like to remind people what the original point of the PEP process was: to avoid going in cycles in discussions. To achieve this, the PEP author is supposed to record all objections in the PEP, even if he disagrees (and may state rebuttals for each objection that people brought up).
So, Victor: please record all objections in a separate section of the PEP, rather than just rebutting them in the PEP (as is currently the case).
Ok, I will try to list the alternatives differently, e.g. by also listing their advantages. I didn't know what a PEP is supposed to contain. Victor
On Feb 15, 2012, at 10:11 AM, Martin v. Löwis wrote:
I think improving datetime needs to go in two directions:

a) arbitrary-precision second fractions. My motivation for proposing/supporting Decimal was that it can support arbitrary precision, unlike any of the alternatives (except for using numerator/denominator pairs). So just adding nanosecond resolution to datetime is not enough: it needs to support arbitrary decimal fractions (it doesn't need to support non-decimal fractions, IMO).

b) distinction between universal time and local time. This distinction is currently blurred; there should be prominent API to determine whether a point-in-time is meant as universal time or local time. In terminology of the datetime documentation, there needs to be builtin support for "aware" (rather than "naive") UTC time, even if that's the only timezone that comes with Python.
+1 -Barry
participants (18)
- "Martin v. Löwis"
- Alexander Belopolsky
- Antoine Pitrou
- Barry Warsaw
- Ben Finney
- Benjamin Peterson
- Dirkjan Ochtman
- Georg Brandl
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- Jim J. Jewett
- Larry Hastings
- Mark Shannon
- Nick Coghlan
- Stefan Krah
- Steven D'Aprano
- Victor Stinner