A number of systems provide subsecond time stamp resolution for files. In particular:

- NFS v3 has nanosecond time stamps.
- Solaris 9 has nanosecond time stamps in stat(2), and microsecond time stamps in utimes(2). In addition, they have microsecond time stamps on ufs. It appears that other Unices have also extended stat(2), as does OS X.
- NTFS has 100ns resolution for time stamps.

I'd like to expose at least the stat extensions to Python. Adding new fields to stat_result is easy enough, but there are a number of alternatives:

A. Add additional fields to hold the nanoseconds, i.e. st_mtimensec, st_atimensec, st_ctimensec. This is the BSD POSIX extension.

B. Follow the Unix API (Solaris and others). They define a

       struct timespec_t {
           time_t tv_sec;
           unsigned long tv_nsec;
       };

   and fields st_mtim, st_ctim, st_atim of timespec_t. For compatibility, they

       #define st_mtime st_mtim.tv_sec

   so to get at the seconds, you can write either st_mtim.tv_sec or st_mtime. For the nanoseconds, you write st_mtim.tv_nsec. This requires adding a new type.

C. Make st_mtime a floating point number. This won't offer nanosecond resolution, as C doubles are not dense enough.

What do you think?

Regards,
Martin
On 6 Sep 2002, Martin v. Löwis wrote:
A number of systems provide subsecond time stamp resolution for files. In particular:
- NFS v3 has nanosecond time stamps.
- Solaris 9 has nanosecond time stamps in stat(2), and microsecond time stamps in utimes(2). In addition, they have microsecond time stamps on ufs. It appears that other Unices have also extended stat(2), as does OS X.
- NTFS has 100ns resolution for time stamps.
(---)
C. Make st_mtime a floating point number. This won't offer nanosecond resolution, as C doubles are not dense enough.
This seems to me the most Pythonic way. Are C doubles dense enough to offer 100 ns resolution?

/Paul
Paul Svensson <paul-python@svensson.org> writes:
This seems to me the most Pythonic way. Are C doubles dense enough to offer 100 ns resolution ?
It looks like they are:
>>> time.time()
1031326478.373606
>>> 1031326478 + 1e-6
1031326478.000001
>>> 1031326478 + 1e-7
1031326478.0000001
>>> 1031326478 + 1e-8
1031326478.0
but only just so:
>>> 1031326478 + 2e-7
1031326478.0000002
>>> 1031326478 + 3e-7
1031326478.0000004
>>> 1031326478 + 4e-7
1031326478.0000004
I admit that this looks tempting, but I'm worried about applications that break because they expect time stamps in struct stat to be integers. Regards, Martin
This seems to me the most Pythonic way.
I admit that this looks tempting, but I'm worried about applications that break because they expect time stamps in struct stat to be integers.
Hm, so maybe new field names is still the way to go. E.g. st_mtime gives an int, st_mtimef gives a float. The tuple version only gives the int. If the system doesn't support subsecond resolution, the st_mtimef field still exists but is an int (no point allocating a float and converting the int). --Guido van Rossum (home page: http://www.python.org/~guido/)
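Guido's int/float split could be sketched like this (purely illustrative; the class and constructor are hypothetical, not the actual stat_result implementation):

```python
class StatResult:
    """Illustrative model of the proposed int/float field split."""

    def __init__(self, mtime_sec, mtime_nsec=None):
        self.st_mtime = mtime_sec          # always an int, as today
        if mtime_nsec is None:
            # no subsecond support: st_mtimef exists but is just the int
            self.st_mtimef = mtime_sec
        else:
            self.st_mtimef = mtime_sec + mtime_nsec * 1e-9

st = StatResult(1031326478, 373606000)
print(st.st_mtime)    # the int, as in the tuple version
print(st.st_mtimef)   # the float, carrying the subsecond part
```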
Guido van Rossum <guido@python.org> writes:
Hm, so maybe new field names is still the way to go. E.g. st_mtime gives an int, st_mtimef gives a float. The tuple version only gives the int. If the system doesn't support subsecond resolution, the st_mtimef field still exists but is an int (no point allocating a float and converting the int).
OTOH, I just found that the time values are already floats on the Mac. Did the change in return value for time.time() cause any problems at the time it was made? Regards, Martin
On zaterdag, september 7, 2002, at 09:35 , Martin v. Loewis wrote:
Guido van Rossum <guido@python.org> writes:
Hm, so maybe new field names is still the way to go. E.g. st_mtime gives an int, st_mtimef gives a float. The tuple version only gives the int. If the system doesn't support subsecond resolution, the st_mtimef field still exists but is an int (no point allocating a float and converting the int).
OTOH, I just found that the time values are already floats on the Mac. Did the change in return value for time.time() cause any problems at the time it was made?
It's been causing me headaches in the form of failing test suites about once a year :-)

But if I break down the time problems I have on the Mac (100% of which are due to people having a completely unix-centric idea of what a timestamp is), I would say 90% are due to the Mac epoch being in 1904 instead of 1970, 9% are due to Mac timestamps being localtime instead of GMT, and only 1% are due to the timestamps being floats. And the latter are the easiest to fix, too. The localtime/GMT issues are the hardest, especially because of DST.

My preference would be that st_mtime and all other such values are defined to be cookies (sort of similar to lseek values). You would then invoke one of the mythical Python datetime routines to convert the cookie into something guaranteed to be of your liking (and this specific datetime routine would be platform dependent). If you use the cookie as-is you have a good chance of it working, but you're living dangerously (an analogy would be opening a binary file without "rb"). But this isn't very friendly for backwards compatibility...
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
Hm, so maybe new field names is still the way to go. E.g. st_mtime gives an int, st_mtimef gives a float. The tuple version only gives the int. If the system doesn't support subsecond resolution, the st_mtimef field still exists but is an int (no point allocating a float and converting the int).
OTOH, I just found that the time values are already floats on the Mac. Did the change in return value for time.time() cause any problems at the time it was made?
It's been causing me headaches in the form of failing test suites about once a year :-) But if I break down the time problems I have on the Mac (100% of which are due to people having a completely unix-centric idea of what a timestamp is), I would say 90% are due to the Mac epoch being in 1904 instead of 1970, 9% are due to Mac timestamps being localtime instead of GMT, and only 1% are due to the timestamps being floats. And the latter are the easiest to fix, too. The localtime/GMT issues are the hardest, especially because of DST.
I'm not sure if this can be used as an argument for making st_mtime and friends floats and be done with it. I wish it could be, because in the long run that's a much nicer API than adding new fields.
My preference would be that st_mtime and all other such values are defined to be cookies (sort of similar to lseek values). You would then invoke one of the mythical Python datetime routines to convert the cookie into something guaranteed to be of your liking. (and this specific datetime routine would be platform dependent). If you use the cookie as-is you have a good chance of it working, but you're living dangerously (an analogy would be opening a binary file without "rb"). But this isn't very friendly for backwards compatibility...
There's at least one place I know of in Python that assumes the epoch being 1970: calendar.timegm() -- note the line "EPOCH = 1970" right in front of it. :-) Would it make sense if the portable Python APIs translated everything to an epoch of 1970 and UTC? That's what the Windows C library does. Very helpful. (Or is this a problem that's going to disappear with MacOS X? I presume it uses UTC and I hope its epoch is 1970?) --Guido van Rossum (home page: http://www.python.org/~guido/)
On zondag, september 8, 2002, at 01:24 , Guido van Rossum wrote:
Would it make sense if the portable Python APIs translated everything to an epoch of 1970 and UTC? That's what the Windows C library does. Very helpful. (Or is this a problem that's going to disappear with MacOS X? I presume it uses UTC and I hope its epoch is 1970?)
On MacOSX (if you use unix-based Python, not if you use old MacPython) the problem is gone. At least, if you ignore the timestamps returned by mac-specific filesystem routines, but I think we can do that safely.

Changing the APIs to return unix-style timestamps is what the GUSI unix-compatible socket and I/O library used by MacPython did originally, but I had to rip it out. The problem was that GUSI did provide all the unix system calls, but not the other library routines that handled timestamps. So these were provided by the Metrowerks C library, which assumes localtime. So ctime() and gmtime() and all their friends did the wrong thing, and I didn't cherish the idea of finding replacements for them.

If your suggestion is that every timestamp goes through a conversion routine before being passed from C to Python and through a reverse conversion when it goes from Python to C: yes, that would definitely make sense.
--
- Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -
[Paul Svensson]
Are C doubles dense enough to offer 100 ns resolution ?
The question can't be answered unless you also specify how many years you want to cover. It takes about 25 bits to distinguish a year's worth of seconds, and an IEEE double has 53 bits to play with. So if you were only interested in representing one year, you've got about 28 bits left to play with. If you want to cover an N-year span, you've got about 28 - log2(N) bits to play with. It takes a bit over 23 bits to distinguish the number of 100 ns slices in a second, so N has to be small enough that 5 - log2(N) doesn't go negative. So if you count the start of the epoch at 1970, you've just created a year 2003 problem <wink>.
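Tim's bit budget can be checked numerically (a quick sketch using the constants from his post):

```python
import math

MANTISSA_BITS = 53                  # IEEE 754 double precision
SECONDS_PER_YEAR = 365.25 * 86400   # roughly 2**25 seconds
SLICES_PER_SECOND = 10**7           # 100 ns slices in one second

bits_for_year = math.log2(SECONDS_PER_YEAR)     # a bit under 25 bits
bits_for_slices = math.log2(SLICES_PER_SECOND)  # a bit over 23 bits

# An N-year span fits while log2(N) stays within the spare bits.
spare = MANTISSA_BITS - bits_for_year - bits_for_slices
print(2 ** spare)  # roughly thirty years from the epoch
```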
C. Make st_mtime a floating point number. This won't offer nanosecond resolution, as C doubles are not dense enough.
This is the most Pythonic approach. --Guido van Rossum (home page: http://www.python.org/~guido/)
On Fri, 6 Sep 2002, Guido van Rossum wrote:
C. Make st_mtime a floating point number. This won't offer nanosecond resolution, as C doubles are not dense enough.
This is the most Pythonic approach.
-1

This then locks Python into a specific bit-description notion of a double in order to get the appropriate number of significant digits to describe time sufficiently. Embedded/portable processors may not support the notion of an IEEE double.

In addition, timers get increasingly dense as computers get faster. Thus, doubles may work for nanoseconds, but will not be sufficient for picoseconds.

If the goal is a field which never has to be changed to support any amount of time, the value should be "infinite precision". At that point, a Python Long used in some tuple representation of fixed-point arithmetic springs to mind, i.e. (<long>, <bits of fractional point>)

-a
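Andrew's fixed-point tuple could be modelled like this (a sketch; the encoding, constant, and function names are made up for illustration and are not a proposed API):

```python
FRAC_BITS = 64  # fractional bits; a Python long can grow as needed

def to_fixed(sec, nsec):
    """Encode sec + nsec * 1e-9 as a (seconds, fraction) fixed-point pair."""
    frac = (nsec << FRAC_BITS) // 10**9
    return (sec, frac)

def fixed_to_float(value):
    """Lossy conversion back to a float, for display or comparison."""
    sec, frac = value
    return sec + frac / (1 << FRAC_BITS)

# Half a second encodes exactly as the top fractional bit.
t = to_fixed(1031326478, 500_000_000)
print(t[1] == 1 << (FRAC_BITS - 1))  # True
```

The point of the tuple is that the integer parts never round: precision is limited only by how many fractional bits you choose to carry, not by a hardware float format.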
C. Make st_mtime a floating point number. This won't offer nanosecond resolution, as C doubles are not dense enough.
This is the most Pythonic approach.
-1
This then locks Python into a specific bit-description notion of a double in order to get the appropriate number of significant digits to describe time sufficiently. Embedded/portable processors may not support the notion of an IEEE double.
In addition, timers get increasingly dense as computers get faster. Thus, doubles may work for nanoseconds, but will not be sufficient for picoseconds.
If the goal is a field which never has to be changed to support any amount of time, the value should be "infinite precision". At that point, a Python Long used in some tuple representation of fixed-point arithmetic springs to mind. ie. (<long>, <bit of fractional point>)
I'm sorry, but I really don't see the point of wanting to record file mtimes all the way up to nanosecond precision. What would it mean? Most clocks are off by a few seconds at least anyway. Python has represented time as Python floats (implemented as C doubles) all its life long and it has served us well. --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote I'm sorry, but I really don't see the point of wanting to record file mtimes all the way up to nanosecond precision. What would it mean? Most clocks are off by a few seconds at least anyway.
Not only that, but if you're that precise, are you measuring the time when the modification started, the time when it started hitting the disks, when the write on the disk completed, when the O/S signalled to the application that the modification was complete... questions, questions... :)
Anthony Baxter <anthony@interlink.com.au> writes:
Not only that, but if you're that precise, are you measuring the time when the modification started, the time when it started hitting the disks, when the write on the disk completed, when the O/S signalled to the application that the modification was complete... questions questions.. .:)
For Python, these questions are easy to answer: We just report to the application what the system reports to us. It is the file system implementor's job to define the notion of modification time. Regards, Martin
"Andrew P. Lentvorski" <bsder@mail.allcaps.org> writes:
This then locks Python into a specific bit-description notion of a double in order to get the appropriate number of significant digits to describe time sufficiently. Embedded/portable processors may not support the notion of an IEEE double.
That's not true. Suppose you have two fields, tv_sec and tv_nsec. Then the resulting float expression is

    tv_sec + 1e-9 * tv_nsec;

This expression works on all systems that support floating point numbers - be it IEEE or not.
In addition, timers get increasingly dense as computers get faster. Thus, doubles may work for nanoseconds, but will not be sufficient for picoseconds.
At the same time, floating point numbers get increasingly more accurate as computer registers widen. In a 64-bit float, you can just barely express 1e-7s (if you base the epoch at 1970); with a 128-bit float, you can express 1e-20s easily.
If the goal is a field which never has to be changed to support any amount of time, the value should be "infinite precision".
No, just using floating point numbers is sufficient. Notice that time.time() also returns a floating point number.
At that point, a Python Long used in some tuple representation of fixed-point arithmetic springs to mind. ie. (<long>, <bit of fractional point>)
Yes, when/if Python gets rational numbers, or decimal fixed-or-floating point numbers, those data types might represent the value that the system reports more accurately. At that time, there will be a transition plan to introduce those numbers at all places where it is reasonable, with as little impact on applications as possible. Regards, Martin
MvL wrote:
That's not true. Support you have two fields, tv_sec and tv_nsec. Then the resulting float expression is
tv_sec + 1e-9 * tv_nsec;
This expression works on all systems that support floating point numbers - be it IEEE or not.
Don't you have to truncate tv_sec for that to work? i.e. Truncate(tv_sec, 9) + 1e-9 * tv_nsec Cheers, Brian
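The precision question behind Brian's follow-up is easy to demonstrate (a quick sketch; the nanosecond value is made up). Near a 2002-era tv_sec the double only has roughly 100 ns granularity, so the low digits of tv_nsec are rounded away no matter how the sum is written:

```python
tv_sec, tv_nsec = 1031326478, 123456789

combined = tv_sec + 1e-9 * tv_nsec  # Martin's expression

# Subtracting the (exactly representable) integer part back out is exact,
# so this recovers what the double actually stored.
recovered_nsec = round((combined - tv_sec) * 1e9)
print(recovered_nsec)  # close to, but not exactly, 123456789
```

No explicit truncation happens in the C expression; the double simply rounds the sum to its nearest representable value, silently dropping the lowest digits.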
participants (9)

- Andrew P. Lentvorski
- Anthony Baxter
- Brian Quinlan
- Guido van Rossum
- Jack Jansen
- loewis@informatik.hu-berlin.de
- martin@v.loewis.de
- Paul Svensson
- Tim Peters