[Python-ideas] Add time.time_ns(): system clock with nanosecond resolution

Fri Oct 13 10:12:39 EDT 2017

Hi,

I would like to add new functions that return time as a number of
nanoseconds (Python int), especially time.time_ns().

It would enhance the time.time() clock resolution. In my measurements,
it decreases the minimum non-zero delta between two clock reads by
almost 3 times, new "ns" clock versus current clock: 84 ns (2.8x
better) vs 239 ns on Linux, and 318 us (2.8x better) vs 894 us on
Windows, measured in Python.

The question of this long email is whether it's worth adding more
"_ns" time functions than just time.time_ns().

I would like to add:

* time.time_ns()
* time.monotonic_ns()
* time.perf_counter_ns()
* time.clock_gettime_ns()
* time.clock_settime_ns()

time(), monotonic() and perf_counter() clocks are the 3 most common
clocks and users use them to get the best available clock resolution.
clock_gettime/settime() are the generic UNIX API to access these
clocks and so should also be enhanced to get nanosecond resolution.

== Nanosecond resolution ==

More and more clocks have a frequency in MHz, up to GHz for the "TSC"
CPU clock, and so clock resolution is getting closer to 1 nanosecond
(or even better than 1 ns for the TSC clock!).

The problem is that Python returns time as a floating point number
which is usually a 64-bit binary floating point number (in the IEEE
754 format). This type starts to lose nanoseconds after 104 days.
Conversion from nanoseconds (int) to seconds (float) and then back to
nanoseconds (int) shows whether the conversions lose precision:

# no precision loss
>>> x=2**52+1; int(float(x * 1e-9) * 1e9) - x
0
# precision loss! (1 nanosecond)
>>> x=2**53+1; int(float(x * 1e-9) * 1e9) - x
-1
>>> import datetime
>>> print(datetime.timedelta(seconds=2**53 / 1e9))
104 days, 5:59:59.254741

While a system administrator can be proud to have an uptime longer
than 104 days, the problem also exists for the time.time() clock which
returns the number of seconds since the UNIX epoch (1970-01-01). This
clock started to lose nanoseconds in mid-April 1970 (47 years ago):

>>> import datetime
>>> print(datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=2**53 / 1e9))
1970-04-15 05:59:59.254741
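The same effect can be seen from the other side by looking at the gap
between two adjacent float values near a current timestamp. This is
only a sketch: the timestamp below is an assumption (a value from
October 2017), and the spacing is derived with math.frexp() assuming a
64-bit IEEE 754 double with a 53-bit significand.

```python
import math

# Assumed example timestamp: seconds since the UNIX epoch, October 2017.
t = 1507903959.0

# frexp() returns (m, e) with t == m * 2**e and 0.5 <= m < 1.
# With a 53-bit significand, adjacent doubles near t are 2**(e - 53) apart.
_, exponent = math.frexp(t)
gap_seconds = 2.0 ** (exponent - 53)
gap_ns = gap_seconds * 1e9

print(f"spacing between floats near {t:.0f}: {gap_ns:.1f} ns")
```

With this timestamp the spacing is 2**-22 seconds, a bit more than
238 ns, so a float timestamp cannot distinguish two instants that are
closer than that.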

== PEP 410 ==

Five years ago, I proposed a large and complex change in all Python
functions returning time to support nanosecond resolution using the
decimal.Decimal type:

   https://www.python.org/dev/peps/pep-0410/

The PEP was rejected for different reasons:

* it wasn't clear if hardware clocks really had a resolution of 1
nanosecond, especially when the clock is read from Python, since
reading a clock in Python also takes time...

* Guido van Rossum rejected the idea of adding a new optional
parameter to change the result type: it's an uncommon programming
practice (bad design in Python)

* decimal.Decimal is not widely used, it might be surprising to get
such a type

== CPython enhancements of the last 5 years ==

Since this PEP was rejected:

* the os.stat_result got 3 fields for timestamps as nanoseconds
(Python int): st_atime_ns, st_ctime_ns, st_mtime_ns

* Python 3.3 got 3 new clocks: time.monotonic(), time.perf_counter()
and time.process_time()

* I enhanced the private C API of Python handling time (API called
"pytime") to store all timings as the new _PyTime_t type which is a
simple 64-bit signed integer. The unit of _PyTime_t is not part of the
API, it's an implementation detail. The unit is currently 1
nanosecond.
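To give an idea of the range of such a type, here is a small
arithmetic sketch. It only assumes what is written above (a signed
64-bit integer counting nanoseconds); since the unit is an
implementation detail, the numbers are illustrative and not a
guarantee about _PyTime_t.

```python
import datetime

NS_PER_SECOND = 10**9

# Largest value of a signed 64-bit integer, read as nanoseconds.
max_ns = 2**63 - 1
max_seconds = max_ns // NS_PER_SECOND

span = datetime.timedelta(seconds=max_seconds)
years = span.days / 365.25

print(f"{span.days} days, about {round(years)} years")
```

So a signed 64-bit nanosecond counter covers roughly 292 years on each
side of its origin.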

This week, I converted one of the last clocks to the new _PyTime_t
format: time.perf_counter() now internally has a resolution of 1
nanosecond, instead of using the C double type.

XXX technically https://github.com/python/cpython/pull/3983 is not
merged yet :-)

== Clocks resolution in Python ==

I implemented time.time_ns(), time.monotonic_ns() and
time.perf_counter_ns() which are similar to the functions without the
"_ns" suffix, but return time as nanoseconds (Python int).
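A minimal usage sketch, assuming an interpreter where the proposed
functions are available under these names. Since the result is a
plain int, it can be split into seconds and nanoseconds without going
through a float:

```python
import time

# Assumes time.time_ns() exists, as proposed in this email.
ns = time.time_ns()

# Pure integer arithmetic: no rounding, whatever the size of the value.
seconds, nanoseconds = divmod(ns, 10**9)

print(f"{seconds} s + {nanoseconds} ns since the epoch")
```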

I computed the smallest difference between two clock reads (ignoring
differences of zero):

Linux:

* time_ns(): 84 ns <=== !!!
* time(): 239 ns <=== !!!
* perf_counter_ns(): 84 ns
* perf_counter(): 82 ns
* monotonic_ns(): 84 ns
* monotonic(): 81 ns

Windows:

* time_ns(): 318000 ns <=== !!!
* time(): 894070 ns <=== !!!
* perf_counter_ns(): 100 ns
* perf_counter(): 100 ns
* monotonic_ns(): 15000000 ns
* monotonic(): 15000000 ns

The difference on time.time() is significant: 84 ns (2.8x better) vs
239 ns on Linux and 318 us (2.8x better) vs 894 us on Windows. The
difference will be larger in the coming years since every day adds
86,400,000,000,000 nanoseconds to the system clock :-) (please don't
bug me with leap seconds! you got my point)

The difference on the perf_counter and monotonic clocks is not visible
in this quick script since my script runs less than 1 minute, my
computer uptime is smaller than 1 week, ... and Python internally
starts these clocks at zero *to reduce the precision loss*! Using an
uptime larger than 104 days, you would probably see a significant
difference (at least +/- 1 nanosecond) between the regular (seconds as
double) and the "_ns" (nanoseconds as int) clocks.
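This kind of measurement can be approximated with a sketch like the
one below. It is not the exact script used for the numbers above:
min_nonzero_delta() is a hypothetical helper written for this email,
the 0.1 second sampling window is arbitrary, and it assumes the
proposed "_ns" functions are available.

```python
import time


def min_nonzero_delta(clock, duration_ns=100_000_000):
    """Return the smallest non-zero difference between two consecutive
    reads of clock(), sampled for about duration_ns nanoseconds.

    Hypothetical helper; returns None if the clock never advanced.
    """
    deadline = time.perf_counter_ns() + duration_ns
    best = None
    previous = clock()
    while time.perf_counter_ns() < deadline:
        current = clock()
        delta = current - previous
        previous = current
        # Ignore differences of zero (and any backward step).
        if delta > 0 and (best is None or delta < best):
            best = delta
    return best


delta_ns = min_nonzero_delta(time.time_ns)   # int, in nanoseconds
delta_float = min_nonzero_delta(time.time)   # float, in seconds

if delta_ns is not None:
    print(f"time_ns(): {delta_ns} ns")
if delta_float is not None:
    print(f"time():    {delta_float * 1e9:.0f} ns")
```

The absolute values depend on the machine, the OS and the load, so
only the comparison between the two clocks is meaningful.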

== How many new nanosecond clocks? ==

PEP 410 proposed to modify the following functions:

* os module: fstat(), fstatat(), lstat(), stat() (st_atime, st_ctime
and st_mtime fields of the stat structure), sched_rr_get_interval(),
times(), wait3() and wait4()

* resource module: ru_utime and ru_stime fields of getrusage()

* signal module: getitimer(), setitimer()

* time module: clock(), clock_gettime(), clock_getres(), monotonic(),
time() and wallclock() ("wallclock()" was finally called "monotonic",
see PEP 418)

According to my tests of the previous section, the precision loss
starts after 104 days (stored in nanoseconds). I don't know if it's
worth it to modify functions which return "CPU time" or "process time"
of processes, since most processes live shorter than 104 days. Do you
care about a resolution of 1 nanosecond for the CPU and process time?

Maybe we need 1 nanosecond resolution for profiling and benchmarks.
But in that case, you might want to implement your profiler in C
rather than in Python, like the hotshot module, no? The "pytime"
private API of CPython gives you clocks with a resolution of 1
nanosecond.

== Annex: clock performance ==

To get an idea of how the cost of reading a clock affects the clock
resolution seen from Python, I also ran a microbenchmark on *reading*
a clock. Example:

$ ./python -m perf timeit --duplicate 1024 -s 'import time; t=time.time' 't()'

Linux (Mean +- std dev):

* time.time(): 45.4 ns +- 0.5 ns
* time.time_ns(): 47.8 ns +- 0.8 ns
* time.perf_counter(): 46.8 ns +- 0.7 ns
* time.perf_counter_ns(): 46.0 ns +- 0.6 ns

Windows (Mean +- std dev):

* time.time(): 42.2 ns +- 0.8 ns
* time.time_ns(): 49.3 ns +- 0.8 ns
* time.perf_counter(): 136 ns +- 2 ns <===
* time.perf_counter_ns(): 143 ns +- 4 ns <===
* time.monotonic(): 38.3 ns +- 0.9 ns
* time.monotonic_ns(): 48.8 ns +- 1.2 ns

Most clocks have the same performance except for perf_counter on
Windows: around 140 ns whereas other clocks are around 45 ns (on Linux
and Windows): 3x slower. Maybe the "bad" perf_counter performance can
be explained by the fact that I'm running Windows in a VM, which is
not ideal for benchmarking. Or maybe my C implementation of
time.perf_counter() is slow?

Note: I expect that a significant part of these numbers is the cost of
the Python function calls. Reading these clocks using the Python C
functions is likely faster.
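For readers without the perf module, a rough equivalent can be
sketched with the standard timeit module. This is a swapped-in
technique, not the command used above: it reports a single mean
without a standard deviation, mean_call_ns() is a hypothetical helper,
and it assumes the "_ns" functions are available.

```python
import time
import timeit


def mean_call_ns(func, number=200_000):
    """Mean cost of one call to func(), in nanoseconds (hypothetical helper)."""
    total_seconds = timeit.timeit(func, number=number)
    return total_seconds / number * 1e9


for name in ("time", "time_ns", "perf_counter", "perf_counter_ns"):
    clock = getattr(time, name)
    print(f"time.{name}(): {mean_call_ns(clock):.1f} ns per call")
```

The result includes the overhead of the timeit loop and of the Python
call itself, so it is an upper bound on the cost of the clock read.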

Victor