Mailman 3 Store timestamps as decimal.Decimal objects - Python-Dev

Store timestamps as decimal.Decimal objects

Victor Stinner

Jan. 30, 2012

11:31 p.m.

Hi,

In issues #13882 and #11457, I propose to add an argument to functions returning timestamps to choose the timestamp format. Python uses float in most cases, but float is not enough to store a timestamp with a resolution of 1 nanosecond. I recently added time.clock_gettime() to Python 3.3, which has a resolution of a nanosecond. The (first?) new timestamp format will be decimal.Decimal because it is able to store any timestamp in any resolution without losing bits.

Instead of adding a boolean argument, I would prefer to support more formats. My last patch provides the following formats:

- "float": float (used by default)
- "decimal": decimal.Decimal
- "datetime": datetime.datetime
- "timespec": (sec, nsec) tuple # I don't think that we need it, it is just another example

The proposed API is:

time.time(format="datetime")
time.clock_gettime(time.CLOCK_REALTIME, format="decimal")
os.stat(path, timestamp="datetime")
etc.

This API has an issue: importing the datetime or decimal object is implicit, I don't know if it is really an issue. (In my last patch, the import is done too late, but it can be fixed, it does not really matter.)

Alexander Belopolsky proposed to use time.time(format=datetime.datetime) instead.

--

The first step would be to add an argument to functions returning timestamps. The second step is to accept these new formats (Decimal?) as input, for datetime.datetime.fromtimestamp() and os.utime() for example.

(Using decimal.Decimal, we may remove os.utimens() and use the right function depending on the timestamp resolution.)

--

I prefer Decimal over a dummy tuple like (sec, nsec) because you can do arithmetic on it: t2-t1, a+b, t/k, etc. It also stores the resolution of the clock: time.time() and time.clock_gettime() have for example different resolutions (sec, ms, us for time.time() and ns for clock_gettime()).

The decimal module is still implemented in Python, but there is a working implementation in C which is much faster. Storing timestamps as Decimal could be a motivation to integrate the C implementation :-)

--

Examples with the time module:

$ ./python
Python 3.3.0a0 (default:52f68c95e025+, Jan 26 2012, 21:54:31)

...

Examples with os.stat:

$ ./python
Python 3.3.0a0 (default:2914ce82bf89+, Jan 30 2012, 23:07:24)

...

Victor


Matt Joiner

January 2012

11:50 p.m.

Sounds good, but I also prefer Alexander's method. The type information is already encoded in the class object. This way you don't need to maintain a mapping of strings to classes, and other functions/third party can join in the fun without needing access to the latest canonical mapping. Lastly there will be no confusion or contention for duplicate keys. On Jan 31, 2012 10:32 AM, "Victor Stinner" <victor.stinner@haypocalc.com> wrote:

...

Hi,

In issues #13882 and #11457, I propose to add an argument to functions returning timestamps to choose the timestamp format. Python uses float in most cases, but float is not enough to store a timestamp with a resolution of 1 nanosecond. I recently added time.clock_gettime() to Python 3.3, which has a resolution of a nanosecond. The (first?) new timestamp format will be decimal.Decimal because it is able to store any timestamp in any resolution without losing bits. Instead of adding a boolean argument, I would prefer to support more formats. My last patch provides the following formats:

- "float": float (used by default)
- "decimal": decimal.Decimal
- "datetime": datetime.datetime
- "timespec": (sec, nsec) tuple # I don't think that we need it, it is just another example

The proposed API is:

time.time(format="datetime")
time.clock_gettime(time.CLOCK_REALTIME, format="decimal")
os.stat(path, timestamp="datetime")
etc.

This API has an issue: importing the datetime or decimal object is implicit, I don't know if it is really an issue. (In my last patch, the import is done too late, but it can be fixed, it does not really matter.)

Alexander Belopolsky proposed to use time.time(format=datetime.datetime) instead.

--

The first step would be to add an argument to functions returning timestamps. The second step is to accept these new formats (Decimal?) as input, for datetime.datetime.fromtimestamp() and os.utime() for example.

(Using decimal.Decimal, we may remove os.utimens() and use the right function depending on the timestamp resolution.)

--

I prefer Decimal over a dummy tuple like (sec, nsec) because you can do arithmetic on it: t2-t1, a+b, t/k, etc. It also stores the resolution of the clock: time.time() and time.clock_gettime() have for example different resolutions (sec, ms, us for time.time() and ns for clock_gettime()).
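A minimal sketch of the two points above, using only the standard library. The (sec, nsec) pair is the st_ctime value from the os.stat example in this thread; the helper name to_decimal is an assumption made for illustration and is not part of the patch.

```python
from decimal import Decimal

def to_decimal(sec, nsec):
    """Combine a (sec, nsec) pair into a Decimal without rounding (hypothetical helper)."""
    return Decimal(sec) + Decimal(nsec).scaleb(-9)

sec, nsec = 1323458640, 702327236

as_float = sec + nsec * 1e-9        # binary float: roughly 15-16 significant digits
as_decimal = to_decimal(sec, nsec)  # keeps all 19 digits

# The float cannot hold the exact value, so it differs from the Decimal.
float_is_exact = Decimal(as_float) == as_decimal

# Decimal supports arithmetic directly, and the difference stays exact.
delta = to_decimal(sec, nsec + 550) - as_decimal

print(as_decimal)      # 1323458640.702327236
print(float_is_exact)  # False
```

A plain (sec, nsec) tuple would need hand-written carry logic to get the same delta, which is the arithmetic argument made above.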

The decimal module is still implemented in Python, but there is a working implementation in C which is much faster. Storing timestamps as Decimal could be a motivation to integrate the C implementation :-)

--

Examples with the time module:

$ ./python
Python 3.3.0a0 (default:52f68c95e025+, Jan 26 2012, 21:54:31)
...
>>> import time
>>> time.time()
1327611705.948446
>>> time.time('decimal')
Decimal('1327611708.988419')
>>> t1=time.time('decimal'); t2=time.time('decimal'); t2-t1
Decimal('0.000550')
>>> t1=time.time('float'); t2=time.time('float'); t2-t1
5.9604644775390625e-06
>>> time.clock_gettime(time.CLOCK_MONOTONIC, 'decimal')
Decimal('1211833.389740312')
>>> time.clock_getres(time.CLOCK_MONOTONIC, 'decimal')
Decimal('1E-9')
>>> time.clock()
0.12
>>> time.clock('decimal')
Decimal('0.120000')

Examples with os.stat:

$ ./python
Python 3.3.0a0 (default:2914ce82bf89+, Jan 30 2012, 23:07:24)
...
>>> import os
>>> s=os.stat("setup.py", timestamp="datetime")
>>> s.st_mtime - s.st_ctime
datetime.timedelta(0)
>>> print(s.st_atime - s.st_ctime)
52 days, 1:44:06.191293
>>> os.stat("setup.py", timestamp="timespec").st_ctime
(1323458640, 702327236)
>>> os.stat("setup.py", timestamp="decimal").st_ctime
Decimal('1323458640.702327236')

Victor

Georg Brandl

6:22 a.m.

Am 31.01.2012 00:50, schrieb Matt Joiner:

...

Sorry, I don't think it makes any sense to pass around classes as flags. Sure, if you do something directly with the class, it's fine, but in this case that's impossible. So you will be testing

if format is datetime.datetime: ...
elif format is decimal.Decimal: ...
else: ...

which has no advantage at all over

if format == "datetime": ...
elif format == "decimal": ...
else: ...

Not to speak of formats like "timespec" that don't have a respective class. And how do you propose to make the extensibility you speak of work?

Georg

Victor Stinner

12:08 p.m.

Hi, 2012/1/31 Matt Joiner <anacrolix@gmail.com>:

...

Sounds good, but I also prefer Alexander's method. The type information is already encoded in the class object.

Ok, I posted a patch version 6 to use types instead of strings. I also prefer types because it solves the "hidden import" issue.

...

My patch checks isinstance(format, type), format.__module__ and format.__name__ to do the "mapping". It is not a direct mapping because I don't always call the same method: the implementation is completely different for each type. I don't think that we need user-defined timestamp formats.

My last patch provides 5 formats:

- int
- float
- decimal.Decimal
- datetime.datetime
- datetime.timedelta

(I removed the timespec format, I consider that we don't need it.)

Examples:

>>> time.time()
1328006975.681211
>>> time.time(format=int)
1328006979
>>> time.time(format=decimal.Decimal)
Decimal('1328006983.761119')
>>> time.time(format=datetime.datetime)
datetime.datetime(2012, 1, 31, 11, 49, 49, 409831)
>>> print(time.time(format=datetime.timedelta))
15370 days, 10:49:52.842116

If someone wants another format, he/she should pick an existing format and build his/her own format from it.

datetime.datetime and datetime.timedelta can be used on any function, but the datetime.datetime format gives surprising results on clocks using an arbitrary start like time.clock() or time.wallclock(). We may raise an error in these cases.
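For readers who want to picture the lookup, here is a rough pure-Python sketch of a check on isinstance(format, type), format.__module__ and format.__name__. It is not the patch itself (which is written in C): the function name format_timestamp and the (sec, nsec) input are assumptions made for illustration.

```python
import datetime
import decimal

def format_timestamp(sec, nsec, format=float):
    """Convert a (sec, nsec) pair to the requested type (illustrative only)."""
    if not isinstance(format, type):
        raise TypeError("format must be a type, not %r" % (format,))
    # Identify the type by module and name, as described in the message above.
    key = (format.__module__, format.__name__)
    if key == ("builtins", "int"):
        return sec
    if key == ("builtins", "float"):
        return sec + nsec * 1e-9
    if key == ("decimal", "Decimal"):
        return format(sec) + format(nsec).scaleb(-9)
    if key == ("datetime", "datetime"):
        # Local time, truncated to microseconds.
        return format.fromtimestamp(sec).replace(microsecond=nsec // 1000)
    if key == ("datetime", "timedelta"):
        return format(seconds=sec, microseconds=nsec // 1000)
    raise ValueError("unsupported timestamp format: %r" % (format,))

print(format_timestamp(1328006983, 761119000, int))              # 1328006983
print(format_timestamp(1328006983, 761119000, decimal.Decimal))  # 1328006983.761119000
```

Each branch does something different with the type, which is why this is a chain of special cases rather than a table lookup.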

Georg Brandl

8:49 p.m.

Am 31.01.2012 13:08, schrieb Victor Stinner:

...

Rather, I guess you removed it because it didn't fit the "types as flags" pattern. As I said in another message, another hint that this is the wrong API design: Will the APIs ever support passing in types other than these five? Probably not, so I strongly believe they should not be passed in as types. Georg

Victor Stinner

9:41 p.m.

...

I removed it because I don't like tuples: you cannot do arithmetic on a tuple, like t2-t1. Printing a tuple doesn't give you a nice output. It is used in C because you have no other choice, but in Python, we can do better.

...

I don't know if we should only support 3 types today, or more, but I suppose that we will add more later (e.g. if datetime is replaced by another new and better datetime module). You mean that we should use a string instead of type, so time.time(format="decimal")? Or do something else? Victor

Matt Joiner

10:41 p.m.

Nick mentioned using a single type and converting upon return, I'm starting to like that more. A limited set of time formats is mostly arbitrary, and there will always be a performance hit deciding which type to return. The goal here is to allow high precision timings with minimal cost. A separate module, and an agreement on what the best performing high precision type is I think is the best way forward. On Feb 1, 2012 8:47 AM, "Victor Stinner" <victor.stinner@haypocalc.com> wrote:

...

Nick Coghlan

7:16 a.m.

On Tue, Jan 31, 2012 at 9:31 AM, Victor Stinner <victor.stinner@haypocalc.com> wrote:

...

I think this is definitely worth elaborating in a PEP (to recap the long discussion in #11457 if nothing else). In particular, I'd want to see a very strong case being made for supporting multiple formats over standardising on a *single* new higher precision format (for example, using decimal.Decimal in conjunction with integration of Stefan's cdecimal work) that can then be converted to other formats (like datetime) via the appropriate APIs. "There are lots of alternatives, so let's choose not to choose!" is a bad way to design an API. Helping to make decisions like this by laying out the alternatives and weighing up their costs and benefits is one of the major reasons the PEP process exists. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Victor Stinner

9:42 a.m.

...

I think this is definitely worth elaborating in a PEP (to recap the long discussion in #11457 if nothing else).

The discussion in issues #13882 and #11457 already lists many alternatives with their costs and benefits, but I can produce a PEP if you need a summary.

...

In particular, I'd want to see a very strong case being made for supporting multiple formats over standardising on a *single* new higher precision format (for example, using decimal.Decimal in conjunction with integration of Stefan's cdecimal work) that can then be converted to other formats (like datetime) via the appropriate APIs.

To convert a Decimal to a datetime object, we already have the datetime.datetime.fromtimestamp() function (it converts Decimal to float, but the function can be improved without touching its API). But I like the possibility of getting the file modification time directly as a datetime object to have something like:

...

...
...
>>> s=os.stat("setup.py", timestamp="datetime")
>>> print(s.st_atime - s.st_ctime)
52 days, 1:44:06.191293

We already have more than one timestamp format: os.stat() uses int or float depending on the os.stat_float_times() value. In 5 years, we may prefer to use float128 directly instead of Decimal. I prefer to have an extensible API to prepare for future needs, even if we just add Decimal today.

Hum, by the way, we need an "int" format for os.stat(), so os.stat_float_times() can be deprecated. So there will be a minimum of 3 types:

- int
- float
- decimal.Decimal

Victor

Nick Coghlan

11:11 a.m.

On Tue, Jan 31, 2012 at 7:42 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:

...

PEPs are about more than just providing a summary - they're about presenting the alternatives in a clear form instead of having them scattered across a long meandering tracker discussion. Laying out the alternatives and clearly articulating their pros and cons (as Larry attempted to do on the tracker) *helps to make better decisions*.

I counted several options presented as possibilities and I probably missed some:

- expose the raw POSIX (seconds, nanoseconds) 2-tuples (lots of good reasons not to go that way)
- use decimal.Decimal (with or without cdecimal)
- use float128 (nixed due to cross-platform supportability problems)
- use datetime (bad idea for the reasons Martin mentioned)
- use timedelta (not mentioned on the tracker, but a *much* better fit for a timestamp than datetime, since timestamps are relative to the epoch while datetime objects try to be absolute)

A PEP would also allow the following items to be specifically addressed:

- a survey of what other languages are doing to cope with nanosecond time resolutions (as suggested by Raymond but not actually done as far as I could see on the tracker)
- how to avoid a negative performance impact on os.stat() (new API? flag argument? new lazily populated attributes accessed by name only?)

Guido's admonition against analysis paralysis doesn't mean we should go to the other extreme and skip clearly documenting our analysis of complex problems altogether (particularly for something like this which may end up having ramifications for a lot of other time related code).

Having a low-level module like os needing to know about higher-level types like decimal.Decimal and datetime.datetime (or even timedelta) should be setting off all kinds of warning bells. Of all the possibilities that offer decent arithmetic support, timedelta is probably the one currently most suited to being pushed down to the os level, although decimal.Decimal is also a contender if backed up by Stefan's C implementation.

You're right that supporting this does mean being able to at least select between 'int', 'float' and <high precision> output, but that's the kind of case that can be made most clearly in a PEP.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Antoine Pitrou

12:13 p.m.

On Tue, 31 Jan 2012 21:11:37 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:

...

Decimal is ideally low-level (it's a number), it's just that it has a complicated high-level implementation :) But we can't use Decimal by default, for the obvious reason (performance impact that threatens to contaminate other parts of the code through operator application).

...

I'm -1 on using timedelta. This is a purity proposition that will make no sense to the average user. By the way, datetimes are relative too, by the same reasoning. Regards Antoine.

Alexander Belopolsky

7:08 p.m.

On Tue, Jan 31, 2012 at 7:13 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

FWIW, my vote is also for Decimal and against datetime or timedelta. (I dream of Decimal replacing float in Python 4000, so take my vote with an appropriate amount of salt. :-)

Mark Shannon

10:58 p.m.

Alexander Belopolsky wrote:

...

Why not add a new function rather than modifying time.time()? (After all, it's just a timestamp; does it really need nanosecond precision?) For those who do want super-accuracy, add a new function time.picotime() (it could be nanotime, but why not future-proof it :) ) which returns an int representing the number of picoseconds since the epoch. ints never lose precision and never overflow.

Cheers, Mark.
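A rough sketch of what such a function could look like today. It assumes time.time_ns(), which was only added to the time module in Python 3.7 and did not exist when this message was written; picotime is the hypothetical name from the message above. The last three digits are always zero here because the underlying call only reports nanoseconds.

```python
import time

def picotime():
    """Return the time since the epoch as an int number of picoseconds (hypothetical API)."""
    # time.time_ns() returns an int, so no precision is lost in the scaling.
    return time.time_ns() * 1000

t1 = picotime()
t2 = picotime()
elapsed_ps = t2 - t1  # plain int arithmetic, exact by construction
```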

Nick Coghlan

February 2012

12:35 a.m.

On Wed, Feb 1, 2012 at 8:58 AM, Mark Shannon <mark@hotpy.org> wrote:

...

Because the problem is broader than that - it affects os.stat(), too, along with a number of the other time module APIs that produce timestamp values.

That's where Alexander's suggestion of a separate "hirestime" module comes in - it would be based on the concept of *always* using a high precision type in the API (probably decimal.Decimal()). Conceptually, it's a very clean approach, and obviously has zero performance impact on existing APIs, but the idea of adding yet-another-time-related-module to the standard library is rather questionable. Such an approach is also likely to lead to a lot of duplicated code.

Victor's current approach, unfortunately, is a bit of a "worst-of-both-worlds" approach. It couples the time and os modules to various other currently unrelated modules (such as datetime and decimal), but still doesn't provide a particularly extensible API (whether indicated by flags or strings, each new supported output type must be special cased in time and os).

Perhaps more fruitful would be to revisit the original idea from the tracker of defining a conversion function protocol for timestamps using some basic fixed point arithmetic. The objection to using a conversion function that accepts a POSIX-style seconds+nanoseconds timespec is that it isn't future-proof - what if at some point in the future, nanosecond resolution is considered inadequate?

The secret to future-proofing such an API while only using integers lies in making the decimal exponent part of the conversion function signature:

from decimal import Decimal

def from_components(integer, fraction=0, exponent=-9):
    return Decimal(integer) + Decimal(fraction) * Decimal((0, (1,), exponent))

>>> from_components(100)
Decimal('100.000000000')
>>> from_components(100, 100)
Decimal('100.000000100')
>>> from_components(100, 100, -12)
Decimal('100.000000000100')

Such a protocol can easily be extended to any other type - the time module could provide conversion functions for integers and float objects (meaning results may have lower precision than the underlying system calls), while the existing "fromtimestamp" APIs in datetime can be updated to accept the new optional arguments (and perhaps an appropriate class method added to timedelta, too). A class method could also be added to the decimal module to construct instances from integer components (as shown above), since that method of construction isn't actually specific to timestamps.

With this approach, API usage might end up looking something like:

...

This strategy would have negligible performance impact in already supported cases (just an extra check to determine that no callback was provided), and offer a very simple, yet fully general and future-proof, integer based callback protocol when you want your timestamps in a different format. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
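To make the callback idea concrete, here is a small self-contained sketch under stated assumptions. The function now() and its convert parameter are hypothetical stand-ins for a timestamp API accepting an (integer, fraction, exponent) callback; from_components is the function defined earlier in this message; time.time_ns() (Python 3.7+) is used as the clock only because it is available today.

```python
import time
from decimal import Decimal
from fractions import Fraction

def from_components(integer, fraction=0, exponent=-9):
    return Decimal(integer) + Decimal(fraction) * Decimal((0, (1,), exponent))

def fraction_from_components(integer, fraction=0, exponent=-9):
    # Any type can join in by providing a function with the same signature.
    return Fraction(integer) + Fraction(fraction) * Fraction(10) ** exponent

def now(convert=None):
    """Hypothetical timestamp source using the proposed callback protocol."""
    ns = time.time_ns()
    if convert is None:
        # Existing behaviour: a float, with no callback overhead.
        return ns / 1e9
    integer, fraction = divmod(ns, 10**9)
    return convert(integer, fraction, -9)

as_float = now()
as_decimal = now(convert=from_components)
as_fraction = now(convert=fraction_from_components)
```

The timestamp source only ever hands out three integers, so it does not need to import decimal, fractions or datetime itself.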

Antoine Pitrou

2:35 a.m.

On Wed, 1 Feb 2012 10:35:08 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:

...

It strikes me as inelegant to have to do so much typing for something as simple as getting the current time. We should approach the simplicity of ``time.time(format='decimal')`` or ``time.decimal_time()``. (and I think the callback thing is overkill) Regards Antoine.

Nick Coghlan

4:08 a.m.

On Wed, Feb 1, 2012 at 12:35 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Getting the current time is simple (you can already do it), getting access to high precision time without performance regressions or backwards incompatibilities or excessive code duplication is hard.

There's a very simple rule in large scale software development: coupling is bad and you should do everything you can to minimise it. Victor's approach throws that out the window by requiring that time and os know about every possible output format for time values.

That's why protocols are so valuable: instead of having MxN points of interconnection, you just define a standard protocol as the basis for interaction, and the consumer of the protocol doesn't need to care about the details of the provider, they just care about the protocol itself.

So, the question becomes how to solve the problem of exposing high resolution timestamps to Python code in a way that:

- is applicable not just to time.time(), but also to os.stat(), time.clock(), time.wallclock() and any other timestamp sources I've forgotten.
- is backwards compatible for all those use cases
- doesn't cause a significant performance regression for any of those use cases
- doesn't cause excessive coupling between the time and os modules and other parts of Python
- doesn't excessively duplicate code
- doesn't add too much machinery for a relatively minor problem

The one key aspect that I think Victor's suggestion gets right is that we want a way to request high precision time from the *existing* APIs, and that this needs to be selected on a per call basis rather than globally for the whole application.

The big advantage of going with a callback based approach is that it gives you flexibility and low coupling without any additional supporting infrastructure, and you have the full suite of Python tools available to deal with any resulting verbosity issues.

For example, it would become *trivial* to write Alexander's suggested "hirestime" module that always returned decimal.Decimal objects:

    import decimal
    import os
    import time as _time  # aliased so the wrappers below don't shadow the module

    _hires = decimal.Decimal.from_components

    def time():
        return _time.time(convert=_hires)

    def clock():
        return _time.clock(convert=_hires)

    def stat(path):
        return os.stat(path, timestamps=_hires)

    # etc...

PJE is quite right that using a new named protocol rather than a callback with a particular signature could also work, but I don't see a lot of advantages in doing so.

On the other hand, if you go with the "named output format", "hires=True" or new API approaches, you end up having to decide what additional coupling you're going to introduce to time and os. Now, in this case, I actually think there *is* a reasonable option available if we decide to go down that path:

- incorporate Stefan Krah's cdecimal work into the standard library
- add a "hires=False" flag to affected APIs
- return a Decimal instance with full available precision if "hires=True" is passed in.
- make time and os explicitly depend on the ability to create decimal.Decimal instances

A hirestime module is even easier to implement in that case:

    import os
    import time as _time  # aliased so the wrappers below don't shadow the module

    def time():
        return _time.time(hires=True)

    def clock():
        return _time.clock(hires=True)

    def stat(path):
        return os.stat(path, hires=True)

    # etc...

All of the other APIs (datetime, timedelta, etc) can then just be updated to also accept a Decimal object as input, rather than handling the (integer, fraction, exponent) callback signature I suggested.

Either extreme (full flexibility via a callback API or protocol, or else settling specifically on decimal.Decimal and explicitly making time and os dependent on that type) makes sense to me. A wishy-washy middle ground that introduces a dependency from time and os onto multiple other modules *without* making the API user extensible doesn't seem reasonable at all.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Antoine Pitrou

11:08 a.m.

On Wed, 1 Feb 2012 14:08:34 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On Wed, Feb 1, 2012 at 12:35 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...
It strikes me as inelegant to have to do so much typing for something as simple as getting the current time. We should approach the simplicity of ``time.time(format='decimal')`` or ``time.decimal_time()``.

Getting the current time is simple (you can already do it), getting access to high precision time without performance regressions or backwards incompatibilities or excessive code duplication is hard.

The implementation of it might be hard, the API doesn't have to be. You can even use a callback system under the hood, you just don't have to *expose* that complication to the user.

...

There's a very simple rule in large scale software development: coupling is bad and you should do everything you can to minimise it.

The question is: is coupling worse than exposing horrible APIs? ;) If Decimal were a core object as float is, we wouldn't have this discussion because returning a Decimal would be considered "natural".

...

Victor's approach throws that out the window by requiring that time and os know about every possible output format for time values.

Victor's proposal is maximalist in that it proposes several different output formats. Decimal is probably enough for real use cases, though.

...

For example, it would become *trivial* to write Alexander's suggested "hirestime" module that always returned decimal.Decimal objects:

Right, but that's not even a plausible request. Nobody wants to write a separate time module just to have a different return type. Regards Antoine.

Nick Coghlan

11:26 a.m.

On Wed, Feb 1, 2012 at 9:08 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Right, but that's not even a plausible request. Nobody wants to write a separate time module just to have a different return type.

I can definitely see someone doing "import hirestime as time" to avoid having to pass a flag everywhere, though. I don't think that should be the way *we* expose the functionality - I just think it's a possible end user technique we should keep in mind when assessing the alternatives. As I said in my last reply to Victor though, I'm definitely coming around to the point of view that supporting more than just Decimal is overgeneralising to the detriment of the API design. As you say, if decimal objects were a builtin type, we wouldn't even be considering alternative high precision representations - the only discussion would be about the details of the API for *requesting* high resolution timestamps (and while boolean flags are ugly, I'm not sure there's anything else that will satisfy backwards compatibility constraints). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

PJ Eby

5:27 p.m.

On Jan 31, 2012 11:08 PM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:

...

The advantage is that it fits your brain better. That is, you don't have to remember another symbol besides the type you wanted. (There are probably fewer keystrokes involved, too.)

PJ Eby

2:40 a.m.

On Tue, Jan 31, 2012 at 7:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

Why not just make it something like __fromfixed__() and make it a standard protocol, implemented on floats, ints, decimals, etc.? Then the API is just "time.time(type)", where type is any object providing a __fromfixed__ method. ;-)

Matt Joiner

3:02 a.m.

Analysis paralysis commence. +1 for separate module using decimal. On Feb 1, 2012 1:44 PM, "PJ Eby" <pje@telecommunity.com> wrote:

...

Victor Stinner

8:03 a.m.

2012/2/1 Nick Coghlan <ncoghlan@gmail.com>:

...

The fractional part is not necessarily related to a power of 10. An earlier version of my patch also used powers of 10, but it didn't work (it lost precision) for QueryPerformanceCounter() and was more complex than the new version. An NTP timestamp uses a fraction of 2**32. QueryPerformanceCounter() (used by time.clock() on Windows) uses the CPU frequency.

We may need more information when adding new timestamp formats later. If we expose the "internal structure" used to compute any timestamp format, we cannot change the internal structure later without breaking (one more time) the API.

My patch uses the format (seconds: int, floatpart: int, divisor: int). For example, I hesitate to add a field to specify the start of the timestamp: undefined for time.wallclock(), time.clock(), and time.clock_gettime(time.CLOCK_MONOTONIC), Epoch for other timestamps.

My patch is similar to your idea except that everything is done internally to not have to expose internal structures, and it doesn't touch the decimal or datetime modules. It would be surprising to add a method related to timestamps to the Decimal class.
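A small sketch of why a divisor is more general than a power-of-10 exponent. The (seconds, floatpart, divisor) triple is the format described above; the function names and the digits parameter are assumptions made for illustration, and the real conversion lives in C inside the patch.

```python
from decimal import Decimal
from fractions import Fraction

def to_fraction(seconds, floatpart, divisor):
    """Exact value of the triple, whatever the divisor is."""
    return seconds + Fraction(floatpart, divisor)

def to_decimal(seconds, floatpart, divisor, digits=9):
    """Value of the triple rounded to a fixed number of decimal digits."""
    # Scale first, then divide once, rounding to the nearest unit.
    scaled = (floatpart * 10**digits + divisor // 2) // divisor
    return Decimal(seconds) + Decimal(scaled).scaleb(-digits)

# NTP-style fraction: units of 1/2**32 of a second.
print(to_decimal(0, 2**31, 2**32))  # 0.500000000

# A divisor that is not a power of 2 or 10, e.g. a counter frequency of 3 Hz:
exact = to_fraction(1, 1, 3)
rounded = to_decimal(1, 1, 3, digits=3)
```

With a divisor of 3 the exact value 4/3 has no finite decimal expansion, so any (integer, fraction, exponent) encoding has to round at some chosen number of digits, while the triple itself stays exact.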

...

This strategy would have negligible performance impact

There is no such performance issue: time.time() performance is exactly the same using my patch. Depending on the requested format, the performance may be better or worse. But even for Decimal, I think that the creation of Decimal is really "fast" (I should provide numbers :-)). Victor

Nick Coghlan

10:43 a.m.

On Wed, Feb 1, 2012 at 6:03 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:

...

If a callback protocol is used at all, there's no reason those details need to be exposed to the callbacks. Just choose an appropriate exponent based on the precision of the underlying API call.

...

You're assuming we're ever going to want timestamps that are something more than just a number. That's a *huge* leap (much bigger than increasing the precision, which is the problem we're dealing with now). With arbitrary length integers available, "integer, fraction, exponent" lets you express numbers to whatever precision you like, just as decimal.Decimal does (more on that below).

...

No, you wouldn't add a timestamp specific method to the Decimal class - you'd add one that let you easily construct a decimal from a fixed point representation (i.e. integer + fraction*10**exponent)

...

But this gets us to my final question. Given that Decimal supports arbitrary precision, *why* increase the complexity of the underlying API by supporting *other* output types? If you're not going to support arbitrary callbacks, why not just have a "high precision" flag to request Decimal instances and be done with it? datetime, timedelta and so forth would be able to get everything they needed from the Decimal value. As I said in my last message, both a 3-tuple (integer, fraction, exponent) based callback protocol effectively supporting arbitrary output types and a boolean flag to request Decimal values make sense to me and I could argue in favour of either of them. However, I don't understand the value you see in this odd middle ground of "instead of picking 1 arbitrary precision timestamp representation, whether an integer triple or decimal.Decimal, we're going to offer a few different ones and make you decide which one of them you actually want every time you call the API". That's seriously ducking our responsibilities as language developers - it's our job to make that call, not each user's. Given the way the discussion has gone, my preference is actually shifting strongly towards just returning decimal.Decimal instances when high precision timestamps are requested via a boolean flag. The flag isn't pretty, but it works, and the extra flexibility of a "type" parameter or a callback protocol doesn't really buy us anything once we have an output type that supports arbitrary precision. 
FWIW, I did a quick survey of what other languages seem to offer in terms of high resolution time interfaces:

- Perl appears to have Time::HiRes (it seems to use floats in the API though, so I'm not sure how that works in practice)
- C# (and the CLR) don't appear to care about POSIX and just offer 100 nanosecond resolution in their DateTime libraries
- Java appears to have System.nanoTime(), no idea what they do for filesystem times

However, I don't know enough about how the APIs in those languages work to do sensible searches. It doesn't appear to be a cleanly solved problem anywhere, though.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Victor Stinner

11:40 a.m.

...

If a callback protocol is used at all, there's no reason those details need to be exposed to the callbacks. Just choose an appropriate exponent based on the precision of the underlying API call.

If the clock divisor cannot be written as a power of 10, you lose precision, just because your format requires a power of 10. Using (seconds, floatpart, divisor) you don't lose any bits. The conversion function using this tuple can choose how to use these numbers and do its best to optimize the precision (e.g. choose how to round the division).

By the way, my patch uses a dummy integer division (floatpart / divisor). I hesitate to round to the closest integer. For example, 19//10=1, whereas 2 would be a better answer. A possibility is to use (floatpart + (divisor/2)) / divisor.
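The two divisions mentioned above, written out with Python integers. The function names are made up for this illustration; the second one is the (floatpart + divisor/2) / divisor idea using floor division throughout, which rounds halves upwards for non-negative inputs.

```python
def div_truncate(floatpart, divisor):
    """Plain integer division: always rounds down."""
    return floatpart // divisor

def div_nearest(floatpart, divisor):
    """Integer division rounded to the nearest integer (halves round up)."""
    return (floatpart + divisor // 2) // divisor

print(div_truncate(19, 10))  # 1
print(div_nearest(19, 10))   # 2
print(div_nearest(14, 10))   # 1
```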

...

...
We may need more information when adding new timestamp formats later. If we expose the "internal structure" used to compute any timestamp format, we cannot change the internal structure later without breaking (one more time) the API.

You're assuming we're ever going to want timestamps that are something more than just a number. That's a *huge* leap (much bigger than increasing the precision, which is the problem we're dealing with now).

I tried to design an API supporting future timestamp formats. For time methods, it is maybe not useful to directly produce a datetime object. But for os.stat(), it is just *practical* to directly get a high-level object. We may add a new float128 type later, and it would be nice to be able to get a timestamp directly as a float128, without having to break the API one more time. Getting a timestamp as a Decimal to convert it to float128 is not optimal. That's why I don't like adding a boolean flag. It doesn't mean that we should add datetime.datetime or datetime.timedelta right now. It can be done later, or never :-)

...

No, you wouldn't add a timestamp specific method to the Decimal class - you'd add one that let you easily construct a decimal from a fixed point representation (i.e. integer + fraction*10**exponent)

Only if you use (intpart, floatpart, exponent). Would this function be useful for something else than timestamps?
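To make the exchange above concrete, here is a minimal sketch of what a fixed-point constructor could look like. The function name and signature are hypothetical (no such method exists on decimal.Decimal); it assumes a non-negative integer part, a non-positive exponent, and a fraction smaller than 10**-exponent.

```python
from decimal import Decimal


def decimal_from_fixed_point(integer: int, fraction: int, exponent: int) -> Decimal:
    """Build integer + fraction * 10**exponent as a Decimal.

    Hypothetical helper, e.g. (sec, nsec, -9) for a timespec-style value.
    """
    if integer < 0 or exponent > 0 or not 0 <= fraction < 10 ** -exponent:
        raise ValueError("unsupported fixed-point triple")
    # Merge both parts into one integer coefficient, then attach the exponent.
    coefficient = integer * 10 ** -exponent + fraction
    digits = tuple(int(d) for d in str(coefficient))
    # The tuple constructor does not round to the context precision.
    return Decimal((0, digits, exponent))


ctime = decimal_from_fixed_point(1323458640, 702327236, -9)
```

Nothing in the sketch is specific to timestamps, which is the point being argued: the same helper would serve any value delivered as an integer part plus a scaled fraction.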

...

But this gets us to my final question. Given that Decimal supports arbitrary precision, *why* increase the complexity of the underlying API by supporting *other* output types?

We need to support at least 3 formats: int, float and <high resolution format> (e.g. Decimal), to keep backward compatibility.

...

datetime, timedelta and so forth would be able to get everything they needed from the Decimal value.

Yes. Getting timestamps directly as datetime or timedelta is maybe overkill. datetime gives more information than a raw number (int, float or Decimal): you don't have to care about the start date of the timestamp. Internally, it would help to support Windows timestamps (number of 100 ns intervals since 1601-01-01), even if we may have to convert the Windows timestamp to an Epoch timestamp if the user requests a number instead of a datetime object (for backward compatibility?).

Victor
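For readers unfamiliar with the Windows format mentioned above, the conversion to an Epoch timestamp is a fixed offset plus a change of unit. The sketch below is illustrative only; the function and constant names are invented here and are not taken from the patch.

```python
from decimal import Decimal

# Seconds between 1601-01-01 and 1970-01-01 (134774 days of 86400 seconds).
EPOCH_DIFFERENCE_SECONDS = 11_644_473_600
TICKS_PER_SECOND = 10_000_000  # one Windows tick is 100 ns


def windows_ticks_to_epoch(ticks: int) -> Decimal:
    """Convert a count of 100 ns ticks since 1601 to seconds since 1970."""
    seconds, remainder = divmod(ticks, TICKS_PER_SECOND)
    # The remainder is a number of 1E-7 second units, so shift it by 7 digits.
    return Decimal(seconds - EPOCH_DIFFERENCE_SECONDS) + Decimal(remainder).scaleb(-7)


unix_epoch_in_ticks = EPOCH_DIFFERENCE_SECONDS * TICKS_PER_SECOND
at_epoch = windows_ticks_to_epoch(unix_epoch_in_ticks)
```

Returning a Decimal keeps all seven fractional digits, which a float cannot hold for present-day timestamps.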

Nick Coghlan

11:59 a.m.

On Wed, Feb 1, 2012 at 9:40 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:

...

If you would lose precision, make the decimal exponent (and hence fractional part) larger. You have exactly the same problem when converting to decimal, and the solution is the same (i.e. use as many significant digits as you need to preserve the underlying precision).
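A small sketch of this point, under the assumption that the clock reports (seconds, ticks, divisor): a divisor such as 1024 is not a power of 10, yet the value is still exact in decimal once enough digits are allowed, while a divisor such as 3 never is. The helper name is made up for illustration.

```python
from decimal import Decimal, Inexact, localcontext


def ticks_to_decimal(seconds: int, ticks: int, divisor: int, digits: int = 28):
    """Return (value, exact) for seconds + ticks/divisor at the given precision.

    Hypothetical helper; 'exact' is False if any rounding occurred.
    """
    with localcontext() as ctx:
        ctx.prec = digits
        ctx.clear_flags()
        value = Decimal(seconds) + Decimal(ticks) / Decimal(divisor)
        # The Inexact flag is raised whenever a result had to be rounded.
        return value, not ctx.flags[Inexact]


value_1024, exact_1024 = ticks_to_decimal(1, 1, 1024)            # 1/1024 needs 10 decimal places
value_short, exact_short = ticks_to_decimal(1, 1, 1024, digits=5)
value_third, exact_third = ticks_to_decimal(0, 1, 3)
```

The first call is exact with the default 28 digits; the second shows precision being lost when too few digits are allowed; the third shows a divisor for which no finite number of decimal digits suffices.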

...

Introducing API complexity now for entirely theoretical future needs is a classic case of YAGNI (You Ain't Gonna Need It). Besides, float128 is a bad example - such a type could just be returned directly where we return float64 now. (The only reason we can't do that with Decimal is because we deliberately don't allow implicit conversion of float values to Decimal values in binary operations).

...

int and float are already supported today, and a process global switch works for that (since they're numerically interoperable). A per-call setting is only needed for Decimal due to its deliberate lack of implicit interoperability with binary floats.
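The interoperability difference described here is easy to check. The sketch below only demonstrates the existing behaviour of decimal.Decimal arithmetic; the helper name is invented for the example.

```python
from decimal import Decimal


def mixes_with_decimal(value) -> bool:
    """Return True if 'value' can be added to a Decimal without explicit conversion."""
    try:
        Decimal("1327611708.988419") + value
    except TypeError:
        # Decimal arithmetic refuses binary floats on purpose.
        return False
    return True


int_ok = mixes_with_decimal(1)        # ints mix freely with Decimal
float_ok = mixes_with_decimal(1.0)    # floats do not
int_float_ok = (1 + 1.0) == 2.0       # int and float mix with each other
```

This is why code written for float timestamps would break if a Decimal were returned silently, and why the choice has to be made per call rather than process-wide.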

...

That's a higher level concern though - not something the timestamp APIs themselves should be worrying about. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Jim J. Jewett

7:20 p.m.

In http://mail.python.org/pipermail/python-dev/2012-February/116073.html Nick Coghlan wrote:

...

If we could really replace float with another type, then there is no reason that type couldn't be a nearly trivial Decimal subclass which simply flips the default value of the (never used by any caller) allow_float parameter to internal function _convert_other. Since decimal inherits straight from object, this subtype could even be made to inherit from float as well, and to store the lower-precision value there. It could even produce the decimal version lazily, so as to minimize slowdown on cases that do not need the greater precision. Of course, that still doesn't answer questions on whether the higher precision is a good idea ...

-jJ

-- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ

Victor Stinner

January 2012

12:20 p.m.

...

- use datetime (bad idea for the reasons Martin mentioned)

It is only a bad idea if it is the only available choice.

...

- use timedelta (not mentioned on the tracker, but a *much* better fit for a timestamp than datetime, since timestamps are relative to the epoch while datetime objects try to be absolute)

The latest version of my patch also supports timedelta.

...

- a survey of what other languages are doing to cope with nanosecond time resolutions (as suggested by Raymond but not actually done as far as I could see on the tracker)

I didn't check that right now. I don't know if it is really relevant because some languages don't have a builtin Decimal class or a "builtin" datetime module.

...

- how to avoid a negative performance impact on os.stat() (new API? flag argument? new lazily populated attributes accessed by name only?)

Because timestamp is an optional argument to os.stat() and the behaviour is unchanged by default, the performance impact of my patch on os.stat() is nil (if you don't set the timestamp argument).

...

Having a low-level module like os needing to know about higher-level types like decimal.Decimal and datetime.datetime (or even timedelta) should be setting off all kinds of warning bells.

What is the problem of using decimal in the os module? Especially if it is an option. In my patch version 6, the timestamp argument is now a type (e.g. decimal.Decimal) instead of a string, so the os module doesn't import directly the module (well, to be exact, it does import the module, but the module should already be in the cache, sys.modules).

...

You're right that supporting this does mean being able to at least select between 'int', 'float' and <high precision> output, but that's the kind of case that can be made most clearly in a PEP.

Why do you want to limit the available formats? Why not give the user the choice between Decimal, datetime and timedelta? Each type has a different use case and different features, sometimes exclusive.

Victor

Stefan Behnel

1:19 p.m.

New subject: PEPs and cons (was: Re: Store timestamps as decimal.Decimal objects)

Nick Coghlan, 31.01.2012 12:11:

...

There was a keynote by Jan Lehnardt (of CouchDB fame) on last year's PyCon-DE on the end of language wars and why we should just give each other a hug and get along and all that. To seed some better understanding, he had come up with mottoes for the Ruby and Python language communities, which find themselves in continuous quarrel. I remember the motto for Python being "you do it right - and you document it". A clear hit IMHO. Decisions about language changes and environmental changes (such as the stdlib) aren't easily taken in the Python world, but when they are taken, they tend to show a good amount of well reflected common sense, and we make it transparent how they come to be by writing a PEP about them, so that we (and others) can go back and read them up later on when they are being questioned again or when similar problems appear in other languages. That's a good thing, and we should keep that up. Stefan

Alexander Belopolsky

6:57 p.m.

On Mon, Jan 30, 2012 at 6:31 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:

...

Alexander Belopolsky proposed to use time.time(format=datetime.datetime) instead.

Just to make sure my view is fully expressed: I am against adding flag arguments to time.time(). My preferred solution to exposing high resolution clocks is to do it in a separate module. You can even call the new function time() and access it as hirestime.time(). Longer names that reflect various time representation are also an option: hirestime.decimal_time(), hirestime.datetime_time() etc. The suggestion to use the actual type as a flag was motivated by the desire to require module import before fancy time.time() can be called. When you care about nanoseconds in your time stamps you won't tolerate an I/O delay between calling time() and getting the result. A separate module can solve this issue much better: simply import decimal or datetime or both at the top of the module.
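A separate module along these lines might look like the sketch below. It is purely hypothetical: no hirestime module exists, and the sketch leans on time.time_ns(), which was only added in Python 3.7, well after this thread. The names hirestime and decimal_time() are the ones suggested in the message above.

```python
"""hirestime: hypothetical high-resolution time module (illustrative sketch)."""
import decimal  # imported once at module load, never inside the clock call
import time


def decimal_time() -> decimal.Decimal:
    """Return the current time in seconds since the Epoch as a Decimal."""
    # time.time_ns() (Python 3.7+) returns an integer number of nanoseconds;
    # scaleb(-9) shifts the decimal point without going through float.
    return decimal.Decimal(time.time_ns()).scaleb(-9)


now = decimal_time()
```

Because decimal is imported when the module is loaded, the first call does not pay for an import between reading the clock and returning the result, which is the concern raised above.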

Matt Joiner

January 2012

3:50 p.m.

...

Hi,

In issues #13882 and #11457, I propose to add an argument to functions returning timestamps to choose the timestamp format. Python uses float in most cases whereas float is not enough to store a timestamp with a resolution of 1 nanosecond. I recently added time.clock_gettime() to Python 3.3 which has a resolution of a nanosecond. The (first?) new timestamp format will be decimal.Decimal because it is able to store any timestamp in any resolution without losing bits. Instead of adding a boolean argument, I would prefer to support more formats. My last patch provides the following formats:

- "float": float (used by default)
- "decimal": decimal.Decimal
- "datetime": datetime.datetime
- "timespec": (sec, nsec) tuple # I don't think that we need it, it is just another example

The proposed API is:

time.time(format="datetime")
time.clock_gettime(time.CLOCK_REALTIME, format="decimal")
os.stat(path, timestamp="datetime")
etc.

This API has an issue: importing the datetime or decimal module is implicit. I don't know if it is really a problem. (In my last patch, the import is done too late, but it can be fixed, it does not really matter.)

Alexander Belopolsky proposed to use time.time(format=datetime.datetime) instead.

--

The first step would be to add an argument to functions returning timestamps. The second step is to accept these new formats (Decimal?) as input, for datetime.datetime.fromtimestamp() and os.utime() for example.

(Using decimal.Decimal, we may remove os.utimens() and use the right function depending on the timestamp resolution.)

--

I prefer Decimal over a dummy tuple like (sec, nsec) because you can do arithmetic on it: t2-t1, a+b, t/k, etc. It also stores the resolution of the clock: time.time() and time.clock_gettime() have for example different resolutions (sec, ms, us for time.time() and ns for clock_gettime()).
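The precision argument can be checked with a few lines. This is an illustrative sketch rather than output from the patch; it uses math.ulp(), which requires Python 3.9 or later, and the two Decimal values are example timestamps chosen for the demonstration.

```python
import math
from decimal import Decimal

now = 1327611705.0

# Gap between adjacent floats near a 2012 timestamp: 2**-22 s, about 0.24 us.
spacing = math.ulp(now)

# Adding one nanosecond is below that gap, so the float does not change.
float_keeps_ns = (now + 1e-9) != now

# Decimal keeps every digit and supports ordinary arithmetic.
t1 = Decimal("1323458640.702327000")
t2 = Decimal("1323458640.702327236")
elapsed = t2 - t1
```

The exponent of a Decimal value (here -9) is also what carries the clock resolution mentioned above.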

The decimal module is still implemented in Python, but there is a working implementation in C which is much faster. Storing timestamps as Decimal can be a motivation to integrate the C implementation :-)

--

Examples with the time module:

$ ./python Python 3.3.0a0 (default:52f68c95e025+, Jan 26 2012, 21:54:31)

...
...
...
>>> import time
>>> time.time()
1327611705.948446
>>> time.time('decimal')
Decimal('1327611708.988419')
>>> t1=time.time('decimal'); t2=time.time('decimal'); t2-t1
Decimal('0.000550')
>>> t1=time.time('float'); t2=time.time('float'); t2-t1
5.9604644775390625e-06
>>> time.clock_gettime(time.CLOCK_MONOTONIC, 'decimal')
Decimal('1211833.389740312')
>>> time.clock_getres(time.CLOCK_MONOTONIC, 'decimal')
Decimal('1E-9')
>>> time.clock()
0.12
>>> time.clock('decimal')
Decimal('0.120000')

Examples with os.stat:

$ ./python Python 3.3.0a0 (default:2914ce82bf89+, Jan 30 2012, 23:07:24)

...
...
...
>>> import os
>>> s=os.stat("setup.py", timestamp="datetime")
>>> s.st_mtime - s.st_ctime
datetime.timedelta(0)
>>> print(s.st_atime - s.st_ctime)
52 days, 1:44:06.191293
>>> os.stat("setup.py", timestamp="timespec").st_ctime
(1323458640, 702327236)
>>> os.stat("setup.py", timestamp="decimal").st_ctime
Decimal('1323458640.702327236')

Victor

Georg Brandl

10:22 p.m.

Am 31.01.2012 00:50, schrieb Matt Joiner:

...

Victor Stinner

4:08 a.m.

Hi, 2012/1/31 Matt Joiner <anacrolix@gmail.com>:

...

Sounds good, but I also prefer Alexander's method. The type information is already encoded in the class object.

Ok, I posted a patch version 6 to use types instead of strings. I also prefer types because it solves the "hidden import" issue.

...

Georg Brandl

12:49 p.m.

Am 31.01.2012 13:08, schrieb Victor Stinner:

...

Victor Stinner

1:41 p.m.

...

Matt Joiner

2:41 p.m.

...

Nick Coghlan

January 2012

11:16 p.m.

On Tue, Jan 31, 2012 at 9:31 AM, Victor Stinner <victor.stinner@haypocalc.com> wrote:

...

Victor Stinner

1:42 a.m.

...

I think this is definitely worth elaborating in a PEP (to recap the long discussion in #11457 if nothing else).

The discussion in issues #13882 and #11457 already lists many alternatives with their costs and benefits, but I can produce a PEP if you need a summary.

...

In particular, I'd want to see a very strong case being made for supporting multiple formats over standardising on a *single* new higher precision format (for example, using decimal.Decimal in conjunction with integration of Stefan's cdecimal work) that can then be converted to other formats (like datetime) via the appropriate APIs.

...

...
...
>>> s=os.stat("setup.py", timestamp="datetime")
>>> print(s.st_atime - s.st_ctime)
52 days, 1:44:06.191293

Nick Coghlan

3:11 a.m.

On Tue, Jan 31, 2012 at 7:42 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:

...

Antoine Pitrou

4:13 a.m.

On Tue, 31 Jan 2012 21:11:37 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:

...

I'm -1 on using timedelta. This is a purity proposition that will make no sense to the average user. By the way, datetimes are relative too, by the same reasoning. Regards Antoine.

Alexander Belopolsky

11:08 a.m.

On Tue, Jan 31, 2012 at 7:13 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

FWIW, my vote is also for Decimal and against datetime or timedelta. (I dream of Decimal replacing float in Python 4000, so take my vote with an appropriate amount of salt. :-)

Mark Shannon

2:58 p.m.

Alexander Belopolsky wrote:

...

Nick Coghlan

January 2012

4:35 p.m.

On Wed, Feb 1, 2012 at 8:58 AM, Mark Shannon <mark@hotpy.org> wrote:

...

Antoine Pitrou

6:35 p.m.

On Wed, 1 Feb 2012 10:35:08 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:

...

Nick Coghlan

8:08 p.m.

On Wed, Feb 1, 2012 at 12:35 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Antoine Pitrou

February 2012

3:08 a.m.

On Wed, 1 Feb 2012 14:08:34 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:

...

On Wed, Feb 1, 2012 at 12:35 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...
It strikes me as inelegant to have to do so much typing for something as simple as getting the current time. We should approach the simplicity of ``time.time(format='decimal')`` or ``time.decimal_time()``.

Getting the current time is simple (you can already do it), getting access to high precision time without performance regressions or backwards incompatibilities or excessive code duplication is hard.

The implementation of it might be hard, the API doesn't have to be. You can even use a callback system under the hood, you just don't have to *expose* that complication to the user.

...

There's a very simple rule in large scale software development: coupling is bad and you should do everything you can to minimise it.

...

Victor's approach throws that out the window by requiring that time and os know about every possible output format for time values.

Victor's proposal is maximalist in that it proposes several different output formats. Decimal is probably enough for real use cases, though.

...

For example, it would become *trivial* to write Alexander's suggested "hirestime" module that always returned decimal.Decimal objects:

Right, but that's not even a plausible request. Nobody wants to write a separate time module just to have a different return type. Regards Antoine.

Nick Coghlan

3:26 a.m.

On Wed, Feb 1, 2012 at 9:08 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Right, but that's not even a plausible request. Nobody wants to write a separate time module just to have a different return type.

PJ Eby

9:27 a.m.

On Jan 31, 2012 11:08 PM, "Nick Coghlan" <ncoghlan@gmail.com> wrote:

...

The advantage is that it fits your brain better. That is, you don't have to remember another symbol besides the type you wanted. (There's probably fewer keystrokes involved, too.)

PJ Eby

February 2012

2:40 a.m.

On Tue, Jan 31, 2012 at 7:35 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

Matt Joiner

3:02 a.m.

Analysis paralysis commence. +1 for separate module using decimal. On Feb 1, 2012 1:44 PM, "PJ Eby" <pje@telecommunity.com> wrote:

...

Victor Stinner

8:03 a.m.

2012/2/1 Nick Coghlan <ncoghlan@gmail.com>:

...

This strategy would have negligible performance impact

Nick Coghlan

10:43 a.m.

On Wed, Feb 1, 2012 at 6:03 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:

...

If a callback protocol is used at all, there's no reason those details need to be exposed to the callbacks. Just choose an appropriate exponent based on the precision of the underlying API call.

...

No, you wouldn't add a timestamp specific method to the Decimal class - you'd add one that let you easily construct a decimal from a fixed point representation (i.e. integer + fraction*10**exponent)

...
