[issue22356] mention explicitly that stdlib assumes gmtime(0) epoch is 1970

New submission from Akira Li: See discussion on Python-ideas https://mail.python.org/pipermail/python-ideas/2014-September/029228.html ---------- assignee: docs@python components: Documentation files: docs-time-epoch_is_1970.diff keywords: patch messages: 226539 nosy: akira, docs@python priority: normal severity: normal status: open title: mention explicitly that stdlib assumes gmtime(0) epoch is 1970 type: behavior versions: Python 2.7, Python 3.4, Python 3.5 Added file: http://bugs.python.org/file36567/docs-time-epoch_is_1970.diff _______________________________________ Python tracker <report@bugs.python.org> <http://bugs.python.org/issue22356> _______________________________________

Changes by Chris Rebert <pybugs@rebertia.com>: ---------- nosy: +cvrebert

Chris Rebert added the comment: Ping. This small patch has been waiting nearly 3 months for a review.

Changes by Ned Deily <nad@acm.org>: ---------- nosy: +belopolsky

Alexander Belopolsky added the comment: I don't like the proposed note. 1. It is not the job of the time module documentation to warn about "many functions in the stdlib." What are these functions, BTW? 2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns. I think an improvement would be to spell Epoch with a capital E and define it as "The time zero hours, zero minutes, zero seconds, on January 1, 1970 Coordinated Universal Time (UTC)." See <http://pubs.opengroup.org/onlinepubs/9699919799>.
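The assumption named in the issue title can be checked directly. The following is only a minimal sketch, not part of the proposed patch; the names epoch_fields and is_posix_epoch are illustrative and do not come from the thread.

```python
import time

# time.gmtime(0) converts timestamp 0 to a broken-down UTC time.
epoch = time.gmtime(0)

# Compare the date and time fields against the POSIX definition of the Epoch:
# 1970-01-01 00:00:00 UTC.
epoch_fields = (epoch.tm_year, epoch.tm_mon, epoch.tm_mday,
                epoch.tm_hour, epoch.tm_min, epoch.tm_sec)
is_posix_epoch = epoch_fields == (1970, 1, 1, 0, 0, 0)

print(epoch_fields, is_posix_epoch)
```

A check like this only covers the choice of epoch. It says nothing about whether later timestamps count leap seconds, which is the separate question raised further down the thread.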

Akira Li added the comment:
Alexander Belopolsky added the comment:
1. It is not the job of the time module documentation to warn about "many functions in the stdlib." What are these functions, BTW?
The e-mail linked in the first message of this issue msg226539 enumerates some of the functions: https://mail.python.org/pipermail/python-ideas/2014-September/029228.html
2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns.
It is the language used by C standard for time() function: The time function determines the current calendar time. The encoding of the value is unspecified.
I think an improvement would be to spell Epoch with a capital E and define it as "The time zero hours, zero minutes, zero seconds, on January 1, 1970 Coordinated Universal Time (UTC)." See <http://pubs.opengroup.org/onlinepubs/9699919799>.
The word *epoch* (lowercase) is used by the C standard. It is not enough to say that the time module uses the POSIX epoch (Epoch): e.g., a machine may use "right" zoneinfo (the same epoch year, 1970), but the timestamps for the same UTC time differ by the number of leap seconds (10+25 since 2012). POSIX encoding implies that the formula works: utc_time = datetime(1970, 1, 1) + timedelta(seconds=posix_timestamp). If time.time() doesn't return posix_timestamp, then "many functions in the stdlib" will break. It is possible to inspect all stdlib functions that use the time module and determine for some of them whether they will break if gmtime(0) is not 1970, or "right" zoneinfo is used, or any non-POSIX time encoding is used. But it is hard to maintain such a list because any future code change may affect the behavior. I prefer a vague statement ("many functions") over a possible lie (the documentation shouldn't make promises that the implementation can't keep). POSIX language is (intentionally) vague and avoids the SI seconds vs. UT1 (mean solar) seconds distinction. I don't consider systems where "seconds" doesn't mean the SI seconds used by the UTC time scale.
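The formula in the comment above can be tried out as follows. This is a sketch under stated assumptions: utc_from_posix is a hypothetical helper name, and the 25-second offset is assumed purely to illustrate a clock that counts leap seconds; it is not measured from any real "right" zoneinfo system.

```python
from datetime import datetime, timedelta


def utc_from_posix(posix_timestamp):
    """Naive UTC datetime for a POSIX timestamp, per the formula in the comment."""
    return datetime(1970, 1, 1) + timedelta(seconds=posix_timestamp)


# In POSIX time every day has exactly 86400 seconds, and
# 2012-07-01T00:00:00Z is 15522 days after the Epoch.
posix_timestamp = 15522 * 86400
correct = utc_from_posix(posix_timestamp)

# ASSUMPTION for illustration only: a clock that also counts 25 leap seconds.
ASSUMED_LEAP_SECONDS = 25
skewed = utc_from_posix(posix_timestamp + ASSUMED_LEAP_SECONDS)

print(correct)  # 2012-07-01 00:00:00
print(skewed)   # 2012-07-01 00:00:25
```

The second result shows why the epoch year alone is not enough: with the same 1970 epoch, a timestamp that includes leap seconds makes the formula drift by that many seconds.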

Alexander Belopolsky added the comment: In the context of Python library documentation, the word "encoding" strongly suggests that you are dealing with string/bytes. The situation may be different in C. If you want to refer to something that is defined by the POSIX standard you should use the words that can actually be found in that standard. When I search for "encoding" at <http://pubs.opengroup.org/onlinepubs/9699919799/>, I get crypt - string encoding function (CRYPT) encrypt - encoding function (CRYPT) setkey - set encoding key (CRYPT) and nothing related to time.

Akira Li added the comment:
Alexander Belopolsky added the comment:
In the context of Python library documentation, the word "encoding" strongly suggests that you are dealing with string/bytes. The situation may be different in C. If you want to refer to something that is defined by the POSIX standard you should use the words that can actually be found in that standard.
When I search for "encoding" at <http://pubs.opengroup.org/onlinepubs/9699919799/>, I get
crypt - string encoding function (CRYPT) encrypt - encoding function (CRYPT) setkey - set encoding key (CRYPT)
and nothing related to time.
I've provided the direct quote from the *C* standard in my previous message msg231957:
2. What is "calendar time in POSIX encoding"? This sounds like what time.asctime() returns.
It is the language used by C standard for time() function: The time function determines the current calendar time. The encoding of the value is unspecified. The quoted text is from the C standard; notice the word *encoding* in it.

Alexander Belopolsky added the comment:
It is possible to inspect all stdlib functions that use time module and determine for some of them whether they will break if gmtime(0) is not 1970 or "right" zoneinfo is used or any non-POSIX time encoding is used. But it is hard to maintain such a list because any future code change may affect the behavior.
Let's not confuse the issue of gmtime(0) not being 1970-01-01T00 and localtime() expecting non-POSIX time_t. Since gmtime(0) is the same on all platforms supported by Python, it is fair game to rely on this fact in Python code. The issue of "right" zoneinfo is different: at least two major Python platforms (OS X and Linux) can be configured in a non-POSIX way. The decision not to support these configurations in the datetime module was deliberate, but some partial support can be added. For example, datetime.astimezone() cannot work correctly in the "right" timezone because datetime.second cannot be 60, but if it returns values that are off by some 20 seconds at other times, I would call it a bug, but many will disagree. I don't know how popular configurations with right timezones are, but testing Python stdlib in those configurations can only help the overall stdlib quality. (Unfortunately, at the moment we have very few tests even for the mainstream timezones such as Europe/Moscow.)
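The limitation mentioned above, that datetime.second cannot be 60, is easy to observe. A minimal sketch; try_leap_second is a hypothetical helper name used only for this illustration.

```python
from datetime import datetime


def try_leap_second():
    """Try to build the 2012-06-30 leap second as a datetime.

    Returns the error message if construction fails, else None.
    """
    try:
        datetime(2012, 6, 30, 23, 59, 60)
    except ValueError as exc:
        return str(exc)
    return None


message = try_leap_second()
print(message)
```

Because the constructor rejects second=60, a datetime value has no way to represent the leap second itself, whatever the underlying time zone configuration is.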

Alexander Belopolsky added the comment:
I've provided the direct quote from *C* standard ...
I understand that the C standard uses the word "encoding", but it does so for a reason that is completely unrelated to the choice of epoch. "Encoding" is how the bytes in memory should be interpreted as "number of seconds" or some other notion of time. For example, "two's complement little-endian 32-bit signed int" is a valid time_t encoding; another example would be an IEEE 754 big-endian 64-bit double. Note that these choices are valid for both C and POSIX standards. If you google for your phrase "time in POSIX encoding", this issue is the only hit. This strongly suggests that your choice of words is not the most natural.
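The two in-memory representations named above can be illustrated with the struct module. This is only a sketch of the byte layouts under discussion; it does not inspect how any real platform stores time_t, and the variable names are illustrative.

```python
import struct

# The same number of seconds, stored in two different byte layouts.
seconds = 15522 * 86400  # fits in a signed 32-bit integer

# Two's complement little-endian 32-bit signed int.
as_int32_le = struct.pack("<i", seconds)

# IEEE 754 big-endian 64-bit double.
as_double_be = struct.pack(">d", float(seconds))

# Both layouts decode back to the same number; only the bytes differ.
decoded_int = struct.unpack("<i", as_int32_le)[0]
decoded_double = struct.unpack(">d", as_double_be)[0]

print(as_int32_le.hex(), as_double_be.hex())
print(decoded_int, decoded_double)
```

In this reading of "encoding", the epoch plays no role: the number is the same in both cases and only its byte representation changes.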

Akira Li added the comment:
Alexander Belopolsky added the comment:
I've provided the direct quote from *C* standard ...
I understand that the C standard uses the word "encoding", but it does so for a reason that is completely unrelated to the choice of epoch. "Encoding" is how the bytes in memory should be interpreted as "number of seconds" or some other notion of time. For example, "two's complement little-endian 32-bit signed int" is a valid time_t encoding; another example would be an IEEE 754 big-endian 64-bit double. Note that these choices are valid for both C and POSIX standards.
I agree one *part* of "encoding" is how time_t is *represented* in memory but it is not the only part e.g.: The mktime function converts the broken-down time, expressed as local time, in the structure pointed to by timeptr into a calendar time value with the same encoding as that of the values returned by the time function. notice: "the same encoding as ... returned by the time function". time() function can return values with different epoch (implementation defined). mktime() is specified to use the *same* encoding i.e., the same epoch, etc. i.e., [in simple words] we have calendar time (Gregorian date, time) and we can convert it to a number (e.g., Python integer), we can call that number "seconds" and we can represent that number as some (unspecified) bit-pattern in C. I consider the whole process of converting "time" to a bit-pattern in memory as "encoding" i.e., "32/64, un/signed int/754 double" is just *part* of it e.g., 1. specify that 1970-01-01T00:00:00Z is zero (0) 2. specify 0 has time_t type 3. specify how time_t type is represented in memory. I may be wrong that C standard includes the first item in time "encoding".
If you google for your phrase "time in POSIX encoding", this issue is the only hit. This strongly suggests that your choice of words is not the most natural.
I've googled the phrase (no surrounding quotes) and the links talk about time encoded as POSIX time [1] and some *literally* contain the phrase *POSIX encoding* [2] because *Python* documentation for calendar.timegm contains it [3]: [timegm] returns the corresponding Unix timestamp value, assuming an epoch of 1970, and the POSIX encoding. In fact, time.gmtime() and timegm() are each others’ inverse. In an effort to avoid personal influence, I've repeated the experiment using Tor browser and other search engines; the result is the same. The timegm() documentation might be the reason why I've used the phrase. I agree "POSIX encoding" might be unclear. The patch could be replaced by any phrase that expresses that some functions in stdlib assume that time.time() returns (+/- fractional part) "seconds since the Epoch" as defined by POSIX [4]. [1] http://en.wikipedia.org/wiki/Unix_time#Encoding_time_as_a_number [2] http://ruslanspivak.com/2011/07/20/how-to-convert-python-utc-datetime-object... [3] https://docs.python.org/3/library/calendar.html#calendar.timegm [4] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_...
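The inverse relationship between time.gmtime() and calendar.timegm() quoted from the calendar documentation can be sketched as follows. The sample timestamp and the variable names are chosen only for illustration.

```python
import calendar
import time

# A sample POSIX timestamp: 2012-07-01T00:00:00Z.
timestamp = 15522 * 86400

# Timestamp -> broken-down UTC time -> timestamp again.
broken_down = time.gmtime(timestamp)
round_trip = calendar.timegm(broken_down)

print(tuple(broken_down)[:6])  # (2012, 7, 1, 0, 0, 0)
print(round_trip == timestamp)  # True
```

calendar.timegm() does its arithmetic assuming a 1970 epoch and 86400-second days, so it is one concrete example of a stdlib function that depends on the assumption discussed in this issue.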
participants (4)
- Akira Li
- Alexander Belopolsky
- Chris Rebert
- Ned Deily