[PYTHON DB-SIG] [comp.lang.python] Date-Time requirements (esp. for databases)

Fredrik Lundh fredrik_lundh@ivab.se
Thu, 31 Oct 1996 09:30:51 +0100


> This date ordering is the recommended practice for U.S. government
> agencies[1].

Related to the international standard, I suppose.  Here's some
background info I found on the web a few month ago, including a
pointer to more stuff.  Share and enjoy (FWIW, Sweden adopted ISO 8601
back in the seventies).

	/F

--------------------------------------------------------------------
Date: Mon Jul 12 17:00:39 1996
Author: Fredrik Lundh
Subject: International Standard Date and Time Notation

A Summary of the International Standard Date and Time Notation

International Standard ISO 8601 specifies numeric representations of date
and time. This standard notation helps to avoid confusion in international
communication caused by the many different national notations and increases
the portability of computer user interfaces. In addition, these formats have
several important advantages for computer usage compared to other
traditional date and time notations. The time notation described here is
already the de-facto standard in almost all countries and the date notation
is becoming increasingly popular.

Especially software engineers who design user interfaces, file formats, and
communication protocols should be familiar with ISO 8601.

Contents: Date, Time of Day, Time Zone.

Date

The international standard date notation is

YYYY-MM-DD

where YYYY is the year in the usual Gregorian calendar, MM is the month of
the year between 01 (January) and 12 (December), and DD is the day of the
month between 01 and 31.

For example, the fourth day of February in the year 1995 is written in the
standard notation as

1995-02-04

Other commonly used notations are e.g. 2/4/95, 4/2/95, 95/2/4, 4.2.1995,
04-FEB-1995, 4-February-1995, and many more. Especially the first two
examples are dangerous, because as both are used quite often in the USA and
in Great Britain and both can not be distinguished, it is unclear whether
2/4/95 means 1995-04-02 or 1995-02-04. The date notation 2/4/5 has at least
six reasonable interpretations (assuming that only the twentieth and
twenty-first century are reasonable candidates in our live time).

Advantages of the ISO 8601 standard date notation compared to other commonly
used variants:

* easily readable and writeable by software (no 'JAN', 'FEB', ... table
necessary)
* easily comparable and sortable with a trivial string comparison
* language independent
* can not be confused with other popular date notations
* consistency with the common 24h time notation system, where the larger
units (hours) are also written in front of the smaller ones (minutes
and seconds)
* strings containing a date followed by a time are also easily comparable
and sortable (e.g. write "1995-02-04 22:45:00")
* the notation is short and has constant length, which makes both
keyboard data entry and table layout easier
* identical to the Chinese date notation, so the largest cultural group
(>25%) on this planet is already familiar with it :-)
* date notations with the order "year, month, day" are in addition
already widely used e.g. in Japan, Korea, Hungary, Sweden, Finland,
Denmark, and a few other countries and people in the US are already
used to at least the "month, day" order
* a 4-digit year representation avoids overflow problems after 1999-12-31

As dates will look a little bit strange anyway starting with 2000-01-01
(e.g. like 1/1/0), it has been suggested that the year 2000 is an excellent
opportunity to change to the standard date notation.

Apart from the recommended primary standard notation YYYY-MM-DD, ISO 8601
also specifies a number of alternative formats for use in applications with
special requirements. All of these alternatives can easily and automatically
be distinguished from each other:

The hyphens can be omitted if compactness of the representation is more
important than human readability, for example as in

19950204

For situations where information about the century is really not required, a
2-digit year representation is available:

95-02-04 or 950204

If only the month or even only the year is of interest:

1995-02 or 1995

In commercial and industrial applications (delivery times, production plans,
etc.), especially in Europe, it is often required to refer to a week of a
year. Week 01 of a year is per definition the first week that has the
Thursday in this year, which is equivalent to the week that contains the
fourth day of January. In other words, the first week of a new year is the
week that has the majority of its days in the new year. Week 01 might also
contain days from the previous year and the week before week 01 of a year is
the last week (52 or 53) of the previous year even if it contains days from
the new year. A week starts with Monday (day 1) and ends with Sunday (day
7). For example, the first week of the year 1997 lasts from 1996-12-30 to
1997-01-05 and can be written in standard notation as

1997-W01 or 1997W01

The week notation can also be extended by a number indicating the day of the
week. For example, the day 1996-12-31, which is the Tuesday (day 2) of the
first week of 1997, can also be written as

1997-W01-2 or 1997W012

for applications like industrial planning where many things like shift
rotations are organized per week and knowing the week number and the day of
the week is more handy than knowing the day of the month.

An abbreviated version of the year and week number like

95W05

is sometimes useful as a compact code printed on a product that indicates
when it has been manufactured.

The ISO standard avoids explicitly stating the possible range of week
numbers, but this can easily be deduced from the definition:

Theorem: Possible ISO week numbers are in the range 01 to 53. A
year always has a week 52. (There is one historic exception: the
year in which the Gregorian calendar was introduced had less than
365 days and less than 52 weeks.)

Proof: Per definition, the first week of a year is W01 and
consequently days before week W01 belong to the previous year and
so there is no week with lower numbers. Considering the highest
possible week number, the worst case is a leap year like 1976 that
starts with a Thursday, because this keeps the highest possible
number of days of W01 in the previous year, i.e. 3 days. In this
case, the Sunday of W52 of the worst case year is day number
4+51*7=361 and 361-366=5 days of W53 belong still to this year,
which guarantees that in the worst case year day 4 (Thursday) of
W53 is not yet in the next year, so a week number 53 is possible.
For example, the 53 weeks of the worst case year 1976 started with
1975-12-29 = 1976-W01-1 and ended with 1977-01-02 = 1976-W53-7. On
the other hand, considering the lowest number of the last week of
a year, the worst case is a non-leap year like 1999 that starts
with a Friday, which ensures that the first three days of the year
belong to the last week of the previous year. In this case, the
Sunday of week 52 would be day number 3+52*7=367, i.e. only the
last 367-365=2 days of the W52 reach into the next year and
consequently, even a worst case year like 1999 has a week W52
including the days 1999-12-27 to 2000-01-02. q.e.d.

[Unfortunately, the current version of the C programming language standard
provides in the strftime() function no means to generate the ISO 8601 week
notation. A required extension would be four new formatting codes: for the
year of the week to which the specified day belongs (both 2-digit and
4-digit), for the number of the week between 01 and 53, and for the day of
the week between 1 (Monday) and 7 (Sunday). Another trivial mistake in the
description of strftime() in the C standard is that the range of seconds
goes from 00 to 61, because at one time only one single leap second 60 can
be inserted into UTC and consequently there will never be a leap second 61.]

Both day and year are useful units of structuring time, because the position
of the sun on the sky, which influences our lives, is described by them.
However the 12 months of a year are of some obscure mystic origin and have
no real purpose today except that people are used to having them (they do
not even describe the current position of the moon). In some applications, a
date notation is preferred that uses only the year and the day of the year
between 001 and 365 (366 in leap years). The standard notation for this
variant representing the day 1995-02-04 (that is day 035 of the year 1995)
is

1995-035 or 1995035

Leap years are years with an additional day YYYY-02-29, where the year
number is a multiple of four with the following exception: If a year is a
multiple of 100, then it is only a leap year if it is also a multiple of
400. For example, 1900 was not a leap year, but 2000 is one.

Time of Day

The international standard notation for the time of day is

hh:mm:ss

where hh is the number of complete hours that have passed since midnight
(00-24), mm is the number of complete minutes that have passed since the
start of the hour (00-59), and ss is the number of seconds since the start
of the minute (00-60). The value 60 for ss appears only in case of an
inserted leap second into an atomic time scale like UTC in order to keep it
synchronized with a less constant astronomical time scale like UT1. The hour
value 24 is only possible when the minute and second values are zero. An
example time is

23:59:59

which represents the time one second before midnight.

As with the date notation, the separating colons can also be omitted as in

235959

and the precision can be reduced by omitting the seconds or both the seconds
and minutes as in

23:59, 2359, or 23

It is also possible to add fractions of a second after a decimal dot, e.g.
the time 5.8 ms before midnight can be written as

23:59:59.9942 or 235959.9942

As every day both starts and ends with midnight, the two notations 00:00 and
24:00 are available to distinguish the two midnights that can be associated
with one date. This means that the following two strings refer to exactly
the same point in time:

1995-02-04 24:00 = 1995-02-05 00:00

In case an unambiguous representation of time is required, 00:00 is usually
the preferred notation for midnight, which is also what most digital clocks
display.

A remark for readers from the US:

The 24h time notation specified here has already been the de-facto
standard all over the world in written language for decades. The
only exception are some English speaking countries, where still
notations with hours between 1 and 12 and additions like "a.m."
and "p.m." are in wide use. The common 24h international standard
notation starts to get widely used now even in England. Other
languages than English don't even have abbreviations like "a.m."
and "p.m." and the 12h notation is certainly hardly ever used on
Continental Europe to write or display a time. The old English 12h
notation has many disadvantages like:

* It is not clear, how 00:00, 12:00 and 24:00 are represented
(even encyclopedias and style manuals contain contradicting
descriptions and a common quick fix seems to be to avoid
"12:00 a.m./p.m." altogether and write "noon", "midnight" or
"12:01 a.m./p.m." instead).
* It makes people often believe that the next day starts at the
overflow from "12:59 a.m." to "1:00 a.m.", which is a common
problem not only when people try to program the timer of VCRs
shortly after midnight.
* It is not easily comparable with a string compare operation.
* It is not immediately clear for the unaware, whether the time
between "12:00 a.m./p.m." and "1:00 a.m./p.m." starts at
00:00 or at 12:00, i.e. the English 12h notation is more
difficult to understand.
* It is longer than the normal 24h notation.

The 12h time is simply a relic from the dark ages when Roman
numerals were used, the number zero had not yet been invented and
analog clocks where the only known form of displaying a time.
Please avoid using it today, especially in technical applications!
Even in the US, the widely respected Chicago Manual of Style
recommends now to use the international standard time notation in
publications.

Time Zone

Without any further additions, a date and time as written above is assumed
to be in some local time zone. In order to indicate that a time is measured
in Universal Time (UTC), you can append a capital letter Z to a time as in

23:59:59Z or 2359Z

[The Z stands for the "zero meridian", which goes through Greenwich in
London, and it is also commonly used in radio communication where it is
pronounced "Zulu" (the word for Z in the NATO radio alphabet). Universal
Time (sometimes also called "Zulu Time") was called Greenwich Mean Time
(GMT) before 1972, however this term should no longer be used. Since the
introduction of an international atomic time scale, almost all existing
civil time zones are now related to UTC, which is slightly different from
the old and now unused GMT.]

The strings

+hh:mm, +hhmm, or +hh

can be added to the time to indicate that the used local time zone is hh
hours and mm minutes ahead of UTC. For time zones west of the zero meridian,
which are behind UTC, the notation

-hh:mm, -hhmm, or -hh

is used instead. For example, Middle European Time (MET) is +0100 and
US/Canadian Eastern Standard Time (EST) is -0500. The following strings all
indicate the same point of time:

12:00Z = 13:00+01:00 = 0700-0500

There exists no international standard that specifies abbreviations for
civil time zones like MET, EST, etc. and sometimes the same abbreviation is
even used for two very different time zones. In addition, politicians enjoy
modifying the rules for civil time zones, especially for daylight saving
times, every few years, so the only really reliable way of describing a
local time zone is to specify numerically the difference of local time to
UTC. Better use directly UTC as your only time zone where this is possible
and then you do not have to worry about time zones and daylight saving time
changes at all.

For those readers interested in more information about time zones: Arthur
David Olson maintains a database of all current and many historic time zone
changes. It is available via ftp from elsie.nci.nih.gov in the tzcode* and
tzdata* files. Most Unix time zone handling implementations are based on
this package. If you want to join the tz@elsie.nci.nih.gov mailing list,
which is dedicated to discussions about time zones, please send a short
message to tz-request@elsie.nci.nih.gov.

----------------------------------------------------------------------------

This was a brief overview of the ISO 8601 standard, which covers only the
most useful notations and includes some additional related information. The
full standard defines in addition a number of more exotic notations
including some for periods of time. The ISO 8601 document is unfortunately
not available online and interested people will have to order a paper copy
from

International Organization for Standardization
Case postale 56
1, rue de Varembi
CH-1211 Genhve 20
Switzerland

phone: +41 22 749 01 11
fax: +41 22 733 34 30
e-mail: sales@isocs.iso.ch

A more detailed online summary of ISO 8601 than this one is the text ISO
8601:1988 Date/Time Representations available from
ftp.informatik.uni-erlangen.de/pub/doc/ISO/ISO8601.ps.Z (PostScript, 16 kb,
5 pages) written by Gary Houston, which is now also available in HTML.

----------------------------------------------------------------------------

I wish to thank Edward M. Reingold for developing the fine GNU Emacs
calendar functions, as well as Rich Wales, Mark Brader, Paul Eggert, and
others in the comp.std.internat, comp.protocols.time.ntp, and sci.astro
USENET discussion groups for valuable comments about this text. Further
comments and hyperlinks to this page are very welcome.

Markus Kuhn <mskuhn@cip.informatik.uni-erlangen.de>, 1996-05-14

--------------------------------------------------------------------

=================
DB-SIG  - SIG on Tabular Databases in Python

send messages to: db-sig@python.org
administrivia to: db-sig-request@python.org
=================