
I sent this out on the 15th, but it seems to have gotten lost, since I have reports that it didn't show up and I got no feedback. So here goes again, a new version of PEP 431.

http://www.python.org/dev/peps/pep-0431/

//Lennart

PEP: 431
Title: Time zone support improvements
Version: $Revision$
Last-Modified: $Date$
Author: Lennart Regebro <regebro@gmail.com>
BDFL-Delegate: Barry Warsaw <barry@python.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 11-Dec-2012
Post-History: 11-Dec-2012, 28-Dec-2012


Abstract
========

This PEP proposes the implementation of concrete time zone support in the Python standard library, and also improvements to the time zone API to deal with ambiguous time specifications during DST changes.


Proposal
========

Concrete time zone support
--------------------------

The time zone support in Python has no concrete implementation in the standard library outside of a tzinfo base class that supports fixed offsets. To properly support time zones you need to include a database of all time zones, both current and historical, including daylight saving changes. But such information changes frequently, so even if we include the latest information in a Python release, that information would be outdated just a few months later.

Time zone support has therefore only been available through two third-party modules, ``pytz`` and ``dateutil``, both of which include and wrap the "zoneinfo" database. This database, also called "tz" or "the Olson database", is the de facto standard time zone database, and it is included in most Unix and Unix-like operating systems, including OS X.

This gives us the opportunity to include the code that supports the zoneinfo data in the standard library, but by default use the operating system's copy of the data, which typically will be kept updated by the updating mechanism of the operating system or distribution.

For those who have an operating system that does not include the zoneinfo database, for example Windows, the Python source distribution will include a copy of the zoneinfo database, and a distribution containing the latest zoneinfo database will also be available at the Python Package Index, so it can be easily installed with Python packaging tools such as ``easy_install`` or ``pip``. This could also be done on Unices that are no longer receiving updates and therefore have an outdated database.

With such a mechanism Python would have full time zone support in the standard library on any platform, and a simple package installation would provide an updated time zone database on those platforms where the zoneinfo database isn't included, such as Windows, or on platforms where OS updates are no longer provided.

The time zone support will be implemented by making the ``datetime`` module into a package, and adding time zone support to ``datetime`` based on Stuart Bishop's ``pytz`` module.

Getting the local time zone
---------------------------

On Unix there is no standard way of finding the name of the time zone that is being used. The only information available is the time zone abbreviations, such as ``EST`` and ``PDT``, but many of those abbreviations are ambiguous and therefore you can't rely on them to figure out which time zone you are located in.

There is however a standard for finding the compiled time zone information, since it's located in ``/etc/localtime``. Therefore it is possible to create a local time zone object with the correct time zone information even though you don't know the name of the time zone.

A function in ``datetime`` should be provided to return the local time zone.

This will be supported by integrating Lennart Regebro's ``tzlocal`` module into the new ``datetime`` module.
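
As a rough illustration of the ``/etc/localtime`` approach described above, the following sketch reads that file and checks that it looks like compiled zoneinfo data (such files begin with the four bytes ``TZif``). The helper names ``looks_like_tzif`` and ``read_local_tzif`` are hypothetical and are not part of the PEP or of ``tzlocal``; parsing the transitions out of the file is left out.

```python
from pathlib import Path

# Compiled zoneinfo files start with this four-byte magic value.
TZIF_MAGIC = b"TZif"


def looks_like_tzif(data: bytes) -> bool:
    """Return True if the bytes start like a compiled zoneinfo file."""
    return data[:4] == TZIF_MAGIC


def read_local_tzif(path: str = "/etc/localtime"):
    """Return the raw bytes of the local time zone file, or None.

    Hypothetical helper: None is returned when the file is missing,
    unreadable, or does not carry the TZif magic.
    """
    try:
        data = Path(path).read_bytes()
    except OSError:
        return None
    return data if looks_like_tzif(data) else None


if __name__ == "__main__":
    raw = read_local_tzif()
    if raw is None:
        print("no usable /etc/localtime on this system")
    else:
        print("found compiled zone data,", len(raw), "bytes")
```

Note that the file carries the transition rules but not the zone name, which is why the text says the local time zone object can be built without knowing what the zone is called.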

For Windows it will look up the local Windows time zone name, and use a mapping between Windows time zone names and zoneinfo time zone names provided by the Unicode consortium to convert that to a zoneinfo time zone.

The mapping should be updated before each major or bugfix release; scripts for doing so will be provided in the ``Tools/`` directory.

Ambiguous times
---------------

When changing over from daylight saving time (DST) the clock is turned back one hour. This means that the times during that hour happen twice, once with DST and then once without DST. Similarly, when changing to daylight saving time, one hour goes missing.

The current time zone API can not differentiate between the two ambiguous times during a change from DST. For example, in Stockholm the time of 2012-10-28 02:00:00 happens twice, both at UTC 2012-10-28 00:00:00 and also at 2012-10-28 01:00:00.

The current time zone API can not disambiguate this and therefore it's unclear which time should be returned::

    # This could be either 00:00 or 01:00 UTC:
    >>> dt = datetime(2012, 10, 28, 2, 0, tzinfo=zoneinfo('Europe/Stockholm'))
    # But we can not specify which:
    >>> dt.astimezone(zoneinfo('UTC'))
    datetime.datetime(2012, 10, 28, 1, 0, tzinfo=<UTC>)

``pytz`` solved this problem by adding ``is_dst`` parameters to several methods of the tzinfo objects to make it possible to disambiguate times when this is desired.

This PEP proposes to add these ``is_dst`` parameters to the relevant methods of the ``datetime`` API, and therefore add this functionality directly to ``datetime``. This is likely the hardest part of this PEP, as it involves updating the C version of the ``datetime`` library with this functionality, which means writing new code and not just reorganizing existing external libraries.
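
The Stockholm ambiguity described above can be reproduced with the fixed-offset ``datetime.timezone`` class that is already in the standard library. This is only a sketch of the arithmetic: the two offsets (CEST, UTC+2, and CET, UTC+1) are hard-coded here rather than looked up in a zoneinfo database.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Stockholm: summer time (CEST) and standard time (CET).
CEST = timezone(timedelta(hours=2), "CEST")
CET = timezone(timedelta(hours=1), "CET")

# The wall-clock time that occurs twice on the night DST ends.
wall = datetime(2012, 10, 28, 2, 0)

# First occurrence: still on DST. Second occurrence: after the clock is set back.
first = wall.replace(tzinfo=CEST).astimezone(timezone.utc)
second = wall.replace(tzinfo=CET).astimezone(timezone.utc)

print(first.isoformat())   # 2012-10-28T00:00:00+00:00
print(second.isoformat())  # 2012-10-28T01:00:00+00:00
```

The same wall-clock value maps to two UTC instants one hour apart, which is the choice the proposed ``is_dst`` parameter is meant to express.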


Implementation API
==================

The zoneinfo database
---------------------

The latest version of the zoneinfo database should exist in the ``Lib/tzdata`` directory of the Python source control system. This copy of the database should be updated before every Python feature and bug-fix release, but not for releases of Python versions that are in security-fix-only mode.

Scripts to update the database will be provided in ``Tools/``, and the release instructions will be updated to include this update.

New configure options ``--enable-internal-timezone-database`` and ``--disable-internal-timezone-database`` will be implemented to enable and disable the installation of this database when installing from source. A source install will default to installing it.

Binary installers for systems that have a system-provided zoneinfo database may skip installing the included database, since it would never be used on these platforms. For other platforms, for example Windows, binary installers must install the included database.

Changes in the ``datetime``-module
----------------------------------

The public API of the new time zone support contains one new class, one new function, one new exception and four new collections. In addition to this, several methods on the datetime object get a new ``is_dst`` parameter.

New class ``DstTzInfo``
^^^^^^^^^^^^^^^^^^^^^^^^

This class provides a concrete implementation of the ``tzinfo`` base class that implements DST support.

New function ``zoneinfo(name=None, db_path=None)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This function takes a name string that must specify a valid zoneinfo time zone, e.g. "US/Eastern", "Europe/Warsaw" or "Etc/GMT". If not given, the local time zone will be looked up. If an invalid zone name is given, or the local time zone can not be retrieved, the function raises ``UnknownTimeZoneError``.

The function also takes an optional path to the location of the zoneinfo database which should be used. If not specified, the function will look for databases in the following order:

1. Use the database in ``/usr/share/zoneinfo``, if it exists.

2. Check if the ``tzdata-update`` module is installed, and then use that database.

3. Use the Python-provided database in ``Lib/tzdata``.

If no database is found an ``UnknownTimeZoneError`` or subclass thereof will be raised with a message explaining that no zoneinfo database can be found, but that you can install one with the ``tzdata-update`` package.

New parameter ``is_dst``
^^^^^^^^^^^^^^^^^^^^^^^^

A new ``is_dst`` parameter is added to several methods to handle time ambiguity during DST changeovers.

* ``tzinfo.utcoffset(dt, is_dst=False)``

* ``tzinfo.dst(dt, is_dst=False)``

* ``tzinfo.tzname(dt, is_dst=False)``

* ``datetime.astimezone(tz, is_dst=False)``

The ``is_dst`` parameter can be ``False`` (default), ``True``, or ``None``.

``False`` will specify that the given datetime should be interpreted as not happening during daylight saving time, i.e. that the time specified is after the change from DST.

``True`` will specify that the given datetime should be interpreted as happening during daylight saving time, i.e. that the time specified is before the change from DST.

``None`` will raise an ``AmbiguousTimeError`` exception if the time specified was during a DST changeover. It will also raise a ``NonExistentTimeError`` if a time is specified during the "missing time" in a change to DST.

New exceptions
^^^^^^^^^^^^^^

* ``UnknownTimeZoneError``

  This exception is a subclass of KeyError and is raised when giving a time zone specification that can't be found::

    >>> datetime.zoneinfo('Europe/New_York')
    Traceback (most recent call last):
    ...
    UnknownTimeZoneError: There is no time zone called 'Europe/New_York'

* ``InvalidTimeError``

  This exception serves as a base for ``AmbiguousTimeError`` and ``NonExistentTimeError``, to enable you to trap these two separately. It will subclass from ValueError, so that you can catch these errors together with inputs like the 29th of February 2011.

* ``AmbiguousTimeError``

  This exception is raised when giving a datetime specification that is ambiguous while setting ``is_dst`` to None::

    >>> datetime(2012, 10, 28, 2, 0, tzinfo=zoneinfo('Europe/Stockholm'), is_dst=None)
    Traceback (most recent call last):
    ...
    AmbiguousTimeError: 2012-10-28 02:00:00 is ambiguous in time zone Europe/Stockholm

* ``NonExistentTimeError``

  This exception is raised when giving a datetime specification for a time that does not exist because of a change to DST, while setting ``is_dst`` to None::

    >>> datetime(2012, 3, 25, 2, 0, tzinfo=zoneinfo('Europe/Stockholm'), is_dst=None)
    Traceback (most recent call last):
    ...
    NonExistentTimeError: 2012-03-25 02:00:00 does not exist in time zone Europe/Stockholm

New collections
^^^^^^^^^^^^^^^

* ``all_timezones`` is the exhaustive list of the time zone names that can be used, listed alphabetically.

* ``all_timezones_set`` is a set of the time zones in ``all_timezones``.

* ``common_timezones`` is a list of useful, current time zones, listed alphabetically.

* ``common_timezones_set`` is a set of the time zones in ``common_timezones``.

The ``tzdata-update``-package
-----------------------------

The zoneinfo database will be packaged for easy installation with ``easy_install``/``pip``/``buildout``. This package will not install any Python code, and will not contain any Python code except that which is needed for installation.

It will be kept updated with the same tools as the internal database, but released whenever the ``zoneinfo`` database is updated, and use the same version schema.
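
The database lookup order given for ``zoneinfo()`` above could be sketched roughly as follows. Everything here is an assumption made for illustration: the function name ``find_zoneinfo_db``, the importable name ``tzdata_update`` for the update package, and the ``bundled_dir`` argument standing in for ``Lib/tzdata`` are all hypothetical, and the PEP does not say how the installed update package would be detected.

```python
import importlib.util
import os


def find_zoneinfo_db(db_path=None,
                     system_dir="/usr/share/zoneinfo",
                     update_module="tzdata_update",
                     bundled_dir=None):
    """Return the first usable zoneinfo directory, or None.

    Hypothetical sketch of the lookup order described in the text:
    explicit path, system database, update package, bundled copy.
    """
    # An explicit path overrides the search entirely.
    if db_path is not None:
        return db_path if os.path.isdir(db_path) else None

    # 1. The operating system's copy.
    if os.path.isdir(system_dir):
        return system_dir

    # 2. A separately installed update package (assumed importable name).
    spec = importlib.util.find_spec(update_module)
    if spec is not None and spec.submodule_search_locations:
        return list(spec.submodule_search_locations)[0]

    # 3. The copy shipped with Python (passed in for this sketch).
    if bundled_dir is not None and os.path.isdir(bundled_dir):
        return bundled_dir

    return None
```

A caller would turn a ``None`` result into the ``UnknownTimeZoneError`` that the text describes, with a message pointing at the ``tzdata-update`` package.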


Differences from the ``pytz`` API
=================================

* ``pytz`` has the functions ``localize()`` and ``normalize()`` to work around the fact that ``tzinfo`` doesn't have ``is_dst``. When ``is_dst`` is implemented directly in ``datetime.tzinfo`` they are no longer needed.

* The ``timezone()`` function is called ``zoneinfo()`` to avoid clashing with the ``timezone`` class introduced in Python 3.2.

* ``zoneinfo()`` will return the local time zone if called without arguments.

* The class ``pytz.StaticTzInfo`` is there to provide the ``is_dst`` support for static time zones. When ``is_dst`` support is included in ``datetime.tzinfo`` it is no longer needed.

* ``InvalidTimeError`` subclasses from ``ValueError``.


Resources
=========

* http://pytz.sourceforge.net/

* http://pypi.python.org/pypi/tzlocal

* http://pypi.python.org/pypi/python-dateutil

* http://unicode.org/cldr/data/common/supplemental/windowsZones.xml


Copyright
=========

This document has been placed in the public domain.

Hi,

On Mon, 28 Jan 2013 11:01:05 +0100, Lennart Regebro <regebro@gmail.com> wrote:
Will non-ambiguous shorthands such as "Warsaw" and "GMT" be accepted?
NonExistentTimeError can also be raised for is_dst=True and is_dst=False, no? Or am I misunderstanding the semantics? Regards Antoine.

On Mon, Jan 28, 2013 at 11:26 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> Will non-ambiguous shorthands such as "Warsaw" and "GMT" be accepted?

pytz accepts GMT, so that will work. Otherwise no.

> NonExistentTimeError can also be raised for is_dst=True and is_dst=False, no? Or am I misunderstanding the semantics?

No, it will only be raised when is_dst=None. The flag is there to say what timezone the resulting time should be if it is ambiguous or missing. It is true that the times that don't exist will also not exist even if you set the flag. 2012-03-25 02:30 is a time that doesn't exist in Europe/Warsaw, flag or not. But it will not raise an error if you set the flag to False or True, for reasons of backwards compatibility as well as ease of handling. People don't expect that certain times don't exist, and will generally not check for those exceptions. Therefore, the normal behavior is to not raise an error, but happily keep dealing with the time as if it does exist. Only if you explicitly set it to "None" will you get an error.

//Lennart
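
The semantics described in this reply can be sketched with a toy resolver. This is not the PEP's implementation: the function ``utcoffset_2012`` is hypothetical, the two 2012 transitions for Europe/Stockholm are hard-coded instead of read from zoneinfo, and the choice of offset for a missing time under ``True``/``False`` is an assumption modelled on how the flag is described above.

```python
from datetime import datetime, timedelta


class InvalidTimeError(ValueError):
    """Base class for the two errors below, as listed in the PEP."""


class AmbiguousTimeError(InvalidTimeError):
    pass


class NonExistentTimeError(InvalidTimeError):
    pass


STD = timedelta(hours=1)  # CET
DST = timedelta(hours=2)  # CEST

# Hard-coded 2012 wall-clock windows for Europe/Stockholm.
GAP_START, GAP_END = datetime(2012, 3, 25, 2), datetime(2012, 3, 25, 3)
FOLD_START, FOLD_END = datetime(2012, 10, 28, 2), datetime(2012, 10, 28, 3)


def utcoffset_2012(wall, is_dst=False):
    """Return the UTC offset for a naive 2012 Stockholm wall-clock time."""
    if GAP_START <= wall < GAP_END:
        # Missing hour: only is_dst=None reports it as an error.
        if is_dst is None:
            raise NonExistentTimeError(
                f"{wall} does not exist in time zone Europe/Stockholm")
        return DST if is_dst else STD
    if FOLD_START <= wall < FOLD_END:
        # Repeated hour: only is_dst=None reports it as an error.
        if is_dst is None:
            raise AmbiguousTimeError(
                f"{wall} is ambiguous in time zone Europe/Stockholm")
        return DST if is_dst else STD
    # Unambiguous times ignore the flag entirely.
    return DST if GAP_END <= wall < FOLD_START else STD
```

With this sketch, ``utcoffset_2012(datetime(2012, 3, 25, 2, 30))`` returns an offset without complaint, while passing ``is_dst=None`` for the same time raises ``NonExistentTimeError``.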

On Sat, Jan 26, 2013 at 10:38 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Which probably means it should rather be called tztimezone or something.
6. Under "New collections"
Why both lists and sets?
Because pytz did it. But yes, you are right, an ordered set is a better solution. Basing it on OrderedDict seems like a hack, though. I could implement a custom orderedset, of course. //Lennart

On Mon, Jan 28, 2013 at 10:06 PM, Lennart Regebro <regebro@gmail.com> wrote:
"dsttimezone", "dst_timezone", "timezonedst" or "timezone_dst" are probably the possibilities most consistent with the current naming scheme. Of those "dst_timezone" is probably the best of a fairly bad bunch.
Sets themselves have an honourable history of just being a thin wrapper around dictionaries with all the values set to None (although they're not implemented that way any more). Whether you create an actual OrderedSet class, or just expose the result of calling keys() on an OrderedDict instance is just an implementation detail, though. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, 28 Jan 2013 22:31:29 +1000, Nick Coghlan <ncoghlan@gmail.com> wrote:
Why the complication? Just expose a regular set and let users call sorted() if that's what they want. I'm -1 on datetime shipping its own container subclass just for the sake of looking clever. Regards Antoine.

On Mon, Jan 28, 2013 at 10:52 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I was going by the fact that pytz decided to expose both, and it isn't clear whether the "fast containment test" or "lexically ordered iteration" should be the base behaviour if you're going to rely on sorted() or set() to convert between the two. Since OrderedDict can provide both, creating one internally and just exposing keys() makes sense to me (perhaps via types.MappingProxyType to make it read-only). If people really don't like using OrderedDict that way, I think the other two potentially sensible options are either using an ordinary set, or else adding collections.OrderedSet. I agree adding a new ordered set type hidden within datetime would be a poor idea. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 28/01/13 23:52, Antoine Pitrou wrote:
An OrderedSet is not a sorted set. An OrderedSet, like an OrderedDict, remembers *insertion order*; it does not automatically sort the keys. So if datetime needs an ordered set, and I have no opinion on whether or not it really does, calling sorted() on a regular set is not the solution. -- Steven
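
The distinction drawn here, insertion order versus sorted order, can be seen with a small sketch that uses three of the zone names mentioned earlier in the thread. It uses ``OrderedDict.fromkeys`` as a stand-in for an ordered set, in the way suggested above; no actual ``OrderedSet`` class is assumed.

```python
from collections import OrderedDict

# Deliberately not in alphabetical order.
names = ["Europe/Warsaw", "Etc/GMT", "US/Eastern"]

# Keys of an OrderedDict behave like an ordered set: insertion order is kept
# and membership tests are hash-based.
ordered = OrderedDict.fromkeys(names)
insertion_order = list(ordered)

# A plain set has no defined order; sorted() produces a new sorted list.
sorted_order = sorted(set(names))

print(insertion_order)          # ['Europe/Warsaw', 'Etc/GMT', 'US/Eastern']
print(sorted_order)             # ['Etc/GMT', 'Europe/Warsaw', 'US/Eastern']
print("Etc/GMT" in ordered.keys())  # True
```

An ordered container only comes out alphabetical if the names were inserted in alphabetical order to begin with.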

On Mon, Jan 28, 2013 at 11:17 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I think the use case for needing really quick checks of whether a string is in the list of recognized time zones is rather unusual. What is usually needed is a sorted list of valid time zone names. I'll drop the set. //Lennart
