[Python-Dev] Status on PEP-431 Timezones

Carl Meyer carl at oddbird.net
Wed Apr 8 19:37:24 CEST 2015


Hi Lennart,

On 04/08/2015 09:18 AM, Lennart Regebro wrote:
> I wrote PEP-431 two years ago, and never got around to implementing it.
> This year I got some renewed motivation after Berker Peksağ made an
> effort to implement it.
> I'm planning to work more on this during the PyCon sprints, and also
> have a BoF session or similar during the conference.
> 
> Anyone interested in a session on this, mail me and we'll set up a
> time and place!

I'm interested in the topic, and would probably attend a BoF at PyCon.
Comments below:

> If anyone is interested in the details of the problem, this is it.
> 
> The big problem is the ambiguous times, like 02:30 at a time when you
> move the clock back one hour, as there are two different 02:30s that
> day. I wrote down my experiences with looking into and trying to
> implement several different solutions. And the problem there is
> actually how to tell the datetime if it is before or after the
> changeover.
> 
> 
> == How others have solved it ==
> 
> === dateutil.tz: Ignore the problem ===
> 
> dateutil.tz simply ignores the problems with ambiguous datetimes, keeping them
> ambiguous.
> 
> 
> === pytz: One timezone instance per changeover ===
> 
> Pytz implements ambiguous datetimes by having one class per timezone. Each
> change in the UTC offset, either because of a DST changeover or because
> the timezone changes, is represented as one instance of the class.
> 
> All instances are held in a list which is a class attribute of the timezone
> class. You flag in which DST changeover you are by using different instances
> as the datetime's tzinfo. Since the timezone this way knows if it is DST or not,
> the datetime as a whole knows if it's DST or not.
> 
> Benefits:
> - Only known possible implementation without modifying stdlib, which of course
>   was a requirement, as pytz is a third-party library.
> - DST offset can be quickly returned, as it does not need to be calculated.
> Drawbacks:
> - A complex and highly magical implementation of timezones that is hard to
>   understand.
> - Required new normalize()/localize() functions on the timezone, and hence
>   the API is not stdlib's API.
> - Hundreds of instances per timezone means slightly more memory usage.
> 
> 
> == Options for PEP 431 ==
> 
> === Stdlib option 0: Ignore it ===
> 
> I don't think this is an option, really. Listed for completeness.
> 
> 
> === Stdlib option 1: One timezone instance per changeover ===
> 
> Option 1 is to do it like pytz and have one timezone instance per changeover.
> However, this is likely not possible to do without fundamentally changing the
> datetime API, or making it very hard to use.
> 
> For example, when creating a datetime instance and passing in a tzinfo today
> this tzinfo is just attached to the datetime. But when having multiple
> instances of tzinfos this means you have to select the correct one to pass in.
> pytz solves this with the .localize() method, which lets the timezone
> class choose which instance to pass in.
> 
> We can't pass in the timezone class into datetime(), because that would
> require datetime.__new__ to create new datetimes as a part of the timezone
> arithmetic. These in turn, would create new datetimes in __new__ as a part of
> the timezone arithmetic, which in turn, yeah, you get it...
> 
> I haven't been able to solve that issue without either changing the API/usage,
> or getting infinite recursions.
> 
> Benefits:
> - Proven solution through pytz.
> - Fast dst() call.
> Drawbacks:
> - Trying to use this technique with the current API tends to create
>   infinite recursions. It seems to require big API changes.
> - Slow datetime() instance creation.

I think "proven solution" is a significant benefit.

Today, anyone who is serious about correct timezone handling in Python
is almost certainly using pytz. So is adopting pytz's expanded API into
the stdlib really a big problem? It probably presents _fewer_
back-compatibility issues with real-world code than taking a different
approach from pytz would.
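
To make the pytz-style approach concrete, here is a minimal stdlib-only
sketch of the "one tzinfo instance per offset period" idea, with a
localize() that picks the instance. This is a toy and not pytz's actual
code: FixedPeriod, ToyZone, and the single changeover it models are all
invented for illustration.

```python
from datetime import datetime, timedelta, tzinfo


class FixedPeriod(tzinfo):
    """One tzinfo instance per UTC-offset period (hypothetical toy class)."""

    def __init__(self, name, utcoffset, dst):
        self._name = name
        self._utcoffset = utcoffset
        self._dst = dst

    def utcoffset(self, dt):
        return self._utcoffset

    def dst(self, dt):
        return self._dst

    def tzname(self, dt):
        return self._name


class ToyZone:
    """Invented zone with exactly one autumn changeover."""

    SUMMER = FixedPeriod("CEST", timedelta(hours=2), timedelta(hours=1))
    WINTER = FixedPeriod("CET", timedelta(hours=1), timedelta(0))
    # Assumed UTC instant at which wall clocks step back from 03:00 to 02:00.
    CHANGEOVER_UTC = datetime(2015, 10, 25, 1, 0)

    @classmethod
    def _period_at(cls, utc_naive):
        return cls.SUMMER if utc_naive < cls.CHANGEOVER_UTC else cls.WINTER

    @classmethod
    def localize(cls, naive, is_dst):
        """Attach the period instance that matches a naive wall-clock time."""
        # A period is a candidate if interpreting the wall time with its
        # offset lands at a UTC instant that really belongs to that period.
        candidates = [
            period for period in (cls.SUMMER, cls.WINTER)
            if cls._period_at(naive - period.utcoffset(None)) is period
        ]
        if len(candidates) == 1:
            return naive.replace(tzinfo=candidates[0])
        # Two candidates: the wall time is ambiguous, so is_dst decides.
        return naive.replace(tzinfo=cls.SUMMER if is_dst else cls.WINTER)


ambiguous = datetime(2015, 10, 25, 2, 30)
early = ToyZone.localize(ambiguous, is_dst=True)
late = ToyZone.localize(ambiguous, is_dst=False)
```

Both results show 02:30 on the wall clock, but they carry different tzinfo
instances, so they are one hour apart in UTC. The caller has to go through
localize() rather than passing a tzinfo to datetime() directly, which is
the API difference under discussion.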

> === Stdlib option 2: A datetime _is_dst flag ===
> 
> By having a flag on the datetime instance that says "this is in DST or not"
> the timezone implementation can be kept simpler.

Is this really adequate? pytz's implementation handles far more than "is
DST or not"; it also correctly handles historical timezone changes. How
would those be handled under this proposal?
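
As an illustration of what I mean by historical changes, here is a rough
sketch of a transition-table lookup. The table is invented for a
hypothetical zone (the dates, offsets, and names are assumptions, not real
tz data), and period_at() is a made-up helper, not an existing API.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Invented table: (UTC start of period, total UTC offset, DST part, name).
TRANSITIONS = [
    (datetime.min, timedelta(hours=1, minutes=24), timedelta(0), "LMT"),
    (datetime(1920, 1, 1), timedelta(hours=1), timedelta(0), "STD"),
    (datetime(2015, 3, 29, 1), timedelta(hours=2), timedelta(hours=1), "DST"),
    (datetime(2015, 10, 25, 1), timedelta(hours=1), timedelta(0), "STD"),
]
STARTS = [entry[0] for entry in TRANSITIONS]


def period_at(utc_naive):
    """Return the table entry in effect at a naive UTC datetime."""
    index = bisect_right(STARTS, utc_naive) - 1
    return TRANSITIONS[index]


old = period_at(datetime(1900, 1, 1))
summer = period_at(datetime(2015, 7, 1))
winter = period_at(datetime(2015, 12, 1))
```

In this toy table, "old" and "winter" both have a zero DST component yet
different UTC offsets, so a boolean flag on its own does not identify the
period; the timezone would still need a table like this behind it.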

> You also have to calculate whether the datetime is in DST or not, either
> when creating it, which demands datetime object creations and causes infinite
> recursions, or when needed, which means you can get "Ambiguous date time
> errors" at unexpected times later.
> 
> Also, when trying to implement this, I get bogged down in the complexities
> of how tzinfo and datetime are calling each other back and forth, and when
> to pass in the current is_dst and when to pass in the desired is_dst, etc.
> The API and current implementation is not designed with this case in mind,
> and it gets very tricky.
> 
> Benefits:
> - Simpler tzinfo() implementations.
> Drawbacks:
> - It seems likely that we must change some APIs.
> - This in turn may affect the pytz implementation. Or not, hard to say.
> - The DST offset must use slow timezone calculations. However, since datetimes
>   are immutable it can be a cached, lazy, one-time operation.
> 
> 
> === Stdlib option 3: UTC internal representation ===
> 
> Having UTC as the internal representation makes the whole issue go away.
> Datetimes are no longer ambiguous, except when creating, so checks need to
> be done during creation, but that should be possible without datetime creation
> in this case, resolving the infinite recursion problem.
> 
> Benefits:
> - Problem solved.
> - Minimal API changes.
> Drawbacks:
> - Backwards compatibility with pickles.
> - Possible other backwards incompatibility problems.
> - Both DST offset and date time display representation must use slow timezone
>   calculations. However, since datetimes are immutable it can be a cached,
>   lazy, one-time operation.

If designing a library from scratch without any back-compat
considerations, this is probably the first approach I would try.
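
A very rough sketch of what I have in mind, assuming a made-up zone rule
with a single changeover. UtcBackedMoment and toy_offset() are
hypothetical names, not a proposed API, and the cached property stands in
for the "cached, lazy, one-time operation" mentioned above.

```python
from datetime import datetime, timedelta
from functools import cached_property

# Assumed UTC instant at which wall clocks step back by one hour.
CHANGEOVER_UTC = datetime(2015, 10, 25, 1, 0)


def toy_offset(utc_naive):
    """Hypothetical zone rule: +02:00 before the changeover, +01:00 after."""
    if utc_naive < CHANGEOVER_UTC:
        return timedelta(hours=2)
    return timedelta(hours=1)


class UtcBackedMoment:
    """Stores only the UTC instant; the wall time is derived on demand."""

    def __init__(self, utc_naive):
        self.utc = utc_naive

    @cached_property
    def wall_time(self):
        # Computed once on first access, then cached on the instance.
        return self.utc + toy_offset(self.utc)


first = UtcBackedMoment(datetime(2015, 10, 25, 0, 30))
second = UtcBackedMoment(datetime(2015, 10, 25, 1, 30))
```

Both objects display 02:30 as wall time, but the stored UTC values differ
by an hour, so comparison and arithmetic on the stored value are never
ambiguous. The ambiguity only has to be resolved when constructing from a
wall-clock time, which this sketch does not attempt.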

I would favor either solution 1 or 3.

Carl
