Re: [Datetime-SIG] how does PEP-495 help improve dateutil, pytz timezone packages?
Alexander Belopolsky
I am changing the subject line because neither of the PEPs mentioned in the original subject proposes any changes to the time module.
This list is about improving the datetime module, in particular PEP-495. I've changed the subject accordingly. As long as the datetime module uses the time module, the corresponding time module issues that cannot be worked around *are* datetime module issues. It seems blatantly obvious.

If you don't consider the tz_ database to be essential for writing code that works with timezones, you may stop reading now.

.. _tz: https://www.iana.org/time-zones/repository/tz-link.html

What is discussed here?
-----------------------

It seems there is a communication problem. Let's overcommunicate then :)

The time module issues are mentioned to explain why it is not reasonable to expect datetime.now(timezone.utc).astimezone() to work in the general case.

stdlib's astimezone() is mentioned to point out that it may fail while pytz works in the exact same case.

I want to demonstrate that pytz works in cases where stdlib and dateutil currently fail, to point out that *PEP-495 should either provide more support for the way pytz works or demonstrate how PEP-495 fixes design issues in stdlib and dateutil that make it difficult to enable better timezone support.*

Why PEP-495 -- Local Time Disambiguation should care about zoneinfo?
--------------------------------------------------------------------

History shows that the current datetime API is at least partially responsible for the fact that the only working solution (pytz) has a more complicated API: *it works but it might have been simpler and less error-prone.* The same could be said about stdlib, dateutil, and the timezone packages that are built on top of them, such as arrow and delorean. The difference is that they work in fewer cases (fail more).
Therefore, even if the explicit goal of PEP-495 is different from that of PEP-431, PEP-495 should avoid making life more difficult for zoneinfo packages; even more, it should consider *how it can help pytz, dateutil, or some other timezone package to provide a good tzdata-related API.* The last part is the reason I've mentioned cases where stdlib and dateutil fail in this thread.

What are possible good timezone API examples?
---------------------------------------------
From the _minimalistic_ category: times_ Python package -- utc/posix time internally, local time is used only for input or display (similar to the Unicode sandwich approach: Unicode internally, use bytes only if necessary to communicate with the outside world). No implicit timezone conversions. It is unfortunately no longer supported. It is implemented on top of arrow_ which (last time I checked) has the same issues as dateutil.

From the _kitchen-sink_ category: Time4J_ Java package -- a few composable primitives provide a powerful API. Notable feature: no temporal arithmetic or manipulations for ZonalDateTime_.

.. _times: https://github.com/nvie/times/
.. _arrow: http://arrow.readthedocs.org
.. _Time4J: https://github.com/MenoData/Time4J
.. _ZonalDateTime: http://www.time4j.net/tutorial/zdt.html

What are examples of timezone-related issues that PEP-495 could solve?
----------------------------------------------------------------------

- utc -> local timezone conversions in dateutil. I haven't looked at the source but Stuart Bishop_ says that the new flag may fix this and perhaps other issues caused by ambiguous times
- the datetime constructor might start working with pytz timezones. The general goal is to leave the pytz localize() method only for those people who need an exception for ambiguous or non-existent times.

The important part is that PEP-495 should not make it even more difficult to use the packages correctly. Ideally, PEP-495 should evolve together with the corresponding experimental implementations that adopt the new flag.

.. _Bishop: https://mail.python.org/pipermail/datetime-sig/2015-August/000466.html
On Tue, Aug 25, 2015 at 11:47 AM, Akira Li <4kir4.1i@gmail.com> wrote:
Alexander Belopolsky writes:
On Aug 25, 2015, at 7:44 AM, Akira Li <4kir4.1i@gmail.com> wrote:
note: the stdlib variant datetime.now(timezone.utc).astimezone() may fail if it uses time.timezone, time.tzname internally [3,4,5] when tm_gmtoff, tm_zone are not available on a given platform.
If this actually happens on any supported platform - please file a bug report. What we do in this case is not as simplistic as you describe.
Bug-driven development is probably not the best strategy for a datetime library ;) Tests can't catch all bugs. I've found out that astimezone() may fail by *reading* its source and trying to *understand* what it does.
I agree, but once you've read the code and seen any logical errors, you should be able to construct a test case demonstrating wrong behavior.
I did.
Here's the part from datetime.py [1] that computes the local timezone if tm_gmtoff or tm_zone are not available:
    # Compute UTC offset and compare with the value implied
    # by tm_isdst. If the values match, use the zone name
    # implied by tm_isdst.
    delta = local - datetime(*_time.gmtime(ts)[:6])
    dst = _time.daylight and localtm.tm_isdst > 0
    gmtoff = -(_time.altzone if dst else _time.timezone)
    if delta == timedelta(seconds=gmtoff):
        tz = timezone(delta, _time.tzname[dst])
    else:
        tz = timezone(delta)
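To make the consequence of this fallback concrete, here is a minimal sketch that re-implements the same comparison as a pure function, so that the process-wide values can be passed in explicitly instead of being read from the time module. The name `fallback_tz` and its parameters are hypothetical, and the Moscow numbers are illustrative assumptions (process-wide values that say UTC+3 while the instant in question had a UTC+4 offset). It is not the stdlib code itself.

```python
from datetime import datetime, timedelta, timezone


def fallback_tz(delta, tm_isdst, *, std_west, alt_west, daylight, tznames):
    """Hypothetical stand-in for the quoted fallback branch.

    delta     -- actual UTC offset at the instant (local - UTC), a timedelta
    tm_isdst  -- DST flag reported for that instant
    std_west, alt_west, daylight, tznames -- stand-ins for time.timezone,
        time.altzone, time.daylight and time.tzname (seconds *west* of UTC)
    """
    dst = bool(daylight and tm_isdst > 0)
    gmtoff = -(alt_west if dst else std_west)
    if delta == timedelta(seconds=gmtoff):
        # Process-wide values agree with the real offset: keep the name.
        return timezone(delta, tznames[dst])
    # Values disagree: only the numeric offset can be trusted.
    return timezone(delta)


utc_dt = datetime(2013, 10, 28, tzinfo=timezone.utc)

# Assumed stale values: process-wide offset says UTC+3, instant had UTC+4.
stale = fallback_tz(timedelta(hours=4), 0, std_west=-10800,
                    alt_west=-10800, daylight=0, tznames=("MSK", "MSK"))
# Assumed matching values: process-wide offset also says UTC+4.
fresh = fallback_tz(timedelta(hours=4), 0, std_west=-14400,
                    alt_west=-14400, daylight=0, tznames=("MSK", "MSK"))

print(utc_dt.astimezone(stale).strftime("%Z%z"))  # UTC+04:00+0400
print(utc_dt.astimezone(fresh).strftime("%Z%z"))  # MSK+0400
```

When the supplied process-wide offset disagrees with the offset actually in effect, the zone name is dropped and %Z falls back to the generic 'UTC+04:00' form, which is the difference the thread is arguing about.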
Here's its C equivalent [2].
Python issues that I've linked in the previous message [3,4,5] demonstrate that time.timezone and time.tzname may have wrong values and therefore the result *tz* may have a wrong tzname.
To summarize for those who will not follow the links: [3] Is a closed "No obvious and correct way to get the time zone offset" issue. It was superseded by http://bugs.python.org/issue9527 which in turn was closed by implementing the argument-less .astimezone() method. [4] and [5] are time module issues.
Look at the code example immediately above the text you are commenting on. Look at _time.tzname, _time.timezone there. It is the code from datetime.astimezone() method. If timezone, tzname may be wrong then astimezone() may also fail. The example below demonstrates the failure. The issues that I've linked demonstrate specific cases when timezone, tzname are wrong. The status of the issues is irrelevant (timezone, tzname behavior hasn't changed).
Here's an example inspired by "incorrect time.timezone value" Python issue [4]:
    >>> from datetime import datetime, timezone
    >>> from email.utils import parsedate_to_datetime
    >>> import tzlocal # to get local timezone as pytz timezone
    >>> d = parsedate_to_datetime("Tue, 28 Oct 2013 14:27:54 +0000")
    >>> # expected (TZ=Europe/Moscow)
    ... d.astimezone(tzlocal.get_localzone()).strftime('%Z%z')
    'MSK+0400'
    >>> # got
    ... d.astimezone().strftime('%Z%z')
    'UTC+04:00+0400'
I don't understand why you keep presenting a mix of pytz, email.utils and something called "tzlocal" and then claim that the unexpected behavior indicates a problem in the datetime module? It could as well be in any of the three other modules that you use or in the way you combine them.
*"something called "tzlocal""*
import tzlocal # to get local timezone as pytz timezone
Really, neither the comment ^^^ nor the code example d.astimezone(tzlocal.get_localzone()) itself told you anything? :)

The purpose is to demonstrate that pytz works without relying on the tm_gmtoff, tm_zone attributes while at the same time astimezone() fails here. Your own code below produces MSK+0400, which implies that you do know that it is the correct answer, even if it weren't obvious just from looking at the result strings. I don't understand how you could even suggest that MSK+0400 is wrong and UTC+04:00+0400 is the correct behavior here.

Here's a distilled example:
    from datetime import datetime, timezone
    datetime(2013, 10, 28, tzinfo=timezone.utc).astimezone().strftime('%Z%z')
If you *disable the tm_gmtoff attribute* then it produces UTC+04:00+0400. That differs from the expected output MSK+0400, which the same code produces if you enable the attribute. Notice (direct quote): "if tm_gmtoff or tm_zone are not available" above.
If you want to parse the string "Tue, 28 Oct 2013 14:27:54 +0000" and convert it to Moscow time, here is how you do it using the datetime module:
    >>> import os; os.environ['TZ'] = 'Europe/Moscow'
    >>> from datetime import datetime
    >>> d = datetime.strptime("Tue, 28 Oct 2013 14:27:54 +0000", "%a, %d %b %Y %H:%M:%S %z")
    >>> d.astimezone().strftime("%F %T %Z%z")
    '2013-10-28 18:27:54 MSK+0400'
Does this code behave differently on your system? If it does - please file a bug report.
My mistake, I should have made it even more clear that the example illustrates the results of the code from stdlib immediately above it and therefore the tm_gmtoff, tm_zone access is disabled. Try your code making sure that tm_zone is not used.
'UTC+04:00' instead of 'MSK' is not a major issue. I don't consider it a bug because without access to the tz database stdlib can't do much better; there will always be cases where it breaks.
It is quite possible that such cases exist, but you have not demonstrated one.
I just use pytz instead which does provide access to the tz database.
This will always be your option as it is your option to use just the datetime module. In both cases you can write correct code if you follow the reference manual or buggy code if you don't. An almost sure way to write buggy code is to use one library manual to write code using another.
No. You can't write correct code that works with timezones using only stdlib; e.g., see %Z support: http://bugs.python.org/issue22377
    >>> from datetime import datetime
    >>> datetime.strptime('2016-12-04 08:00:00 EST', '%Y-%m-%d %H:%M:%S %Z')
    Traceback (most recent call last):
    ...
    ValueError: ...
dateutil allows you to disambiguate the timezone abbreviation and returns an aware datetime in this case:
    >>> import dateutil.parser
    >>> dateutil.parser.parse('2016-12-04 08:00:00 EST', tzinfos={'EST':-18000})
    datetime.datetime(2016, 12, 4, 8, 0, tzinfo=tzoffset('EST', -18000))
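For comparison, the same idea (the caller supplies the meaning of an abbreviation, because the abbreviation alone is ambiguous) can be sketched by hand with stdlib pieces. This is only an illustration under stated assumptions: the helper name `parse_with_abbreviation` and the `TZINFOS` mapping are hypothetical, the input is assumed to end with a single space-separated abbreviation, and it is not how dateutil is implemented.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical caller-supplied mapping: abbreviation -> UTC offset in seconds.
# The caller must decide what "EST" means; the string itself does not say.
TZINFOS = {"EST": -18000}


def parse_with_abbreviation(text, tzinfos):
    """Parse 'YYYY-MM-DD HH:MM:SS ABBR' into an aware datetime.

    The abbreviation is split off and looked up in `tzinfos` instead of
    being handed to strptime's %Z directive.
    """
    stamp, _, abbr = text.rpartition(" ")
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    fixed = timezone(timedelta(seconds=tzinfos[abbr]), abbr)
    return naive.replace(tzinfo=fixed)


dt = parse_with_abbreviation("2016-12-04 08:00:00 EST", TZINFOS)
print(dt.isoformat(), dt.tzname())
```

The result carries only a fixed offset, so it knows nothing about DST transitions or historical changes; that still requires the tz database.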
---

'UTC+04:00+0400' is not a bug, just as it is not a bug that an 8-bit Windows codepage can't encode all Unicode characters -- it can't and you don't expect it to -- you just use an encoding such as utf-8 that does support the whole Unicode range. I don't expect datetime code that uses time.timezone, time.tzname internally (as the excerpt from datetime.py above demonstrates) to do timezone conversions without issues.

Again, the purpose of the example is to demonstrate the *fundamental* deficiency in the datetime module that can't be fixed without access to the tz database (tm_gmtoff is a way to get such access for a local timezone).
[1] https://github.com/python/cpython/blob/fced0e12fc510e4a6158628695774ccfd0239...
[2] https://github.com/python/cpython/blob/fced0e12fc510e4a6158628695774ccfd0239...
[3] http://bugs.python.org/issue1647654
[4] http://bugs.python.org/issue22752
[5] http://bugs.python.org/issue22798
On Tue, Aug 25, 2015 at 7:19 PM, Akira Li <4kir4.1i@gmail.com> wrote:
Here's a distilled example:
    from datetime import datetime, timezone
    datetime(2013, 10, 28, tzinfo=timezone.utc).astimezone().strftime('%Z%z')
If you *disable the tm_gmtoff attribute* then it produces UTC+04:00+0400. That differs from the expected output MSK+0400, which the same code produces if you enable the attribute. Notice (direct quote): "if tm_gmtoff or tm_zone are not available" above.
Of course! That's why we exposed the tm_gmtoff attribute in time.struct_time on *all platforms*, IIRC. It's been a long time, but I recall that we went to some great lengths to emulate tm_gmtoff by comparing the results of localtime calls to those of gmtime. Could it be that we missed some corner cases? Sure. But your "if tm_gmtoff or tm_zone are not available" sounds like complaining that after
del datetime.timezone
the datetime module does not support even the UTC timezone!
participants (2): Akira Li, Alexander Belopolsky