Should `strptime`, when passed a %Z format specifier, parse more than just GMT, UTC, and the local time zone given that we now have the IANA database in the stdlib via PEP 615 since Python 3.9?
The regex for the %Z specifier appears to come from TimeRE and always has GMT, UTC, and then the local time zone.
    >>> from _strptime import TimeRE
    >>> t = TimeRE()
    >>> t['Z']
    '(?P<Z>cst|gmt|utc|cdt)'
    >>> from datetime import datetime
    >>> datetime.strptime('2016-12-04 08:00:00 CST', '%Y-%m-%d %H:%M:%S %Z')
    datetime.datetime(2016, 12, 4, 8, 0)
    >>> datetime.strptime('2016-12-04 08:00:00 CDT', '%Y-%m-%d %H:%M:%S %Z')
    datetime.datetime(2016, 12, 4, 8, 0)
    >>> datetime.strptime('2016-12-04 08:00:00 UTC', '%Y-%m-%d %H:%M:%S %Z')
    datetime.datetime(2016, 12, 4, 8, 0)
    >>> datetime.strptime('2016-12-04 08:00:00 GMT', '%Y-%m-%d %H:%M:%S %Z')
    datetime.datetime(2016, 12, 4, 8, 0)
    >>> datetime.strptime('2016-12-04 08:00:00 EST', '%Y-%m-%d %H:%M:%S %Z')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/home/eugene/src/cpython/Lib/_strptime.py", line 568, in _strptime_datetime
        tt, fraction, gmtoff_fraction = _strptime(data_string, format)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/home/eugene/src/cpython/Lib/_strptime.py", line 349, in _strptime
        raise ValueError("time data %r does not match format %r" %
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ValueError: time data '2016-12-04 08:00:00 EST' does not match format '%Y-%m-%d %H:%M:%S %Z'
There have been some discussions and issues with regard to this in the past:
- https://github.com/python/cpython/issues/66571
- https://github.com/python/cpython/issues/66616
I am interested in taking a look at this, but I wanted to reach out first to find out whether it is something that would be desirable and what others' thoughts are.
Thanks, Eugene
On Sun, 24 Apr 2022 at 13:41, Eugene Triguba eugenetriguba@gmail.com wrote:
> Should `strptime`, when passed a %Z format specifier, parse more than just GMT, UTC, and the local time zone given that we now have the IANA database in the stdlib via PEP 615 since Python 3.9?
> The regex for the %Z specifier appears to come from TimeRE and always has GMT, UTC, and then the local time zone.
>     >>> from _strptime import TimeRE
>     >>> t = TimeRE()
>     >>> t['Z']
>     '(?P<Z>cst|gmt|utc|cdt)'
For any timezone at all? That'd be ambiguous for BST, CST, and a bunch of others. Timezone abbreviations aren't unique.
For a select set? Maybe, but then there'd need to be a way to choose which ones you want to recognize, which might still end up being too small and/or too large.
ChrisA
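The "select set" idea can be sketched with a table that the caller supplies, so the caller decides what an ambiguous name such as CST means. This is only a minimal illustration, not an existing or proposed API: `parse_with_abbreviations` and `US_ABBREVIATIONS` are hypothetical names, the abbreviation is assumed to be the last space-separated token, and the fixed offsets ignore DST transitions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical, caller-chosen table: this caller reads "CST" as US Central.
US_ABBREVIATIONS = {
    "EST": timezone(timedelta(hours=-5), "EST"),
    "CST": timezone(timedelta(hours=-6), "CST"),
    "UTC": timezone.utc,
    "GMT": timezone.utc,
}


def parse_with_abbreviations(text, fmt, abbreviations):
    """Parse text whose last space-separated token is a zone abbreviation.

    fmt describes everything before the abbreviation and must not contain %Z.
    """
    stamp, _, name = text.rpartition(" ")
    try:
        tz = abbreviations[name.upper()]
    except KeyError:
        raise ValueError(f"unrecognized time zone abbreviation {name!r}") from None
    # Attach the caller's chosen zone to the naive result of strptime.
    return datetime.strptime(stamp, fmt).replace(tzinfo=tz)


parsed = parse_with_abbreviations(
    "2016-12-04 08:00:00 EST", "%Y-%m-%d %H:%M:%S", US_ABBREVIATIONS
)
print(parsed.isoformat())  # 2016-12-04T08:00:00-05:00
```

A caller in China could pass a different table that maps CST to UTC+8, which is exactly why no single built-in table can be right for everyone.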
On Saturday, April 23, 2022, Chris Angelico rosuav@gmail.com wrote:
> For any timezone at all? That'd be ambiguous for BST, CST, and a bunch of others. Timezone abbreviations aren't unique.
> For a select set? Maybe, but then there'd need to be a way to choose which ones you want to recognize, which might still end up being too small and/or too large.
Thanks, Chris. I hadn’t realized that they aren’t unique.
In that case, would it make sense to close this issue? It seems the work of clarifying the docs has already been done: https://github.com/python/cpython/issues/66571
And I would imagine the scope of this issue (https://github.com/python/cpython/issues/72751) would then be limited to making UTC, GMT, and the local time zone parsed by %Z produce tz-aware datetimes instead of the current naive ones?
Would it be worthwhile to look into whether we can improve some of these strptime error messages? For instance, when parsing with %Z, the function currently raises a ValueError saying the string doesn’t match the given format whenever the time zone isn’t UTC, GMT, or the local time zone. I would expect it to be more useful to know whether the abbreviation is completely invalid, or valid but not one of UTC, GMT, or the local time zone.
I know there have been some conversations and work on improving the error message for the day/month: https://github.com/python/cpython/issues/69117
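The error-message distinction asked about above could look roughly like the sketch below. It is not how `_strptime` reports errors today; `describe_zone_token` and `accepted_by_percent_z` are hypothetical helpers, the set of known abbreviations has to be supplied by the caller, and the accepted set is assumed to be UTC, GMT, and the names in `time.tzname`, as described earlier in the thread.

```python
import time


def accepted_by_percent_z():
    """Names %Z is described as accepting: UTC, GMT, and the local zone names."""
    return {"utc", "gmt", *(name.lower() for name in time.tzname)}


def describe_zone_token(name, known_abbreviations, accepted=None):
    """Return a more specific message than 'does not match format'.

    known_abbreviations is a caller-supplied set of lowercase abbreviations.
    """
    accepted = accepted_by_percent_z() if accepted is None else accepted
    token = name.lower()
    if token in accepted:
        return f"{name!r} is accepted by %Z"
    if token in known_abbreviations:
        return (
            f"{name!r} is a known abbreviation, but %Z only "
            "accepts UTC, GMT, and the local time zone"
        )
    return f"{name!r} is not a recognized time zone abbreviation"


# Mirror the session from the first message, where the local zone is CST/CDT.
local = {"utc", "gmt", "cst", "cdt"}
known = {"est", "edt", "cst", "cdt", "utc", "gmt"}
for candidate in ("CST", "EST", "XYZ"):
    print(describe_zone_token(candidate, known, accepted=local))
```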
The short names for time zones are not globally unique. The most you can expect from them is for the short names of the local time zones to be different for DST and non-DST and distinct from UTC/GMT.
I’m not even sure if they are guaranteed to be different from the US time zone names, which are often hard-wired for legacy support (e.g. in email/_parseaddr.py for RFC 2822 timestamps).
_______________________________________________
Datetime-SIG mailing list -- datetime-sig@python.org
To unsubscribe send an email to datetime-sig-leave@python.org
https://mail.python.org/mailman3/lists/datetime-sig.python.org/
The PSF Code of Conduct applies to this mailing list: https://www.python.org/psf/codeofconduct/
On Sun, 24 Apr 2022 at 22:03, Oren Tirosh orent@hishome.net wrote:
> The short names for time zones are not globally unique. The most you can expect from them is for the short names of the local time zones to be different for DST and non-DST and distinct from UTC/GMT.
> I’m not even sure if they are guaranteed to be different from the US time zone names which are often hard-wired for legacy support (e.g. in email/_parseaddr.py for rfc2822 timestamps).
They're not. China Standard Time (Asia/Shanghai, I think?) conflicts with Central Standard Time (America/Chicago).
ChrisA
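The collision can be checked against the IANA data with `zoneinfo`. The sketch below is an illustration only: `offsets_by_abbreviation` is a hypothetical helper, the three zone keys are picked by hand, and it assumes time zone data is available (from the system or the `tzdata` package); keys that cannot be loaded are skipped.

```python
from collections import defaultdict
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def offsets_by_abbreviation(keys, when):
    """Map each abbreviation in use at `when` to the set of UTC offsets it denotes."""
    seen = defaultdict(set)
    for key in keys:
        try:
            aware = when.replace(tzinfo=ZoneInfo(key))
        except ZoneInfoNotFoundError:
            continue  # no time zone data for this key on this system
        seen[aware.tzname()].add(aware.utcoffset())
    return dict(seen)


winter = datetime(2016, 12, 4, 8, 0)
zones = ["America/Chicago", "Asia/Shanghai", "America/Havana"]
for name, offsets in offsets_by_abbreviation(zones, winter).items():
    hours = sorted(offset.total_seconds() / 3600 for offset in offsets)
    print(name, hours)
```

With the data installed, all three zones report the same abbreviation for that date while sitting at three different offsets from UTC.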
I've written up some thoughts on this here: https://github.com/python/cpython/issues/66571#issuecomment-1107838638
TL;DR: There's probably something more we can do before we close out #66571, but I doubt there will ever be a great way to make this work the way people think it intuitively should (mostly because people intuitively think that the 3-letter abbreviations 1. exist for all zones and 2. are globally unique).
Generally speaking, the kind of thing output by %Z is an attractive nuisance and probably shouldn't ever be used — though I'm sure it will continue to be used for decades to come anyway.
Best, Paul
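For comparison, a numeric UTC offset round-trips unambiguously through `%z`, while a name parsed with `%Z` is matched and then dropped. The sketch below is not taken from Paul's write-up; it assumes a fixed-offset `timezone` object for illustration and the naive-result behavior described earlier in this thread.

```python
from datetime import datetime, timedelta, timezone

# A fixed-offset zone that happens to be labeled "CST" (UTC-6).
central = timezone(timedelta(hours=-6), "CST")
original = datetime(2016, 12, 4, 8, 0, tzinfo=central)

# %z writes and reads the numeric offset, so the result is tz-aware.
fmt_offset = "%Y-%m-%d %H:%M:%S %z"
text = original.strftime(fmt_offset)
print(text)  # 2016-12-04 08:00:00 -0600
restored = datetime.strptime(text, fmt_offset)
assert restored == original
assert restored.utcoffset() == timedelta(hours=-6)

# %Z accepts "UTC", but the parsed datetime carries no tzinfo.
fmt_name = "%Y-%m-%d %H:%M:%S %Z"
naive = datetime.strptime("2016-12-04 08:00:00 UTC", fmt_name)
assert naive.tzinfo is None
```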
On Sun, Apr 24, 2022 at 8:16 AM Paul Ganssle python@ml.ganssle.io wrote:
> I've written up some thoughts on this here: https://github.com/python/cpython/issues/66571#issuecomment-1107838638
Thanks for writing that up, Paul. I replied to you there. I assume the discussion can move to that ticket.