Re: [Datetime-SIG] Matching +-HH:MM in strptime
I think that this is a case of the perfect being the enemy of the good. Just because we're touching strptime does not mean we need to make it perfect in one go. Whether we want to add features that bring strptime closer to a domain-specific language for parsing dates is a separate discussion; I think we should start by focusing on parsing HH:MM and discuss other extensions later.

With regards to "once we have these extensions, isoparse becomes a one-liner", I don't think this needs to be a goal at all. strptime does not need to be designed such that implementing isoparse is trivial, it just needs to be designed such that isoparse is *possible*. Consider this implementation of isoparse:

    import datetime

    def isoparse(dt_str, sep='T'):
        base_fmt = "%Y-%m-%d"
        len_str = len(dt_str)

        if len_str > 10:
            base_fmt += sep

        if len_str == 10:
            tail = ''                     # date only
        elif len_str == 13:
            tail = '%H'                   # hours, no tzinfo
        elif len_str == 16:
            tail = '%H:%M'                # minutes, no tzinfo
        elif len_str == 19:
            if dt_str[-6] in '-+':
                tail = '%H%:z'            # hours, with tzinfo
            else:
                tail = '%H:%M:%S'         # seconds, no tzinfo
        elif len_str == 22:
            tail = '%H:%M%:z'             # minutes, with tzinfo
        elif len_str in {23, 26}:
            tail = '%H:%M:%S.%f'          # milliseconds/microseconds, no tzinfo
        elif len_str == 25:
            tail = '%H:%M:%S%:z'          # seconds, with tzinfo
        elif len_str in {29, 32}:
            tail = '%H:%M:%S.%f%:z'       # milliseconds/microseconds, with tzinfo
        else:
            raise ValueError('Invalid isoformat string')

        return datetime.datetime.strptime(dt_str, base_fmt + tail)

In C this could be implemented pretty efficiently as a switch statement, and it covers all possible outputs of isoformat (there's also a way to do it such that `sep` is automatically detected, but this is stricter). The only thing actually missing is an `strptime` that can accept '%:z' (or an equivalent of the GNU version of '%z').
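As a sanity check on the length-based dispatch in isoparse above, here is a small sketch that tabulates the lengths isoformat() actually produces for each `timespec`. The sample values and the `lengths` name are mine, chosen for illustration, and it assumes a fixed offset in whole minutes:

```python
import datetime

# Illustrative sample values (not from the thread): one naive and one aware datetime.
tz = datetime.timezone(datetime.timedelta(hours=5, minutes=30))
naive = datetime.datetime(2017, 10, 21, 10, 20, 30, 123456)
aware = naive.replace(tzinfo=tz)

specs = ["hours", "minutes", "seconds", "milliseconds", "microseconds"]

# Map each timespec to (length without offset, length with a +HH:MM offset).
lengths = {
    spec: (len(naive.isoformat(timespec=spec)), len(aware.isoformat(timespec=spec)))
    for spec in specs
}

# Length 19 is the one ambiguous case; the character at index -6 tells them apart.
hours_with_offset = aware.isoformat(timespec="hours")
seconds_no_offset = naive.isoformat(timespec="seconds")
```

Both strings at the end are 19 characters long, which is why the len_str == 19 branch has to look at dt_str[-6].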
The fact that it's not a one-liner is immaterial, since it's going into the standard library, so then parsing the results of `isoformat` becomes the one-liner `datetime.isoparse(dt_str)`.

Here is a working proof-of-concept with some basic tests: https://gist.github.com/pganssle/930756cc93f7d888ab63363eb33d5fe5

On 10/21/2017 10:20 AM, Oren Tirosh wrote:
On Sat, 21 Oct 2017 at 16:08, Mario Corchero <mariocj89@gmail.com> wrote:
Sorry, hit send by mistake on the previous message.
That is fine for parsing, but my issue with this is symmetry with strftime.
I can agree with having a %:z for support in strftime but I think that is a separate change. The issue I opened with the attached PR focused only on strptime to facilitate the discussion.
Yes, strftime is a separate issue, but still relevant as a design concern for any new changes to strptime.
My revised proposal is this:
Add "%:z" with the following semantics:
1. Requires ":" separator
2. Officially matches the empty string, producing a naive datetime (tzinfo=None)
3. [maybe] officially matches "Z", equivalent to "+00:00"

For "%z", retain the existing semantics, with one extension:
1. Does not require ":" (but silently accepts it)
2. Does not match the empty string
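To make the proposed "%:z" rules concrete, here is a rough sketch of the matching logic as a standalone function. `match_colon_z` and its `accept_z` flag are hypothetical names for illustration only; this is not how strptime is implemented, just the three rules above written out:

```python
import datetime
import re

# Hypothetical helper: emulates the proposed "%:z" matching rules outside strptime.
_COLON_OFFSET = re.compile(r"([+-])(\d{2}):([0-5]\d)")


def match_colon_z(text, accept_z=True):
    """Return a tzinfo for `text`, or None when `text` is empty (naive result)."""
    if text == "":
        return None                      # rule 2: empty string means naive
    if accept_z and text == "Z":
        return datetime.timezone.utc     # rule 3 (the "maybe" one): "Z" is +00:00
    m = _COLON_OFFSET.fullmatch(text)
    if m is None:
        # rule 1: the ":" separator is required, so "+0530" is rejected here
        raise ValueError("offset must look like +HH:MM, got %r" % text)
    sign = -1 if m.group(1) == "-" else 1
    delta = datetime.timedelta(hours=int(m.group(2)), minutes=int(m.group(3)))
    return datetime.timezone(sign * delta)
```

Under this sketch the caller decides what to do with a None result, which is the point made further down about rejecting naive timestamps after parsing rather than in the format string.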
Here's why:
[snip] Oren:
You can say that the real source of the asymmetry here is not with my proposal but rather in the underlying strftime/strptime: on formatting, %z yields an empty string for a naive timestamp rather than producing an error. But on parsing, it refuses to parse a timestamp with no offset. A truly symmetric implementation would have accepted it as a naive timestamp.
Too late for %z because it must remain backward compatible, but perhaps %:z can be made to accept a missing offset as a naive timestamp. The user can then check for naive timestamps and reject them if they are unacceptable in that context, rather than specifying whether a missing offset is acceptable or not in the format string. I have no problem with either solution.
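The asymmetry described above is easy to reproduce with the existing %z. A minimal sketch (the sample value and variable names are mine):

```python
import datetime

naive = datetime.datetime(2017, 10, 21, 10, 20)
fmt = "%Y-%m-%d %H:%M%z"

# Formatting: %z contributes an empty string for a naive datetime.
formatted = naive.strftime(fmt)

# Parsing: the same format refuses the string it just produced,
# because %z does not match a missing offset.
try:
    datetime.datetime.strptime(formatted, fmt)
    round_trips = True
except ValueError:
    round_trips = False
```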
[snip]
A separate proposal:
Add "%.f" with the following semantics:
1. Officially matches the empty string, producing a timestamp with 0 fraction.
2. Otherwise equivalent to ".%f"
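For comparison, this is roughly what a caller has to do today to get the "%.f" behaviour: pick the format based on whether a fraction is present. `parse_with_optional_fraction` is a hypothetical helper for illustration, and it assumes "." can only appear as the fraction separator in the input:

```python
import datetime


def parse_with_optional_fraction(text, base_fmt="%Y-%m-%d %H:%M:%S"):
    """Emulate a trailing "%.f": a missing fraction parses as 0 microseconds."""
    # Assumption: a "." in the input can only be the fraction separator.
    fmt = base_fmt + (".%f" if "." in text else "")
    return datetime.datetime.strptime(text, fmt)


whole = parse_with_optional_fraction("2017-10-21 10:20:30")
micro = parse_with_optional_fraction("2017-10-21 10:20:30.000123")
```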
Retracting proposal for "%?t" for now.
With these two extensions, an strptime format can be written that can parse and losslessly round-trip the output of datetime.__str__ (that is, isoformat() with a space separator) for all possible datetime values, naive or aware, except those using custom tzinfo.
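A sketch of what that round trip looks like when emulated with the directives that exist now. `parse_str_output` is a hypothetical stand-in for a single strptime call with a format along the lines of "%Y-%m-%d %H:%M:%S%.f%:z". It assumes fixed offsets in whole minutes and an interpreter whose %z accepts the ":" separator (CPython 3.7 and later, as far as I know):

```python
import datetime


def parse_str_output(text):
    """Parse the output of str(datetime) by choosing the format from its shape."""
    fmt = "%Y-%m-%d %H:%M:%S"
    if "." in text:
        fmt += ".%f"                 # fraction present
    if text[-6] in "+-":
        fmt += "%z"                  # assumes %z accepts "+HH:MM" on this interpreter
    return datetime.datetime.strptime(text, fmt)


tz = datetime.timezone(datetime.timedelta(hours=-8))
samples = [
    datetime.datetime(2017, 10, 21, 10, 20, 30),                   # naive, whole second
    datetime.datetime(2017, 10, 21, 10, 20, 30, 123),              # naive, with fraction
    datetime.datetime(2017, 10, 21, 10, 20, 30, tzinfo=tz),        # aware, whole second
    datetime.datetime(2017, 10, 21, 10, 20, 30, 123, tzinfo=tz),   # aware, with fraction
]
results = [parse_str_output(str(dt)) for dt in samples]
```

With the two proposed directives, the shape detection inside the function would move into the format string itself.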
While not part of the proposal, these two extensions may also be naturally applied to strftime so that the same format string used for parsing will also produce an output identical to isoformat(), including naive timestamps and whole second timestamps.
participants (1): Paul G