[Datetime-SIG] Matching +-HH:MM in strptime

Sat Oct 21 12:01:51 EDT 2017

I think that this is a case of the perfect being the enemy of the good. Just because we're trying to touch strptime does not mean we need to make it perfect in one go. Whether we want to add features that make strptime closer to a domain-specific language for parsing dates is a separate discussion. I think we should start by focusing on parsing HH:MM offsets and discuss other extensions later.

With regard to "once we have these extensions, isoparse becomes a one-liner", I don't think this needs to be a goal at all. strptime does not need to be designed such that implementing isoparse is trivial; it just needs to be designed such that isoparse is *possible*. Consider this implementation of isoparse:

    import datetime

    def isoparse(dt_str, sep='T'):
        base_fmt = "%Y-%m-%d"

        len_str = len(dt_str)
        # Anything longer than a bare YYYY-MM-DD date has a time part after `sep`
        if len_str > 10:
            base_fmt += sep

        if len_str == 10:
            tail = ''
        elif len_str == 13:
            tail = '%H'                     # hours, no tzinfo
        elif len_str == 16:
            tail = '%H:%M'                  # minutes, no tzinfo
        elif len_str == 19:
            if dt_str[-6] in '-+':          # YYYY-MM-DDTHH+HH:MM is also 19 chars
                tail = '%H%:z'              # hours, with tzinfo
            else:
                tail = '%H:%M:%S'           # seconds, no tzinfo
        elif len_str == 22:
            tail = '%H:%M%:z'               # minutes, with tzinfo
        elif len_str in {23, 26}:
            tail = '%H:%M:%S.%f'            # milliseconds/microseconds, no tzinfo
        elif len_str == 25:
            tail = '%H:%M:%S%:z'            # seconds, with tzinfo
        elif len_str in {29, 32}:
            tail = '%H:%M:%S.%f%:z'         # milliseconds/microseconds, with tzinfo
        else:
            raise ValueError('Invalid isoformat string')

        return datetime.datetime.strptime(dt_str, base_fmt + tail)

In C this could be implemented pretty efficiently as a switch statement. It covers all possible outputs of isoformat (there's also a way to do it such that `sep` is automatically detected, but this version is stricter), and the only thing actually missing is an `strptime` that can accept a '%:z' directive (or an equivalent of the GNU version of '%z'). The fact that it's not a one-liner is immaterial: since it's going into the standard library, parsing the output of `isoformat` becomes the one-liner `datetime.isoparse(dt_str)`.
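
Until strptime grows a '%:z', the missing piece can be approximated on the Python side. Below is a minimal sketch, assuming (as holds for every format the isoparse function above builds) that '%:z' only ever appears at the very end of the format, so the offset is always the last six characters. `strptime_colon_z` is a hypothetical name, not a proposed API; it just rewrites `+HH:MM` to `+HHMM` and hands off to the existing '%z'.

```python
import datetime

def strptime_colon_z(dt_str, fmt):
    """Hypothetical shim: emulate a trailing '%:z' using the existing '%z'.

    Assumes '%:z' only appears at the very end of fmt, so the offset is
    the last six characters of dt_str (+HH:MM or -HH:MM).
    """
    if fmt.endswith('%:z'):
        offset = dt_str[-6:]
        if len(offset) != 6 or offset[0] not in '+-' or offset[3] != ':':
            raise ValueError('expected a +HH:MM offset at the end of %r' % dt_str)
        # '+05:30' -> '+0530', the colon-less form that '%z' parses
        dt_str = dt_str[:-6] + offset[:3] + offset[4:]
        fmt = fmt[:-3] + '%z'
    return datetime.datetime.strptime(dt_str, fmt)
```

Calling this instead of `datetime.datetime.strptime` on the last line of isoparse should let the sketch run without interpreter changes, at the cost of an extra string copy that a native '%:z' would avoid.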

Here is a working proof-of-concept with some basic tests: https://gist.github.com/pganssle/930756cc93f7d888ab63363eb33d5fe5

On 10/21/2017 10:20 AM, Oren Tirosh wrote:
> On Sat, 21 Oct 2017 at 16:08, Mario Corchero <mariocj89 at gmail.com> wrote:
>
>> Sorry, hit send by mistake on the previous message.
>>
>>
>> That is fine for parsing, but my issue with this is symmetry with strftime.
>>
>>
>> I can agree with having a %:z for support in strftime but I think that is
>> a separate change. The issue I opened with the attached PR focused only on
>> strptime to facilitate the discussion.
>>
>
> Yes, strftime is a separate issue, but still relevant as a design concern
> for any new changes to strptime.
>
>>
> My revised proposal is this:
>
> Add "%:z" with the following semantics:
> 1. Requires ":" separator
> 2. Officially matches the empty string, producing a naive datetime
> (tzinfo=None)
> 3. [maybe] officially matches "Z", equivalent to "+00:00"
>
> For "%z", retain the existing semantics, with one extension:
> 1. Does not require ":" (but silently accepts it)
> 2. Does not match the empty string
>
> Here's why:
>
> [snip] Oren:
>>>>
>>>
>>>> You can say that the real source of the asymmetry here is not with my
>>>> proposal but rather in the underlying strftime/strptime: on formatting, %z
>>>> yields an empty string for a naive timestamp rather that producing an
>>>> error. But on parsing, it refuses to parse a timestamp with no offset. A
>>>> truly symmetric implementation would have accepted it as a naive timestamp.
>>>>
>>>
>>>> Too late for %z because it must remain backward compatible, but perhaps
>>>> %:z can be made to accept a missing offset as a naive timestamp. The user
>>>> can then check for naive timestamp and reject them if they are unacceptable
>>>> in that context, rather than specifying whether a missing timestamp is
>>>> acceptable or not in the format string. I have no problem with either
>>>> solution.
>>>>
>>> [snip]
>>>>
>>>
> A separate proposal:
>
> Add "%.f" with the following semantics:
> 1. Officially matches the empty string, producing a timestamp with a zero fraction.
> 2. Otherwise equivalent to ".%f"
>
> Retracting proposal for "%?t" for now.
>
> With these two extensions, an strptime format can be written that can parse
> and losslessly round-trip the output of datetime.__str__, or of isoformat()
> with a space separator, for all possible datetime values, naive or
> aware, except those using custom tzinfo.
>
> While not part of the proposal, these two extensions may also be naturally
> applied to strftime so that the same format string used for parsing will
> also produce an output identical to isoformat(), including naive timestamps
> and whole second timestamps.
>
>
>
> _______________________________________________
> Datetime-SIG mailing list
> Datetime-SIG at python.org
> https://mail.python.org/mailman/listinfo/datetime-sig
> The PSF Code of Conduct applies to this mailing list: https://www.python.org/psf/codeofconduct/
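
To make Oren's proposed semantics concrete, here is a regex-based emulation of a format along the lines of "%Y-%m-%d %H:%M:%S%.f%:z" as he describes it: '%.f' is an optional fraction (1 to 6 digits, padded on the right like '%f'), and '%:z' is an optional colon-separated offset, with 'Z' accepted as +00:00 (the "[maybe]" item) and an empty offset producing a naive datetime. Neither directive exists in strptime; `parse_str_output` is a hypothetical helper, and it assumes whole-minute UTC offsets.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical emulation of the proposed directives, not a real strptime feature.
_STR_RE = re.compile(
    r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})'
    r'(?:\.(\d{1,6}))?'                      # proposed '%.f': optional fraction
    r'(?:(Z)|([+-])(\d{2}):([0-5]\d))?'      # proposed '%:z': optional offset
)

def parse_str_output(s):
    m = _STR_RE.fullmatch(s)
    if m is None:
        raise ValueError('not in datetime.__str__ format: %r' % s)
    year, month, day, hour, minute, second, frac, zulu, sign, oh, om = m.groups()
    tz = None                                # empty offset -> naive datetime
    if zulu:
        tz = timezone.utc                    # 'Z' treated as +00:00
    elif sign:
        delta = timedelta(hours=int(oh), minutes=int(om))
        tz = timezone(-delta if sign == '-' else delta)
    # Missing fraction -> 0; short fractions are right-padded like '%f'
    micro = int((frac or '').ljust(6, '0'))
    return datetime(int(year), int(month), int(day), int(hour),
                    int(minute), int(second), micro, tzinfo=tz)
```

Because both pieces are optional, one pattern covers naive and aware values with or without microseconds, which is the round-trip property described above; a caller that must reject naive input would check the parsed `tzinfo` afterwards.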
