[Datetime-SIG] Matching +-HH:MM in strptime

Mario Corchero mariocj89 at gmail.com
Sat Oct 21 18:16:49 EDT 2017


>
> I'm slightly leaning towards %:z because changing the semantics of %z
> could be construed as a backwards-incompatible change (albeit a minor one).
> I know some people have been asking for a "strict" version of the dateutil
> parser, and people do tend to use parsers for string validation. Adding the
> %:z option has the advantage that it's unambiguously backwards compatible,
> and it can be added to strftime if that is deemed desirable.


I think the issue in dateutil is a different one, as its parser is fully
flexible. Here, even if this can be claimed to be a backwards-incompatible
change (the same could have been said of glibc), relying on strptime with
%z to check that your offset does not contain a ':' seems quite fragile.
In dateutil, by contrast, it is true that the parser will sometimes happily
parse things that "don't seem to be a date" (but that can actually be
interpreted as one).

Moreover, this will (ideally) land in a new Python version (3.7), not in a
random patch release.

Last but not least: as a user who has not even read the docs, would you
not expect %z to be able to parse ISO standard offsets? I actually
found it surprising that it could not.
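
To illustrate the surprise: isoformat() and strftime("%z") write the same
offset differently, so the two outputs are not interchangeable as strptime
input. A minimal sketch using only the standard library (the date and the
+01:00 offset are arbitrary example values):

```python
from datetime import datetime, timedelta, timezone

# An aware datetime at UTC+01:00 (arbitrary example values).
dt = datetime(2017, 10, 21, 18, 16, 49, tzinfo=timezone(timedelta(hours=1)))

iso_text = dt.isoformat()                        # offset written as +01:00
strf_text = dt.strftime("%Y-%m-%dT%H:%M:%S%z")   # offset written as +0100

# The colon-free form produced by %z round-trips through strptime.
round_trip = datetime.strptime(strf_text, "%Y-%m-%dT%H:%M:%S%z")
```

On an interpreter where %z only matches +-HHMM, passing iso_text to the
same strptime call raises ValueError, which is the case under discussion.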

I'd just keep it simple. I strongly prefer:
"%z parses an RFC-822/ISO 8601 standard UTC offset" (what you usually work
with)
over:
"if your offsets have a colon, use %:z; if they don't, use %z; and if they
can use Zulu, remember to check for 'Z' as well".
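
For code that has to run where %z only matches +-HHMM, the liberal
behaviour argued for here can be approximated by normalising the offset
before calling strptime. A minimal sketch: the names normalize_offset and
parse_liberal are made up for this example and are not part of the
standard library or of the PR.

```python
import re
from datetime import datetime, timedelta

# Matches a trailing "Z" or a trailing "+HH:MM" / "-HH:MM" offset.
_OFFSET_RE = re.compile(r"(Z|[+-]\d{2}:\d{2})$")


def normalize_offset(text):
    """Hypothetical helper: rewrite a trailing Z or +HH:MM as +HHMM."""
    def fix(match):
        value = match.group(1)
        return "+0000" if value == "Z" else value.replace(":", "")
    return _OFFSET_RE.sub(fix, text)


def parse_liberal(text, fmt="%Y-%m-%dT%H:%M:%S%z"):
    """Hypothetical helper: parse Z, +HH:MM or +HHMM offsets with plain %z."""
    return datetime.strptime(normalize_offset(text), fmt)


with_colon = parse_liberal("2017-10-21T18:16:49+01:00")
with_zulu = parse_liberal("2017-10-21T18:16:49Z")
without_colon = parse_liberal("2017-10-21T18:16:49-0430")
```

A single liberal %z would make this kind of pre-processing unnecessary; a
separate %:z would instead leave the choice of format string to the caller.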

BUT! As said, no authority here :)

On 21 October 2017 at 17:12, Paul G <paul at ganssle.io> wrote:

> Back to the subject of how to handle +-HH:MM, I think the only really
> viable candidates are %z and %:z, so I think the question boils down to
> whether, with strptime, we care more about consistency with GNU / glibc's
> strptime (which apparently does implement %z to cover both HHMM and HH:MM) or
> whether we care more about users being able to specify *exactly* the
> string they want to match (e.g. allowing users to specify that a colon
> found in a time zone offset is an error condition).
>
> I'm slightly leaning towards %:z because changing the semantics of %z
> could be construed as a backwards-incompatible change (albeit a minor one).
> I know some people have been asking for a "strict" version of the dateutil
> parser, and people do tend to use parsers for string validation. Adding the
> %:z option has the advantage that it's unambiguously backwards compatible,
> and it can be added to strftime if that is deemed desirable.
>
> Best,
>
> Paul
>
> On 10/21/2017 09:07 AM, Mario Corchero wrote:
> > Sorry, hit send by mistake on the previous message.
> >
> > That is fine for parsing, but my issue with this is symmetry with
> strftime.
> >
> >
> > I can agree with having a %:z for support in strftime but I think that
> is a
> > separate change. The issue I opened with the attached PR focused only on
> > strptime to facilitate the discussion.
> >
> > Again, what is the alternative?
> >
> >
> > Making %z accept time-offset rfc3339 compatible.
> >
> > I have a working strptime:
> >
> >
> > Ouch, except for the fractional seconds (which was not part of the issue
> > raised) I had also a patch for the colon and another for supporting 'Z'
> as
> > reported in the bug tracker. I was mentioning working with Paul in the
> > implementation of isoparse, as even if it might look simple it has caused
> > many long-standing discussions in the past.
> >
> > On 21 October 2017 at 13:55, Mario Corchero <mariocj89 at gmail.com> wrote:
> >
> >>
> >>
> >> On 21 October 2017 at 13:18, Oren Tirosh <orent at hishome.net> wrote:
> >>
> >>>
> >>> On Sat, 21 Oct 2017 at 13:24, Mario Corchero <mariocj89 at gmail.com>
> wrote:
> >>>
> >>>> My opinion (as a user, I have no authority here whatsoever)
> >>>>
> >>>> *1) About parsing colons in offsets with strptime*
> >>>>
> >>>> I think having %z support both +-HH:MM and +-HHMM would be the best
> >>>> choice, as it seems the simplest for me as a user.
> >>>> I'd go even further, making %z support ':' and 'Z', *a la glibc*.
> >>>> This effectively means that %z can now parse: Z, ±hh:mm, ±hhmm, or ±hh
> >>>>
> >>>
> >>> That is fine for parsing, but my issue with this is symmetry with
> >>> strftime. If the same extensions are also implemented for formatting (I
> >>> have a prototype) then you need some way to specify whether you want a
> :
> >>> separator or not. The %z will have to remain without colon on
> formatting
> >>> for backward compatibility.
> >>>
> >>> So I agree that the parser can be safely made more liberal in what it
> >>> accepts, but the formatter must be strict and specific in what it
> produces.
> >>>
> >>> I think this gives the best experience to the strptime user. It
> >>>> basically makes the time-offset rfc3339
> >>>> <https://tools.ietf.org/html/rfc3339> compatible.
> >>>>
> >>>
> >>> Yes, that's the goal.
> >>>
> >>> *2) Adding a handy function to build a datetime from a string
> serialized
> >>>> with isoformat*
> >>>> Absolutely agree on having an isoparse. That would be amazing, we can
> >>>> even build it on top of 1).
> >>>>
> >>>
> >>> ...and building it on top of 1 requires several extensions and
> variants.
> >>> People here seem to be a bit taken aback by the scope of these
> extensions.
> >>> I understand this reaction, but I maintain that most or all this
> complexity
> >>> is necessary if you want to implement this on top of strptime rather
> than a
> >>> custom isoparse().
> >>>
> >>> *Side note:*
> >>>> I am not totally in favour of "%?:z" (probably because I am leaning
> >>>> on %z doing the parsing for both and ?z will have no place on
> strftime).
> >>>> I think this starts to add way too much complexity to just say "parse
> a
> >>>> time-offset".
> >>>>
> >>>
> >>> Again, what is the alternative? If you want a parser that accepts the
> >>> output of isoformat() for all possible datetime values (except custom
> >>> tzinfo) then it needs to support a missing tz offset as indicating a
> naive
> >>> timestamp.
> >>>
> >>> You can say that the real source of the asymmetry here is not with my
> >>> proposal but rather in the underlying strftime/strptime: on
> formatting, %z
> >>> yields an empty string for a naive timestamp rather than producing an
> >>> error. But on parsing, it refuses to parse a timestamp with no offset.
> A
> >>> truly symmetric implementation would have accepted it as a naive
> >>> timestamp.
> >>>
> >>> Too late for %z because it must remain backward compatible, but perhaps
> >>> %:z can be made to accept a missing offset as a naive timestamp. The
> user
> >>> can then check for naive timestamp and reject them if they are
> unacceptable
> >>> in that context, rather than specifying whether a missing timestamp is
> >>> acceptable or not in the format string. I have no problem with either
> >>> solution.
> >>>
> >>>>
> >>>> *Implementation:*
> >>>> I am happy to work with PaulG in the isoparse implementation if we
> >>>> decide to go with it and if he wants to get involved :)
> >>>>
> >>>
> >>> I have a working strptime:
> >>>  https://github.com/orent/cpython/tree/strptime_extensions
> >>>
> >>> isoparse() on top of this strptime is a trivial one-liner.
> >>>
> >>> Oren
> >>>
> >>>>
> >>>>
> >>>> *Thanks:*
> >>>> Thanks for dedicating time to this, I think that even if minor this
> >>>> would be a killer addition to 3.7 if we manage to get it through.
> >>>>
> >>>> On 21 October 2017 at 07:34, Oren Tirosh <orent at hishome.net> wrote:
> >>>>
> >>>>> ok, let's try to separate the issues and choices on each one:
> >>>>>
> >>>>> 1. Extending strptime to support time zone offset with : separator:
> >>>>> Should a single directive accept either hhmm or hh:mm, or should we use two
> >>>>> separate directives?
> >>>>>
> >>>>> 2. Round tripping of isoformat() back to datetime value:
> >>>>> Implement custom isoparse() function or extend strptime so isoparse
> >>>>> simply calls strptime with a default format?
> >>>>> Support all variations produced by isoformat or just a subset?
> >>>>> (Variations include with/without fraction, with/without tz and
> separator
> >>>>> choice)
> >>>>>
> >>>>> I suggest: (1) separate directives, (2a) extend strptime and (2b) support all
> >>>>> variations. Do you have different preferences on any of these
> questions?
> >>>>>
> >>>>> I understand that the number of extensions to support this seems
> >>>>> excessive to you.
> >>>>>
> >>>>> Technically, my proposed "%.f" is not really necessary. I added it
> for
> >>>>> completeness. We can keep using ".%f" for non-optional fraction and
> define
> >>>>> "%?f" to implicitly include the dot.
> >>>>>
> >>>>> The distinction between "%z", "%:z" and "%?:z" can also be narrowed
> >>>>> down. This can be done, for example, by making "%z" and "%?z" always
> accept
> >>>>> hhmm with or without the : separator.
> >>>>>
> >>>>> On Fri, 20 Oct 2017 at 17:16, Paul G <paul at ganssle.io> wrote:
> >>>>>
> >>>>>> I think this would be a much bigger change to the strptime interface
> >>>>>> than is actually warranted, and probably would add in additional,
> >>>>>> unnecessary complexity by introducing the concept of optional
> matches.
> >>>>>> Adding the capability to match HH:MM offsets is a reasonable
> extension
> >>>>>> partially because that is a standard representation that is
> currently *not*
> >>>>>> covered by strptime, and the fact that that's how isoformat()
> represents
> >>>>>> the offset just makes this lack all the more acute.
> >>>>>>
> >>>>>> I think it should be uncontroversial to add *one* of these two %z
> >>>>>> extensions to Python 3 without getting bogged down in allowing a
> single
> >>>>>> strptime string to match any output from `.isoformat`.
> >>>>>>
> >>>>>> That said, I'm also very much in favor of a `.isoparse` or
> >>>>>> `.fromisoformat` constructor that *is* the inverse of `isoformat`,
> which
> >>>>>> should solve the issue without sweeping changes to how `strptime`
> works.
> >>>>>>
> >>>>>> On 10/19/2017 04:07 PM, Oren Tirosh wrote:
> >>>>>>> https://github.com/orent/cpython/tree/strptime_extensions
> >>>>>>>
> >>>>>>> %:z  - matches +HH:MM
> >>>>>>> %?:z - optional %:z
> >>>>>>> %.f  - equivalent to .%f
> >>>>>>> %?.f - optional %.f
> >>>>>>> %?t  - matches ' ' or 'T'
> >>>>>>>
> >>>>>>> What they all have in common is that together they make it possible
> >>>>>> to
> >>>>>>> write a strptime format that matches all possible output variations
> >>>>>> of
> >>>>>>> datetime.__str__/ datetime.isoformat.
> >>>>>>>
> >>>>>>> The time zone not only supports the : separator but also allows
> >>>>>> making the
> >>>>>>> entire component optional, as isoformat() will add it only for
> aware
> >>>>>>> datetime objects. The seconds fraction is dropped from the default
> >>>>>> string
> >>>>>>> representation if the datetime represents a whole second. Since it
> is
> >>>>>>> dropped along with the decimal dot, I first made "%.f" that
> includes
> >>>>>> the
> >>>>>>> dot and then created the optional variant. Finally, "%?t" can be
> >>>>>> used to
> >>>>>>> accept a timestamp with either of the separators defined in
> iso8601.
> >>>>>>>
> >>>>>>> It is quite absurd that datetime cannot parse its own string
> >>>>>>> representation. Using these extensions an .isoparse() method may be
> >>>>>> added
> >>>>>>> that calls strptime('%Y-%m-%d%?t%H:%M:%S%?.f%?:z') and supports
> full
> >>>>>>> round-tripping of all possible datetime values that do not use
> a
> >>>>>> custom
> >>>>>>> tzinfo.
> >>>>>>>
> >>>>>>> Oren
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, 19 Oct 2017 at 17:06, Paul G <paul at ganssle.io> wrote:
> >>>>>>>>
> >>>>>>>> There is a new issue about the %z directive in strptime on the
> issue
> >>>>>>> tracker: https://bugs.python.org/issue31800 (linked to a few
> related
> >>>>>>> issues), and a linked PR expanding the definition of %z to match
> >>>>>> HH:MM:
> >>>>>>> https://github.com/python/cpython/pull/4015
> >>>>>>>>
> >>>>>>>> I think either adding a %:z directive or expanding the definition
> >>>>>> of %z
> >>>>>>> would be pretty important, and I think there's a good case to be
> >>>>>> made for
> >>>>>>> either one. To summarize the arguments for people on the mailing
> >>>>>> list:
> >>>>>>>>
> >>>>>>>> The argument for expanding the definition of %z that I find
> >>>>>> strongest is
> >>>>>>> that according to the linux man pages (
> >>>>>>> http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z
> >>>>>> generates
> >>>>>>> +-HHMM in strftime, strptime is supposed to match "An RFC-822/ISO
> >>>>>> 8601
> >>>>>>> standard timezone specification", and ISO 8601 uses +-HH:MM, so if
> >>>>>> we're
> >>>>>>> following those linux pages, we should be accepting the version
> with
> >>>>>> the
> >>>>>>> colon.
> >>>>>>>>
> >>>>>>>> The arguments that I find most compelling for adding a %:z
> directive
> >>>>>> are:
> >>>>>>>>
> >>>>>>>>     1. maintains the symmetry between strftime and strptime
> >>>>>>>>     2. allows users to be stricter about their datetime format
> >>>>>>>>     3. has precedent in that GNU's `date` command accepts %z, %:z
> >>>>>> and
> >>>>>>> %::z formats
> >>>>>>>>
> >>>>>>>> Can we establish some consensus on which should be done so that it
> >>>>>> can be
> >>>>>>> implemented?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Paul
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> Datetime-SIG mailing list
> >>>>>>>> Datetime-SIG at python.org
> >>>>>>>> https://mail.python.org/mailman/listinfo/datetime-sig
> >>>>>>>> The PSF Code of Conduct applies to this mailing list:
> >>>>>>> https://www.python.org/psf/codeofconduct/
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >
> >
> >
> >
>
>
>
>