Re: [Datetime-SIG] Matching +-HH:MM in strptime
Back to the subject of how to handle +-HH:MM, I think the only really viable candidates are %z and %:z, so I think the question boils down to whether, with strptime, we care more about consistency with GNU / glibc's strptime (which apparently does implement %z to cover both HHMM and HH:MM) or whether we care more about users being able to specify *exactly* the string they want to match (e.g. allowing users to specify that a colon found in a time zone offset is an error condition).

I'm slightly leaning towards %:z because changing the semantics of %z could be construed as a backwards-incompatible change (albeit a minor one). I know some people have been asking for a "strict" version of the dateutil parser, and people do tend to use parsers for string validation. Adding the %:z option has the advantage that it's unambiguously backwards compatible, and it can be added to strftime if that is deemed desirable.

Best,
Paul

On 10/21/2017 09:07 AM, Mario Corchero wrote:
Sorry, hit send by mistake on the previous message.
That is fine for parsing, but my issue with this is symmetry with strftime.
I can agree with having %:z supported in strftime, but I think that is a separate change. The issue I opened with the attached PR focused only on strptime to facilitate the discussion.
Again, what is the alternative?
Making %z accept time-offset rfc3339 compatible.
I have a working strptime:
Ouch, except for the fractional seconds (which were not part of the issue raised) I also had a patch for the colon and another for supporting 'Z' as reported in the bug tracker. I was mentioning working with Paul on the implementation of isoparse because, even if it might look simple, it has caused many long-standing discussions in the past.
On 21 October 2017 at 13:55, Mario Corchero <mariocj89@gmail.com> wrote:
On 21 October 2017 at 13:18, Oren Tirosh <orent@hishome.net> wrote:
On Sat, 21 Oct 2017 at 13:24, Mario Corchero <mariocj89@gmail.com> wrote:
My opinion (as a user, I have no authority here whatsoever)
*1) About parsing colons in offsets with strptime*
I think having %z support both +-HH:MM and +-HHMM would be the best choice, as it seems the simplest for me as a user. I'd go even further, making %z support ':' and 'Z', *a la glibc*. This effectively means that %z can now parse: Z, ±hh:mm, ±hhmm, or ±hh
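To make the lenient option concrete, here is a minimal sketch of what "%z parses both forms" looks like from the user's side. It assumes Python 3.7 or later, where strptime's %z is documented to accept a colon in the offset and a literal 'Z'; on the versions current at the time of this thread the second and third calls would raise ValueError.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S%z"

# Offset without a colon: the form %z already accepted before any change.
plain = datetime.strptime("2017-10-21T13:24:00+0100", FMT)

# Offset with a colon, and a literal 'Z' (assumption: Python 3.7+).
colon = datetime.strptime("2017-10-21T13:24:00+01:00", FMT)
zulu = datetime.strptime("2017-10-21T13:24:00Z", FMT)

# Both spellings of the offset describe the same instant.
print(plain == colon)
print(colon.utcoffset() == timedelta(hours=1))
print(zulu.utcoffset() == timedelta(0))
```

Note that a single lenient directive like this cannot be used to reject the colon form, which is the validation concern raised elsewhere in the thread.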
That is fine for parsing, but my issue with this is symmetry with strftime. If the same extensions are also implemented for formatting (I have a prototype) then you need some way to specify whether you want a : separator or not. The %z will have to remain without colon on formatting for backward compatibility.
So I agree that the parser can be safely made more liberal in what it accepts, but the formatter must be strict and specific in what it produces.
I think this gives the best experience to the strptime user. It
basically makes the time-offset rfc3339 <https://tools.ietf.org/html/rfc3339> compatible.
Yes, that's the goal.
*2) Adding a handy function to build a datetime from a string serialized with isoformat*
Absolutely agree on having an isoparse. That would be amazing, we can even build it on top of 1).
...and building it on top of 1 requires several extensions and variants. People here seem to be a bit taken aback by the scope of these extensions. I understand this reaction, but I maintain that most or all of this complexity is necessary if you want to implement this on top of strptime rather than a custom isoparse().
*Side note:*
I am not totally in favour of "%?:z" (probably because I am leaning towards %z doing the parsing for both, and ?z will have no place in strftime). I think this starts to add way too much complexity to just say "parse a time-offset".
Again, what is the alternative? If you want a parser that accepts the output of isoformat() for all possible datetime values (except custom tzinfo) then it needs to support a missing tz offset as indicating a naive timestamp.
You can say that the real source of the asymmetry here is not with my proposal but rather in the underlying strftime/strptime: on formatting, %z yields an empty string for a naive timestamp rather than producing an error. But on parsing, it refuses to parse a timestamp with no offset. A truly symmetric implementation would have accepted it as a naive timestamp.
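The asymmetry can be observed with a short experiment. This is only a sketch: the formatting half is standard behaviour, while the outcome of the parsing half is recorded rather than assumed, since the message above describes the interpreter of the time and other versions may differ.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S%z"
naive = datetime(2017, 10, 21, 13, 24)

# Formatting: %z expands to an empty string for a naive datetime,
# so no error is raised and no offset appears in the output.
formatted = naive.strftime(FMT)

# Parsing: feed that output straight back in with the same format.
# The thread reports that this is rejected because no offset is present.
try:
    datetime.strptime(formatted, FMT)
    round_trips = True
except ValueError:
    round_trips = False

print(repr(formatted), "round-trips:", round_trips)
```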
Too late for %z because it must remain backward compatible, but perhaps %:z can be made to accept a missing offset as a naive timestamp. The user can then check for naive timestamp and reject them if they are unacceptable in that context, rather than specifying whether a missing timestamp is acceptable or not in the format string. I have no problem with either solution.
*Implementation:* I am happy to work with PaulG in the isoparse implementation if we decide to go with it and if he wants to get involved :)
I have a working strptime: https://github.com/orent/cpython/tree/strptime_extensions
isoparse() on top of this strptime is a trivial one-liner.
Oren
*Thanks:* Thanks for dedicating time to this, I think that even if minor this would be a killer addition to 3.7 if we manage to get it through.
On 21 October 2017 at 07:34, Oren Tirosh <orent@hishome.net> wrote:
ok, let's try to separate the issues and choices on each one:
1. Extending strptime to support time zone offset with : separator: Should a single directive accept either hhmm or hh:mm, or should there be two separate directives?
2. Round tripping of isoformat() back to datetime value: Implement custom isoparse() function or extend strptime so isoparse simply calls strptime with a default format? Support all variations produced by isoformat or just a subset? (Variations include with/without fraction, with/without tz and separator choice)
I suggest 1 separate directives 2a extend strptime and 2b support all variations. Do you have different preferences on any of these questions?
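For readers who want to see the variations in question 2, the following sketch prints the shapes that isoformat() and str() can produce for a datetime with a fixed-offset tzinfo. The sample values are arbitrary and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

tz = timezone(timedelta(hours=5, minutes=30))

whole = datetime(2017, 10, 21, 7, 34, 0)                     # naive, whole second
frac = datetime(2017, 10, 21, 7, 34, 0, 250000)              # naive, with fraction
aware = datetime(2017, 10, 21, 7, 34, 0, 250000, tzinfo=tz)  # aware, with fraction

variants = [
    whole.isoformat(),         # no fraction, no offset
    frac.isoformat(),          # fraction, no offset
    aware.isoformat(),         # fraction and +HH:MM offset
    aware.isoformat(sep=" "),  # same value with a space separator, as str() gives
]
for text in variants:
    print(text)
```

A single format string has to cope with every one of these shapes, which is why the optional-match directives come up at all.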
I understand that the number of extensions to support this seems excessive to you.
Technically, my proposed "%.f" is not really necessary. I added it for completeness. We can keep using ".%f" for non-optional fraction and define "%?f" to implicitly include the dot.
The distinction between "%z", "%:z" and "%?:z" can also be narrowed down. This can be done, for example, by making "%z" and "%?s" always accept hhmm with or without the : separator.
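For comparison, optional components can be emulated without any new directives by trying a small set of ordinary format strings. The sketch below is not the patch under discussion: `isoparse_sketch` and `_CANDIDATES` are hypothetical names, and it assumes a strptime whose %z accepts the colon form (Python 3.7 or later).

```python
from datetime import datetime, timedelta
from itertools import product

# Every combination of separator, optional fraction and optional offset.
# Hypothetical helper data, built only from standard strptime directives.
_CANDIDATES = [
    "%Y-%m-%d" + sep + "%H:%M:%S" + frac + tz
    for sep, frac, tz in product(("T", " "), (".%f", ""), ("%z", ""))
]


def isoparse_sketch(text):
    """Return the datetime for the first candidate format that matches."""
    for fmt in _CANDIDATES:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("not an isoformat() string: %r" % (text,))


example = isoparse_sketch("2017-10-21 07:34:00.250000+05:30")
print(example.utcoffset() == timedelta(hours=5, minutes=30))
```

The cost is up to eight parse attempts per string, which is the kind of overhead that optional directives or a dedicated parser would avoid.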
On Fri, 20 Oct 2017 at 17:16, Paul G <paul@ganssle.io> wrote:
I think this would be a much bigger change to the strptime interface than is actually warranted, and probably would add in additional, unnecessary complexity by introducing the concept of optional matches. Adding the capability to match HH:MM offsets is a reasonable extension partially because that is a standard representation that is currently *not* covered by strptime, and the fact that that's how isoformat() represents the offset just makes this lack all the more acute.
I think it should be uncontroversial to add *one* of these two %z extensions to Python 3 without getting bogged down in allowing a single strptime string to match any output from `.isoformat`.
That said, I'm also very much in favor of a `.isoparse` or `.fromisoformat` constructor that *is* the inverse of `isoformat`, which should solve the issue without sweeping changes to how `strptime` works.
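As a sketch of what such an inverse constructor looks like in use: `datetime.fromisoformat` exists in Python 3.7 and later, and the example below assumes that version. The sample values are arbitrary.

```python
from datetime import datetime, timedelta, timezone

samples = [
    datetime(2017, 10, 20, 17, 16),                 # naive, whole second
    datetime(2017, 10, 20, 17, 16, 0, 123456),      # naive, with microseconds
    datetime(2017, 10, 20, 17, 16,
             tzinfo=timezone(timedelta(hours=-4))),  # aware, fixed offset
]

# Serialize each value with isoformat() and read it back.
round_tripped = [datetime.fromisoformat(dt.isoformat()) for dt in samples]

print(round_tripped == samples)
```

No format string is involved, so strptime itself stays unchanged.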
On 10/19/2017 04:07 PM, Oren Tirosh wrote:
> https://github.com/orent/cpython/tree/strptime_extensions
>
> %:z - matches +HH:MM
> %?:z - optional %:z
> %.f - equivalent to .%f
> %?.f - optional %.f
> %?t - matches ' ' or 'T'
>
> What they all have in common is that together they make it possible to
> write a strptime format that matches all possible output variations of
> datetime.__str__ / datetime.isoformat.
>
> The time zone not only supports the : separator but also allows making the
> entire component optional, as isoformat() will add it only for aware
> datetime objects. The seconds fraction is dropped from the default string
> representation if the datetime represents a whole second. Since it is
> dropped along with the decimal dot, I first made "%.f" that includes the
> dot and then created the optional variant. Finally, "%?t" can be used to
> accept a timestamp with either of the separators defined in iso8601.
>
> It is quite absurd that datetime cannot parse its own string
> representation. Using these extensions an .isoparse() method may be added
> that calls strptime('%Y-%m-%d%?t%H:%M:%S%?.f%?:z') and supports full
> round-tripping of all possible datetime values that do not use a custom
> tzinfo.
>
> Oren
>
> On Thu, 19 Oct 2017 at 17:06, Paul G <paul@ganssle.io> wrote:
>> There is a new issue about the %z directive in strptime on the issue
>> tracker: https://bugs.python.org/issue31800 (linked to a few related
>> issues), and a linked PR expanding the definition of %z to match HH:MM:
>> https://github.com/python/cpython/pull/4015
>>
>> I think either adding a %:z directive or expanding the definition of %z
>> would be pretty important, and I think there's a good case to be made for
>> either one. To summarize the arguments for people on the mailing list:
>>
>> The argument for expanding the definition of %z that I find strongest is
>> that according to the linux man pages
>> ( http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z
>> generates +-HHMM in strftime, strptime is supposed to match "An
>> RFC-822/ISO 8601 standard timezone specification", and ISO 8601 uses
>> +-HH:MM, so if we're following those linux pages, we should be accepting
>> the version with the colon.
>>
>> The arguments that I find most compelling for adding a %:z directive are:
>>
>> 1. maintains the symmetry between strftime and strptime
>> 2. allows users to be stricter about their datetime format
>> 3. has precedent in that GNU's `date` command accepts %z, %:z and
>>    %::z formats
>>
>> Can we establish some consensus on which should be done so that it can be
>> implemented?
>>
>> Best,
>>
>> Paul
>>
>> _______________________________________________
>> Datetime-SIG mailing list
>> Datetime-SIG@python.org
>> https://mail.python.org/mailman/listinfo/datetime-sig
>> The PSF Code of Conduct applies to this mailing list:
>> https://www.python.org/psf/codeofconduct/
I'm slightly leaning towards %:z because changing the semantics of %z could be construed as a backwards-incompatible change (albeit a minor one). I know some people have been asking for a "strict" version of the dateutil parser, and people do tend to use parsers for string validation. Adding the %:z option has the advantage that it's unambiguously backwards compatible, and it can be added to strftime if that is deemed desirable.
I think the issue in dateutil is a different one, as that parser is fully flexible. Here, even if this can be claimed to be a backwards-incompatible change (the same could have been said of glibc), it seems quite fragile to rely on %z to check that your offset does not contain a ':'. In dateutil, by contrast, it is true that the parser will sometimes happily parse things that "don't seem to be a date" (but that can actually be interpreted as one).

Moreover, (ideally) this will land in a new Python version (3.7), not in a random patch release.

Last but not least, as a user who has not even read the docs, would you not agree that %z should be able to parse ISO standard offsets? I actually found it surprising that it could not.

I'd just keep it simple. I strongly prefer:
"%z parses RFC-822/ISO 8601 standard utc offset" (what you usually work with).
Over:
if your offsets have a colon, use "%:z"; if they don't, use "%z"; if they can use Zulu, remember to check for "Z" as well.

BUT! As said, no authority here :)

On 21 October 2017 at 17:12, Paul G <paul@ganssle.io> wrote:
Back to the subject of how to handle +-HH:MM, I think the only really viable candidates are %z and %:z, so I think the question boils down to whether, with strptime, we care more about consistency with GNU / glibc's strptime (which apparently does implement %z to cover both HHMM and HH:MM) or whether we care more about users being able to specify *exactly* the string they want to match (e.g. allowing users to specify that a colon found in a time zone offset is an error condition).
I'm slightly leaning towards %:z because changing the semantics of %z could be construed as a backwards-incompatible change (albeit a minor one). I know some people have been asking for a "strict" version of the dateutil parser, and people do tend to use parsers for string validation. Adding the %:z option has the advantage that it's unambiguously backwards compatible, and it can be added to strftime if that is deemed desirable.
Best,
Paul

On 10/21/2017 09:07 AM, Mario Corchero wrote:
Sorry, hit send by mistake on the previous message.
That is fine for parsing, but my issue with this is symmetry with strftime.
I can agree with having %:z supported in strftime, but I think that is a separate change. The issue I opened with the attached PR focused only on strptime to facilitate the discussion.
Again, what is the alternative?
Making %z accept time-offset rfc3339 compatible.
I have a working strptime:
Ouch, except for the fractional seconds (which were not part of the issue raised) I also had a patch for the colon and another for supporting 'Z' as reported in the bug tracker. I was mentioning working with Paul on the implementation of isoparse because, even if it might look simple, it has caused many long-standing discussions in the past.
On 21 October 2017 at 13:55, Mario Corchero <mariocj89@gmail.com> wrote:
On 21 October 2017 at 13:18, Oren Tirosh <orent@hishome.net> wrote:
On Sat, 21 Oct 2017 at 13:24, Mario Corchero <mariocj89@gmail.com> wrote:
My opinion (as a user, I have no authority here whatsoever)
*1) About parsing colons in offsets with strptime*
I think having %z support both +-HH:MM and +-HHMM would be the best choice, as it seems the simplest for me as a user. I'd go even further, making %z support ':' and 'Z', *a la glibc*. This effectively means that %z can now parse: Z, ±hh:mm, ±hhmm, or ±hh
That is fine for parsing, but my issue with this is symmetry with strftime. If the same extensions are also implemented for formatting (I have a prototype) then you need some way to specify whether you want a : separator or not. The %z will have to remain without colon on formatting for backward compatibility.
So I agree that the parser can be safely made more liberal in what it accepts, but the formatter must be strict and specific in what it produces.
I think this gives the best experience to the strptime user. It
basically makes the time-offset rfc3339 <https://tools.ietf.org/html/rfc3339> compatible.
Yes, that's the goal.
*2) Adding a handy function to build a datetime from a string serialized with isoformat*
Absolutely agree on having an isoparse. That would be amazing, we can even build it on top of 1).
...and building it on top of 1 requires several extensions and variants. People here seem to be a bit taken aback by the scope of these extensions. I understand this reaction, but I maintain that most or all of this complexity is necessary if you want to implement this on top of strptime rather than a custom isoparse().
*Side note:*
I am not totally in favour of "%?:z" (probably because I am leaning towards %z doing the parsing for both, and ?z will have no place in strftime). I think this starts to add way too much complexity to just say "parse a time-offset".
Again, what is the alternative? If you want a parser that accepts the output of isoformat() for all possible datetime values (except custom tzinfo) then it needs to support a missing tz offset as indicating a naive timestamp.
You can say that the real source of the asymmetry here is not with my proposal but rather in the underlying strftime/strptime: on formatting, %z yields an empty string for a naive timestamp rather than producing an error. But on parsing, it refuses to parse a timestamp with no offset. A truly symmetric implementation would have accepted it as a naive timestamp.
Too late for %z because it must remain backward compatible, but perhaps %:z can be made to accept a missing offset as a naive timestamp. The user can then check for naive timestamp and reject them if they are unacceptable in that context, rather than specifying whether a missing timestamp is acceptable or not in the format string. I have no problem with either solution.
*Implementation:* I am happy to work with PaulG in the isoparse implementation if we decide to go with it and if he wants to get involved :)
I have a working strptime: https://github.com/orent/cpython/tree/strptime_extensions
isoparse() on top of this strptime is a trivial one-liner.
Oren
*Thanks:* Thanks for dedicating time to this, I think that even if minor this would be a killer addition to 3.7 if we manage to get it through.
On 21 October 2017 at 07:34, Oren Tirosh <orent@hishome.net> wrote:
ok, let's try to separate the issues and choices on each one:
1. Extending strptime to support time zone offset with : separator: Should a single directive accept either hhmm or hh:mm, or should there be two separate directives?
2. Round tripping of isoformat() back to datetime value: Implement custom isoparse() function or extend strptime so isoparse simply calls strptime with a default format? Support all variations produced by isoformat or just a subset? (Variations include with/without fraction, with/without tz and separator choice)
I suggest 1 separate directives 2a extend strptime and 2b support all variations. Do you have different preferences on any of these questions?
I understand that the number of extensions to support this seems excessive to you.
Technically, my proposed "%.f" is not really necessary. I added it for completeness. We can keep using ".%f" for non-optional fraction and define "%?f" to implicitly include the dot.
The distinction between "%z", "%:z" and "%?:z" can also be narrowed down. This can be done, for example, by making "%z" and "%?s" always accept hhmm with or without the : separator.
participants (2)
- Mario Corchero
- Paul G