New subject: Matching +-HH:MM in strptime

21 Oct 2017

Back to the subject of how to handle +-HH:MM, I think the only really viable candidates are %z and %:z. The question then boils down to whether, with strptime, we care more about consistency with GNU / glibc's strptime (which apparently does implement %z to cover both HHMM and HH:MM) or about users being able to specify *exactly* the string they want to match (e.g. allowing users to specify that a colon found in a time zone offset is an error condition).

I'm slightly leaning towards %:z because changing the semantics of %z could be construed as a backwards-incompatible change (albeit a minor one). I know some people have been asking for a "strict" version of the dateutil parser, and people do tend to use parsers for string validation. Adding the %:z option has the advantage that it's unambiguously backwards compatible, and it can be added to strftime if that is deemed desirable.
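
To make the strictness argument concrete, here is a minimal sketch of what "validating with a parser" means for offsets. The pattern names and the is_valid_offset helper are hypothetical and only illustrate the idea; this is not how strptime is implemented.

```python
import re

# Hypothetical patterns, for illustration only.
# The colon-free form that %z matched at the time of this thread: +HHMM
BASIC_OFFSET = re.compile(r"[+-]\d{2}\d{2}")
# The form a separate %:z directive would match: +HH:MM
COLON_OFFSET = re.compile(r"[+-]\d{2}:\d{2}")


def is_valid_offset(text, pattern):
    """Return True only if the whole string matches the given offset pattern."""
    return pattern.fullmatch(text) is not None
```

With two separate patterns, a caller who expects "+0530" can treat "+05:30" as an error and vice versa. If a single directive accepted both, that distinction could not be expressed in the format string.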

Best,

Paul

On 10/21/2017 09:07 AM, Mario Corchero wrote:
...
Sorry, hit send by mistake on the previous message.
That is fine for parsing, but my issue with this is symmetry with strftime.
I can agree with having a %:z for support in strftime but I think that is a
separate change. The issue I opened with the attached PR focused only on
strptime to facilitate the discussion.
Again, what is the alternative?
Making %z accept time-offset rfc3339 compatible.
I have a working strptime:
Ouch, except for the fractional seconds (which were not part of the issue
raised) I also had a patch for the colon and another for supporting 'Z' as
reported in the bug tracker. I was mentioning working with Paul on the
implementation of isoparse, as even if it might look simple it has caused
many long-standing discussions in the past.
On 21 October 2017 at 13:55, Mario Corchero <mariocj89@gmail.com> wrote:
...
On 21 October 2017 at 13:18, Oren Tirosh <orent@hishome.net> wrote:
...
On Sat, 21 Oct 2017 at 13:24, Mario Corchero <mariocj89@gmail.com> wrote:
...
My opinion (as a user, I have no authority here whatsoever)
*1) About parsing colons in offsets with strptime*
I think having %z support both +-HH:MM and +-HHMM would be the best
choice, as it seems the simplest for me as a user.
I'd go even further, making %z support ':' and 'Z', *a la glibc*.
This effectively means that %z can now parse: Z, ±hh:mm, ±hhmm, or ±hh
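
A liberal offset parser covering the four forms listed above could look roughly like this. The parse_offset name and the regular expression are assumptions made for illustration; they are not taken from the PR or from glibc.

```python
import re
from datetime import timedelta, timezone

# Hypothetical helper, not part of the datetime module.
# Accepts a sign, two hour digits, then optionally minutes with or without ':'.
_OFFSET_RE = re.compile(r"(?P<sign>[+-])(?P<hh>\d{2})(?::?(?P<mm>\d{2}))?")


def parse_offset(text):
    """Parse 'Z', '+hh:mm', '+hhmm' or '+hh' into a fixed-offset timezone."""
    if text == "Z":
        return timezone.utc
    match = _OFFSET_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid UTC offset: {text!r}")
    hours = int(match["hh"])
    minutes = int(match["mm"] or 0)  # minutes are optional, default to 0
    if hours > 23 or minutes > 59:
        raise ValueError(f"UTC offset out of range: {text!r}")
    delta = timedelta(hours=hours, minutes=minutes)
    if match["sign"] == "-":
        delta = -delta
    return timezone(delta)
```

For example, parse_offset("+05:30") and parse_offset("+0530") return equal timezone objects, while a malformed string such as "+5" raises ValueError.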
That is fine for parsing, but my issue with this is symmetry with
strftime. If the same extensions are also implemented for formatting (I
have a prototype) then you need some way to specify whether you want a :
separator or not. The %z will have to remain without colon on formatting
for backward compatibility.
So I agree that the parser can be safely made more liberal in what it
accepts, but the formatter must be strict and specific in what it produces.
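
On the formatting side, the caller has to choose one form explicitly. A minimal sketch, assuming whole-minute offsets and using a hypothetical format_offset helper (not an existing datetime API):

```python
from datetime import datetime, timedelta, timezone


def format_offset(dt, colon=False):
    """Format dt's UTC offset as +HHMM, or +HH:MM when colon=True.

    Returns '' for a naive datetime, mirroring what strftime('%z') does.
    Assumes the offset is a whole number of minutes.
    """
    offset = dt.utcoffset()
    if offset is None:
        return ""
    total_minutes = int(offset.total_seconds()) // 60
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    separator = ":" if colon else ""
    return f"{sign}{hours:02d}{separator}{minutes:02d}"
```

The default (colon=False) reproduces the existing %z output, so nothing changes for current users; the colon form has to be requested, which is the role a %:z directive would play in strftime.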
I think this gives the best experience to the strptime user. It
...
basically makes the time-offset rfc3339
<https://tools.ietf.org/html/rfc3339> compatible.
Yes, that's the goal.
*2) Adding a handy function to build a datetime from a string serialized
...
with isoformat*
Absolutely agree on having an isoparse. That would be amazing, we can
even build it on top of 1).
...and building it on top of 1 requires several extensions and variants.
People here seem to be a bit taken aback by the scope of these extensions.
I understand this reaction, but I maintain that most or all this complexity
is necessary if you want to implement this on top of strptime rather than a
custom isoparse().
*Side note:*
...
I am not totally in favour of "%?:z" (probably because I am leaning
on %z doing the parsing for both and ?z will have no place in strftime).
I think this starts to add way too much complexity to just say "parse a
time-offset".
Again, what is the alternative? If you want a parser that accepts the
output of isoformat() for all possible datetime values (except custom
tzinfo) then it needs to support a missing tz offset as indicating a naive
timestamp.
You can say that the real source of the asymmetry here is not with my
proposal but rather in the underlying strftime/strptime: on formatting, %z
yields an empty string for a naive timestamp rather than producing an
error. But on parsing, it refuses to parse a timestamp with no offset. A
truly symmetric implementation would have accepted it as a naive
timestamp.
Too late for %z because it must remain backward compatible, but perhaps
%:z can be made to accept a missing offset as a naive timestamp. The user
can then check for naive timestamp and reject them if they are unacceptable
in that context, rather than specifying whether a missing offset is
acceptable or not in the format string. I have no problem with either
solution.
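
The asymmetry described above can be observed with a short script. The formatting half is long-standing behaviour; the parsing half reflects the Python versions under discussion in this thread, and later versions may behave differently, so the script records the outcome rather than assuming it.

```python
from datetime import datetime

FORMAT = "%Y-%m-%dT%H:%M:%S%z"
naive = datetime(2017, 10, 21, 9, 7)

# Formatting: %z silently contributes an empty string for a naive datetime.
formatted = naive.strftime(FORMAT)

# Parsing: in the versions discussed here, the same format string rejects
# the text it just produced, because %z requires an offset to be present.
try:
    datetime.strptime(formatted, FORMAT)
    round_trips = True
except ValueError:
    round_trips = False

print(repr(formatted), "round-trips:", round_trips)
```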
...
*Implementation:*
I am happy to work with PaulG on the isoparse implementation if we
decide to go with it and if he wants to get involved :)
I have a working strptime:
 https://github.com/orent/cpython/tree/strptime_extensions
isoparse() on top of this strptime is a trivial one-liner.
Oren
...
*Thanks:*
Thanks for dedicating time to this, I think that even if minor this
would be a killer addition to 3.7 if we manage to get it through.
On 21 October 2017 at 07:34, Oren Tirosh <orent@hishome.net> wrote:
...
ok, let's try to separate the issues and choices on each one:
1. Extending strptime to support time zone offset with : separator:
Should a single directive accept either hhmm or hh:mm or use two
separate directives?
2. Round tripping of isoformat() back to datetime value:
Implement custom isoparse() function or extend strptime so isoparse
simply calls strptime with a default format?
Support all variations produced by isoformat or just a subset?
(Variations include with/without fraction, with/without tz and separator
choice)
I suggest 1: separate directives, 2a: extend strptime, and 2b: support all
variations. Do you have different preferences on any of these questions?
I understand that the number of extensions to support this seems
excessive to you.
Technically, my proposed "%.f" is not really necessary. I added it for
completeness. We can keep using ".%f" for non-optional fraction and define
"%?f" to implicitly include the dot.
The distinction between "%z", "%:z" and "%?:z" can also be narrowed
down. This can be done, for example, by making "%z" and "%?z" always accept
hhmm with or without the : separator.
On Fri, 20 Oct 2017 at 17:16, Paul G <paul@ganssle.io> wrote:
...
I think this would be a much bigger change to the strptime interface
than is actually warranted, and probably would add in additional,
unnecessary complexity by introducing the concept of optional matches.
Adding the capability to match HH:MM offsets is a reasonable extension
partially because that is a standard representation that is currently *not*
covered by strptime, and the fact that that's how isoformat() represents
the offset just makes this lack all the more acute.
I think it should be uncontroversial to add *one* of these two %z
extensions to Python 3 without getting bogged down in allowing a single
strptime string to match any output from `.isoformat`.
That said, I'm also very much in favor of a `.isoparse` or
`.fromisoformat` constructor that *is* the inverse of `isoformat`, which
should solve the issue without sweeping changes to how `strptime` works.
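
As a rough illustration of the standalone approach, an inverse of isoformat() can be written without touching strptime at all. The isoparse name and the regular expression below are assumptions for the sake of the sketch; it handles only the default isoformat() output (with 'T' or ' ' as separator, optional microseconds, optional +HH:MM offset) and does not cover offsets with a seconds component.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical sketch; the name isoparse and this regex are illustrative only.
_ISO_RE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"      # date
    r"[T ]"                         # either separator
    r"(\d{2}):(\d{2}):(\d{2})"      # time
    r"(?:\.(\d{6}))?"               # optional microseconds
    r"(?:([+-])(\d{2}):(\d{2}))?"   # optional +HH:MM offset
)


def isoparse(text):
    """Parse the default output of datetime.isoformat() or str(datetime)."""
    match = _ISO_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an isoformat() string: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    microsecond = int(match.group(7) or 0)
    tzinfo = None  # a missing offset means a naive datetime
    if match.group(8) is not None:
        delta = timedelta(hours=int(match.group(9)), minutes=int(match.group(10)))
        tzinfo = timezone(-delta if match.group(8) == "-" else delta)
    return datetime(year, month, day, hour, minute, second, microsecond, tzinfo=tzinfo)
```

A missing offset simply yields a naive datetime here, so the caller can reject naive results afterwards if they are unacceptable in context.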
On 10/19/2017 04:07 PM, Oren Tirosh wrote:
> https://github.com/orent/cpython/tree/strptime_extensions
>
> %:z  - matches +HH:MM
> %?:z - optional %:z
> %.f  - equivalent to .%f
> %?.f - optional %.f
> %?t  - matches ' ' or 'T'
>
> What they all have in common is that together they make it possible to
> write a strptime format that matches all possible output variations of
> datetime.__str__ / datetime.isoformat.
>
> The time zone not only supports the : separator but also allows making the
> entire component optional, as isoformat() will add it only for aware
> datetime objects. The seconds fraction is dropped from the default string
> representation if the datetime represents a whole second. Since it is
> dropped along with the decimal dot, I first made "%.f" that includes the
> dot and then created the optional variant. Finally, "%?t" can be used to
> accept a timestamp with either of the separators defined in iso8601.
>
> It is quite absurd that datetime cannot parse its own string
> representation. Using these extensions an .isoparse() method may be added
> that calls strptime('%Y-%m-%d%?t%H:%M:%S%?.f%?:z') and supports full
> round-tripping of all possible datetime values that do not use a custom
> tzinfo.
>
> Oren
>
>
>
> On Thu, 19 Oct 2017 at 17:06, Paul G <paul@ganssle.io> wrote:
>>
>> There is a new issue about the %z directive in strptime on the issue
>> tracker: https://bugs.python.org/issue31800 (linked to a few related
>> issues), and a linked PR expanding the definition of %z to match HH:MM:
>> https://github.com/python/cpython/pull/4015
>>
>> I think either adding a %:z directive or expanding the definition of %z
>> would be pretty important, and I think there's a good case to be made for
>> either one. To summarize the arguments for people on the mailing list:
>>
>> The argument for expanding the definition of %z that I find strongest is
>> that according to the linux man pages (
>> http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z generates
>> +-HHMM in strftime, strptime is supposed to match "An RFC-822/ISO 8601
>> standard timezone specification", and ISO 8601 uses +-HH:MM, so if we're
>> following those linux pages, we should be accepting the version with the
>> colon.
>>
>> The arguments that I find most compelling for adding a %:z directive are:
>>
>>     1. maintains the symmetry between strftime and strptime
>>     2. allows users to be stricter about their datetime format
>>     3. has precedent in that GNU's `date` command accepts %z, %:z and
>>        %::z formats
>>
>> Can we establish some consensus on which should be done so that it can be
>> implemented?
>>
>> Best,
>>
>> Paul
>>
>> _______________________________________________
>> Datetime-SIG mailing list
>> Datetime-SIG@python.org
>> https://mail.python.org/mailman/listinfo/datetime-sig
>> The PSF Code of Conduct applies to this mailing list:
> https://www.python.org/psf/codeofconduct/
>
>
>
