[Datetime-SIG] Matching +-HH:MM in strptime

Mario Corchero mariocj89 at gmail.com
Sat Oct 21 09:07:19 EDT 2017


Sorry, hit send by mistake on the previous message.

> That is fine for parsing, but my issue with this is symmetry with strftime.


I can agree with having %:z support in strftime, but I think that is a
separate change. The issue I opened, with the attached PR, focuses only on
strptime to facilitate the discussion.
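
To make the formatting side concrete, here is a rough sketch of what a
colon-aware offset formatter could produce. The helper name format_offset
and its colon flag are made up for illustration; this is not the proposed
%:z implementation, only an approximation of its output written with the
existing datetime API.

```python
from datetime import datetime, timedelta, timezone


def format_offset(dt, colon=False):
    """Hypothetical helper: render dt's UTC offset as +HHMM or +HH:MM.

    Returns '' for a naive datetime, which is what strftime('%z') does.
    Offsets with a seconds component are truncated in this sketch.
    """
    offset = dt.utcoffset()
    if offset is None:
        return ""
    total_seconds = int(offset.total_seconds())
    sign = "-" if total_seconds < 0 else "+"
    hours, remainder = divmod(abs(total_seconds), 3600)
    minutes = remainder // 60
    separator = ":" if colon else ""
    return f"{sign}{hours:02d}{separator}{minutes:02d}"


aware = datetime(2017, 10, 21, 9, 7, 19, tzinfo=timezone(timedelta(hours=-4)))
print(format_offset(aware))              # same shape as strftime('%z')
print(format_offset(aware, colon=True))  # the shape a %:z would produce
```

Keeping the default without a colon reflects the backward-compatibility
constraint on %z mentioned in the thread.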

> Again, what is the alternative?


Making %z accept an RFC 3339-compatible time-offset.
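
As a rough illustration of what a more liberal %z would accept (Z, +-hh:mm,
+-hhmm, or +-hh), here is a minimal standalone sketch. The name parse_offset
is hypothetical and the code is not taken from the PR; it only shows the
matching rule, assuming ASCII '+' and '-' signs and an uppercase 'Z'.

```python
import re
from datetime import timedelta, timezone

# Z, or a sign followed by hh, hhmm, or hh:mm.
_OFFSET_RE = re.compile(r"(?:Z|([+-])(\d{2})(?::?(\d{2}))?)")


def parse_offset(text):
    """Hypothetical helper: turn an offset string into a timezone object."""
    match = _OFFSET_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised UTC offset: {text!r}")
    if text == "Z":
        return timezone.utc
    sign, hh, mm = match.groups()
    hours, minutes = int(hh), int(mm or 0)
    if hours > 23 or minutes > 59:
        raise ValueError(f"offset out of range: {text!r}")
    delta = timedelta(hours=hours, minutes=minutes)
    return timezone(-delta if sign == "-" else delta)


for sample in ("Z", "+05:30", "-0400", "+02"):
    print(sample, "->", parse_offset(sample))
```

A stricter variant, such as a separate %:z, would simply drop the optional
'?' after the colon in the pattern.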

> I have a working strptime:


Ouch. Except for the fractional seconds (which were not part of the issue
raised), I also had a patch for the colon and another for supporting 'Z', as
reported in the bug tracker. I mentioned working with Paul on the
implementation of isoparse because, even if it might look simple, it has
caused many long-standing discussions in the past.
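
To give an idea of the variations an isoparse has to cover (either
separator, optional fraction, optional offset), here is a minimal sketch
written as a standalone function rather than on top of strptime. The name
isoparse_sketch is made up, and it assumes only the plain output of
isoformat()/str() with whole-minute offsets, so it is not a proposal for
the actual implementation.

```python
import re
from datetime import datetime, timedelta, timezone

_ISO_RE = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"      # date
    r"[T ]"                         # 'T' (isoformat) or ' ' (str)
    r"(\d{2}):(\d{2}):(\d{2})"      # time
    r"(?:\.(\d{6}))?"               # optional microseconds
    r"(?:([+-])(\d{2}):(\d{2}))?"   # optional UTC offset, +-HH:MM
)


def isoparse_sketch(text):
    """Hypothetical inverse of isoformat() for the common cases."""
    match = _ISO_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported format: {text!r}")
    year, month, day, hour, minute, second = (
        int(group) for group in match.groups()[:6]
    )
    microsecond = int(match.group(7) or 0)
    tzinfo = None  # a missing offset means a naive datetime
    if match.group(8):
        delta = timedelta(hours=int(match.group(9)),
                          minutes=int(match.group(10)))
        tzinfo = timezone(-delta if match.group(8) == "-" else delta)
    return datetime(year, month, day, hour, minute, second,
                    microsecond, tzinfo=tzinfo)


original = datetime(2017, 10, 21, 9, 7, 19,
                    tzinfo=timezone(timedelta(hours=-4)))
print(isoparse_sketch(original.isoformat()) == original)
```

Even this small version needs three optional or alternative parts, which is
roughly the set of extensions being discussed for strptime.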

On 21 October 2017 at 13:55, Mario Corchero <mariocj89 at gmail.com> wrote:

>
>
> On 21 October 2017 at 13:18, Oren Tirosh <orent at hishome.net> wrote:
>
>>
>> On Sat, 21 Oct 2017 at 13:24, Mario Corchero <mariocj89 at gmail.com> wrote:
>>
>>> My opinion (as a user, I have no authority here whatsoever)
>>>
>>> *1) About parsing colons in offsets with strptime*
>>>
>>> I think having %z support both +-HH:MM and +-HHMM would be the best
>>> choice, as it seems the simplest for me as a user.
>>> I'd go even further, making %z support ':' and 'Z', *a la glibc*.
>>> This effectively means that %z can now parse: Z, ±hh:mm, ±hhmm, or ±hh
>>>
>>
>> That is fine for parsing, but my issue with this is symmetry with
>> strftime. If the same extensions are also implemented for formatting (I
>> have a prototype) then you need some way to specify whether you want a :
>> separator or not. The %z will have to remain without colon on formatting
>> for backward compatibility.
>>
>> So I agree that the parser can be safely made more liberal in what it
>> accepts, but the formatter must be strict and specific in what it produces.
>>
>>> I think this gives the best experience to the strptime user. It
>>> basically makes the time-offset rfc3339
>>> <https://tools.ietf.org/html/rfc3339> compatible.
>>>
>>
>> Yes, that's the goal.
>>
>>> *2) Adding a handy function to build a datetime from a string serialized
>>> with isoformat*
>>> Absolutely agree on having an isoparse. That would be amazing, we can
>>> even build it on top of 1).
>>>
>>
>> ...and building it on top of 1 requires several extensions and variants.
>> People here seem to be a bit taken aback by the scope of these extensions.
>> I understand this reaction, but I maintain that most or all of this
>> complexity is necessary if you want to implement this on top of strptime
>> rather than a custom isoparse().
>>
>>> *Side note:*
>>> I am not totally in favour of "%?:z" (probably because I am leaning
>>> towards %z doing the parsing for both and ?z will have no place in strftime).
>>> I think this starts to add way too much complexity to just say "parse a
>>> time-offset".
>>>
>>
>> Again, what is the alternative? If you want a parser that accepts the
>> output of isoformat() for all possible datetime values (except custom
>> tzinfo) then it needs to support a missing tz offset as indicating a naive
>> timestamp.
>>
>> You can say that the real source of the asymmetry here is not with my
>> proposal but rather in the underlying strftime/strptime: on formatting, %z
>> yields an empty string for a naive timestamp rather than producing an
>> error. But on parsing, it refuses to parse a timestamp with no offset. A
>> truly symmetric implementation would have accepted it as a naive
>> timestamp.
>>
>> Too late for %z because it must remain backward compatible, but perhaps
>> %:z can be made to accept a missing offset as a naive timestamp. The user
>> can then check for naive timestamps and reject them if they are unacceptable
>> in that context, rather than specifying whether a missing offset is
>> acceptable or not in the format string. I have no problem with either
>> solution.
>>
>>>
>>> *Implementation:*
>>> I am happy to work with PaulG on the isoparse implementation if we
>>> decide to go with it and if he wants to get involved :)
>>>
>>
>> I have a working strptime:
>>  https://github.com/orent/cpython/tree/strptime_extensions
>>
>> isoparse() on top of this strptime is a trivial one-liner.
>>
>> Oren
>>
>>>
>>>
>>> *Thanks:*
>>> Thanks for dedicating time to this, I think that even if minor this
>>> would be a killer addition to 3.7 if we manage to get it through.
>>>
>>> On 21 October 2017 at 07:34, Oren Tirosh <orent at hishome.net> wrote:
>>>
>>>> ok, let's try to separate the issues and choices on each one:
>>>>
>>>> 1. Extending strptime to support time zone offset with : separator:
>>>> Should a single directive accept either hhmm or hh:mm, or should we use
>>>> two separate directives?
>>>>
>>>> 2. Round tripping of isoformat() back to datetime value:
>>>> Implement custom isoparse() function or extend strptime so isoparse
>>>> simply calls strptime with a default format?
>>>> Support all variations produced by isoformat or just a subset?
>>>> (Variations include with/without fraction, with/without tz and separator
>>>> choice)
>>>>
>>>> I suggest 1: separate directives, 2a: extend strptime, and 2b: support all
>>>> variations. Do you have different preferences on any of these questions?
>>>>
>>>> I understand that the number of extensions to support this seems
>>>> excessive to you.
>>>>
>>>> Technically, my proposed "%.f" is not really necessary. I added it for
>>>> completeness. We can keep using ".%f" for non-optional fraction and define
>>>> "%?f" to implicitly include the dot.
>>>>
>>>> The distinction between "%z", "%:z" and "%?:z" can also be narrowed
>>>> down. This can be done, for example, by making "%z" and "%?z" always accept
>>>> hhmm with or without the : separator.
>>>>
>>>> On Fri, 20 Oct 2017 at 17:16, Paul G <paul at ganssle.io> wrote:
>>>>
>>>>> I think this would be a much bigger change to the strptime interface
>>>>> than is actually warranted, and probably would add in additional,
>>>>> unnecessary complexity by introducing the concept of optional matches.
>>>>> Adding the capability to match HH:MM offsets is a reasonable extension
>>>>> partially because that is a standard representation that is currently *not*
>>>>> covered by strptime, and the fact that that's how isoformat() represents
>>>>> the offset just makes this lack all the more acute.
>>>>>
>>>>> I think it should be uncontroversial to add *one* of these two %z
>>>>> extensions to Python 3 without getting bogged down in allowing a single
>>>>> strptime string to match any output from `.isoformat`.
>>>>>
>>>>> That said, I'm also very much in favor of a `.isoparse` or
>>>>> `.fromisoformat` constructor that *is* the inverse of `isoformat`, which
>>>>> should solve the issue without sweeping changes to how `strptime` works.
>>>>>
>>>>> On 10/19/2017 04:07 PM, Oren Tirosh wrote:
>>>>> > https://github.com/orent/cpython/tree/strptime_extensions
>>>>> >
>>>>> > %:z  - matches +HH:MM
>>>>> > %?:z - optional %:z
>>>>> > %.f  - equivalent to .%f
>>>>> > %?.f - optional %.f
>>>>> > %?t  - matches ' ' or 'T'
>>>>> >
>>>>> > What they all have in common is that together they make it possible
>>>>> > to write a strptime format that matches all possible output
>>>>> > variations of datetime.__str__ / datetime.isoformat.
>>>>> >
>>>>> > The time zone not only supports the : separator but also allows
>>>>> > making the entire component optional, as isoformat() will add it only
>>>>> > for aware datetime objects. The seconds fraction is dropped from the
>>>>> > default string representation if the datetime represents a whole
>>>>> > second. Since it is dropped along with the decimal dot, I first made
>>>>> > "%.f" that includes the dot and then created the optional variant.
>>>>> > Finally, "%?t" can be used to accept a timestamp with either of the
>>>>> > separators defined in iso8601.
>>>>> >
>>>>> > It is quite absurd that datetime cannot parse its own string
>>>>> > representation. Using these extensions, an .isoparse() method may be
>>>>> > added that calls strptime('%Y-%m-%d%?t%H:%M:%S%?.f%?:z') and supports
>>>>> > full round-tripping of all possible datetime values that do not use a
>>>>> > custom tzinfo.
>>>>> >
>>>>> > Oren
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Thu, 19 Oct 2017 at 17:06, Paul G <paul at ganssle.io> wrote:
>>>>> >>
>>>>> >> There is a new issue about the %z directive in strptime on the issue
>>>>> >> tracker: https://bugs.python.org/issue31800 (linked to a few related
>>>>> >> issues), and a linked PR expanding the definition of %z to match
>>>>> >> HH:MM: https://github.com/python/cpython/pull/4015
>>>>> >>
>>>>> >> I think either adding a %:z directive or expanding the definition of
>>>>> >> %z would be pretty important, and I think there's a good case to be
>>>>> >> made for either one. To summarize the arguments for people on the
>>>>> >> mailing list:
>>>>> >>
>>>>> >> The argument for expanding the definition of %z that I find strongest
>>>>> >> is that according to the linux man pages (
>>>>> >> http://man7.org/linux/man-pages/man3/strptime.3.html ), while %z
>>>>> >> generates +-HHMM in strftime, strptime is supposed to match "An
>>>>> >> RFC-822/ISO 8601 standard timezone specification", and ISO 8601 uses
>>>>> >> +-HH:MM, so if we're following those linux pages, we should be
>>>>> >> accepting the version with the colon.
>>>>> >>
>>>>> >> The arguments that I find most compelling for adding a %:z directive
>>>>> >> are:
>>>>> >>
>>>>> >>     1. maintains the symmetry between strftime and strptime
>>>>> >>     2. allows users to be stricter about their datetime format
>>>>> >>     3. has precedent in that GNU's `date` command accepts %z, %:z and
>>>>> >>        %::z formats
>>>>> >>
>>>>> >> Can we establish some consensus on which should be done so that it
>>>>> >> can be implemented?
>>>>> >>
>>>>> >> Best,
>>>>> >>
>>>>> >> Paul
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> Datetime-SIG mailing list
>>>>> >> Datetime-SIG at python.org
>>>>> >> https://mail.python.org/mailman/listinfo/datetime-sig
>>>>> >> The PSF Code of Conduct applies to this mailing list:
>>>>> >> https://www.python.org/psf/codeofconduct/
>>>
>