Add parse_duration to datetime - a Go-like function to parse durations

Hi Everyone,

I really like how Go parses durations:

```
hours, _ := time.ParseDuration("10h")
complex, _ := time.ParseDuration("1h10m10s")
micro, _ := time.ParseDuration("1µs")
// The package also accepts the incorrect but common prefix u for micro.
micro2, _ := time.ParseDuration("1us")
```

Consider the example in https://docs.python.org/3/library/datetime.html#timedelta-objects. With Go-like parsing it would be:

```
datetime.parse_duration("2w50d8h5m27s10ms2000us")
```

Go's implementation only supports "ns", "us" (or "µs"), "ms", "s", "m", "h", but that does not mean Python has to restrict itself to these units.
There are a few similar PyPI packages and Stack Overflow answers with similar implementations, so there is a basis to start from:
https://github.com/wroberts/pytimeparse
https://github.com/oleiade/durations
But the code can be as simple as this:
from datetime import timedelta
import re
regex = re.compile( r'((?P<weeks>[\.\d]+?)w)?' r'((?P<days>[\.\d]+?)d)?' r'((?P<hours>[\.\d]+?)h)?' r'((?P<minutes>[\.\d]+?)m)?' r'((?P<seconds>[\.\d]+?)s)?' r'((?P<microseconds>[\.\d]+?)ms)?' r'((?P<milliseconds>[\.\d]+?)us)?$' ) def parse_time(time_str): """ Parse a time string e.g. (2h13m) into a timedelta object. Modified from virhilo's answer at https://stackoverflow.com/a/4628148/851699 :param time_str: A string identifying a duration. (eg. 2h13m) :return datetime.timedelta: A datetime.timedelta object """ parts = regex.match(time_str) assert parts is not None, "Could not parse any time information from '{}'. Examples of valid strings: '8h', '2d8h5m20s', '2m4s'".format(time_str) time_params = {name: float(param) for name, param in parts.groupdict().items() if param} return timedelta(**time_params) print(repr(parse_time("2w50d8h5m27s10ms2000us"))) ``` This is an extended version of: https://stackoverflow.com/a/51916936/492620\ Is there someone willing to sponsor a PR for adding this to the STL? I'm willing to work on the code as well as the tests and documentation (I contributed small changes to docs.python.org and the `calendar` module in the past). Best regards, Oz

I would very much welcome this to the standard library as it seems like a useful function; I certainly would make good use of it and I'm sure others working with durations use similar code for user-facing parts.
With Go-like parsing it would be:
datetime.parse_duration("2w50d8h5m27s10ms2000us")
However, I feel like this would fit better as an alternate constructor (e.g. timedelta.from_string).
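A rough sketch of what such an alternate constructor might look like; the subclass, the from_string name, and the reuse of parse_time() from the original post are illustrative assumptions, not a concrete API:

```
from datetime import timedelta

class TimeDelta(timedelta):
    """Illustrative subclass only; a real change would live on timedelta itself."""

    @classmethod
    def from_string(cls, text):
        # Delegate to the regex-based parse_time() helper shown earlier
        # (assumed to be in scope here).
        td = parse_time(text)
        return cls(days=td.days, seconds=td.seconds, microseconds=td.microseconds)

print(TimeDelta.from_string("2h13m"))   # 2:13:00
```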

Oz writes:
-1 I see no need for this, and minutes vs. months are ambiguous. Especially with denormalized input as above, it's quite unreadable compared to the timedelta constructor with keywords. I suspect it's mistake-prone.

If it is to be used for user input, I think consideration should be given to internationalization (e.g., all of the Japanese units except hours are single characters, and hours can be abbreviated intuitively and unambiguously, which is more than you can say for English months, minutes, milliseconds, and microseconds!). But that degree of complexity is pretty clearly out of scope for the standard library.

There probably should be an option for warning or raising on denormalized input. That is, it's OK to omit units that could be used, but not to overflow into the next bigger unit actually used. So "8h300s" would be OK, but "8h4300s" not. Also, a warning if the units are out of order. Again, that kind of thing seems like it never ends and is beyond the scope of the standard library.

If anything is to be added, I would prefer using ISO 8601 durations: https://en.wikipedia.org/wiki/ISO_8601#Durations

The main differences are: "P" and "T" are used to signal that it is a duration and to allow both minutes and months to be identified by "M"; decimal fractions of the smallest unit present are permitted; and seconds is the smallest unit, so the above would be represented as "P2W50DT8H5M27.012000S". Since conversions among s, ms, and μs are just 3-place decimal shifts, I don't see a desperate need for separate unit symbols for those.

There are some useful extensions that would not make parsing more difficult. I don't see any reason to disallow lower case or mixed case, and in most fonts I would find lowercase more readable. Perhaps both ASCII SPC and underscore ('_') could be ignored, for better readability. This would also help future-proof against extensions to datetime to allow nanoseconds (or arbitrary SI prefixes), and (at least up to nanoseconds) it stays quite readable with the recent Python convention allowing "_" to group digits in numbers.

Steve
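For illustration, here is a minimal sketch (not a proposed stdlib API) of parsing the restricted ISO 8601 subset Stephen describes: weeks/days plus a time part, case-insensitive, with spaces and underscores ignored, and with years/months left out because timedelta cannot represent them. The function name and regex are assumptions:

```
import re
from datetime import timedelta

# Restricted ISO 8601 duration parser: only units timedelta can represent.
_ISO_DURATION = re.compile(
    r'^P'
    r'(?:(?P<weeks>\d+(?:\.\d+)?)W)?'
    r'(?:(?P<days>\d+(?:\.\d+)?)D)?'
    r'(?:T'
    r'(?:(?P<hours>\d+(?:\.\d+)?)H)?'
    r'(?:(?P<minutes>\d+(?:\.\d+)?)M)?'
    r'(?:(?P<seconds>\d+(?:\.\d+)?)S)?'
    r')?$',
    re.IGNORECASE,
)

def parse_iso_duration(text):
    """Parse a duration such as 'P2W50DT8H5M27.012000S' into a timedelta."""
    # Tolerate spaces and underscores, as suggested above.
    match = _ISO_DURATION.match(text.replace(' ', '').replace('_', ''))
    if match is None or not any(match.groupdict().values()):
        raise ValueError("not a supported ISO 8601 duration: {!r}".format(text))
    return timedelta(**{unit: float(value)
                        for unit, value in match.groupdict().items() if value})

print(parse_iso_duration("P2W50DT8H5M27.012000S"))
# 64 days, 8:05:27.012000
```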

On 8/22/20 1:07 PM, Stephen J. Turnbull wrote:
One issue with allowing months here is that suddenly an interval becomes dependent on when it is, so it needs to be kept in a complex form, as a month (and a year) is a variable-length time unit. Also, the math gets funny when you do things like this: Jan 31st + 1 month is Feb 28th (or 29th); + 1 month again is March 28th (or 29th); but Jan 31st + 2 months is March 31st. Similarly, adding a month and then subtracting a month doesn't always get you back to your starting time. -- Richard Damon
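For concreteness, this is the behaviour Richard describes, shown here with the third-party dateutil library (not part of the proposal), whose relativedelta clamps to the last valid day of the month:

```
from datetime import date
from dateutil.relativedelta import relativedelta

jan31 = date(2020, 1, 31)
print(jan31 + relativedelta(months=1))                            # 2020-02-29
print(jan31 + relativedelta(months=1) + relativedelta(months=1))  # 2020-03-29
print(jan31 + relativedelta(months=2))                            # 2020-03-31
# Adding and then subtracting a month does not round-trip:
print(jan31 + relativedelta(months=1) - relativedelta(months=1))  # 2020-01-29
```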

On Sat, Aug 22, 2020 at 10:29 AM Richard Damon <Richard@damon-family.org> wrote:
One issue with allowing months here is that suddenly an interval becomes dependent on when it is, so it needs to be kept in a complex form, as a month (and a year) is a variable-length time unit.

Months are already not supported by timedelta; I'm not sure why this was brought up.

On 8/22/20 1:07 PM, Stephen J. Turnbull wrote:
If anything is to be added, I would prefer using ISO 8601 durations.
I agree here -- I know I've written a little wrapper, "asdatetime", that takes either a datetime object or an ISO 8601 string, so my scripting users have an easier API. Adding one for timedelta makes sense to me.

However, the "problem" with the ISO duration standard is that it's what I call a "calendar" description -- e.g. this date next month -- not a timedelta description. The datetime module already mingles those a bit too much in my mind, but we really shouldn't make it worse, and while timedelta does support days and weeks, it does not support months, because months are not a well-defined time interval.

So if we want to go with something ISO-like, how about the time portion of a datetime string: T12:23:34:12

In any case, I have to say that I find the OP's example pretty darn unreadable:

datetime.parse_duration("2w50d8h5m27s10ms2000us")

Do we really need to support weeks, or even days? Again, those are good for calendaring operations, but those aren't well supported by timedelta anyway.

Another option, which has been brought up on this list before, is a set of utilities for common durations. I actually have these in my code as well (a sketch follows below):

days(n=1)
hours(n=1)
minutes(n=1)
seconds(n=1)

I find these very useful for scripting where the script writer may not be all that familiar with Python and the datetime module. And while:

hours(2) + minutes(30)

is only arguably more readable than:

timedelta(hours=2, minutes=30)

I find that in my uses, folks have a timescale for their application and rarely need to mix them (and hours(2.5) works fine as well).

TL;DR:
-0 on the full string parsing as proposed by the OP
-1 on ISO duration string
+0 on using the time portion of the ISO datetime string
+1 on named utilities: days(), hours(), minutes(), seconds(), microseconds()

-CHB
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
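A minimal sketch of the named helpers Christopher describes; his actual code isn't shown, so the bodies below are assumptions, but the call signatures mirror his examples:

```
from datetime import timedelta

# Thin wrappers over timedelta, one per unit; n defaults to 1 as in the examples above.
def weeks(n=1):
    return timedelta(weeks=n)

def days(n=1):
    return timedelta(days=n)

def hours(n=1):
    return timedelta(hours=n)

def minutes(n=1):
    return timedelta(minutes=n)

def seconds(n=1):
    return timedelta(seconds=n)

def microseconds(n=1):
    return timedelta(microseconds=n)

print(hours(2) + minutes(30))   # 2:30:00
print(hours(2.5))               # 2:30:00
```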

Richard Damon writes:
This is the paradigmatic reason why I don't want this in the stdlib. There's an irreducible ambiguity between what humans mean by "a month later" and what timedelta can deal with natively. But if it's going to be in the standard library, it should be a standard protocol.

Note that hours have the same problem in jurisdictions that observe daylight saving time, and minutes have the same problem because of leap seconds. Weeks have a different problem (or ambiguity): suppose "1 week later" falls on a holiday?

So, "consenting adults." The documentation can warn about these issues. You could also add a flag to accept years and months (and hours and minutes? ;-). But does something that needs such a flag belong in the stdlib?

On 8/23/20 7:21 AM, Stephen J. Turnbull wrote:
You get around leap seconds by just using the right time base: if you use UT1 or TAI then there are no leap seconds, so no problem there. Similarly for Daylight Saving Time, that is only an issue if you are in a frame that changes; if you define your time in a uniform frame (like GMT) then there is no DST to worry about (only when you convert your 'universal time measure' to a 'Local Clock Time' do you need to worry about that).

As for holidays, why do holidays matter for time? A week after Dec 18th is Dec 25th, even if much of the world treats that day differently. Now, if you want to deal with 'Work Weeks' or 'Work Days' then you need something more complicated, just like we had with months.

-- Richard Damon

Richard Damon writes:
As for holidays, why do holidays matter for time?

They don't. They matter for durations, because humans regularly do things like schedule a meeting for "one week from today" and then have to move it because it will fall on a holiday observed by their employer.

Why use units like months and cubits? Because "Man is the measure of all things." People think in terms of such units, and then are surprised when computers do "stupid" things with them, like convert them to intervals measured in seconds that are applied to TAI dates, and so schedule meetings on a holiday.

Now, if you aren't thinking like a human, all you need are seconds. Why mess with such a complicated representation combining weeks and minutes and microseconds? So the OP is evidently thinking in service of humans. This protocol *will* be used by humans, and I'll guarantee you those users will occasionally be surprised and annoyed by the results.

Steve

On Sat, 22 Aug 2020 at 13:08, Oz <nahumoz@gmail.com> wrote:
In the context you present, it looks like the expected use case is almost exclusively parsing constant strings representing fixed timedeltas. In that context, it seems to me that we have:

pros:
- more compact (not everyone would view this as a "pro", but let's go with it)

cons:
- overhead of string parsing at runtime
- more potential errors (mistype w as q, for example)
- new syntax to remember (do the parts need to be in a particular order, are spaces allowed, is it us or μs, etc?)

Overall, I don't think this is particularly beneficial, personally.

If on the other hand you're expecting to parse *non-constant* strings, you're typically talking about user-entered data. In which case, it seems like you're inventing a new, fairly limited, notation for time intervals, with the expectation that it would be used in places like config files, or maybe even direct user input. So the proposal depends heavily on whether the notation is something people would want. And in that case, I think it's unlikely. I'd be much more supportive if this were a well-known standard format for intervals. It appears that ISO 8601 defines such a format - see https://en.wikipedia.org/wiki/ISO_8601#Durations. Maybe the Go notation is somehow better, but there's no immediate reason I can see to assume that.

And for human input, you'd want something a lot more flexible. People typically don't enter things in nice neat formats, and parsers need a lot of flexibility. That's quite messy, and the stdlib typically isn't where such parsers are available (parsing of human date input is found in external libraries like dateutil; the stdlib sticks to more fixed formats).

To be honest, there doesn't seem to be much around in the way of parsers for interval data, so it would be nice to see something. But (a) I'd rather it were on PyPI, so it's not restricted to newer versions of Python, and (b) the proposed format isn't one I'd want, personally.

Paul

Participants (7)
- Alex Hall
- Christopher Barker
- Oz
- Paul Moore
- Richard Damon
- Stephen J. Turnbull
- tcphone93@gmail.com