redux: fractional seconds in strptime

A couple months ago I proposed (maybe in a SF bug report) that time.strptime() grow some way to parse time strings containing fractional seconds based on my experience with the logging module. I've hit that stumbling block again, this time in parsing files with timestamps that were generated using datetime.time objects. I hacked around it again (in miserable fashion), but I really think this shortcoming should be addressed.

A couple possibilities come to mind:

1. Extend the %S format token to accept simple decimals that match the re pattern "[0-9]+(?:\.[0-9]+)".

2. Add a new token that accepts decimals as above to avoid overloading the meaning of %S.

3. Add a token that matches integers corresponding to fractional parts. The Perl DateTime module uses %N to match nanoseconds (wanna bet that was added by a physicist?). Arbitrary other units can be specified by sticking a number between the "%" and the "N". I didn't see an example, but I presume "%6N" would match integers that are interpreted as microseconds.

The advantage of the third choice is that you can use anything as the "decimal" point. The logging module separates seconds from their fractional part with a comma for some reason. (I live in the USofA where decimal points are usually represented by a period. I would be in favor of replacing the comma with a locale-specific decimal point in a future version of the logging module.)

I'm not sure I like the optional exponent thing in Perl's DateTime module but it does make it easy to interpret integers representing fractions of a second when they occur without a decimal point to tell you where it is.

I'm open to suggestions and will be happy to implement whatever is agreed to.

Skip
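For readers following along, the kind of hack described above can be sketched roughly like this: peel the fractional field off by hand and give the rest to time.strptime(). The helper name strptime_with_fraction and its sep parameter are made up for illustration, and the sketch assumes the fraction is the last field in the string. It is not Skip's actual workaround.

```python
import time


def strptime_with_fraction(text, fmt, sep=","):
    """Split off a trailing fractional-seconds field, then parse the rest.

    Returns (struct_time, fraction) where fraction is a float in [0, 1).
    Hypothetical helper: the name and the `sep` parameter are assumptions.
    """
    whole, found, digits = text.rpartition(sep)
    if not found or not digits.isdigit():
        # No fractional field present: parse the string unchanged.
        return time.strptime(text, fmt), 0.0
    return time.strptime(whole, fmt), int(digits) / 10 ** len(digits)


# A logging-style timestamp, with a comma before the milliseconds.
st, frac = strptime_with_fraction("2005-01-14 10:36:02,704",
                                  "%Y-%m-%d %H:%M:%S")
print(st.tm_sec, frac)
```

The fraction has to travel next to the struct_time rather than inside it, which is the shortcoming the thread goes on to discuss.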

Skip Montanaro wrote:
> A couple months ago I proposed (maybe in a SF bug report)

http://www.python.org/sf/1006786

> that

The problem I have always had with this proposal is that the value is worthless; time tuples do not have a slot for fractional seconds. Yes, it could possibly be changed to return a float for seconds, but that could possibly break things.

My vote is that if something is added it be like %N but without the optional digit count. This allows any separator to be used while still consuming the digits. It also doesn't suddenly add optional args which are not supported for any other directive.

-Brett

Brett> The problem I have always had with this proposal is that the
Brett> value is worthless, time tuples do not have a slot for fractional
Brett> seconds. Yes, it could possibly be changed to return a float for
Brett> seconds, but that could possibly break things.

Actually, time.strptime() returns a struct_time object. Would it be possible to extend %S to parse floats then add a microseconds (or whatever) field to struct_time objects that is available by attribute only? In Py3k it could worm its way into the tuple representation somehow (either as a new field or by returning seconds as a float).

Brett> My vote is that if something is added it be like %N but without
Brett> the optional digit count. This allows any separator to
Brett> be used while still consuming the digits. It also doesn't
Brett> suddenly add optional args which are not supported for any other
Brett> directive.

I realize the %4N notation is distasteful, but without it I think you will have trouble parsing something like

    13:02:00.704

What would be the format string? %H:%M:%S.%N would be incorrect. It works if you allow the digit notation:

    %H:%M:%S.%3N

I think that, except for the logging module, presentation of fractions of a second would almost always use the locale-specific decimal point, so if that problem is fixed, extending %S to understand floating point seconds would be reasonable.

Skip

On 2005 Jan 14, at 10:36, Skip Montanaro wrote:
+1 -- I never liked the idea that 'time tuples' lost fractions of a second. On platforms where that's sensible and not too hard, time.time() could also -- unobtrusively and backwards compatibly -- set that same attribute. I wonder if, where the attribute's real value is unknown, it should be None (a correct indication of "I dunno") or 0.0 (maybe handier); instinctively, I would prefer None.

"Available by attribute only" is probably sensible, overall, but maybe strftime should make available whatever formatting item[s] strptime may grow to support fractions of a second; and one such item (distinct from %S for guaranteed backwards compatibility) should be "seconds and fraction, with [[presumably, locale-specific]] decimal point inside".

Alex

On Fri, 2005-01-14 at 09:36, Skip Montanaro wrote:
+1 for adding a microseconds field to struct_time, but I'd also like to see an integer-only way of parsing fractional seconds in time.strptime. Using floating point makes it harder to support exact comparison of timestamps (an issue I recently ran into when writing unit tests for code storing timestamps in a database).

My vote is for %<digit>N producing a microseconds field.

Mark Russell
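As a point of comparison for the integer-only approach, the datetime module in later Python releases (2.6 onward) has a %f directive in datetime.strptime() that accepts one to six digits and stores them as an integer microsecond field. The snippet below only illustrates that behaviour with the sample timestamp from this thread; it says nothing about what time.strptime() itself should do.

```python
from datetime import datetime

# %f consumes 1-6 digits after whatever separator the format names and
# right-pads them to microseconds, so "704" becomes 704000.
dotted = datetime.strptime("13:02:00.704", "%H:%M:%S.%f")
comma = datetime.strptime("13:02:00,704", "%H:%M:%S,%f")

# The fractional part is an integer field, so comparison is exact.
print(dotted.microsecond)
print(dotted == comma)
```

Because the separator is an ordinary literal in the format string, the same directive covers both the period and the logging module's comma.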

Skip Montanaro wrote:
Right, it's a struct_time object; just force of habit to call it a time tuple. And I technically don't see why a fractional second attribute could not be added that is not represented in the tuple. But I personally would like to see struct_time eliminated in Py3k and replaced with datetime usage. My wish is to have the 'time' module stripped down to only the bare essentials that just don't fit in datetime and push everyone to use datetime for most things.
Why is that incorrect? -Brett

On Fri, Jan 14, 2005, Brett C. wrote:
Because of people doing things like

    year, month, day, hour, min, sec, junk, junk, junk = time.localtime()

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming, is not worth knowing." --Alan Perlis

On 2005 Jan 14, at 19:11, Aahz wrote:
And why would that be a problem? It would keep working just like today, assuming you're answering the "don't see why" part. From the start, we discussed fractional seconds being available only as an ATTRIBUTE of a struct_time, not an ITEM (== iteration on a struct_time will keep working just like now).

Alex
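The attribute-versus-item distinction can be seen in later Python 3 releases, where struct_time carries tm_zone and tm_gmtoff as attributes that are not part of the nine-item sequence. A small sketch, assuming Python 3.6 or newer (where those attributes exist on every platform):

```python
import time

st = time.localtime()

# Unpacking still sees exactly nine items, as in Aahz's example.
year, month, day, hour, minute, sec, wday, yday, isdst = st

# Extra information rides along as attributes only; it is not an item.
zone = st.tm_zone
offset = st.tm_gmtoff
print(len(st), zone, offset)
```

A fractional-seconds attribute added in the same style would leave code that unpacks or iterates over the result untouched.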

On Fri, Jan 14, 2005, Alex Martelli wrote:
Uh, I missed the second "not" in Brett's first sentence of second paragraph. Never mind! </litella>

--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/

"19. A language that doesn't affect the way you think about programming, is not worth knowing." --Alan Perlis

>> I realize the %4N notation is distasteful, but without it I think you
>> will have trouble parsing something like
>>
>> 13:02:00.704
>>
>> What would be the format string? %H:%M:%S.%N would be incorrect.

Brett> Why is that incorrect?

Because "704" represents the number of milliseconds, not the number of nanoseconds.

I'm sure that in some applications people are interested in extremely short time scales. Writing out hours, minutes and seconds when all you are concerned with are small fractions of seconds (think high energy physics) would be a waste. In those situations log entries like

    704 saw proton
    705 proton hit neutron
    706 saw electron headed toward Saturn

might make perfect sense. Parsing the time field entirely within time.strptime would be at least clumsy if you couldn't tell it the scale of the numbers you're dealing with. Parsing with %N, %3N or %6N would give different values (nanoseconds, milliseconds or microseconds).

Skip
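To make the scale argument concrete, here is a minimal sketch of how the same digits would be interpreted under the three directives, following Skip's reading of the Perl notation (%N as nanoseconds, %<digit>N as units of 10**-digit seconds). The function name fraction_to_nanoseconds and its width parameter are hypothetical and exist only for this illustration.

```python
def fraction_to_nanoseconds(digits, width=9):
    """Scale an integer fraction field to nanoseconds.

    `width` plays the role of the digit in a Perl-style %<digit>N:
    the digits count units of 10**-width seconds. Hypothetical helper.
    """
    if not digits.isdigit():
        raise ValueError("expected only digits, got %r" % digits)
    if not 1 <= width <= 9:
        raise ValueError("width must be between 1 and 9")
    return int(digits) * 10 ** (9 - width)


# The same text read as %N, %3N and %6N respectively.
for width in (9, 3, 6):
    print(width, fraction_to_nanoseconds("704", width))
```

Everything stays in integers, so "704" read as milliseconds and "704000" read as microseconds come out as the same value.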

Skip Montanaro wrote:
Fine, but couldn't you also do a pass over the data after extraction to get to the actual result you want (so parse, and take the millisecond value and multiply by the proper scale)? This feels like it is YAGNI, or at least KISS. If you want to handle milliseconds because of the logging module, fine. But trying to deal with all possible time parsing possibilities is painful and usually not needed.

Personally I am more inclined to add a new directive that acts as %S but allows for an optional decimal point, comma or the current locale's separator if it isn't one of those two, which will handle the logging package's optional decimal output ('\d+([,.%s]\d+)?' % locale.localeconv()['decimal_point']). It also doesn't break any existing code.

And an issue I forgot to mention for all of this is that it will break symmetry with time.strftime(). If symmetry is kept then an extra step in strftime will need to be handled, since whatever solution we do will not match the C spec anymore.

-Brett
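The pattern Brett describes can be tried out on its own. The following is a sketch only: the function name seconds_with_fraction_re and the group names S and frac are assumptions for the example, and re.escape() is added as a precaution that is not in Brett's expression.

```python
import locale
import re


def seconds_with_fraction_re():
    """Build a pattern for seconds with an optional fractional part."""
    point = locale.localeconv()["decimal_point"]
    # re.escape keeps an unusual locale separator from acting as a
    # metacharacter inside the character class.
    return re.compile(
        r"(?P<S>\d+)(?:[,.%s](?P<frac>\d+))?" % re.escape(point))


pattern = seconds_with_fraction_re()
for text in ("02", "02.704", "02,704"):
    m = pattern.fullmatch(text)
    print(text, m.group("S"), m.group("frac"))
```

Plain seconds still match with the fraction group left empty, which is why a directive built this way would not break existing format strings.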

Everyone went silent on this topic. Does this mean people just stopped caring (which I doubt since I know Skip wants this bad enough to bring it up every so often)? Was it the issue of symmetry with strftime? I am willing to add this (albeit the simple way I proposed in my last email on this thread) but I obviously don't want to bother if no one wants it or likes my proposed solution. -Brett

Brett> Everyone went silent on this topic. Does this mean people just
Brett> stopped caring (which I doubt since I know Skip wants this bad
Brett> enough to bring it up every so often)? Was it the issue of
Brett> symmetry with strftime?

I have a patch to do strptime() fractional seconds, but stumbled on the reverse direction (making strftime() accept fractional seconds). I'll submit a patch with what I have later today. I have to catch a train just now.

Skip

participants (6)
- Aahz
- Alex Martelli
- Barry Warsaw
- Brett C.
- Mark Russell
- Skip Montanaro