strptime without second argument as an inverse to __str__
What do you think about having `datetime.strptime`, when called without a `format` for the second argument, be a precise inverse of `datetime.__str__`? This is because I don't currently see an obvious way to get an inverse of `datetime.__str__`, and this seems like an okay place to put it.
On Mon, Aug 04, 2014 at 09:17:14AM -0700, Ram Rachum wrote:
What do you think about having `datetime.strptime`, when called without a `format` for the second argument, be a precise inverse of `datetime.__str__`? This is because I don't currently see an obvious way to get an inverse of `datetime.__str__`, and this seems like an okay place to put it.
Is str(datetime) guaranteed to use a specific format, or is that an implementation detail? -- Steven
On Mon, Aug 4, 2014 at 2:15 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Is str(datetime) guaranteed to use a specific format, or is that an implementation detail?
Why is this question relevant for Ram's proposal? As long as str(datetime) is guaranteed to be different for different datetimes, one should be able to implement an inverse. The inverse function should accept ISO format (with either ' ' or 'T' separator) and str(datetime) if it is different in the implementation. I agree that the datetime type should provide a simple way to construct instances from well-formatted strings, but I don't think datetime.strptime() is a good choice of name. I would much rather have date(str), time(str) and datetime(str) constructors.
On Mon, Aug 4, 2014 at 1:40 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Why is this question relevant for Ram's proposal?
It would seem to have some impact on how hard it is to create a general inverse. Will one format work for all platforms ("one and done"), or will the inverse implementation potentially have to be updated as new platforms come into (or go out of) existence? Also, would the creation of such an inverse lock the implementation into existing format(s)? For example, when fed a datetime object, the CSV module will stringify it for output. If I create a CSV file with one version of Python, then read it into another version of Python (or on a different platform), it's not unreasonable that I would expect one-argument strptime() to parse it. That would lock you into a specific format. If only one format exists today, no big deal, bless it and move on. Skip
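A minimal sketch of Skip's CSV scenario in Python 3: the csv writer stringifies non-string values with str(), so the value comes back from the reader as a plain string that has to be parsed with a format the reader must already know. The single format used here is an assumption that only fits datetimes whose microsecond field is zero.

```python
import csv
import io
from datetime import datetime

# microsecond == 0, so str() omits the fractional part
stamp = datetime(2014, 8, 4, 9, 17, 14)

buf = io.StringIO()
csv.writer(buf).writerow(["event", stamp])  # csv calls str() on non-string values

buf.seek(0)
row = next(csv.reader(buf))
text = row[1]  # comes back as a plain string, not a datetime

# Without a one-argument inverse, the reader has to know the format.
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
print(repr(text), parsed == stamp)
```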
On Mon, Aug 4, 2014 at 3:00 PM, Skip Montanaro <skip@pobox.com> wrote:
Why is this question relevant for Ram's proposal?
It would seem to have some impact on how hard it is to create a general inverse. Will one format work for all platforms ("one and done"), or will the inverse implementation potentially have to be updated as new platforms come into (or go out of) existence?
I think the str(datetime) format is an implementation detail to the same extent as str(int) or str(float) is. In the past, these variations did not prevent providing a (sometimes imperfect) inverse.
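The analogy can be checked in Python 3: int() inverts str() for integers and float() inverts repr() for finite floats. NaN shows the kind of "sometimes imperfect" corner Alexander mentions: it round-trips to a NaN, but NaN never compares equal to itself.

```python
import math

# int() inverts str() for integers, including negatives and big ints.
for n in (0, -42, 10**30):
    assert int(str(n)) == n

# float() inverts repr() for finite floats.
for x in (0.1, -2.5e-300, 1e22):
    assert float(repr(x)) == x

# The "imperfect" corner: NaN round-trips to a NaN, but NaN != NaN.
nan = float("nan")
assert math.isnan(float(repr(nan)))
assert float(repr(nan)) != nan
print("round trips hold")
```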
On Mon, Aug 4, 2014 at 2:23 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
I think the str(datetime) format is an implementation detail to the same extent as str(int) or str(float) is. In the past, these variations did not prevent providing a (sometimes imperfect) inverse.
I took a look at whatever version of CPython I have lying about (some variant of 2.7). str(datetime) seems to be well-defined as calling isoformat with " " as the separator. The only caveat is that if the microsecond field is zero, it's omitted. If that behavior holds true in 3.x, only two cases require consideration:

    %Y-%m-%d %H:%M:%S
    %Y-%m-%d %H:%M:%S.%f

Skip
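A minimal sketch of what Skip describes, assuming naive datetimes only: try the format with %f first and fall back to the one without it. The helper name parse_str is hypothetical, not an existing API.

```python
from datetime import datetime

# The two shapes str() produces for a naive datetime.
_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

def parse_str(text):
    """Hypothetical inverse of str(datetime) for naive datetimes."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # shape did not match; try the next one
    raise ValueError("not a str(datetime) value: %r" % (text,))

for d in (datetime(2014, 8, 4, 13, 40), datetime(2014, 8, 4, 13, 40, 0, 123456)):
    assert parse_str(str(d)) == d
```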
On 04.08.2014 22:14, Skip Montanaro wrote:
I took a look at whatever version of CPython I have lying about (some variant of 2.7). str(datetime) seems to be well-defined as calling isoformat with " " as the separator. The only caveat is that if the microsecond field is zero, it's omitted.
If that behavior holds true in 3.x, only two cases require consideration:
It does hold true in 3.x, but the documented behavior is slightly more complex (I assume also in 2.x):

    datetime.__str__()
        For a datetime instance d, str(d) is equivalent to d.isoformat(' ').

    datetime.isoformat(sep='T')
        Return a string representing the date and time in ISO 8601 format,
        YYYY-MM-DDTHH:MM:SS.mmmmmm or, if microsecond is 0, YYYY-MM-DDTHH:MM:SS

        If utcoffset() does not return None, a 6-character string is appended,
        giving the UTC offset in (signed) hours and minutes:
        YYYY-MM-DDTHH:MM:SS.mmmmmm+HH:MM or, if microsecond is 0,
        YYYY-MM-DDTHH:MM:SS+HH:MM

        The optional argument sep (default 'T') is a one-character separator,
        placed between the date and time portions of the result.
    %Y-%m-%d %H:%M:%S
    %Y-%m-%d %H:%M:%S.%f
=> plus timezone versions of the above. Wolfgang
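A sketch covering all four cases, assuming Python 3.7 or later (where strptime's %z accepts a colon in the offset, as in +02:00). The loop prints what str() actually produces for each shape and then parses it back; parse_str is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Aware formats first: a naive format would reject the offset anyway.
_FORMATS = (
    "%Y-%m-%d %H:%M:%S.%f%z",
    "%Y-%m-%d %H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%d %H:%M:%S",
)

def parse_str(text):
    """Hypothetical inverse of str(datetime), including UTC offsets."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("not a str(datetime) value: %r" % (text,))

tz = timezone(timedelta(hours=2))
samples = [
    datetime(2014, 8, 4, 22, 56, 56),
    datetime(2014, 8, 4, 22, 56, 56, 250000),
    datetime(2014, 8, 4, 22, 56, 56, tzinfo=tz),
    datetime(2014, 8, 4, 22, 56, 56, 250000, tzinfo=tz),
]
for d in samples:
    print(str(d))
    assert parse_str(str(d)) == d
```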
On Mon, Aug 04, 2014 at 10:56:56PM +0200, Wolfgang Maier wrote: [...]
it does hold true in 3.x, but the documented behavior is slightly more complex (I assume also in 2.x):
datetime.__str__() For a datetime instance d, str(d) is equivalent to d.isoformat(' ').
Since str(d) is documented to use a well-defined format, I agree that it makes sense to make the second argument to d.strptime optional, and default to that same format. The concern I had was the sort of scenario Skip suggested: I might write out a datetime object as a string on one machine, where the format is X, and read it back elsewhere, where the format is Y, leading to at best an exception and at worst incorrect data. +1 on the suggestion. -- Steven
On 05.08.2014 03:39, Steven D'Aprano wrote:
Since str(d) is documented to use a well-defined format, I agree that it makes sense to make the second argument to d.strptime optional, and default to that same format. The concern I had was the sort of scenario Skip suggested: I might write out a datetime object as a string on one machine, where the format is X, and read it back elsewhere, where the format is Y, leading to at best an exception and at worst incorrect data.
+1 on the suggestion.
After looking a bit into the code of the datetime module, I am not convinced anymore that strptime() is the right place for the functionality, for the following reasons:

1) strptime already has a clear counterpart and that's strftime.

2) strftime/strptime use explicit format strings, not any more sophisticated parsing (as would be required to parse the different formats that datetime.__str__ can produce), and they try, intentionally, to mimic the behavior of their C equivalents.

In other words, strftime/strptime have a very clear underlying concept, which IMO should not be given up just because we are trying to stuff some extra functionality into them.

That said, I still think that the basic idea - being able to reverse-parse the output of datetime.__str__ - is right.

I would suggest that a better place for this is an additional classmethod constructor (the datetime class already has quite a number of them). Maybe fromisostring() could be a suitable name? With this you could even pass an extra argument for the date-time separator just like with the current isoformat. This constructor would then be more like a counterpart to datetime.isoformat(), but it could simply be documented that calling it as fromisostring(datestring, sep=" ") can be used to parse strings written with str(datetime).

-1 on the specifics of the proposal, +1 on the general idea.
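A sketch of the alternate-constructor idea, assuming a subclass so it runs on any Python 3. The name fromisostring and its sep parameter follow Wolfgang's suggestion and are hypothetical, not part of the datetime API. Only naive values are handled, to keep it short.

```python
from datetime import datetime

class ParseableDatetime(datetime):
    """datetime subclass with a hypothetical fromisostring() constructor."""

    @classmethod
    def fromisostring(cls, text, sep="T"):
        # Mirror isoformat(sep): fractional seconds appear only if nonzero.
        base = "%Y-%m-%d" + sep + "%H:%M:%S"
        for fmt in (base + ".%f", base):
            try:
                return cls.strptime(text, fmt)
            except ValueError:
                continue
        raise ValueError("does not match isoformat(sep=%r): %r" % (sep, text))

d = datetime(2014, 8, 5, 23, 22, 0, 500)
assert ParseableDatetime.fromisostring(d.isoformat()) == d
# str(d) is d.isoformat(" "), so sep=" " inverts str().
assert ParseableDatetime.fromisostring(str(d), sep=" ") == d
```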
On 05.08.2014 23:22, Wolfgang Maier wrote:
I would suggest that a better place for this is an additional classmethod constructor (the datetime class already has quite a number of them). Maybe fromisostring() could be a suitable name?
Maybe rather fromisoformat(), to stay analogous with the formatting method?
With this you could even pass an extra argument for the date-time separator just like with the current isoformat. This constructor would then be more like a counterpart to datetime.isoformat(), but it could simply be documented that calling it as fromisostring(datestring, sep=" ") can be used to parse strings written with str(datetime).
-1 on the specifics of the proposal, +1 on the general idea.
+1 for this rating.
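For what it's worth, a constructor under this name exists in the standard library from Python 3.7 on: datetime.fromisoformat() accepts the output of isoformat() with any single-character separator, so it also inverts str(). A minimal round-trip check, assuming Python 3.7 or later:

```python
from datetime import datetime, timedelta, timezone

samples = [
    datetime(2014, 8, 5, 23, 22),
    datetime(2014, 8, 5, 23, 22, 0, 123456),
    datetime(2014, 8, 5, 23, 22, tzinfo=timezone(timedelta(hours=-7))),
]
for d in samples:
    # str(d) uses " " as the separator; isoformat() uses "T".
    assert datetime.fromisoformat(str(d)) == d
    assert datetime.fromisoformat(d.isoformat()) == d
print("fromisoformat inverts both str() and isoformat()")
```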
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Aug 5, 2014, at 14:22, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
After looking a bit into the code of the datetime module, I am not convinced anymore that strptime() is the right place for the functionality for the following reasons:
1) strptime already has a clear counterpart and that's strftime.
2) strftime/strptime use explicit format strings, not any more sophisticated parsing (as would be required to parse the different formats that datetime.__str__ can produce), and they try, intentionally, to mimic the behavior of their C equivalents.
In other words, strftime/strptime have a very clear underlying concept, which IMO should not be given up just because we are trying to stuff some extra functionality into them.
What if strftime _also_ allowed the format string to be omitted, in which case it would produce the same format as str? Then they would remain perfect inverses.
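A sketch of Andrew's variant, assuming a hypothetical subclass (the real strftime() requires its format argument): with the format omitted, strftime() falls back to str() and strptime() to the matching formats, so the pair stays symmetric. Only naive datetimes are handled here.

```python
from datetime import datetime

class SymmetricDatetime(datetime):
    """Hypothetical: strftime/strptime with an optional format."""

    _STR_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

    def strftime(self, fmt=None):
        if fmt is None:
            return str(self)  # same text as isoformat(" ")
        return super().strftime(fmt)

    @classmethod
    def strptime(cls, text, fmt=None):
        if fmt is not None:
            return super().strptime(text, fmt)
        for candidate in cls._STR_FORMATS:
            try:
                return super().strptime(text, candidate)
            except ValueError:
                continue
        raise ValueError("not in str() format: %r" % (text,))

d = SymmetricDatetime(2014, 8, 5, 14, 35, 0, 42)
assert SymmetricDatetime.strptime(d.strftime()) == d
assert d.strftime("%H:%M") == "14:35"
```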
That said, I still think that the basic idea - being able to reverse-parse the output of datetime.__str__ - is right.
I would suggest that a better place for this is an additional classmethod constructor (the datetime class already has quite a number of them). Maybe fromisostring() could be a suitable name? With this you could even pass an extra argument for the date-time separator just like with the current isoformat. This constructor would then be more like a counterpart to datetime.isoformat(), but it could simply be documented that calling it as fromisostring(datestring, sep=" ") can be used to parse strings written with str(datetime).
Wouldn't you expect a method called fromisostring to be able to parse any valid ISO string, especially given that there are third-party libs with functions named fromisoformat that do exactly that, and people suggest adding one of them to the stdlib every few months? What you want to get across is that this function parses the default Python representation of datetimes; the fact that it happens to be a subset of ISO format doesn't seem as relevant here. I like the idea of a new alternate constructor, I'm just not crazy about the name.
-1 on the specifics of the proposal, +1 on the general idea.
On Tue, Aug 5, 2014 at 11:35 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
What if strftime _also_ allowed the format string to be omitted, in which case it would produce the same format as str? Then they would remain perfect inverses.
+1
Wouldn't you expect a method called fromisostring to be able to parse any valid ISO string, especially given that there are third-party libs with functions named fromisoformat that do exactly that, and people suggest adding one of them to the stdlib every few months?
What you want to get across is that this function parses the default Python representation of datetimes; the fact that it happens to be a subset of ISO format doesn't seem as relevant here. I like the idea of a new alternate constructor, I'm just not crazy about the name.
Let me just note this, since it hasn't been said here yet: When people say "iso" in the context of datetimes, they usually mean RFC 3339. As Wikipedia can tell you, ISO 8601 is a big complicated non-public specification under which today can be written as:

- 2014-08-05
- 2014-W32-2
- 2014-217

... and by now I can see why there's no ISO 8601 parser in the stdlib. RFC 3339, on the other hand, specifies one specific variant of ISO 8601: the one we're all used to, and which datetime's isoformat and __str__ return. (Just about the only exception is that to be compatible with ISO 8601, it still specifies "T"/"t" for the separator and graciously lets people agree on space.)
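Petr's three spellings can be checked with the standard library, assuming Python 3.6 or later (which added the %G, %V and %u directives to strptime). The calendar date, the ISO week date and the ordinal date all name 5 August 2014, but each needs its own format string, which hints at why a general ISO 8601 parser is more work than inverting str().

```python
from datetime import date, datetime

calendar_form = datetime.strptime("2014-08-05", "%Y-%m-%d").date()
week_form = datetime.strptime("2014-W32-2", "%G-W%V-%u").date()  # ISO year, week, weekday
ordinal_form = datetime.strptime("2014-217", "%Y-%j").date()      # year, day of year

assert calendar_form == week_form == ordinal_form == date(2014, 8, 5)

# The same facts from the other direction.
assert tuple(date(2014, 8, 5).isocalendar()) == (2014, 32, 2)
assert date(2014, 8, 5).timetuple().tm_yday == 217
```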
On Aug 5, 2014, at 14:46, Petr Viktorin <encukou@gmail.com> wrote:
Let me just note this, since it hasn't been said here yet:
When people say "iso" in the context of datetimes, they usually mean RFC 3339.
RFC 3339 is still more complicated than just reversing Python's str or isoformat. IIRC (it's hard to check on my phone), it mandates that parsers should accept 2-digit years (including 3-digit or semicolon-and-two-digit years), lowercase T and Z, missing "-", and other things that you shouldn't generate but some code might. That being said, it's still obviously easier to write an RFC 3339 parser than a full ISO 8601 parser, and as long as someone is willing to write it (with sufficient tests) I don't see any problem with the stdlib having one. But I don't know that it should be called "fromisostring". "fromisoformat" isn't quite as bad, since at least it implies that it's the inverse of the same type's "isoformat", but it still seems misleading.
Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> writes:
On Aug 5, 2014, at 14:46, Petr Viktorin <encukou@gmail.com> wrote:
When people say "iso" in the context of datetimes, they usually mean RFC 3339.
RFC 3339 is still more complicated than just reversing Python's str or isoformat. IIRC (it's hard to check on my phone), it mandates that parsers should accept 2-digit years (including 3-digit or semicolon-and-two-digit years), lowercase T and Z, missing "-", and other things that you shouldn't generate but some code might.
Please, don't spread misinformation. Among the explicit rfc 3339 design goals are simplicity and human readability. Just read http://tools.ietf.org/html/rfc3339 (for an rfc it is relatively short and readable). Here's the full ABNF:

    date-fullyear   = 4DIGIT
    date-month      = 2DIGIT  ; 01-12
    date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                              ; month/year
    time-hour       = 2DIGIT  ; 00-23
    time-minute     = 2DIGIT  ; 00-59
    time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                              ; rules
    time-secfrac    = "." 1*DIGIT
    time-numoffset  = ("+" / "-") time-hour ":" time-minute
    time-offset     = "Z" / time-numoffset

    partial-time    = time-hour ":" time-minute ":" time-second
                      [time-secfrac]
    full-date       = date-fullyear "-" date-month "-" date-mday
    full-time       = partial-time time-offset

    date-time       = full-date "T" full-time

Example: 1937-01-01T12:00:27.87+00:20

The format is so simple that people just write ad hoc parsers using strptime() without installing any formal rfc3339 module (if it even exists).
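A sketch of a parser for the date-time production above, written as a Python regular expression. Assumptions: 'T'/'Z' are also accepted in lower case and a space separator is allowed (both per the RFC's own notes on case and separators), fractional seconds beyond microseconds are truncated, and a leap second (second 60) is rejected because datetime cannot represent it. The name parse_rfc3339 is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"            # full-date
    r"[Tt ]"                              # separator: "T", "t", or agreed space
    r"(\d{2}):(\d{2}):(\d{2})(\.\d+)?"    # partial-time with optional secfrac
    r"([Zz]|[+-]\d{2}:\d{2})",            # time-offset
    re.ASCII,
)

def parse_rfc3339(text):
    m = _RFC3339.fullmatch(text)
    if m is None:
        raise ValueError("not an RFC 3339 date-time: %r" % (text,))
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    frac, offset = m.group(7), m.group(8)
    # datetime holds microseconds, so keep at most 6 fractional digits.
    micro = int((frac[1:] + "000000")[:6]) if frac else 0
    if offset in ("Z", "z"):
        tz = timezone.utc
    else:
        sign = -1 if offset[0] == "-" else 1
        tz = timezone(sign * timedelta(hours=int(offset[1:3]), minutes=int(offset[4:6])))
    # Raises ValueError for out-of-range fields, including second=60.
    return datetime(year, month, day, hour, minute, second, micro, tzinfo=tz)

d = parse_rfc3339("1937-01-01T12:00:27.87+00:20")
assert d.microsecond == 870000
assert d.utcoffset() == timedelta(minutes=20)
```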
That being said, it's still obviously easier to write an RFC 3339 parser than a full ISO 8601 parser, and as long as someone is willing to write it (with sufficient tests) I don't see any problem with the stdlib having one. But I don't know that it should be called "fromisostring".
"fromisoformat" isn't quite as bad, since at least it implies that it's the inverse of the same type's "isoformat", but it still seems misleading.
-- Akira
On Aug 7, 2014, at 5:35, Akira Li <4kir4.1i@gmail.com> wrote:
RFC 3339 is still more complicated than just reversing Python's str or isoformat. IIRC (it's hard to check on my phone), it mandates that parsers should accept 2-digit years (including 3-digit or semicolon-and-two-digit years), lowercase T and Z, missing "-", and other things that you shouldn't generate but some code might.
Please, don't spread misinformation.
Among the explicit rfc 3339 design goals are simplicity and human readability.
Just read http://tools.ietf.org/html/rfc3339 (for an rfc it is relatively short and readable).
OK, I just read it. Among other things:
NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this syntax may alternatively be lower case "t" or "z" respectively.

This date/time format may be used in some environments or contexts that distinguish between the upper- and lower-case letters 'A'-'Z' and 'a'-'z' (e.g. XML). Specifications that use this format in such environments MAY further limit the date/time syntax so that the letters 'T' and 'Z' used in the date/time syntax must always be upper case. Applications that generate this format SHOULD use upper case letters.

NOTE: ISO 8601 defines date and time separated by "T". Applications using this syntax may choose, for the sake of readability, to specify a full-date and full-time separated by (say) a space character.
(And there's also a whole section of interpreting "legacy"/"deprecated" 2-digit years and how you should handle them.) So, is the RFC "spreading misinformation" about itself?
The format is so simple that people just write adhoc parsers using strptime() without installing any formal rfc3339 module (if it even exists).
Sure, and people also do that and call it an ISO parser. If it can't interoperate with everything compliant applications may generate (much less deprecated formats the standard doesn't allow you to generate but mandates how you parse and interpret), it's not accurate to call it an RFC 3339 parser (at least not in a general-purpose library). As I said before, it's still certainly much easier to write an RFC 3339 parser than an ISO 8601 parser, but it's not as trivial as you're implying.
Andrew Barnert <abarnert@yahoo.com> writes:
On Aug 7, 2014, at 5:35, Akira Li <4kir4.1i@gmail.com> wrote:
RFC 3339 is still more complicated than just reversing Python's str or isoformat. IIRC (it's hard to check on my phone), it mandates that parsers should accept 2-digit years (including 3-digit or semicolon-and-two-digit years), lowercase T and Z, missing "-", and other things that you shouldn't generate but some code might.
Please, don't spread misinformation.
Among the explicit rfc 3339 design goals are simplicity and human readability.
Just read http://tools.ietf.org/html/rfc3339 (for an rfc it is relatively short and readable). ... (And there's also a whole section of interpreting "legacy"/"deprecated" 2-digit years and how you should handle them.)
So, is the RFC "spreading misinformation" about itself?
You are *obviously* wrong for the rfc 3339 Internet Date/Time Format itself (used by __str__, isoformat -- relevant to the current topic). http://tools.ietf.org/html/rfc3339

You can be *subtly* wrong for the rfc as a whole.

The full ABNF from my previous message contains the date-fullyear definition. "The following profile of ISO 8601 [ISO8601] dates SHOULD be used in new protocols on the Internet" [rfc3339]:

    date-fullyear = 4DIGIT

It means that the year SHOULD be *exactly* 4 digits, i.e., the rfc 3339 Internet Date/Time Format uses only 4-digit years.

I don't know what "mandates that parsers should accept 2-digit years" [Andrew Barnert] means (does "mandates" mean SHOULD or MUST here?) but it seems inspired by: "Internet Protocols MUST generate four digit years in dates. The use of 2-digit years is deprecated. If a 2-digit year is received, it should be accepted ONLY if an incorrect interpretation will not cause a protocol or processing failure" [rfc3339], and the words MUST and SHOULD are well-defined [rfc2119]. Do you see that a 2-digit year MUST or SHOULD be accepted? (I don't see it.)

'missing "-"' [Andrew Barnert] seems like an obvious mistake. All punctuation except the optional time-secfrac (fraction of a second) is mandatory.

"and other things that you shouldn't generate but some code might." [Andrew Barnert] What things? Could you be more specific? There is not much room in the format. Redundant information is not included.
The format is so simple that people just write adhoc parsers using strptime() without installing any formal rfc3339 module (if it even exists).
Sure, and people also do that and call it an ISO parser.
If it can't interoperate with everything compliant applications may generate (much less deprecated formats the standard doesn't allow you to generate but mandates how you parse and interpret), it's not accurate to call it an RFC 3339 parser (at least not in a general-purpose library).
As I said before, it's still certainly much easier to write an RFC 3339 parser than an ISO 8601 parser, but it's not as trivial as you're implying.
Compared to the full ISO 8601 format (I don't know whether it can be parsed unambiguously), rfc 3339 (a conformant subset of the ISO 8601 extended format) is simple by design.

It *is* accurate to call it (with only 4-digit year support) an rfc 3339 parser in a general purpose library: compliant software MUST generate 4-digit years, and the Internet Date/Time Format SHOULD contain ONLY 4-digit years (show quotes from the rfc that contain MUST, SHOULD, etc. that say otherwise if you disagree).

To be fair, the rfc *does not forbid* 2-digit years outright in *all possible cases*. Regardless of the rfc language nuances, 2-digit years are harmful in practice -- different software may interpret them differently: software that is aware of rfc 3339 MUST generate 4-digit years, while software that is not aware of rfc 3339 can interpret a 2-digit year differently. If an error is possible, a 2-digit year should not be used: "it should be accepted ONLY if an incorrect interpretation will not cause a protocol or processing failure" [rfc3339]

P.S. ...
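How software interprets a 2-digit year is a matter of convention, and Python has one of its own: strptime's %y uses a fixed pivot, mapping 69-99 to 1969-1999 and 00-68 to 2000-2068 (the POSIX convention documented for the time module). A small demonstration; year_of is a hypothetical helper:

```python
from datetime import datetime

def year_of(two_digits):
    """Hypothetical helper: the year strptime assigns to a %y value."""
    return datetime.strptime(two_digits, "%y").year

# Python's pivot: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
assert year_of("68") == 2068
assert year_of("69") == 1969
# "37" therefore becomes 2037, not the 1937 of the RFC's example.
assert year_of("37") == 2037
```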
OK, I just read it. Among other things:
NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this syntax may alternatively be lower case "t" or "z" respectively.

This date/time format may be used in some environments or contexts that distinguish between the upper- and lower-case letters 'A'-'Z' and 'a'-'z' (e.g. XML). Specifications that use this format in such environments MAY further limit the date/time syntax so that the letters 'T' and 'Z' used in the date/time syntax must always be upper case. Applications that generate this format SHOULD use upper case letters.

NOTE: ISO 8601 defines date and time separated by "T". Applications using this syntax may choose, for the sake of readability, to specify a full-date and full-time separated by (say) a space character.
What is the point of the copy-paste? Does handling lowercase "t", "z", and a space complicate the parsing in a meaningful manner? Imagine a parser that supports sep='T' (default for isoformat()) and sep=' ' (space for __str__). How hard do you think it would be to extend such a parser to support 't' as a separator? -- Akira
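One way to answer Akira's question, as a sketch assuming Python 3.7 or later (where %z accepts 'Z' and colon-separated offsets): upper-case the text so 't' and 'z' become 'T' and 'Z', normalize a space separator to 'T', then hand off to strptime. The helper name parse_internet_datetime is hypothetical, and it inherits strptime's limit of 6 fractional digits, so it is not a complete RFC 3339 parser.

```python
from datetime import datetime, timedelta, timezone

def parse_internet_datetime(text):
    """Hypothetical: accept 'T', 't' or ' ' as separator and 'Z'/'z' as UTC."""
    text = text.upper()  # only letters change; digits and punctuation do not
    if len(text) > 10 and text[10] == " ":
        text = text[:10] + "T" + text[11:]
    for fmt in ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unsupported date-time: %r" % (text,))

expected = datetime(2014, 8, 7, 22, 43, tzinfo=timezone.utc)
for variant in ("2014-08-07T22:43:00Z", "2014-08-07t22:43:00z", "2014-08-07 22:43:00+00:00"):
    assert parse_internet_datetime(variant) == expected
```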
Mixing up responses to a previous email you already responded to with the new one makes it harder to reply, but I'll try. On Thursday, August 7, 2014 10:43 PM, Akira Li <4kir4.1i@gmail.com> wrote:
So, is the RFC "spreading misinformation" about itself?
You are *obviously* wrong for the rfc 3339 Internet Date/Time Format itself (used by __str__, isoformat -- relevant to the current topic).
You accused me of "spreading misinformation" by saying that RFC 3339 is more complicated than what Python's str generates. You also suggested that people often parse RFC 3339 with a simple strptime. I pointed out that the RFC itself clearly defines something more complicated than what Python's str generates, and that it can't be fully parsed by a simple strptime, and then added a parenthetical remark about support for 2-digit years. You cut out everything but the parenthetical remark, and replied to that as if it was the whole point of my message. And you even responded to nothing but the parenthetical 2-digit-year comment in replies to completely separate sections of the message. I have no idea how to respond to that except to say: read it again. The fact that a compliant parser should accept any of "T", "t", or " " as a separator already makes it impossible to parse with strptime, and more complicated than parsing Python's str output. I don't see how you can dispute that, or how it's "spreading misinformation" to point that out.
"and other things that you shouldn't generate but some code
might." [Andrew Barnert]
What things? Could you be more specific?
I was specific, and you chopped it out of my message, or moved it to a different part of the message.
The format is so simple that people just write adhoc parsers using strptime() without installing any formal rfc3339 module (if it even exists).
Sure, and people also do that and call it an ISO parser.
If it can't interoperate with everything compliant applications may generate (much less the deprecated formats the standard doesn't allow you to generate but mandates how you parse and interpret), it's not accurate to call it an RFC 3339 parser (at least not in a general-purpose library).
As I said before, it's still certainly much easier to write an RFC 3339 parser than an ISO 8601 parser, but it's not as trivial as you're implying.
Compared to the full ISO 8601 format (I don't know whether it can be parsed unambiguously), rfc 3339 (a conformant subset of the ISO 8601 extended format) is simple by design.
Since you're arguing here exactly what I said, in the very paragraph you just quoted—"it's certainly much easier to write an RFC 3339 parser than an ISO 8601 parser"—I don't know why you think you have to convince me of that fact. But again, that doesn't mean that a trivial strptime call is sufficient for a general-purpose library that claims to be a compliant RFC 3339 parser.
Regardless of the rfc language nuances, 2-digit years are harmful in practice
Of course. So what? If my point actually were that RFC 3339 was complicated because of 2-digit years, and if that point were true, then this rebuttal might be relevant—but in that case it would only be proving the opposite point you're trying to make, that RFC 3339 parsing is _hard_. Fortunately, that isn't my point, and your statement is just irrelevant rather than contradictory to your whole argument.
P.S.
... What is the point of the copy-paste? Does handling lowercase "t", "z", and a space complicate the parsing in a meaningful manner?
Sure. Even ignoring the fact that you're claiming that people can parse it with strptime, the very fact that many people have written code that claims to be able to parse RFC 3339, and yet doesn't handle lowercase "t" and "z", shows that there is something you can get wrong; therefore, it is not trivial.
Imagine a parser that supports sep='T' (default for isoformat()) and sep=' ' (space for __str__). How hard do you think it would be to extend such a parser to support 't' as a separator?
Once again: "it's certainly much easier to write an RFC 3339 parser than an ISO 8601 parser," but it's not completely trivial. You seem to think that this is a black-and-white issue: either RFC 3339 is a complete failure and it's actually as hard to parse as ISO 8601, or RFC 3339 is trivially parseable and no one should bother to write a parser for a subset of the language because it's just as easy to parse the whole thing. In reality, neither one of those is true. The latter may be closer to the truth than the former, but it's still wrong. In other words, it makes perfect sense to write a parser for exactly what Python generates, and not claim it as RFC 3339 compliant.
Please, don't put words in my mouth. To avoid straw-man arguments, quote me directly in context, as I do with you in my messages.
Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> writes:
Mixing up responses to a previous email you already responded to with the new one makes it harder to reply, but I'll try.
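To make the "easy but not completely trivial" point concrete, here is a minimal sketch that accepts 'T', 't' or a space between the date and time by normalizing the separator before handing the string to one fixed strptime format. The helper name parse_with_any_separator is hypothetical, and the sketch deliberately handles only whole seconds with no UTC offset and no "Z", which is exactly the kind of gap that makes a full RFC 3339 parser more work.

```python
from datetime import datetime

# Separators between the date and time parts that this sketch tolerates.
ALLOWED_SEPARATORS = ("T", "t", " ")

def parse_with_any_separator(s):
    """Parse 'YYYY-MM-DD?HH:MM:SS' where '?' is 'T', 't' or a space.

    Hypothetical helper: no fractional seconds, no UTC offset, no 'Z'.
    """
    if len(s) != 19 or s[10] not in ALLOWED_SEPARATORS:
        raise ValueError("unsupported date/time string: %r" % (s,))
    # Normalize the separator so a single strptime format covers all three.
    normalized = s[:10] + " " + s[11:]
    return datetime.strptime(normalized, "%Y-%m-%d %H:%M:%S")

print(parse_with_any_separator("2014-08-07t22:43:00"))  # 2014-08-07 22:43:00
```

Supporting offsets, "Z"/"z" and fractional seconds on top of this is where the extra cases accumulate.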
On Thursday, August 7, 2014 10:43 PM, Akira Li <4kir4.1i@gmail.com> wrote:
Andrew Barnert <abarnert@yahoo.com> writes:
On Aug 7, 2014, at 5:35, Akira Li <4kir4.1i@gmail.com> wrote:
Please, don't spread misinformation.
Among the explicit rfc 3339 design goals are simplicity and human readability.
Just read http://tools.ietf.org/html/rfc3339 (for an rfc it is relatively short and readable). ... (And there's also a whole section of interpreting "legacy"/"deprecated" 2-digit years and how you should handle them.)
So, is the RFC "spreading misinformation" about itself?
You are *obviously* wrong for the rfc 3339 Internet Date/Time Format itself (used by __str__, isoformat -- relevant to the current topic).
You accused me of "spreading misinformation" by saying that RFC 3339 is more complicated than what Python's str generates. You also suggested that people often parse RFC 3339 with a simple strptime.
You said [1]: "RFC 3339 is still more complicated than just reversing Python's str or isoformat. IIRC (it's hard to check on my phone), it mandates that parsers should accept 2-digit years (including 3-digit or semicolon-and-two-digit years), lowercase T and Z, missing "-", and other things that you shouldn't generate but some code might."
[1] https://mail.python.org/pipermail/python-ideas/2014-August/028509.html
What I consider misinformation:
"it mandates that parsers should accept 2-digit years (including 3-digit or semicolon-and-two-digit years)" [Andrew Barnert]
'missing "-"' [Andrew Barnert]
"and other things that you shouldn't generate but some code might." [Andrew Barnert]
I've already described it quote by quote in [2] with all the gory details, including careful usage of the MUST and SHOULD keywords from [rfc 2119].
[2] https://mail.python.org/pipermail/python-ideas/2014-August/028541.html
[rfc 2119] http://tools.ietf.org/html/rfc2119
I pointed out that the RFC itself clearly defines something more complicated than what Python's str generates, and that it can't be fully parsed by a simple strptime, and then added a parenthetical remark about support for 2-digit years.
It *is* obvious that the rfc is more complex than the str method e.g., the rfc supports 'T', 'Z' (default str doesn't generate them as far as I know). But all specific details (except "lowercase T and Z") you provided in the quote from [1] are wrong as described in [2]. What conclusions can you draw about the whole statement after that?
You cut out everything but the parenthetical remark, and replied to that as if it was the whole point of my message. And you even responded to nothing but the parenthetical 2-year comment in replies to completely separate sections of the message. I have no idea how to respond to that except to say: read it again.
I've read and reread [1]. I don't see what "other things" you are referring to. To avoid ambiguity, could you use a direct quote and a link (as I've demonstrated above)? -- Akira
On Aug 8, 2014, at 8:17, Akira Li <4kir4.1i@gmail.com> wrote:
It *is* obvious that the rfc is more complex than the str method e.g., the rfc supports 'T', 'Z' (default str doesn't generate them as far as I know).
This is the only point that matters here to me. If you're no longer disagreeing with it, I have no interest in continuing to argue, and I doubt anyone else has any interest in reading it. So, let me summarize and see if you have anything substantive to disagree with: People want a function to reverse __str__, possibly also handling space as a separator. Such a function is trivial to write. If it's not a fully compliant RFC 3339 parser, that's fine, and it will be useful for its intended goal, as long as it isn't called fromrfc3339string (which no one has suggested anyway, although someone did suggest fromisostring, which is what started the whole RFC 3339 side track--obviously that name shouldn't be used either).
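As a rough illustration of how small such a function can be, here is a sketch of an inverse for str(datetime) that also accepts the 'T' separator used by isoformat(). The name fromstr follows Terry's suggestion in this thread but is hypothetical here, and the sketch assumes only the shapes str() produces for whole-minute offsets: optional microseconds and an optional +HH:MM/-HH:MM offset (offsets with a seconds component are not handled). For what it's worth, Python 3.7 later added datetime.fromisoformat(), which covers this use case in the standard library.

```python
import re
from datetime import datetime, timedelta, timezone

# Shapes produced by str(dt) == dt.isoformat(" ") (and isoformat() with "T"):
#   YYYY-MM-DD HH:MM:SS[.ffffff][+HH:MM]
_STR_PATTERN = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"     # date
    r"[T ]"                        # separator: "T" (isoformat) or " " (str)
    r"(\d{2}):(\d{2}):(\d{2})"     # time
    r"(?:\.(\d{6}))?"              # optional microseconds
    r"(?:([+-])(\d{2}):(\d{2}))?"  # optional UTC offset, whole minutes only
)

def fromstr(s):
    """Hypothetical inverse of str(datetime); not a general ISO 8601 parser."""
    m = _STR_PATTERN.fullmatch(s)
    if m is None:
        raise ValueError("not a str(datetime) string: %r" % (s,))
    year, month, day, hour, minute, second, micro, sign, off_h, off_m = m.groups()
    tzinfo = None
    if sign is not None:
        offset = timedelta(hours=int(off_h), minutes=int(off_m))
        tzinfo = timezone(-offset if sign == "-" else offset)
    return datetime(int(year), int(month), int(day),
                    int(hour), int(minute), int(second),
                    int(micro) if micro else 0, tzinfo=tzinfo)
```

Under those assumptions, fromstr(str(dt)) == dt should hold for naive datetimes and for datetimes with a fixed whole-minute offset.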
On 8/5/2014 5:35 PM, Andrew Barnert wrote:
On Aug 5, 2014, at 14:22, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 05.08.2014 03:39, Steven D'Aprano wrote:
Since str(d) is documented to use a well-defined format, I agree that it makes sense to make the second argument to d.strptime optional, and default to that same format. The concern I had was the sort of scenario Skip suggested: I might write out a datetime object as a string on one machine, where the format is X, and read it back elsewhere, where the format is Y, leading to at best an exception and at worst incorrect data.
+1 on the suggestion.
After looking a bit into the code of the datetime module, I am not convinced anymore that strptime() is the right place for the functionality for the following reasons:
1) strptime already has a clear counterpart and that's strftime.
2) strftime/strptime use explicit format strings, not any more sophisticated parsing (as would be required to parse the different formats that datetime.__str__ can produce) and they try, intentionally, to mimic the behavior of their C equivalents.
In other words, strftime/strptime have a very clear underlying concept, which IMO should not be given up just because we are trying to stuff some extra-functionality into them.
What if strftime _also_ allowed the format string to be omitted, in which case it would produce the same format as str? Then they would remain perfect inverses.
That said, I still think that the basic idea - being able to reverse-parse the output of datetime.__str__ - is right.
I would suggest that a better place for this is an additional classmethod constructor (the datetime class already has quite a number of them). Maybe fromisostring() could be a suitable name? With this you could even pass an extra argument for the date-time separator just like with the current isoformat. This constructor would then be more like a counterpart to datetime.isoformat(), but it could simply be documented that calling it as fromisostring(datestring, sep=" ") can be used to parse strings written with str(datetime).
Wouldn't you expect a method called fromisostring to be able to parse any valid ISO string, especially given that there are third-party libs with functions named fromisoformat that do exactly that, and people suggest adding one of them to the stdlib every few months?
Probably yes
What you want to get across is that this function parses the default Python representation of datetimes; the fact that it happens to be a subset of ISO format doesn't seem as relevant here. I like the idea of a new alternate constructor, I'm just not crazy about the name.
Given that str(dti) (datetime instance) is conceptually dt.tostr(dti), name the inverse as dti = dt.fromstr(s). -- Terry Jan Reedy
On Aug 5, 2014, at 16:12, Terry Reedy <tjreedy@udel.edu> wrote:
On 8/5/2014 5:35 PM, Andrew Barnert wrote:
What you want to get across is that this function parses the default Python representation of datetimes; the fact that it happens to be a subset of ISO format doesn't seem as relevant here. I like the idea of a new alternate constructor, I'm just not crazy about the name.
Given that str(dti) (datetime instance) is conceptually dt.tostr(dti), name the inverse as dti = dt.fromstr(s).
Wow, now I feel stupid for not thinking of this one. +00:00:01
On 05.08.2014 23:35, Andrew Barnert wrote:
On Aug 5, 2014, at 14:22, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 05.08.2014 03:39, Steven D'Aprano wrote:
Since str(d) is documented to use a well-defined format, I agree that it makes sense to make the second argument to d.strptime optional, and default to that same format. The concern I had was the sort of scenario Skip suggested: I might write out a datetime object as a string on one machine, where the format is X, and read it back elsewhere, where the format is Y, leading to at best an exception and at worst incorrect data.
+1 on the suggestion.
After looking a bit into the code of the datetime module, I am not convinced anymore that strptime() is the right place for the functionality for the following reasons:
1) strptime already has a clear counterpart and that's strftime.
2) strftime/strptime use explicit format strings, not any more sophisticated parsing (as would be required to parse the different formats that datetime.__str__ can produce) and they try, intentionally, to mimic the behavior of their C equivalents.
In other words, strftime/strptime have a very clear underlying concept, which IMO should not be given up just because we are trying to stuff some extra-functionality into them.
What if strftime _also_ allowed the format string to be omitted, in which case it would produce the same format as str? Then they would remain perfect inverses.
Yes, but strftime without a format string would then be completely redundant with __str__ and isoformat with a " " separator, which is really quite against the one and only one way of doing things idea. Plus again, right now strftime takes an explicit format string and then generates a datetime string with exactly this and only this format. In the optional format string scenario, it would have to generate slightly differently formatted output depending on whether microseconds and/or timezone information are present. So, like for strptime, this would change the very clearly defined current behavior into a mix of things, unnecessarily.
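The point that str() output changes shape depending on microseconds and timezone information is easy to check with the standard library. The sketch below shows that one fixed strptime format accepts one of those shapes and rejects the others; parses_with is a small hypothetical helper introduced only for this illustration.

```python
from datetime import datetime, timezone

plain = datetime(2014, 8, 6, 1, 35)
with_micro = datetime(2014, 8, 6, 1, 35, 0, 250)
aware = datetime(2014, 8, 6, 1, 35, tzinfo=timezone.utc)

print(str(plain))       # 2014-08-06 01:35:00
print(str(with_micro))  # 2014-08-06 01:35:00.000250
print(str(aware))       # 2014-08-06 01:35:00+00:00

FMT = "%Y-%m-%d %H:%M:%S"  # matches only the first shape

def parses_with(fmt, s):
    """Return True if datetime.strptime accepts s with the given format."""
    try:
        datetime.strptime(s, fmt)
    except ValueError:
        return False
    return True

for d in (plain, with_micro, aware):
    print(parses_with(FMT, str(d)))  # True, then False, then False
```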
That said, I still think that the basic idea - being able to reverse-parse the output of datetime.__str__ - is right.
I would suggest that a better place for this is an additional classmethod constructor (the datetime class already has quite a number of them). Maybe fromisostring() could be a suitable name? With this you could even pass an extra argument for the date-time separator just like with the current isoformat. This constructor would then be more like a counterpart to datetime.isoformat(), but it could simply be documented that calling it as fromisostring(datestring, sep=" ") can be used to parse strings written with str(datetime).
Wouldn't you expect a method called fromisostring to be able to parse any valid ISO string, especially given that there are third-party libs with functions named fromisoformat that do exactly that, and people suggest adding one of them to the stdlib every few months?
What you want to get across is that this function parses the default Python representation of datetimes; the fact that it happens to be a subset of ISO format doesn't seem as relevant here. I like the idea of a new alternate constructor, I'm just not crazy about the name.
Fair enough, it was just the first half-reasonable thing that came to my mind :) Being able to parse any valid ISO string would be another nice feature, but it's really a different story. Wolfgang
On 06.08.2014 12:34, Ethan Furman wrote:
On 08/06/2014 01:35 AM, Wolfgang Maier wrote:
[...] which is really quite against the one and only one way of doing things idea.
It's "One Obvious Way" not "Only One Way".
I wasn't quoting, just paraphrasing.
On 08/06/2014 03:40 AM, Wolfgang Maier wrote:
On 06.08.2014 12:34, Ethan Furman wrote:
On 08/06/2014 01:35 AM, Wolfgang Maier wrote:
[...] which is really quite against the one and only one way of doing things idea.
It's "One Obvious Way" not "Only One Way".
I wasn't quoting, just paraphrasing.
It's a bad paraphrase as the two have nearly completely different meanings. If you wish to pursue this sub-thread further I'll have to turn my portion over to D'Aprano (assuming he's willing) as he is much better at long explanations than I am. -- ~Ethan~
On 06.08.2014 16:05, Ethan Furman wrote:
On 08/06/2014 03:40 AM, Wolfgang Maier wrote:
On 06.08.2014 12:34, Ethan Furman wrote:
On 08/06/2014 01:35 AM, Wolfgang Maier wrote:
[...] which is really quite against the one and only one way of doing things idea.
It's "One Obvious Way" not "Only One Way".
I wasn't quoting, just paraphrasing.
It's a bad paraphrase as the two have nearly completely different meanings.
If you wish to pursue this sub-thread further I'll have to turn my portion over to D'Aprano (assuming he's willing) as he is much better at long explanations than I am.
I don't think that's necessary :) I like focused threads !
On Wed, Aug 06, 2014 at 07:05:48AM -0700, Ethan Furman wrote:
On 08/06/2014 03:40 AM, Wolfgang Maier wrote:
On 06.08.2014 12:34, Ethan Furman wrote:
On 08/06/2014 01:35 AM, Wolfgang Maier wrote:
[...] which is really quite against the one and only one way of doing things idea.
It's "One Obvious Way" not "Only One Way".
I wasn't quoting, just paraphrasing.
It's a bad paraphrase as the two have nearly completely different meanings.
If you wish to pursue this sub-thread further I'll have to turn my portion over to D'Aprano (assuming he's willing) as he is much better at long explanations than I am.
I can make it a short explanation :-)
"Only One Way" implies:
    assert len(collection_of_ways) == 1
"One Obvious Way" implies:
    assert any(way.is_obvious() for way in collection_of_ways)
Now I suppose I'll have to go back and read the rest of the thread to understand what this is about *wink* -- Steven
On Wed, Aug 6, 2014 at 7:10 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Now I suppose I'll have to go back and read the rest of the thread to understand what this is about *wink*
Might be more fun to recast the Zen of Python into Python itself. :-) Skip
On Aug 6, 2014, at 1:35, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 05.08.2014 23:35, Andrew Barnert wrote:
On Aug 5, 2014, at 14:22, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 05.08.2014 03:39, Steven D'Aprano wrote:
Since str(d) is documented to use a well-defined format, I agree that it makes sense to make the second argument to d.strptime optional, and default to that same format. The concern I had was the sort of scenario Skip suggested: I might write out a datetime object as a string on one machine, where the format is X, and read it back elsewhere, where the format is Y, leading to at best an exception and at worst incorrect data.
+1 on the suggestion.
After looking a bit into the code of the datetime module, I am not convinced anymore that strptime() is the right place for the functionality for the following reasons:
1) strptime already has a clear counterpart and that's strftime.
2) strftime/strptime use explicit format strings, not any more sophisticated parsing (as would be required to parse the different formats that datetime.__str__ can produce) and they try, intentionally, to mimic the behavior of their C equivalents.
In other words, strftime/strptime have a very clear underlying concept, which IMO should not be given up just because we are trying to stuff some extra-functionality into them.
What if strftime _also_ allowed the format string to be omitted, in which case it would produce the same format as str? Then they would remain perfect inverses.
Yes, but strftime without format string would then be completely redundant with __str__ and isoformat with " " separator, which is really quite against the one and only one way of doing things idea.
They're not redundant. str gives you some human-readable, ideally but not necessarily parseable, representation. isoformat gives you a specific format that you know is parseable by many other libraries and languages, and sorts in date order. strftime lets you specify a format to be parsed by specific code or to match problem-specific human expectations. Is the fact that they happen to overlap (which is already true, since you can always specify the same format explicitly if you want) any worse than the fact that str(3) and format(3, 'd') give you the same result?
Plus again, right now strftime takes an explicit format string and then generates a datetime string with exactly this and only this format. In the optional format string scenario, it would have to generate slightly differently formatted output depending on whether microseconds and/or timezone information are present. So, like for strptime, this would change the very clearly defined current behavior into a mix of things, unnecessarily.
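The overlap described here is easy to see with the standard library: str(3) and format(3, 'd') agree, and for a naive datetime without microseconds an explicit strftime format reproduces str() exactly, while isoformat() defaults to a 'T' separator.

```python
from datetime import datetime

# Two ways to spell the same integer text.
print(str(3) == format(3, "d"))  # True

dt = datetime(2014, 8, 6, 6, 48, 44)  # naive, no microseconds
print(str(dt))                        # 2014-08-06 06:48:44
print(dt.isoformat())                 # 2014-08-06T06:48:44
print(dt.strftime("%Y-%m-%d %H:%M:%S") == str(dt))  # True
```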
The purpose of strftime and strptime is to be inverses of each other--to generate and parse datetime strings in a specified way. If one of those ways is "the default Python string representation" for one function, it should be true for the other. (Doesn't gnu strf/ptime have an extension that gives you % codes for "default" date and time representations, which don't guarantee anything other than that they be reasonable for the locale and reversible?)
That said, I still think that the basic idea - being able to reverse-parse the output of datetime.__str__ - is right.
I would suggest that a better place for this is an additional classmethod constructor (the datetime class already has quite a number of them). Maybe fromisostring() could be a suitable name? With this you could even pass an extra argument for the date-time separator just like with the current isoformat. This constructor would then be more like a counterpart to datetime.isoformat(), but it could simply be documented that calling it as fromisostring(datestring, sep=" ") can be used to parse strings written with str(datetime).
Wouldn't you expect a method called fromisostring to be able to parse any valid ISO string, especially given that there are third-party libs with functions named fromisoformat that do exactly that, and people suggest adding one of them to the stdlib every few months?
What you want to get across is that this function parses the default Python representation of datetimes; the fact that it happens to be a subset of ISO format doesn't seem as relevant here. I like the idea of a new alternate constructor, I'm just not crazy about the name.
Fair enough, it was just the first half-reasonable thing that came to my mind :) Being able to parse any valid ISO string would be another nice feature, but it's really a different story.
Wolfgang
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Wed, Aug 06, 2014 at 06:48:44AM -0700, Andrew Barnert wrote:
The purpose of strftime and strptime is to be inverses of each other--to generate and parse datetime strings in a specified way. If one of those ways is "the default Python string representation" for one function, it should be true for the other.
As of Python 3.3, neither strftime nor strptime takes a default format. It's only __str__ which has an implicit default format.
(Doesn't gnu strf/ptime have an extension that gives you % codes for "default" date and time representations, which don't guarantee anything other than that they be reasonable for the locale and reversible?)
I worry about something like that. Unless the default is guaranteed to be a particular format, what counts as "reasonable" when the string is written out and when read back in may not be the same. -- Steven
On Aug 6, 2014, at 17:32, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Aug 06, 2014 at 06:48:44AM -0700, Andrew Barnert wrote:
The purpose of strftime and strptime is to be inverses of each other--to generate and parse datetime strings in a specified way. If one of those ways is "the default Python string representation" for one function, it should be true for the other.
As of Python 3.3, neither strftime nor strptime takes a default format. It's only __str__ which has an implicit default format.
Exactly my point. They're currently balanced. Adding a default format to strptime only would mean they can no longer be used as inverses of each other. One solution is to also add a default format to strftime. The other solution is to recognize that if the desire is an inverse for str, strptime is not the best way to write that. (I didn't have a good alternative suggestion, but Terry's later fromstr seems perfect to me.)
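The "balanced" property holds today whenever both calls are given the same explicit format. A quick check, using %f so that microseconds survive the round trip (naive datetimes only, to keep the sketch short):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"  # explicit format shared by both directions

dt = datetime(2014, 8, 6, 17, 32, 5, 42)
text = dt.strftime(FMT)
print(text)                                # 2014-08-06 17:32:05.000042
print(datetime.strptime(text, FMT) == dt)  # True
```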
(Doesn't gnu strf/ptime have an extension that gives you % codes for "default" date and time representations, which don't guarantee anything other than that they be reasonable for the locale and reversible?)
I worry about something like that. Unless the default is guaranteed to be a particular format, what counts as "reasonable" when the string is written out and when read back in may not be the same.
I share that worry. Is it guaranteed that str on any Python anywhere can be parsed back to the same value on a different Python somewhere else? The docs for str (and the presumed docs for the new function) can make that guarantee, but most of the people in this thread didn't know that, or weren't confident in it. That's why I don't really like the idea of the inverse functions being __str__ and the constructor--it isn't obvious or explicit to the reader, and it's not quite as easy to look up. (That wasn't the initial proposal, but it came up later in the thread, so it was worth responding to.)
On 08/04/2014 06:39 PM, Steven D'Aprano wrote:
On Mon, Aug 04, 2014 at 10:56:56PM +0200, Wolfgang Maier wrote: [...]
it does hold true in 3.x, but the documented behavior is slightly more complex (I assume also in 2.x):
datetime.__str__() For a datetime instance d, str(d) is equivalent to d.isoformat(' ').
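The documented equivalence quoted above can be checked directly for each of the shapes str() produces, including microseconds and a UTC offset:

```python
from datetime import datetime, timedelta, timezone

samples = [
    datetime(2014, 8, 4, 22, 56, 56),
    datetime(2014, 8, 4, 22, 56, 56, 7),
    datetime(2014, 8, 4, 22, 56, 56, tzinfo=timezone(timedelta(hours=2))),
]
for d in samples:
    # str(d) is documented to be equivalent to d.isoformat(" ").
    assert str(d) == d.isoformat(" ")
    print(str(d))
# 2014-08-04 22:56:56
# 2014-08-04 22:56:56.000007
# 2014-08-04 22:56:56+02:00
```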
Since str(d) is documented to use a well-defined format, I agree that it makes sense to make the second argument to d.strptime optional, and default to that same format. The concern I had was the sort of scenario Skip suggested: I might write out a datetime object as a string on one machine, where the format is X, and read it back elsewhere, where the format is Y, leading to at best an exception and at worst incorrect data.
What are the downsides of:
    dt = datetime.datetime.now()  # assuming this works ;)
    sdt = str(dt)
    ndt = datetime.datetime(sdt)
    print(dt == ndt)  # True
-- ~Ethan~
On 06.08.2014 16:14, Ethan Furman wrote:
On 08/04/2014 06:39 PM, Steven D'Aprano wrote:
On Mon, Aug 04, 2014 at 10:56:56PM +0200, Wolfgang Maier wrote: [...]
it does hold true in 3.x, but the documented behavior is slightly more complex (I assume also in 2.x):
datetime.__str__() For a datetime instance d, str(d) is equivalent to d.isoformat(' ').
Since str(d) is documented to use a well-defined format, I agree that it makes sense to make the second argument to d.strptime optional, and default to that same format. The concern I had was the sort of scenario Skip suggested: I might write out a datetime object as a string on one machine, where the format is X, and read it back elsewhere, where the format is Y, leading to at best an exception and at worst incorrect data.
What are the downsides of:
    dt = datetime.datetime.now()  # assuming this works ;)
    sdt = str(dt)
    ndt = datetime.datetime(sdt)
    print(dt == ndt)  # True
I'll refrain from mentioning "explicit is better than implicit" ;) It's just that it seems to be a design pattern of the datetime class to provide alternative constructors as classmethods instead of doing magic things in __new__ . There are fromtimestamp and utcfromtimestamp already, and you can think of datetime.now() the same way. After all, you could decide to have this called when datetime() is called without an argument. I guess there are just too many different things that *could* make sense to pass to __new__ and to be treated implicitly. Wolfgang
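The classmethod pattern described here is the one datetime.fromtimestamp() and datetime.now() already follow. A minimal sketch of what a string-based alternate constructor could look like on a user subclass; the class MyDateTime and its fromstr method are hypothetical, and the parsing is delegated to datetime.fromisoformat(), which only exists in Python 3.7 and later.

```python
from datetime import datetime

class MyDateTime(datetime):
    """Hypothetical subclass adding a named alternate constructor."""

    @classmethod
    def fromstr(cls, s):
        # Delegate the parsing (Python 3.7+), then rebuild as cls so that
        # subclasses get instances of their own type back.
        parsed = datetime.fromisoformat(s)
        return cls(parsed.year, parsed.month, parsed.day,
                   parsed.hour, parsed.minute, parsed.second,
                   parsed.microsecond, tzinfo=parsed.tzinfo)

now = MyDateTime.now()               # existing classmethod constructor
again = MyDateTime.fromstr(str(now))
print(type(again).__name__, again == now)  # MyDateTime True
```

Keeping the parsing behind a separately named classmethod leaves __new__ untouched, which is the cautious option argued for in this part of the thread.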
On Wed, Aug 6, 2014 at 11:19 AM, Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
What are the downsides of:
    dt = datetime.datetime.now()  # assuming this works ;)
    sdt = str(dt)
    ndt = datetime.datetime(sdt)
    print(dt == ndt)  # True
I'll refrain from mentioning "explicit is better than implicit" ;)
It's just that it seems to be a design pattern of the datetime class to provide alternative constructors as classmethods instead of doing magic things in __new__ .
I don't think this is a "design pattern". In Python 2, having date(str) constructor was blocked by some magic that is there to support unpickling:
    >>> from datetime import date
    >>> date('\x07\xd0\x01\x01')
    datetime.date(2000, 1, 1)
This is no longer an issue in Python 3. Note that if we allow date('2000-01-01'), this may become a more readable and efficient alternative to date(2000, 1, 1).
On 06.08.2014 17:42, Alexander Belopolsky wrote:
On Wed, Aug 6, 2014 at 11:19 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de <mailto:wolfgang.maier@biologie.uni-freiburg.de>> wrote:
What are the downsides of:
    dt = datetime.datetime.now()  # assuming this works ;)
    sdt = str(dt)
    ndt = datetime.datetime(sdt)
    print(dt == ndt)  # True
I'll refrain from mentioning "explicit is better than implicit" ;)
It's just that it seems to be a design pattern of the datetime class to provide alternative constructors as classmethods instead of doing magic things in __new__ .
I don't think this is a "design pattern". In Python 2, having date(str) constructor was blocked by some magic that is there to support unpickling:
    >>> from datetime import date
    >>> date('\x07\xd0\x01\x01')
    datetime.date(2000, 1, 1)
I see. None of my examples (fromtimestamp, utcfromtimestamp and now) involves string parsing though.
This is no longer an issue in Python 3.
Note that if we allow date('2000-01-01'), this may become a more readable and efficient alternative to date(2000, 1, 1).
One problem with this is the very first concern raised by Steven and Skip in this thread: choosing a string format that __new__ can deal with *would* lock you into this format. If later, for example, full-blown ISO 8601 or even just RFC 3339 parsing makes it into the module, wouldn't you rather want this to be done by __new__ when it sees a string ? Implementing the current proposal as a classmethod with its own name (once a good one is accepted) is a much more cautious approach. BTW, Terry's suggestion of datetime.fromstr(s) sounds very reasonable.
On 08/06/2014 09:20 AM, Wolfgang Maier wrote:
On 06.08.2014 17:42, Alexander Belopolsky wrote:
On Wed, Aug 6, 2014 at 11:19 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de <mailto:wolfgang.maier@biologie.uni-freiburg.de>> wrote:
What are the downsides of:
    dt = datetime.datetime.now()  # assuming this works ;)
    sdt = str(dt)
    ndt = datetime.datetime(sdt)
    print(dt == ndt)  # True
I'll refrain from mentioning "explicit is better than implicit" ;)
It's just that it seems to be a design pattern of the datetime class to provide alternative constructors as classmethods instead of doing magic things in __new__ .
I don't think this is a "design pattern". In Python 2, having date(str) constructor was blocked by some magic that is there to support unpickling:
    >>> from datetime import date
    >>> date('\x07\xd0\x01\x01')
    datetime.date(2000, 1, 1)
I see. None of my examples (fromtimestamp, utcfromtimestamp and now) involves string parsing though.
This is no longer an issue in Python 3.
Note that if we allow date('2000-01-01'), this may become a more readable and efficient alternative to date(2000, 1, 1).
One problem with this is the very first concern raised by Steven and Skip in this thread: choosing a string format that __new__ can deal with *would* lock you into this format. If later, for example, full-blown ISO 8601 or even just RFC 3339 parsing makes it into the module, wouldn't you rather want this to be done by __new__ when it sees a string ? Implementing the current proposal as a classmethod with its own name (once a good one is accepted) is a much more cautious approach.
BTW, Terry's suggestion of datetime.fromstr(s) sounds very reasonable.
For my own classes I accept both year, month, day, ..., or a single string in __new__. But for the stdlib I agree that .fromstr() is the better approach. +1 for .fromstr() -- ~Ethan~
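For contrast, the approach described for "my own classes" (accepting either the usual fields or a single string in __new__) might look roughly like this on a subclass. The class name StrDateTime is hypothetical, and the parsing is delegated to datetime.fromisoformat(), which requires Python 3.7 or later. It also illustrates the lock-in concern raised in this thread: whatever __new__ accepts as a string becomes part of the constructor's contract.

```python
from datetime import datetime

class StrDateTime(datetime):
    """Hypothetical subclass whose constructor also accepts one string."""

    def __new__(cls, *args, **kwargs):
        if len(args) == 1 and not kwargs and isinstance(args[0], str):
            parsed = datetime.fromisoformat(args[0])  # Python 3.7+
            return super().__new__(
                cls, parsed.year, parsed.month, parsed.day,
                parsed.hour, parsed.minute, parsed.second,
                parsed.microsecond, parsed.tzinfo)
        # Anything else (field values, pickle state) goes to datetime as usual.
        return super().__new__(cls, *args, **kwargs)

dt = StrDateTime(2014, 8, 6, 9, 45)
print(StrDateTime(str(dt)) == dt)  # True
```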
On Wed, Aug 6, 2014 at 12:45 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
For my own classes I accept both year, month, day, ..., or a single string in __new__.
But for the stdlib I agree that .fromstr() is the better approach.
Can you explain why what is good for your own classes is not good for stdlib?
On 08/06/2014 09:53 AM, Alexander Belopolsky wrote:
On Wed, Aug 6, 2014 at 12:45 PM, Ethan Furman wrote:
For my own classes I accept both year, month, day, ..., or a single string in __new__.
But for the stdlib I agree that .fromstr() is the better approach.
Can you explain why what is good for your own classes is not good for stdlib?
I tolerate a higher level of risk in my own work, but the stdlib should be more stable. -- ~Ethan~
On Wed, Aug 6, 2014 at 12:20 PM, Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
BTW, Terry's suggestion of datetime.fromstr(s) sounds very reasonable.
We don't have int.fromstr, float.fromstr, or in fact a .fromstr constructor for any other type. IMO, date(str) is "the obvious way to do it."
On 08/06/2014 09:47 AM, Alexander Belopolsky wrote:
On Wed, Aug 6, 2014 at 12:20 PM, Wolfgang Maier wrote:
BTW, Terry's suggestion of datetime.fromstr(s) sounds very reasonable.
We don't have int.fromstr, float.fromstr, or in fact a .fromstr constructor for any other type. IMO, date(str) is "the obvious way to do it."
int, float, and, I suspect, all the core data types have the same __str__ as __repr__, so there's really no difference. datetime objects, on the other hand, definitely have different __str__ and __repr__, and the desire is to be able to eval(obj.__repr__()), not obj.__str__(). -- ~Ethan~
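A small illustration of the __str__/__repr__ distinction Ethan draws, using only the standard datetime module (the example value is arbitrary):

```python
import datetime

dt = datetime.datetime(2014, 8, 6, 12, 45, 0, 123456)
as_repr = repr(dt)  # 'datetime.datetime(2014, 8, 6, 12, 45, 0, 123456)'
as_str = str(dt)    # '2014-08-06 12:45:00.123456'

# repr() is meant to be eval-able back to an equal object ...
assert eval(as_repr) == dt
# ... while str() is the human-readable ISO-like form, which the datetime
# constructor itself does not accept.
assert as_str != as_repr
```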
On 8/6/2014 12:47 PM, Alexander Belopolsky wrote:
On Wed, Aug 6, 2014 at 12:20 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
BTW, Terry's suggestion of datetime.fromstr(s) sounds very reasonable.
We don't have int.fromstr, float.fromstr, or in fact a .fromstr constructor for any other type. IMO, date(str) is "the obvious way to do it."
There are two parts to my suggestion. The first is to focus on the original goal of the thread, 'an inverse to __str__', and only that, rather than on automatically parsing larger classes of possible strings without a format. The second is to suggest a better spelling for that original goal than the original proposal (strptime without second argument) or alternate proposals like .fromisostring based on an expanded goal. I think '.fromstr' is the best possible spelling among '.from...' choices. If instead using the standard constructor works, fine with me. I am not sure of the criterion for adding an alternative to .__init__ versus an alternative method. -- Terry Jan Reedy
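Terry's narrow goal, an exact inverse of str(datetime) and nothing more, is small enough to sketch. datetime_fromstr is a hypothetical helper, not a stdlib API. It assumes the string has exactly the shape str() emits (space separator, optional six-digit fraction, optional +HH:MM offset) and rebuilds any offset as a fixed datetime.timezone, so the original tzinfo object is not recovered; offsets with a seconds component are not handled.

```python
import re
from datetime import datetime, timedelta, timezone

# Shape produced by str(datetime), i.e. datetime.isoformat(" "):
#   YYYY-MM-DD HH:MM:SS[.ffffff][+HH:MM]
_STR_FORMAT = re.compile(
    r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{6}))?"
    r"(?:([+-])(\d{2}):(\d{2}))?"
)


def datetime_fromstr(s):
    """Hypothetical exact inverse of str(datetime) (sketch only)."""
    m = _STR_FORMAT.fullmatch(s)
    if m is None:
        raise ValueError(f"not in str(datetime) format: {s!r}")
    year, month, day, hour, minute, second, micro, sign, off_h, off_m = m.groups()
    tz = None
    if sign is not None:
        # Rebuild the UTC offset as a fixed-offset timezone.
        offset = timedelta(hours=int(off_h), minutes=int(off_m))
        tz = timezone(-offset if sign == "-" else offset)
    return datetime(int(year), int(month), int(day),
                    int(hour), int(minute), int(second),
                    int(micro) if micro else 0, tzinfo=tz)
```

Anything that str() would not have produced, such as a 'T' separator, is rejected rather than guessed at, which keeps the scope exactly at "inverse of __str__".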
On Wed, Aug 6, 2014 at 3:26 PM, Terry Reedy <tjreedy@udel.edu> wrote:
I think '.fromstr' is the best possible spelling among '.from...' choices.
With this I agree. Since the path is already paved with float.fromhex, I am casting my +0 for date/datetime.fromstr.
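The float.fromhex precedent pairs a formatting method with a classmethod that parses the same format back, which is the shape a date/datetime.fromstr would mirror for __str__. A minimal round trip, standard library only:

```python
x = 0.1
h = x.hex()          # '0x1.999999999999ap-4'
y = float.fromhex(h)
assert y == x        # exact round trip, no precision lost
```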
On Aug 6, 2014, at 8:42, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Wed, Aug 6, 2014 at 11:19 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
What are the downsides of:
dt = datetime.datetime.now()  # assuming this works ;)
sdt = str(dt)
ndt = datetime.datetime(sdt)
print(dt == ndt)  # True
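Wolfgang's snippet relies on the proposed datetime(str) constructor. With what datetime already offers, strptime can invert str() for a naive datetime, but it needs two format strings, because str() omits the fractional part when microsecond is 0. parse_naive_str is a hypothetical helper for illustration:

```python
from datetime import datetime


def parse_naive_str(s):
    """Hypothetical inverse of str(dt) for a naive datetime, via strptime."""
    # str() writes '.ffffff' only when microsecond != 0, so choose the format
    # from the string itself rather than relying on one fixed format.
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in s else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(s, fmt)


dt = datetime.now()
ndt = parse_naive_str(str(dt))
print(dt == ndt)  # True
```

Needing to branch on the input just to undo str() is a small illustration of why a no-format inverse is being asked for.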
I'll refrain from mentioning "explicit is better than implicit" ;)
It's just that it seems to be a design pattern of the datetime class to provide alternative constructors as classmethods instead of doing magic things in __new__ .
I don't think this is a "design pattern". In Python 2, having date(str) constructor was blocked by some magic that is there to support unpickling:
>>> from datetime import date
>>> date('\x07\xd0\x01\x01')
datetime.date(2000, 1, 1)
This is no longer an issue in Python 3.
Note that if we allow date('2000-01-01'), this may become a more readable and efficient alternative to date(2000, 1, 1).
More readable maybe, but more efficient? You're doing the same work, plus string parsing; you're eliminating two parameters (but only if you use *args) at the cost of three locals; by any measure it's less efficient. But more readable is the important part. In isolation it's readable; the question is whether the added complexity (in the docs and in people's heads) of an effectively-overloaded constructor is worth the cost. Given that, unlike all the obvious parallel cases (int, float, etc.) this constructor will not accept the repr, I'm not sure the answer comes out the same. But maybe it does. I'm -0 on this, +1 on Terry's fromstr, +0 on strptime and strftime both accepting no arguments, -1 on only strptime, -0.5 on fromisostring/fromisoformat, and -1 on remembering any of the other ideas in this thread well enough to comment.
On Wed, Aug 6, 2014 at 7:55 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Given that, unlike all the obvious parallel cases (int, float, etc.) this constructor will not accept the repr, I'm not sure the answer comes out the same.
The parallel is in accepting str, not repr.
On Aug 6, 2014, at 7:19 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Wed, Aug 6, 2014 at 7:55 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Given that, unlike all the obvious parallel cases (int, float, etc.) this constructor will not accept the repr, I'm not sure the answer comes out the same.
The parallel is in accepting str, not repr.
Indeed. The precedent for repr is that it may be eval-able, not that the repr string can be passed into the constructor.
On Aug 6, 2014, at 17:19, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Wed, Aug 6, 2014 at 7:55 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Given that, unlike all the obvious parallel cases (int, float, etc.) this constructor will not accept the repr, I'm not sure the answer comes out the same.
The parallel is in accepting str, not repr.
Is it? Sure, I'll accept that's the parallel you have in mind, but is it a good one? The only way I can think to distinguish is this: For bytes, str, tuple, etc., there is no constructor from either string representation. For int, the two representations are identical. For float, they're different--and it's float(repr(f)) that gives you back the same value you started with. (Of course in the other direction, neither one is guaranteed to do so.) Also note that int can accept hex, etc. Python literal values as strings (given base=0), the same ones eval can. Is there some documentation that implies that the int, float, etc. constructors are meant to take "human-readable" strings rather than "Python-evaluable" strings? Or some other case I'm missing?
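The int/float parallels under discussion are easy to check in a given interpreter. In Python 3.2 and later, str() and repr() of a float coincide; in Python 2, str() rounds to 12 significant digits, which is where the two differ. int() only accepts prefixed literals such as '0xff' when base=0 is passed. constructor_roundtrips is a hypothetical helper name:

```python
def constructor_roundtrips(n=255, x=0.1):
    """Hypothetical helper: check string round trips for int and float."""
    results = {
        "int(str(n))": int(str(n)) == n,
        "float(repr(x))": float(repr(x)) == x,
        # base=0 makes int() honour the 0x/0o/0b prefixes, as eval would.
        "int('0xff', 0)": int("0xff", 0) == 255,
    }
    # With the default base 10, a prefixed literal is rejected.
    try:
        int("0xff")
        results["int('0xff') rejected"] = False
    except ValueError:
        results["int('0xff') rejected"] = True
    return results


print(constructor_roundtrips())
```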
On Wed, Aug 6, 2014 at 7:55 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
More readable maybe, but more efficient? You're doing the same work, plus string parsing; you're eliminating two parameters (but only if you use *args) at the cost of three locals; by any measure it's less efficient.
>>> dis(lambda: date(2001, 1, 1))
  1           0 LOAD_GLOBAL              0 (date)
              3 LOAD_CONST               1 (2001)
              6 LOAD_CONST               2 (1)
              9 LOAD_CONST               2 (1)
             12 CALL_FUNCTION            3
             15 RETURN_VALUE
>>> dis(lambda: date('2001-01-01'))
  1           0 LOAD_GLOBAL              0 (date)
              3 LOAD_CONST               1 ('2001-01-01')
              6 CALL_FUNCTION            1
              9 RETURN_VALUE
Since parsing will be done in C, its cost can be made negligible. In implementations other than CPython, YMMV.
On 07.08.2014 02:29, Alexander Belopolsky wrote:
On Wed, Aug 6, 2014 at 7:55 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
More readable maybe, but more efficient? You're doing the same work, plus string parsing; you're eliminating two parameters (but only if you use *args) at the cost of three locals; by any measure it's less efficient.
Since parsing will be done in C, its cost can be made negligible. In implementations other than CPython, YMMV.
Why would parsing occur in C? The datetime module is implemented in pure Python.
On Thu, Aug 7, 2014 at 4:17 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Since parsing will be done in C, its cost can be made negligible. In implementations other than CPython, YMMV.
Why would parsing occur in C? The datetime module is implemented in pure Python.
No. In CPython, datetime module is implemented in C. http://hg.python.org/cpython/file/default/Modules/_datetimemodule.c
On 8 Aug 2014 03:27, "Alexander Belopolsky" <alexander.belopolsky@gmail.com> wrote:
On Thu, Aug 7, 2014 at 4:17 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Since parsing will be done in C, its cost can be made negligible. In implementations other than CPython, YMMV.
Why would parsing occur in C? The datetime module is implemented in pure Python.
No. In CPython, datetime module is implemented in C.
http://hg.python.org/cpython/file/default/Modules/_datetimemodule.c
Don't we have both these days? (C accelerator with pure Python fallback) Anyway, I'm +1 for Wolfgang's trio of "fromstr" alternative constructors, but with the suggested variation that allows both " " and "T" as the separator, rather than accepting a parameter. Anyone wanting more flexibility can use strptime, or else switch to something like dateutil. Cheers, Nick.
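Nick's variant, accepting either ' ' or 'T' between date and time instead of a sep parameter, can be sketched for naive datetimes with strptime. fromstr_any_sep is a hypothetical name; strptime's %f accepts one to six digits, so this is slightly more lenient than an exact inverse.

```python
from datetime import datetime


def fromstr_any_sep(s):
    """Hypothetical parser for naive str()/isoformat() output, ' ' or 'T'."""
    if len(s) < 19 or s[10] not in (" ", "T"):
        raise ValueError(f"expected 'YYYY-MM-DD[ T]HH:MM:SS[.ffffff]', got {s!r}")
    # Normalise the separator so one pair of formats covers both spellings.
    normalized = s[:10] + " " + s[11:]
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in normalized else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(normalized, fmt)


dt = datetime(2014, 8, 7, 19, 27, 0, 42)
print(fromstr_any_sep(str(dt)) == fromstr_any_sep(dt.isoformat()) == dt)  # True
```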
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On 07.08.2014 19:27, Alexander Belopolsky wrote:
On Thu, Aug 7, 2014 at 4:17 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Since parsing will be done in C, its cost can be made negligible. In implementations other than CPython, YMMV.
Why would parsing occur in C? The datetime module is implemented in pure Python.
No. In CPython, datetime module is implemented in C.
http://hg.python.org/cpython/file/default/Modules/_datetimemodule.c
Just to make sure I understood things right: new additions to the datetime module would normally be implemented in the Python version. But then I was forgetting that your suggestion was about changing the default constructor, which would have to happen in the C version. Correct?
On 07.08.2014 01:55, Andrew Barnert wrote:
On Aug 6, 2014, at 8:42, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Note that if we allow date('2000-01-01'), this may become a more readable and efficient alternative to date(2000, 1, 1).
I'm -0 on this, +1 on Terry's fromstr, +0 on strptime and strftime both accepting no arguments, -1 on only strptime, -0.5 on fromisostring/fromisoformat, and -1 on remembering any of the other ideas in this thread well enough to comment.
So to summarize, the three currently discussed options (in order of current votes for) are:
- datetime.fromstr(string) classmethod to be used as an alternative constructor
- datetime(string) add limited string parsing to the default constructor
- datetime.strptime(string) parse datetime.__str__ format when format argument is not specified (the OP's original proposal)
A point that has not been discussed yet is that the first two options could easily be implemented for datetime.date and datetime.time objects as well, providing counterparts for date.__str__ and time.__str__. With strptime this is not possible since datetime.date and datetime.time don't have such a method.
In addition, the first two options, in particular, raise a scope question. I can see the following options:
- string has to be of exactly the format generated by datetime.__str__, i.e. YYYY-MM-DD HH:MM:SS with optional microseconds and timezone information (the original proposal)
- string has to be of a format that can be generated by datetime.isoformat, of which the datetime.__str__ format is a subset. The difference is that with datetime.isoformat the separator between the date and time portions of the string can be specified, while with datetime.__str__ this is fixed to " ". Accordingly, with this option, you would have either:
datetime.fromstr(string, sep=" ") or datetime(string, sep=" ")
to be able to pass an optional separator. When absent, the datetime.__str__ format is expected.
- (potentially, but I don't think anyone opted for it: more powerful parsing of a wider range of formats)
Personally, I'm in favor of the second option here for the following reason: the string format returned by datetime.__str__ is not fully ISO 8601 compatible because it uses " " as the separator instead of "T", i.e. if an application has to produce ISO 8601 compliant output it has to use datetime.isoformat, not __str__, even though the two formats differ by just a single character. Hence, allowing the separator to be specified makes the new functionality a lot more useful at the expense of only a moderate increase in complexity. Note that this is still fundamentally different from asking for full ISO 8601 parsing, and that it would have no impact on potential date.fromstr and time.fromstr methods or their default constructor versions since they do not have to deal with a separator.
So my preferred complete version would be something like:
datetime.fromstr(string, sep=' ')
    Return a datetime object from a string as generated by datetime.isoformat(sep). With the default value of sep this is also the format generated by datetime.__str__().
date.fromstr(string)
    Return a date object from a string as generated by date.__str__() and date.isoformat().
time.fromstr(string)
    Return a time object from a string as generated by time.__str__() and time.isoformat().
Wolfgang
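Wolfgang's proposed trio can be prototyped without touching the stdlib by hanging fromstr classmethods on subclasses. Date, Time and DateTime are hypothetical names standing in for the proposed methods, and this sketch handles naive values only (no UTC offset).

```python
from datetime import date, time, datetime


class Date(date):
    @classmethod
    def fromstr(cls, s):
        """Inverse of date.__str__ / date.isoformat(): 'YYYY-MM-DD'."""
        if len(s) != 10 or s[4] != "-" or s[7] != "-":
            raise ValueError(f"expected 'YYYY-MM-DD', got {s!r}")
        return cls(int(s[:4]), int(s[5:7]), int(s[8:]))


class Time(time):
    @classmethod
    def fromstr(cls, s):
        """Inverse of time.__str__ for naive times: 'HH:MM:SS[.ffffff]'."""
        hms, dot, frac = s.partition(".")
        parts = hms.split(":")
        if len(parts) != 3 or (dot and len(frac) != 6):
            raise ValueError(f"expected 'HH:MM:SS[.ffffff]', got {s!r}")
        hour, minute, second = (int(p) for p in parts)
        # str() always writes six fractional digits, i.e. microseconds.
        return cls(hour, minute, second, int(frac) if frac else 0)


class DateTime(datetime):
    @classmethod
    def fromstr(cls, s, sep=" "):
        """Inverse of datetime.isoformat(sep); the default matches __str__."""
        if s[10:11] != sep:
            raise ValueError(f"expected separator {sep!r} at index 10 of {s!r}")
        return cls.combine(Date.fromstr(s[:10]), Time.fromstr(s[11:]))


print(DateTime.fromstr("2014-08-07T11:57:00", sep="T"))  # 2014-08-07 11:57:00
```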
On Thu, Aug 7, 2014 at 11:57 AM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 07.08.2014 01:55, Andrew Barnert wrote:
On Aug 6, 2014, at 8:42, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Note that if we allow date('2000-01-01'), this may become a more readable and efficient alternative to date(2000, 1, 1).
I'm -0 on this, +1 on Terry's fromstr, +0 on strptime and strftime both accepting no arguments, -1 on only strptime, -0.5 on fromisostring/fromisoformat, and -1 on remembering any of the other ideas in this thread well enough to comment.
So to summarize, the three currently discussed options (in order of current votes for) are:
- datetime.fromstr(string) classmethod to be used as an alternative constructor
- datetime (string) add limited string parsing to the default constructor
- datetime.strptime (string) parse datetime.__str__ format when format argument is not specified (the OP's original proposal)
A point that has not been discussed yet is that the first two options could easily be implemented for datetime.date and datetime.time objects as well, providing counterparts for date.__str__ and time.__str__. With strptime this is not possible since datetime.date and datetime.time don't have such a method.
In addition, the first two options, in particular, raise a scope question. I can see the following options:
- string has to be of exactly the format generated by datetime.__str__, i.e. YYYY-MM-DD HH:MM:SS with optional microseconds and timezone information (the original proposal)
- string has to be of a format that can be generated by datetime.isoformat, of which the datetime.__str__ format is a subset. The difference is that with datetime.isoformat the separator between the date and time portions of the string can be specified, while with datetime.__str__ this is fixed to " ". Accordingly, with this option, you would have either:
datetime.fromstr(string, sep = " ") or datetime(string, sep = " ")
to be able to pass an optional separator. When absent the datetime.__str__ format is expected.
- (potentially, but I don't think anyone opted for it: more powerful parsing of a wider range of formats)
Personally, I'm in favor of the second option here for the following reason: the string format returned by datetime.__str__ is not fully ISO 8601 compatible because it uses " " as the separator instead of "T", i.e. if an application has to produce ISO 8601 compliant output it has to use datetime.isoformat, not __str__, even though the two formats differ by just a single character. Hence, allowing the separator to be specified makes the new functionality a lot more useful at the expense of only a moderate increase in complexity. Note that this is still fundamentally different from asking for full ISO 8601 parsing, and that it would have no impact on potential date.fromstr and time.fromstr methods or their default constructor versions since they do not have to deal with a separator.
According to Wikipedia's quote of the standard: "the character [T] may be omitted in applications where there is no risk of confusing a date and time of day representation with others defined in this International Standard." If you give the string to a datetime constructor, there is obviously no such confusion. I'm not sure if replacing by space counts as omitting, but it's the de facto standard :) Anyway, I'd prefer not taking sep as an argument, but accepting both ' ' and 'T', since other separators aren't widely used. Also, note that while a full ISO parser may not be feasible, we can (and IMO should) implement a complete RFC 3339 parser (with optional space separator), and document that both __str__ and fromstr (or whatever it ends up being) are indeed compatible with RFC 3339.
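A sketch of the RFC 3339 date-time grammar (section 5.6) with the space separator that the RFC's note permits; the RFC also allows lower-case 't' and 'z'. parse_rfc3339 is a hypothetical helper, not an existing API. RFC 3339 requires an offset, so str() of a naive datetime does not match, while str() of an aware datetime with a whole-minute offset does. Fractions beyond microseconds are truncated, a leap second (:60) is rejected by datetime itself, and '-00:00' is treated as a zero offset.

```python
import re
from datetime import datetime, timedelta, timezone

# RFC 3339 date-time: full-date ("T" / "t" / space) full-time, where
# full-time ends in "Z"/"z" or a numeric +HH:MM / -HH:MM offset.
_RFC3339 = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[Tt ]"
    r"(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?"
    r"(?:([Zz])|([+-])(\d{2}):(\d{2}))"
)


def parse_rfc3339(s):
    """Hypothetical RFC 3339 parser returning an aware datetime (sketch)."""
    m = _RFC3339.fullmatch(s)
    if m is None:
        raise ValueError(f"not an RFC 3339 date-time: {s!r}")
    (year, month, day, hour, minute, second,
     frac, zulu, sign, off_h, off_m) = m.groups()
    # Right-pad or truncate the fraction to exactly six digits (microseconds).
    micro = int((frac or "").ljust(6, "0")[:6])
    if zulu:
        tz = timezone.utc
    else:
        offset = timedelta(hours=int(off_h), minutes=int(off_m))
        tz = timezone(-offset if sign == "-" else offset)
    return datetime(int(year), int(month), int(day), int(hour), int(minute),
                    int(second), micro, tzinfo=tz)
```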
participants (13)
- Akira Li
- Alexander Belopolsky
- Andrew Barnert
- Ethan Furman
- Jonas Wielicki
- Nick Coghlan
- Petr Viktorin
- Ram Rachum
- Ryan Hiebert
- Skip Montanaro
- Steven D'Aprano
- Terry Reedy
- Wolfgang Maier