Proposal to change the default of to_datetime in case of errors from 'ignore' to 'raise'
Hi all,

On github there is a proposal to change the default behaviour of to_datetime in case of a parsing error from 'ignore' (leaving the values untouched) to 'raise' (raise an error).

As a small example, the current behaviour:

    In [5]: pd.to_datetime('2014-30-30', errors='ignore') # the default now
    Out[5]: '2014-30-30'

    In [6]: pd.to_datetime('2014-30-30', errors='raise')
    ...
    ValueError: month must be in 1..12

So the proposal would be to change the default to the second case, raising an error.

Note that this behaviour is already the default when providing your own format (and so in fact ignoring the value of the errors keyword):

    In [7]: pd.to_datetime('2014-30-30', format='%Y-%m-%d')
    ...
    ValueError: time data '2014-30-30' does not match format '%Y-%m-%d'

*Are there any objections to this change?*
*Are there people relying on the fact that, by default, to_datetime returns the exact original value if parsing does not succeed?*

Best regards,
Joris
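To make the difference between the error policies concrete, here is a minimal standard-library sketch. The helper `parse_date` is hypothetical and is not pandas code; it only mimics, for a single string and a fixed format, what 'raise' and 'ignore' mean in the message above, plus the 'coerce' policy mentioned later in the thread (pandas returns NaT there; `None` is used as a stand-in here).

```python
from datetime import datetime


def parse_date(value, fmt="%Y-%m-%d", errors="raise"):
    """Hypothetical helper (not pandas): parse one string under an error policy."""
    if errors not in ("raise", "ignore", "coerce"):
        raise ValueError(f"unknown errors policy: {errors!r}")
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        if errors == "raise":
            raise          # surface the parsing problem to the caller
        if errors == "ignore":
            return value   # hand the input back untouched (silent)
        return None        # "coerce": stand-in for a missing value


print(parse_date("2014-12-30"))                    # a valid date parses normally
print(parse_date("2014-30-30", errors="ignore"))   # the original string comes back
print(parse_date("2014-30-30", errors="coerce"))   # the missing-value stand-in
```

With 'ignore', a caller cannot tell from the call alone whether parsing succeeded, which is the concern behind the proposal.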
Personally I'm very much in favor of this change. I don't like silent defaults ;)

L
Well, I was only yesterday complaining at github about the silent default of read_csv converting 'NA' to NaN. ;-) So I have to agree with Lorenzo that this is a good change. It also seems more consistent with pandas overall behavior.

FWIW, Stata's default with these sorts of operations is always to tell you how many values were changed, which is often very helpful. E.g. if Stata tells you zero values were changed, this is a big clue you screwed up. Often this is more verbose than desired, but it's also easy to change that.

So, I'm definitely fine with just making it an error, but a possible middle ground would be a short report like: "20 values changed, 5 values not changed".
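The kind of short report John describes could look roughly like the sketch below. It uses only the standard library; `convert_with_report` is a hypothetical name, and nothing here reflects an existing pandas feature. It converts what it can, leaves the rest untouched, and returns a one-line summary alongside the values.

```python
from datetime import datetime


def convert_with_report(values, fmt="%Y-%m-%d"):
    """Hypothetical helper: convert parseable strings and count both outcomes."""
    converted = []
    n_changed = 0
    n_unchanged = 0
    for value in values:
        try:
            converted.append(datetime.strptime(value, fmt))
            n_changed += 1
        except ValueError:
            converted.append(value)  # keep the original value, as 'ignore' does
            n_unchanged += 1
    report = f"{n_changed} values changed, {n_unchanged} values not changed"
    return converted, report


result, report = convert_with_report(["2014-01-15", "2014-30-30", "2015-07-22"])
print(report)
```

A report of zero changed values would then be the "big clue" that the format or the data is not what was expected.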
I forgot to mention the github issue: https://github.com/pydata/pandas/issues/10636, and there is now also a PR to do the change: https://github.com/pydata/pandas/pull/10674

John, whether we want some more verbose output when coercing the errors is a separate issue, I think (as this is not the default). You can always open an issue on github for this.

Joris
participants (3)
- John E
- Joris Van den Bossche
- Lorenzo De Leo