Thanks for bringing this up.

Limiting the discussion to setitem for a moment (I think other methods like fillna could deviate if we really want, or could have keywords for it), I am personally in favor of option 2: making everything strict (I opened the referenced issue about it: https://github.com/pandas-dev/pandas/issues/39584).

In the short term, though, starting to deprecate silent casting to object now (the first aspect of option 3) doesn't prevent us from becoming even stricter later (it just wouldn't fully resolve the existing inconsistencies), so from that point of view I am personally fine with it.

Joris

On Wed, 27 Oct 2021 at 06:38, Brock Mendel <jbrockmendel@gmail.com> wrote:
TLDR
----
We have inconsistent silent-casting vs. raising logic for numpy vs. EA dtypes
(and inconsistencies among EA dtypes). By deprecating silent casting to *object* dtype, we can *mostly* make the behaviors match.


Background
----------
A number of Series/DataFrame methods silently cast when dealing with mismatched values. With a numpy dtype, each of the following silently casts
to float64:

    import pandas as pd

    ser = pd.Series([1, 2, 3], dtype="i8")

    ser.shift(1, fill_value=1.5)
    ser.mask([True, False, False], 1.5)
    ser.where([False, True, True], 1.5)
    ser.replace(1, 1.5)
    ser[0] = 1.5
    ser.fillna(1.5)  # <- this one doesn't cast as it is a no-op

If we were to pass "foo" or a pd.Period, these would coerce to object
instead of float.

By contrast, similar mixed-type operations with an ExtensionDtype Series
_mostly_ raise:

    ser2 = pd.Series(pd.period_range("2016-01-01", periods=3, freq="D"))

    ser2.shift(1, fill_value=1.5)         # <- ValueError
    ser2.mask([True, False, False], 1.5)  # <- ValueError
    ser2.where([False, True, True], 1.5)  # <- ValueError
    ser2.fillna(1.5)                      # <- TypeError
    ser2.replace(ser2[0], 1.5)            # <- coerces to object
    ser2[0] = 1.5                         # <- coerces to object

    ser3 = pd.Series([pd.NA, 2, 3], dtype="Int64")

    ser3.shift(1, fill_value=1.5)         # <- TypeError
    ser3.mask([True, False, False], 1.5)  # <- TypeError
    ser3.where([False, True, True], 1.5)  # <- TypeError
    ser3.fillna(1.5)                      # <- TypeError
    ser3.replace(ser3[0], 1.5)            # <- TypeError
    ser3[0] = 1.5                         # <- TypeError

timedelta64, datetime64, and datetime64tz mostly behave like the numpy dtypes,
with a few exceptions:

    - shift raises on mismatch
    - fillna raises on mismatch for timedelta64 but casts for the others

Categorical mostly behaves like other ExtensionDtypes, except for replace, which
has special logic.

Goals
-----
- Have matching behavior across dtypes.
- Share code.

Options
-------
1) Change EA (and dt64/td64) behavior to match non-EA behavior
2) Change non-EA behavior to match EA behavior (or something stricter, xref https://github.com/pandas-dev/pandas/issues/39584)
3) Deprecate (and eventually raise on) silent casting to _object_ dtype, allowing silent casting otherwise.
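To make the difference between options 2 and 3 concrete, here is a toy sketch for an int64 Series in plain Python. The helper names are hypothetical, the dtype rule is a deliberate simplification (no overflow, bool, or NaN handling), and a `FutureWarning` merely stands in for the deprecation step; this is not how pandas implements casting.

```python
import warnings


def _toy_int64_common_dtype(value) -> str:
    # Simplified numpy-style rule from the examples above: ints keep
    # int64, floats upcast to float64, anything else becomes object.
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    return "object"


def toy_option2(value) -> str:
    """Option 2 (strict): refuse any value that would change the dtype."""
    result = _toy_int64_common_dtype(value)
    if result != "int64":
        raise TypeError(f"cannot set {value!r} into an int64 Series")
    return result


def toy_option3(value) -> str:
    """Option 3: allow non-object upcasts, deprecate casting to object."""
    result = _toy_int64_common_dtype(value)
    if result == "object":
        warnings.warn(
            f"setting {value!r} casts to object dtype; this will raise "
            "in a future version",
            FutureWarning,
            stacklevel=2,
        )
    return result


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(toy_option3(1.5))    # float64, silently
    print(toy_option3("foo"))  # object, with a FutureWarning
print([w.category.__name__ for w in caught])  # ['FutureWarning']

try:
    toy_option2(1.5)
except TypeError as err:
    print("option 2:", err)
```

In this sketch every value option 3 deprecates is also rejected by option 2, so moving from 3 to 2 later would only remove the remaining non-object upcasts such as int->float.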


Here I am advocating for option 3. The advantages, as I see them:

A) For numpy dtypes, we retain the most useful cases (int->float)
B) Deprecates the cases most likely to be unintentional (e.g. a typo turning "2016-01-01" into "2p16-01-01" and silently casting a datetime64 Series to object)
C) For td64/dt64/dt64tz/period, the *only* silent casting is to object, so this removes the special-casing in that code entirely
D) For IntegerArray, FloatingArray, IntervalArray leaves open the option of allowing e.g. Integer->Floating casting (xref https://github.com/pandas-dev/pandas/issues/25288#issuecomment-941762174)
E) Does not preclude later deciding on the stricter options in 2)
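Advantage B) can be illustrated with a similar toy sketch for a datetime64 Series. `toy_dt64_setitem_dtype` is hypothetical, `datetime.date.fromisoformat` stands in for pandas' own (far more permissive) string parsing, and the warning text is invented; the sketch only shows which values would hit the proposed deprecation.

```python
import warnings
from datetime import date


def toy_dt64_setitem_dtype(value) -> str:
    """Toy model of setting a value into a datetime64 Series (hypothetical).

    Dates and ISO-formatted date strings keep the datetime64 dtype;
    anything else would fall back to object, which option 3 deprecates.
    """
    if isinstance(value, date):  # also covers datetime.datetime
        return "datetime64[ns]"
    if isinstance(value, str):
        try:
            date.fromisoformat(value)
            return "datetime64[ns]"
        except ValueError:
            pass  # unparseable string, e.g. a typo
    warnings.warn(
        f"{value!r} would cast the Series to object dtype; "
        "this is deprecated under option 3",
        FutureWarning,
        stacklevel=2,
    )
    return "object"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(toy_dt64_setitem_dtype("2016-01-01"))  # datetime64[ns]
    print(toy_dt64_setitem_dtype("2p16-01-01"))  # object, with a FutureWarning
print(len(caught))  # 1
```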
_______________________________________________
Pandas-dev mailing list
Pandas-dev@python.org
https://mail.python.org/mailman/listinfo/pandas-dev