Hi Simeon,

This is a somewhat known issue with astype(), and more generally it comes down to how concat behaves when dealing with subclasses.

For example, in GeoPandas, we override astype() for this reason to ensure a proper return type: https://github.com/geopandas/geopandas/blob/ee8adfb27659e9f982ba8cdadbf62c6b36dcc053/geopandas/geodataframe.py#L1694-L1718

When using astype with a dictionary of column name -> dtype, the underlying implementation casts every column separately and then uses concat to combine the columns (Series objects) back into a dataframe.
However, since astype() does nothing special after that step, the output class is determined by the logic of concat, which uses the _constructor_expanddim of the first object (i.e. of the first column / Series). See https://github.com/pandas-dev/pandas/issues/35415 for some discussion about this.
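
To make the mechanism concrete, here is a rough sketch that mimics those two steps by hand (cast per column, then concat). It is only an illustration of the idea, not the actual pandas internals; it reuses the DF class and the dtypes mapping from your example.

```python
import pandas as pd


class DF(pd.DataFrame):
    @property
    def _constructor(self):
        return self.__class__


df = DF({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]})
dtypes = {'A': 'float64', 'B': 'float64', 'C': 'float64'}

# Step 1: cast every column separately. Each piece is a plain pd.Series,
# because DF does not override _constructor_sliced.
casted = [df[name].astype(dtype) for name, dtype in dtypes.items()]

# Step 2: combine the columns again. concat picks the output class from the
# first Series' _constructor_expanddim, which for a plain Series is
# pd.DataFrame, so the DF subclass is lost at this point.
combined = pd.concat(casted, axis=1)

print(type(casted[0]))
print(type(combined))
```

If the subclass also defined a matching Series subclass (via _constructor_sliced) whose _constructor_expanddim points back to DF, concat would have a way to find the right class; with only _constructor defined it does not.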

I think that we could add some extra logic to the astype method implementation to try to preserve the original class (by using its _constructor) after doing the concat, similar to what was done recently for the convert_dtypes() method (https://github.com/pandas-dev/pandas/pull/44249). I think a contribution (pull request) for that would certainly be welcome!
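
In the meantime, a workaround on the user side could look roughly like the sketch below: override astype() in the subclass and re-wrap the result with _constructor if the class got lost. This is a minimal sketch of the general idea under that assumption, not the GeoPandas implementation linked above and not the eventual pandas fix.

```python
import pandas as pd


class DF(pd.DataFrame):
    @property
    def _constructor(self):
        return self.__class__

    def astype(self, dtype, *args, **kwargs):
        # Let pandas do the actual casting first.
        result = super().astype(dtype, *args, **kwargs)
        # If the subclass was lost along the way (e.g. through concat),
        # wrap the result in the original class again.
        if not isinstance(result, type(self)):
            result = self._constructor(result)
        return result


df = DF({'A': [1, 2, 3], 'B': [10, 20, 30], 'C': [100, 200, 300]})
dtypes = {'A': 'float64', 'B': 'float64', 'C': 'float64'}

x = df.astype(dtypes)
print(type(x))
```

On a pandas version where astype() already returns the subclass, the isinstance check simply makes the override a no-op.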

Best,
Joris

On Mon, 13 Dec 2021 at 13:17, Simeon Simeonov <simeon.simeonov.s@gmail.com> wrote:
Hi all,

I saw this behaviour and I don't know if this is a bug or feature. I don't have much experience with directly inheriting from pandas.DataFrame as I've always preferred aggregation rather than inheritance there. A working sample is pasted below. Notice how df.astype(dtypes) changes the type to pandas.DataFrame. Any suggestions if this is intended behaviour?


import pandas as pd

class DF(pd.DataFrame):
    @property
    def _constructor(self):
        return self.__class__


df = DF({
    'A': [1,2,3],
    'B': [10,20,30],
    'C': [100,200,300],
})  # Type is DF


a = df['A']  # type is Series
ab = df[['A', 'B']]  # type is DF

dtypes = {'A': 'float64', 'B': 'float64', 'C': 'float64'}
x = df.astype(dtypes)

type(x)  # type is pd.DataFrame

Regards,
Simeon

_______________________________________________
Pandas-dev mailing list
Pandas-dev@python.org
https://mail.python.org/mailman/listinfo/pandas-dev