Hi all,

I saw this behaviour and I don't know if this is a bug or feature. I don't have much experience with directly inheriting from pandas.DataFrame as I've always preferred aggregation rather than inheritance there. A working sample is pasted below. Notice how *df.astype(dtypes)* changes the type to pandas.DataFrame. Any suggestions if this is intended behaviour?

import pandas as pd

class DF(pd.DataFrame):
    @property
    def _constructor(self):
        return self.__class__

df = DF({
    'A': [1,2,3],
    'B': [10,20,30],
    'C': [100,200,300],
})  # Type is DF

a = df['A']  # type is Series
ab = df[['A', 'B']]  # type is DF

dtypes = {'A': 'float64', 'B': 'float64', 'C': 'float64'}
x = df.astype(dtypes)
type(x)  # type is pd.DataFrame

Regards,
Simeon
Hi Simeon,

This is a somewhat known issue with astype(), and more generally it is related to how concat deals with subclasses. For example, in GeoPandas we override astype() for this reason, to ensure a proper return type: https://github.com/geopandas/geopandas/blob/ee8adfb27659e9f982ba8cdadbf62c6b...

When astype is called with a dictionary of column name -> dtype, the underlying implementation casts every column separately and then uses concat to combine the columns (Series objects) back into a dataframe. Since astype() does nothing special after that step, the output class is determined by the logic of concat, which uses the _constructor_expanddim of the first object, i.e. of the first column / Series. See https://github.com/pandas-dev/pandas/issues/35415 for some discussion about this.

I think that we could add some extra logic to the astype method implementation to try to preserve the original class (by using its _constructor) after doing the concat, similar to what was done recently for the convert_dtypes() method (https://github.com/pandas-dev/pandas/pull/44249).

I think a contribution (pull request) for that would certainly be welcome!

Best,
Joris

On Mon, 13 Dec 2021 at 13:17, Simeon Simeonov <simeon.simeonov.s@gmail.com> wrote:
Hi all,
I saw this behaviour and I don't know if this is a bug or feature. I don't have much experience with directly inheriting from pandas.DataFrame as I've always preferred aggregation rather than inheritance there. A working sample is pasted below. Notice how *df.astype(dtypes)* changes the type to pandas.DataFrame. Any suggestions if this is intended behaviour?
import pandas as pd

class DF(pd.DataFrame):
    @property
    def _constructor(self):
        return self.__class__

df = DF({
    'A': [1,2,3],
    'B': [10,20,30],
    'C': [100,200,300],
})  # Type is DF

a = df['A']  # type is Series
ab = df[['A', 'B']]  # type is DF

dtypes = {'A': 'float64', 'B': 'float64', 'C': 'float64'}
x = df.astype(dtypes)
type(x)  # type is pd.DataFrame
Regards,
Simeon
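
For anyone who needs this in their own subclass in the meantime, here is a minimal sketch of the two ideas in the reply: what the per-column cast plus concat does to the class, and an astype() override that restores it through _constructor. The override is only an illustration written for the DF class from this thread. It is not the GeoPandas implementation and not a proposed pandas patch. The isinstance check is an assumption added so that the override does nothing on pandas versions where astype() already returns the subclass.

```python
import pandas as pd


class DF(pd.DataFrame):
    """Toy subclass from this thread, plus an illustrative astype() override."""

    @property
    def _constructor(self):
        return self.__class__

    def astype(self, dtype, *args, **kwargs):
        # Let pandas do the actual casting first.
        result = super().astype(dtype, *args, **kwargs)
        # If the subclass was lost along the way, re-wrap the result
        # and carry over metadata from the original object.
        if not isinstance(result, type(self)):
            result = self._constructor(result).__finalize__(self)
        return result


df = DF({"A": [1, 2, 3], "B": [10, 20, 30], "C": [100, 200, 300]})
dtypes = {"A": "float64", "B": "float64", "C": "float64"}

# Manual version of the steps described in the reply: cast each column,
# concat the resulting Series, then restore the class via _constructor.
columns = [df[name].astype(dtype) for name, dtype in dtypes.items()]
combined = pd.concat(columns, axis=1)  # plain Series in, plain DataFrame out
restored = df._constructor(combined)   # back to DF

# With the override, astype() returns the subclass directly.
x = df.astype(dtypes)
print(type(combined).__name__, type(restored).__name__, type(x).__name__)
```

Because DF does not define _constructor_sliced, df['A'] is a plain Series, so the concat step has no way to know about DF. Re-wrapping after the concat is the part that brings the class back.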
_______________________________________________
Pandas-dev mailing list
Pandas-dev@python.org
https://mail.python.org/mailman/listinfo/pandas-dev
participants (2)
- Joris Van den Bossche
- Simeon Simeonov