[Pandas-dev] What could a pandas 2.0 look like?

Joris Van den Bossche jorisvandenbossche at gmail.com
Mon Feb 10 12:43:21 EST 2020


pandas 1.0 is out, so time to start thinking about 2.0 ;)

In principle, pandas 2.0 will just be one of the next releases, made when we
decide we want to clean up the deprecations / make a few changes that are
hard to deprecate (following our new versioning policy).
Nonetheless, I think it is still worth considering whether it can be
something more than that, with more specific goals in mind*.

Last year I made the pd.NA proposal, which resulted in using that for the
nullable integer, boolean and string dtypes. In the proposal, pd.NA was
described as "can be used consistently across all data types". And for me,
the aspirational end goal of this proposal is to *actually* have this for
*all* dtypes, but we never really discussed this aspect explicitly.
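
As a rough illustration of what "consistently across all data types" looks
like for the dtypes that already use it (a sketch only, assuming pandas >= 1.0;
the variable names are just for illustration), the three existing nullable
dtypes share pd.NA as their missing value:

```python
import pandas as pd

# The three opt-in nullable dtypes mentioned above (int, boolean, string).
ints = pd.array([1, None, 3], dtype="Int64")
bools = pd.array([True, None, False], dtype="boolean")
strs = pd.array(["a", None, "c"], dtype="string")

# Each array reports its missing element as the same pd.NA singleton.
missing = [arr[1] for arr in (ints, bools, strs)]
all_na = all(value is pd.NA for value in missing)
print(all_na)

# pd.NA propagates through comparisons instead of evaluating to False.
comparison = ints == 1
print(comparison)
```

The comparison result is itself a nullable boolean array, so the missing
entry stays missing rather than silently turning into False.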

So, for me, a possible future pandas 2.0:

   - Uses all "nullable dtypes" by default (i.e. dtypes that use pd.NA as
   missing value indicator). That means we add a nullable version of all other
   dtypes (as we now already did for int, boolean, string). End goal: a single
   missing value indicator with the same behavior for all dtypes.
   - If we add such nullable dtypes using the extension dtypes/array
   mechanism (so it can first be opt-in in 1.X), this could "automatically"
   lead to a simplification of the internals / Block Manager (another
   aspirational goal that has been discussed before, but never became
   concrete). In such a case (all extension dtypes), we would only be
   using 1D blocks (simplifying the thorny 1D / 2D cases in internals). This
   would simplify the memory model, consolidation, etc.
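
To make the first point concrete, here is a small sketch of the difference
between the current default and the opt-in nullable dtype (assuming
pandas >= 1.0; the variable names are only illustrative, and this says nothing
about how a future default would actually be implemented):

```python
import pandas as pd

# Default inference today: a missing value forces the integers to float64,
# and the gap is stored as NaN.
default = pd.Series([1, None, 3])

# Opt-in nullable dtype: the values stay integers and the gap is pd.NA.
nullable = pd.Series([1, None, 3], dtype="Int64")

# convert_dtypes() (added in pandas 1.0) switches to nullable dtypes
# where it can.
converted = default.convert_dtypes()

print(default.dtype, nullable.dtype, converted.dtype)
```

With nullable dtypes as the default, the second form would be what users get
without having to ask for it.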

Do you think this is a desirable goal? And realistic? Other aspirational
goals?

Best,
Joris

*Agreeing on goals doesn't mean it will happen, that's open source (or at
least community-based open source). But I think it can still be useful to
guide some efforts where possible or in trying to get traction for certain
issues from contributors. And then we can still see if it gets done in 2.0,
3.0, 4.0 or never ;)