[Pandas-dev] Sparse data structures in pandas: refactor - feedback welcome!

Fri Nov 16 15:07:58 EST 2018

Hi all,

Relaying the message that Tom posted on Twitter (
https://twitter.com/TomAugspurger/status/1062718319445213184) to the
mailing lists as well: we are making changes to the support for sparse data
in pandas, and would like to get feedback on this.

To give some context: parts of the pandas internals are being refactored
on top of the ExtensionArray interface. This also applies to the sparse
data structures:

- The SparseArray is refactored to follow the ExtensionArray protocol, and
this has some consequences (also impacting SparseSeries and Series holding
sparse data): no longer subclassing numpy.ndarray, change in `np.asarray`
behaviour, ... For more details see
http://pandas-docs.github.io/pandas-docs-travis/whatsnew/v0.24.0.html#sparse-data-structure-refactor
.

- Since a normal pandas Series and DataFrame can hold sparse data, there
may be no need for the dedicated SparseSeries and SparseDataFrame
subclasses. Therefore, we are planning to deprecate those subclasses, and
the specific sparse functionality will be accessible on normal
Series/DataFrame with the `sparse` accessor.
  However, this might have complications we didn't think about, so we need
your feedback!
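
To make the two points above a bit more concrete, here is a minimal sketch
of what they look like from the user side. It assumes pandas >= 0.24 (or
current master) and that SparseArray is reachable as
`pd.arrays.SparseArray`; the example values are made up for illustration,
and the exact behaviour may still change based on feedback.

```python
import numpy as np
import pandas as pd

# Assumption: pandas >= 0.24, where SparseArray is an ExtensionArray.
# The values below are illustrative only.
arr = pd.arrays.SparseArray([0, 0, 1, 2], fill_value=0)

# SparseArray no longer subclasses numpy.ndarray.
is_ndarray = isinstance(arr, np.ndarray)

# np.asarray gives a dense array holding all values,
# including the fill values.
dense = np.asarray(arr)

# A normal Series can hold sparse data; no SparseSeries needed.
s = pd.Series(arr)

# Sparse-specific functionality lives under the `sparse` accessor.
density = s.sparse.density      # fraction of values actually stored
stored = s.sparse.sp_values     # only the non-fill values
roundtrip = s.sparse.to_dense() # back to a regular dense Series

print(type(s).__name__, s.dtype, density)
```

If your code relies on `isinstance(..., np.ndarray)` checks or on
SparseSeries-specific methods, this is the kind of snippet where a
difference would show up, and we would like to hear about it.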

See https://github.com/pandas-dev/pandas/issues/19239 and
https://github.com/pandas-dev/pandas/issues/21978 for the related GitHub
issues on this topic.

If you are a user of the sparse functionality in pandas, trying out
master and providing feedback would be much appreciated.

Best,
Joris

(I am sending this to both the pydata and pandas-dev mailing lists, but
please reply to pandas-dev at python.org)