[Pandas-dev] A case for a simplified (non-consolidating) BlockManager with 1D blocks

Joris Van den Bossche jorisvandenbossche at gmail.com
Tue May 26 15:44:43 EDT 2020


On Tue, 26 May 2020 at 16:14, Brock Mendel <jbrockmendel at gmail.com> wrote:

> >> Assuming we go down this path, do you have an idea of how we get from
> here to there incrementally?  i.e. presumably this won't just be one massive
> PR
> >  [...] I would first like to focus on the "assuming we go down this
> path" part. Let's discuss the pros and cons and trade-offs, and try to turn
> assumptions into an agreed-upon roadmap. [...]
>
> I think understanding the difficulty/feasibility of the implementation is
> a pretty important part of the pros/cons.
>

That's true. Personally, I think there are enough viable ways to implement it
that we don't have to worry too much about the "how". It will certainly be a
lot of work to do properly, though, so the bigger question is rather "who is
going to do this".


> Looking back at #10556, I'm wondering if we could disable _most_
> consolidation, e.g. only consolidate when making copies anyway, which might
> be a never-break-views policy.  From a user standpoint would that achieve
> much/most of the benefits here?
>

That could certainly alleviate some of the drawbacks of the consolidated
BlockManager regarding its copying behaviour (but not necessarily regarding
its transparency / understandability, I would say).
But for the "complexity of the internals" argument, I think this would
rather make things worse. Right now, you at least know (after ensuring
consolidation) that you have only a single block for a given dtype. Still
having many, potentially-but-not-always consolidated 2D blocks would make it
more difficult to optimize for the case of non-consolidated / 1D blocks.
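
To make the consolidation trade-off a bit more concrete, here is a toy
sketch in plain NumPy. It is not the actual pandas internals: the
(positions, values) tuples and the consolidate() helper are made up for
illustration. It only shows the two properties discussed above: after
consolidation there is a single 2D block per dtype, and getting there
requires a copy, which breaks any view onto the original arrays.

```python
import numpy as np

# Toy model (an assumption for illustration, NOT pandas internals):
# a "block" is a tuple (column_positions, values), where values is a 2D
# array of shape (n_columns_in_block, n_rows) with a single dtype.


def consolidate(blocks):
    """Merge all blocks that share a dtype into one 2D block per dtype."""
    by_dtype = {}
    for positions, values in blocks:
        by_dtype.setdefault(values.dtype, []).append((positions, values))

    merged = []
    for group in by_dtype.values():
        positions = [p for pos, _ in group for p in pos]
        # np.vstack allocates a new array, so the merged block is a copy
        values = np.vstack([v for _, v in group])
        merged.append((positions, values))
    return merged


a = np.array([1, 2, 3], dtype="int64")
b = np.array([4, 5, 6], dtype="int64")
c = np.array([0.1, 0.2, 0.3], dtype="float64")

# Non-consolidated layout: one block per column, each a view on the input
blocks = [
    ([0], a.reshape(1, -1)),
    ([1], b.reshape(1, -1)),
    ([2], c.reshape(1, -1)),
]
shares_before = np.shares_memory(blocks[0][1], a)

# Consolidated layout: one int64 block (columns 0 and 1), one float64 block
consolidated = consolidate(blocks)
n_blocks = len(consolidated)
shares_after = np.shares_memory(consolidated[0][1], a)

print(shares_before, n_blocks, shares_after)
```

In this toy model, the per-column blocks still share memory with the arrays
they were built from, while the consolidated int64 block does not. A
"consolidate only when copying anyway" policy would keep the first property
in more cases, but code operating on the blocks could then no longer assume
one block per dtype.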