And to give another update on this topic: the development branch of pandas now contains an experimental version of this "columnar store". Under the hood it uses an ArrayManager class instead of the BlockManager, and the ArrayManager stores the columns as a list of 1D arrays. It is almost feature-complete (the biggest missing pieces are JSON and PyTables IO).

At the moment, there is an option to enable it for experimentation (not yet documented, as its behaviour might still change):

import pandas as pd

# set the default manager to ArrayManager
pd.options.mode.data_manager = "array"

# when creating a DataFrame, you will now get one with an ArrayManager instead of BlockManager
df = pd.DataFrame(...)
df = pd.read_csv(...)
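
To make the "list of 1D arrays" idea a bit more concrete, here is a toy sketch using only the standard library. It is not how pandas implements either manager: the helper names (to_blocks, to_column_list) are made up for illustration, and a "block" here is just a group of same-typed columns standing in for the 2D array that a real block would hold.

```python
from array import array


def to_blocks(columns):
    """Block-style layout: group same-typed columns into one block per typecode.

    Hypothetical helper; each block is (column names, list of 1D arrays),
    a stand-in for the single 2D array a real block would hold.
    """
    blocks = {}
    for name, values in columns.items():
        names, data = blocks.setdefault(values.typecode, ([], []))
        names.append(name)
        data.append(values)
    return blocks


def to_column_list(columns):
    """Columnar layout: every column stays its own 1D array, in column order.

    Hypothetical helper, mirroring the "list of 1D arrays" description only.
    """
    return list(columns), list(columns.values())


columns = {
    "a": array("q", [1, 2, 3]),        # 64-bit integers
    "b": array("d", [0.5, 1.5, 2.5]),  # doubles
    "c": array("q", [10, 20, 30]),     # 64-bit integers
}

blocks = to_blocks(columns)
names, arrays = to_column_list(columns)

print(sorted(blocks))   # one block per typecode
print(blocks["q"][0])   # "a" and "c" end up together in the integer block
print(len(arrays))      # one 1D array per column
```

In the block-style layout, looking up column "c" means first finding its block and then its position inside that block; in the columnar layout it is a single list lookup.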

Some work items remain (more IO, ironing out known bugs and TODOs, checking performance); see https://github.com/pandas-dev/pandas/issues/39146 to keep track of them.

Best,
Joris

On Tue, 9 Feb 2021 at 19:17, Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:

On Mon, 31 Aug 2020 at 16:20, Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:


On Fri, 12 Jun 2020 at 22:34, Joris Van den Bossche <jorisvandenbossche@gmail.com> wrote:
On Thu, 11 Jun 2020 at 23:35, Brock Mendel <jbrockmendel@gmail.com> wrote:
> We actually have prototypes: the prototype of the split-policy discussed

AFAICT that is a 5-year-old branch. Is there a version of this based off of master that you can show asv results for?

A correction here: that branch has been updated several times over the last 5 years, most recently two weeks ago when I started this thread, as I explained in the GitHub issue comment I linked to: https://github.com/pandas-dev/pandas/issues/10556#issuecomment-633703160
 
> Also, if performance is in the end the decisive criterion, I repeat my earlier remark in this thread: we need to be clearer about what we want / expect.

In principle, this is pretty much exactly what the asvs are supposed to represent.

Well, I am repeating myself, but I already mentioned that I am not sure ASV is fully useful for this, as it requires a complete working replacement, which is IMO too much to ask of an initial prototype.

But OK, the message is clear: we need a more concrete implementation / prototype. So let's put this discussion aside for a moment, and focus on that instead. I will try to look at that in the coming weeks, but any help is welcome (and I will try to get it running with ASV, or at least a part of it).
 
To come back to this: I cleaned up a proof-of-concept implementation that I started after the discussion above, and put it in a PR to view/discuss: https://github.com/pandas-dev/pandas/pull/36010
 

Another follow-up: the proof-of-concept is now merged into the master branch, and I am currently working on making it more feature-complete (see https://github.com/pandas-dev/pandas/issues/39146 for an overview issue).

Joris