[Numpy-discussion] Fast Access to Container of Numpy Arrays on Disk?

Stephan Hoyer shoyer at gmail.com
Thu Jan 14 17:13:38 EST 2016


On Thu, Jan 14, 2016 at 8:26 AM, Travis Oliphant <travis at continuum.io>
wrote:

> I don't know enough about xray to know whether it supports this kind of
> general labeling to be able to build your entire data-structure as an x-ray
> object.   Dask could definitely be used to process your data in an easy to
> describe manner (creating a dask.bag of dask.arrays would work though I'm
> not sure there are any methods that would buy you from just having a
> standard dictionary of dask.arrays).   You can definitely use dask
> imperative to parallelize your data-manipulation algorithms.
>

Indeed, xray's data model is not flexible enough to represent this sort of
data -- it's designed around cases where multiple arrays use shared axes.

However, I would indeed recommend dask.array (coupled with some sort of
on-disk storage) as a possible solution for this problem, if you need to be
able to manipulate these arrays with an API that looks like NumPy. That said,
the fact that your data consists of ragged arrays suggests that the
dask.array API may be less useful for you.
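
As a rough illustration of the "dask.array plus on-disk storage" idea, here is
a minimal sketch that keeps a plain dictionary of lazy dask arrays backed by
HDF5 datasets. The file name, the dataset names ("short", "long") and the
chunk size are made up for the example, and h5py is assumed as the HDF5
binding; none of that comes from the original question.

```python
import os
import tempfile

import dask.array as da
import h5py
import numpy as np

# Hypothetical file with two 1-D datasets of different lengths.
path = os.path.join(tempfile.mkdtemp(), "arrays.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("short", data=np.arange(10, dtype="f8"))
    f.create_dataset("long", data=np.arange(1000, dtype="f8"))

# Keep the file open while the lazy arrays are in use: dask reads
# chunks from the h5py datasets only when a result is computed.
f = h5py.File(path, "r")
lazy = {name: da.from_array(f[name], chunks=(256,)) for name in f}

# Each entry behaves like a NumPy array, but is evaluated chunk by chunk.
means = {name: float(arr.mean().compute()) for name, arr in lazy.items()}
f.close()
```

Each array has its own shape, so operations apply per entry rather than across
the whole collection, which is where the NumPy-like API stops helping much.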

Tools like dask.imperative, coupled with HDF5 for storage, could still be
very useful, though.
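
A minimal sketch of that combination, assuming h5py for storage and
dask.delayed (the name under which dask.imperative is exposed in later dask
releases). The helper names, dataset names and the per-array reduction are
hypothetical and only stand in for whatever processing you actually need.

```python
import os
import tempfile

import dask
import h5py
import numpy as np


def write_ragged(path, arrays):
    """Store each array as its own HDF5 dataset, keyed by name."""
    with h5py.File(path, "w") as f:
        for name, arr in arrays.items():
            f.create_dataset(name, data=arr)


@dask.delayed
def load_and_sum(path, name):
    # Each task opens the file itself, so no open handle is shared
    # between tasks.
    with h5py.File(path, "r") as f:
        return float(f[name][...].sum())


# Hypothetical ragged collection: arrays with unrelated shapes.
arrays = {
    "a": np.arange(5, dtype="f8"),
    "b": np.ones((3, 4)),
    "c": np.arange(6, dtype="f8").reshape(2, 3),
}
path = os.path.join(tempfile.mkdtemp(), "ragged.h5")
write_ragged(path, arrays)

# Build one lazy task per array, then evaluate them all in one call.
tasks = {name: load_and_sum(path, name) for name in arrays}
(results,) = dask.compute(tasks)
```

Since every array is loaded and processed inside its own task, the differing
shapes are not a problem, and only the arrays a task touches are read from
disk.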