[IPython-dev] ANN: DistArray 0.5 release
ellisonbg at gmail.com
Mon Aug 11 16:47:15 EDT 2014
Thanks for sharing, this looks great!
On Mon, Aug 11, 2014 at 1:38 PM, Kurt Smith <ksmith at enthought.com> wrote:
> DistArray 0.5 release
> **Mailing list:** distarray at googlegroups.com
> **Documentation:** http://distarray.readthedocs.org
> **License:** Three-clause BSD
> **Python versions:** 2.7, 3.3, and 3.4
> **OS support:** \*nix and Mac OS X
> What is DistArray?
> DistArray aims to bring the ease-of-use of NumPy to data-parallel
> high-performance computing. It provides distributed multi-dimensional NumPy
> arrays, distributed ufuncs, and distributed IO capabilities. It can
> efficiently interoperate with external distributed libraries like Trilinos.
> DistArray works with NumPy and builds on top of it in a flexible and natural
> way.
> 0.5 Release
> Noteworthy improvements in this release include:
> * closer alignment with NumPy's API,
> * support for Python 3.4 (existing support for Python 2.7 and 3.3),
> * a performance-oriented MPI-only mode for deployment on clusters and
> supercomputers,
> * a way to register user-defined functions to be callable locally on worker
> processes,
> * more consistent naming of sub-packages,
> * testing with MPICH2 (already tested against OpenMPI),
> * improved and expanded examples,
> * installed version testable via ``distarray.test()``, and
> * performance and scaling improvements.
> With this release, DistArray is ready for real-world testing and deployment.
> The project is still evolving rapidly and we appreciate the continued input from
> the larger scientific-Python community.
> Existing features
> * supports NumPy-like slicing, reductions, and ufuncs on distributed
> multidimensional arrays;
> * has a client-engine process design -- data resides on the worker
> processes, commands are initiated from master;
> * allows full control over what is executed on the worker processes and
> integrates transparently with the master process;
> * allows direct communication between workers, bypassing the master process
> for scalability;
> * integrates with IPython.parallel for interactive creation and exploration
> of distributed data;
> * supports distributed ufuncs (currently without broadcasting);
> * builds on and leverages MPI via MPI4Py in a transparent and user-friendly
> way;
> * has basic support for unstructured arrays;
> * supports user-controllable array distributions across workers (block,
> cyclic, block-cyclic, and unstructured) on a per-axis basis;
> * has a straightforward API to control how an array is distributed;
> * has basic plotting support for visualization of array distributions;
> * separates the array’s distribution from the array’s data -- useful for
> slicing, reductions, redistribution, broadcasting, and other operations;
> * implements distributed random arrays;
> * supports ``.npy``-like flat-file IO and HDF5 parallel IO (via ``h5py``);
> leverages MPI-based IO parallelism in an easy-to-use and transparent way;
> * supports the distributed array protocol [protocol]_, which allows
> independently developed parallel libraries to share distributed arrays
> without copying, analogous to the PEP-3118 new buffer protocol.
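
For anyone unfamiliar with the per-axis distribution types named in the list
above (block, cyclic, block-cyclic), here is a rough sketch of how global
indices along one axis could map to worker ranks. This is plain Python and does
not use DistArray's API; the function names are hypothetical, and the
even-blocks rule in ``block_owner`` is an assumption that may differ from how
DistArray handles sizes that do not divide evenly.

```python
def block_owner(index, size, nprocs):
    """Rank owning `index` when the axis is split into contiguous blocks.

    Assumption: every rank gets ceil(size / nprocs) elements, except
    possibly the last ranks, which may get fewer.
    """
    block = -(-size // nprocs)  # ceiling division
    return index // block


def cyclic_owner(index, nprocs):
    """Rank owning `index` when elements are dealt out round-robin."""
    return index % nprocs


def block_cyclic_owner(index, block_size, nprocs):
    """Rank owning `index` when fixed-size blocks are dealt out round-robin."""
    return (index // block_size) % nprocs


if __name__ == "__main__":
    size, nprocs = 8, 2
    print("block:       ", [block_owner(i, size, nprocs) for i in range(size)])
    print("cyclic:      ", [cyclic_owner(i, nprocs) for i in range(size)])
    print("block-cyclic:", [block_cyclic_owner(i, 2, nprocs) for i in range(size)])
```

Because the mapping depends only on the index, the axis size, and the number of
workers, it can be described without touching the array's data, which is
roughly the idea behind keeping the distribution separate from the data.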
> Planned features and roadmap
> Near-term features and improvements include:
> * array re-distribution capabilities;
> * lazy evaluation and deferred computation for latency hiding;
> * interoperation with Trilinos [Trilinos]_; and
> * distributed broadcasting support.
> The longer-term roadmap includes:
> * Integration with other packages [petsc]_ that subscribe to the distributed
> array protocol [protocol]_;
> * Distributed fancy indexing;
> * Out-of-core computations;
> * Support for distributed sorting and other non-trivial distributed
> algorithms; and
> * End-user control over communication and temporary array creation, and
> performance aspects of distributed computations.
> History and funding
> Brian Granger started DistArray as a NASA-funded SBIR project in 2008.
> Enthought picked it up as part of a DOE Phase II SBIR [SBIR]_ to provide a
> generally useful distributed array package. It builds on NumPy, MPI,
> IPython, IPython.parallel, and interfaces with the Trilinos suite of
> distributed HPC solvers (via PyTrilinos [Trilinos]_).
> This material is based upon work supported by the Department of Energy under
> Award Number DE-SC0007699.
> This report was prepared as an account of work sponsored by an agency of the
> United States Government. Neither the United States Government nor any agency
> thereof, nor any of their employees, makes any warranty, express or implied,
> or assumes any legal liability or responsibility for the accuracy,
> completeness, or usefulness of any information, apparatus, product, or process
> disclosed, or represents that its use would not infringe privately owned
> rights. Reference herein to any specific commercial product, process, or
> service by trade name, trademark, manufacturer, or otherwise does not
> necessarily constitute or imply its endorsement, recommendation, or favoring
> by the United States Government or any agency thereof. The views and opinions
> of authors expressed herein do not necessarily state or reflect those of the
> United States Government or any agency thereof.
> .. [protocol]
> .. [Trilinos] http://trilinos.org/
> .. [petsc] http://www.mcs.anl.gov/petsc/
> .. [SBIR] http://www.sbir.gov/sbirsearch/detail/410257
> Kurt W. Smith, Ph.D.
> Enthought, Inc. | 512.536.1057
Brian E. Granger
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgranger at calpoly.edu and ellisonbg at gmail.com