[AstroPy] Projects involving irregularly shaped data
E. Madison Bray
erik.m.bray at gmail.com
Wed Oct 7 16:19:34 EDT 2020
Awkward Array looks, well, awesome. Thanks for pointing it out. (By
the way, we met a couple years ago when you came to a workshop in
France, hi!) Besides ASDF, which you already mentioned, it might also
be useful for dealing with some of the more awkward FITS files, but I'm
not sure. I'll have to give it more careful scrutiny.
I also have one non-astro/physics project where this could be
useful: a machine learning application with binary matrices of 0s
and 1s that can differ in size but still have to be batched together
for mini-batch gradient descent. For now I just mask out the margins
with -1s, which the models then have to account for in their
evaluation. Do you know whether anyone has tried adapting Awkward
Array for use with PyTorch?
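For readers unfamiliar with the -1 padding workaround, here is a minimal sketch of it, assuming NumPy and matrices placed in the top-left corner of a common shape. The names `pad_batch` and `masked_mean` are hypothetical helpers written for illustration, not part of Awkward Array, NumPy, or PyTorch.

```python
import numpy as np

PAD = -1  # sentinel value for cells outside a matrix's real extent


def pad_batch(matrices, pad_value=PAD):
    """Hypothetical helper: stack 2-D 0/1 matrices of differing shapes.

    Each matrix goes in the top-left corner of a (rows, cols) slot sized
    to the largest matrix; the margins are filled with pad_value.
    Returns the 3-D batch and a boolean mask marking the real cells.
    """
    rows = max(m.shape[0] for m in matrices)
    cols = max(m.shape[1] for m in matrices)
    batch = np.full((len(matrices), rows, cols), pad_value, dtype=np.int8)
    for i, m in enumerate(matrices):
        batch[i, : m.shape[0], : m.shape[1]] = m
    mask = batch != pad_value
    return batch, mask


def masked_mean(values, mask):
    """Hypothetical helper: average over real cells only, ignoring padding."""
    return values[mask].mean()
```

With a 2x2 and a 1x3 matrix, the batch has shape (2, 2, 3) and the mask selects 7 real cells. Every model or loss has to carry the mask along like this, which is the bookkeeping a jagged representation would avoid. A padded NumPy batch can be handed to PyTorch with `torch.from_numpy`.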
On Wed, Oct 7, 2020 at 9:59 PM Jim Pivarski <jpivarski at gmail.com> wrote:
> Hi everyone,
> Adrian Price-Whelan recommended that I ask my question here, since it would reach a greater number of people involved in astronomical software.
> I'm a developer of Awkward Array, a Python package for manipulating large, irregularly shaped datasets: arrays with variable-length lists, nested records, missing values, or mixed data types. The interface is a strict generalization of NumPy: you can slice jagged arrays as though they were ordinary multidimensional arrays, and there are new functions that only make sense in the context of irregular data. Like NumPy, the actual calculations are precompiled loops on internally homogeneous arrays, and we're expanding it to include GPUs transparently (irregular data on GPUs in a NumPy-like syntax).
> This package was developed for particle physics (variable numbers of particles emerging from an array of collision events), but it seems like these problems would exist in other fields as well. Right now, we're working on a proposal to find data analysis projects that need to deal with large, irregularly structured data to see if Awkward Array is applicable and if it can be made more useful for them. Ideally, this would motivate more interoperability with other scientific Python libraries. (We can already use Awkward Arrays in Numba; we're working on cuDF, Dask, and Zarr. Adrian also recommended ASDF, which I'm looking into now.)
> Does anyone have or know about a data analysis project that is currently limited by this combination of large + irregular data? Is anyone interested in collaborating?
> Thank you!
> -- Jim
> AstroPy mailing list
> AstroPy at python.org
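Jim's description of "precompiled loops on internally homogeneous arrays" can be illustrated with a simplified sketch, assuming a jagged array is stored as one flat `content` buffer plus an `offsets` buffer marking where each list starts and ends. This is an illustration of the general idea in plain NumPy, not Awkward Array's actual internals. In Awkward Array itself, `ak.num(array)` and `ak.sum(array, axis=1)` are the user-facing operations that roughly correspond to these helpers. The function names below are hypothetical.

```python
import numpy as np

# The jagged array [[1, 2, 3], [], [4, 5]] stored as two flat, homogeneous buffers.
content = np.array([1, 2, 3, 4, 5])
offsets = np.array([0, 3, 3, 5])  # list i spans content[offsets[i]:offsets[i + 1]]


def list_lengths(offsets):
    """Hypothetical helper: number of items in each list."""
    return np.diff(offsets)


def list_sums(content, offsets):
    """Hypothetical helper: sum of each list, computed without a Python loop.

    A prefix sum over the flat content turns each per-list sum into a single
    subtraction, which also handles empty lists correctly (they sum to 0).
    """
    csum = np.concatenate(([0], np.cumsum(content)))
    return csum[offsets[1:]] - csum[offsets[:-1]]


def get_list(content, offsets, i):
    """Hypothetical helper: view of list i as a slice of the flat buffer."""
    return content[offsets[i]:offsets[i + 1]]
```

Here `list_lengths(offsets)` gives `[3, 0, 2]` and `list_sums(content, offsets)` gives `[6, 0, 9]`. The design point is that the irregular structure lives entirely in the small `offsets` array, so the heavy work runs as vectorized operations over regular, contiguous memory.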