start of an array (tensor) and dataframe API standardization initiative
Hi all,

I'd like to share this announcement blog post about the creation of a consortium for array and dataframe API standardization: https://data-apis.org/blog/announcing_the_consortium/. It's still in the early stages, but starting to take shape. We have participation from one or more maintainers of most array and tensor libraries: NumPy, TensorFlow, PyTorch, MXNet, Dask, JAX, Xarray. Stephan Hoyer, Travis Oliphant and I have been providing input from a NumPy perspective.

The effort is closely related to some of the interoperability work we've been doing in NumPy (for example, it could provide an answer to what's described in https://numpy.org/neps/nep-0037-array-module.html#requesting-restricted-subs... ).

At this point we're looking for feedback from maintainers at a high level (see the blog post for details). Also important: the python-record-api tooling, and the data in its repo, provides very granular API usage data of the kind we could really use when making decisions that impact backwards compatibility.

Cheers,
Ralf
Hi all,

I'd like to share an update on this topic. The draft array API standard is now ready for wider review:

- Blog post: https://data-apis.org/blog/array_api_standard_release
- Array API standard document: https://data-apis.github.io/array-api/latest/
- Repo: https://github.com/data-apis/array-api/

It would be great if people - and in particular, NumPy maintainers - could have a look at it and see whether it looks sensible from a NumPy perspective, and whether the goals and benefits of adopting it are described clearly enough and are compelling.

I'm sure a NEP will be needed to propose adoption of the standard once it is closer to completion, to work out what that means for interaction with the array protocol NEPs and/or NEP 37, and to describe how an implementation would look. It's a bit early for that now - I'm thinking maybe by the end of the year. Some initial discussion now would be useful though, since it's easier to make changes at this stage than when the API standard is further along.

Cheers,
Ralf

On Mon, Aug 17, 2020 at 9:34 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
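To make the goal concrete, here is a minimal sketch (not taken from the standard itself) of the kind of library-agnostic code the standard is meant to enable, using the `__array_namespace__` entry point from the draft. The toy `Array` class is purely illustrative: it fakes conformance by returning NumPy as its namespace, since NumPy did not yet expose this entry point at the time of this thread.

```python
import numpy as np


class Array:
    """Toy array type exposing the draft standard's namespace entry point.

    Illustrative only - a real conforming library would return its own
    namespace object implementing the standardized functions.
    """

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array__(self, dtype=None):
        # Lets NumPy functions consume this toy type transparently.
        return self._data if dtype is None else self._data.astype(dtype)

    def __array_namespace__(self, *, api_version=None):
        # Stand-in: pretend NumPy is a conforming namespace.
        return np


def mean_abs(x):
    # Library-agnostic consumer code: no direct import of the array
    # library, only the namespace retrieved from the array itself.
    xp = x.__array_namespace__()
    return xp.mean(xp.abs(x))


print(mean_abs(Array([-3.0, 4.0])))  # 3.5
```

A function like `mean_abs` would then work unchanged with any library that adopts the standard, which is the interoperability benefit the blog post describes.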
On Wed, Nov 11, 2020 at 10:56 AM Ilhan Polat <ilhanpolat@gmail.com> wrote:
It's not closed - this is the start of community review, so if things are missing or need changing, now is a good time to bring them up. Please have a look at CONTRIBUTING.md in the array-api repo.

What I would personally expect is that most discussion will be about the bigger-picture topics and about the clarity of the document. There may be some individual functions that are important to add; if that's what you have in mind, I would recommend looking at some merged PRs to see how the analysis is done (e.g. usage data, comparison between existing libraries). https://github.com/data-apis/array-api/pull/42 is a good example.

Cheers,
Ralf
This is great! I'm working on a Haskell-based mmap'ed shared array library with a Python-like surface language API, and I would very willingly adhere to such a standard.

From a quick skim I can't find any dataframe-related info - is that scheduled for the future? Will Pandas be taken as the primary reference?

Thanks with best regards,
Compl
On Wed, Nov 11, 2020 at 12:15 PM YueCompl <compl.yue@icloud.com> wrote:
Awesome. Library authors from other languages are definitely something else we had in mind, so glad to hear it's helpful.

> A quick skim but I can't find dataframe related info, that's scheduled for the future? Will take Pandas as primary reference?
Yes, that is planned but will take a while longer. Dataframes are less mature, and Pandas itself is still very much in flux (the first proposal after the 1.0 release was "let's deprecate <stuff> for 2.0"), so it's a more complex puzzle. Pandas is an important reference, but I'd expect the end result to deviate more from Pandas than the array API differs from NumPy.

Cheers,
Ralf
On 11/10/20 8:19 PM, Ralf Gommers wrote:
I think it is compelling for a first version. The test suite and benchmark suite will be valuable tools. I hope future versions standardize complex numbers as a dtype.

I realize there is a limit to the breadth of the scope of functions to be covered. Is there a page that lists them all in one place? For instance, I tried to look up what the standard has to say on issue https://github.com/numpy/numpy/issues/17760 about using bincount on uint64 arrays. It took me a while to figure out that bincount is not in the API (although unique(..., return_counts=True) is).

Matti
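For context, a minimal sketch of the distinction (assuming current NumPy behavior, where `bincount` requires input that safely casts to the platform integer type - the subject of the issue above - while `unique` with `return_counts=True` accepts any integer dtype):

```python
import numpy as np

a = np.array([1, 2, 2, 5], dtype=np.uint64)

# np.bincount needs a safe cast to the platform integer type, which is
# what issue numpy/numpy#17760 discusses for uint64 input.
# unique(..., return_counts=True) is the operation the draft standard
# does include, and it works for any integer dtype:
values, counts = np.unique(a, return_counts=True)
print(values, counts)  # [1 2 5] [1 2 1]
```

Unlike `bincount`, this returns counts only for the values actually present, so it is a substitute rather than a drop-in replacement.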
On Thu, Nov 12, 2020 at 1:54 PM Matti Picus <matti.picus@gmail.com> wrote:
Yes, that's definitely a desire - when implementations are there and ready. At the moment most libraries have very incomplete support for complex dtypes, largely because they're not very important for deep learning. Also, NumPy's implementations and choices are shaky in places, which is being turned up by the ongoing PyTorch effort to implement complex dtype support in a NumPy-compatible way.

> I realize there is a limit to the breadth of the scope of functions to be covered. Is there a page that lists them in one place?
That's a good idea and still missing - thanks for asking. The test suite that's in development has a complete list [1]. In the document itself Sphinx search works, but it should perhaps be easier to get a complete overview (although that requires some thought - the NumPy docs don't have everything on one page either).

[1] https://github.com/data-apis/array-api-tests/tree/master/array_api_tests/fun...

Cheers,
Ralf
participants (4)

- Ilhan Polat
- Matti Picus
- Ralf Gommers
- YueCompl