I have an idea that I've discussed with a few people in person, and the
feedback has generally been positive. So I'd like to bring it up here, to
get a sense of whether this is going to fly. Note that this is NOT a
proposal at this point.
Idea in five words: define a NumPy API standard
- Many libraries, both in Python and other languages, have APIs copied from
or inspired by NumPy.
- All of those APIs are incomplete, and many deviate from NumPy either by
accident or on purpose.
- The NumPy API is very large and ill-defined.
Libraries with a NumPy-like API
- GPU: TensorFlow, PyTorch, CuPy, MXNet
- distributed: Dask
- sparse: pydata/sparse
- other: tensorly, uarray/unumpy, ...
In other languages:
- Go: Gonum
- Rust: rust-ndarray, rust-numpy
- C++: xtensor
- C: XND
- Java: ND4J
- C#: NumSharp, numpy.net
- Ruby: Narray, xnd-ruby
- R: Rray
This is an incomplete list. Xtensor and XND aim for multi-language support.
These libraries are of varying completeness, size and quality - everything
from one-person efforts that have just started, to large code bases that go
beyond NumPy in features or performance.
Define a standard for "the NumPy API" (or "NumPy core API", or .... - it's
just a name for now) that other libraries can use as a guide for what to
implement and for when they can say they are NumPy-compatible.
- Define a NumPy API standard, containing an N-dimensional array object and
a set of functions.
- List of functions and ndarray methods to include.
- Recommendations about where to deviate from NumPy (e.g. leave out array
creation methods with weird names like np.r_).
Out of scope, or to be treated separately:
- dtypes and casting
- function behavior (e.g. returning views vs. copies, which keyword
arguments to include)
- indexing behavior
- submodules (fft, random, linalg)
Who cares and why?
- Library authors: this saves them work and helps them make decisions.
- End users: consistency between libraries/languages helps transfer
knowledge and understand code.
- NumPy developers: gives them a vocabulary for "the NumPy API",
"compatible with NumPy", etc.
- If not done well, we just add to the confusion rather than make things
clearer.
- Opportunity for endless amounts of bikeshedding.
Some more rationale:
We (NumPy devs) mostly have a shared understanding of what is "core NumPy
functionality", what we'd like to remove but are stuck with, what's not
used a whole lot, etc. Examples: financial functions don't belong, array
creation methods with weird names like np.r_ were a mistake. We are not
communicating this in any way though. Doing so would be helpful. Perhaps
this API standard could even have layers, to indicate what's really core,
what are secondary sets of functionality to include in other libraries, etc.
Discussion and next steps
What I'd like to get a sense of is:
- Is this a good idea to begin with?
- What should the scope be?
- What should the format be (a NEP, some other doc, defining in code)?
If this idea is well-received, I can try to draft a proposal during the
next month (help/volunteers welcome!). It can then be discussed at SciPy'19
- high-bandwidth communication may help to get a set of people on the same
page and hash out a lot of details.
I know how to reshape arrays; my problem is a little more complicated than
that. I am looking for the most efficient way to do the following; an
example follows.
1) I have an array of bytes b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12].
This bytes array represents a width of 2 and a height of 3.
Each pair of bytes in that array makes an unsigned 16-bit integer.
2) I want to get the following two-dimensional array from the bytes array,
where each item in the two-dimensional array is an unsigned 16-bit integer:
[(1,2) , (3,4)],
[(5,6) , (7,8)],
[(9,10) , (11,12)]
In the first row there are 2 elements, each one made of 2 bytes from the
bytes array, etc.:
(1, 2) = b[0] + (b[1] << 8)
(3, 4) = b[2] + (b[3] << 8)
What is the most efficient way to achieve that with numpy?
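Assuming the bytes are little-endian (low byte first, as in the expressions above), a minimal sketch of one efficient approach is np.frombuffer followed by a reshape:

```python
import numpy as np

# raw bytes: pairs (1,2), (3,4), ... each form one little-endian uint16
raw = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# reinterpret the buffer as little-endian unsigned 16-bit ints ('<u2'),
# then reshape to height 3 x width 2 -- no copy of the data is made
arr = np.frombuffer(raw, dtype='<u2').reshape(3, 2)
print(arr[0, 0])  # 513, i.e. 1 + (2 << 8)
```

Because frombuffer only reinterprets the existing memory, this avoids any per-element Python loop.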
We have reserved a room for a Developer meeting at SciPy. The meeting
is planned for Friday at 1pm in room 108.
I have created a tentative Agenda with some broad discussion points at:
Please feel free to edit or add to the Agenda.
All the Best,
In PR https://github.com/numpy/numpy/pull/13812, Thrasibule rewrote the
algorithm used with a faster alternative branch for some cases. The
faster algorithm does not necessarily shuffle the results, so for
instance gen.choice(2000, 2000, replace=False) may simply return
arange(2000). In the old code the result is always shuffled. We propose
adding a new kwarg "shuffle" that defaults to True. Users looking for
maximum performance may choose to use shuffle=False.
Since this is a behavioral change (although only in the new Generator
class; the new code will not be used in RandomState), we are proposing
it to the mailing list.
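As a sketch of the proposed behavior (assuming the kwarg lands with the name and default described above):

```python
import numpy as np

gen = np.random.default_rng(0)

# default: the result of a full-size draw without replacement is shuffled
shuffled = gen.choice(2000, 2000, replace=False)

# proposed fast path: skip the final shuffle; the result may simply be
# arange(2000) in some cases, but is still a permutation of the population
unshuffled = gen.choice(2000, 2000, replace=False, shuffle=False)

# both contain every element of range(2000) exactly once
print(np.array_equal(np.sort(shuffled), np.arange(2000)))  # True
```

Users who only need the sampled set (not a random ordering) can pass shuffle=False for the speedup; everyone else keeps the old shuffled behavior by default.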
Due to the SciPy conference we will _not_ have a Community meeting this
week. We will have a developer meeting at the SciPy conference on
Friday at 1pm in room 108 and take part in the Sprints. I will send a
tentative Agenda/things to discuss in a separate email.
I was recently converting some Matlab code into Python, where I stumbled
upon the bitget(n, x) function of Matlab: b = bitget(A, bit) returns the
bit value at the position bit in integer array A. I was unable to find any
NumPy equivalent of this, so should I create a pull request in NumPy for
this?
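There is no single NumPy function for this, but the same thing can be expressed with a shift and a mask. A minimal sketch (the helper name bitget is hypothetical, chosen to mirror MATLAB; note that MATLAB bit positions are 1-indexed):

```python
import numpy as np

def bitget(a, bit):
    # Return the value of the bit at 1-indexed position `bit`,
    # mirroring MATLAB's bitget(A, bit); works elementwise on arrays.
    return (np.asarray(a) >> (bit - 1)) & 1

print(bitget([5, 6, 7], 1))  # [1 0 1] -- the least significant bits
```

Because >> and & broadcast over arrays, this one-liner already covers the array case that bitget handles in MATLAB.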