On Tue, Jan 9, 2018 at 12:53 PM, Tyler Reddy <tyler.je.reddy(a)gmail.com> wrote:
> One common issue in computational geometry is the need to operate rapidly on
> arrays with "heterogeneous shapes."
> So, an array that has rows with different numbers of columns -- shape (1,3)
> for the first polygon and shape (1, 12) for the second polygon and so on.
> This seems like a particularly nasty scenario when the loss of "homogeneity"
> in shape precludes traditional vectorization -- I think numpy effectively
> converts these to dtype=object, etc. I don't
> think this is necessarily a BLAS issue, since wrapping comp. geo. libraries does
> happen in a subset of cases to handle this, but if there's overlap in
> utility you could pass it along I suppose.
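The ragged case Tyler describes can be sketched like this (a hypothetical example, not from any particular comp. geo. library):

```python
import numpy as np

# Polygons with different vertex counts end up in a dtype=object array,
# so vectorized kernels no longer apply and we fall back to a Python loop.
polygons = np.empty(2, dtype=object)
polygons[0] = np.array([0.0, 0.0, 1.0])     # polygon with 3 coordinates
polygons[1] = np.arange(12, dtype=float)    # polygon with 12 coordinates
# Per-polygon reduction via an explicit loop (sums are 1.0 and 66.0):
sums = np.array([p.sum() for p in polygons])
```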
You might be interested in this discussion of "Batch BLAS":
I didn't get into it in the draft response, because it didn't seem
like something where NumPy/SciPy have any useful experience to offer,
but it sounds like there are people worrying about this case.
Nathaniel J. Smith -- https://vorpus.org
As mentioned earlier, there's work underway to revise and
update the BLAS standard -- e.g. we might get support for strided
arrays and lose xerbla! There's a draft at . They're interested in
feedback from users, so I've written up a first draft of comments
about what we would like as NumPy/SciPy developers. This is very much
a first attempt -- I know we have lots of people who are more expert
on BLAS than me on these lists :-). Please let me know what you think.
# Comments from NumPy / SciPy developers on "A Proposal for a Next-Generation BLAS"
These are comments on [A Proposal for a Next-Generation BLAS]
(version as of 2017-12-13), from the perspective of the developers of
the NumPy and SciPy libraries. We hope this feedback is useful, and
welcome further discussion.
## Who are we?
NumPy and SciPy are the two foundational libraries of the Python
numerical ecosystem, and one of their duties is to wrap BLAS and
expose it for the use of other Python libraries. (NumPy primarily
provides a GEMM wrapper, while SciPy exposes more specialized
operations.) It's unclear how many users we have exactly, but we
certainly ship multiple million copies of BLAS every month, and
provide one of the most popular numerical toolkits for both novice and expert users.
Looking at the original BLAS and LAPACK interfaces, it often seems
that their imagined user is something like a classic supercomputer
consumer, who writes code directly in Fortran or C against the BLAS
API, and where the person writing the code and running the code are
the same. NumPy/SciPy are coming from a very different perspective:
our users generally know nothing about the details of the underlying
BLAS; they just want to describe their problem in some high-level way,
and the library is responsible for making it happen as efficiently as
possible, and is often integrated into some larger system (e.g. a
real-time analytics platform embedded in a web server).
When it comes to our BLAS usage, we mostly use only a small subset of
the routines. However, as "consumer software" used by a wide variety
of users with differing degrees of technical expertise, we're expected
to Just Work on a wide variety of systems, and with as many different
vendor BLAS libraries as possible. On the other hand, the fact that
we're working with Python means we don't tend to worry about small
inefficiencies that will be lost in the noise in any case, and are
willing to sacrifice some performance to get more reliable operation
across our diverse userbase.
## Comments on specific aspects of the proposal
### Data Layout
We are **strongly in favor** of the proposal to support arbitrary
strided data layouts. Ideally, this would support strides *specified
in bytes* (allowing for unaligned data layouts), and allow for truly
arbitrary strides, including *zero or negative* values. However, we
think it's fine if some of the weirder cases suffer a performance penalty.
Rationale: NumPy – and thus, most of the scientific Python ecosystem –
only has one way of representing an array: the `numpy.ndarray` type,
which is an arbitrary dimensional tensor with arbitrary strides. It is
common to encounter matrices with non-trivial strides. For example::
# Make a 3-dimensional tensor, 10 x 9 x 8
t = np.zeros((10, 9, 8))
# Considering this as a stack of eight 10x9 matrices, extract the first:
mat = t[:, :, 0]
Now `mat` has non-trivial strides on both axes. (If running this in a
Python interpreter, you can see this by looking at the value of
`mat.strides`.) Another case where interesting strides arise is when
*broadcasting* is applied, which is the name for NumPy's rules for stretching arrays to make
their shapes match. For example, in an expression like::
np.array([1, 2, 3]) + 1
the scalar `1` is "broadcast" to create a vector `[1, 1, 1]`. This is
accomplished without allocating memory, by creating a vector with
length = 3 and stride = 0 – so all the elements share a single
location in memory. Similarly, by using negative strides we can
reverse an array without allocating memory::
a = np.array([1, 2, 3])
a_flipped = a[::-1]
Now `a_flipped` has the value `[3, 2, 1]`, while sharing storage with
the array `a = [1, 2, 3]`. Misaligned data is also possible (e.g. an
array of 8-byte doubles with a 9-byte stride), though it arises more
rarely. (An example of when it might occur is an on-disk data
format that alternates between storing a double value and then a
single byte value, which is then memory-mapped.)
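The stride values in the cases above can be inspected directly; a short sketch (byte values assume 8-byte float64 elements and C order):

```python
import numpy as np

t = np.zeros((10, 9, 8))
mat = t[:, :, 0]                   # first of the eight 10x9 matrices
print(mat.strides)                 # (576, 64): non-trivial on both axes

vec = np.broadcast_to(1.0, (3,))   # broadcast scalar: [1. 1. 1.]
print(vec.strides)                 # (0,): all elements share one location

a = np.array([1.0, 2.0, 3.0])
print(a[::-1].strides)             # (-8,): reversed view, shared storage
```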
While this array representation is very flexible and useful, it makes
interfacing with BLAS a challenge: how do you perform a GEMM operation
on two arrays with arbitrary strides? Currently, NumPy attempts to
detect a number of special cases: if the strides in both arrays imply
a column-major layout, then call BLAS directly; if one of them has
strides corresponding to a row-major layout, then set the
corresponding `transA`/`transB` argument, etc. – and if all else
fails, either copy the data into a contiguous buffer, or else fall
back on a naive triple-nested-loop GEMM implementation. (There's also
a check where if we can determine through examining the arrays' data
pointers and strides that they're actually transposes of each other,
then we instead dispatch to SYRK.)
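That detection logic, in a simplified sketch (the helper name and classifications are illustrative, not NumPy's actual internals):

```python
import numpy as np

def gemm_layout(a):
    """Classify a 2-D array's memory layout for a hypothetical GEMM call."""
    m, n = a.shape
    rs, cs = (s // a.itemsize for s in a.strides)  # strides in elements
    if rs == 1 and cs >= m:
        return "column-major"      # pass to BLAS directly
    if cs == 1 and rs >= n:
        return "row-major"         # handled via transA/transB
    return "copy-or-fallback"      # buffer copy or naive triple loop

a = np.zeros((4, 3))               # C-contiguous
print(gemm_layout(a))              # row-major
print(gemm_layout(a.T))            # column-major
print(gemm_layout(a[:, ::-1]))     # copy-or-fallback (negative stride)
```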
So for us, native stride support in the BLAS has two major advantages:
- It allows a wider variety of operations to be transparently handled
using high-speed kernels.
- It reduces the complexity of the NumPy/BLAS interface code.
Any kind of stride support produces both of these advantages to some
extent. The ideal would be for BLAS to support strides specified in bytes
(not elements) and to allow truly arbitrary values, because in this
case, we can simply pass through our arrays directly to the library
without having to do any fiddly checking at all. Note that for these
purposes, it's OK if "weird" arrays pay a speed penalty: if the BLAS
didn't support them, we'd just have to fall back on some even worse
strategy for handling them ourselves. And this approach would mean
that if some BLAS vendors do at some point decide to optimize these
cases, then our users will immediately see the advantages.
### Error reporting
We are **strongly in favor** of the draft's proposal to replace XERBLA
with proper error codes. XERBLA is fine for one-off programs in
Fortran, but awful for wrapper libraries like NumPy/SciPy.
### Handling of exceptional values
A recurring issue in low-level linear-algebra libraries is that they
may crash, freeze, or produce unexpected results when receiving
non-finite inputs. Of course as a wrapper library, NumPy/SciPy can't
control what kind of strange data our users might pass in, and we have
to produce sensible results regardless, so we sometimes accumulate
hacks like having to validate all incoming data for exceptional values.
We support the proposal's recommendation of "NaN interpretation (4)".
We also note that, contrary to the note under "Interpretation (1)", in
the R statistical environment NaN does *not* represent missing data. R
maintains a clear distinction between "NA" (missing) values and "NaN"
(invalid) values. This is important for various statistical procedures
such as imputation, where you might want to estimate the true value of
an unmeasured variable, but you don't want to estimate the true value
of 0/0. For efficiency, they do perform some calculations on NA values
by storing them as NaNs with a specific payload; in the R context,
therefore, it's particularly valuable if operations **correctly
propagate NaN payloads**. NumPy does not yet provide first-class
missing value support, but we plan to add it within the next 1-2
years, and when we do it seems likely that we'll have similar needs to R.
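To illustrate the payload mechanism (a sketch in plain Python, not R's or NumPy's actual code):

```python
import struct

# Embed a payload in a quiet NaN and read it back. R can distinguish its
# NA value from an ordinary NaN only if operations preserve these bits.
def nan_with_payload(p):
    bits = 0x7FF8000000000000 | p     # quiet NaN carrying payload p
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def payload(x):
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & 0x0007FFFFFFFFFFFF  # mantissa bits below the quiet bit

na = nan_with_payload(1954)           # 1954 is the payload R uses for NA
print(na != na)                       # True: it is a genuine NaN
print(payload(na))                    # 1954
```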
### BLAS G2
NumPy has a rich type system, including 16-, 32-, and 64-bit floats
(plus extended precision on some platforms), a similar set of complex
values, and is further extensible with new types. We have a similarly
rich dispatch system for operations that attempts to find the best
matching optimized-loop for an arbitrary set of input types. We're
thus quite interested in the proposed mixed-type operations, and if
implemented we'll probably start transparently taking advantage of
them (i.e., in cases where we're currently upcasting to perform a
requested operation, we may switch to using the native precision
operation without requiring any changes in user code).
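For example, here is what "upcasting to perform a requested operation" looks like today:

```python
import numpy as np

# A mixed float32/float64 product currently converts one operand first,
# then calls a single-precision-family GEMM on matching types.
a32 = np.ones((2, 2), dtype=np.float32)
b64 = np.ones((2, 2), dtype=np.float64)
c = a32 @ b64          # a32 is upcast to float64 behind the scenes
print(c.dtype)         # float64
# A native mixed-type routine could skip that conversion entirely.
```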
Two comments, though:
1. The various tricks to make names less redundant (e.g., specifying
just one type if they're all the same) are convenient for those
calling these routines by hand, but rather inconvenient for those of
us who will be generating a table mapping different type combinations
to different functions. When designing a compression scheme for names,
please remember that the scheme will have to be unambiguously
implemented in code by multiple projects. Another option to consider:
ask vendors to implement the full "uncompressed" names, and then
provide a shim library mapping short "convenience" names to their full names.
2. Regarding the suggestion that the naming scheme will allow for a
wide variety of differently typed routines, but that only some subset
will be implemented by any particular vendor: we are worried about
this subset business. For a library like NumPy that wants to wrap
"all" the provided routines, while supporting "all" the different BLAS
libraries ... how will this work? **How do we know which routines any
particular library provides?** What if new routines are added to the standard later?
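To illustrate the wrapper's side of point 1, here is a sketch of the kind of table we would have to generate (the mangled names are assumptions for illustration, not the proposal's actual scheme):

```python
# Map every (output, input-a, input-b) type combination to one
# "uncompressed" routine name, for a single operation like GEMM.
types = ["s", "d", "c", "z"]
gemm_table = {
    (out, a, b): "gemm_" + out + a + b
    for out in types for a in types for b in types
}
print(len(gemm_table))              # 64 combinations for one routine
print(gemm_table[("d", "s", "s")])  # gemm_dss
```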
### Reproducible BLAS
This is interesting, but we don't have any particularly useful comments.
### Batch BLAS
NumPy/SciPy actually provide batched operations in many cases,
currently implemented using a `for` loop around individual calls to
BLAS/LAPACK. However, our interface is relatively restricted compared
to some of the proposals considered here: generally all we need is the
ability to perform an operation on two "stacks" of size-homogeneous
matrices, with the offset between matrices determined by an arbitrary
(potentially zero) stride.
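In NumPy terms, the interface we already provide looks like this (`np.matmul` maps over leading "stack" axes):

```python
import numpy as np

# 100 pairs of 3x4 @ 4x5 matrices, one GEMM call per pair internally.
a = np.ones((100, 3, 4))
b = np.ones((100, 4, 5))
print(np.matmul(a, b).shape)       # (100, 3, 5)

# A zero stride between matrices lets one operand be shared by the stack:
shared = np.broadcast_to(np.ones((4, 5)), (100, 4, 5))
print(shared.strides[0])           # 0
print(np.matmul(a, shared).shape)  # (100, 3, 5)
```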
### Fixed-point BLAS
This is interesting, but we don't have any particularly useful comments.
Nathaniel J. Smith -- https://vorpus.org
I have a related question:
a = numpy.array([[0,1,2],[3,4]])
assert a.ndim == 1
b = numpy.array([[0,1,2],[3,4,5]])
assert b.ndim == 2
Is there an elegant way to force b to remain a 1-dim object array?
I have a use case where normally the sublists are of different lengths, but I get a completely different structure when they are (coincidentally in my case) of the same length.
Thanks and best regards, Martin
Martin Gfeller, Swisscom / Enterprise / Banking / Products / Quantax
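One common workaround (a sketch for reference, not an answer taken from this thread): preallocate a 1-D object array and assign the sublists into it, so equal lengths cannot change the dimensionality.

```python
import numpy as np

# Preallocate the object array, then fill it element by element.
b = np.empty(2, dtype=object)
b[0] = [0, 1, 2]
b[1] = [3, 4, 5]
print(b.ndim)    # 1, even though the sublists have equal length
print(b[1])      # [3, 4, 5]
```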
Date: Sun, 31 Dec 2017 00:11:48 +0100
From: Derek Homeier <derek(a)astro.physik.uni-goettingen.de>
To: Discussion of Numerical Python <numpy-discussion(a)python.org>
Subject: Re: [Numpy-discussion] array - dimension size of 1-D and 2-D
On 30 Dec 2017, at 5:38 pm, Vinodhini Balusamy <me.vinob(a)gmail.com> wrote:
> Just one more question from the details you have provided, which from
> my understanding strongly seems to be by design. [DEREK] You cannot create
> a regular 2-dimensional integer array from one row of length 3
>> and a second one of length 0. Thus np.array chooses the next most
>> basic type of array it can fit your input data in
Indeed, the general philosophy is to preserve the structure and type of your input data as far as possible, i.e. a list is turned into a 1d-array, a list of lists (or tuples etc.) into a 2d-array, _if_ the sequences are of equal length (even if length 1).
As long as there is an unambiguous way to convert the data into an array (see below).
> Which is the case only if a second one of length 0 is given.
> What about the case 1 :
> >>> x12 = np.array([[1,2,3]])
> >>> x12
> array([[1, 2, 3]])
> >>> print(x12)
> [[1 2 3]]
> >>> x12.ndim
> 2
> This seems to take 2 dimensions.
Yes, structurally this is equivalent to your second example:
>>> x12 = np.array([[1,2,3],[0,0,0]])
[[1 2 3]
[0 0 0]]
> I presumed the above case and the case where length 0 is provided to be treated the same (I mean the same behaviour).
> Correct me if I am wrong.
In this case there is no unambiguous way to construct the array - you would need a shape (2, 3) array to store the two lists if there are 3 elements in the first one. Obviously x12[0] would be np.array([1,2,3]), but what should be the value of x12[1] if the second list is empty - it could be zeros, or a repeat of x12[0], or simply undefined. np.array([[1, 2, 3], []]) would be even less clearly defined.
These cases where there is no obvious "right" way to create the array have usually been discussed at some length, but I don't know if this is fully documented anywhere. For the essentials, see
note also the upcasting rules if you have e.g. a mix of integers and reals or complex numbers, and also how to control shape or data type explicitly with the respective keywords.
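For instance, a brief sketch of those upcasting and explicit-control rules:

```python
import numpy as np

print(np.array([1, 2.5]).dtype)            # float64: integer upcast to real
print(np.array([1, 2 + 1j]).dtype)         # complex128: upcast to complex
print(np.array([1, 2, 3], ndmin=2).shape)  # (1, 3): dimensionality forced
print(np.array([1, 2, 3], dtype=float))    # explicit dtype: [1. 2. 3.]
```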
I am new to numpy and started working on the basics.
I need clarification for the below behaviour.
>>> x12 = np.array([[1,2,3]])
[[1 2 3]]
Tried creating a 2-D array.
>>> x12 = np.array([[1,2,3],[0,0,0]])
[[1 2 3]
[0 0 0]]
>>> x12 = np.array([[1,2,3],[]])
[list([1, 2, 3]) list()]
In case 2, I am trying to understand why it becomes 1-dimensional?!
>>> x12 = np.array([1,2,3])
[1 2 3]
This seems reasonable to me to be considered as 1 dimensional.
Would like to understand case 2 a bit more, to know if I am missing something.
It would be much appreciated if someone could explain it a bit.
Thanks in advance.
For building and testing Numpy on the PowerPC arch, I've requested and
obtained a VM instance at OSUOSL, which provides free hosting for
open-source projects. This was suggested by @npanpaliya on github.
I have an immediate use for it to fix some float128 ppc problems, but it
can be useful to other numpy devs too. If you are a numpy dev and want
access to a ppc system for testing, I can gladly create an account for
you. For now, just send me an email with your desired username and a
public ssh key. We can discuss permissions and management depending on need.
=== VM details ===
It is single node, Ubuntu 16.04.1, ppc64 POWER (Big Endian) arch,
"m1_small" flavor, with usage policy at
I've installed the packages needed to build and test numpy.
I ran tests: Numpy 1.13.3 has 2 unimportant test failures, but as
expected 1.14 has ~20 more failures related to float128 reprs.
=== Other services to consider ===
I did not request it but they provide a "Power CI" service which allows
automated testing of ppc arch on github. If we want this I can look into
it more. We might also consider asking for a node with ppc64le (Little
Endian) arch for testing, but maybe the big-endian one is enough.
I've started working on the proposal discussed in this thread:
You can see how I modified PEP 1 here:
I used numbers (i.e., ``nep-0000.rst``) in the file names, since it
seems more standard and numbers are easier to refer to in discussions.
I don't have a strong preference, so I am happy to change it. If
using numbers seems reasonable, I will number the existing NEPs.
Moreover, if the preamble seems reasonable, I will go through the
existing NEPs and make sure they all have compliant headers. For now,
I think auto-generating the index is unnecessary. Once we are happy
with the purpose and template NEPs, and are automatically publishing to
GH pages, we can always go back and write a little script to
autogenerate the index using the preamble information.
Finally, I started preparing to host the NEPs online at:
If that seems reasonable, I will need someone to create a neps repo.
We also need to decide whether to move ``numpy/doc/neps`` to the
master branch of the new neps repo or whether to leave it where it is.
I don't have a strong opinion either way. However, if no one else
minds leaving it where it is, not moving it is slightly less work.
Either way, the work is trivial. Regardless of where the source
resides, we can host the generated web page in the same location.
It is probably also worth having
I recently started working with Python and GPUs, and
found that there are lots of libraries providing an
ndarray-like interface, such as CuPy/PyOpenCL/PyCUDA/etc.
I got confused about which one to use.
Is there any reason not to support GPU computation
directly in NumPy itself?
I want NumPy to support GPU computation as a standard.
If the reason is just about human resources,
I'd like to try implementing GPU support on my NumPy fork.
My goal is to create a standard NumPy interface which supports
both CUDA and OpenCL, and more devices if available.
Are there other reasons not to support GPUs in NumPy?