Mailman 3 January 2018 - NumPy-Discussion

NumPy 1.14.0 release
by Charles R Harris Jan. 14, 2018

Jan. 14, 2018

Hi All, On behalf of the NumPy team, I am pleased to announce NumPy 1.14.0. Numpy 1.14.0 is the result of seven months of work and contains a large number of bug fixes and new features, along with several changes with potential compatibility issues. The major change that users will notice are the stylistic changes in the way numpy arrays and scalars are printed, a change that will affect doctests. See the release notes for details on how to preserve the old style printing when needed. A major decision affecting future development concerns the schedule for dropping Python 2.7 support in the runup to 2020. The decision has been made to support 2.7 for all releases made in 2018, with the last release being designated a long term release with support for bug fixes extending through the end of 2019. Starting from January, 2019 support for 2.7 will be dropped in all new releases. More details can be found in the relevant NEP <https://github.com/numpy/numpy/blob/master/doc/neps/dropping-python2.7-prop…> . This release supports Python 2.7 and 3.4 - 3.6. Wheels for the release are available on PyPI. Source tarballs, zipfiles, release notes, and the changelog are available on github <https://github.com/numpy/numpy/releases/tag/v1.14.0>. *Highlights* - The ``np.einsum`` function uses BLAS when possible - ``genfromtxt``, ``loadtxt``, ``fromregex`` and ``savetxt`` can now handle files with arbitrary Python supported encoding. - Major improvements to printing of NumPy arrays and scalars. *New functions* - ``parametrize``: decorator added to numpy.testing - ``chebinterpolate``: Interpolate function at Chebyshev points. - ``format_float_positional`` and ``format_float_scientific`` : format floating-point scalars unambiguously with control of rounding and padding. - ``PyArray_ResolveWritebackIfCopy`` and ``PyArray_SetWritebackIfCopyBase``, new C-API functions useful in achieving PyPy compatibity. *Contributors* A total of 100 people contributed to this release. People with a "+" by their names contributed a patch for the first time. * Alexey Brodkin + * Allan Haldane * Andras Deak + * Andrew Lawson + * Anna Chiara + * Antoine Pitrou * Bernhard M. Wiedemann + * Bob Eldering + * Brandon Carter * CJ Carey * Charles Harris * Chris Lamb * Christoph Boeddeker + * Christoph Gohlke * Daniel Hrisca + * Daniel Smith * Danny Hermes * David Freese * David Hagen * David Linke + * David Schaefer + * Dillon Niederhut + * Egor Panfilov + * Emilien Kofman * Eric Wieser * Erik Bray + * Erik Quaeghebeur + * Garry Polley + * Gunjan + * Han Shen + * Henke Adolfsson + * Hidehiro NAGAOKA + * Hemil Desai + * Hong Xu + * Iryna Shcherbina + * Jaime Fernandez * James Bourbeau + * Jamie Townsend + * Jarrod Millman * Jean Helie + * Jeroen Demeyer + * John Goetz + * John Kirkham * John Zwinck * Jonathan Helmus * Joseph Fox-Rabinovitz * Joseph Paul Cohen + * Joshua Leahy + * Julian Taylor * Jörg Döpfert + * Keno Goertz + * Kevin Sheppard + * Kexuan Sun + * Konrad Kapp + * Kristofor Maynard + * Licht Takeuchi + * Loïc Estève * Lukas Mericle + * Marten van Kerkwijk * Matheus Portela + * Matthew Brett * Matti Picus * Michael Lamparski + * Michael Odintsov + * Michael Schnaitter + * Michael Seifert * Mike Nolta * Nathaniel J. Smith * Nelle Varoquaux + * Nicholas Del Grosso + * Nico Schlömer + * Oleg Zabluda + * Oleksandr Pavlyk * Pauli Virtanen * Pim de Haan + * Ralf Gommers * Robert T. McGibbon + * Roland Kaufmann * Sebastian Berg * Serhiy Storchaka + * Shitian Ni + * Spencer Hill + * Srinivas Reddy Thatiparthy + * Stefan Winkler + * Stephan Hoyer * Steven Maude + * SuperBo + * Thomas Köppe + * Toon Verstraelen * Vedant Misra + * Warren Weckesser * Wirawan Purwanto + * Yang Li + * Ziyan Zhou + * chaoyu3 + * orbit-stabilizer + * solarjoe * wufangjie + * xoviat + * Élie Gouzien + Cheers, Charles Harris

9 12

Understanding np.min behaviour with nan
by Andrew Nelson Jan. 11, 2018

Jan. 11, 2018

I'm having some trouble understanding the behaviour of np.min when used with NaN. In the documentation of np.amin it says: "NaN values are propagated, that is if at least one item is NaN, the corresponding min value will be NaN as well. ". It doesn't say that in some circumstances there will be RuntimeWarning's raised (presumably depending on the errstate). Consider the following: >>> import numpy as np >>> np.min([1, np.nan]) /Users/andrew/miniconda3/envs/dev3/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims) nan >>> np.min([1, 2, np.nan]) nan >>> np.min([np.nan, 1, 2]) nan >>> np.min([np.nan, 1]) nan Why is there a RuntimeWarning for the first example, but not the others? Is this expected behaviour? -- _____________________________________ Dr. Andrew Nelson _____________________________________

4 5

Re: [Numpy-discussion] [SciPy-Dev] RFC: comments to BLAS committee from numpy/scipy devs
by Nathaniel Smith Jan. 10, 2018

Jan. 10, 2018

On Tue, Jan 9, 2018 at 12:53 PM, Tyler Reddy <tyler.je.reddy(a)gmail.com> wrote: > One common issue in computational geometry is the need to operate rapidly on > arrays with "heterogeneous shapes." > > So, an array that has rows with different numbers of columns -- shape (1,3) > for the first polygon and shape (1, 12) for the second polygon and so on. > > This seems like a particularly nasty scenario when the loss of "homogeneity" > in shape precludes traditional vectorization -- I think numpy effectively > converts these to dtype=object, etc. I don't > think is necessarily a BLAS issue since wrapping comp. geo. libraries does > happen in a subset of cases to handle this, but if there's overlap in > utility you could pass it along I suppose. You might be interested in this discussion of "Batch BLAS": https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QH… I didn't get into it in the draft response, because it didn't seem like something where NumPy/SciPy have any useful experience to offer, but it sounds like there are people worrying about this case. -n -- Nathaniel J. Smith -- https://vorpus.org

1 0

RFC: comments to BLAS committee from numpy/scipy devs
by Nathaniel Smith Jan. 10, 2018

Jan. 10, 2018

Hi all, As mentioned earlier [1][2], there's work underway to revise and update the BLAS standard -- e.g. we might get support for strided arrays and lose xerbla! There's a draft at [3]. They're interested in feedback from users, so I've written up a first draft of comments about what we would like as NumPy/SciPy developers. This is very much a first attempt -- I know we have lots of people who are more expert on BLAS than me on these lists :-). Please let me know what you think. -n [1] https://mail.python.org/pipermail/numpy-discussion/2017-November/077420.html [2] https://mail.python.org/pipermail/scipy-dev/2017-November/022267.html [3] https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QH… ----- # Comments from NumPy / SciPy developers on "A Proposal for a Next-Generation BLAS" These are comments on [A Proposal for a Next-Generation BLAS](https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD… (version as of 2017-12-13), from the perspective of the developers of the NumPy and SciPy libraries. We hope this feedback is useful, and welcome further discussion. ## Who are we? NumPy and SciPy are the two foundational libraries of the Python numerical ecosystem, and one of their duties is to wrap BLAS and expose it for the use of other Python libraries. (NumPy primarily provides a GEMM wrapper, while SciPy exposes more specialized operations.) It's unclear how many users we have exactly, but we certainly ship multiple million copies of BLAS every month, and provide one of the most popular numerical toolkits for both novice and expert users. Looking at the original BLAS and LAPACK interfaces, it often seems that their imagined user is something like a classic supercomputer consumer, who writes code directly in Fortran or C against the BLAS API, and where the person writing the code and running the code are the same. NumPy/SciPy are coming from a very different perspective: our users generally know nothing about the details of the underlying BLAS; they just want to describe their problem in some high-level way, and the library is responsible for making it happen as efficiently as possible, and is often integrated into some larger system (e.g. a real-time analytics platform embedded in a web server). When it comes to our BLAS usage, we mostly use only a small subset of the routines. However, as "consumer software" used by a wide variety of users with differing degress of technical expertise, we're expected to Just Work on a wide variety of systems, and with as many different vendor BLAS libraries as possible. On the other hand, the fact that we're working with Python means we don't tend to worry about small inefficiencies that will be lost in the noise in any case, and are willing to sacrifice some performance to get more reliable operation across our diverse userbase. ## Comments on specific aspects of the proposal ### Data Layout We are **strongly in favor** of the proposal to support arbitrary strided data layouts. Ideally, this would support strides *specified in bytes* (allowing for unaligned data layouts), and allow for truly arbitrary strides, including *zero or negative* values. However, we think it's fine if some of the weirder cases suffer a performance penalty. Rationale: NumPy – and thus, most of the scientific Python ecosystem – only has one way of representing an array: the `numpy.ndarray` type, which is an arbitrary dimensional tensor with arbitrary strides. It is common to encounter matrices with non-trivial strides. For example:: # Make a 3-dimensional tensor, 10 x 9 x 8 t = np.zeros((10, 9, 8)) # Considering this as a stack of eight 10x9 matrices, extract the first: mat = t[:, :, 0] Now `mat` has non-trivial strides on both axes. (If running this in a Python interpreter, you can see this by looking at the value of `mat.strides`.) Another case where interesting strides arise is when performing ["broadcasting"](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html), which is the name for NumPy's rules for stretching arrays to make their shapes match. For example, in an expression like:: np.array([1, 2, 3]) + 1 the scalar `1` is "broadcast" to create a vector `[1, 1, 1]`. This is accomplished without allocating memory, by creating a vector with settings length = 3, strides = 0 – so all the elements share a single location in memory. Similarly, by using negative strides we can reverse an array without allocating memory:: a = np.array([1, 2, 3]) a_flipped = a[::-1] Now `a_flipped` has the value `[3, 2, 1]`, while sharing storage with the array `a = [1, 2, 3]`. Misaligned data is also possible (e.g. an array of 8-byte doubles with a 9-byte stride), though it arises more rarely. (An example of when it might occurs is in an on-disk data format that alternates between storing a double value and then a single byte value, which is then memory-mapped.) While this array representation is very flexible and useful, it makes interfacing with BLAS a challenge: how do you perform a GEMM operation on two arrays with arbitrary strides? Currently, NumPy attempts to detect a number of special cases: if the strides in both arrays imply a column-major layout, then call BLAS directly; if one of them has strides corresponding to a row-major layout, then set the corresponding `transA`/`transB` argument, etc. – and if all else fails, either copy the data into a contiguous buffer, or else fall back on a naive triple-nested-loop GEMM implementation. (There's also a check where if we can determine through examining the arrays' data pointers and strides that they're actually transposes of each other, then we instead dispatch to SYRK.) So for us, native stride support in the BLAS has two major advantages: - It allows a wider variety of operations to be transparently handled using high-speed kernels. - It reduces the complexity of the NumPy/BLAS interface code. Any kind of stride support produces both of these advantages to some extent. The ideal would be if BLAS supports strides specified in bytes (not elements), and allowing truly arbitrary values, because in this case, we can simply pass through our arrays directly to the library without having to do any fiddly checking at all. Note that for these purposes, it's OK if "weird" arrays pay a speed penalty: if the BLAS didn't support them, we'd just have to fall back on some even worse strategy for handling them ourselves. And this approach would mean that if some BLAS vendors do at some point decide to optimize these cases, then our users will immediately see the advantages. ### Error-reporting We are **strongly in favor** of the draft's proposal to replace XERBLA with proper error codes. XERBLA is fine for one-off programs in Fortran, but awful for wrapper libraries like NumPy/SciPy. ### Handling of exceptional values A recurring issue in low-level linear-algebra libraries is that they may crash, freeze, or produce unexpected results when receiving non-finite inputs. Of course as a wrapper library, NumPy/SciPy can't control what kind of strange data our users might pass in, and we have to produce sensible results regardless, so we sometimes accumulate hacks like having to validate all incoming data for exceptional values We support the proposal's recommendation of "NaN interpretation (4)". We also note that contrary to the note under "Interpretation (1)"; in the R statistical environment NaN does *not* represent missing data. R maintains a clear distinction between "NA" (missing) values and "NaN" (invalid) values. This is important for various statistical procedures such as imputation, where you might want to estimate the true value of an unmeasured variable, but you don't want to estimate the true value of 0/0. For efficiency, they do perform some calculations on NA values by storing them as NaNs with a specific payload; in the R context, therefore, it's particularly valuable if operations **correctly propagate NaN payloads**. NumPy does not yet provide first-class missing value support, but we plan to add it within the next 1-2 years, and when we do it seems likely that we'll have similar needs to R. ### BLAS G2 NumPy has a rich type system, including 16-, 32-, and 64-bit floats (plus extended precision on some platforms), a similar set of complex values, and is further extensible with new types. We have a similarly rich dispatch system for operations that attempts to find the best matching optimized-loop for an arbitrary set of input types. We're thus quite interested in the proposed mixed-type operations, and if implemented we'll probably start transparently taking advantage of them (i.e., in cases where we're currently upcasting to perform a requested operation, we may switch to using the native precision operation without requiring any changes in user code). Two comments, though: 1. The various tricks to make names less redundant (e.g., specifying just one type if they're all the same) are convenient for those calling these routines by hand, but rather inconvenient for those of us who will be generating a table mapping different type combinations to different functions. When designing a compression scheme for names, please remember that the scheme will have to be unambiguously implemented in code by multiple projects. Another option to consider: ask vendors to implement the full "uncompressed" names, and then provide a shim library mapping short "convenience" names to their full form. 2. Regarding the suggestion that the naming scheme will allow for a wide variety of differently typed routines, but that only some subset will be implemented by any particular vendor: we are worried about this subset business. For a library like NumPy that wants to wrap "all" the provided routines, while supporting "all" the different BLAS libraries ... how will this work? **How do we know which routines any particular library provides?** What if new routines are added to the list later? ### Reproducible BLAS This is interesting, but we don't have any particularly useful comments. ### Batch BLAS NumPy/SciPy actually provide batched operations in many cases, currently implemented using a `for` loop around individual calls to BLAS/LAPACK. However, our interface is relatively restricted compared to some of the proposals considered here: generally all we need is the ability to perform an operation on two "stacks" of size-homogenous matrices, with the offset between matrices determined by an arbitrary (potentially zero) stride. ### Fixed-point BLAS This is interesting, but we don't have any particularly useful comments. -- Nathaniel J. Smith -- https://vorpus.org

2 2

Re: [Numpy-discussion] array - dimension size of 1-D and 2-D examples
by Martin.Gfeller＠swisscom.com Jan. 9, 2018

Jan. 9, 2018

Hi Derek I have a related question: Given: a = numpy.array([[0,1,2],[3,4]]) assert a.ndim == 1 b = numpy.array([[0,1,2],[3,4,5]]) assert b.ndim == 2 Is there an elegant way to force b to remain a 1-dim object array? I have a use case where normally the sublists are of different lengths, but I get a completely different structure when they are (coincidentally in my case) of the same length. Thanks and best regards, Martin Martin Gfeller, Swisscom / Enterprise / Banking / Products / Quantax Message: 1 Date: Sun, 31 Dec 2017 00:11:48 +0100 From: Derek Homeier <derek(a)astro.physik.uni-goettingen.de> To: Discussion of Numerical Python <numpy-discussion(a)python.org> Subject: Re: [Numpy-discussion] array - dimension size of 1-D and 2-D examples Message-ID: <CC548593-308B-4561-A03C-D3017C707108(a)astro.physik.uni-goettingen.de> Content-Type: text/plain; charset=utf-8 On 30 Dec 2017, at 5:38 pm, Vinodhini Balusamy <me.vinob(a)gmail.com> wrote: > > Just one more question from the details you have provided which from > my understanding strongly seems to be Design [DEREK] You cannot create > a regular 2-dimensional integer array from one row of length 3 >> and a second one of length 0. Thus np.array chooses the next most >> basic type of array it can fit your input data in > Indeed, the general philosophy is to preserve the structure and type of your input data as far as possible, i.e. a list is turned into a 1d-array, a list of lists (or tuples etc?) into a 2d-array,_ if_ the sequences are of equal length (even if length 1). As long as there is an unambiguous way to convert the data into an array (see below). > Which is the case, only if an second one of length 0 is given. > What about the case 1 : > >>> x12 = np.array([[1,2,3]]) > >>> x12 > array([[1, 2, 3]]) > >>> print(x12) > [[1 2 3]] > >>> x12.ndim > 2 > >>> > >>> > This seems to take 2 dimension. Yes, structurally this is equivalent to your second example > also, >>> x12 = np.array([[1,2,3],[0,0,0]]) >>> print(x12) [[1 2 3] [0 0 0]] >>> x12.ndim 2 > I presumed the above case and the case where length 0 is provided to be treated same(I mean same behaviour). > Correct me if I am wrong. > In this case there is no unambiguous way to construct the array - you would need a shape (2, 3) array to store the two lists with 3 elements in the first list. Obviously x12[0] would be np.array([1,2,3]), but what should be the value of x12[1], if the second list is empty - it could be zeros, or repeating x12[0], or simply undefined. np.array([1, 2, 3], [4]]) would be even less clearly defined. These cases where there is no obvious ?right? way to create the array have usually been discussed at some length, but I don?t know if this is fully documented in some place. For the essentials, see https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html note also the upcasting rules if you have e.g. a mix of integers and reals or complex numbers, and also how to control shape or data type explicitly with the respective keywords. Derek

2 1

array - dimension size of 1-D and 2-D examples
by Vinodhini Balusamy Jan. 8, 2018

Jan. 8, 2018

Hi, I am new to bumpy and started working on basics. I need clarification for the below behaviour. case 1: >>> x12 = np.array([[1,2,3]]) >>> print(x12) [[1 2 3]] >>> x12.ndim 2 >>> Tried creating a 2-D array. also, >>> x12 = np.array([[1,2,3],[0,0,0]]) >>> print(x12) [[1 2 3] [0 0 0]] >>> x12.ndim 2 >>> Case 2: >>> >>> x12 = np.array([[1,2,3],[]]) >>> x12.ndim 1 >>> print(x12) [list([1, 2, 3]) list([])] >>> In case 2, I am trying to understand why it becomes 1 dimentional ?!?! Case 3: >>> >>> x12 = np.array([1,2,3]) >>> x12.ndim 1 >>> print(x12) [1 2 3] >>> This seems reasonable to me to be considered as 1 dimensional. Would like to understand case 2 a bit more to get to know if i am missing something. Will be much appreciated if someone to explain me a bit. Thanks in advance. Regards, Vinodhini B

2 4

PowerPC machine available from OSUOSL
by Allan Haldane Jan. 5, 2018

Jan. 5, 2018

Hi all, For building and testing Numpy on the PowerPC arch, I've requested and obtained a VM instance at OSUOSL, which provides free hosting for open-source projects. This was suggested by @npanpaliya on github. http://osuosl.org/services/powerdev/request_hosting/ I have an immediate use for it to fix some float128 ppc problems, but it can be useful to other numpy devs too. If you are a numpy dev and want access to a ppc system for testing, I can gladly create an account for you. For now, just send me an email with your desired username and a public ssh key. We can discuss permissions and management depending on interest. === VM details === It is single node, Ubuntu 16.04.1, ppc64 POWER (Big Endian) arch, "m1_small" flavor, with usage policy at http://osuosl.org/services/hosting/policy/ I've installed the packages needed to build and test numpy. I ran tests: Numpy 1.13.3 has 2 unimportant test failures, but as expected 1.14 has ~20 more failures related to float128 reprs. === Other services to Consider === I did not request it but they provide a "Power CI" service which allows automated testing of ppc arch on github. If we want this I can look into it more. We might also consider asking for a node with ppc64le (Little Endian) arch for testing, but maybe the big-endian one is enough. Cheers, Allan

2 1

Position at BIDS (UC Berkeley) to work on NumPy
by Stefan van der Walt Jan. 5, 2018

Jan. 5, 2018

Hi everyone, Chuck suggested that I send a reminder that the Berkeley Institute for Data Science (BIDS) is hiring scientific Python Developers to contribute to NumPy. You can read more about the new positions here: https://bids.berkeley.edu/news/bids-receives-sloan-foundation-grant-contrib… If you enjoy collaborative work as well as the technical challenges posed by numerical computing, this is an excellent opportunity to play a fundamental role in the development of one of the most impactful libraries in the entire Python ecosystem. Best regards Stéfan Job link: https://jobsprod.is.berkeley.edu/psc/jobsprod/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_…

1 0

NEP process PR
by Jarrod Millman Jan. 4, 2018

Jan. 4, 2018

Hi all, I've started working on the proposal discussed in this thread: https://mail.python.org/pipermail/numpy-discussion/2017-December/077481.html here: https://github.com/numpy/numpy/pull/10213 You can see how I modified PEP 1 here: https://github.com/numpy/numpy/pull/10213/commits/eaf788940dee7d0f1c7922fac… I used numbers (i.e., ``nep-0000.rst``) in the file names, since it seems more standard and numbers are easier to refer to in discussions. I don't have a strong preference, so I am happy to change it. If using numbers seems reasonable, I will number the existing NEPs. Moreover, if the preamble seems reasonable, I will go through the existing NEPs and make sure they all have compliant headers. For now, I think auto-generating the index is unnecessary. Once we are happy with the purpose and template NEPs as well as automatically publish to GH pages, we can always go back and write a little script to autogenerate the index using the preamble information. Finally, I started preparing to host the NEPs online at: http://numpy.github.io/neps If that seems reasonable, I will need someone to create a neps repo. We also need to decide whether to move ``numpy/doc/neps`` to the master branch of the new neps repo or whether to leave it where it is. I don't have a strong opinion either way. However, if no one else minds leaving it where it is, not moving it is slightly less work. Either way, the work is trivial. Regardless of where the source resides, we can host the generated web page in the same location (i.e., http://numpy.github.io/neps). It is probably also worth having https://docs.scipy.org/doc/numpy/neps/ redirect to http://numpy.github.io/neps Best regards, Jarrod

3 2

Direct GPU support on NumPy
by Yasunori Endo Jan. 3, 2018

Jan. 3, 2018

Hi I recently started working with Python and GPU, found that there're lot's of libraries provides ndarray like interface such as CuPy/PyOpenCL/PyCUDA/etc. I got so confused which one to use. Is there any reason not to support GPU computation directly on the NumPy itself? I want NumPy to support GPU computation as a standard. If the reason is just about human resources, I'd like to try implementing GPU support on my NumPy fork. My goal is to create standard NumPy interface which supports both CUDA and OpenCL, and more devices if available. Are there other reason not to support GPU on NumPy? Thanks. -- Yasunori Endo

8 10