From ralf.gommers at gmail.com Mon Jan 1 20:39:10 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 2 Jan 2018 14:39:10 +1300 Subject: [Numpy-discussion] NEP process PR In-Reply-To: References: Message-ID: On Wed, Dec 13, 2017 at 11:45 AM, Jarrod Millman wrote: > Hi all, > > I've started working on the proposal discussed in this thread: > https://mail.python.org/pipermail/numpy-discussion/ > 2017-December/077481.html > here: > https://github.com/numpy/numpy/pull/10213 Thanks Jarrod! Stefan, Marten and I reviewed in the meantime, and all looks good. Probably good to accept the PR in a few days unless there are other comments by then. Ralf > > You can see how I modified PEP 1 here: > https://github.com/numpy/numpy/pull/10213/commits/ > eaf788940dee7d0f1c7922fac70a87144de89656 > > I used numbers (i.e., ``nep-0000.rst``) in the file names, since it > seems more standard and numbers are easier to refer to in discussions. > I don't have a strong preference, so I am happy to change it. If > using numbers seems reasonable, I will number the existing NEPs. > Moreover, if the preamble seems reasonable, I will go through the > existing NEPs and make sure they all have compliant headers. For now, > I think auto-generating the index is unnecessary. Once we are happy > with the purpose and template NEPs as well as automatically publish to > GH pages, we can always go back and write a little script to > autogenerate the index using the preamble information. > > Finally, I started preparing to host the NEPs online at: > http://numpy.github.io/neps > If that seems reasonable, I will need someone to create a neps repo. > > We also need to decide whether to move ``numpy/doc/neps`` to the > master branch of the new neps repo or whether to leave it where it is. > I don't have a strong opinion either way. However, if no one else > minds leaving it where it is, not moving it is slightly less work. > Either way, the work is trivial. Regardless of where the source > resides, we can host the generated web page in the same location > (i.e., http://numpy.github.io/neps). > > It is probably also worth having > https://docs.scipy.org/doc/numpy/neps/ > redirect to > http://numpy.github.io/neps > > Best regards, > Jarrod > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jo7ueb at gmail.com Tue Jan 2 10:37:16 2018 From: jo7ueb at gmail.com (Yasunori Endo) Date: Tue, 02 Jan 2018 15:37:16 +0000 Subject: [Numpy-discussion] Direct GPU support on NumPy Message-ID: Hi I recently started working with Python and GPU, found that there're lot's of libraries provides ndarray like interface such as CuPy/PyOpenCL/PyCUDA/etc. I got so confused which one to use. Is there any reason not to support GPU computation directly on the NumPy itself? I want NumPy to support GPU computation as a standard. If the reason is just about human resources, I'd like to try implementing GPU support on my NumPy fork. My goal is to create standard NumPy interface which supports both CUDA and OpenCL, and more devices if available. Are there other reason not to support GPU on NumPy? Thanks. -- Yasunori Endo -------------- next part -------------- An HTML attachment was scrubbed... URL: From lev at columbia.edu Tue Jan 2 12:49:34 2018 From: lev at columbia.edu (Lev E Givon) Date: Tue, 2 Jan 2018 12:49:34 -0500 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: References: Message-ID: On Tue, Jan 2, 2018 at 10:37 AM, Yasunori Endo wrote: > Hi > > I recently started working with Python and GPU, > found that there're lot's of libraries provides > ndarray like interface such as CuPy/PyOpenCL/PyCUDA/etc. > I got so confused which one to use. > > Is there any reason not to support GPU computation > directly on the NumPy itself? > I want NumPy to support GPU computation as a standard. > > If the reason is just about human resources, > I'd like to try implementing GPU support on my NumPy fork. > My goal is to create standard NumPy interface which supports > both CUDA and OpenCL, and more devices if available. > > Are there other reason not to support GPU on NumPy? > > Thanks. > -- > Yasunori Endo Check out numba - it may already address some of your needs: https://numba.pydata.org/ -- Lev E. Givon, PhD http://lebedov.github.io From jerome.kieffer at esrf.fr Tue Jan 2 15:22:01 2018 From: jerome.kieffer at esrf.fr (Jerome Kieffer) Date: Tue, 2 Jan 2018 21:22:01 +0100 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: References: Message-ID: <20180102212201.69978472@mac13.esrf.fr> On Tue, 02 Jan 2018 15:37:16 +0000 Yasunori Endo wrote: > If the reason is just about human resources, > I'd like to try implementing GPU support on my NumPy fork. > My goal is to create standard NumPy interface which supports > both CUDA and OpenCL, and more devices if available. I think this initiative already exists ... something which merges the approach of cuda and opencl but I have no idea on the momentum behind it. > Are there other reason not to support GPU on NumPy? yes. Matlab has such support and the performances gain are in the order of 2x vs 10x when addressing the GPU directly. All the time is spent in sending data back & forth. Numba is indeed a good candidate bu limited to the PTX assembly (i.e. cuda, hence nvidia hardware) Cheers, Jerome From stefan at seefeld.name Tue Jan 2 15:38:41 2018 From: stefan at seefeld.name (Stefan Seefeld) Date: Tue, 2 Jan 2018 15:38:41 -0500 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: <20180102212201.69978472@mac13.esrf.fr> References: <20180102212201.69978472@mac13.esrf.fr> Message-ID: On 02.01.2018 15:22, Jerome Kieffer wrote: > On Tue, 02 Jan 2018 15:37:16 +0000 > Yasunori Endo wrote: > >> If the reason is just about human resources, >> I'd like to try implementing GPU support on my NumPy fork. >> My goal is to create standard NumPy interface which supports >> both CUDA and OpenCL, and more devices if available. > I think this initiative already exists ... something which merges the > approach of cuda and opencl but I have no idea on the momentum behind > it. > >> Are there other reason not to support GPU on NumPy? > yes. Matlab has such support and the performances gain are in the order > of 2x vs 10x when addressing the GPU directly. All the time is spent in > sending data back & forth. Numba is indeed a good candidate bu limited > to the PTX assembly (i.e. cuda, hence nvidia hardware) This suggests a new, higher-level data model which supports replicating data into different memory spaces (e.g. host and GPU). Then users (or some higher layer in the software stack) can dispatch operations to suitable implementations to minimize data movement. Given NumPy's current raw-pointer C API this seems difficult to implement, though, as it is very hard to track memory aliases. Regards, Stefan -- ...ich hab' noch einen Koffer in Berlin... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.png Type: image/png Size: 1478 bytes Desc: not available URL: From jo7ueb at gmail.com Tue Jan 2 16:21:03 2018 From: jo7ueb at gmail.com (Yasunori Endo) Date: Tue, 02 Jan 2018 21:21:03 +0000 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: References: <20180102212201.69978472@mac13.esrf.fr> Message-ID: Hi all Numba looks so nice library to try. Thanks for the information. This suggests a new, higher-level data model which supports replicating > data into different memory spaces (e.g. host and GPU). Then users (or some > higher layer in the software stack) can dispatch operations to suitable > implementations to minimize data movement. > > Given NumPy's current raw-pointer C API this seems difficult to implement, > though, as it is very hard to track memory aliases. > I understood modifying numpy.ndarray for GPU is technically difficult. So my next primitive question is why NumPy doesn't offer ndarray like interface (e.g. numpy.gpuarray)? I wonder why everybody making *separate* library, making user confused. Is there any policy that NumPy refuse standard GPU implementation? Thanks. -- Yasunori Endo -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Tue Jan 2 16:36:30 2018 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 2 Jan 2018 22:36:30 +0100 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: References: <20180102212201.69978472@mac13.esrf.fr> Message-ID: Hi, Let's say that Numpy provides a GPU version on GPU. How would that work with all the packages that expect the memory to be allocated on CPU? It's not that Numpy refuses a GPU implementation, it's that it wouldn't solve the problem of GPU/CPU having different memory. When/if nVidia decides (finally) that memory should be also accessible from the CPU (like AMD APU), then this argument is actually void. Matthieu 2018-01-02 22:21 GMT+01:00 Yasunori Endo : > Hi all > > Numba looks so nice library to try. > Thanks for the information. > > This suggests a new, higher-level data model which supports replicating >> data into different memory spaces (e.g. host and GPU). Then users (or some >> higher layer in the software stack) can dispatch operations to suitable >> implementations to minimize data movement. >> >> Given NumPy's current raw-pointer C API this seems difficult to >> implement, though, as it is very hard to track memory aliases. >> > I understood modifying numpy.ndarray for GPU is technically difficult. > > So my next primitive question is why NumPy doesn't offer > ndarray like interface (e.g. numpy.gpuarray)? > I wonder why everybody making *separate* library, making user confused. > Is there any policy that NumPy refuse standard GPU implementation? > > Thanks. > > > -- > Yasunori Endo > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -- Quantitative analyst, Ph.D. Blog: http://blog.audio-tk.com/ LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jan 2 16:38:40 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 2 Jan 2018 13:38:40 -0800 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: References: <20180102212201.69978472@mac13.esrf.fr> Message-ID: On Tue, Jan 2, 2018 at 1:21 PM, Yasunori Endo wrote: > > Hi all > > Numba looks so nice library to try. > Thanks for the information. > >> This suggests a new, higher-level data model which supports replicating data into different memory spaces (e.g. host and GPU). Then users (or some higher layer in the software stack) can dispatch operations to suitable implementations to minimize data movement. >> >> Given NumPy's current raw-pointer C API this seems difficult to implement, though, as it is very hard to track memory aliases. > > I understood modifying numpy.ndarray for GPU is technically difficult. > > So my next primitive question is why NumPy doesn't offer > ndarray like interface (e.g. numpy.gpuarray)? > I wonder why everybody making *separate* library, making user confused. Because there is no settled way to do this. All of those separate library implementations are trying different approaches. We are learning from each of their attempts. They can each move at their own pace rather than being tied down to numpy's slow rate of development and strict backwards compatibility requirements. They can try new things and aren't limited to their first mistakes. The user may well be confused by all of the different options currently available. I don't think that's avoidable: there are lots of meaningful options. Picking just one to stick into numpy is a disservice to the community that needs the other options. > Is there any policy that NumPy refuse standard GPU implementation? Not officially, but I'm pretty sure that there is no appetite among the developers for incorporating proper GPU support into numpy (by which I mean that a user would build numpy with certain settings then make use of the GPU using just numpy APIs). numpy is a mature project with a relatively small development team. Much of that effort is spent more on maintenance than new development. What there is appetite for is to listen to the needs of the GPU-using libraries and making sure that numpy's C and Python APIs are flexible enough to do what the GPU libraries need. This ties into the work that's being done to make ndarray subclasses better and formalizing the notions of an "array-like" interface that things like pandas Series, etc. can implement and play well with the rest of numpy. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at seefeld.name Tue Jan 2 17:04:14 2018 From: stefan at seefeld.name (Stefan Seefeld) Date: Tue, 2 Jan 2018 17:04:14 -0500 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: References: <20180102212201.69978472@mac13.esrf.fr> Message-ID: <823d3343-886f-2b9d-69b5-ecfffc81bcc0@seefeld.name> On 02.01.2018 16:36, Matthieu Brucher wrote: > Hi, > > Let's say that Numpy provides a GPU version on GPU. How would that > work with all the packages that expect the memory to be allocated on CPU? > It's not that Numpy refuses a GPU implementation, it's that it > wouldn't solve the problem of GPU/CPU having different memory. When/if > nVidia decides (finally) that memory should be also accessible from > the CPU (like AMD APU), then this argument is actually void. I actually doubt that. Sure, having a unified memory is convenient for the programmer. But as long as copying data between host and GPU is orders of magnitude slower than copying data locally, performance will suffer. Addressing this performance issue requires some NUMA-like approach, moving the operation to where the data resides, rather than treating all data locations equal. Stefan -- ...ich hab' noch einen Koffer in Berlin... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.png Type: image/png Size: 1478 bytes Desc: not available URL: From harrigan.matthew at gmail.com Tue Jan 2 20:35:38 2018 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Tue, 2 Jan 2018 20:35:38 -0500 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: <823d3343-886f-2b9d-69b5-ecfffc81bcc0@seefeld.name> References: <20180102212201.69978472@mac13.esrf.fr> <823d3343-886f-2b9d-69b5-ecfffc81bcc0@seefeld.name> Message-ID: Is it possible to have NumPy use a BLAS/LAPACK library that is GPU accelerated for certain problems? Any recommendations or readme's on how that might be set up? The other packages are nice but I would really love to just use scipy/sklearn and have decompositions, factorizations, etc for big matrices go a little faster without recoding the algorithms. Thanks On Tue, Jan 2, 2018 at 5:04 PM, Stefan Seefeld wrote: > On 02.01.2018 16:36, Matthieu Brucher wrote: > > Hi, > > Let's say that Numpy provides a GPU version on GPU. How would that work > with all the packages that expect the memory to be allocated on CPU? > It's not that Numpy refuses a GPU implementation, it's that it wouldn't > solve the problem of GPU/CPU having different memory. When/if nVidia > decides (finally) that memory should be also accessible from the CPU (like > AMD APU), then this argument is actually void. > > > I actually doubt that. Sure, having a unified memory is convenient for the > programmer. But as long as copying data between host and GPU is orders of > magnitude slower than copying data locally, performance will suffer. > Addressing this performance issue requires some NUMA-like approach, moving > the operation to where the data resides, rather than treating all data > locations equal. > > [image: Stefan] > > -- > > ...ich hab' noch einen Koffer in Berlin... > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.png Type: image/png Size: 1478 bytes Desc: not available URL: From lev at columbia.edu Tue Jan 2 20:56:21 2018 From: lev at columbia.edu (Lev E Givon) Date: Tue, 2 Jan 2018 20:56:21 -0500 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: References: <20180102212201.69978472@mac13.esrf.fr> <823d3343-886f-2b9d-69b5-ecfffc81bcc0@seefeld.name> Message-ID: On Jan 2, 2018 8:35 PM, "Matthew Harrigan" wrote: Is it possible to have NumPy use a BLAS/LAPACK library that is GPU accelerated for certain problems? Any recommendations or readme's on how that might be set up? The other packages are nice but I would really love to just use scipy/sklearn and have decompositions, factorizations, etc for big matrices go a little faster without recoding the algorithms. Thanks Depending on what operation you want to accelerate, scikit-cuda may provide a scipy-like interface to GPU-based implementations that you can use. It isn't a drop-in replacement for numpy/scipy, however. http://github.com/lebedov/scikit-cuda L -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Jan 3 02:00:54 2018 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 3 Jan 2018 08:00:54 +0100 Subject: [Numpy-discussion] Direct GPU support on NumPy In-Reply-To: References: <20180102212201.69978472@mac13.esrf.fr> <823d3343-886f-2b9d-69b5-ecfffc81bcc0@seefeld.name> Message-ID: <20180103070054.GH754811@phare.normalesup.org> > The other packages are nice but I would really love to just use scipy/ > sklearn and have decompositions, factorizations, etc for big matrices > go a little faster without recoding the algorithms.? Thanks If you have very big matrices, scikit-learn's PCA already uses randomized linear algebra, which buys you more than GPUs. Ga?l From stefanv at berkeley.edu Thu Jan 4 15:43:16 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 04 Jan 2018 12:43:16 -0800 Subject: [Numpy-discussion] NEP process PR In-Reply-To: References: Message-ID: <1515098596.2948656.1224548360.0B291606@webmail.messagingengine.com> On Mon, Jan 1, 2018, at 17:39, Ralf Gommers wrote: > > On Wed, Dec 13, 2017 at 11:45 AM, Jarrod Millman > wrote:>> Hi all, >> >> I've started working on the proposal discussed in this thread: >> https://mail.python.org/pipermail/numpy-discussion/2017-December/077481.html>> here: >> https://github.com/numpy/numpy/pull/10213 > > Thanks Jarrod! > > Stefan, Marten and I reviewed in the meantime, and all looks good. > Probably good to accept the PR in a few days unless there are other > comments by then. Since the PR has been merged, I went ahead and created the two relevant repos, `numpy/neps` and `numpy/devdocs`, and gave Jarrod admin access. He'll take it from here, but feel free to change names etc. as needed. St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Thu Jan 4 20:02:02 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 04 Jan 2018 17:02:02 -0800 Subject: [Numpy-discussion] Position at BIDS (UC Berkeley) to work on NumPy Message-ID: <1515114122.3037386.1224777800.7DFD06B2@webmail.messagingengine.com> Hi everyone, Chuck suggested that I send a reminder that the Berkeley Institute for Data Science (BIDS) is hiring scientific Python Developers to contribute to NumPy. You can read more about the new positions here: https://bids.berkeley.edu/news/bids-receives-sloan-foundation-grant-contribute-numpy-development If you enjoy collaborative work as well as the technical challenges posed by numerical computing, this is an excellent opportunity to play a fundamental role in the development of one of the most impactful libraries in the entire Python ecosystem. Best regards St?fan Job link: https://jobsprod.is.berkeley.edu/psc/jobsprod/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_JOB_DTL&Action=A&JobOpeningId=24142&SiteId=1&PostingSeq=1 From allanhaldane at gmail.com Fri Jan 5 14:47:35 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 5 Jan 2018 14:47:35 -0500 Subject: [Numpy-discussion] PowerPC machine available from OSUOSL Message-ID: <1add5c32-40db-c088-34d5-a51da4719902@gmail.com> Hi all, For building and testing Numpy on the PowerPC arch, I've requested and obtained a VM instance at OSUOSL, which provides free hosting for open-source projects. This was suggested by @npanpaliya on github. http://osuosl.org/services/powerdev/request_hosting/ I have an immediate use for it to fix some float128 ppc problems, but it can be useful to other numpy devs too. If you are a numpy dev and want access to a ppc system for testing, I can gladly create an account for you. For now, just send me an email with your desired username and a public ssh key. We can discuss permissions and management depending on interest. === VM details === It is single node, Ubuntu 16.04.1, ppc64 POWER (Big Endian) arch, "m1_small" flavor, with usage policy at http://osuosl.org/services/hosting/policy/ I've installed the packages needed to build and test numpy. I ran tests: Numpy 1.13.3 has 2 unimportant test failures, but as expected 1.14 has ~20 more failures related to float128 reprs. === Other services to Consider === I did not request it but they provide a "Power CI" service which allows automated testing of ppc arch on github. If we want this I can look into it more. We might also consider asking for a node with ppc64le (Little Endian) arch for testing, but maybe the big-endian one is enough. Cheers, Allan From m.h.vankerkwijk at gmail.com Fri Jan 5 17:50:08 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Fri, 5 Jan 2018 17:50:08 -0500 Subject: [Numpy-discussion] PowerPC machine available from OSUOSL In-Reply-To: <1add5c32-40db-c088-34d5-a51da4719902@gmail.com> References: <1add5c32-40db-c088-34d5-a51da4719902@gmail.com> Message-ID: Doing CI on a different architecture, especially big-endian, would seem very useful. (Indeed, I'll look into it for my own project, of reading radio baseband data -- we run it on a BGQ, so more constant checking would be good to have). But it may be that we're a bit too big for CI, and that it is better to test at release time (as I guess you're doing!). -- Marten From charlesr.harris at gmail.com Sat Jan 6 20:00:20 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 6 Jan 2018 18:00:20 -0700 Subject: [Numpy-discussion] NumPy 1.14.0 release Message-ID: Hi All, On behalf of the NumPy team, I am pleased to announce NumPy 1.14.0. Numpy 1.14.0 is the result of seven months of work and contains a large number of bug fixes and new features, along with several changes with potential compatibility issues. The major change that users will notice are the stylistic changes in the way numpy arrays and scalars are printed, a change that will affect doctests. See the release notes for details on how to preserve the old style printing when needed. A major decision affecting future development concerns the schedule for dropping Python 2.7 support in the runup to 2020. The decision has been made to support 2.7 for all releases made in 2018, with the last release being designated a long term release with support for bug fixes extending through the end of 2019. Starting from January, 2019 support for 2.7 will be dropped in all new releases. More details can be found in the relevant NEP . This release supports Python 2.7 and 3.4 - 3.6. Wheels for the release are available on PyPI. Source tarballs, zipfiles, release notes, and the changelog are available on github . *Highlights* - The ``np.einsum`` function uses BLAS when possible - ``genfromtxt``, ``loadtxt``, ``fromregex`` and ``savetxt`` can now handle files with arbitrary Python supported encoding. - Major improvements to printing of NumPy arrays and scalars. *New functions* - ``parametrize``: decorator added to numpy.testing - ``chebinterpolate``: Interpolate function at Chebyshev points. - ``format_float_positional`` and ``format_float_scientific`` : format floating-point scalars unambiguously with control of rounding and padding. - ``PyArray_ResolveWritebackIfCopy`` and ``PyArray_SetWritebackIfCopyBase``, new C-API functions useful in achieving PyPy compatibity. *Contributors* A total of 100 people contributed to this release. People with a "+" by their names contributed a patch for the first time. * Alexey Brodkin + * Allan Haldane * Andras Deak + * Andrew Lawson + * Anna Chiara + * Antoine Pitrou * Bernhard M. Wiedemann + * Bob Eldering + * Brandon Carter * CJ Carey * Charles Harris * Chris Lamb * Christoph Boeddeker + * Christoph Gohlke * Daniel Hrisca + * Daniel Smith * Danny Hermes * David Freese * David Hagen * David Linke + * David Schaefer + * Dillon Niederhut + * Egor Panfilov + * Emilien Kofman * Eric Wieser * Erik Bray + * Erik Quaeghebeur + * Garry Polley + * Gunjan + * Han Shen + * Henke Adolfsson + * Hidehiro NAGAOKA + * Hemil Desai + * Hong Xu + * Iryna Shcherbina + * Jaime Fernandez * James Bourbeau + * Jamie Townsend + * Jarrod Millman * Jean Helie + * Jeroen Demeyer + * John Goetz + * John Kirkham * John Zwinck * Jonathan Helmus * Joseph Fox-Rabinovitz * Joseph Paul Cohen + * Joshua Leahy + * Julian Taylor * J?rg D?pfert + * Keno Goertz + * Kevin Sheppard + * Kexuan Sun + * Konrad Kapp + * Kristofor Maynard + * Licht Takeuchi + * Lo?c Est?ve * Lukas Mericle + * Marten van Kerkwijk * Matheus Portela + * Matthew Brett * Matti Picus * Michael Lamparski + * Michael Odintsov + * Michael Schnaitter + * Michael Seifert * Mike Nolta * Nathaniel J. Smith * Nelle Varoquaux + * Nicholas Del Grosso + * Nico Schl?mer + * Oleg Zabluda + * Oleksandr Pavlyk * Pauli Virtanen * Pim de Haan + * Ralf Gommers * Robert T. McGibbon + * Roland Kaufmann * Sebastian Berg * Serhiy Storchaka + * Shitian Ni + * Spencer Hill + * Srinivas Reddy Thatiparthy + * Stefan Winkler + * Stephan Hoyer * Steven Maude + * SuperBo + * Thomas K?ppe + * Toon Verstraelen * Vedant Misra + * Warren Weckesser * Wirawan Purwanto + * Yang Li + * Ziyan Zhou + * chaoyu3 + * orbit-stabilizer + * solarjoe * wufangjie + * xoviat + * ?lie Gouzien + Cheers, Charles Harris -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jan 7 12:37:58 2018 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 8 Jan 2018 06:37:58 +1300 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: Message-ID: On Sun, Jan 7, 2018 at 2:00 PM, Charles R Harris wrote: > Hi All, > > On behalf of the NumPy team, I am pleased to announce NumPy 1.14.0. > Thanks for doing the heavy lifting to get this release out the door Chuck! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Sun Jan 7 15:59:06 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Sun, 7 Jan 2018 15:59:06 -0500 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: Message-ID: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> On 01/07/2018 12:37 PM, Ralf Gommers wrote: > > > On Sun, Jan 7, 2018 at 2:00 PM, Charles R Harris > > wrote: > > Hi All, > > On behalf of the NumPy team, I am pleased to announce NumPy 1.14.0. > > > Thanks for doing the heavy lifting to get this release out the door Chuck! > > Ralf Yes, I am always very impressed and appreciative of all the work Chuck does for Numpy. Thank you very much! Allan From njs at pobox.com Sun Jan 7 23:24:41 2018 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 7 Jan 2018 20:24:41 -0800 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: On Sun, Jan 7, 2018 at 12:59 PM, Allan Haldane wrote: > On 01/07/2018 12:37 PM, Ralf Gommers wrote: >> >> >> >> On Sun, Jan 7, 2018 at 2:00 PM, Charles R Harris >> > wrote: >> >> Hi All, >> >> On behalf of the NumPy team, I am pleased to announce NumPy 1.14.0. >> >> >> Thanks for doing the heavy lifting to get this release out the door Chuck! >> >> Ralf > > > Yes, I am always very impressed and appreciative of all the work Chuck does > for Numpy. Thank you very much! +1 -- Nathaniel J. Smith -- https://vorpus.org From me.vinob at gmail.com Mon Jan 8 06:20:15 2018 From: me.vinob at gmail.com (Vinodhini Balusamy) Date: Mon, 8 Jan 2018 22:20:15 +1100 Subject: [Numpy-discussion] array - dimension size of 1-D and 2-D examples In-Reply-To: References: <449981B3-DE05-4595-9A13-C129CFFBC51F@gmail.com> Message-ID: <08FA01E4-A372-4BBA-BFCE-B75669804B3C@gmail.com> Missed this mail. Thanks Derek For the clarification provided. Kind Rgds, Vinodhini > On 31 Dec 2017, at 10:11 am, Derek Homeier wrote: > > On 30 Dec 2017, at 5:38 pm, Vinodhini Balusamy wrote: >> >> Just one more question from the details you have provided which from my understanding strongly seems to be Design >> [DEREK] You cannot create a regular 2-dimensional integer array from one row of length 3 >>> and a second one of length 0. Thus np.array chooses the next most basic type of >>> array it can fit your input data in >> > Indeed, the general philosophy is to preserve the structure and type of your input data > as far as possible, i.e. a list is turned into a 1d-array, a list of lists (or tuples etc?) into > a 2d-array,_ if_ the sequences are of equal length (even if length 1). > As long as there is an unambiguous way to convert the data into an array (see below). > >> Which is the case, only if an second one of length 0 is given. >> What about the case 1 : >>>>> x12 = np.array([[1,2,3]]) >>>>> x12 >> array([[1, 2, 3]]) >>>>> print(x12) >> [[1 2 3]] >>>>> x12.ndim >> 2 >>>>> >>>>> >> This seems to take 2 dimension. > > Yes, structurally this is equivalent to your second example > >> also, >>>> x12 = np.array([[1,2,3],[0,0,0]]) >>>> print(x12) > [[1 2 3] > [0 0 0]] >>>> x12.ndim > 2 > >> I presumed the above case and the case where length 0 is provided to be treated same(I mean same behaviour). >> Correct me if I am wrong. >> > In this case there is no unambiguous way to construct the array - you would need a shape (2, 3) > array to store the two lists with 3 elements in the first list. Obviously x12[0] would be np.array([1,2,3]), > but what should be the value of x12[1], if the second list is empty - it could be zeros, or repeating x12[0], > or simply undefined. np.array([1, 2, 3], [4]]) would be even less clearly defined. > These cases where there is no obvious ?right? way to create the array have usually been discussed at > some length, but I don?t know if this is fully documented in some place. For the essentials, see > > https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html > > note also the upcasting rules if you have e.g. a mix of integers and reals or complex numbers, > and also how to control shape or data type explicitly with the respective keywords. > > Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From njs at pobox.com Tue Jan 9 03:24:17 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 9 Jan 2018 00:24:17 -0800 Subject: [Numpy-discussion] RFC: comments to BLAS committee from numpy/scipy devs Message-ID: Hi all, As mentioned earlier [1][2], there's work underway to revise and update the BLAS standard -- e.g. we might get support for strided arrays and lose xerbla! There's a draft at [3]. They're interested in feedback from users, so I've written up a first draft of comments about what we would like as NumPy/SciPy developers. This is very much a first attempt -- I know we have lots of people who are more expert on BLAS than me on these lists :-). Please let me know what you think. -n [1] https://mail.python.org/pipermail/numpy-discussion/2017-November/077420.html [2] https://mail.python.org/pipermail/scipy-dev/2017-November/022267.html [3] https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit ----- # Comments from NumPy / SciPy developers on "A Proposal for a Next-Generation BLAS" These are comments on [A Proposal for a Next-Generation BLAS](https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit#) (version as of 2017-12-13), from the perspective of the developers of the NumPy and SciPy libraries. We hope this feedback is useful, and welcome further discussion. ## Who are we? NumPy and SciPy are the two foundational libraries of the Python numerical ecosystem, and one of their duties is to wrap BLAS and expose it for the use of other Python libraries. (NumPy primarily provides a GEMM wrapper, while SciPy exposes more specialized operations.) It's unclear how many users we have exactly, but we certainly ship multiple million copies of BLAS every month, and provide one of the most popular numerical toolkits for both novice and expert users. Looking at the original BLAS and LAPACK interfaces, it often seems that their imagined user is something like a classic supercomputer consumer, who writes code directly in Fortran or C against the BLAS API, and where the person writing the code and running the code are the same. NumPy/SciPy are coming from a very different perspective: our users generally know nothing about the details of the underlying BLAS; they just want to describe their problem in some high-level way, and the library is responsible for making it happen as efficiently as possible, and is often integrated into some larger system (e.g. a real-time analytics platform embedded in a web server). When it comes to our BLAS usage, we mostly use only a small subset of the routines. However, as "consumer software" used by a wide variety of users with differing degress of technical expertise, we're expected to Just Work on a wide variety of systems, and with as many different vendor BLAS libraries as possible. On the other hand, the fact that we're working with Python means we don't tend to worry about small inefficiencies that will be lost in the noise in any case, and are willing to sacrifice some performance to get more reliable operation across our diverse userbase. ## Comments on specific aspects of the proposal ### Data Layout We are **strongly in favor** of the proposal to support arbitrary strided data layouts. Ideally, this would support strides *specified in bytes* (allowing for unaligned data layouts), and allow for truly arbitrary strides, including *zero or negative* values. However, we think it's fine if some of the weirder cases suffer a performance penalty. Rationale: NumPy ? and thus, most of the scientific Python ecosystem ? only has one way of representing an array: the `numpy.ndarray` type, which is an arbitrary dimensional tensor with arbitrary strides. It is common to encounter matrices with non-trivial strides. For example:: # Make a 3-dimensional tensor, 10 x 9 x 8 t = np.zeros((10, 9, 8)) # Considering this as a stack of eight 10x9 matrices, extract the first: mat = t[:, :, 0] Now `mat` has non-trivial strides on both axes. (If running this in a Python interpreter, you can see this by looking at the value of `mat.strides`.) Another case where interesting strides arise is when performing ["broadcasting"](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html), which is the name for NumPy's rules for stretching arrays to make their shapes match. For example, in an expression like:: np.array([1, 2, 3]) + 1 the scalar `1` is "broadcast" to create a vector `[1, 1, 1]`. This is accomplished without allocating memory, by creating a vector with settings length = 3, strides = 0 ? so all the elements share a single location in memory. Similarly, by using negative strides we can reverse an array without allocating memory:: a = np.array([1, 2, 3]) a_flipped = a[::-1] Now `a_flipped` has the value `[3, 2, 1]`, while sharing storage with the array `a = [1, 2, 3]`. Misaligned data is also possible (e.g. an array of 8-byte doubles with a 9-byte stride), though it arises more rarely. (An example of when it might occurs is in an on-disk data format that alternates between storing a double value and then a single byte value, which is then memory-mapped.) While this array representation is very flexible and useful, it makes interfacing with BLAS a challenge: how do you perform a GEMM operation on two arrays with arbitrary strides? Currently, NumPy attempts to detect a number of special cases: if the strides in both arrays imply a column-major layout, then call BLAS directly; if one of them has strides corresponding to a row-major layout, then set the corresponding `transA`/`transB` argument, etc. ? and if all else fails, either copy the data into a contiguous buffer, or else fall back on a naive triple-nested-loop GEMM implementation. (There's also a check where if we can determine through examining the arrays' data pointers and strides that they're actually transposes of each other, then we instead dispatch to SYRK.) So for us, native stride support in the BLAS has two major advantages: - It allows a wider variety of operations to be transparently handled using high-speed kernels. - It reduces the complexity of the NumPy/BLAS interface code. Any kind of stride support produces both of these advantages to some extent. The ideal would be if BLAS supports strides specified in bytes (not elements), and allowing truly arbitrary values, because in this case, we can simply pass through our arrays directly to the library without having to do any fiddly checking at all. Note that for these purposes, it's OK if "weird" arrays pay a speed penalty: if the BLAS didn't support them, we'd just have to fall back on some even worse strategy for handling them ourselves. And this approach would mean that if some BLAS vendors do at some point decide to optimize these cases, then our users will immediately see the advantages. ### Error-reporting We are **strongly in favor** of the draft's proposal to replace XERBLA with proper error codes. XERBLA is fine for one-off programs in Fortran, but awful for wrapper libraries like NumPy/SciPy. ### Handling of exceptional values A recurring issue in low-level linear-algebra libraries is that they may crash, freeze, or produce unexpected results when receiving non-finite inputs. Of course as a wrapper library, NumPy/SciPy can't control what kind of strange data our users might pass in, and we have to produce sensible results regardless, so we sometimes accumulate hacks like having to validate all incoming data for exceptional values We support the proposal's recommendation of "NaN interpretation (4)". We also note that contrary to the note under "Interpretation (1)"; in the R statistical environment NaN does *not* represent missing data. R maintains a clear distinction between "NA" (missing) values and "NaN" (invalid) values. This is important for various statistical procedures such as imputation, where you might want to estimate the true value of an unmeasured variable, but you don't want to estimate the true value of 0/0. For efficiency, they do perform some calculations on NA values by storing them as NaNs with a specific payload; in the R context, therefore, it's particularly valuable if operations **correctly propagate NaN payloads**. NumPy does not yet provide first-class missing value support, but we plan to add it within the next 1-2 years, and when we do it seems likely that we'll have similar needs to R. ### BLAS G2 NumPy has a rich type system, including 16-, 32-, and 64-bit floats (plus extended precision on some platforms), a similar set of complex values, and is further extensible with new types. We have a similarly rich dispatch system for operations that attempts to find the best matching optimized-loop for an arbitrary set of input types. We're thus quite interested in the proposed mixed-type operations, and if implemented we'll probably start transparently taking advantage of them (i.e., in cases where we're currently upcasting to perform a requested operation, we may switch to using the native precision operation without requiring any changes in user code). Two comments, though: 1. The various tricks to make names less redundant (e.g., specifying just one type if they're all the same) are convenient for those calling these routines by hand, but rather inconvenient for those of us who will be generating a table mapping different type combinations to different functions. When designing a compression scheme for names, please remember that the scheme will have to be unambiguously implemented in code by multiple projects. Another option to consider: ask vendors to implement the full "uncompressed" names, and then provide a shim library mapping short "convenience" names to their full form. 2. Regarding the suggestion that the naming scheme will allow for a wide variety of differently typed routines, but that only some subset will be implemented by any particular vendor: we are worried about this subset business. For a library like NumPy that wants to wrap "all" the provided routines, while supporting "all" the different BLAS libraries ... how will this work? **How do we know which routines any particular library provides?** What if new routines are added to the list later? ### Reproducible BLAS This is interesting, but we don't have any particularly useful comments. ### Batch BLAS NumPy/SciPy actually provide batched operations in many cases, currently implemented using a `for` loop around individual calls to BLAS/LAPACK. However, our interface is relatively restricted compared to some of the proposals considered here: generally all we need is the ability to perform an operation on two "stacks" of size-homogenous matrices, with the offset between matrices determined by an arbitrary (potentially zero) stride. ### Fixed-point BLAS This is interesting, but we don't have any particularly useful comments. -- Nathaniel J. Smith -- https://vorpus.org From ilhanpolat at gmail.com Tue Jan 9 06:40:02 2018 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Tue, 9 Jan 2018 12:40:02 +0100 Subject: [Numpy-discussion] [SciPy-Dev] RFC: comments to BLAS committee from numpy/scipy devs In-Reply-To: References: Message-ID: I couldn't find an item to place this but I think ilaenv and also calling the function twice (one with lwork=-1 and reading the optimal block size and the call the function again properly with lwork=) in LAPACK needs to be gotten rid of. That's a major annoyance during the wrapping of LAPACK routines for SciPy. I don't know if this is realistic but the values ilaenv needed can be computed once (or again if hardware is changed) at the install and can be read off by the routines. On Jan 9, 2018 09:25, "Nathaniel Smith" wrote: > Hi all, > > As mentioned earlier [1][2], there's work underway to revise and > update the BLAS standard -- e.g. we might get support for strided > arrays and lose xerbla! There's a draft at [3]. They're interested in > feedback from users, so I've written up a first draft of comments > about what we would like as NumPy/SciPy developers. This is very much > a first attempt -- I know we have lots of people who are more expert > on BLAS than me on these lists :-). Please let me know what you think. > > -n > > [1] https://mail.python.org/pipermail/numpy-discussion/ > 2017-November/077420.html > [2] https://mail.python.org/pipermail/scipy-dev/2017-November/022267.html > [3] https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdB > DvtD5I14QHp9OE/edit > > ----- > > # Comments from NumPy / SciPy developers on "A Proposal for a > Next-Generation BLAS" > > These are comments on [A Proposal for a Next-Generation > BLAS](https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdB > DvtD5I14QHp9OE/edit#) > (version as of 2017-12-13), from the perspective of the developers of > the NumPy and SciPy libraries. We hope this feedback is useful, and > welcome further discussion. > > ## Who are we? > > NumPy and SciPy are the two foundational libraries of the Python > numerical ecosystem, and one of their duties is to wrap BLAS and > expose it for the use of other Python libraries. (NumPy primarily > provides a GEMM wrapper, while SciPy exposes more specialized > operations.) It's unclear how many users we have exactly, but we > certainly ship multiple million copies of BLAS every month, and > provide one of the most popular numerical toolkits for both novice and > expert users. > > Looking at the original BLAS and LAPACK interfaces, it often seems > that their imagined user is something like a classic supercomputer > consumer, who writes code directly in Fortran or C against the BLAS > API, and where the person writing the code and running the code are > the same. NumPy/SciPy are coming from a very different perspective: > our users generally know nothing about the details of the underlying > BLAS; they just want to describe their problem in some high-level way, > and the library is responsible for making it happen as efficiently as > possible, and is often integrated into some larger system (e.g. a > real-time analytics platform embedded in a web server). > > When it comes to our BLAS usage, we mostly use only a small subset of > the routines. However, as "consumer software" used by a wide variety > of users with differing degress of technical expertise, we're expected > to Just Work on a wide variety of systems, and with as many different > vendor BLAS libraries as possible. On the other hand, the fact that > we're working with Python means we don't tend to worry about small > inefficiencies that will be lost in the noise in any case, and are > willing to sacrifice some performance to get more reliable operation > across our diverse userbase. > > ## Comments on specific aspects of the proposal > > ### Data Layout > > We are **strongly in favor** of the proposal to support arbitrary > strided data layouts. Ideally, this would support strides *specified > in bytes* (allowing for unaligned data layouts), and allow for truly > arbitrary strides, including *zero or negative* values. However, we > think it's fine if some of the weirder cases suffer a performance > penalty. > > Rationale: NumPy ? and thus, most of the scientific Python ecosystem ? > only has one way of representing an array: the `numpy.ndarray` type, > which is an arbitrary dimensional tensor with arbitrary strides. It is > common to encounter matrices with non-trivial strides. For example:: > > # Make a 3-dimensional tensor, 10 x 9 x 8 > t = np.zeros((10, 9, 8)) > # Considering this as a stack of eight 10x9 matrices, extract the > first: > mat = t[:, :, 0] > > Now `mat` has non-trivial strides on both axes. (If running this in a > Python interpreter, you can see this by looking at the value of > `mat.strides`.) Another case where interesting strides arise is when > performing ["broadcasting"](https://docs.scipy.org/doc/numpy-1.13.0/ > user/basics.broadcasting.html), > which is the name for NumPy's rules for stretching arrays to make > their shapes match. For example, in an expression like:: > > np.array([1, 2, 3]) + 1 > > the scalar `1` is "broadcast" to create a vector `[1, 1, 1]`. This is > accomplished without allocating memory, by creating a vector with > settings length = 3, strides = 0 ? so all the elements share a single > location in memory. Similarly, by using negative strides we can > reverse an array without allocating memory:: > > a = np.array([1, 2, 3]) > a_flipped = a[::-1] > > Now `a_flipped` has the value `[3, 2, 1]`, while sharing storage with > the array `a = [1, 2, 3]`. Misaligned data is also possible (e.g. an > array of 8-byte doubles with a 9-byte stride), though it arises more > rarely. (An example of when it might occurs is in an on-disk data > format that alternates between storing a double value and then a > single byte value, which is then memory-mapped.) > > While this array representation is very flexible and useful, it makes > interfacing with BLAS a challenge: how do you perform a GEMM operation > on two arrays with arbitrary strides? Currently, NumPy attempts to > detect a number of special cases: if the strides in both arrays imply > a column-major layout, then call BLAS directly; if one of them has > strides corresponding to a row-major layout, then set the > corresponding `transA`/`transB` argument, etc. ? and if all else > fails, either copy the data into a contiguous buffer, or else fall > back on a naive triple-nested-loop GEMM implementation. (There's also > a check where if we can determine through examining the arrays' data > pointers and strides that they're actually transposes of each other, > then we instead dispatch to SYRK.) > > So for us, native stride support in the BLAS has two major advantages: > > - It allows a wider variety of operations to be transparently handled > using high-speed kernels. > > - It reduces the complexity of the NumPy/BLAS interface code. > > Any kind of stride support produces both of these advantages to some > extent. The ideal would be if BLAS supports strides specified in bytes > (not elements), and allowing truly arbitrary values, because in this > case, we can simply pass through our arrays directly to the library > without having to do any fiddly checking at all. Note that for these > purposes, it's OK if "weird" arrays pay a speed penalty: if the BLAS > didn't support them, we'd just have to fall back on some even worse > strategy for handling them ourselves. And this approach would mean > that if some BLAS vendors do at some point decide to optimize these > cases, then our users will immediately see the advantages. > > ### Error-reporting > > We are **strongly in favor** of the draft's proposal to replace XERBLA > with proper error codes. XERBLA is fine for one-off programs in > Fortran, but awful for wrapper libraries like NumPy/SciPy. > > ### Handling of exceptional values > > A recurring issue in low-level linear-algebra libraries is that they > may crash, freeze, or produce unexpected results when receiving > non-finite inputs. Of course as a wrapper library, NumPy/SciPy can't > control what kind of strange data our users might pass in, and we have > to produce sensible results regardless, so we sometimes accumulate > hacks like having to validate all incoming data for exceptional values > > We support the proposal's recommendation of "NaN interpretation (4)". > > We also note that contrary to the note under "Interpretation (1)"; in > the R statistical environment NaN does *not* represent missing data. R > maintains a clear distinction between "NA" (missing) values and "NaN" > (invalid) values. This is important for various statistical procedures > such as imputation, where you might want to estimate the true value of > an unmeasured variable, but you don't want to estimate the true value > of 0/0. For efficiency, they do perform some calculations on NA values > by storing them as NaNs with a specific payload; in the R context, > therefore, it's particularly valuable if operations **correctly > propagate NaN payloads**. NumPy does not yet provide first-class > missing value support, but we plan to add it within the next 1-2 > years, and when we do it seems likely that we'll have similar needs to > R. > > ### BLAS G2 > > NumPy has a rich type system, including 16-, 32-, and 64-bit floats > (plus extended precision on some platforms), a similar set of complex > values, and is further extensible with new types. We have a similarly > rich dispatch system for operations that attempts to find the best > matching optimized-loop for an arbitrary set of input types. We're > thus quite interested in the proposed mixed-type operations, and if > implemented we'll probably start transparently taking advantage of > them (i.e., in cases where we're currently upcasting to perform a > requested operation, we may switch to using the native precision > operation without requiring any changes in user code). > > Two comments, though: > > 1. The various tricks to make names less redundant (e.g., specifying > just one type if they're all the same) are convenient for those > calling these routines by hand, but rather inconvenient for those of > us who will be generating a table mapping different type combinations > to different functions. When designing a compression scheme for names, > please remember that the scheme will have to be unambiguously > implemented in code by multiple projects. Another option to consider: > ask vendors to implement the full "uncompressed" names, and then > provide a shim library mapping short "convenience" names to their full > form. > > 2. Regarding the suggestion that the naming scheme will allow for a > wide variety of differently typed routines, but that only some subset > will be implemented by any particular vendor: we are worried about > this subset business. For a library like NumPy that wants to wrap > "all" the provided routines, while supporting "all" the different BLAS > libraries ... how will this work? **How do we know which routines any > particular library provides?** What if new routines are added to the > list later? > > ### Reproducible BLAS > > This is interesting, but we don't have any particularly useful comments. > > ### Batch BLAS > > NumPy/SciPy actually provide batched operations in many cases, > currently implemented using a `for` loop around individual calls to > BLAS/LAPACK. However, our interface is relatively restricted compared > to some of the proposals considered here: generally all we need is the > ability to perform an operation on two "stacks" of size-homogenous > matrices, with the offset between matrices determined by an arbitrary > (potentially zero) stride. > > ### Fixed-point BLAS > > This is interesting, but we don't have any particularly useful comments. > > > -- > Nathaniel J. Smith -- https://vorpus.org > _______________________________________________ > SciPy-Dev mailing list > SciPy-Dev at python.org > https://mail.python.org/mailman/listinfo/scipy-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Martin.Gfeller at swisscom.com Tue Jan 9 07:27:58 2018 From: Martin.Gfeller at swisscom.com (Martin.Gfeller at swisscom.com) Date: Tue, 9 Jan 2018 12:27:58 +0000 Subject: [Numpy-discussion] array - dimension size of 1-D and 2-D examples Message-ID: Hi Derek I have a related question: Given: a = numpy.array([[0,1,2],[3,4]]) assert a.ndim == 1 b = numpy.array([[0,1,2],[3,4,5]]) assert b.ndim == 2 Is there an elegant way to force b to remain a 1-dim object array? I have a use case where normally the sublists are of different lengths, but I get a completely different structure when they are (coincidentally in my case) of the same length. Thanks and best regards, Martin Martin Gfeller, Swisscom / Enterprise / Banking / Products / Quantax Message: 1 Date: Sun, 31 Dec 2017 00:11:48 +0100 From: Derek Homeier To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] array - dimension size of 1-D and 2-D examples Message-ID: Content-Type: text/plain; charset=utf-8 On 30 Dec 2017, at 5:38 pm, Vinodhini Balusamy wrote: > > Just one more question from the details you have provided which from > my understanding strongly seems to be Design [DEREK] You cannot create > a regular 2-dimensional integer array from one row of length 3 >> and a second one of length 0. Thus np.array chooses the next most >> basic type of array it can fit your input data in > Indeed, the general philosophy is to preserve the structure and type of your input data as far as possible, i.e. a list is turned into a 1d-array, a list of lists (or tuples etc?) into a 2d-array,_ if_ the sequences are of equal length (even if length 1). As long as there is an unambiguous way to convert the data into an array (see below). > Which is the case, only if an second one of length 0 is given. > What about the case 1 : > >>> x12 = np.array([[1,2,3]]) > >>> x12 > array([[1, 2, 3]]) > >>> print(x12) > [[1 2 3]] > >>> x12.ndim > 2 > >>> > >>> > This seems to take 2 dimension. Yes, structurally this is equivalent to your second example > also, >>> x12 = np.array([[1,2,3],[0,0,0]]) >>> print(x12) [[1 2 3] [0 0 0]] >>> x12.ndim 2 > I presumed the above case and the case where length 0 is provided to be treated same(I mean same behaviour). > Correct me if I am wrong. > In this case there is no unambiguous way to construct the array - you would need a shape (2, 3) array to store the two lists with 3 elements in the first list. Obviously x12[0] would be np.array([1,2,3]), but what should be the value of x12[1], if the second list is empty - it could be zeros, or repeating x12[0], or simply undefined. np.array([1, 2, 3], [4]]) would be even less clearly defined. These cases where there is no obvious ?right? way to create the array have usually been discussed at some length, but I don?t know if this is fully documented in some place. For the essentials, see https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html note also the upcasting rules if you have e.g. a mix of integers and reals or complex numbers, and also how to control shape or data type explicitly with the respective keywords. Derek From sebastian at sipsolutions.net Tue Jan 9 08:47:41 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 09 Jan 2018 14:47:41 +0100 Subject: [Numpy-discussion] array - dimension size of 1-D and 2-D examples In-Reply-To: References: Message-ID: <1515505661.23088.1.camel@sipsolutions.net> On Tue, 2018-01-09 at 12:27 +0000, Martin.Gfeller at swisscom.com wrote: > Hi Derek > > I have a related question: > > Given: > > a = numpy.array([[0,1,2],[3,4]]) > assert a.ndim == 1 > b = numpy.array([[0,1,2],[3,4,5]]) > assert b.ndim == 2 > > Is there an elegant way to force b to remain a 1-dim object array? > You will have to create an empty object array and assign the lists to it. ``` b = np.empty(len(l), dtype=object) b[...] = l ``` > I have a use case where normally the sublists are of different > lengths, but I get a completely different structure when they are > (coincidentally in my case) of the same length. > > Thanks and best regards, Martin > > > Martin Gfeller, Swisscom / Enterprise / Banking / Products / Quantax > > Message: 1 > Date: Sun, 31 Dec 2017 00:11:48 +0100 > From: Derek Homeier > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] array - dimension size of 1-D and 2-D > examples > Message-ID: > en.de> > Content-Type: text/plain; charset=utf-8 > > On 30 Dec 2017, at 5:38 pm, Vinodhini Balusamy > wrote: > > > > Just one more question from the details you have provided which > > from > > my understanding strongly seems to be Design [DEREK] You cannot > > create > > a regular 2-dimensional integer array from one row of length 3 > > > and a second one of length 0. Thus np.array chooses the next > > > most > > > basic type of array it can fit your input data in > > Indeed, the general philosophy is to preserve the structure and type > of your input data as far as possible, i.e. a list is turned into a > 1d-array, a list of lists (or tuples etc?) into a 2d-array,_ if_ the > sequences are of equal length (even if length 1). > As long as there is an unambiguous way to convert the data into an > array (see below). > > > Which is the case, only if an second one of length 0 is given. > > What about the case 1 : > > > > > x12 = np.array([[1,2,3]]) > > > > > x12 > > > > array([[1, 2, 3]]) > > > > > print(x12) > > > > [[1 2 3]] > > > > > x12.ndim > > > > 2 > > > > > > > > > > > > > > This seems to take 2 dimension. > > Yes, structurally this is equivalent to your second example > > > also, > > > > x12 = np.array([[1,2,3],[0,0,0]]) > > > > print(x12) > > [[1 2 3] > [0 0 0]] > > > > x12.ndim > > 2 > > > I presumed the above case and the case where length 0 is provided > > to be treated same(I mean same behaviour). > > Correct me if I am wrong. > > > > In this case there is no unambiguous way to construct the array - you > would need a shape (2, 3) array to store the two lists with 3 > elements in the first list. Obviously x12[0] would be > np.array([1,2,3]), but what should be the value of x12[1], if the > second list is empty - it could be zeros, or repeating x12[0], or > simply undefined. np.array([1, 2, 3], [4]]) would be even less > clearly defined. > These cases where there is no obvious ?right? way to create the array > have usually been discussed at some length, but I don?t know if this > is fully documented in some place. For the essentials, see > > https://docs.scipy.org/doc/numpy/reference/routines.array-creation.ht > ml > > note also the upcasting rules if you have e.g. a mix of integers and > reals or complex numbers, and also how to control shape or data type > explicitly with the respective keywords. > > Derek > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Tue Jan 9 19:34:04 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 9 Jan 2018 16:34:04 -0800 Subject: [Numpy-discussion] [SciPy-Dev] RFC: comments to BLAS committee from numpy/scipy devs In-Reply-To: References: Message-ID: On Tue, Jan 9, 2018 at 3:40 AM, Ilhan Polat wrote: > I couldn't find an item to place this but I think ilaenv and also calling > the function twice (one with lwork=-1 and reading the optimal block size and > the call the function again properly with lwork=) in LAPACK needs to > be gotten rid of. > > That's a major annoyance during the wrapping of LAPACK routines for SciPy. > > I don't know if this is realistic but the values ilaenv needed can be > computed once (or again if hardware is changed) at the install and can be > read off by the routines. Unfortunately I think this effort is just to revise BLAS, not LAPACK. Maybe you should try starting a conversation with the LAPACK developers though ? I don't know much about how they work but maybe they'd be interested in feedback. -n -- Nathaniel J. Smith -- https://vorpus.org From njs at pobox.com Tue Jan 9 19:37:18 2018 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 9 Jan 2018 16:37:18 -0800 Subject: [Numpy-discussion] [SciPy-Dev] RFC: comments to BLAS committee from numpy/scipy devs In-Reply-To: References: Message-ID: On Tue, Jan 9, 2018 at 12:53 PM, Tyler Reddy wrote: > One common issue in computational geometry is the need to operate rapidly on > arrays with "heterogeneous shapes." > > So, an array that has rows with different numbers of columns -- shape (1,3) > for the first polygon and shape (1, 12) for the second polygon and so on. > > This seems like a particularly nasty scenario when the loss of "homogeneity" > in shape precludes traditional vectorization -- I think numpy effectively > converts these to dtype=object, etc. I don't > think is necessarily a BLAS issue since wrapping comp. geo. libraries does > happen in a subset of cases to handle this, but if there's overlap in > utility you could pass it along I suppose. You might be interested in this discussion of "Batch BLAS": https://docs.google.com/document/d/1DY4ImZT1coqri2382GusXgBTTTVdBDvtD5I14QHp9OE/edit#heading=h.pvsif1mxvaqq I didn't get into it in the draft response, because it didn't seem like something where NumPy/SciPy have any useful experience to offer, but it sounds like there are people worrying about this case. -n -- Nathaniel J. Smith -- https://vorpus.org From andyfaff at gmail.com Wed Jan 10 17:58:41 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Thu, 11 Jan 2018 09:58:41 +1100 Subject: [Numpy-discussion] Understanding np.min behaviour with nan Message-ID: I'm having some trouble understanding the behaviour of np.min when used with NaN. In the documentation of np.amin it says: "NaN values are propagated, that is if at least one item is NaN, the corresponding min value will be NaN as well. ". It doesn't say that in some circumstances there will be RuntimeWarning's raised (presumably depending on the errstate). Consider the following: >>> import numpy as np >>> np.min([1, np.nan]) /Users/andrew/miniconda3/envs/dev3/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims) nan >>> np.min([1, 2, np.nan]) nan >>> np.min([np.nan, 1, 2]) nan >>> np.min([np.nan, 1]) nan Why is there a RuntimeWarning for the first example, but not the others? Is this expected behaviour? -- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Wed Jan 10 18:03:25 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Thu, 11 Jan 2018 10:03:25 +1100 Subject: [Numpy-discussion] Understanding np.min behaviour with nan In-Reply-To: References: Message-ID: Further to my last message, why is the warning only raised once? ``` >>> import numpy as np >>> np.min([1, np.nan]) /Users/andrew/miniconda3/envs/dev3/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce return umr_minimum(a, axis, None, out, keepdims) nan >>> np.min([1, np.nan]) nan >>> ``` -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jan 10 18:21:27 2018 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 10 Jan 2018 15:21:27 -0800 Subject: [Numpy-discussion] Understanding np.min behaviour with nan In-Reply-To: References: Message-ID: On Wed, Jan 10, 2018 at 3:03 PM, Andrew Nelson wrote: > > Further to my last message, why is the warning only raised once? > > ``` > >>> import numpy as np > >>> np.min([1, np.nan]) > /Users/andrew/miniconda3/envs/dev3/lib/python3.6/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce > return umr_minimum(a, axis, None, out, keepdims) > nan > >>> np.min([1, np.nan]) > nan > >>> > ``` This is default behavior for warnings. https://docs.python.org/3/library/warnings.html#the-warnings-filter It also explains the previous results. The same warning would have been issued from the same place in each of the variations you tried. Since the warnings mechanism had already seen that RuntimeWarning with the same message from the same code location, they were not printed. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From gerrit.holl at gmail.com Wed Jan 10 18:21:57 2018 From: gerrit.holl at gmail.com (Gerrit Holl) Date: Wed, 10 Jan 2018 23:21:57 +0000 Subject: [Numpy-discussion] Understanding np.min behaviour with nan In-Reply-To: References: Message-ID: On 10 January 2018 at 23:03, Andrew Nelson wrote: > Further to my last message, why is the warning only raised once? Warnings are only raised once because that is the default behaviour for warnings. The warning is issued every time but only displayed on the first occurence. You can control this with the warnings module: https://docs.python.org/3.6/library/warnings.html Gerrit. From andyfaff at gmail.com Wed Jan 10 19:24:25 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Thu, 11 Jan 2018 11:24:25 +1100 Subject: [Numpy-discussion] Understanding np.min behaviour with nan In-Reply-To: References: Message-ID: > The same warning would have been issued from the same place in each of the variations you tried. That's not the case, I tried with np.min([1, 2, 3, np.nan]) in a fresh interpreter and no warning was raised. Furthermore on my work computer (conda python3.6.2 with pip installed numpy) I can't get the problem to show at all: >>> import numpy as np >>> np.version.version '1.14.0' >>> np.min([1., 2., 3., 4., np.nan]) nan >>> np.min([1., 2., 3., np.nan, 4.]) nan >>> np.min([1., 2., np.nan, 3., 4.]) nan >>> np.min([1., np.nan, 2., 3., 4.]) nan >>> np.min([np.nan, 1., 2., 3., 4.]) nan >>> np.min([np.nan, 1.]) nan >>> np.min([np.nan, 1., np.nan]) nan >>> np.min([1., np.nan]) nan >>> np.seterr(all='raise') {'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'} >>> np.min([1., np.nan]) nan >>> np.min([np.nan, 1.]) nan >>> np.min([np.nan, 1., 2., 3., 4.]) nan >>> np.min([np.nan, 1., 2., 3., 4.]) nan The context for these questions is the sudden CI fails I'm observing for scipy on appveyor - https://ci.appveyor.com/project/scipy/scipy/build/1.0.1444/job/n05ptntm0xxjklvt -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jan 11 11:51:21 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 11 Jan 2018 09:51:21 -0700 Subject: [Numpy-discussion] Understanding np.min behaviour with nan In-Reply-To: References: Message-ID: On Wed, Jan 10, 2018 at 5:24 PM, Andrew Nelson wrote: > > The same warning would have been issued from the same place in each of > the variations you tried. > > That's not the case, I tried with np.min([1, 2, 3, np.nan]) in a fresh > interpreter and no warning was raised. > > Furthermore on my work computer (conda python3.6.2 with pip installed > numpy) I can't get the problem to show at all: > > >>> import numpy as np > >>> np.version.version > '1.14.0' > >>> np.min([1., 2., 3., 4., np.nan]) > nan > >>> np.min([1., 2., 3., np.nan, 4.]) > nan > >>> np.min([1., 2., np.nan, 3., 4.]) > nan > >>> np.min([1., np.nan, 2., 3., 4.]) > nan > >>> np.min([np.nan, 1., 2., 3., 4.]) > nan > >>> np.min([np.nan, 1.]) > nan > >>> np.min([np.nan, 1., np.nan]) > nan > >>> np.min([1., np.nan]) > nan > >>> np.seterr(all='raise') > {'divide': 'warn', 'over': 'warn', 'under': 'ignore', 'invalid': 'warn'} > >>> np.min([1., np.nan]) > nan > >>> np.min([np.nan, 1.]) > nan > >>> np.min([np.nan, 1., 2., 3., 4.]) > nan > >>> np.min([np.nan, 1., 2., 3., 4.]) > nan > > > The context for these questions is the sudden CI fails I'm observing for > scipy on appveyor - https://ci.appveyor.com/project/scipy/scipy/build/1.0. > 1444/job/n05ptntm0xxjklvt > Different compilers/libraries respond differently to nans. If the change in scipy is recent, probably compiler flags or something similar has changed in the test environment. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From spluque at gmail.com Fri Jan 12 16:12:16 2018 From: spluque at gmail.com (Seb) Date: Fri, 12 Jan 2018 15:12:16 -0600 Subject: [Numpy-discussion] custom Welch method for power spectral density Message-ID: <87po6e234f.fsf@gmail.com> Hello, I'm trying to compute a power spectral density of a signal, using the Welch method, in the broad sense; i.e. splitting the signal into segments for deriving smoother spectra. This is well implemented in scipy.signal.welch. However, I'd like to use exponentially increasing (power 2) segment length to dampen increasing variance in spectra at higher frequencies. Before hacking the scipy.signal.spectral module for this, I'd appreciate any tips on available packages/modules that allow for this kind of binning scheme, or other suggestions. Thanks, -- Seb From robert.kern at gmail.com Fri Jan 12 16:32:11 2018 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 12 Jan 2018 13:32:11 -0800 Subject: [Numpy-discussion] custom Welch method for power spectral density In-Reply-To: <87po6e234f.fsf@gmail.com> References: <87po6e234f.fsf@gmail.com> Message-ID: On Fri, Jan 12, 2018 at 1:12 PM, Seb wrote: > > Hello, > > I'm trying to compute a power spectral density of a signal, using the > Welch method, in the broad sense; i.e. splitting the signal into > segments for deriving smoother spectra. This is well implemented in > scipy.signal.welch. However, I'd like to use exponentially increasing > (power 2) segment length to dampen increasing variance in spectra at > higher frequencies. Before hacking the scipy.signal.spectral module for > this, I'd appreciate any tips on available packages/modules that allow > for this kind of binning scheme, or other suggestions. Not entirely sure about this kind of binning scheme per se, but you may want to look at multitaper spectral estimation methods. The Welch method can be viewed as a poor-man's multitaper. Multitaper methods give you better control over the resolution/variance tradeoff that may help with your problem. Googling for "python multitaper" gives you several options; I haven't used any of them in anger, so I don't have a single recommendation for you. The nitime documentation provides more information about multitaper methods that may be useful to you: http://nipy.org/nitime/examples/multi_taper_spectral_estimation.html -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From spluque at gmail.com Sat Jan 13 00:08:01 2018 From: spluque at gmail.com (Seb) Date: Fri, 12 Jan 2018 23:08:01 -0600 Subject: [Numpy-discussion] custom Welch method for power spectral density In-Reply-To: (Robert Kern's message of "Fri, 12 Jan 2018 13:32:11 -0800") References: <87po6e234f.fsf@gmail.com> Message-ID: <878td2ny6m.fsf@otaria.sebmel.org> On Fri, 12 Jan 2018 13:32:11 -0800, Robert Kern wrote: [...] > Not entirely sure about this kind of binning scheme per se, but you > may want to look at multitaper spectral estimation methods. The Welch > method can be viewed as a poor-man's multitaper. Multitaper methods > give you better control over the resolution/variance tradeoff that may > help with your problem. Googling for "python multitaper" gives you > several options; I haven't used any of them in anger, so I don't have > a single recommendation for you. The nitime documentation provides > more information about multitaper methods that may be useful to you: > http://nipy.org/nitime/examples/multi_taper_spectral_estimation.html Very interesting documentation and suggestion. A test application of this looks promising. Thanks, -- Seb From josef.pktd at gmail.com Sat Jan 13 16:25:43 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 13 Jan 2018 16:25:43 -0500 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: statsmodels does not work with numpy 1.4.0 Besides the missing WarningsManager there seems to be 22 errors or failures from changes in numpy behavior, mainly from recarrays again. Josef From ben.v.root at gmail.com Sat Jan 13 16:49:31 2018 From: ben.v.root at gmail.com (Benjamin Root) Date: Sat, 13 Jan 2018 16:49:31 -0500 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: I assume you mean 1.14.0, rather than 1.4.0? Did recarrays change? I didn't see anything in the release notes. On Sat, Jan 13, 2018 at 4:25 PM, wrote: > statsmodels does not work with numpy 1.4.0 > > Besides the missing WarningsManager there seems to be 22 errors or > failures from changes in numpy behavior, mainly from recarrays again. > > Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jan 13 17:15:46 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 13 Jan 2018 17:15:46 -0500 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: On Sat, Jan 13, 2018 at 4:49 PM, Benjamin Root wrote: > I assume you mean 1.14.0, rather than 1.4.0? Yes, typo > > Did recarrays change? I didn't see anything in the release notes. I didn't look at the details. thequackdaddy is doing the hunting. maybe it's the loadtxt interaction one problem is with converters https://github.com/statsmodels/statsmodels/pull/4205/files (statsmodels has too much "legacy" code from pre-pandas days.) Josef > > On Sat, Jan 13, 2018 at 4:25 PM, wrote: >> >> statsmodels does not work with numpy 1.4.0 >> >> Besides the missing WarningsManager there seems to be 22 errors or >> failures from changes in numpy behavior, mainly from recarrays again. >> >> Josef >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From wieser.eric+numpy at gmail.com Sat Jan 13 22:35:39 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Sun, 14 Jan 2018 03:35:39 +0000 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: Did recarrays change? I didn?t see anything in the release notes. Not directly, but structured arrays did , for which recarrays are really just a thin and somewhat buggy wrapper. ? On Sat, 13 Jan 2018 at 14:19 wrote: > On Sat, Jan 13, 2018 at 4:49 PM, Benjamin Root > wrote: > > I assume you mean 1.14.0, rather than 1.4.0? > > Yes, typo > > > > Did recarrays change? I didn't see anything in the release notes. > > I didn't look at the details. thequackdaddy is doing the hunting. > maybe it's the loadtxt interaction > one problem is with converters > https://github.com/statsmodels/statsmodels/pull/4205/files > > (statsmodels has too much "legacy" code from pre-pandas days.) > > Josef > > > > > > On Sat, Jan 13, 2018 at 4:25 PM, wrote: > >> > >> statsmodels does not work with numpy 1.4.0 > >> > >> Besides the missing WarningsManager there seems to be 22 errors or > >> failures from changes in numpy behavior, mainly from recarrays again. > >> > >> Josef > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at python.org > >> https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Jan 14 06:35:15 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 14 Jan 2018 11:35:15 +0000 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: Hi, On Sun, Jan 14, 2018 at 3:35 AM, Eric Wieser wrote: > Did recarrays change? I didn?t see anything in the release notes. > > Not directly, but structured arrays did, for which recarrays are really just > a thin and somewhat buggy wrapper. Oh dear oh dear - for some reason I had completely missed these changes, and the justification for them. They do exactly the kind of thing that Konrad Hinsen was complaining about before, with justification, which is to change the behavior of previous code, without an intervening (long) period of raising an error. In this case, the benefits of these changes seem small, compared to the inevitable breakage and silently changed results they will cause. Is there any chance of reversing them? Cheers, Matthew From sebastian at sipsolutions.net Sun Jan 14 07:10:29 2018 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 14 Jan 2018 13:10:29 +0100 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: <1515931829.23458.5.camel@sipsolutions.net> On Sun, 2018-01-14 at 11:35 +0000, Matthew Brett wrote: > Hi, > > On Sun, Jan 14, 2018 at 3:35 AM, Eric Wieser > wrote: > > Did recarrays change? I didn?t see anything in the release notes. > > > > Not directly, but structured arrays did, for which recarrays are > > really just > > a thin and somewhat buggy wrapper. > > Oh dear oh dear - for some reason I had completely missed these > changes, and the justification for them. > > They do exactly the kind of thing that Konrad Hinsen was complaining > about before, with justification, which is to change the behavior of > previous code, without an intervening (long) period of raising an > error. In this case, the benefits of these changes seem small, > compared to the inevitable breakage and silently changed results they > will cause. > > Is there any chance of reversing them? > Without knowing the change, there is always a chance of (temporary) reversal and for unexpected complications its probably the safest default if there is no agreement anyway. - Sebastian > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Sun Jan 14 11:30:48 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 14 Jan 2018 09:30:48 -0700 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: On Sun, Jan 14, 2018 at 4:35 AM, Matthew Brett wrote: > Hi, > > On Sun, Jan 14, 2018 at 3:35 AM, Eric Wieser > wrote: > > Did recarrays change? I didn?t see anything in the release notes. > > > > Not directly, but structured arrays did, for which recarrays are really > just > > a thin and somewhat buggy wrapper. > > Oh dear oh dear - for some reason I had completely missed these > changes, and the justification for them. > See https://github.com/numpy/numpy/pull/6053. It actually goes back a couple of years. > > They do exactly the kind of thing that Konrad Hinsen was complaining > about before, with justification, which is to change the behavior of > previous code, without an intervening (long) period of raising an > error. In this case, the benefits of these changes seem small, > compared to the inevitable breakage and silently changed results they > will cause. > > Is there any chance of reversing them? > Maybe, we'll see how things go. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Sun Jan 14 12:34:28 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Sun, 14 Jan 2018 12:34:28 -0500 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: <7ab70d5e-bd6a-0ad7-963d-b489e6c4320e@gmail.com> On 01/14/2018 11:30 AM, Charles R Harris wrote: > > > On Sun, Jan 14, 2018 at 4:35 AM, Matthew Brett > wrote: > > Hi, > > On Sun, Jan 14, 2018 at 3:35 AM, Eric Wieser > > > wrote: > > Did recarrays change? I didn?t see anything in the release notes. > > > > Not directly, but structured arrays did, for which recarrays are really just > > a thin and somewhat buggy wrapper. > > Oh dear oh dear - for some reason I had completely missed these > changes, and the justification for them. > > > See https://github.com/numpy/numpy/pull/6053. It actually goes back a > couple of years. > > > They do exactly the kind of thing that Konrad Hinsen was complaining > about before, with justification, which is to change the behavior of > previous code, without an intervening (long) period of raising an > error.? In this case, the benefits of these changes seem small, > compared to the inevitable breakage and silently changed results they > will cause. > > Is there any chance of reversing them? Of course the goal was to make things backwards-compatible; If some part of the changes is breaking a lot of code we will revert, or find a way to stop breaking code. It's not yet clear to me which exact changes are causing the problem. The statsmodel failures may be related to what we are working on in one of these different issues: https://github.com/numpy/numpy/issues/10344 https://github.com/numpy/numpy/issues/10387 https://github.com/numpy/numpy/issues/10394 I will be checking the statsmodel unit tests as we fix things, to make sure they pass in the end. Allan From charlesr.harris at gmail.com Sun Jan 14 12:34:36 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 14 Jan 2018 10:34:36 -0700 Subject: [Numpy-discussion] NumPy 1.14.0 release In-Reply-To: References: <4e07370e-51ff-03b7-848a-c9080cdb433f@gmail.com> Message-ID: On Sun, Jan 14, 2018 at 9:30 AM, Charles R Harris wrote: > > > On Sun, Jan 14, 2018 at 4:35 AM, Matthew Brett > wrote: > >> Hi, >> >> On Sun, Jan 14, 2018 at 3:35 AM, Eric Wieser >> wrote: >> > Did recarrays change? I didn?t see anything in the release notes. >> > >> > Not directly, but structured arrays did, for which recarrays are really >> just >> > a thin and somewhat buggy wrapper. >> >> Oh dear oh dear - for some reason I had completely missed these >> changes, and the justification for them. >> > > See https://github.com/numpy/numpy/pull/6053. It actually goes back a > couple of years. > And I wonder how many of these are related to https://github.com/numpy/numpy/issues/10344, which came about because there was previously an attempt to fix up malformed inputs. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Thu Jan 18 10:54:48 2018 From: deak.andris at gmail.com (Andras Deak) Date: Thu, 18 Jan 2018 16:54:48 +0100 Subject: [Numpy-discussion] building numpy with python3.7 In-Reply-To: <1125979235.6298287.1513757752706.JavaMail.zimbra@saao.ac.za> References: <1237981928.6063979.1513267195600.JavaMail.zimbra@saao.ac.za> <280195652.6103380.1513332661356.JavaMail.zimbra@saao.ac.za> <389896254.6106478.1513334545873.JavaMail.zimbra@saao.ac.za> <1524113059.6201087.1513592408374.JavaMail.zimbra@saao.ac.za> <1125979235.6298287.1513757752706.JavaMail.zimbra@saao.ac.za> Message-ID: Hello, After failing with several attempts to build numpy on python 3.7.0a4, the combo that worked with pip was cython 0.28a0 (current master) numpy 1.15.0.dev0 (current master) in a fresh, clean venv. Older cython (0.27.3 where the aforementioned issue seems to have been solved https://github.com/cython/cython/issues/1955) didn't help. Stable numpy 1.14.0 didn't seem to work even with cython master. This seems to be the same setup that Thomas mentioned, I mostly want to note that anything less doesn't seem to compile yet. Andr?s On Wed, Dec 20, 2017 at 9:15 AM, Hannes Breytenbach wrote: > Hi Chuck > > I'm using Cython 0.28a0. > > Hannes > > ________________________________ > From: "Charles R Harris" > To: "Discussion of Numerical Python" > Sent: Tuesday, December 19, 2017 10:01:40 PM > Subject: Re: [Numpy-discussion] building numpy with python3.7 > > > > On Mon, Dec 18, 2017 at 3:20 AM, Hannes Breytenbach > wrote: >> >> OK, thanks for the link to the issue. I'm not using virtual environments >> - I built python3.7 (and 2.7) with the `make altinstall` method. I managed >> to get the numpy build working on 3.7 by removing the >> `random/mtrand/mtrand.c` file so that it gets (re-)generated during the >> build using the latest cython version. >> >> Thanks for the help! > > > Just to be sure, which Cython version is that? > > Chuck > > !DSPAM:5a39704114261779167816! > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > !DSPAM:5a39704114261779167816! > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Thu Jan 18 12:19:26 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 18 Jan 2018 10:19:26 -0700 Subject: [Numpy-discussion] building numpy with python3.7 In-Reply-To: References: <1237981928.6063979.1513267195600.JavaMail.zimbra@saao.ac.za> <280195652.6103380.1513332661356.JavaMail.zimbra@saao.ac.za> <389896254.6106478.1513334545873.JavaMail.zimbra@saao.ac.za> <1524113059.6201087.1513592408374.JavaMail.zimbra@saao.ac.za> <1125979235.6298287.1513757752706.JavaMail.zimbra@saao.ac.za> Message-ID: On Thu, Jan 18, 2018 at 8:54 AM, Andras Deak wrote: > Hello, > > After failing with several attempts to build numpy on python 3.7.0a4, > the combo that worked with pip was > cython 0.28a0 (current master) > numpy 1.15.0.dev0 (current master) > in a fresh, clean venv. > Older cython (0.27.3 where the aforementioned issue seems to have been > solved https://github.com/cython/cython/issues/1955) didn't help. > Stable numpy 1.14.0 didn't seem to work even with cython master. > This seems to be the same setup that Thomas mentioned, I mostly want > to note that anything less doesn't seem to compile yet. Where did you get 1.14.0 source? If it was with pip, it was generated using cython 0.26.1. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From deak.andris at gmail.com Thu Jan 18 12:34:49 2018 From: deak.andris at gmail.com (Andras Deak) Date: Thu, 18 Jan 2018 18:34:49 +0100 Subject: [Numpy-discussion] building numpy with python3.7 In-Reply-To: References: <1237981928.6063979.1513267195600.JavaMail.zimbra@saao.ac.za> <280195652.6103380.1513332661356.JavaMail.zimbra@saao.ac.za> <389896254.6106478.1513334545873.JavaMail.zimbra@saao.ac.za> <1524113059.6201087.1513592408374.JavaMail.zimbra@saao.ac.za> <1125979235.6298287.1513757752706.JavaMail.zimbra@saao.ac.za> Message-ID: On Thursday, January 18, 2018, Charles R Harris wrote: > > > On Thu, Jan 18, 2018 at 8:54 AM, Andras Deak wrote: >> >> Hello, >> >> After failing with several attempts to build numpy on python 3.7.0a4, >> the combo that worked with pip was >> cython 0.28a0 (current master) >> numpy 1.15.0.dev0 (current master) >> in a fresh, clean venv. >> Older cython (0.27.3 where the aforementioned issue seems to have been >> solved https://github.com/cython/cython/issues/1955) didn't help. >> Stable numpy 1.14.0 didn't seem to work even with cython master. >> This seems to be the same setup that Thomas mentioned, I mostly want >> to note that anything less doesn't seem to compile yet. > > Where did you get 1.14.0 source? If it was with pip, it was generated using cython 0.26.1. > Chuck Yes, I did with my last attempts (in earlier attempts I did pull earlier versions from github but only before I realized I had to upgrade cython altogether). So that's probably it; sorry, I had no understanding of the workings of pip. I'll retry with cython stable and numpy from source, and only post again if I find something surprising in order to reduce further noise. Thanks, Andr?s -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jan 19 05:30:05 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 19 Jan 2018 10:30:05 +0000 Subject: [Numpy-discussion] Behavior of rint? Message-ID: Hi, Sorry for my confusion, but I noticed (as a result of the discussion here [1]) that np.rint and the fallback C function [2] seem to round to even. But - my impression was that C rint, by default, rounds down [3]. Is numpy rint not behaving the same way as the GNU C library rint? In [4]: np.rint(np.arange(0.5, 11)) Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) In [5]: np.round(np.arange(0.5, 11)) Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) Cheers, Matthew [1] https://github.com/nipy/dipy/issues/1402 [2] https://github.com/numpy/numpy/blob/master/numpy/core/src/npymath/npy_math_internal.h.src#L290 [3] https://www.gnu.org/software/libc/manual/html_node/Rounding-Functions.html From charlesr.harris at gmail.com Fri Jan 19 08:41:15 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Jan 2018 06:41:15 -0700 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett wrote: > Hi, > > Sorry for my confusion, but I noticed (as a result of the discussion > here [1]) that np.rint and the fallback C function [2] seem to round > to even. But - my impression was that C rint, by default, rounds down > [3]. Is numpy rint not behaving the same way as the GNU C library > rint? > > In [4]: np.rint(np.arange(0.5, 11)) > Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) > > In [5]: np.round(np.arange(0.5, 11)) > Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) > The GNU C documentation says that rint "round(s) x to an integer value according to the current rounding mode." The rounding mode is determined by settings in the FPU control word. Numpy runs with it set to round to even, although, IIRC, there is a bug on windows where the library is not setting those bits correctly. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 19 08:51:28 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Jan 2018 06:51:28 -0700 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 6:41 AM, Charles R Harris wrote: > > > On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett > wrote: > >> Hi, >> >> Sorry for my confusion, but I noticed (as a result of the discussion >> here [1]) that np.rint and the fallback C function [2] seem to round >> to even. But - my impression was that C rint, by default, rounds down >> [3]. Is numpy rint not behaving the same way as the GNU C library >> rint? >> >> In [4]: np.rint(np.arange(0.5, 11)) >> Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) >> >> In [5]: np.round(np.arange(0.5, 11)) >> Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) >> > > The GNU C documentation says that rint "round(s) x to an integer value > according to the current rounding mode." The rounding mode is determined by > settings in the FPU control word. Numpy runs with it set to round to even, > although, IIRC, there is a bug on windows where the library is not setting > those bits correctly. > Round to even is also the Python default rounding mode. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jan 19 09:48:31 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 19 Jan 2018 14:48:31 +0000 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: Hi Chuck, Thanks for the replies, they are very helpful. On Fri, Jan 19, 2018 at 1:51 PM, Charles R Harris wrote: > > > On Fri, Jan 19, 2018 at 6:41 AM, Charles R Harris > wrote: >> >> >> >> On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> Sorry for my confusion, but I noticed (as a result of the discussion >>> here [1]) that np.rint and the fallback C function [2] seem to round >>> to even. But - my impression was that C rint, by default, rounds down >>> [3]. Is numpy rint not behaving the same way as the GNU C library >>> rint? >>> >>> In [4]: np.rint(np.arange(0.5, 11)) >>> Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) >>> >>> In [5]: np.round(np.arange(0.5, 11)) >>> Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) >> >> >> The GNU C documentation says that rint "round(s) x to an integer value >> according to the current rounding mode." The rounding mode is determined by >> settings in the FPU control word. Numpy runs with it set to round to even, >> although, IIRC, there is a bug on windows where the library is not setting >> those bits correctly. > > > Round to even is also the Python default rounding mode. Do you mean that it is Python setting the FPU control word? Or do we set it? Do you happen to know where that is in the source? I did a quick grep just now without anything obvious. Cheers, Matthew From robert.kern at gmail.com Fri Jan 19 09:55:57 2018 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 19 Jan 2018 23:55:57 +0900 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward Message-ID: tl;dr: I think that our stream-compatibility policy is holding us back, and I think we can come up with a way forward with a new policy that will allow us to innovate without seriously compromising our reliability. To recap, our current policy for numpy.random is that we guarantee that the stream of random numbers from any of the methods of a seeded `RandomState` does not change from version to version (at least on the same hardware, OS, compiler, etc.), except in the case where we are fixing correctness bugs. That is, for code like this: prng = np.random.RandomState(12345) x = prng.normal(10.0, 3.0, size=100) `x` will be the same exact floats regardless of what version of numpy was installed. There seems to be a lot of pent-up motivation to improve on the random number generation, in particular the distributions, that has been blocked by our policy. I think we've lost a few potential first-time contributors that have run up against this wall. We have been pondering ways to allow for adding new core PRNGs and improve the distribution methods while maintaining stream-compatibility for existing code. Kevin Sheppard, in particular, has been working hard to implement new core PRNGs with a common API. https://github.com/bashtage/ng-numpy-randomstate Kevin has also been working to implement the several proposals that have been made to select different versions of distribution implementations. In particular, one idea is to pass something to the RandomState constructor to select a specific version of distributions (or switch out the core PRNG). Note that to satisfy the policy, the simplest method of seeding a RandomState will always give you the oldest version: what we have now. Kevin has recently come to the conclusion that it's not technically feasible to add the version-selection at all if we keep the stream-compatibility policy. https://github.com/numpy/numpy/pull/10124#issuecomment-350876221 I would argue that our current policy isn't providing the value that it claims to. In the first place, there are substantial holes in the reproducibility of streams across platforms. All of the integers (internal and external) are C `long`s, so integer overflows can cause variable streams if you use any of the rejection algorithms involving integers across Windows and Linux. Plain-old floating point arithmetic differences between platforms can cause similar issues (though rarer). Our policy of fixing bugs interferes with strict reproducibility. And our changes to non-random routines can interfere with the ability to reproduce the results of the whole software, independent of the PRNG stream. The multivariate normal implementation is even more vulnerable, as it uses `np.linalg` routines that may be affected by which LAPACK library numpy is built against much less changes that we might make to them in the normal course of development. At the time I established the policy (2008-9), there was significantly less tooling around for pinning versions of software. The PyPI/pip/setuptools ecosystem was in its infancy, VMs were slow cumbersome beasts mostly used to run Windows programs unavailable on Linux, and containerization a la Docker was merely a dream. A lot of resources have been put into reproducible research since then that pins the whole stack from OS libraries on up. The need to have stream-compatibility across numpy versions for the purpose of reproducible research is much diminished. I think that we can relax the strict stream-compatibility policy to allow innovation without giving up much practically-usable stability. Let's compare with Python's policy: https://docs.python.org/3.6/library/random.html#notes-on-reproducibility """ Most of the random module?s algorithms and seeding functions are subject to change across Python versions, but two aspects are guaranteed not to change: * If a new seeding method is added, then a backward compatible seeder will be offered. * The generator?s random() method will continue to produce the same sequence when the compatible seeder is given the same seed. """ I propose that we adopt a similar policy. This would immediately resolve many of the issues blocking innovation in the random distributions. Improvements to the distributions could be made at the same rhythm as normal features. No version-selection API would be required as you select the version by installing the desired version of numpy. By default, everyone gets the latest, best versions of the sampling algorithms. Selecting a different core PRNG could be easily achieved as ng-numpy-randomstate does it, by instantiating different classes. The different incompatible ways to initialize different core PRNGs (with unique features like selectable streams and the like) are transparently handled: different classes have different constructors. There is no need to jam all options for all core PRNGs into a single constructor. I would add a few more of the simpler distribution methods to the list that *is* guaranteed to remain stream-compatible, probably `randint()` and `bytes()` and maybe a couple of others. I would appreciate input on the matter. The current API should remain available and working, but not necessarily with the same algorithms. That is, for code like the following: prng = np.random.RandomState(12345) x = prng.normal(10.0, 3.0, size=100) `x` is still guaranteed to be 100 normal variates with the appropriate mean and standard deviation, but they might be computed by the ziggurat method from PCG-generated bytes (or whatever new default core PRNG we have). As an alternative, we may also want to leave `np.random.RandomState` entirely fixed in place as deprecated legacy code that is never updated. This would allow current unit tests that depend on the stream-compatibility that we previously promised to still pass until they decide to update. Development would move to a different class hierarchy with new names. I am personally not at all interested in preserving any stream compatibility for the `numpy.random.*` aliases or letting the user swap out the core PRNG for the global PRNG that underlies them. `np.random.seed()` should be discouraged (if not outright deprecated) in favor of explicitly passing around instances. In any case, we have a lot of different options to discuss if we decide to relax our stream-compatibility policy. At the moment, I'm not pushing for any particular changes to the code, just the policy in order to enable a more wide-ranging field of options that we have been able to work with so far. Thanks. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From madsipsen at gmail.com Fri Jan 19 09:56:48 2018 From: madsipsen at gmail.com (Mads Ipsen) Date: Fri, 19 Jan 2018 15:56:48 +0100 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: I am confused . Shouldn't rint round to nearest integer. http://en.cppreference.com/w/cpp/numeric/math/rint Regards Mads On Jan 19, 2018 15:50, "Matthew Brett" wrote: > Hi Chuck, > > Thanks for the replies, they are very helpful. > > On Fri, Jan 19, 2018 at 1:51 PM, Charles R Harris > wrote: > > > > > > On Fri, Jan 19, 2018 at 6:41 AM, Charles R Harris > > wrote: > >> > >> > >> > >> On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett > > >> wrote: > >>> > >>> Hi, > >>> > >>> Sorry for my confusion, but I noticed (as a result of the discussion > >>> here [1]) that np.rint and the fallback C function [2] seem to round > >>> to even. But - my impression was that C rint, by default, rounds down > >>> [3]. Is numpy rint not behaving the same way as the GNU C library > >>> rint? > >>> > >>> In [4]: np.rint(np.arange(0.5, 11)) > >>> Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) > >>> > >>> In [5]: np.round(np.arange(0.5, 11)) > >>> Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) > >> > >> > >> The GNU C documentation says that rint "round(s) x to an integer value > >> according to the current rounding mode." The rounding mode is > determined by > >> settings in the FPU control word. Numpy runs with it set to round to > even, > >> although, IIRC, there is a bug on windows where the library is not > setting > >> those bits correctly. > > > > > > Round to even is also the Python default rounding mode. > > Do you mean that it is Python setting the FPU control word? Or do we > set it? Do you happen to know where that is in the source? I did a > quick grep just now without anything obvious. > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Jan 19 10:03:39 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 20 Jan 2018 00:03:39 +0900 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 11:56 PM, Mads Ipsen wrote: > > I am confused . Shouldn't rint round to nearest integer. http://en.cppreference.com/w/cpp/numeric/math/rint It does. Matthew was asking specifically about its behavior when it is rounding numbers ending in .5, not the general case. Since that's the only real difference between rounding routines, we often recognize that from the context and speak in a somewhat elliptical way (i.e. just "round-to-even" instead of "round to the nearest integer and rounding numbers ending in .5 to the nearest even integer"). -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 19 10:24:29 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Jan 2018 08:24:29 -0700 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 7:48 AM, Matthew Brett wrote: > Hi Chuck, > > Thanks for the replies, they are very helpful. > > On Fri, Jan 19, 2018 at 1:51 PM, Charles R Harris > wrote: > > > > > > On Fri, Jan 19, 2018 at 6:41 AM, Charles R Harris > > wrote: > >> > >> > >> > >> On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett > > >> wrote: > >>> > >>> Hi, > >>> > >>> Sorry for my confusion, but I noticed (as a result of the discussion > >>> here [1]) that np.rint and the fallback C function [2] seem to round > >>> to even. But - my impression was that C rint, by default, rounds down > >>> [3]. Is numpy rint not behaving the same way as the GNU C library > >>> rint? > >>> > >>> In [4]: np.rint(np.arange(0.5, 11)) > >>> Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) > >>> > >>> In [5]: np.round(np.arange(0.5, 11)) > >>> Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) > >> > >> > >> The GNU C documentation says that rint "round(s) x to an integer value > >> according to the current rounding mode." The rounding mode is > determined by > >> settings in the FPU control word. Numpy runs with it set to round to > even, > >> although, IIRC, there is a bug on windows where the library is not > setting > >> those bits correctly. > > > > > > Round to even is also the Python default rounding mode. > > Do you mean that it is Python setting the FPU control word? Or do we > set it? Do you happen to know where that is in the source? I did a > quick grep just now without anything obvious. > I can't find official (PEP) documentation, but googling indicates that in Python 3, `round` rounds to even, and in Python 2 it rounds up. See also https://docs.python.org/3/whatsnew/3.0.html. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jan 19 10:27:26 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 19 Jan 2018 15:27:26 +0000 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: Hi, On Fri, Jan 19, 2018 at 3:24 PM, Charles R Harris wrote: > > > On Fri, Jan 19, 2018 at 7:48 AM, Matthew Brett > wrote: >> >> Hi Chuck, >> >> Thanks for the replies, they are very helpful. >> >> On Fri, Jan 19, 2018 at 1:51 PM, Charles R Harris >> wrote: >> > >> > >> > On Fri, Jan 19, 2018 at 6:41 AM, Charles R Harris >> > wrote: >> >> >> >> >> >> >> >> On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett >> >> >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> Sorry for my confusion, but I noticed (as a result of the discussion >> >>> here [1]) that np.rint and the fallback C function [2] seem to round >> >>> to even. But - my impression was that C rint, by default, rounds down >> >>> [3]. Is numpy rint not behaving the same way as the GNU C library >> >>> rint? >> >>> >> >>> In [4]: np.rint(np.arange(0.5, 11)) >> >>> Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) >> >>> >> >>> In [5]: np.round(np.arange(0.5, 11)) >> >>> Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) >> >> >> >> >> >> The GNU C documentation says that rint "round(s) x to an integer value >> >> according to the current rounding mode." The rounding mode is >> >> determined by >> >> settings in the FPU control word. Numpy runs with it set to round to >> >> even, >> >> although, IIRC, there is a bug on windows where the library is not >> >> setting >> >> those bits correctly. >> > >> > >> > Round to even is also the Python default rounding mode. >> >> Do you mean that it is Python setting the FPU control word? Or do we >> set it? Do you happen to know where that is in the source? I did a >> quick grep just now without anything obvious. > > > I can't find official (PEP) documentation, but googling indicates that in > Python 3, `round` rounds to even, and in Python 2 it rounds up. See also > https://docs.python.org/3/whatsnew/3.0.html. But I guess this could be the Python implementation of round, rather than rint and the FPU control word? I'm asking because the question arose about npy_rint at the C level ... Cheers, Matthew From charlesr.harris at gmail.com Fri Jan 19 10:29:38 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Jan 2018 08:29:38 -0700 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 8:24 AM, Charles R Harris wrote: > > > On Fri, Jan 19, 2018 at 7:48 AM, Matthew Brett > wrote: > >> Hi Chuck, >> >> Thanks for the replies, they are very helpful. >> >> On Fri, Jan 19, 2018 at 1:51 PM, Charles R Harris >> wrote: >> > >> > >> > On Fri, Jan 19, 2018 at 6:41 AM, Charles R Harris >> > wrote: >> >> >> >> >> >> >> >> On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett < >> matthew.brett at gmail.com> >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> Sorry for my confusion, but I noticed (as a result of the discussion >> >>> here [1]) that np.rint and the fallback C function [2] seem to round >> >>> to even. But - my impression was that C rint, by default, rounds down >> >>> [3]. Is numpy rint not behaving the same way as the GNU C library >> >>> rint? >> >>> >> >>> In [4]: np.rint(np.arange(0.5, 11)) >> >>> Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) >> >>> >> >>> In [5]: np.round(np.arange(0.5, 11)) >> >>> Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., 10.]) >> >> >> >> >> >> The GNU C documentation says that rint "round(s) x to an integer value >> >> according to the current rounding mode." The rounding mode is >> determined by >> >> settings in the FPU control word. Numpy runs with it set to round to >> even, >> >> although, IIRC, there is a bug on windows where the library is not >> setting >> >> those bits correctly. >> > >> > >> > Round to even is also the Python default rounding mode. >> >> Do you mean that it is Python setting the FPU control word? Or do we >> set it? Do you happen to know where that is in the source? I did a >> quick grep just now without anything obvious. >> > > I can't find official (PEP) documentation, but googling indicates that in > Python 3, `round` rounds to even, and in Python 2 it rounds up. See also > https://docs.python.org/3/whatsnew/3.0.html. > Note that this applies to floating point operations in general where there are usually a couple of extra guard bits for the mantissa, and the mantissa is the result of rounding to even. The reason for this choice is that always rounding in the same direction leads to a small consistent bias, whereas rounding to even effectively randomizes the error. One grows order N, the other as the square root. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 19 10:45:36 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Jan 2018 08:45:36 -0700 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 8:27 AM, Matthew Brett wrote: > Hi, > > On Fri, Jan 19, 2018 at 3:24 PM, Charles R Harris > wrote: > > > > > > On Fri, Jan 19, 2018 at 7:48 AM, Matthew Brett > > wrote: > >> > >> Hi Chuck, > >> > >> Thanks for the replies, they are very helpful. > >> > >> On Fri, Jan 19, 2018 at 1:51 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Fri, Jan 19, 2018 at 6:41 AM, Charles R Harris > >> > wrote: > >> >> > >> >> > >> >> > >> >> On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett > >> >> > >> >> wrote: > >> >>> > >> >>> Hi, > >> >>> > >> >>> Sorry for my confusion, but I noticed (as a result of the discussion > >> >>> here [1]) that np.rint and the fallback C function [2] seem to round > >> >>> to even. But - my impression was that C rint, by default, rounds > down > >> >>> [3]. Is numpy rint not behaving the same way as the GNU C library > >> >>> rint? > >> >>> > >> >>> In [4]: np.rint(np.arange(0.5, 11)) > >> >>> Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., > 10.]) > >> >>> > >> >>> In [5]: np.round(np.arange(0.5, 11)) > >> >>> Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., > 10.]) > >> >> > >> >> > >> >> The GNU C documentation says that rint "round(s) x to an integer > value > >> >> according to the current rounding mode." The rounding mode is > >> >> determined by > >> >> settings in the FPU control word. Numpy runs with it set to round to > >> >> even, > >> >> although, IIRC, there is a bug on windows where the library is not > >> >> setting > >> >> those bits correctly. > >> > > >> > > >> > Round to even is also the Python default rounding mode. > >> > >> Do you mean that it is Python setting the FPU control word? Or do we > >> set it? Do you happen to know where that is in the source? I did a > >> quick grep just now without anything obvious. > > > > > > I can't find official (PEP) documentation, but googling indicates that in > > Python 3, `round` rounds to even, and in Python 2 it rounds up. See also > > https://docs.python.org/3/whatsnew/3.0.html. > > But I guess this could be the Python implementation of round, rather > than rint and the FPU control word? I'm asking because the question > arose about npy_rint at the C level ... > I'm pretty sure Python sets the FPU control word. Note that Python itself doesn't have a public interface for setting it, nor does Java. The GNU library documentation has the following: Round to nearest. This is the default mode. It should be used unless there is a specific need for one of the others. In this mode results are rounded to the nearest representable value. If the result is midway between two representable values, the even representable is chosen. *Even* here means the lowest-order bit is zero. This rounding mode prevents statistical bias and guarantees numeric stability: round-off errors in a lengthy calculation will remain smaller than half of FLT_EPSILON. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jan 19 10:50:55 2018 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Jan 2018 08:50:55 -0700 Subject: [Numpy-discussion] Behavior of rint? In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 8:45 AM, Charles R Harris wrote: > > > On Fri, Jan 19, 2018 at 8:27 AM, Matthew Brett > wrote: > >> Hi, >> >> On Fri, Jan 19, 2018 at 3:24 PM, Charles R Harris >> wrote: >> > >> > >> > On Fri, Jan 19, 2018 at 7:48 AM, Matthew Brett > > >> > wrote: >> >> >> >> Hi Chuck, >> >> >> >> Thanks for the replies, they are very helpful. >> >> >> >> On Fri, Jan 19, 2018 at 1:51 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Fri, Jan 19, 2018 at 6:41 AM, Charles R Harris >> >> > wrote: >> >> >> >> >> >> >> >> >> >> >> >> On Fri, Jan 19, 2018 at 3:30 AM, Matthew Brett >> >> >> >> >> >> wrote: >> >> >>> >> >> >>> Hi, >> >> >>> >> >> >>> Sorry for my confusion, but I noticed (as a result of the >> discussion >> >> >>> here [1]) that np.rint and the fallback C function [2] seem to >> round >> >> >>> to even. But - my impression was that C rint, by default, rounds >> down >> >> >>> [3]. Is numpy rint not behaving the same way as the GNU C library >> >> >>> rint? >> >> >>> >> >> >>> In [4]: np.rint(np.arange(0.5, 11)) >> >> >>> Out[4]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., >> 10.]) >> >> >>> >> >> >>> In [5]: np.round(np.arange(0.5, 11)) >> >> >>> Out[5]: array([ 0., 2., 2., 4., 4., 6., 6., 8., 8., 10., >> 10.]) >> >> >> >> >> >> >> >> >> The GNU C documentation says that rint "round(s) x to an integer >> value >> >> >> according to the current rounding mode." The rounding mode is >> >> >> determined by >> >> >> settings in the FPU control word. Numpy runs with it set to round to >> >> >> even, >> >> >> although, IIRC, there is a bug on windows where the library is not >> >> >> setting >> >> >> those bits correctly. >> >> > >> >> > >> >> > Round to even is also the Python default rounding mode. >> >> >> >> Do you mean that it is Python setting the FPU control word? Or do we >> >> set it? Do you happen to know where that is in the source? I did a >> >> quick grep just now without anything obvious. >> > >> > >> > I can't find official (PEP) documentation, but googling indicates that >> in >> > Python 3, `round` rounds to even, and in Python 2 it rounds up. See also >> > https://docs.python.org/3/whatsnew/3.0.html. >> >> But I guess this could be the Python implementation of round, rather >> than rint and the FPU control word? I'm asking because the question >> arose about npy_rint at the C level ... >> > > I'm pretty sure Python sets the FPU control word. Note that Python itself > doesn't have a public interface for setting it, nor does Java. The GNU > library documentation has the following: > > Round to nearest. > This is the default mode. It should be used unless there is a specific > need for one of the others. In this mode results are rounded to the nearest > representable value. If the result is midway between two representable > values, the even representable is chosen. *Even* here means the > lowest-order bit is zero. This rounding mode prevents statistical bias and > guarantees numeric stability: round-off errors in a lengthy calculation > will remain smaller than half of FLT_EPSILON. > > A classic reference for rounding is TAOCP (section 4.2.2 in the Third Edition). I read that many years ago, so don't recall the details. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jan 19 12:27:10 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Jan 2018 12:27:10 -0500 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 9:55 AM, Robert Kern wrote: > tl;dr: I think that our stream-compatibility policy is holding us back, > and I think we can come up with a way forward with a new policy that will > allow us to innovate without seriously compromising our reliability. > > To recap, our current policy for numpy.random is that we guarantee that > the stream of random numbers from any of the methods of a seeded > `RandomState` does not change from version to version (at least on the same > hardware, OS, compiler, etc.), except in the case where we are fixing > correctness bugs. That is, for code like this: > > prng = np.random.RandomState(12345) > x = prng.normal(10.0, 3.0, size=100) > > `x` will be the same exact floats regardless of what version of numpy was > installed. > > There seems to be a lot of pent-up motivation to improve on the random > number generation, in particular the distributions, that has been blocked > by our policy. I think we've lost a few potential first-time contributors > that have run up against this wall. We have been pondering ways to allow > for adding new core PRNGs and improve the distribution methods while > maintaining stream-compatibility for existing code. Kevin Sheppard, in > particular, has been working hard to implement new core PRNGs with a common > API. > > https://github.com/bashtage/ng-numpy-randomstate > > Kevin has also been working to implement the several proposals that have > been made to select different versions of distribution implementations. In > particular, one idea is to pass something to the RandomState constructor to > select a specific version of distributions (or switch out the core PRNG). > Note that to satisfy the policy, the simplest method of seeding a > RandomState will always give you the oldest version: what we have now. > > Kevin has recently come to the conclusion that it's not technically > feasible to add the version-selection at all if we keep the > stream-compatibility policy. > > https://github.com/numpy/numpy/pull/10124#issuecomment-350876221 > > I would argue that our current policy isn't providing the value that it > claims to. In the first place, there are substantial holes in the > reproducibility of streams across platforms. All of the integers (internal > and external) are C `long`s, so integer overflows can cause variable > streams if you use any of the rejection algorithms involving integers > across Windows and Linux. Plain-old floating point arithmetic differences > between platforms can cause similar issues (though rarer). Our policy of > fixing bugs interferes with strict reproducibility. And our changes to > non-random routines can interfere with the ability to reproduce the results > of the whole software, independent of the PRNG stream. The multivariate > normal implementation is even more vulnerable, as it uses `np.linalg` > routines that may be affected by which LAPACK library numpy is built > against much less changes that we might make to them in the normal course > of development. > > At the time I established the policy (2008-9), there was significantly > less tooling around for pinning versions of software. The > PyPI/pip/setuptools ecosystem was in its infancy, VMs were slow cumbersome > beasts mostly used to run Windows programs unavailable on Linux, and > containerization a la Docker was merely a dream. A lot of resources have > been put into reproducible research since then that pins the whole stack > from OS libraries on up. The need to have stream-compatibility across numpy > versions for the purpose of reproducible research is much diminished. > > I think that we can relax the strict stream-compatibility policy to allow > innovation without giving up much practically-usable stability. Let's > compare with Python's policy: > > https://docs.python.org/3.6/library/random.html#notes-on-reproducibility > > """ > Most of the random module?s algorithms and seeding functions are subject > to change across Python versions, but two aspects are guaranteed not to > change: > > * If a new seeding method is added, then a backward compatible seeder will > be offered. > * The generator?s random() method will continue to produce the same > sequence when the compatible seeder is given the same seed. > """ > > I propose that we adopt a similar policy. This would immediately resolve > many of the issues blocking innovation in the random distributions. > Improvements to the distributions could be made at the same rhythm as > normal features. No version-selection API would be required as you select > the version by installing the desired version of numpy. By default, > everyone gets the latest, best versions of the sampling algorithms. > Selecting a different core PRNG could be easily achieved as > ng-numpy-randomstate does it, by instantiating different classes. The > different incompatible ways to initialize different core PRNGs (with unique > features like selectable streams and the like) are transparently handled: > different classes have different constructors. There is no need to jam all > options for all core PRNGs into a single constructor. > > I would add a few more of the simpler distribution methods to the list > that *is* guaranteed to remain stream-compatible, probably `randint()` and > `bytes()` and maybe a couple of others. I would appreciate input on the > matter. > > The current API should remain available and working, but not necessarily > with the same algorithms. That is, for code like the following: > > prng = np.random.RandomState(12345) > x = prng.normal(10.0, 3.0, size=100) > > `x` is still guaranteed to be 100 normal variates with the appropriate > mean and standard deviation, but they might be computed by the ziggurat > method from PCG-generated bytes (or whatever new default core PRNG we have). > > As an alternative, we may also want to leave `np.random.RandomState` > entirely fixed in place as deprecated legacy code that is never updated. > This would allow current unit tests that depend on the stream-compatibility > that we previously promised to still pass until they decide to update. > Development would move to a different class hierarchy with new names. > > I am personally not at all interested in preserving any stream > compatibility for the `numpy.random.*` aliases or letting the user swap out > the core PRNG for the global PRNG that underlies them. `np.random.seed()` > should be discouraged (if not outright deprecated) in favor of explicitly > passing around instances. > > In any case, we have a lot of different options to discuss if we decide to > relax our stream-compatibility policy. At the moment, I'm not pushing for > any particular changes to the code, just the policy in order to enable a > more wide-ranging field of options that we have been able to work with so > far. > I'm not sure I fully understand Is the proposal to drop stream-backward compatibility completely for the future or just a one time change? > No version-selection API would be required as you select the version by installing the desired version of numpy. That's not useful if we want to have unit tests that run in the same way across numpy versions. There are many unit tests that rely on fixed streams and have hard coded results that rely on specific numbers (up to floating point, numerical noise). Giving up stream compatibility would essentially kill using np.random for these unit tests. Similar, reproducibility from another user, e.g. in notebooks, would break without stream compatibility across numpy versions. One possibility is to keep the current stream-compatible np.random version and maintain it in future for those usecases, and add a new "high-performance" version with the new features. Josef > > Thanks. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Jan 19 12:57:12 2018 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 19 Jan 2018 17:57:12 +0000 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 6:57 AM Robert Kern wrote: > As an alternative, we may also want to leave `np.random.RandomState` > entirely fixed in place as deprecated legacy code that is never updated. > This would allow current unit tests that depend on the stream-compatibility > that we previously promised to still pass until they decide to update. > Development would move to a different class hierarchy with new names. > I like this alternative, but I would hesitate to call it "deprecated". Users who care about exact reproducibility across NumPy versions (e.g., for testing) are probably less concerned about performance, and could continue to use it. New random number generator classes could implement their own guarantees about compatibility across their methods. I am personally not at all interested in preserving any stream > compatibility for the `numpy.random.*` aliases or letting the user swap out > the core PRNG for the global PRNG that underlies them. `np.random.seed()` > should be discouraged (if not outright deprecated) in favor of explicitly > passing around instances. > I agree that np.random.seed() should be discouraged, but it feels very late in NumPy's development to remove it. If we do alter the random number streams for numpy.random.*, it seems that we should probably issue a warning (at least for a several major versions) whenever numpy.random.seed() is called. This could get pretty noisy. I guess that's all the more incentive to switch to random state objects. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.e.creasey.00 at googlemail.com Fri Jan 19 16:18:40 2018 From: p.e.creasey.00 at googlemail.com (Peter Creasey) Date: Fri, 19 Jan 2018 13:18:40 -0800 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward Message-ID: > Date: Fri, 19 Jan 2018 23:55:57 +0900 > From: Robert Kern > > tl;dr: I think that our stream-compatibility policy is holding us back, and > I think we can come up with a way forward with a new policy that will allow > us to innovate without seriously compromising our reliability. > > I propose that we adopt a similar policy. This would immediately resolve > many of the issues blocking innovation in the random distributions. > Improvements to the distributions could be made at the same rhythm as > normal features. No version-selection API would be required as you select > the version by installing the desired version of numpy. By default, > everyone gets the latest, best versions of the sampling algorithms. > Selecting a different core PRNG could be easily achieved as > ng-numpy-randomstate does it, by instantiating different classes. The > different incompatible ways to initialize different core PRNGs (with unique > features like selectable streams and the like) are transparently handled: > different classes have different constructors. There is no need to jam all > options for all core PRNGs into a single constructor. +1 I think I have a general comment that random streams are probably over-used in testing, in particular: 1. If you need a small amount (100s) of random values you should just spit them out to code/files for repeatability and, 2. If you need a large amount of random data then maybe you want to rethink your testing strategy. In particular if you need millions of points to hit a few edge cases then your code is significantly more mature (low failure rate) than your test suite (inefficient), and you could probably target your tests a bit more (I'm as guilty of this as others). Having said that I don't think NumPy today is in a position to fight the inertia of 2 - numpy.random has given everyone easy tools to make large and (nearly always) reproducible sequences, and they get used a lot. On the other hand staying on top of performance requires an active code-base, and for a reproducible PRNG almost any change is a breaking-change. Allowing non-backwards-compatible streams concurrently with the old style seems the logical way forward. So +1. Best, Peter From robert.kern at gmail.com Fri Jan 19 17:34:12 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 20 Jan 2018 07:34:12 +0900 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: On Sat, Jan 20, 2018 at 2:27 AM, wrote: > I'm not sure I fully understand > Is the proposal to drop stream-backward compatibility completely for the future or just a one time change? For all future. > > No version-selection API would be required as you select the version by installing the desired version of numpy. > > That's not useful if we want to have unit tests that run in the same way across numpy versions. > > There are many unit tests that rely on fixed streams and have hard coded results that rely on specific numbers (up to floating point, numerical noise). > Giving up stream compatibility would essentially kill using np.random for these unit tests. This is a use case that I am sensitive to. However, it should be noted that relying on the exact stream for unit tests makes you vulnerable to platform differences. That's something that we've never guaranteed (because we can't). That said, there are some of the simpler distributions that are more robust to such things, and those are fairly typical in unit tests. As I mentioned, I am open to a small set of methods that we do guarantee stream-compatibility for. I think that unit tests are the obvious use case that should determine what that set is. Unit tests rarely need `noncentral_chisquare()`, for instance. I'd also be willing to make the API a little clunkier in order to maintain the stable set of methods. For example, two methods that are common in unit testing are `normal()` and `choice()`, but those have been the target of the most attempted innovation. I'd be willing to leave them alone while providing other methods that do the same thing but are allowed to innovate. > Similar, reproducibility from another user, e.g. in notebooks, would break without stream compatibility across numpy versions. That is the reproducible-research use case that I discussed already. I argued that the stability that our policy actually provides is rather more muted than what it seems on its face. > One possibility is to keep the current stream-compatible np.random version and maintain it in future for those usecases, and add a new "high-performance" version with the new features. That is one of the alternatives I raised. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Jan 19 17:50:31 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 20 Jan 2018 07:50:31 +0900 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: On Sat, Jan 20, 2018 at 2:57 AM, Stephan Hoyer wrote: > > On Fri, Jan 19, 2018 at 6:57 AM Robert Kern wrote: >> >> As an alternative, we may also want to leave `np.random.RandomState` entirely fixed in place as deprecated legacy code that is never updated. This would allow current unit tests that depend on the stream-compatibility that we previously promised to still pass until they decide to update. Development would move to a different class hierarchy with new names. > > I like this alternative, but I would hesitate to call it "deprecated". Users who care about exact reproducibility across NumPy versions (e.g., for testing) are probably less concerned about performance, and could continue to use it. I would be careful about that because quite a few of the methods are not stable across platforms, even on the same numpy version. If you want to declare that some part of the np.random API is stable for such purposes, we need to curate a subset of the methods anyways. As a one-off thing, this alternative proposes to declare that all of `np.random.RandomState` is stable across versions, but we can't guarantee that all of it is unconditionally stable for exact reproducibility. We can make a guarantee for a smaller subset of methods, though. To your point, though, if we freeze the current `RandomState`, we can make that guarantee for a larger subset of the methods than we would for the new API. So I guess I talked myself around to your view, but I would be a bit more cautious in how we advertise the stability of the frozen `RandomState` API. > New random number generator classes could implement their own guarantees about compatibility across their methods. > >> I am personally not at all interested in preserving any stream compatibility for the `numpy.random.*` aliases or letting the user swap out the core PRNG for the global PRNG that underlies them. `np.random.seed()` should be discouraged (if not outright deprecated) in favor of explicitly passing around instances. > > I agree that np.random.seed() should be discouraged, but it feels very late in NumPy's development to remove it. > > If we do alter the random number streams for numpy.random.*, it seems that we should probably issue a warning (at least for a several major versions) whenever numpy.random.seed() is called. This could get pretty noisy. I guess that's all the more incentive to switch to random state objects. True. I like that. The reason I think that it might be worth an exception is that it has been a moral hazard. People aren't just writing correct but improvable code (relying on `np.random.*` methods but seeding exactly once at the start of their single-threaded simulation) but they've been writing incorrect and easily-broken code. For example: np.random.seed(seed) np.random.shuffle(x_train) np.random.seed(seed) np.random.shuffle(labels_train) -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jan 19 18:12:56 2018 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 19 Jan 2018 15:12:56 -0800 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 6:55 AM, Robert Kern wrote: [...] > There seems to be a lot of pent-up motivation to improve on the random > number generation, in particular the distributions, that has been blocked by > our policy. I think we've lost a few potential first-time contributors that > have run up against this wall. We have been pondering ways to allow for > adding new core PRNGs and improve the distribution methods while maintaining > stream-compatibility for existing code. Kevin Sheppard, in particular, has > been working hard to implement new core PRNGs with a common API. > > https://github.com/bashtage/ng-numpy-randomstate > > Kevin has also been working to implement the several proposals that have > been made to select different versions of distribution implementations. In > particular, one idea is to pass something to the RandomState constructor to > select a specific version of distributions (or switch out the core PRNG). > Note that to satisfy the policy, the simplest method of seeding a > RandomState will always give you the oldest version: what we have now. > > Kevin has recently come to the conclusion that it's not technically feasible > to add the version-selection at all if we keep the stream-compatibility > policy. > > https://github.com/numpy/numpy/pull/10124#issuecomment-350876221 > > I would argue that our current policy isn't providing the value that it > claims to. I agree that relaxing our policy would be better than the status quo. Before making any decisions, though, I'd like to make sure we understand the alternatives and their trade-offs. Specifically, I think the main alternative would be the following approach to versioning: 1) make RandomState's state be a tuple (underlying RNG algorithm, underlying RNG state, distribution version) 2) zero-argument initialization/seeding, like RandomState() or rstate.seed(), sets the state to: (our recommended RNG algorithm, os.urandom(...), version=LATEST_VERSION) 3) for backcompat, single-argument seeding like RandomState(123) or rstate.seed(123), sets the state to: (mersenne twister, expand_mt_seed(123), version=0) 4) also allow seeding to explicitly control all the parameters, like RandomState(PCG_XSL_RR(123), version=12) or whatever 5) the distribution functions are implemented like: def normal(*args, **kwargs): if self.version < 3: return self._normal_box_muller(*args, **kwargs) elif self.version < 8: return self._normal_ziggurat_v1(*args, **kwargs) else: # version >= 8 return self._normal_ziggurat_v2(*args, **kwargs) Advantages: fully backwards compatible; preserves the compatibility guarantee (such as it is); users who use the default seeding automatically get the highest speed and quality Disadvantages: users who specify seeds explicitly get old/slow distributions (but of course that's the point of compatibility); we have to keep the old distribution code around forever (but this is not too hard; it just sits in some function and we never touch it). Kevin, is this the version that you think is non-viable? Is the above a good description of the advantages/disadvantages? -n -- Nathaniel J. Smith -- https://vorpus.org From robert.kern at gmail.com Sat Jan 20 07:07:39 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 20 Jan 2018 21:07:39 +0900 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: On Sat, Jan 20, 2018 at 7:34 AM, Robert Kern wrote: > > On Sat, Jan 20, 2018 at 2:27 AM, wrote: > > > I'm not sure I fully understand > > Is the proposal to drop stream-backward compatibility completely for the future or just a one time change? > > For all future. To color this a little, while we'll have a burst of activity for the first round to fix all of the mistakes I baked in early, I do not expect the pace of change after that to be very large. While we are going to relax the strictness of the policy, we should carefully weigh the benefit of a change against the pain of changing the stream, and explore ways to implement the improvement that would retain the stream. We should take more care making such changes in `np.random` than in other parts of numpy. When we get to drafting the NEP, I'd be happy to include language to this effect. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Sat Jan 20 07:19:59 2018 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 20 Jan 2018 12:19:59 +0000 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: On Fri, Jan 19, 2018 at 6:13 PM Nathaniel Smith wrote: > ... > I agree that relaxing our policy would be better than the status quo. > Before making any decisions, though, I'd like to make sure we > understand the alternatives and their trade-offs. Specifically, I > think the main alternative would be the following approach to > versioning: > > 1) make RandomState's state be a tuple (underlying RNG algorithm, > underlying RNG state, distribution version) > 2) zero-argument initialization/seeding, like RandomState() or > rstate.seed(), sets the state to: (our recommended RNG algorithm, > os.urandom(...), version=LATEST_VERSION) > 3) for backcompat, single-argument seeding like RandomState(123) or > rstate.seed(123), sets the state to: (mersenne twister, > expand_mt_seed(123), version=0) > 4) also allow seeding to explicitly control all the parameters, like > RandomState(PCG_XSL_RR(123), version=12) or whatever > 5) the distribution functions are implemented like: > > def normal(*args, **kwargs): > if self.version < 3: > return self._normal_box_muller(*args, **kwargs) > elif self.version < 8: > return self._normal_ziggurat_v1(*args, **kwargs) > else: # version >= 8 > return self._normal_ziggurat_v2(*args, **kwargs) > I like this suggestion, but I suggest to modify it so that a zero-argument initialization or 1-argument seeding will initialize to a global value, which would default to backcompat, but could be changed. Then my old code would by default produce the same old results, but adding 1 line at the top switches to faster code if I want. -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Sun Jan 21 21:48:37 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Sun, 21 Jan 2018 21:48:37 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ Message-ID: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> Hello all, We are making a decision (again) about what to do about the behavior of multiple-field indexing of structured arrays: Should it return a view or a copy, and on what release schedule? As a reminder, this refers to operations like (1.13 behavior): >>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')]) >>> a[['a', 'c']] array([(0, 0.), (0, 0.), (0, 0.)], dtype=[('a', ' References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> Message-ID: Hi Allan, I think on the consistency argument is perhaps the most important: views are very powerful and in many ways one *counts* on them happening, especially in working with large arrays. They really should be used everywhere it is possible. In this respect, I think one has to weigh breakage of some code against time spent solving unexpected bugs because a view is *not* taken (the change in MaskedArray to ensure the mask is always viewed instead of copied is another example of trying to move as much as possible in that direction). Anyway, in favour of views. All the best, Marten From josef.pktd at gmail.com Mon Jan 22 10:53:13 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Jan 2018 10:53:13 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> Message-ID: On Sun, Jan 21, 2018 at 9:48 PM, Allan Haldane wrote: > Hello all, > > We are making a decision (again) about what to do about the > behavior of multiple-field indexing of structured arrays: Should > it return a view or a copy, and on what release schedule? > > As a reminder, this refers to operations like (1.13 behavior): > > >>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')]) > >>> a[['a', 'c']] > array([(0, 0.), (0, 0.), (0, 0.)], > dtype=[('a', ' > In numpy 1.14.0 we made this return a view instead of a copy, but > downstream test failures suggest we reconsider. In our current > implementation for 1.14.1, we have reverted this change, but > still plan to go through with it in 1.15. > > See here for our discussion the problem and solutions: > https://github.com/numpy/numpy/pull/10411 > > The two main options we have discussed are either to try to make > the change in 1.15, or never make the change at all and always > return a copy. > > Here are some pros and cons: > > Pros (change to view in 1.15) > ============================= > > * Views are useful and convenient. Other forms of indexing also > often return views so this is more consistent. > * This change has been planned since numpy 1.7 in 2009, > and there have been visible FutureWarnings about it since > then. Anyone whose code will break should have seen the > warnings. It has been extensively warned about in recent > release notes. > * Past discussions have supported the change. See my comment in > the PR with many links to them and to other history. > * Users have requested the change on the list. > * Possibly a majority of the reported code failures were not > actually caused by the change, but by another bug (#8100) > involving np.load/np.save which this change exposed. If we > push it off to 1.15, we will have time to fix this other bug. > (There were no FutureWarnings for this breakage, of course). > * The code that really will break is of the form > a[['a', 'c']].view('i8') > because the returned itemsize is different. This has > raised FutureWarnings since numpy 1.7, and no users reported > failures due to this change. In the PR we still try to > mitigate this breakage by introducing a new method > `pack_fields`, which converts the result into the 1.13 form, > so that > np.pack_fields(a[['a', 'c']]).view('i8') > will work. > > > Cons (keep returning a copy) > ============================ > > * The extra convenience is not really that much, and fancy > indexing also returns a copy instead of a view, so there is > a precedent there. > * We want to minimize compatibility breaks with old behavior. > We've had a fair amount of discussion and complaints about > how we break things in general. > * We have lived with a "copy" for 8 years now. At some point the > behavior gets set in stone for compatibility reasons. > * Users have written to the list and github about their code > breaking in 1.14.0. As far as I am aware, they all refer > to the #8100 problem. > * If a new function `pack_fields` is needed to guard against > mishaps with the view behavior, that seems like a sign that > keeping the copy behavior is the best option from an API > perspective. > > My initial vote is go with the change in 1.15: The "view" code > that will ultimately break (not the code related to #8100) has > been sending FutureWarnings for many years, and I am not aware of > any user complaints involving it: All the complaints so far > would be fixed with #8100 in 1.15. > > (Note based on a linked mailing list thread, 2012 might be the last time I looked more closely at structured dtypes. So some of what I understand might be outdated.) views on structured dtypes are very important, but viewing them as standard arrays with standard dtypes is the main part that I had used. Essentially structured dtypes are useless for any computation, e.g. just some simple reduce operation. To work with them we need a standard view. I think the usecase that fails in statsmodels (except there is no test failure anymore because we switched to using pandas in the unit test) cls.confint_res = cls.results[['acvar_lb','acvar_ub']].view((float, > 2)) E ValueError: Changing the dtype to a subarray type is only supported if the total itemsize is unchanged This is similar to the above example a[['a', 'c']].view('i8') but it doesn't try to combine fields. In many examples where I used structured dtypes a long time ago, switched between consistent views as either a standard array of subsets or as .structured dtypes. For this usecase it wouldn't matter whether a[['a', 'c']] returns a view or copy, as long as we can get the second view that is consistent with the selected part of the memory. This would also be independent of whether numpy pads internally and adjusts the strides if possible or not. >>> np.__version__ '1.11.2' >>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')]) >>> a array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0)], dtype=[('a', '>> a.mean(0) Traceback (most recent call last): File "", line 1, in a.mean(0) File "C:\...\python-3.4.4.amd64\lib\site-packages\numpy\core\_methods.py", line 65, in _mean ret = umr_sum(arr, axis, dtype, out, keepdims) TypeError: cannot perform reduce with flexible type >>> a[['b', 'c']].mean(0) Traceback (most recent call last): File "", line 1, in a[['b', 'c']].mean(0) File "C:\...\python-3.4.4.amd64\lib\site-packages\numpy\core\_methods.py", line 65, in _mean ret = umr_sum(arr, axis, dtype, out, keepdims) TypeError: cannot perform reduce with flexible type >>> a[['b', 'c']].view(('f8', 2)).mean(0) array([ 1., 1.]) >>> a[['b', 'c']].view(('f8', 2)).dtype dtype('float64') Aside The plan is that statsmodels will drop all usage and support for rec_arays/structured dtypes in the following release (0.10). Then structured dtypes are free (from our perspective) to provide low level struct support instead of pretending to be dataframe_like. Josef > Feel free to also discuss the related proposed change, to make > np.diag return a view instead of a copy. That change has > not been implemented yet, only proposed. > Cheers, > Allan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 22 11:07:52 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Jan 2018 11:07:52 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> Message-ID: On Mon, Jan 22, 2018 at 10:53 AM, wrote: > > > On Sun, Jan 21, 2018 at 9:48 PM, Allan Haldane > wrote: > >> Hello all, >> >> We are making a decision (again) about what to do about the >> behavior of multiple-field indexing of structured arrays: Should >> it return a view or a copy, and on what release schedule? >> >> As a reminder, this refers to operations like (1.13 behavior): >> >> >>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')]) >> >>> a[['a', 'c']] >> array([(0, 0.), (0, 0.), (0, 0.)], >> dtype=[('a', '> >> In numpy 1.14.0 we made this return a view instead of a copy, but >> downstream test failures suggest we reconsider. In our current >> implementation for 1.14.1, we have reverted this change, but >> still plan to go through with it in 1.15. >> >> See here for our discussion the problem and solutions: >> https://github.com/numpy/numpy/pull/10411 >> >> The two main options we have discussed are either to try to make >> the change in 1.15, or never make the change at all and always >> return a copy. >> >> Here are some pros and cons: >> >> Pros (change to view in 1.15) >> ============================= >> >> * Views are useful and convenient. Other forms of indexing also >> often return views so this is more consistent. >> * This change has been planned since numpy 1.7 in 2009, >> and there have been visible FutureWarnings about it since >> then. Anyone whose code will break should have seen the >> warnings. It has been extensively warned about in recent >> release notes. >> * Past discussions have supported the change. See my comment in >> the PR with many links to them and to other history. >> * Users have requested the change on the list. >> * Possibly a majority of the reported code failures were not >> actually caused by the change, but by another bug (#8100) >> involving np.load/np.save which this change exposed. If we >> push it off to 1.15, we will have time to fix this other bug. >> (There were no FutureWarnings for this breakage, of course). >> * The code that really will break is of the form >> a[['a', 'c']].view('i8') >> because the returned itemsize is different. This has >> raised FutureWarnings since numpy 1.7, and no users reported >> failures due to this change. In the PR we still try to >> mitigate this breakage by introducing a new method >> `pack_fields`, which converts the result into the 1.13 form, >> so that >> np.pack_fields(a[['a', 'c']]).view('i8') >> will work. >> >> >> Cons (keep returning a copy) >> ============================ >> >> * The extra convenience is not really that much, and fancy >> indexing also returns a copy instead of a view, so there is >> a precedent there. >> * We want to minimize compatibility breaks with old behavior. >> We've had a fair amount of discussion and complaints about >> how we break things in general. >> * We have lived with a "copy" for 8 years now. At some point the >> behavior gets set in stone for compatibility reasons. >> * Users have written to the list and github about their code >> breaking in 1.14.0. As far as I am aware, they all refer >> to the #8100 problem. >> * If a new function `pack_fields` is needed to guard against >> mishaps with the view behavior, that seems like a sign that >> keeping the copy behavior is the best option from an API >> perspective. >> >> My initial vote is go with the change in 1.15: The "view" code >> that will ultimately break (not the code related to #8100) has >> been sending FutureWarnings for many years, and I am not aware of >> any user complaints involving it: All the complaints so far >> would be fixed with #8100 in 1.15. >> >> > (Note based on a linked mailing list thread, 2012 might be the last time I > looked more closely at structured dtypes. > So some of what I understand might be outdated.) > > > views on structured dtypes are very important, but viewing them as > standard arrays with standard dtypes is the main part that I had used. > Essentially structured dtypes are useless for any computation, e.g. just > some simple reduce operation. To work with them we need a standard view. > > I think the usecase that fails in statsmodels (except there is no test > failure anymore because we switched to using pandas in the unit test) > do add a detail here results is a recarray created from a csv file with results = genfromtxt(open(filename, "rb"), delimiter=",", names=True,dtype=float) ['acvar_lb','acvar_ub'] are the last two columns, so this corresponds to my example below where AFAIU no padding is necessary to get a view. > > > cls.confint_res = cls.results[['acvar_lb','acvar > _ub']].view((float, > > > 2)) > E ValueError: Changing the dtype to a subarray type is only > supported if the total itemsize is unchanged > > > This is similar to the above example > a[['a', 'c']].view('i8') > but it doesn't try to combine fields. > > In many examples where I used structured dtypes a long time ago, switched > between consistent views as either a standard array of subsets or as > .structured dtypes. > For this usecase it wouldn't matter whether a[['a', 'c']] returns a view > or copy, as long as we can get the second view that is consistent with the > selected part of the memory. This would also be independent of whether > numpy pads internally and adjusts the strides if possible or not. > > >>> np.__version__ > '1.11.2' > > >>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')]) > >>> a > array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), > (1, 1.0, 1.0)], > dtype=[('a', ' > >>> a.mean(0) > Traceback (most recent call last): > File "", line 1, in > a.mean(0) > File "C:\...\python-3.4.4.amd64\lib\site-packages\numpy\core\_methods.py", > line 65, in _mean > ret = umr_sum(arr, axis, dtype, out, keepdims) > TypeError: cannot perform reduce with flexible type > > >>> a[['b', 'c']].mean(0) > Traceback (most recent call last): > File "", line 1, in > a[['b', 'c']].mean(0) > File "C:\...\python-3.4.4.amd64\lib\site-packages\numpy\core\_methods.py", > line 65, in _mean > ret = umr_sum(arr, axis, dtype, out, keepdims) > TypeError: cannot perform reduce with flexible type > > >>> a[['b', 'c']].view(('f8', 2)).mean(0) > array([ 1., 1.]) > >>> a[['b', 'c']].view(('f8', 2)).dtype > dtype('float64') > > > Aside The plan is that statsmodels will drop all usage and support for > rec_arays/structured dtypes > in the following release (0.10). > Then structured dtypes are free (from our perspective) to provide low > level struct support > instead of pretending to be dataframe_like. > > Josef > > > >> Feel free to also discuss the related proposed change, to make >> np.diag return a view instead of a copy. That change has >> not been implemented yet, only proposed. > > >> Cheers, >> Allan >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Mon Jan 22 11:13:17 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 22 Jan 2018 11:13:17 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> Message-ID: <9a64e741-c9c7-d538-c2a1-c7761d0cf518@gmail.com> On 01/22/2018 10:53 AM, josef.pktd at gmail.com wrote: > > This is similar to the above example > a[['a', 'c']].view('i8') > but it doesn't try to combine fields. > > In? many examples where I used structured dtypes a long time ago, > switched between consistent views as either a standard array of subsets > or as .structured dtypes. > For this usecase it wouldn't matter whether a[['a', 'c']] returns a view > or copy, as long as we can get the second view that is consistent with > the selected part of the memory. This would also be independent of > whether numpy pads internally and adjusts the strides if possible or not. > >>>> np.__version__ > '1.11.2' > >>>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')]) >>>> a > array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), > ? ? ? ?(1, 1.0, 1.0)], > ? ? ? dtype=[('a', ' >>>> a[['b', 'c']].view(('f8', 2)).mean(0) > array([ 1.,? 1.]) >>>> a[['b', 'c']].view(('f8', 2)).dtype > dtype('float64') Hmm, this did not raise a FutureWarning in 11.2, so I was not quite right in my message. It looks like this particular line only started raising FutureWarnings in 1.12.0. > Aside The plan is that statsmodels will drop all usage and support for > rec_arays/structured dtypes > in the following release (0.10). > Then structured dtypes are free (from our perspective) to provide low > level struct support > instead of pretending to be dataframe_like. Your use of structured arrays is "pandas-like", ie you are using it tabular data manipulation. In numpy 1.13 we updated the structured docs to discourage this. Of course users can do what they want, but here is what the new docs say: Structured arrays are designed for low-level manipulation of structured data, for example, for interpreting binary blobs. Structured datatypes are designed to mimic 'structs' in the C language, making them also useful for interfacing with C code. For these purposes, numpy supports specialized features such as subarrays and nested datatypes, and allows manual control over the memory layout of the structure. For simple manipulation of tabular data other pydata projects, such as pandas, xarray, or DataArray, provide higher-level interfaces that may be more suitable. These projects may also give better performance for tabular data analysis because the C-struct-like memory layout of structured arrays can lead to poor cache behavior. Allan From allanhaldane at gmail.com Mon Jan 22 11:23:28 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 22 Jan 2018 11:23:28 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> Message-ID: <5022ecfd-a6f4-ddab-39a6-cc0fcbd392f9@gmail.com> On 01/22/2018 10:53 AM, josef.pktd at gmail.com wrote: > On Sun, Jan 21, 2018 at 9:48 PM, Allan Haldane > In? many examples where I used structured dtypes a long time ago, > switched between consistent views as either a standard array of subsets > or as .structured dtypes. > For this usecase it wouldn't matter whether a[['a', 'c']] returns a view > or copy, as long as we can get the second view that is consistent with > the selected part of the memory. This would also be independent of > whether numpy pads internally and adjusts the strides if possible or not. > >>>> np.__version__ > '1.11.2' > >>>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')]) >>>> a > array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), > ? ? ? ?(1, 1.0, 1.0)], > ? ? ? dtype=[('a', '>>> a[['b', 'c']].view(('f8', 2)).dtype > dtype('float64') Thanks for a real example to think about. I just want to note that I thought of another way to "fix" this for 1.15 which does not involve "pack_fields", which is a[['b', 'c']].astype('f8,f8').view(('f8', 2)) Which is back-compatible will numpy back to 1.7, I think. So that's another option to ease the transition. Allan From josef.pktd at gmail.com Mon Jan 22 11:24:30 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 22 Jan 2018 11:24:30 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: <9a64e741-c9c7-d538-c2a1-c7761d0cf518@gmail.com> References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> <9a64e741-c9c7-d538-c2a1-c7761d0cf518@gmail.com> Message-ID: On Mon, Jan 22, 2018 at 11:13 AM, Allan Haldane wrote: > On 01/22/2018 10:53 AM, josef.pktd at gmail.com wrote: > >> >> This is similar to the above example >> a[['a', 'c']].view('i8') >> but it doesn't try to combine fields. >> >> In many examples where I used structured dtypes a long time ago, >> switched between consistent views as either a standard array of subsets or >> as .structured dtypes. >> For this usecase it wouldn't matter whether a[['a', 'c']] returns a view >> or copy, as long as we can get the second view that is consistent with the >> selected part of the memory. This would also be independent of whether >> numpy pads internally and adjusts the strides if possible or not. >> >> np.__version__ >>>>> >>>> '1.11.2' >> >> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')]) >>>>> a >>>>> >>>> array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), >> (1, 1.0, 1.0)], >> dtype=[('a', '> >> a[['b', 'c']].view(('f8', 2)).mean(0) >>>>> >>>> array([ 1., 1.]) >> >>> a[['b', 'c']].view(('f8', 2)).dtype >>>>> >>>> dtype('float64') >> > > Hmm, this did not raise a FutureWarning in 11.2, so I was not quite right > in my message. It looks like this particular line only started raising > FutureWarnings in 1.12.0. > > Aside The plan is that statsmodels will drop all usage and support for >> rec_arays/structured dtypes >> in the following release (0.10). >> Then structured dtypes are free (from our perspective) to provide low >> level struct support >> instead of pretending to be dataframe_like. >> > > Your use of structured arrays is "pandas-like", ie you are using it > tabular data manipulation. In numpy 1.13 we updated the structured docs to > discourage this. Of course users can do what they want, but here is what > the new docs say: > > Structured arrays are designed for low-level > manipulation of structured data, for example, for > interpreting binary blobs. Structured datatypes are > designed to mimic 'structs' in the C language, making > them also useful for interfacing with C code. For these > purposes, numpy supports specialized features such as > subarrays and nested datatypes, and allows manual > control over the memory layout of the structure. > > For simple manipulation of tabular data other pydata > projects, such as pandas, xarray, or DataArray, provide > higher-level interfaces that may be more suitable. These > projects may also give better performance for tabular > data analysis because the C-struct-like memory layout of > structured arrays can lead to poor cache behavior. > > Once upon a time .... The test code was written in June 2010 In Oct/Nov 2017 we switched to pandas for loading the data but not for the reference `results` to avoid numpy recarray warnings. In Jan 2018 we switched to pandas also for the reference results statsmodels has a lot of "legacy" code especially in the datasets and unit tests, when recarrays were still the appropriate precursor to pandas. recarrays are built on structured dtypes, and were not just supposed to be low level C-structs. Josef > > Allan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Mon Jan 22 11:46:08 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 22 Jan 2018 11:46:08 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: <5022ecfd-a6f4-ddab-39a6-cc0fcbd392f9@gmail.com> References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> <5022ecfd-a6f4-ddab-39a6-cc0fcbd392f9@gmail.com> Message-ID: <02f98251-e0b7-968a-c450-f33ce9d9972e@gmail.com> On 01/22/2018 11:23 AM, Allan Haldane wrote: > I just want to note that I > thought of another way to "fix" this for 1.15 which does not involve > "pack_fields", which is > > ??? a[['b', 'c']].astype('f8,f8').view(('f8', 2)) > > Which is back-compatible will numpy back to 1.7, I think. Apologies, this is not back-compatible, do not use it. I forgot that past versions of numpy had a weird quirk that this will replace all the structured data with 0s. Allan From andyfaff at gmail.com Tue Jan 23 05:04:06 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Tue, 23 Jan 2018 21:04:06 +1100 Subject: [Numpy-discussion] with np.testing.assert_raises still requires nose. Message-ID: Dear np list, from a recent scipy PR (https://github.com/scipy/scipy/pull/8322) it appears that the `np.testing.assert_raises` context manager still requires nose to be installed: https://ci.appveyor.com/project/scipy/scipy/build/1.0.1550/job/s5nq8xqd0n72feqf The appveyor test is using numpy-1.14.0 This doesn't seem to be an issue on our travis tests, only the windows tests. Is this a bug? -- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jan 23 11:34:25 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 23 Jan 2018 16:34:25 +0000 Subject: [Numpy-discussion] with np.testing.assert_raises still requires nose. In-Reply-To: References: Message-ID: Hi, On Tue, Jan 23, 2018 at 10:04 AM, Andrew Nelson wrote: > Dear np list, > from a recent scipy PR (https://github.com/scipy/scipy/pull/8322) it appears > that the `np.testing.assert_raises` context manager still requires nose to > be installed: > https://ci.appveyor.com/project/scipy/scipy/build/1.0.1550/job/s5nq8xqd0n72feqf > > The appveyor test is using numpy-1.14.0 > > This doesn't seem to be an issue on our travis tests, only the windows > tests. Is this a bug? I noticed that too. For my own project I ended up adding a replacement stub: """ from unittest import TestCase assert_raises = TestCase().assertRaises """ I think that's all nose is doing, but I couldn't find the code path in a quick grep. Cheers, Matthew From solarjoe at posteo.org Thu Jan 25 09:51:40 2018 From: solarjoe at posteo.org (Joe) Date: Thu, 25 Jan 2018 15:51:40 +0100 Subject: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding) In-Reply-To: <878td2ny6m.fsf@otaria.sebmel.org> References: <87po6e234f.fsf@gmail.com> <878td2ny6m.fsf@otaria.sebmel.org> Message-ID: <8bd9f09ce2eb55140bc65775201ed47c@posteo.de> Hello, how I could dynamically handle the dtype of a structured array when reading an array of C structs with np.frombuffer (regarding the member padding in the struct). So far I manually adjusted the dtype of the structured array and added a field for the padding, this works on a small scale. The structs are not defined within my code, but within a third party and basically I am looking for no-worry hassle free way to handle this, because there are a lot of structs Is there some smart way to do this in Numpy? So far the best approach seems to parse the struct with the cffi functions ffi.sizeof(), ffi.offsetof() and maybe ffi.alignof() to find out where the padding happens and add it dynamically to the dtype. But maybe someone has a smarter idea how to solve this. You can find a more detailed description and a working example here: https://stackoverflow.com/questions/48423725/how-to-handle-member-padding-in-struct-when-reading-cffi-buffer-with-numpy-fromb Kind regards, Joe From chris.barker at noaa.gov Thu Jan 25 11:31:26 2018 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 25 Jan 2018 08:31:26 -0800 Subject: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding) In-Reply-To: <8bd9f09ce2eb55140bc65775201ed47c@posteo.de> References: <87po6e234f.fsf@gmail.com> <878td2ny6m.fsf@otaria.sebmel.org> <8bd9f09ce2eb55140bc65775201ed47c@posteo.de> Message-ID: Sent from my iPhone > On Jan 25, 2018, at 6:51 AM, Joe wrote: > > Hello, > > how I could dynamically handle the dtype of a structured array when reading > an array of C structs with np.frombuffer (regarding the member padding in the struct). > > So far I manually adjusted the dtype of the structured array and added a field for the padding, > this works on a small scale. > The structs are not defined within my code, but within a third party and > basically I am looking for no-worry hassle free way to handle this, because there are a lot of structs > > Is there some smart way to do this in Numpy? The numpy dtype constructor takes an ?align? keyword that will pad it for you. However, if these strict are coming from a lib compiled by a third party, I?m not sure you can count on the alignment rules being the same. So maybe you will need to use the cffi functions :-( -CHB > > So far the best approach seems to parse the struct with the cffi functions > ffi.sizeof(), ffi.offsetof() and maybe ffi.alignof() to find out where the padding > happens and add it dynamically to the dtype. But maybe someone has a > smarter idea how to solve this. > > You can find a more detailed description and a working example here: > > https://stackoverflow.com/questions/48423725/how-to-handle-member-padding-in-struct-when-reading-cffi-buffer-with-numpy-fromb > > Kind regards, > Joe > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Thu Jan 25 11:33:05 2018 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 25 Jan 2018 08:33:05 -0800 Subject: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding) In-Reply-To: References: <87po6e234f.fsf@gmail.com> <878td2ny6m.fsf@otaria.sebmel.org> <8bd9f09ce2eb55140bc65775201ed47c@posteo.de> Message-ID: The numpy dtype constructor takes an ?align? keyword that will pad it for you. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html -CHB -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Thu Jan 25 13:01:46 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 25 Jan 2018 13:01:46 -0500 Subject: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding) In-Reply-To: References: <87po6e234f.fsf@gmail.com> <878td2ny6m.fsf@otaria.sebmel.org> <8bd9f09ce2eb55140bc65775201ed47c@posteo.de> Message-ID: <88dbefc1-ab01-8efe-420c-102e13e3aaed@gmail.com> There is a new section discussing alignment in the numpy 1.14 structured array docs, which has some hints about interfacing with C structs. These new 1.14 docs are not online yet on scipy.org, but in the meantime you can view them here: https://ahaldane.github.io/user/basics.rec.html#automatic-byte-offsets-and-alignment (That links specifically to the discussion of alignments and padding). Allan On 01/25/2018 11:33 AM, Chris Barker - NOAA Federal wrote: > >> >> The numpy dtype constructor takes an ?align? keyword that will pad it >> for you. > > https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html > > -CHB > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From stefanv at berkeley.edu Thu Jan 25 13:16:31 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Thu, 25 Jan 2018 10:16:31 -0800 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> Message-ID: <20180125181631.l5cxygykjshpp44f@fastmail.com> On Mon, 22 Jan 2018 10:11:08 -0500, Marten van Kerkwijk wrote: >I think on the consistency argument is perhaps the most important: >views are very powerful and in many ways one *counts* on them >happening, especially in working with large arrays. I had the same gut feeling, but the fancy indexing example made me pause: In [9]: x = np.arange(12, dtype=float).reshape((3, 4)) In [10]: p = x[[0, 1]] # copy of data Then: In [11]: x = np.array([(0, 1), (2, 3)], dtype=[('a', int), ('b', int)]) In [12]: p = x[['a', 'b']] # copy of data, but proposal will change that We're not doing the same kind of indexing here exactly (in one case we grab elements, in the other parts of elements), but the view behavior may still break the "mental expectation". Fortunately, there's already other proof that this operatoin is not exactly fancy indexing: In [15]: x[['a', 'a']] --------------------------------------------------------------------------- KeyError Traceback (most recent call last) in () ----> 1 x[['a', 'a']] KeyError: 'duplicate field of name a' Not copying wherever possible feels like an important principle to uphold, so I am +1. St?fan From m.h.vankerkwijk at gmail.com Thu Jan 25 13:49:18 2018 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Thu, 25 Jan 2018 13:49:18 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: <20180125181631.l5cxygykjshpp44f@fastmail.com> References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> <20180125181631.l5cxygykjshpp44f@fastmail.com> Message-ID: On Thu, Jan 25, 2018 at 1:16 PM, Stefan van der Walt wrote: > On Mon, 22 Jan 2018 10:11:08 -0500, Marten van Kerkwijk wrote: >> >> I think on the consistency argument is perhaps the most important: >> views are very powerful and in many ways one *counts* on them >> happening, especially in working with large arrays. > > > I had the same gut feeling, but the fancy indexing example made me > pause: > > In [9]: x = np.arange(12, dtype=float).reshape((3, 4)) > > In [10]: p = x[[0, 1]] # copy of data > > Then: > > In [11]: x = np.array([(0, 1), (2, 3)], dtype=[('a', int), ('b', int)]) > > In [12]: p = x[['a', 'b']] # copy of data, but proposal will change that > > We're not doing the same kind of indexing here exactly (in one case we > grab elements, in the other parts of elements), but the view behavior > may still break the "mental expectation". A bit off-topic, but maybe this is another argument to just allow `x['a', 'b']` -- I never understood why a tuple was not the appropriate iterable for getting multiple items from a record. -- Marten From josef.pktd at gmail.com Thu Jan 25 15:56:28 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 25 Jan 2018 15:56:28 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> <20180125181631.l5cxygykjshpp44f@fastmail.com> Message-ID: On Thu, Jan 25, 2018 at 1:49 PM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > On Thu, Jan 25, 2018 at 1:16 PM, Stefan van der Walt > wrote: > > On Mon, 22 Jan 2018 10:11:08 -0500, Marten van Kerkwijk wrote: > >> > >> I think on the consistency argument is perhaps the most important: > >> views are very powerful and in many ways one *counts* on them > >> happening, especially in working with large arrays. > > > > > > I had the same gut feeling, but the fancy indexing example made me > > pause: > > > > In [9]: x = np.arange(12, dtype=float).reshape((3, 4)) > > > > In [10]: p = x[[0, 1]] # copy of data > > > > Then: > > > > In [11]: x = np.array([(0, 1), (2, 3)], dtype=[('a', int), ('b', int)]) > > > > In [12]: p = x[['a', 'b']] # copy of data, but proposal will change that > What does this do? p = x[['a', 'b']].copy() My impression is that the problems with the view are because the padded view doesn't behave like a "standard" dtype or array, i.e. the follow-up behavior is the problematic part. Josef > > > > We're not doing the same kind of indexing here exactly (in one case we > > grab elements, in the other parts of elements), but the view behavior > > may still break the "mental expectation". > > A bit off-topic, but maybe this is another argument to just allow > `x['a', 'b']` -- I never understood why a tuple was not the > appropriate iterable for getting multiple items from a record. > > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Jan 25 18:06:56 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 25 Jan 2018 15:06:56 -0800 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 Message-ID: Hi all, I'm pretty sure this is the same thing as recently discussed on this list about 1.14, but to confirm: I had failures in my code with an upgrade for 1.14 -- turns out it was a single line in a single test fixture, so no big deal, but a regression just the same, with no deprecation warning. I was essentially doing this: In [*48*]: dt Out[*48*]: dtype([('time', ' in () ----> 1 full = np.array(zip(time, uv), dtype=dt) ValueError: setting an array element with a sequence. It took some poking, but the solution was to do: full = np.array(zip(time, (tuple(w) *for* w *in* uv)), dtype=dt) That is, convert the values to nested tuples, rather than an array in a tuple, or a list in a tuple. As I said, my problem is solved, but to confirm: 1) This is a known change with good reason? 2) My solution was the best (only) one -- the only way to set a nested dtype like that is with tuples? If so, then I think we should: A) improve the error message. "ValueError: setting an array element with a sequence." Is not really clear -- I spent a while trying to figure out how I could set a nested dtype like that without a sequence? and I was actually using a ndarray, so it wasn't even a generic sequence. And a tuple is a sequence, too... I had a vague recollection that in some circumstances, numpy treats tuples and lists (and arrays) differently (fancy indexing??), so I tried the tuple thing and that worked. But I've been around numpy a long time -- that could have been very very confusing to many people. So could the message be changed to something like: "ValueError: setting an array element with a generic sequence. Only the tuple type can be used in this context." or something like that -- I'm not sure where else this same error message might pop up, so that could be totally inappropriate. 2) maybe add a .totuple()method to ndarray, much like the .tolist() method? that would have been handy here. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Thu Jan 25 19:06:18 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Thu, 25 Jan 2018 19:06:18 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: Message-ID: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> On 01/25/2018 06:06 PM, Chris Barker wrote: > Hi all, > > I'm pretty sure this is the same thing as recently discussed on this > list about 1.14, but to confirm: > > I had failures in my code with an upgrade for 1.14 -- turns out it was a > single line in a single test fixture, so no big deal, but a regression > just the same, with no deprecation warning. > > I was essentially doing this: > > In [*48*]: dt > > Out[*48*]: dtype([('time', ' ' > > In [*49*]: uv > > Out[*49*]:? > > array([[1., 1.], > > ?? ? ? [1., 1.], > > ?? ? ? [1., 1.], > > ?? ? ? [1., 1.]]) > > > In [*50*]: time > > Out[*50*]: array([1, 1, 1, 1]) > > > In [*51*]: full = np.array(zip(time, uv), dtype=dt) > > --------------------------------------------------------------------------- > > ValueError? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Traceback (most recent call last) > > in () > > ----> 1full =np.array(zip(time,uv),dtype=dt) > > > ValueError: setting an array element with a sequence. > > > > It took some poking, but the solution was to do: > > full = np.array(zip(time, (tuple(w) *for*w *in*uv)), dtype=dt) > > > That is, convert the values to nested tuples, rather than an array in a > tuple, or a list in a tuple. > > As I said, my problem is solved, but to confirm: > > 1) This is a known change with good reason? This change is a little different from what we discussed before. The change occurred because the old assignment behavior was dangerous, and was not doing what you thought. If you modify your dtype above changing both 'f8' fields to 'f4', you will see you get very strange results: Your array gets filled in with the values (1, ( 0., 1.875)). Here's what happened: Previously, numpy was *not* iterating your data as a sequence. Instead, if numpy did not find a tuple it would interpret the data a a raw buffer and copy the value byte-by-byte, ignoring endianness, casting, stride, etc. You can get even weirder results if you do `uv = uv.astype('i4')`, for example. It happened to work for you because ndarrays expose a buffer interface, and you were assigning using exactly the same type and endianness. In 1.14 the fix was to disallow this 'buffer' assignment for structured arrays, it was causing quite confusing bugs. Unstructured "void" arrays still do this though. > 2) My solution was the best (only) one -- the only way to set a nested > dtype like that is with tuples? Right, our solution was to only allow assignment from tuples. We might be able to relax that for structured scalars, but for arrays I remember one consideration was to avoid confusion with array broadcasting: If you do >>> x = np.zeros(2, dtype='i4,i4') >>> x[:] = np.array([3, 4]) >>> x array([(3, 3), (4, 4)], dtype=[('f0', '>> x[:] = (3, 4) >>> x array([(3, 4), (3, 4)], dtype=[('f0', ' If so, then I think we should: > > A) improve the error message. > > "ValueError: setting an array element with a sequence." > > Is not really clear -- I spent a while trying to figure out how I could > set a nested dtype like that without a sequence? and I was actually > using a ndarray, so it wasn't even a generic sequence. And a tuple is a > sequence, too... > > I had a vague recollection that in some circumstances, numpy treats > tuples and lists (and arrays) differently (fancy indexing??), so I tried > the tuple thing and that worked. But I've been around numpy a long time > -- that could have been very very confusing to many people. > > So could the message be changed to something like: > > "ValueError: setting an array element with a generic sequence. Only the > tuple type can be used in this context." > > or something like that -- I'm not sure where else this same error > message might pop up, so that could be totally inappropriate. Good idea. I'll see if we can do it for 1.14.1. > 2) maybe add a?.totuple()method to ndarray, much like the .tolist() > method? that would have been handy here. >> -Chris > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ?? voice > 7600 Sand Point Way NE ??(206) 526-6329 ?? fax > Seattle, WA ?98115 ? ? ??(206) 526-6317 ?? > main reception > > Chris.Barker at noaa.gov > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Thu Jan 25 20:53:39 2018 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Thu, 25 Jan 2018 17:53:39 -0800 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> Message-ID: > On Jan 25, 2018, at 4:06 PM, Allan Haldane wrote: >> 1) This is a known change with good reason? > . The > change occurred because the old assignment behavior was dangerous, and > was not doing what you thought. OK, that?s a good reason! >> A) improve the error message. > > Good idea. I'll see if we can do it for 1.14.1. What do folks think about a totuple() method ? even before this I?ve wanted that. But in this case, it seems particularly useful. -CHB From kevin.k.sheppard at gmail.com Fri Jan 26 11:14:07 2018 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Fri, 26 Jan 2018 16:14:07 +0000 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward Message-ID: I am a firm believer that the current situation is not sustainable. There are a lot of improvements that can practically be incorporated. While many of these are performance related, there are also improvements in accuracy over some ranges of parameters that cannot be incorporated. I also think that perfect stream reproducibility is a bit of a myth across versions since this really would require identical OS, compiler and possibly CPU for some of the generators that produce floats. I believe there is a case for separating the random generator from core NumPy. Some points that favor becoming a subproject: 1. It is a pure consumer of NumPy API. Other parts of the API do no depend on random. 2. A stand alone package could be installed along side many different version of core NumPy which would reduce the pressure on freezing the stream. In terms of what is needed, I think that the underlying PRNG should be swappable. The will provide a simple mechanism to allow certain types of advancement while easily providing backward compat. In the current design this is very hard and requires compiling many nearly identical copies of RandomState. In pseudocode something like standard_normal(prng) where prng is a basic class that retains the PRNG state and has a small set of core random number generators that belong to the underlying PRNG -- probably something like int32, int64, double, and possibly int53. I am not advocating explicitly passing the PRNG as an argument, but having generators which can take any suitable PRNG would add a lot of flexibility in terms of taking advantage of improvements in the underlying PRNGs (see, e.g., xoroshiro128/xorshift1024). The "small" core PRNG would have responsibility over state and streams. The remainder of the module would transform the underlying PRNG into the required distributions. This would also simplify making improvements, since old versions could be saved or improved versions could be added to the API. For example, from numpy.random import standard_normal, prng # Preferred versions standard_normal(prng) # Ziggurat from numpy.random.legacy import standard_normal_bm, mt19937 # legacy generators standard_normal_bm(mt19937) # Box-Muller Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Fri Jan 26 13:48:19 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 26 Jan 2018 13:48:19 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> Message-ID: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> On 01/25/2018 08:53 PM, Chris Barker - NOAA Federal wrote: >> On Jan 25, 2018, at 4:06 PM, Allan Haldane wrote: > >>> 1) This is a known change with good reason? > >> . The >> change occurred because the old assignment behavior was dangerous, and >> was not doing what you thought. > > OK, that?s a good reason! > >>> A) improve the error message. >> >> Good idea. I'll see if we can do it for 1.14.1. > > What do folks think about a totuple() method ? even before this I?ve > wanted that. But in this case, it seems particularly useful. > > -CHB Two thoughts: 1. `totuple` makes most sense for 2d arrays. But what should it do for 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so 1d arrays would give a list of tuples of size 1. 2. structured array's .tolist() already returns a list of tuples. If we have a 2d structured array, would it add one more layer of tuples? That would raise an exception if read back in by `np.array` with the same dtype. These points make me think that instead of a `.totuple` method, this might be more suitable as a new function in np.lib.recfunctions. If the goal is to help manipulate structured arrays, that submodule is appropriate since it already has other functions do manipulate fields in similar ways. What about calling it `pack_last_axis`? def pack_last_axis(arr, names=None): if arr.names: return arr names = names or ['f{}'.format(i) for i in range(arr.shape[-1])] return arr.view([(n, arr.dtype) for n in names]).squeeze(-1) Then you could do: >>> pack_last_axis(uv).tolist() to get a list of tuples. Allan From wieser.eric+numpy at gmail.com Fri Jan 26 14:17:20 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Fri, 26 Jan 2018 19:17:20 +0000 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> Message-ID: Why is the list of tuples a useful thing to have in the first place? If the goal is to convert an array into a structured array, you can do that far more efficiently with: def make_tup_dtype(arr): """ Attempt to make a type capable of viewing the last axis of an array, even if it is non-contiguous. Unfortunately `.view` doesn't allow us to use this dtype in that case, which needs a patch... """ n_fields = arr.shape[-1] step = arr.strides[-1] descr = dict(names=[], formats=[], offsets=[], itemsize=step * n_fields) for i in range(n_fields): descr['names'].append('f{}'.format(i)) descr['offsets'].append(step * i) descr['formats'].append(arr.dtype) return np.dtype(descr) Used as: >>> arr = np.arange(6).reshape(3, 2)>>> arr.view(make_tup_dtype(arr)).squeeze(axis=-1) array([(0, 1), (2, 3), (4, 5)], dtype=[('f0', ' wrote: > On 01/25/2018 08:53 PM, Chris Barker - NOAA Federal wrote: > >> On Jan 25, 2018, at 4:06 PM, Allan Haldane > wrote: > > > >>> 1) This is a known change with good reason? > > > >> . The > >> change occurred because the old assignment behavior was dangerous, and > >> was not doing what you thought. > > > > OK, that?s a good reason! > > > >>> A) improve the error message. > >> > >> Good idea. I'll see if we can do it for 1.14.1. > > > > What do folks think about a totuple() method ? even before this I?ve > > wanted that. But in this case, it seems particularly useful. > > > > -CHB > > Two thoughts: > > 1. `totuple` makes most sense for 2d arrays. But what should it do for > 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so > 1d arrays would give a list of tuples of size 1. > > 2. structured array's .tolist() already returns a list of tuples. If we > have a 2d structured array, would it add one more layer of tuples? That > would raise an exception if read back in by `np.array` with the same dtype. > > These points make me think that instead of a `.totuple` method, this > might be more suitable as a new function in np.lib.recfunctions. If the > goal is to help manipulate structured arrays, that submodule is > appropriate since it already has other functions do manipulate fields in > similar ways. What about calling it `pack_last_axis`? > > def pack_last_axis(arr, names=None): > if arr.names: > return arr > names = names or ['f{}'.format(i) for i in range(arr.shape[-1])] > return arr.view([(n, arr.dtype) for n in names]).squeeze(-1) > > Then you could do: > > >>> pack_last_axis(uv).tolist() > > to get a list of tuples. > > Allan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Fri Jan 26 14:19:14 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Fri, 26 Jan 2018 19:19:14 +0000 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> Message-ID: Apologies, it seems that I skipped to the end of @ahaldane's remark - we're on the same page. On Fri, 26 Jan 2018 at 11:17 Eric Wieser wrote: > Why is the list of tuples a useful thing to have in the first place? If > the goal is to convert an array into a structured array, you can do that > far more efficiently with: > > def make_tup_dtype(arr): > """ > Attempt to make a type capable of viewing the last axis of an array, even if it is non-contiguous. > Unfortunately `.view` doesn't allow us to use this dtype in that case, which needs a patch... > """ > n_fields = arr.shape[-1] > step = arr.strides[-1] > descr = dict(names=[], formats=[], offsets=[], itemsize=step * n_fields) > for i in range(n_fields): > descr['names'].append('f{}'.format(i)) > descr['offsets'].append(step * i) > descr['formats'].append(arr.dtype) > return np.dtype(descr) > > Used as: > > >>> arr = np.arange(6).reshape(3, 2)>>> arr.view(make_tup_dtype(arr)).squeeze(axis=-1) > array([(0, 1), (2, 3), (4, 5)], > dtype=[('f0', ' > Perhaps this should be provided by recfunctions (or maybe it already is, > in a less rigid form?) > > Eric > ? > > On Fri, 26 Jan 2018 at 10:48 Allan Haldane wrote: > >> On 01/25/2018 08:53 PM, Chris Barker - NOAA Federal wrote: >> >> On Jan 25, 2018, at 4:06 PM, Allan Haldane >> wrote: >> > >> >>> 1) This is a known change with good reason? >> > >> >> . The >> >> change occurred because the old assignment behavior was dangerous, and >> >> was not doing what you thought. >> > >> > OK, that?s a good reason! >> > >> >>> A) improve the error message. >> >> >> >> Good idea. I'll see if we can do it for 1.14.1. >> > >> > What do folks think about a totuple() method ? even before this I?ve >> > wanted that. But in this case, it seems particularly useful. >> > >> > -CHB >> >> Two thoughts: >> >> 1. `totuple` makes most sense for 2d arrays. But what should it do for >> 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so >> 1d arrays would give a list of tuples of size 1. >> >> 2. structured array's .tolist() already returns a list of tuples. If we >> have a 2d structured array, would it add one more layer of tuples? That >> would raise an exception if read back in by `np.array` with the same >> dtype. >> >> These points make me think that instead of a `.totuple` method, this >> might be more suitable as a new function in np.lib.recfunctions. If the >> goal is to help manipulate structured arrays, that submodule is >> appropriate since it already has other functions do manipulate fields in >> similar ways. What about calling it `pack_last_axis`? >> >> def pack_last_axis(arr, names=None): >> if arr.names: >> return arr >> names = names or ['f{}'.format(i) for i in range(arr.shape[-1])] >> return arr.view([(n, arr.dtype) for n in names]).squeeze(-1) >> >> Then you could do: >> >> >>> pack_last_axis(uv).tolist() >> >> to get a list of tuples. >> >> Allan >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jan 26 15:38:06 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 26 Jan 2018 12:38:06 -0800 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> Message-ID: On Fri, Jan 26, 2018 at 10:48 AM, Allan Haldane wrote: > > What do folks think about a totuple() method ? even before this I?ve > > wanted that. But in this case, it seems particularly useful. > > Two thoughts: > > 1. `totuple` makes most sense for 2d arrays. But what should it do for > 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so > 1d arrays would give a list of tuples of size 1. > I was thinking it would be exactly like .tolist() but with tuples -- so you'd get tuples all the way down (or is that turtles?) IN this use case, it would have saved me the generator expression: (tuple(r) for r in arr) not a huge deal, but it would be nice to not have to write that, and to have the looping be in C with no intermediate array generation. 2. structured array's .tolist() already returns a list of tuples. If we > have a 2d structured array, would it add one more layer of tuples? no -- why? it would return a tuple of tuples instead. > That > would raise an exception if read back in by `np.array` with the same dtype. > Hmm -- indeed, if the top-level structure is a tuple, the array constructor gets confused: This works fine -- as it should: In [*84*]: new_full = np.array(full.tolist(), full.dtype) But this does not: In [*85*]: new_full = np.array(tuple(full.tolist()), full.dtype) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 new_full = np.array(tuple(full.tolist()), full.dtype) ValueError: could not assign tuple of length 4 to structure with 2 fields. I was hoping it would dig down to the inner structures looking for a match to the dtype, rather than looking at the type of the top level. Oh well. So yeah, not sure where you would go from tuple to list -- probably at the bottom level, but that may not always be unambiguous. These points make me think that instead of a `.totuple` method, this > might be more suitable as a new function in np.lib.recfunctions. I don't seem to have that module -- and I'm running 1.14.0 -- is this a new idea? > If the > goal is to help manipulate structured arrays, that submodule is > appropriate since it already has other functions do manipulate fields in > similar ways. What about calling it `pack_last_axis`? > > def pack_last_axis(arr, names=None): > if arr.names: > return arr > names = names or ['f{}'.format(i) for i in range(arr.shape[-1])] > return arr.view([(n, arr.dtype) for n in names]).squeeze(-1) > > Then you could do: > > >>> pack_last_axis(uv).tolist() > > to get a list of tuples. > not sure what idea is here -- in my example, I had a regular 2-d array, so no names: In [*90*]: pack_last_axis(uv) --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) in () ----> 1 pack_last_axis(uv) in pack_last_axis(arr, names) * 1* def pack_last_axis(arr, names=None): ----> 2 if arr.names: * 3* return arr * 4* names = names or ['f{}'.format(i) for i in range(arr.shape[-1 ])] * 5* return arr.view([(n, arr.dtype) for n in names]).squeeze(-1) AttributeError: 'numpy.ndarray' object has no attribute 'names' So maybe you meants something like: In [*95*]: *def* pack_last_axis(arr, names=None): ...: *try*: ...: arr.names ...: *return* arr ...: *except* *AttributeError*: ...: names = names *or* ['f{}'.format(i) *for* i *in* range (arr.shape[-1])] ...: *return* arr.view([(n, arr.dtype) *for* n *in* names]).squeeze(-1) which does work, but seems like a convoluted way to get tuples! However, I didn't actually need tuples, I needed something I could pack into a stuctarray, and this does work, without the tolist: full = np.array(zip(time, pack_last_axis(uv)), dtype=dt) So maybe that is the way to go. I'm not sure I'd have thought to look for this function, but what can you do? Thanks for your attention to this, -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Fri Jan 26 16:42:24 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Fri, 26 Jan 2018 21:42:24 +0000 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> Message-ID: arr.names should have been arr.dtype.names in that pack_last_axis function Eric ? On Fri, 26 Jan 2018 at 12:45 Chris Barker wrote: > On Fri, Jan 26, 2018 at 10:48 AM, Allan Haldane > wrote: > >> > What do folks think about a totuple() method ? even before this I?ve >> > wanted that. But in this case, it seems particularly useful. >> > > >> Two thoughts: >> > >> 1. `totuple` makes most sense for 2d arrays. But what should it do for >> 1d or 3+d arrays? I suppose it could make the last dimension a tuple, so >> 1d arrays would give a list of tuples of size 1. >> > > I was thinking it would be exactly like .tolist() but with tuples -- so > you'd get tuples all the way down (or is that turtles?) > > IN this use case, it would have saved me the generator expression: > > (tuple(r) for r in arr) > > not a huge deal, but it would be nice to not have to write that, and to > have the looping be in C with no intermediate array generation. > > 2. structured array's .tolist() already returns a list of tuples. If we >> have a 2d structured array, would it add one more layer of tuples? > > > no -- why? it would return a tuple of tuples instead. > > >> That >> would raise an exception if read back in by `np.array` with the same >> dtype. >> > > Hmm -- indeed, if the top-level structure is a tuple, the array > constructor gets confused: > > This works fine -- as it should: > > > In [*84*]: new_full = np.array(full.tolist(), full.dtype) > > > But this does not: > > > In [*85*]: new_full = np.array(tuple(full.tolist()), full.dtype) > > --------------------------------------------------------------------------- > > ValueError Traceback (most recent call > last) > > in () > > ----> 1 new_full = np.array(tuple(full.tolist()), full.dtype) > > > ValueError: could not assign tuple of length 4 to structure with 2 fields. > > I was hoping it would dig down to the inner structures looking for a match > to the dtype, rather than looking at the type of the top level. Oh well. > > So yeah, not sure where you would go from tuple to list -- probably at the > bottom level, but that may not always be unambiguous. > > These points make me think that instead of a `.totuple` method, this >> might be more suitable as a new function in np.lib.recfunctions. > > > I don't seem to have that module -- and I'm running 1.14.0 -- is this a > new idea? > > >> If the >> goal is to help manipulate structured arrays, that submodule is >> appropriate since it already has other functions do manipulate fields in >> similar ways. What about calling it `pack_last_axis`? >> >> def pack_last_axis(arr, names=None): >> if arr.names: >> return arr >> names = names or ['f{}'.format(i) for i in range(arr.shape[-1])] >> return arr.view([(n, arr.dtype) for n in names]).squeeze(-1) >> >> Then you could do: >> >> >>> pack_last_axis(uv).tolist() >> >> to get a list of tuples. >> > > not sure what idea is here -- in my example, I had a regular 2-d array, so > no names: > > In [*90*]: pack_last_axis(uv) > > --------------------------------------------------------------------------- > > AttributeError Traceback (most recent call > last) > > in () > > ----> 1 pack_last_axis(uv) > > > in pack_last_axis(arr, names) > > * 1* def pack_last_axis(arr, names=None): > > ----> 2 if arr.names: > > * 3* return arr > > * 4* names = names or ['f{}'.format(i) for i in range(arr.shape[- > 1])] > > * 5* return arr.view([(n, arr.dtype) for n in names]).squeeze(-1) > > > AttributeError: 'numpy.ndarray' object has no attribute 'names' > > > So maybe you meants something like: > > > In [*95*]: *def* pack_last_axis(arr, names=None): > > ...: *try*: > > ...: arr.names > > ...: *return* arr > > ...: *except* *AttributeError*: > > ...: names = names *or* ['f{}'.format(i) *for* i *in* range > (arr.shape[-1])] > > ...: *return* arr.view([(n, arr.dtype) *for* n *in* > names]).squeeze(-1) > > which does work, but seems like a convoluted way to get tuples! > > However, I didn't actually need tuples, I needed something I could pack > into a stuctarray, and this does work, without the tolist: > > full = np.array(zip(time, pack_last_axis(uv)), dtype=dt) > > > So maybe that is the way to go. > > I'm not sure I'd have thought to look for this function, but what can you > do? > > Thanks for your attention to this, > > -CHB > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Fri Jan 26 17:35:00 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Fri, 26 Jan 2018 17:35:00 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> Message-ID: <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> On 01/26/2018 03:38 PM, Chris Barker wrote: > I was hoping it would dig down to the inner structures looking for a > match to the dtype, rather than looking at the type of the top level. Oh > well. > > So yeah, not sure where you would go from tuple to list -- probably at > the bottom level, but that may not always be unambiguous. As I remember, numpy has some fairly convoluted code for array creation which tries to make sense of various nested lists/tuples/ndarray combinations. It makes a difference for structured arrays and object arrays. I don't remember the details right now, but I know in some cases the rule is "If it's a Python list, recurse, otherwise assume it is an object array". While numpy does try to be lenient, I think we should guide the user to assume that if they want to specify a structured element, they should only use a tuple or a structured scalar, and if they want to specify a new dimension of elements, they should use a list. I expect less headaches that way. > These points make me think that instead of a `.totuple` method, this > might be more suitable as a new function in np.lib.recfunctions. > > I don't seem to have that module -- and I'm running 1.14.0 -- is this a > new idea? Sorry, I didn't specify it correctly. It is "numpy.lib.recfunctions". It is actually quite old, but has never been officially documented. I think that is because it has been considered "provisional" for a long time. See https://github.com/numpy/numpy/issues/5008 https://github.com/numpy/numpy/issues/2805 I still hesitate to make it more official now, since I'm not sure that structured arrays are yet bug-free enough to encourage more complex uses. Also, the functions in that module encourage "pandas-like" use of structured arrays, but I'm not sure they should be used that way. I've been thinking they should be primarily used for binary interfaces with/to numpy, eg to talk to C programs or to read complicated binary files. > However, I didn't actually need tuples, I needed something I could pack > into a stuctarray, and this does work, without the tolist: > > full = np.array(zip(time, pack_last_axis(uv)), dtype=dt) > > > So maybe that is the way to go. Right, that was my feeling: That we didn't really need `.totuple`, what we actually wanted is a special function for packing a nonstructured-array as a structured-array. > I'm not sure I'd have thought to look for this function, but what can > you do? > > Thanks for your attention to this, > > -CHB > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice > 7600 Sand Point Way NE ??(206) 526-6329?? fax > Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception > > Chris.Barker at noaa.gov > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From chris.barker at noaa.gov Fri Jan 26 17:48:25 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 26 Jan 2018 14:48:25 -0800 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> Message-ID: On Fri, Jan 26, 2018 at 2:35 PM, Allan Haldane wrote: > As I remember, numpy has some fairly convoluted code for array creation > which tries to make sense of various nested lists/tuples/ndarray > combinations. It makes a difference for structured arrays and object > arrays. I don't remember the details right now, but I know in some cases > the rule is "If it's a Python list, recurse, otherwise assume it is an > object array". > that's at least explainable, and the "try to figure out what the user means" array cratinon is pretty much an impossible problem, so what we've got is probably about as good as it can get. > > These points make me think that instead of a `.totuple` method, this > > might be more suitable as a new function in np.lib.recfunctions. > > > > I don't seem to have that module -- and I'm running 1.14.0 -- is this a > > new idea? > > Sorry, I didn't specify it correctly. It is "numpy.lib.recfunctions". > thanks -- found it. > Also, the functions in that module encourage "pandas-like" use of > structured arrays, but I'm not sure they should be used that way. I've > been thinking they should be primarily used for binary interfaces > with/to numpy, eg to talk to C programs or to read complicated binary > files. > that's my use-case. And I agree -- if you really want to do that kind of thing, pandas is the way to go. I thought recarrays were pretty cool back in the day, but pandas is a much better option. So I pretty much only use structured arrays for data exchange with C code.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jan 26 18:01:52 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 26 Jan 2018 18:01:52 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> Message-ID: On Fri, Jan 26, 2018 at 5:48 PM, Chris Barker wrote: > On Fri, Jan 26, 2018 at 2:35 PM, Allan Haldane > wrote: > >> As I remember, numpy has some fairly convoluted code for array creation >> which tries to make sense of various nested lists/tuples/ndarray >> combinations. It makes a difference for structured arrays and object >> arrays. I don't remember the details right now, but I know in some cases >> the rule is "If it's a Python list, recurse, otherwise assume it is an >> object array". >> > > that's at least explainable, and the "try to figure out what the user > means" array cratinon is pretty much an impossible problem, so what we've > got is probably about as good as it can get. > >> > These points make me think that instead of a `.totuple` method, this >> > might be more suitable as a new function in np.lib.recfunctions. >> > >> > I don't seem to have that module -- and I'm running 1.14.0 -- is this a >> > new idea? >> >> Sorry, I didn't specify it correctly. It is "numpy.lib.recfunctions". >> > > thanks -- found it. > > >> Also, the functions in that module encourage "pandas-like" use of >> structured arrays, but I'm not sure they should be used that way. I've >> been thinking they should be primarily used for binary interfaces >> with/to numpy, eg to talk to C programs or to read complicated binary >> files. >> > > that's my use-case. And I agree -- if you really want to do that kind of > thing, pandas is the way to go. > > I thought recarrays were pretty cool back in the day, but pandas is a much > better option. > > So I pretty much only use structured arrays for data exchange with C > code.... > My impression is that this turns into a deprecate recarrays and supporting recfunction issue. recfunctions and the associated function from matplotlib.mlab where explicitly designed for using structured dtypes as dataframe_like. (old question: does numpy have a sort_rows function now without detouring to structured dtype views?) Josef > > -CHB > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Jan 26 19:28:54 2018 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 27 Jan 2018 09:28:54 +0900 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard wrote: > > I am a firm believer that the current situation is not sustainable. There are a lot of improvements that can practically be incorporated. While many of these are performance related, there are also improvements in accuracy over some ranges of parameters that cannot be incorporated. I also think that perfect stream reproducibility is a bit of a myth across versions since this really would require identical OS, compiler and possibly CPU for some of the generators that produce floats. > > I believe there is a case for separating the random generator from core NumPy. Some points that favor becoming a subproject: > > 1. It is a pure consumer of NumPy API. Other parts of the API do no depend on random. > 2. A stand alone package could be installed along side many different version of core NumPy which would reduce the pressure on freezing the stream. Removing numpy.random (or freezing it as deprecated legacy while all PRNG development moves elsewhere) is probably a non-starter. It's too used for us not to provide something. That said, we can (and ought to) make it much easier for external packages to provide PRNG capabilities (core PRNGs and distributions) that interoperate with the core functionality that numpy provides. I'm also happy to place a high barrier on adding more distributions to numpy.random once that is in place. Specifically, core uniform PRNGs should have a small common C API that distribution functions can use. This might just be a struct with an opaque `void*` state pointer and then 2 function pointers for drawing a uint64 (whole range) and a double in [0,1) from the state. It's important to expose our core uniform PRNGs as a C API because there has been a desire to interoperate at that level, using the same PRNG state inside C or Fortran or GPU code. If that's in place, then people can write new efficient distribution functions in C that use this small C API agnostic to the core PRNG algorithm. It also makes it easy to implement new core PRNGs that the distribution functions provided by numpy.random can use. > In terms of what is needed, I think that the underlying PRNG should be swappable. The will provide a simple mechanism to allow certain types of advancement while easily providing backward compat. In the current design this is very hard and requires compiling many nearly identical copies of RandomState. In pseudocode something like > > standard_normal(prng) > > where prng is a basic class that retains the PRNG state and has a small set of core random number generators that belong to the underlying PRNG -- probably something like int32, int64, double, and possibly int53. I am not advocating explicitly passing the PRNG as an argument, but having generators which can take any suitable PRNG would add a lot of flexibility in terms of taking advantage of improvements in the underlying PRNGs (see, e.g., xoroshiro128/xorshift1024). The "small" core PRNG would have responsibility over state and streams. The remainder of the module would transform the underlying PRNG into the required distributions. (edit: after writing the following verbiage, I realize it can be summed up with more respect to your suggestion: yes, we should do this design, but we don't need to and shouldn't give up on a class with distribution methods.) Once the core PRNG C API is in place, I don't think we necessarily need to move away from a class structure per se, though it becomes an option. We just separate the core PRNG object from the distribution-providing class. We don't need to make copies of the distribution-providing class just to use a new core PRNG. I'm coming around to Nathaniel's suggestion for the constructor API (though not the distribution-versioning, for reasons I can get into later). We have a couple of core uniform PRNG classes like `MT19937` and `PCG128`. Those have a tiny API, and probably don't have a lot of unnecessary code clones between them. Their constructors can be different depending on the different ways they can be instantiated, depending on the PRNG's features. I'm not sure that they'll have any common methods besides `__getstate__/__setstate__` and probably a `copy()`. They will expose their C API as a Python-opaque attribute. They can have whatever algorithm-dependent methods they need (e.g. to support jumpahead). I might not even expose to Python the uint64 and U(0,1) double sampling methods, but maybe so. Then we have a single `Distributions` class that provides all of the distributions that we want to support in numpy.random (i.e. what we currently have on `RandomState` and whatever passes our higher bar in the future). It takes one of the core PRNG instances as an argument to the constructor (nominally, at least; we can design factory functions to make this more convenient). prng = Distributions(PCG128(seed)) x = prng.normal(mean, std) If someone wants to write a WELL512 core PRNG, they can just implement that object and pass it to `Distributions()`. The `Distributions` code doesn't need to be copied, nor do we need to much around with `__new__` tricks in Cython. Why do this instead of distribution functions? Well, I have a damn lot of code that is expecting an object with at least the broad outlines of the `RandomState` interface. I'm not going to update that code to use functions instead. And if I'm not, no one is. There isn't a good transition path since the PRNG object needs to thread through all of the code, whether it's my code that I'm writing greenfield or library code that I don't control. That said, people writing new distributions outside of numpy.random would be writing functions, not trying to add to the `Distributions` class, but that's fine. It also allows the possibility for the `Distributions` class to be stateful if we want to do things like caching the next Box-Muller variate and not force that onto the core PRNG state like I currently do. Though I'd rather just drop Box-Muller, and that's not a common pattern outside of Box-Muller. But it's a possibility. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From solarjoe at posteo.org Sat Jan 27 04:30:47 2018 From: solarjoe at posteo.org (Joe) Date: Sat, 27 Jan 2018 10:30:47 +0100 Subject: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding) In-Reply-To: <88dbefc1-ab01-8efe-420c-102e13e3aaed@gmail.com> References: <87po6e234f.fsf@gmail.com> <878td2ny6m.fsf@otaria.sebmel.org> <8bd9f09ce2eb55140bc65775201ed47c@posteo.de> <88dbefc1-ab01-8efe-420c-102e13e3aaed@gmail.com> Message-ID: Thanks for your help on this! This solved my issue. Am 25.01.2018 um 19:01 schrieb Allan Haldane: > There is a new section discussing alignment in the numpy 1.14 structured > array docs, which has some hints about interfacing with C structs. > > These new 1.14 docs are not online yet on scipy.org, but in the meantime > you can view them here: > https://ahaldane.github.io/user/basics.rec.html#automatic-byte-offsets-and-alignment > > (That links specifically to the discussion of alignments and padding). > > Allan > > On 01/25/2018 11:33 AM, Chris Barker - NOAA Federal wrote: >> >>> >>> The numpy dtype constructor takes an ?align? keyword that will pad it >>> for you. >> >> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html >> >> -CHB >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Sat Jan 27 22:33:55 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 27 Jan 2018 22:33:55 -0500 Subject: [Numpy-discussion] runtime warning in linalg.det Message-ID: I'm trying to figure out some warnings in the statsmodels test suite Why do the following raise RuntimeWarnings? np.linalg.det(np.ones((3,3))) C:\...\python-3.4.4.amd64\lib\site-packages\numpy\linalg\linalg.py:1776: RuntimeWarning: invalid value encountered in det r = _umath_linalg.det(a, signature=signature) Out[21]: 0.0 np.linalg.det(np.zeros((3,3))) C:\...\python-3.4.4.amd64\lib\site-packages\numpy\linalg\linalg.py:1776: RuntimeWarning: invalid value encountered in det r = _umath_linalg.det(a, signature=signature) Out[22]: 0.0 np.__version__ Out[23]: '1.11.2' and is there a way to distinguish those from user problems a = np.ones((3,3)) a[1,1] = np.nan np.linalg.det(a) C:\...\WinPython-64bit-3.4.4.5Qt5\python-3.4.4.amd64\lib\site-packages\numpy\linalg\linalg.py:1776: RuntimeWarning: invalid value encountered in det r = _umath_linalg.det(a, signature=signature) Out[26]: nan Josef -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Sat Jan 27 23:48:34 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Sat, 27 Jan 2018 23:48:34 -0500 Subject: [Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+ In-Reply-To: References: <196d4e06-5d46-aa58-cb8f-7a033c30aaed@gmail.com> <20180125181631.l5cxygykjshpp44f@fastmail.com> Message-ID: On 01/25/2018 03:56 PM, josef.pktd at gmail.com wrote: > > > On Thu, Jan 25, 2018 at 1:49 PM, Marten van Kerkwijk > > wrote: > > On Thu, Jan 25, 2018 at 1:16 PM, Stefan van der Walt > > wrote: > > On Mon, 22 Jan 2018 10:11:08 -0500, Marten van Kerkwijk wrote: > >> > >> I think on the consistency argument is perhaps the most important: > >> views are very powerful and in many ways one *counts* on them > >> happening, especially in working with large arrays. > > > > > > I had the same gut feeling, but the fancy indexing example made me > > pause: > > > > In [9]: x = np.arange(12, dtype=float).reshape((3, 4)) > > > > In [10]: p = x[[0, 1]]? # copy of data > > > > Then: > > > > In [11]: x = np.array([(0, 1), (2, 3)], dtype=[('a', int), ('b', int)]) > > > > In [12]: p = x[['a', 'b']]? # copy of data, but proposal will change that > > > What does this do? > p = x[['a', 'b']].copy() In 1.14.0 this creates an exact copy of what was returned by `x[['a', 'b']]`, including any padding bytes. > My impression is that the problems with the view are because the padded > view doesn't behave like a "standard" dtype or array, i.e. the follow-up > behavior is the problematic part. I think the padded view is a "standard array" in the sense that you can easily create structured arrays with padding bytes, for example by using the `align=True` options. >>> np.zeros(3, dtype=np.dtype('u1,f4', align=True)) array([(0, 0.), (0, 0.), (0, 0.)], dtype={'names':['f0','f1'], 'formats':['u1','>> np.zeros(3, dtype='u1,u1,u1,u1,f4')[['f0', 'f4']] array([(0, 0.), (0, 0.), (0, 0.)], dtype={'names':['f0','f4'], 'formats':['u1',' Josef > > > > > We're not doing the same kind of indexing here exactly (in one case we > > grab elements, in the other parts of elements), but the view behavior > > may still break the "mental expectation". > > A bit off-topic, but maybe this is another argument to just allow > `x['a', 'b']` -- I never understood why a tuple was not the > appropriate iterable for getting multiple items from a record. > > -- Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From allanhaldane at gmail.com Sat Jan 27 23:50:58 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Sat, 27 Jan 2018 23:50:58 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> Message-ID: <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> On 01/26/2018 06:01 PM, josef.pktd at gmail.com wrote: > I thought recarrays were pretty cool back in the day, but pandas is > a much better option. > > So I pretty much only use structured arrays for data exchange with C > code.... > > My impression is that this turns into a deprecate recarrays and > supporting recfunction issue. > > recfunctions and the associated function from matplotlib.mlab where > explicitly designed for using structured dtypes as dataframe_like > > (old question: does numpy have a sort_rows function now without > detouring to structured dtype views?) No, that's still the way to do it. *should* we have any dataframe-like functionality in numpy? We get requests every once in a while about how to sort rows, or about adding a "groupby" function. I myself have used recarrays in a dataframe-like way, when I wanted a quick multiple-array object that supported numpy indexing. So there is some demand to have minimal "dataframe-like" behavior in numpy itself. recarrays play part of this role currently, though imperfectly due to padding and cache issues. I think I'm comfortable with supporting some minor use of structured/recarrays as dataframe-like, with a warning in docs that the user should really look at pandas/xarray, and that structured arrays are primarily for data exchange. (If we want to dream, maybe one day we should make a minimal multiple-array container class. I imagine it would look pretty similar to recarray, but stored as a set of arrays instead of a structured array. But maybe recarrays are good enough, and let's not reimplement pandas either.) Allan > Josef > > > > -CHB > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 ?? voice > 7600 Sand Point Way NE (206) 526-6329 ?? fax > Seattle, WA ?98115 (206) 526-6317 ?? main > reception > > Chris.Barker at noaa.gov From mathoscope at netcourrier.com Sun Jan 28 01:07:30 2018 From: mathoscope at netcourrier.com (Vincent Douce Mathoscope) Date: Sun, 28 Jan 2018 07:07:30 +0100 Subject: [Numpy-discussion] linspaces Message-ID: hi i have 2 questions - i have a parametric plot : T=np.linespace(things) X=f(T)*np.cos(T) Y=f(T)*np.sin(T) i would like to get the first and the last point of this polar curve in the specs of "linspace" (https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html ) i dont see how to do that automatically is there something simple like "first(X)" and "last(X)" - i have a function : rotation(vect,theta) which takes vect=[a,b] and theta-rotates it by a matrix how to use this function or such a function to rotate a tuple (X,Y) with X,Y as described above in this message ? ?????????????????????????? Vincent Douce :=: Mathoscope :=: http://mathoscope.xyz 06?13?11?07?26 Bagn?res de Bigorre 65200 -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Sun Jan 28 01:28:06 2018 From: pmhobson at gmail.com (Paul Hobson) Date: Sat, 27 Jan 2018 22:28:06 -0800 Subject: [Numpy-discussion] linspaces In-Reply-To: References: Message-ID: I'm not sure I understand the question, but in python, you can get the first and elements of most types of sequences with e.g., X[0] and Y[-1], respectively. Is the second part of your question just a dot product? On Sat, Jan 27, 2018 at 10:07 PM, Vincent Douce Mathoscope < mathoscope at netcourrier.com> wrote: > hi > i have 2 questions > > - i have a parametric plot : > T=np.linespace(things) > X=f(T)*np.cos(T) > Y=f(T)*np.sin(T) > i would like to get the first and the last point of this polar curve > in the specs of "linspace" (https://docs.scipy.org/doc/ > numpy/reference/generated/numpy.linspace.html) i dont see how to do that > automatically > is there something simple like "first(X)" and "last(X)" > > - i have a function : > rotation(vect,theta) which takes vect=[a,b] and theta-rotates it by a > matrix > how to use this function or such a function to rotate a tuple (X,Y) with > X,Y as described above in this message ? > > ?????????????????????????? > Vincent Douce > :=: Mathoscope :=: > http://mathoscope.xyz > 06?13?11?07?26 > Bagn?res de Bigorre 65200 > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Jan 29 12:38:02 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 29 Jan 2018 09:38:02 -0800 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> Message-ID: On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane wrote: > On 01/26/2018 06:01 PM, josef.pktd at gmail.com wrote: > >> I thought recarrays were pretty cool back in the day, but pandas is >> a much better option. >> >> So I pretty much only use structured arrays for data exchange with C >> code.... >> >> My impression is that this turns into a deprecate recarrays and >> supporting recfunction issue. >> >> > *should* we have any dataframe-like functionality in numpy? > > We get requests every once in a while about how to sort rows, or about > adding a "groupby" function. I myself have used recarrays in a > dataframe-like way, when I wanted a quick multiple-array object that > supported numpy indexing. So there is some demand to have minimal > "dataframe-like" behavior in numpy itself. > > recarrays play part of this role currently, though imperfectly due to > padding and cache issues. I think I'm comfortable with supporting some > minor use of structured/recarrays as dataframe-like, with a warning in docs > that the user should really look at pandas/xarray, and that structured > arrays are primarily for data exchange. > Well, I think we should either: deprecate recarrays -- i.e. explicitly not support DataFrame-like functionality in numpy, keeping only the data-exchange functionality as maintained. or Properly support it -- which doesn't mean re-implementing Pandas or xarray, but it would mean addressing any bug-like issues like not dealing properly with padding. Personally, I don't need/want it enough to contribute, but if someone does, great. This reminds me a bit of the old numpy.Matrix issue -- it was ALMOST there, but not quite, with issues, and there was essentially no overlap between the people that wanted it and the people that had the time and skills to really make it work. (If we want to dream, maybe one day we should make a minimal multiple-array > container class. I imagine it would look pretty similar to recarray, but > stored as a set of arrays instead of a structured array. But maybe > recarrays are good enough, and let's not reimplement pandas either.) > Exactly -- we really don't need to re-implement Pandas.... (except it's CSV reading capability :-) ) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Mon Jan 29 13:22:12 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Mon, 29 Jan 2018 18:22:12 +0000 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> Message-ID: I think that there's a lot of confusion going around about recarrays vs structured arrays. [`recarray`]( https://github.com/numpy/numpy/blob/v1.13.0/numpy/core/records.py) are a wrapper around structured arrays that provide: * Attribute access to fields as `arr.field` in addition to the normal `arr['field']` * Automatic datatype-guessing for nested lists of tuples (which needs a little work, but seems like a justifiable feature) * An undocumented `field` method that behaves like the 1.14 indexing behavior (!) Meanwhile, `recfunctions` is a collection of functions that work on normal structured arrays - so is misleadingly named. The only link to recarrays is that most of the functions have a `asrecarray` parameter which applies `.view(recarray)` to the result. > deprecate recarrays Given how thin an abstraction they are over structured arrays, I don't think you mean this. Are you advocating for deprecating structured arrays entirely, or just deprecating recfunctions? Eric On Mon, 29 Jan 2018 at 09:39 Chris Barker wrote: > On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane > wrote: > >> On 01/26/2018 06:01 PM, josef.pktd at gmail.com wrote: >> >>> I thought recarrays were pretty cool back in the day, but pandas is >>> a much better option. >>> >>> So I pretty much only use structured arrays for data exchange with C >>> code.... >>> >>> My impression is that this turns into a deprecate recarrays and >>> supporting recfunction issue. >>> >>> > >> *should* we have any dataframe-like functionality in numpy? > > >> >> We get requests every once in a while about how to sort rows, or about >> adding a "groupby" function. I myself have used recarrays in a >> dataframe-like way, when I wanted a quick multiple-array object that >> supported numpy indexing. So there is some demand to have minimal >> "dataframe-like" behavior in numpy itself. >> >> recarrays play part of this role currently, though imperfectly due to >> padding and cache issues. I think I'm comfortable with supporting some >> minor use of structured/recarrays as dataframe-like, with a warning in docs >> that the user should really look at pandas/xarray, and that structured >> arrays are primarily for data exchange. >> > > Well, I think we should either: > > deprecate recarrays -- i.e. explicitly not support DataFrame-like > functionality in numpy, keeping only the data-exchange functionality as > maintained. > > or > > Properly support it -- which doesn't mean re-implementing Pandas or > xarray, but it would mean addressing any bug-like issues like not dealing > properly with padding. > > Personally, I don't need/want it enough to contribute, but if someone > does, great. > > This reminds me a bit of the old numpy.Matrix issue -- it was ALMOST > there, but not quite, with issues, and there was essentially no overlap > between the people that wanted it and the people that had the time and > skills to really make it work. > > (If we want to dream, maybe one day we should make a minimal >> multiple-array container class. I imagine it would look pretty similar to >> recarray, but stored as a set of arrays instead of a structured array. But >> maybe recarrays are good enough, and let's not reimplement pandas either.) >> > > Exactly -- we really don't need to re-implement Pandas.... > > (except it's CSV reading capability :-) ) > > -CHB > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 29 14:10:56 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 29 Jan 2018 14:10:56 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <66031cd1-b872-203b-bdcc-a4416efbf946@gmail.com> <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> Message-ID: On Mon, Jan 29, 2018 at 1:22 PM, Eric Wieser wrote: > I think that there's a lot of confusion going around about recarrays vs > structured arrays. > > [`recarray`](https://github.com/numpy/numpy/blob/v1.13.0/ > numpy/core/records.py) are a wrapper around structured arrays that > provide: > * Attribute access to fields as `arr.field` in addition to the normal > `arr['field']` > * Automatic datatype-guessing for nested lists of tuples (which needs a > little work, but seems like a justifiable feature) > * An undocumented `field` method that behaves like the 1.14 indexing > behavior (!) > > Meanwhile, `recfunctions` is a collection of functions that work on normal > structured arrays - so is misleadingly named. > The only link to recarrays is that most of the functions have a > `asrecarray` parameter which applies `.view(recarray)` to the result. > > > deprecate recarrays > > Given how thin an abstraction they are over structured arrays, I don't > think you mean this. > Are you advocating for deprecating structured arrays entirely, or just > deprecating recfunctions? > First, statsmodels is in the pandas camp for dataframes, so I don't have any invested interest in recarrays/structured dtypes anymore. What I meant was that structured dtypes with implicit (hidden?) padding becomes unintuitive for the recarray/dataframe usecase. (At least I won't try to update my intuition about having extra things in there that are not specified by the main structured dtype.) Also the dataframe_like usage of structured dtypes doesn't seem to be much under consideration anymore. So, my **impression** is that the recent changes make the recarray/dataframe usecase for structured dtypes more difficult. Given that there is pandas, xarray, dask and more, numpy could as well drop any pretense of supporting dataframe_likes. Or, adjust the recfunctions so we can still work dataframe_like with structured dtypes/recarrays/recfunctions. Josef > > Eric > > On Mon, 29 Jan 2018 at 09:39 Chris Barker wrote: > >> On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane >> wrote: >> >>> On 01/26/2018 06:01 PM, josef.pktd at gmail.com wrote: >>> >>>> I thought recarrays were pretty cool back in the day, but pandas is >>>> a much better option. >>>> >>>> So I pretty much only use structured arrays for data exchange with C >>>> code.... >>>> >>>> My impression is that this turns into a deprecate recarrays and >>>> supporting recfunction issue. >>>> >>>> >> >>> *should* we have any dataframe-like functionality in numpy? >> >> >>> >>> We get requests every once in a while about how to sort rows, or about >>> adding a "groupby" function. I myself have used recarrays in a >>> dataframe-like way, when I wanted a quick multiple-array object that >>> supported numpy indexing. So there is some demand to have minimal >>> "dataframe-like" behavior in numpy itself. >>> >>> recarrays play part of this role currently, though imperfectly due to >>> padding and cache issues. I think I'm comfortable with supporting some >>> minor use of structured/recarrays as dataframe-like, with a warning in docs >>> that the user should really look at pandas/xarray, and that structured >>> arrays are primarily for data exchange. >>> >> >> Well, I think we should either: >> >> deprecate recarrays -- i.e. explicitly not support DataFrame-like >> functionality in numpy, keeping only the data-exchange functionality as >> maintained. >> >> or >> >> Properly support it -- which doesn't mean re-implementing Pandas or >> xarray, but it would mean addressing any bug-like issues like not dealing >> properly with padding. >> >> Personally, I don't need/want it enough to contribute, but if someone >> does, great. >> >> This reminds me a bit of the old numpy.Matrix issue -- it was ALMOST >> there, but not quite, with issues, and there was essentially no overlap >> between the people that wanted it and the people that had the time and >> skills to really make it work. >> >> (If we want to dream, maybe one day we should make a minimal >>> multiple-array container class. I imagine it would look pretty similar to >>> recarray, but stored as a set of arrays instead of a structured array. But >>> maybe recarrays are good enough, and let's not reimplement pandas either.) >>> >> >> Exactly -- we really don't need to re-implement Pandas.... >> >> (except it's CSV reading capability :-) ) >> >> -CHB >> >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Mon Jan 29 14:55:48 2018 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Mon, 29 Jan 2018 11:55:48 -0800 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> Message-ID: <20180129195548.dtopwmanutp6ly4u@fastmail.com> On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com wrote: >Given that there is pandas, xarray, dask and more, numpy could as well drop >any pretense of supporting dataframe_likes. Or, adjust the recfunctions so >we can still work dataframe_like with structured >dtypes/recarrays/recfunctions. I haven't been following the duckarray discussion carefully, but could this be an opportunity for a dataframe protocol, so that we can have libraries ingest structured arrays, record arrays, pandas dataframes, etc. without too much specialized code? St?fan From josef.pktd at gmail.com Mon Jan 29 15:24:43 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 29 Jan 2018 15:24:43 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <20180129195548.dtopwmanutp6ly4u@fastmail.com> References: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> Message-ID: On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt wrote: > On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com wrote: > >> Given that there is pandas, xarray, dask and more, numpy could as well >> drop >> any pretense of supporting dataframe_likes. Or, adjust the recfunctions so >> we can still work dataframe_like with structured >> dtypes/recarrays/recfunctions. >> > > I haven't been following the duckarray discussion carefully, but could > this be an opportunity for a dataframe protocol, so that we can have > libraries ingest structured arrays, record arrays, pandas dataframes, > etc. without too much specialized code? > AFAIU while not being in the data handling area, pandas defines the interface and other libraries provide pandas compatible interfaces or implementations. statsmodels currently still has recarray support and usage. In some interfaces we support pandas, recarrays and plain arrays, or anything where asarray works correctly. But recarrays became messy to support, one rewrite of some functions last year converts recarrays to pandas, does the manipulation and then converts back to recarrays. Also we need to adjust our recarray usage with new numpy versions. But there is no real benefit because I doubt that statsmodels still has any recarray/structured dtype users. So, we only have to remove our own uses in the datasets and unit tests. Josef > > St?fan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.debuyl at chem.kuleuven.be Mon Jan 29 15:39:15 2018 From: pierre.debuyl at chem.kuleuven.be (Pierre de Buyl) Date: Mon, 29 Jan 2018 21:39:15 +0100 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: <20180129203915.GJ1201@pi-x230> Hello, On Sat, Jan 27, 2018 at 09:28:54AM +0900, Robert Kern wrote: > On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard > wrote: > > > > In terms of what is needed, I think that the underlying PRNG should > be swappable. The will provide a simple mechanism to allow certain > types of advancement while easily providing backward compat. In the > current design this is very hard and requires compiling many nearly > identical copies of RandomState. In pseudocode something like > > > > standard_normal(prng) > > > > where prng is a basic class that retains the PRNG state and has a > small set of core random number generators that belong to the > underlying PRNG -- probably something like int32, int64, double, and > possibly int53. I am not advocating explicitly passing the PRNG as an > argument, but having generators which can take any suitable PRNG would > add a lot of flexibility in terms of taking advantage of improvements > in the underlying PRNGs (see, e.g., xoroshiro128/xorshift1024). The > "small" core PRNG would have responsibility over state and streams. > The remainder of the module would transform the underlying PRNG into > the required distributions. > (edit: after writing the following verbiage, I realize it can be summed > up with more respect to your suggestion: yes, we should do this design, > but we don't need to and shouldn't give up on a class with distribution > methods.) > Once the core PRNG C API is in place, I don't think we necessarily need > to move away from a class structure per se, though it becomes an > option. (Sorry for cutting so much, I have a short question) My typical use case for the C API of NumPy's random features is that I start coding in pure Python and then switch to Cython. I have at least twice in the past resorted to include "randomkit.h" and use that directly. My last work actually implements a Python/Cython interface for rngs, see http://threefry.readthedocs.io/using_from_cython.html The goal is to use exactly the same backend in Python and Cython, with a cimport and a few cdefs the only changes needed for a first port to Cython. Is this type of feature in discussion or in project for the future of numpy.random? Regards, Pierre From ben.v.root at gmail.com Mon Jan 29 15:44:56 2018 From: ben.v.root at gmail.com (Benjamin Root) Date: Mon, 29 Jan 2018 15:44:56 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> Message-ID: I <3 structured arrays. I love the fact that I can access data by row and then by fieldname, or vice versa. There are times when I need to pass just a column into a function, and there are times when I need to process things row by row. Yes, pandas is nice if you want the specialized indexing features, but it becomes a bear to deal with if all you want is normal indexing, or even the ability to easily loop over the dataset. Cheers! Ben Root On Mon, Jan 29, 2018 at 3:24 PM, wrote: > > > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt > wrote: > >> On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com wrote: >> >>> Given that there is pandas, xarray, dask and more, numpy could as well >>> drop >>> any pretense of supporting dataframe_likes. Or, adjust the recfunctions >>> so >>> we can still work dataframe_like with structured >>> dtypes/recarrays/recfunctions. >>> >> >> I haven't been following the duckarray discussion carefully, but could >> this be an opportunity for a dataframe protocol, so that we can have >> libraries ingest structured arrays, record arrays, pandas dataframes, >> etc. without too much specialized code? >> > > AFAIU while not being in the data handling area, pandas defines the > interface and other libraries provide pandas compatible interfaces or > implementations. > > statsmodels currently still has recarray support and usage. In some > interfaces we support pandas, recarrays and plain arrays, or anything where > asarray works correctly. > > But recarrays became messy to support, one rewrite of some functions last > year converts recarrays to pandas, does the manipulation and then converts > back to recarrays. > Also we need to adjust our recarray usage with new numpy versions. But > there is no real benefit because I doubt that statsmodels still has any > recarray/structured dtype users. So, we only have to remove our own uses in > the datasets and unit tests. > > Josef > > > >> >> St?fan >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jeremy.Solbrig at colostate.edu Mon Jan 29 13:16:02 2018 From: Jeremy.Solbrig at colostate.edu (Solbrig,Jeremy) Date: Mon, 29 Jan 2018 18:16:02 +0000 Subject: [Numpy-discussion] f2py bug in numpy v1.12.0 and above? Message-ID: I have a suite of fortran code that I compile with f2py and use as a plugin to a python package. I am using Python v2.7 from Anaconda. When compiled using numpy v1.11.3 or lower, everything works fine, but if I upgrade to any more recent version I begin running into a runtime error. Presumably I am just not linking something that needs to be linked, but I'm not sure what library that would be. The error: ImportError: .../libstddev_kernel.so: undefined symbol: _gfortran_stop_numeric_f08 Where libestddev_kernel.so is built using this command: f2py --fcompiler=gnu95 --quiet --opt="-O3" -L. -L/usr/lib -I. -I.../include -I/usr/include -m libstddev_kernel -c stddev_kernel/stddev_kernel.f90 config.o Is there something additional I should be linking to within numpy or my anaconda installation? Thank you, Jeremy Jeremy Solbrig Research Associate Cooperative Institute for Research in the Atmosphere Colorado State University jeremy.solbrig at colostate.edu (970) 491-8805 -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jan 29 15:56:12 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 29 Jan 2018 12:56:12 -0800 Subject: [Numpy-discussion] f2py bug in numpy v1.12.0 and above? In-Reply-To: References: Message-ID: Hi, On Mon, Jan 29, 2018 at 10:16 AM, Solbrig,Jeremy wrote: > I have a suite of fortran code that I compile with f2py and use as a plugin > to a python package. I am using Python v2.7 from Anaconda. When compiled > using numpy v1.11.3 or lower, everything works fine, but if I upgrade to any > more recent version I begin running into a runtime error. Presumably I am > just not linking something that needs to be linked, but I'm not sure what > library that would be. > > > The error: > > ImportError: .../libstddev_kernel.so: undefined symbol: > _gfortran_stop_numeric_f08 > > Where libestddev_kernel.so is built using this command: > f2py --fcompiler=gnu95 --quiet --opt="-O3" -L. -L/usr/lib -I. -I.../include > -I/usr/include -m libstddev_kernel -c stddev_kernel/stddev_kernel.f90 > config.o > > > Is there something additional I should be linking to within numpy or my > anaconda installation? Do you get the same problem with a pip install of numpy? If not, this may be an Anaconda packaging bug rather than a numpy bug ... Cheers, Matthew From andyfaff at gmail.com Mon Jan 29 16:02:02 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Tue, 30 Jan 2018 08:02:02 +1100 Subject: [Numpy-discussion] f2py bug in numpy v1.12.0 and above? In-Reply-To: References: Message-ID: Something similar was mentioned at https://github.com/scipy/scipy/issues/8325. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 29 16:02:39 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 29 Jan 2018 16:02:39 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> Message-ID: On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root wrote: > I <3 structured arrays. I love the fact that I can access data by row and > then by fieldname, or vice versa. There are times when I need to pass just > a column into a function, and there are times when I need to process things > row by row. Yes, pandas is nice if you want the specialized indexing > features, but it becomes a bear to deal with if all you want is normal > indexing, or even the ability to easily loop over the dataset. > I don't think there is a doubt that structured arrays, arrays with structured dtypes, are a useful container. The question is whether they should be more or the foundation for more. For example, computing a mean, or reduce operation, over numeric element ("columns"). Before padded views it was possible to index by selecting the relevant "columns" and view them as standard array. With padded views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute a mean of some "columns". (I don't have numpy 1.14 to try or find a workaround, like maybe looping over all relevant columns.) Josef > > Cheers! > Ben Root > > On Mon, Jan 29, 2018 at 3:24 PM, wrote: > >> >> >> On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt < >> stefanv at berkeley.edu> wrote: >> >>> On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com wrote: >>> >>>> Given that there is pandas, xarray, dask and more, numpy could as well >>>> drop >>>> any pretense of supporting dataframe_likes. Or, adjust the recfunctions >>>> so >>>> we can still work dataframe_like with structured >>>> dtypes/recarrays/recfunctions. >>>> >>> >>> I haven't been following the duckarray discussion carefully, but could >>> this be an opportunity for a dataframe protocol, so that we can have >>> libraries ingest structured arrays, record arrays, pandas dataframes, >>> etc. without too much specialized code? >>> >> >> AFAIU while not being in the data handling area, pandas defines the >> interface and other libraries provide pandas compatible interfaces or >> implementations. >> >> statsmodels currently still has recarray support and usage. In some >> interfaces we support pandas, recarrays and plain arrays, or anything where >> asarray works correctly. >> >> But recarrays became messy to support, one rewrite of some functions last >> year converts recarrays to pandas, does the manipulation and then converts >> back to recarrays. >> Also we need to adjust our recarray usage with new numpy versions. But >> there is no real benefit because I doubt that statsmodels still has any >> recarray/structured dtype users. So, we only have to remove our own uses in >> the datasets and unit tests. >> >> Josef >> >> >> >>> >>> St?fan >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Mon Jan 29 16:11:18 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 29 Jan 2018 16:11:18 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> Message-ID: On 01/29/2018 04:02 PM, josef.pktd at gmail.com wrote: > > > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root > wrote: > > I <3 structured arrays. I love the fact that I can access data by > row and then by fieldname, or vice versa. There are times when I > need to pass just a column into a function, and there are times when > I need to process things row by row. Yes, pandas is nice if you want > the specialized indexing features, but it becomes a bear to deal > with if all you want is normal indexing, or even the ability to > easily loop over the dataset. > > > I don't think there is a doubt that structured arrays, arrays with > structured dtypes, are a useful container. The question is whether they > should be more or the foundation for more. > > For example, computing a mean, or reduce operation, over numeric element > ("columns"). Before padded views it was possible to index by selecting > the relevant "columns" and view them as standard array. With padded > views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute > a mean of some "columns". (I don't have numpy 1.14 to try or find a > workaround, like maybe looping over all relevant columns.) > > Josef Just to clarify, structured types have always had padding bytes, that isn't new. What *is* new (which we are pushing to 1.15, I think) is that it may be somewhat more common to end up with padding than before, and only if you are specifically using multi-field indexing, which is a fairly specialized case. I think recfunctions already account properly for padding bytes. Except for the bug in #8100, which we will fix, padding-bytes in recarrays are more or less invisible to a non-expert who only cares about dataframe-like behavior. In other words, padding is no obstacle at all to computing a mean over a column, and single-field indexes in 1.15 behave identically as before. The only thing that will change in 1.15 is multi-field indexing, and it has never been possible to compute a mean (or any binary operation) on multiple fields. Allan > > Cheers! > Ben Root > > On Mon, Jan 29, 2018 at 3:24 PM, > wrote: > > > > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt > > wrote: > > On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com > wrote: > > Given that there is pandas, xarray, dask and more, numpy > could as well drop > any pretense of supporting dataframe_likes. Or, adjust > the recfunctions so > we can still work dataframe_like with structured > dtypes/recarrays/recfunctions. > > > I haven't been following the duckarray discussion carefully, > but could > this be an opportunity for a dataframe protocol, so that we > can have > libraries ingest structured arrays, record arrays, pandas > dataframes, > etc. without too much specialized code? > > > AFAIU while not being in the data handling area, pandas defines > the interface and other libraries provide pandas compatible > interfaces or implementations. > > statsmodels currently still has recarray support and usage. In > some interfaces we support pandas, recarrays and plain arrays, > or anything where asarray works correctly. > > But recarrays became messy to support, one rewrite of some > functions last year converts recarrays to pandas, does the > manipulation and then converts back to recarrays. > Also we need to adjust our recarray usage with new numpy > versions. But there is no real benefit because I doubt that > statsmodels still has any recarray/structured dtype users. So, > we only have to remove our own uses in the datasets and unit tests. > > Josef > > ? > > > St?fan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Mon Jan 29 16:16:06 2018 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Jan 2018 06:16:06 +0900 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: <20180129203915.GJ1201@pi-x230> References: <20180129203915.GJ1201@pi-x230> Message-ID: On Tue, Jan 30, 2018 at 5:39 AM, Pierre de Buyl < pierre.debuyl at chem.kuleuven.be> wrote: > > Hello, > > On Sat, Jan 27, 2018 at 09:28:54AM +0900, Robert Kern wrote: > > On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard > > wrote: > > > > > > In terms of what is needed, I think that the underlying PRNG should > > be swappable. The will provide a simple mechanism to allow certain > > types of advancement while easily providing backward compat. In the > > current design this is very hard and requires compiling many nearly > > identical copies of RandomState. In pseudocode something like > > > > > > standard_normal(prng) > > > > > > where prng is a basic class that retains the PRNG state and has a > > small set of core random number generators that belong to the > > underlying PRNG -- probably something like int32, int64, double, and > > possibly int53. I am not advocating explicitly passing the PRNG as an > > argument, but having generators which can take any suitable PRNG would > > add a lot of flexibility in terms of taking advantage of improvements > > in the underlying PRNGs (see, e.g., xoroshiro128/xorshift1024). The > > "small" core PRNG would have responsibility over state and streams. > > The remainder of the module would transform the underlying PRNG into > > the required distributions. > > (edit: after writing the following verbiage, I realize it can be summed > > up with more respect to your suggestion: yes, we should do this design, > > but we don't need to and shouldn't give up on a class with distribution > > methods.) > > Once the core PRNG C API is in place, I don't think we necessarily need > > to move away from a class structure per se, though it becomes an > > option. > > (Sorry for cutting so much, I have a short question) > > My typical use case for the C API of NumPy's random features is that I start > coding in pure Python and then switch to Cython. I have at least twice in the > past resorted to include "randomkit.h" and use that directly. My last work > actually implements a Python/Cython interface for rngs, see > http://threefry.readthedocs.io/using_from_cython.html > > The goal is to use exactly the same backend in Python and Cython, with a cimport > and a few cdefs the only changes needed for a first port to Cython. > > Is this type of feature in discussion or in project for the future of > numpy.random? Sort of, but not really. For sure, once we've made the decisions that let us move forward to a new design, we'll have Cython implementations that can be used natively from Cython as well as Python without code changes. *But* it's not going to be an automatic speedup like your threefry library allows. You designed that API such that each of the methods returns a single scalar, so all you need to do is declare your functions `cpdef` and provide a `.pxd`. Our methods return either a scalar or an array depending on the arguments, so the methods will be declared to return `object`, and you will pay the overhead costs for checking the arguments and such. We're not going to change that Python API; we're only considering dropping stream-compatibility, not source-compatibility. I would like to make sure that we do expose a C/Cython API to the distribution functions (i.e. that only draw a single variate and return a scalar), but it's not likely to look exactly like the Python API. There might be clever tricks that we can do to minimize the amount of changes that one needs to do, though, if you are only drawing single variates at a time (e.g. an agent-based simulation) and you want to make it go faster by moving to Cython. For example, maybe we collect all of the single-variate C-implemented methods into a single object sitting as an attribute on the `Distributions` object. cdef class DistributionsCAPI: cdef double normal(double loc, double scale) cdef double uniform(double low, double high) cdef class Distributions: cdef DistributionsCAPI capi cpdef object normal(loc, scale, size=None): if size is None and np.isscalar(loc) and np.isscalar(scale): return self.capi.normal(loc, scale) else: # ... Make an array prng = Distributions(...) # From Python: x = prng.normal(mean, std) # From Cython: cdef double x = prng.capi.normal(mean, std) But we need to make some higher-level decisions first before we can get down to this level of design. Please do jump in and remind us of this use case once we do get down to actual work on the new API design. Thanks! -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jan 29 16:16:26 2018 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 29 Jan 2018 13:16:26 -0800 Subject: [Numpy-discussion] f2py bug in numpy v1.12.0 and above? In-Reply-To: References: Message-ID: Hi, On Mon, Jan 29, 2018 at 1:02 PM, Andrew Nelson wrote: > Something similar was mentioned at > https://github.com/scipy/scipy/issues/8325. Yes, that one was also for Anaconda... Cheers, Matthew From kevin.k.sheppard at gmail.com Mon Jan 29 16:39:32 2018 From: kevin.k.sheppard at gmail.com (Kevin Sheppard) Date: Mon, 29 Jan 2018 21:39:32 +0000 Subject: [Numpy-discussion] Moving NumPy's PRNG Forward In-Reply-To: References: Message-ID: I agree with pretty much everything you wrote Robert. I didn't have quote the right frame but the generic class that takes a low-level core PRNG sounds like the right design, and this should make user-generated distributions easier to develop. I was thinking along these lines inspired by the SpiPy changes that use a LowLevelCallable, e.g., https://ilovesymposia.com/2017/03/12/scipys-new-lowlevelcallable-is-a-game-changer/ This might also allow users to extend the core PRNGs using something like Numba JIT classes as an alternative. Another area that needs though is how to correctly spawn in Multiprocess application. This might be most easily addressed by providing a guide on a good way rather than the arbitrary way used now. Kevin On Sat, Jan 27, 2018 at 5:03 PM wrote: > Send NumPy-Discussion mailing list submissions to > numpy-discussion at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at python.org > > You can reach the person managing the list at > numpy-discussion-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Moving NumPy's PRNG Forward (Robert Kern) > 2. Re: Using np.frombuffer and cffi.buffer on array of C structs > (problem with struct member padding) (Joe) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 27 Jan 2018 09:28:54 +0900 > From: Robert Kern > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Moving NumPy's PRNG Forward > Message-ID: > w at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > On Sat, Jan 27, 2018 at 1:14 AM, Kevin Sheppard < > kevin.k.sheppard at gmail.com> > wrote: > > > > I am a firm believer that the current situation is not sustainable. > There are a lot of improvements that can practically be incorporated. > While many of these are performance related, there are also improvements in > accuracy over some ranges of parameters that cannot be incorporated. I also > think that perfect stream reproducibility is a bit of a myth across > versions since this really would require identical OS, compiler and > possibly CPU for some of the generators that produce floats. > > > > I believe there is a case for separating the random generator from core > NumPy. Some points that favor becoming a subproject: > > > > 1. It is a pure consumer of NumPy API. Other parts of the API do no > depend on random. > > 2. A stand alone package could be installed along side many different > version of core NumPy which would reduce the pressure on freezing the > stream. > > Removing numpy.random (or freezing it as deprecated legacy while all PRNG > development moves elsewhere) is probably a non-starter. It's too used for > us not to provide something. That said, we can (and ought to) make it much > easier for external packages to provide PRNG capabilities (core PRNGs and > distributions) that interoperate with the core functionality that numpy > provides. I'm also happy to place a high barrier on adding more > distributions to numpy.random once that is in place. > > Specifically, core uniform PRNGs should have a small common C API that > distribution functions can use. This might just be a struct with an opaque > `void*` state pointer and then 2 function pointers for drawing a uint64 > (whole range) and a double in [0,1) from the state. It's important to > expose our core uniform PRNGs as a C API because there has been a desire to > interoperate at that level, using the same PRNG state inside C or Fortran > or GPU code. If that's in place, then people can write new efficient > distribution functions in C that use this small C API agnostic to the core > PRNG algorithm. It also makes it easy to implement new core PRNGs that the > distribution functions provided by numpy.random can use. > > > In terms of what is needed, I think that the underlying PRNG should be > swappable. The will provide a simple mechanism to allow certain types of > advancement while easily providing backward compat. In the current design > this is very hard and requires compiling many nearly identical copies of > RandomState. In pseudocode something like > > > > standard_normal(prng) > > > > where prng is a basic class that retains the PRNG state and has a small > set of core random number generators that belong to the underlying PRNG -- > probably something like int32, int64, double, and possibly int53. I am not > advocating explicitly passing the PRNG as an argument, but having > generators which can take any suitable PRNG would add a lot of flexibility > in terms of taking advantage of improvements in the underlying PRNGs (see, > e.g., xoroshiro128/xorshift1024). The "small" core PRNG would have > responsibility over state and streams. The remainder of the module would > transform the underlying PRNG into the required distributions. > > (edit: after writing the following verbiage, I realize it can be summed up > with more respect to your suggestion: yes, we should do this design, but we > don't need to and shouldn't give up on a class with distribution methods.) > > Once the core PRNG C API is in place, I don't think we necessarily need to > move away from a class structure per se, though it becomes an option. We > just separate the core PRNG object from the distribution-providing class. > We don't need to make copies of the distribution-providing class just to > use a new core PRNG. I'm coming around to Nathaniel's suggestion for the > constructor API (though not the distribution-versioning, for reasons I can > get into later). We have a couple of core uniform PRNG classes like > `MT19937` and `PCG128`. Those have a tiny API, and probably don't have a > lot of unnecessary code clones between them. Their constructors can be > different depending on the different ways they can be instantiated, > depending on the PRNG's features. I'm not sure that they'll have any common > methods besides `__getstate__/__setstate__` and probably a `copy()`. They > will expose their C API as a Python-opaque attribute. They can have > whatever algorithm-dependent methods they need (e.g. to support jumpahead). > I might not even expose to Python the uint64 and U(0,1) double sampling > methods, but maybe so. > > Then we have a single `Distributions` class that provides all of the > distributions that we want to support in numpy.random (i.e. what we > currently have on `RandomState` and whatever passes our higher bar in the > future). It takes one of the core PRNG instances as an argument to the > constructor (nominally, at least; we can design factory functions to make > this more convenient). > > prng = Distributions(PCG128(seed)) > x = prng.normal(mean, std) > > If someone wants to write a WELL512 core PRNG, they can just implement that > object and pass it to `Distributions()`. The `Distributions` code doesn't > need to be copied, nor do we need to much around with `__new__` tricks in > Cython. > > Why do this instead of distribution functions? Well, I have a damn lot of > code that is expecting an object with at least the broad outlines of the > `RandomState` interface. I'm not going to update that code to use functions > instead. And if I'm not, no one is. There isn't a good transition path > since the PRNG object needs to thread through all of the code, whether it's > my code that I'm writing greenfield or library code that I don't control. > That said, people writing new distributions outside of numpy.random would > be writing functions, not trying to add to the `Distributions` class, but > that's fine. > > It also allows the possibility for the `Distributions` class to be stateful > if we want to do things like caching the next Box-Muller variate and not > force that onto the core PRNG state like I currently do. Though I'd rather > just drop Box-Muller, and that's not a common pattern outside of > Box-Muller. But it's a possibility. > > -- > Robert Kern > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/numpy-discussion/attachments/20180127/c76673e3/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Sat, 27 Jan 2018 10:30:47 +0100 > From: Joe > To: numpy-discussion at python.org > Subject: Re: [Numpy-discussion] Using np.frombuffer and cffi.buffer on > array of C structs (problem with struct member padding) > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > Thanks for your help on this! This solved my issue. > > > Am 25.01.2018 um 19:01 schrieb Allan Haldane: > > There is a new section discussing alignment in the numpy 1.14 structured > > array docs, which has some hints about interfacing with C structs. > > > > These new 1.14 docs are not online yet on scipy.org, but in the meantime > > you can view them here: > > > https://ahaldane.github.io/user/basics.rec.html#automatic-byte-offsets-and-alignment > > > > (That links specifically to the discussion of alignments and padding). > > > > Allan > > > > On 01/25/2018 11:33 AM, Chris Barker - NOAA Federal wrote: > >> > >>> > >>> The numpy dtype constructor takes an ?align? keyword that will pad it > >>> for you. > >> > >> > https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html > >> > >> -CHB > >> > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at python.org > >> https://mail.python.org/mailman/listinfo/numpy-discussion > >> > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > ------------------------------ > > End of NumPy-Discussion Digest, Vol 136, Issue 37 > ************************************************* > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 29 17:50:20 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 29 Jan 2018 17:50:20 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> Message-ID: On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane wrote: > On 01/29/2018 04:02 PM, josef.pktd at gmail.com wrote: > > > > > > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root > > wrote: > > > > I <3 structured arrays. I love the fact that I can access data by > > row and then by fieldname, or vice versa. There are times when I > > need to pass just a column into a function, and there are times when > > I need to process things row by row. Yes, pandas is nice if you want > > the specialized indexing features, but it becomes a bear to deal > > with if all you want is normal indexing, or even the ability to > > easily loop over the dataset. > > > > > > I don't think there is a doubt that structured arrays, arrays with > > structured dtypes, are a useful container. The question is whether they > > should be more or the foundation for more. > > > > For example, computing a mean, or reduce operation, over numeric element > > ("columns"). Before padded views it was possible to index by selecting > > the relevant "columns" and view them as standard array. With padded > > views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute > > a mean of some "columns". (I don't have numpy 1.14 to try or find a > > workaround, like maybe looping over all relevant columns.) > > > > Josef > > Just to clarify, structured types have always had padding bytes, that > isn't new. > > What *is* new (which we are pushing to 1.15, I think) is that it may be > somewhat more common to end up with padding than before, and only if you > are specifically using multi-field indexing, which is a fairly > specialized case. > > I think recfunctions already account properly for padding bytes. Except > for the bug in #8100, which we will fix, padding-bytes in recarrays are > more or less invisible to a non-expert who only cares about > dataframe-like behavior. > > In other words, padding is no obstacle at all to computing a mean over a > column, and single-field indexes in 1.15 behave identically as before. > The only thing that will change in 1.15 is multi-field indexing, and it > has never been possible to compute a mean (or any binary operation) on > multiple fields. > from the example in the other thread a[['b', 'c']].view(('f8', 2)).mean(0) (from the statsmodels usecase: read csv with genfromtext to get recarray or structured array select/index the numeric columns view them as standard array do whatever we can do with standard numpy arrays ) Josef > > Allan > > > > > Cheers! > > Ben Root > > > > On Mon, Jan 29, 2018 at 3:24 PM, > > wrote: > > > > > > > > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt > > > wrote: > > > > On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com > > wrote: > > > > Given that there is pandas, xarray, dask and more, numpy > > could as well drop > > any pretense of supporting dataframe_likes. Or, adjust > > the recfunctions so > > we can still work dataframe_like with structured > > dtypes/recarrays/recfunctions. > > > > > > I haven't been following the duckarray discussion carefully, > > but could > > this be an opportunity for a dataframe protocol, so that we > > can have > > libraries ingest structured arrays, record arrays, pandas > > dataframes, > > etc. without too much specialized code? > > > > > > AFAIU while not being in the data handling area, pandas defines > > the interface and other libraries provide pandas compatible > > interfaces or implementations. > > > > statsmodels currently still has recarray support and usage. In > > some interfaces we support pandas, recarrays and plain arrays, > > or anything where asarray works correctly. > > > > But recarrays became messy to support, one rewrite of some > > functions last year converts recarrays to pandas, does the > > manipulation and then converts back to recarrays. > > Also we need to adjust our recarray usage with new numpy > > versions. But there is no real benefit because I doubt that > > statsmodels still has any recarray/structured dtype users. So, > > we only have to remove our own uses in the datasets and unit > tests. > > > > Josef > > > > > > > > > > St?fan > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org python.org> > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Jan 29 17:59:29 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 29 Jan 2018 17:59:29 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <2c7a541a-dcc0-32f1-b18b-3ca35a34bb71@gmail.com> <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> Message-ID: On Mon, Jan 29, 2018 at 5:50 PM, wrote: > > > On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane > wrote: > >> On 01/29/2018 04:02 PM, josef.pktd at gmail.com wrote: >> > >> > >> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root > > > wrote: >> > >> > I <3 structured arrays. I love the fact that I can access data by >> > row and then by fieldname, or vice versa. There are times when I >> > need to pass just a column into a function, and there are times when >> > I need to process things row by row. Yes, pandas is nice if you want >> > the specialized indexing features, but it becomes a bear to deal >> > with if all you want is normal indexing, or even the ability to >> > easily loop over the dataset. >> > >> > >> > I don't think there is a doubt that structured arrays, arrays with >> > structured dtypes, are a useful container. The question is whether they >> > should be more or the foundation for more. >> > >> > For example, computing a mean, or reduce operation, over numeric element >> > ("columns"). Before padded views it was possible to index by selecting >> > the relevant "columns" and view them as standard array. With padded >> > views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute >> > a mean of some "columns". (I don't have numpy 1.14 to try or find a >> > workaround, like maybe looping over all relevant columns.) >> > >> > Josef >> >> Just to clarify, structured types have always had padding bytes, that >> isn't new. >> >> What *is* new (which we are pushing to 1.15, I think) is that it may be >> somewhat more common to end up with padding than before, and only if you >> are specifically using multi-field indexing, which is a fairly >> specialized case. >> >> I think recfunctions already account properly for padding bytes. Except >> for the bug in #8100, which we will fix, padding-bytes in recarrays are >> more or less invisible to a non-expert who only cares about >> dataframe-like behavior. >> >> In other words, padding is no obstacle at all to computing a mean over a >> column, and single-field indexes in 1.15 behave identically as before. >> The only thing that will change in 1.15 is multi-field indexing, and it >> has never been possible to compute a mean (or any binary operation) on >> multiple fields. >> > > from the example in the other thread > a[['b', 'c']].view(('f8', 2)).mean(0) > > > (from the statsmodels usecase: > read csv with genfromtext to get recarray or structured array > select/index the numeric columns > view them as standard array > do whatever we can do with standard numpy arrays > ) > Or, to phrase it as a question: How do we get a standard array with homogeneous dtype from the corresponding elements of a structured dtype in numpy 1.14.0? Josef > > Josef > > >> >> Allan >> >> > >> > Cheers! >> > Ben Root >> > >> > On Mon, Jan 29, 2018 at 3:24 PM, > > > wrote: >> > >> > >> > >> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt >> > > wrote: >> > >> > On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com >> > wrote: >> > >> > Given that there is pandas, xarray, dask and more, numpy >> > could as well drop >> > any pretense of supporting dataframe_likes. Or, adjust >> > the recfunctions so >> > we can still work dataframe_like with structured >> > dtypes/recarrays/recfunctions. >> > >> > >> > I haven't been following the duckarray discussion carefully, >> > but could >> > this be an opportunity for a dataframe protocol, so that we >> > can have >> > libraries ingest structured arrays, record arrays, pandas >> > dataframes, >> > etc. without too much specialized code? >> > >> > >> > AFAIU while not being in the data handling area, pandas defines >> > the interface and other libraries provide pandas compatible >> > interfaces or implementations. >> > >> > statsmodels currently still has recarray support and usage. In >> > some interfaces we support pandas, recarrays and plain arrays, >> > or anything where asarray works correctly. >> > >> > But recarrays became messy to support, one rewrite of some >> > functions last year converts recarrays to pandas, does the >> > manipulation and then converts back to recarrays. >> > Also we need to adjust our recarray usage with new numpy >> > versions. But there is no real benefit because I doubt that >> > statsmodels still has any recarray/structured dtype users. So, >> > we only have to remove our own uses in the datasets and unit >> tests. >> > >> > Josef >> > >> > >> > >> > >> > St?fan >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org > n.org> >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org > > >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jeremy.Solbrig at colostate.edu Mon Jan 29 16:09:41 2018 From: Jeremy.Solbrig at colostate.edu (Solbrig,Jeremy) Date: Mon, 29 Jan 2018 21:09:41 +0000 Subject: [Numpy-discussion] f2py bug in numpy v1.12.0 and above? In-Reply-To: References: , Message-ID: I think that I actually may have this solved now. It seems that I can resolve the issue by running "conda install gcc" then linking against the anaconda gfortran libraries rather than my system install. The only problem is that installing gcc through anaconda causes anaconda's version of gcc to supersede the system version... Anyway, not a numpy issue it seems, just an Anaconda packaging issue as you both said. ________________________________ From: NumPy-Discussion on behalf of Andrew Nelson Sent: Monday, January 29, 2018 2:02:02 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] f2py bug in numpy v1.12.0 and above? Something similar was mentioned at https://github.com/scipy/scipy/issues/8325. -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Mon Jan 29 22:44:11 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Mon, 29 Jan 2018 22:44:11 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> Message-ID: <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> On 01/29/2018 05:59 PM, josef.pktd at gmail.com wrote: > > > On Mon, Jan 29, 2018 at 5:50 PM, > wrote: > > > > On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane > > wrote: > > On 01/29/2018 04:02 PM, josef.pktd at gmail.com > wrote: > > > > > > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root > > >> wrote: > > > >? ? ?I <3 structured arrays. I love the fact that I can access data by > >? ? ?row and then by fieldname, or vice versa. There are times when I > >? ? ?need to pass just a column into a function, and there are times when > >? ? ?I need to process things row by row. Yes, pandas is nice if you want > >? ? ?the specialized indexing features, but it becomes a bear to deal > >? ? ?with if all you want is normal indexing, or even the ability to > >? ? ?easily loop over the dataset. > > > > > > I don't think there is a doubt that structured arrays, arrays with > > structured dtypes, are a useful container. The question is whether they > > should be more or the foundation for more. > > > > For example, computing a mean, or reduce operation, over numeric element > > ("columns"). Before padded views it was possible to index by selecting > > the relevant "columns" and view them as standard array. With padded > > views that breaks and AFAICS, there is no way in numpy 1.14.0 to compute > > a mean of some "columns". (I don't have numpy 1.14 to try or find a > > workaround, like maybe looping over all relevant columns.) > > > > Josef > > Just to clarify, structured types have always had padding bytes, > that > isn't new. > > What *is* new (which we are pushing to 1.15, I think) is that it > may be > somewhat more common to end up with padding than before, and > only if you > are specifically using multi-field indexing, which is a fairly > specialized case. > > I think recfunctions already account properly for padding bytes. > Except > for the bug in #8100, which we will fix, padding-bytes in > recarrays are > more or less invisible to a non-expert who only cares about > dataframe-like behavior. > > In other words, padding is no obstacle at all to computing a > mean over a > column, and single-field indexes in 1.15 behave identically as > before. > The only thing that will change in 1.15 is multi-field indexing, > and it > has never been possible to compute a mean (or any binary > operation) on > multiple fields. > > > from the example in the other thread > a[['b', 'c']].view(('f8', 2)).mean(0) > > > (from the statsmodels usecase: > read csv with genfromtext to get recarray or structured array > select/index the numeric columns > view them as standard array > do whatever we can do with standard numpy? arrays > ) Oh ok, I misunderstood. I see your point: a mean over fields is more difficult than before. > Or, to phrase it as a question: > > How do we get a standard array with homogeneous dtype from the > corresponding elements of a structured dtype in numpy 1.14.0? > > Josef The answer may be that "numpy has never had a way to that", even if in a few special cases you might hack a workaround using views. That's what your example seems like to me. It uses an explicit view, which is an "expert" feature since views depend on the exact memory layout and binary representation of the array. Your example only works if the two fields have exactly the same dtype as each other and as the final dtype, and evidently breaks if there is byte padding for any reason. Pandas can do row means without these problems: >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0) Numpy is missing this functionality, so you or whoever wrote that example figured out a fragile workaround using views. I suggest that if we want to allow either means over fields, or conversion of a n-D structured array to an n+1-D regular ndarray, we should add a dedicated function to do so in numpy.lib.recfunctions which does not depend on the binary representation of the array. Allan > Josef > > > Allan > > > > >? ? ?Cheers! > >? ? ?Ben Root > > > >? ? ?On Mon, Jan 29, 2018 at 3:24 PM, > >? ? ?>> wrote: > > > > > > > >? ? ? ? ?On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt > >? ? ? ? ? > >> wrote: > > > >? ? ? ? ? ? ?On Mon, 29 Jan 2018 14:10:56 -0500, josef.pktd at gmail.com > >? ? ? ? ? ? ? > wrote: > > > >? ? ? ? ? ? ? ? ?Given that there is pandas, xarray, dask and > more, numpy > >? ? ? ? ? ? ? ? ?could as well drop > >? ? ? ? ? ? ? ? ?any pretense of supporting dataframe_likes. > Or, adjust > >? ? ? ? ? ? ? ? ?the recfunctions so > >? ? ? ? ? ? ? ? ?we can still work dataframe_like with structured > >? ? ? ? ? ? ? ? ?dtypes/recarrays/recfunctions. > > > > > >? ? ? ? ? ? ?I haven't been following the duckarray discussion > carefully, > >? ? ? ? ? ? ?but could > >? ? ? ? ? ? ?this be an opportunity for a dataframe protocol, > so that we > >? ? ? ? ? ? ?can have > >? ? ? ? ? ? ?libraries ingest structured arrays, record > arrays, pandas > >? ? ? ? ? ? ?dataframes, > >? ? ? ? ? ? ?etc. without too much specialized code? > > > > > >? ? ? ? ?AFAIU while not being in the data handling area, > pandas defines > >? ? ? ? ?the interface and other libraries provide pandas > compatible > >? ? ? ? ?interfaces or implementations. > > > >? ? ? ? ?statsmodels currently still has recarray support and > usage. In > >? ? ? ? ?some interfaces we support pandas, recarrays and > plain arrays, > >? ? ? ? ?or anything where asarray works correctly. > > > >? ? ? ? ?But recarrays became messy to support, one rewrite of > some > >? ? ? ? ?functions last year converts recarrays to pandas, > does the > >? ? ? ? ?manipulation and then converts back to recarrays. > >? ? ? ? ?Also we need to adjust our recarray usage with new numpy > >? ? ? ? ?versions. But there is no real benefit because I > doubt that > >? ? ? ? ?statsmodels still has any recarray/structured dtype > users. So, > >? ? ? ? ?we only have to remove our own uses in the datasets > and unit tests. > > > >? ? ? ? ?Josef > > > > > > > > > >? ? ? ? ? ? ?St?fan > > > >? ? ? ? ? ? ?_______________________________________________ > >? ? ? ? ? ? ?NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > >? ? ? ? ? ? ? > > > > > > > > >? ? ? ? ?_______________________________________________ > >? ? ? ? ?NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > >? ? ? ? ? > > > > > > > > >? ? ?_______________________________________________ > >? ? ?NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > ? > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Jan 29 23:50:46 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 29 Jan 2018 23:50:46 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> References: <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane wrote: > On 01/29/2018 05:59 PM, josef.pktd at gmail.com wrote: > >> >> >> On Mon, Jan 29, 2018 at 5:50 PM, > josef.pktd at gmail.com>> wrote: >> >> >> >> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane >> > wrote: >> >> On 01/29/2018 04:02 PM, josef.pktd at gmail.com >> wrote: >> > >> > >> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root < >> ben.v.root at gmail.com >> > >> >> wrote: >> > >> > I <3 structured arrays. I love the fact that I can access >> data by >> > row and then by fieldname, or vice versa. There are times >> when I >> > need to pass just a column into a function, and there are >> times when >> > I need to process things row by row. Yes, pandas is nice if >> you want >> > the specialized indexing features, but it becomes a bear to >> deal >> > with if all you want is normal indexing, or even the >> ability to >> > easily loop over the dataset. >> > >> > >> > I don't think there is a doubt that structured arrays, arrays >> with >> > structured dtypes, are a useful container. The question is >> whether they >> > should be more or the foundation for more. >> > >> > For example, computing a mean, or reduce operation, over >> numeric element >> > ("columns"). Before padded views it was possible to index by >> selecting >> > the relevant "columns" and view them as standard array. With >> padded >> > views that breaks and AFAICS, there is no way in numpy 1.14.0 >> to compute >> > a mean of some "columns". (I don't have numpy 1.14 to try or >> find a >> > workaround, like maybe looping over all relevant columns.) >> > >> > Josef >> >> Just to clarify, structured types have always had padding bytes, >> that >> isn't new. >> >> What *is* new (which we are pushing to 1.15, I think) is that it >> may be >> somewhat more common to end up with padding than before, and >> only if you >> are specifically using multi-field indexing, which is a fairly >> specialized case. >> >> I think recfunctions already account properly for padding bytes. >> Except >> for the bug in #8100, which we will fix, padding-bytes in >> recarrays are >> more or less invisible to a non-expert who only cares about >> dataframe-like behavior. >> >> In other words, padding is no obstacle at all to computing a >> mean over a >> column, and single-field indexes in 1.15 behave identically as >> before. >> The only thing that will change in 1.15 is multi-field indexing, >> and it >> has never been possible to compute a mean (or any binary >> operation) on >> multiple fields. >> >> >> from the example in the other thread >> a[['b', 'c']].view(('f8', 2)).mean(0) >> >> >> (from the statsmodels usecase: >> read csv with genfromtext to get recarray or structured array >> select/index the numeric columns >> view them as standard array >> do whatever we can do with standard numpy arrays >> ) >> > > Oh ok, I misunderstood. I see your point: a mean over fields is more > difficult than before. > > Or, to phrase it as a question: >> >> How do we get a standard array with homogeneous dtype from the >> corresponding elements of a structured dtype in numpy 1.14.0? >> >> Josef >> > > The answer may be that "numpy has never had a way to that", > even if in a few special cases you might hack a workaround using views. > > That's what your example seems like to me. It uses an explicit view, which > is an "expert" feature since views depend on the exact memory layout and > binary representation of the array. Your example only works if the two > fields have exactly the same dtype as each other and as the final dtype, > and evidently breaks if there is byte padding for any reason. > > Pandas can do row means without these problems: > > >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0) > > Numpy is missing this functionality, so you or whoever wrote that example > figured out a fragile workaround using views. > Once upon a time (*) this wasn't fragile but the only and recommended way. Because dtypes were low level with clear memory layout and stayed that way, it was easy to check item size or whatever and get different views on it. e.g. https://mail.scipy.org/pipermail/numpy-discussion/2008-December/039340.html (*) pre-pandas, pre-stackoverflow on the mailing lists which was for me roughly 2008 to 2012 but a late thread https://mail.scipy.org/pipermail/numpy-discussion/2015-October/074014.html "What is now the recommended way of converting structured dtypes/recarrays to ndarrays?" > I suggest that if we want to allow either means over fields, or conversion > of a n-D structured array to an n+1-D regular ndarray, we should add a > dedicated function to do so in numpy.lib.recfunctions > which does not depend on the binary representation of the array. > > I don't really want to defend an obsolete (?) usecase of structured dtypes. However, I think there should be a decision about the future plans for whether dataframe like usages of structure dtypes or through higher level classes or functions are still supported, instead of removing slowly and silently (*) the foundation for this use case, either support this usage or say you will be dropping it. (*) I didn't read the details of the release notes And another footnote about obsolete: Given that I'm the only one arguing about the dataframe_like usecase of recarrays and structured dtypes, I think they are dead for this specific usecase and only my inertia and conservativeness kept them alive in statsmodels. Josef > Allan > > > Josef >> >> >> Allan >> >> > >> > Cheers! >> > Ben Root >> > >> > On Mon, Jan 29, 2018 at 3:24 PM, > >> > >> >> wrote: >> > >> > >> > >> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt >> > >> >> >> wrote: >> > >> > On Mon, 29 Jan 2018 14:10:56 -0500, >> josef.pktd at gmail.com >> > > >> > wrote: >> > >> > Given that there is pandas, xarray, dask and >> more, numpy >> > could as well drop >> > any pretense of supporting dataframe_likes. >> Or, adjust >> > the recfunctions so >> > we can still work dataframe_like with >> structured >> > dtypes/recarrays/recfunctions. >> > >> > >> > I haven't been following the duckarray discussion >> carefully, >> > but could >> > this be an opportunity for a dataframe protocol, >> so that we >> > can have >> > libraries ingest structured arrays, record >> arrays, pandas >> > dataframes, >> > etc. without too much specialized code? >> > >> > >> > AFAIU while not being in the data handling area, >> pandas defines >> > the interface and other libraries provide pandas >> compatible >> > interfaces or implementations. >> > >> > statsmodels currently still has recarray support and >> usage. In >> > some interfaces we support pandas, recarrays and >> plain arrays, >> > or anything where asarray works correctly. >> > >> > But recarrays became messy to support, one rewrite of >> some >> > functions last year converts recarrays to pandas, >> does the >> > manipulation and then converts back to recarrays. >> > Also we need to adjust our recarray usage with new >> numpy >> > versions. But there is no real benefit because I >> doubt that >> > statsmodels still has any recarray/structured dtype >> users. So, >> > we only have to remove our own uses in the datasets >> and unit tests. >> > >> > Josef >> > >> > >> > >> > >> > St?fan >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> >> > > >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > man/listinfo/numpy-discussion >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> >> > > >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > man/listinfo/numpy-discussion >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> >> > > >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > man/listinfo/numpy-discussion >> > >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org > n.org> >> > https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Tue Jan 30 01:22:22 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Tue, 30 Jan 2018 17:22:22 +1100 Subject: [Numpy-discussion] Optimisation of matrix multiplication Message-ID: Hi all, I have a matrix multiplication that I'd like to optimize. I have a matrix `a` (dtype=complex) with shape (N, M, 2, 2). I'd like to do the following multiplication: a[:, 0] @ a[:, 1] @ ... @ a[:, M-1] where the first dimension, N, is element wise (and hopefully vectorisable) and M>=2. So for each row do M-1 matrix multiplications of 2x2 matrices. The output should have shape (N, 2, 2). What would be the best way of going about this? -- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Tue Jan 30 01:27:26 2018 From: andyfaff at gmail.com (Andrew Nelson) Date: Tue, 30 Jan 2018 17:27:26 +1100 Subject: [Numpy-discussion] Optimisation of matrix multiplication In-Reply-To: References: Message-ID: On 30 January 2018 at 17:22, Andrew Nelson wrote: > Hi all, > I have a matrix multiplication that I'd like to optimize. > > I have a matrix `a` (dtype=complex) with shape (N, M, 2, 2). I'd like to > do the following multiplication: > > a[:, 0] @ a[:, 1] @ ... @ a[:, M-1] > > where the first dimension, N, is element wise (and hopefully vectorisable) > and M>=2. So for each row do M-1 matrix multiplications of 2x2 matrices. > The output should have shape (N, 2, 2). > > What would be the best way of going about this? > I should add that at the moment I have 4 (N, M) arrays and am doing the matrix multiplication in a for loop over `range(0, M)`, in an unrolled fashion for each of the 4 elements. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wieser.eric+numpy at gmail.com Tue Jan 30 03:24:48 2018 From: wieser.eric+numpy at gmail.com (Eric Wieser) Date: Tue, 30 Jan 2018 08:24:48 +0000 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: Because dtypes were low level with clear memory layout and stayed that way Dtypes have supported padded and out-of-order-fields since at least 2005 (v0.8.4) , and I would guess that the memory layout has not changed since. The house has always been made out of glass, it just didn?t look fragile until we showed people where the stones were. ? On Mon, 29 Jan 2018 at 20:51 wrote: > On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane > wrote: > >> On 01/29/2018 05:59 PM, josef.pktd at gmail.com wrote: >> >>> >>> >>> On Mon, Jan 29, 2018 at 5:50 PM, >> josef.pktd at gmail.com>> wrote: >>> >>> >>> >>> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane >>> > wrote: >>> >>> On 01/29/2018 04:02 PM, josef.pktd at gmail.com >>> wrote: >>> > >>> > >>> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root < >>> ben.v.root at gmail.com >>> > >> >>> wrote: >>> > >>> > I <3 structured arrays. I love the fact that I can access >>> data by >>> > row and then by fieldname, or vice versa. There are times >>> when I >>> > need to pass just a column into a function, and there are >>> times when >>> > I need to process things row by row. Yes, pandas is nice >>> if you want >>> > the specialized indexing features, but it becomes a bear >>> to deal >>> > with if all you want is normal indexing, or even the >>> ability to >>> > easily loop over the dataset. >>> > >>> > >>> > I don't think there is a doubt that structured arrays, arrays >>> with >>> > structured dtypes, are a useful container. The question is >>> whether they >>> > should be more or the foundation for more. >>> > >>> > For example, computing a mean, or reduce operation, over >>> numeric element >>> > ("columns"). Before padded views it was possible to index by >>> selecting >>> > the relevant "columns" and view them as standard array. With >>> padded >>> > views that breaks and AFAICS, there is no way in numpy 1.14.0 >>> to compute >>> > a mean of some "columns". (I don't have numpy 1.14 to try or >>> find a >>> > workaround, like maybe looping over all relevant columns.) >>> > >>> > Josef >>> >>> Just to clarify, structured types have always had padding bytes, >>> that >>> isn't new. >>> >>> What *is* new (which we are pushing to 1.15, I think) is that it >>> may be >>> somewhat more common to end up with padding than before, and >>> only if you >>> are specifically using multi-field indexing, which is a fairly >>> specialized case. >>> >>> I think recfunctions already account properly for padding bytes. >>> Except >>> for the bug in #8100, which we will fix, padding-bytes in >>> recarrays are >>> more or less invisible to a non-expert who only cares about >>> dataframe-like behavior. >>> >>> In other words, padding is no obstacle at all to computing a >>> mean over a >>> column, and single-field indexes in 1.15 behave identically as >>> before. >>> The only thing that will change in 1.15 is multi-field indexing, >>> and it >>> has never been possible to compute a mean (or any binary >>> operation) on >>> multiple fields. >>> >>> >>> from the example in the other thread >>> a[['b', 'c']].view(('f8', 2)).mean(0) >>> >>> >>> (from the statsmodels usecase: >>> read csv with genfromtext to get recarray or structured array >>> select/index the numeric columns >>> view them as standard array >>> do whatever we can do with standard numpy arrays >>> ) >>> >> >> Oh ok, I misunderstood. I see your point: a mean over fields is more >> difficult than before. >> >> Or, to phrase it as a question: >>> >>> How do we get a standard array with homogeneous dtype from the >>> corresponding elements of a structured dtype in numpy 1.14.0? >>> >>> Josef >>> >> >> The answer may be that "numpy has never had a way to that", >> even if in a few special cases you might hack a workaround using views. >> >> That's what your example seems like to me. It uses an explicit view, >> which is an "expert" feature since views depend on the exact memory layout >> and binary representation of the array. Your example only works if the two >> fields have exactly the same dtype as each other and as the final dtype, >> and evidently breaks if there is byte padding for any reason. >> >> Pandas can do row means without these problems: >> >> >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0) >> >> Numpy is missing this functionality, so you or whoever wrote that example >> figured out a fragile workaround using views. >> > > Once upon a time (*) this wasn't fragile but the only and recommended way. > Because dtypes were low level with clear memory layout and stayed that way, > it was easy to check item size or whatever and get different views on it. > e.g. > https://mail.scipy.org/pipermail/numpy-discussion/2008-December/039340.html > > (*) pre-pandas, pre-stackoverflow on the mailing lists which was for me > roughly 2008 to 2012 > but a late thread > https://mail.scipy.org/pipermail/numpy-discussion/2015-October/074014.html > "What is now the recommended way of converting structured dtypes/recarrays > to ndarrays?" > > > > >> I suggest that if we want to allow either means over fields, or >> conversion of a n-D structured array to an n+1-D regular ndarray, we should >> add a dedicated function to do so in numpy.lib.recfunctions >> which does not depend on the binary representation of the array. >> >> > I don't really want to defend an obsolete (?) usecase of structured dtypes. > > However, I think there should be a decision about the future plans for > whether dataframe like usages of structure dtypes or through higher level > classes or functions are still supported, instead of removing slowly and > silently (*) the foundation for this use case, either support this usage or > say you will be dropping it. > > (*) I didn't read the details of the release notes > > > And another footnote about obsolete: > Given that I'm the only one arguing about the dataframe_like usecase of > recarrays and structured dtypes, I think they are dead for this specific > usecase and only my inertia and conservativeness kept them alive in > statsmodels. > > > Josef > > > > >> Allan >> >> >> Josef >>> >>> >>> Allan >>> >>> > >>> > Cheers! >>> > Ben Root >>> > >>> > On Mon, Jan 29, 2018 at 3:24 PM, >> >>> > >> >>> wrote: >>> > >>> > >>> > >>> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt >>> > >>> >> >>> wrote: >>> > >>> > On Mon, 29 Jan 2018 14:10:56 -0500, >>> josef.pktd at gmail.com >>> > >> >>> > wrote: >>> > >>> > Given that there is pandas, xarray, dask and >>> more, numpy >>> > could as well drop >>> > any pretense of supporting dataframe_likes. >>> Or, adjust >>> > the recfunctions so >>> > we can still work dataframe_like with >>> structured >>> > dtypes/recarrays/recfunctions. >>> > >>> > >>> > I haven't been following the duckarray discussion >>> carefully, >>> > but could >>> > this be an opportunity for a dataframe protocol, >>> so that we >>> > can have >>> > libraries ingest structured arrays, record >>> arrays, pandas >>> > dataframes, >>> > etc. without too much specialized code? >>> > >>> > >>> > AFAIU while not being in the data handling area, >>> pandas defines >>> > the interface and other libraries provide pandas >>> compatible >>> > interfaces or implementations. >>> > >>> > statsmodels currently still has recarray support and >>> usage. In >>> > some interfaces we support pandas, recarrays and >>> plain arrays, >>> > or anything where asarray works correctly. >>> > >>> > But recarrays became messy to support, one rewrite of >>> some >>> > functions last year converts recarrays to pandas, >>> does the >>> > manipulation and then converts back to recarrays. >>> > Also we need to adjust our recarray usage with new >>> numpy >>> > versions. But there is no real benefit because I >>> doubt that >>> > statsmodels still has any recarray/structured dtype >>> users. So, >>> > we only have to remove our own uses in the datasets >>> and unit tests. >>> > >>> > Josef >>> > >>> > >>> > >>> > >>> > St?fan >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> >>> >> > >>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> > < >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> > >>> > >>> > >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> >>> >> > >>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> > < >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> > >>> > >>> > >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> >>> >> > >>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> > < >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> > >>> > >>> > >>> > >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >> NumPy-Discussion at python.org> >>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> > >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 30 07:49:04 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Jan 2018 07:49:04 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: On Tue, Jan 30, 2018 at 3:24 AM, Eric Wieser wrote: > Because dtypes were low level with clear memory layout and stayed that way > > Dtypes have supported padded and out-of-order-fields since at least 2005 > (v0.8.4) > , > and I would guess that the memory layout has not changed since. > > The house has always been made out of glass, it just didn?t look fragile > until we showed people where the stones were. > Even so, I don't remember any problems with it. There might have been stones on the side streets and alleys, but 1.14.0 puts a big padded stone right in the front of the drive way. (Maybe only the solarium was made out of glass, now it's also the billiard room.) (I never had to learn about padding and I don't remember having any related problems getting statsmodels through Debian testing on various machine types.) Josef > ? > > On Mon, 29 Jan 2018 at 20:51 wrote: > >> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane >> wrote: >> >>> On 01/29/2018 05:59 PM, josef.pktd at gmail.com wrote: >>> >>>> >>>> >>>> On Mon, Jan 29, 2018 at 5:50 PM, >>> josef.pktd at gmail.com>> wrote: >>>> >>>> >>>> >>>> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane >>>> > wrote: >>>> >>>> On 01/29/2018 04:02 PM, josef.pktd at gmail.com >>>> wrote: >>>> > >>>> > >>>> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root < >>>> ben.v.root at gmail.com >>>> > >> >>>> wrote: >>>> > >>>> > I <3 structured arrays. I love the fact that I can access >>>> data by >>>> > row and then by fieldname, or vice versa. There are times >>>> when I >>>> > need to pass just a column into a function, and there are >>>> times when >>>> > I need to process things row by row. Yes, pandas is nice >>>> if you want >>>> > the specialized indexing features, but it becomes a bear >>>> to deal >>>> > with if all you want is normal indexing, or even the >>>> ability to >>>> > easily loop over the dataset. >>>> > >>>> > >>>> > I don't think there is a doubt that structured arrays, arrays >>>> with >>>> > structured dtypes, are a useful container. The question is >>>> whether they >>>> > should be more or the foundation for more. >>>> > >>>> > For example, computing a mean, or reduce operation, over >>>> numeric element >>>> > ("columns"). Before padded views it was possible to index by >>>> selecting >>>> > the relevant "columns" and view them as standard array. With >>>> padded >>>> > views that breaks and AFAICS, there is no way in numpy 1.14.0 >>>> to compute >>>> > a mean of some "columns". (I don't have numpy 1.14 to try or >>>> find a >>>> > workaround, like maybe looping over all relevant columns.) >>>> > >>>> > Josef >>>> >>>> Just to clarify, structured types have always had padding bytes, >>>> that >>>> isn't new. >>>> >>>> What *is* new (which we are pushing to 1.15, I think) is that it >>>> may be >>>> somewhat more common to end up with padding than before, and >>>> only if you >>>> are specifically using multi-field indexing, which is a fairly >>>> specialized case. >>>> >>>> I think recfunctions already account properly for padding bytes. >>>> Except >>>> for the bug in #8100, which we will fix, padding-bytes in >>>> recarrays are >>>> more or less invisible to a non-expert who only cares about >>>> dataframe-like behavior. >>>> >>>> In other words, padding is no obstacle at all to computing a >>>> mean over a >>>> column, and single-field indexes in 1.15 behave identically as >>>> before. >>>> The only thing that will change in 1.15 is multi-field indexing, >>>> and it >>>> has never been possible to compute a mean (or any binary >>>> operation) on >>>> multiple fields. >>>> >>>> >>>> from the example in the other thread >>>> a[['b', 'c']].view(('f8', 2)).mean(0) >>>> >>>> >>>> (from the statsmodels usecase: >>>> read csv with genfromtext to get recarray or structured array >>>> select/index the numeric columns >>>> view them as standard array >>>> do whatever we can do with standard numpy arrays >>>> ) >>>> >>> >>> Oh ok, I misunderstood. I see your point: a mean over fields is more >>> difficult than before. >>> >>> Or, to phrase it as a question: >>>> >>>> How do we get a standard array with homogeneous dtype from the >>>> corresponding elements of a structured dtype in numpy 1.14.0? >>>> >>>> Josef >>>> >>> >>> The answer may be that "numpy has never had a way to that", >>> even if in a few special cases you might hack a workaround using views. >>> >>> That's what your example seems like to me. It uses an explicit view, >>> which is an "expert" feature since views depend on the exact memory layout >>> and binary representation of the array. Your example only works if the two >>> fields have exactly the same dtype as each other and as the final dtype, >>> and evidently breaks if there is byte padding for any reason. >>> >>> Pandas can do row means without these problems: >>> >>> >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0) >>> >>> Numpy is missing this functionality, so you or whoever wrote that >>> example figured out a fragile workaround using views. >>> >> >> Once upon a time (*) this wasn't fragile but the only and recommended >> way. Because dtypes were low level with clear memory layout and stayed that >> way, it was easy to check item size or whatever and get different views on >> it. >> e.g. https://mail.scipy.org/pipermail/numpy-discussion/ >> 2008-December/039340.html >> >> (*) pre-pandas, pre-stackoverflow on the mailing lists which was for me >> roughly 2008 to 2012 >> but a late thread https://mail.scipy.org/pipermail/numpy-discussion/ >> 2015-October/074014.html >> "What is now the recommended way of converting structured >> dtypes/recarrays to ndarrays?" >> >> >> >> >>> I suggest that if we want to allow either means over fields, or >>> conversion of a n-D structured array to an n+1-D regular ndarray, we should >>> add a dedicated function to do so in numpy.lib.recfunctions >>> which does not depend on the binary representation of the array. >>> >>> >> I don't really want to defend an obsolete (?) usecase of structured >> dtypes. >> >> However, I think there should be a decision about the future plans for >> whether dataframe like usages of structure dtypes or through higher level >> classes or functions are still supported, instead of removing slowly and >> silently (*) the foundation for this use case, either support this usage or >> say you will be dropping it. >> >> (*) I didn't read the details of the release notes >> >> >> And another footnote about obsolete: >> Given that I'm the only one arguing about the dataframe_like usecase of >> recarrays and structured dtypes, I think they are dead for this specific >> usecase and only my inertia and conservativeness kept them alive in >> statsmodels. >> >> >> Josef >> >> >> >> >>> Allan >>> >>> >>> Josef >>>> >>>> >>>> Allan >>>> >>>> > >>>> > Cheers! >>>> > Ben Root >>>> > >>>> > On Mon, Jan 29, 2018 at 3:24 PM, >>> >>>> > >>> >>> wrote: >>>> > >>>> > >>>> > >>>> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van der Walt >>>> > >>>> >> >>>> wrote: >>>> > >>>> > On Mon, 29 Jan 2018 14:10:56 -0500, >>>> josef.pktd at gmail.com >>>> > >>> >>>> > wrote: >>>> > >>>> > Given that there is pandas, xarray, dask and >>>> more, numpy >>>> > could as well drop >>>> > any pretense of supporting dataframe_likes. >>>> Or, adjust >>>> > the recfunctions so >>>> > we can still work dataframe_like with >>>> structured >>>> > dtypes/recarrays/recfunctions. >>>> > >>>> > >>>> > I haven't been following the duckarray discussion >>>> carefully, >>>> > but could >>>> > this be an opportunity for a dataframe protocol, >>>> so that we >>>> > can have >>>> > libraries ingest structured arrays, record >>>> arrays, pandas >>>> > dataframes, >>>> > etc. without too much specialized code? >>>> > >>>> > >>>> > AFAIU while not being in the data handling area, >>>> pandas defines >>>> > the interface and other libraries provide pandas >>>> compatible >>>> > interfaces or implementations. >>>> > >>>> > statsmodels currently still has recarray support and >>>> usage. In >>>> > some interfaces we support pandas, recarrays and >>>> plain arrays, >>>> > or anything where asarray works correctly. >>>> > >>>> > But recarrays became messy to support, one rewrite of >>>> some >>>> > functions last year converts recarrays to pandas, >>>> does the >>>> > manipulation and then converts back to recarrays. >>>> > Also we need to adjust our recarray usage with new >>>> numpy >>>> > versions. But there is no real benefit because I >>>> doubt that >>>> > statsmodels still has any recarray/structured dtype >>>> users. So, >>>> > we only have to remove our own uses in the datasets >>>> and unit tests. >>>> > >>>> > Josef >>>> > >>>> > >>>> > >>>> > >>>> > St?fan >>>> > >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>>> >>>> >>> > >>>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> > >>> discussion >>>> > >>>> > >>>> > >>>> > >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>>> >>>> >>> > >>>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> > >>> discussion >>>> > >>>> > >>>> > >>>> > >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>>> >>>> >>> > >>>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> > >>> discussion >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>> python.org> >>>> > https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> > >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>> > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jan 30 11:55:23 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 30 Jan 2018 08:55:23 -0800 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> References: <89c7a41b-12c2-77b6-a12a-25d64f4cc6b9@gmail.com> <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: On Mon, Jan 29, 2018 at 7:44 PM, Allan Haldane wrote: > I suggest that if we want to allow either means over fields, or conversion > of a n-D structured array to an n+1-D regular ndarray, we should add a > dedicated function to do so in numpy.lib.recfunctions > which does not depend on the binary representation of the array. > IIUC, the core use-case of structured dtypes is binary compatibility with external systems (arrays of C structs, mostly) -- at least that's how I use them :-) In which case, "conversion of a n-D structured array to an n+1-D regular ndarray" is an important feature -- actually even more important if you don't use recarrays So yes, let's have a utility to make that easy. as for recarrays -- are we that far from having them be robust and useful? in which case, why not keep them around, fix the few issues, but explicitly not try to extend them into more dataframe-like domains -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Tue Jan 30 12:28:52 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Tue, 30 Jan 2018 12:28:52 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: On 01/29/2018 11:50 PM, josef.pktd at gmail.com wrote: > > > On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane > wrote: > > On 01/29/2018 05:59 PM, josef.pktd at gmail.com > wrote: > > > > On Mon, Jan 29, 2018 at 5:50 PM, >> wrote: > > > > ? ? On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane > ? ? > >> > wrote: > > ? ? ? ? On 01/29/2018 04:02 PM, josef.pktd at gmail.com > > ? ? ? ? > wrote: > ? ? ? ? > > ? ? ? ? > > ? ? ? ? > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root > > > > ? ? ? ? > >>> wrote: > ? ? ? ? > > ? ? ? ? >? ? ?I <3 structured arrays. I love the fact that I > can access data by > ? ? ? ? >? ? ?row and then by fieldname, or vice versa. There > are times when I > ? ? ? ? >? ? ?need to pass just a column into a function, and > there are times when > ? ? ? ? >? ? ?I need to process things row by row. Yes, pandas > is nice if you want > ? ? ? ? >? ? ?the specialized indexing features, but it becomes > a bear to deal > ? ? ? ? >? ? ?with if all you want is normal indexing, or even > the ability to > ? ? ? ? >? ? ?easily loop over the dataset. > ? ? ? ? > > ? ? ? ? > > ? ? ? ? > I don't think there is a doubt that structured > arrays, arrays with > ? ? ? ? > structured dtypes, are a useful container. The > question is whether they > ? ? ? ? > should be more or the foundation for more. > ? ? ? ? > > ? ? ? ? > For example, computing a mean, or reduce operation, > over numeric element > ? ? ? ? > ("columns"). Before padded views it was possible to > index by selecting > ? ? ? ? > the relevant "columns" and view them as standard > array. With padded > ? ? ? ? > views that breaks and AFAICS, there is no way in > numpy 1.14.0 to compute > ? ? ? ? > a mean of some "columns". (I don't have numpy 1.14 to > try or find a > ? ? ? ? > workaround, like maybe looping over all relevant > columns.) > ? ? ? ? > > ? ? ? ? > Josef > > ? ? ? ? Just to clarify, structured types have always had > padding bytes, > ? ? ? ? that > ? ? ? ? isn't new. > > ? ? ? ? What *is* new (which we are pushing to 1.15, I think) > is that it > ? ? ? ? may be > ? ? ? ? somewhat more common to end up with padding than > before, and > ? ? ? ? only if you > ? ? ? ? are specifically using multi-field indexing, which is a > fairly > ? ? ? ? specialized case. > > ? ? ? ? I think recfunctions already account properly for > padding bytes. > ? ? ? ? Except > ? ? ? ? for the bug in #8100, which we will fix, padding-bytes in > ? ? ? ? recarrays are > ? ? ? ? more or less invisible to a non-expert who only cares about > ? ? ? ? dataframe-like behavior. > > ? ? ? ? In other words, padding is no obstacle at all to > computing a > ? ? ? ? mean over a > ? ? ? ? column, and single-field indexes in 1.15 behave > identically as > ? ? ? ? before. > ? ? ? ? The only thing that will change in 1.15 is multi-field > indexing, > ? ? ? ? and it > ? ? ? ? has never been possible to compute a mean (or any binary > ? ? ? ? operation) on > ? ? ? ? multiple fields. > > > ? ? from the example in the other thread > ? ? a[['b', 'c']].view(('f8', 2)).mean(0) > > > ? ? (from the statsmodels usecase: > ? ? read csv with genfromtext to get recarray or structured array > ? ? select/index the numeric columns > ? ? view them as standard array > ? ? do whatever we can do with standard numpy? arrays > ? ? ) > > > Oh ok, I misunderstood. I see your point: a mean over fields is more > difficult than before. > > Or, to phrase it as a question: > > How do we get a standard array with homogeneous dtype from the > corresponding elements of a structured dtype in numpy 1.14.0? > > Josef > > > The answer may be that "numpy has never had a way to that", > even if in a few special cases you might hack a workaround using views. > > That's what your example seems like to me. It uses an explicit view, > which is an "expert" feature since views depend on the exact memory > layout and binary representation of the array. Your example only > works if the two fields have exactly the same dtype as each other > and as the final dtype, and evidently breaks if there is byte > padding for any reason. > > Pandas can do row means without these problems: > > ? ? >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0) > > Numpy is missing this functionality, so you or whoever wrote that > example figured out a fragile workaround using views. > > > Once upon a time (*) this wasn't fragile but the only and recommended > way. Because dtypes were low level with clear memory layout and stayed > that way, it was easy to check item size or whatever and get different > views on it. > e.g. > https://mail.scipy.org/pipermail/numpy-discussion/2008-December/039340.html > > (*) pre-pandas, pre-stackoverflow on the mailing lists which was for me > roughly 2008 to 2012 > but a late thread > https://mail.scipy.org/pipermail/numpy-discussion/2015-October/074014.html > "What is now the recommended way of converting structured > dtypes/recarrays to ndarrays?" > > > > > I suggest that if we want to allow either means over fields, or > conversion of a n-D structured array to an n+1-D regular ndarray, we > should add a dedicated function to do so in numpy.lib.recfunctions > which does not depend on the binary representation of the array. > > > I don't really want to defend an obsolete (?) usecase of structured dtypes. > > However, I think there should be a decision about the future plans for > whether dataframe like usages of structure dtypes or through higher > level classes or functions are still supported, instead of removing > slowly and silently (*) the foundation for this use case, either support > this usage or say you will be dropping it. > > (*) I didn't read the details of the release notes > > > And another footnote about obsolete: > Given that I'm the only one arguing about the dataframe_like usecase of > recarrays and structured dtypes, I think they are dead for this specific > usecase?and only my inertia and conservativeness kept them alive in > statsmodels. > > > Josef It's a bit of a stretch to say that we are "silently" dropping support for dataframe-like use of structured arrays. First, we still allow pretty much all dataframe-like use we have supported since numpy 1.7, limited as it may be. We are really only dropping one very specialized, expert use involving an explicit view, which I still have doubts was ever more than a hack. That 2008 mailing list message didn't involve multi-field indexing, which didn't exist then (only introduced in 2009), and we have wanted to make them views (not copies) since their inception. Second, I don't think we are doing so silently: We have warned about this in release notes since numpy 1.7 in 2012/2013, and it gets mention in most releases since then. We have also raised FutureWarnings about it since 1.7. Unfortunately we missed warning in your specific case for a while, but we corrected this in 1.12 so you should have seen FutureWarnings since then. I don't feel the need to officially declare that we are dropping support for dataframe-like use of structured arrays. It's unclear where that use ends and other uses of structured arrays begin. I think updating the docs to warn that pandas/dask may be a better choice is enough, as I've been doing, and then users can decide for themselves. There is still the question about whether we should make numpy.lib.recfunctions more official. I don't have a strong opinion. I suppose it would be good to add a section to the structured array docs which lists those methods and says something like "the submodule numpy.lib.recfunctions provides minimal functionality to split, combine, and manipulate structured datatypes and arrays. In most cases, we strongly recommend users use a dedicated module such as pandas/xarray/dask instead of these methods, but they are provided for occasional convenience." Allan > Allan > > > ? ? Josef > > > ? ? ? ? Allan > > ? ? ? ? > > ? ? ? ? >? ? ?Cheers! > ? ? ? ? >? ? ?Ben Root > ? ? ? ? > > ? ? ? ? >? ? ?On Mon, Jan 29, 2018 at 3:24 PM, > > > > ? ? ? ? >? ? ? >>> wrote: > ? ? ? ? > > ? ? ? ? > > ? ? ? ? > > ? ? ? ? >? ? ? ? ?On Mon, Jan 29, 2018 at 2:55 PM, Stefan van > der Walt > ? ? ? ? >? ? ? ? ? > > ? ? ? ? >>> wrote: > ? ? ? ? > > ? ? ? ? >? ? ? ? ? ? ?On Mon, 29 Jan 2018 14:10:56 -0500, > josef.pktd at gmail.com > > > ? ? ? ? ?>? ? ? ? ? ? ? > > ? ? ? ? >> wrote: > ? ? ? ? ?> > ? ? ? ? ?>? ? ? ? ? ? ? ? ?Given that there is pandas, xarray, > dask and > ? ? ? ? more, numpy > ? ? ? ? ?>? ? ? ? ? ? ? ? ?could as well drop > ? ? ? ? ?>? ? ? ? ? ? ? ? ?any pretense of supporting > dataframe_likes. > ? ? ? ? Or, adjust > ? ? ? ? ?>? ? ? ? ? ? ? ? ?the recfunctions so > ? ? ? ? ?>? ? ? ? ? ? ? ? ?we can still work dataframe_like > with structured > ? ? ? ? ?>? ? ? ? ? ? ? ? ?dtypes/recarrays/recfunctions. > ? ? ? ? ?> > ? ? ? ? ?> > ? ? ? ? ?>? ? ? ? ? ? ?I haven't been following the duckarray > discussion > ? ? ? ? carefully, > ? ? ? ? ?>? ? ? ? ? ? ?but could > ? ? ? ? ?>? ? ? ? ? ? ?this be an opportunity for a dataframe > protocol, > ? ? ? ? so that we > ? ? ? ? ?>? ? ? ? ? ? ?can have > ? ? ? ? ?>? ? ? ? ? ? ?libraries ingest structured arrays, record > ? ? ? ? arrays, pandas > ? ? ? ? ?>? ? ? ? ? ? ?dataframes, > ? ? ? ? ?>? ? ? ? ? ? ?etc. without too much specialized code? > ? ? ? ? ?> > ? ? ? ? ?> > ? ? ? ? ?>? ? ? ? ?AFAIU while not being in the data handling area, > ? ? ? ? pandas defines > ? ? ? ? ?>? ? ? ? ?the interface and other libraries provide pandas > ? ? ? ? compatible > ? ? ? ? ?>? ? ? ? ?interfaces or implementations. > ? ? ? ? ?> > ? ? ? ? ?>? ? ? ? ?statsmodels currently still has recarray > support and > ? ? ? ? usage. In > ? ? ? ? ?>? ? ? ? ?some interfaces we support pandas, recarrays and > ? ? ? ? plain arrays, > ? ? ? ? ?>? ? ? ? ?or anything where asarray works correctly. > ? ? ? ? ?> > ? ? ? ? ?>? ? ? ? ?But recarrays became messy to support, one > rewrite of > ? ? ? ? some > ? ? ? ? ?>? ? ? ? ?functions last year converts recarrays to > pandas, > ? ? ? ? does the > ? ? ? ? ?>? ? ? ? ?manipulation and then converts back to > recarrays. > ? ? ? ? ?>? ? ? ? ?Also we need to adjust our recarray usage > with new numpy > ? ? ? ? ?>? ? ? ? ?versions. But there is no real benefit because I > ? ? ? ? doubt that > ? ? ? ? ?>? ? ? ? ?statsmodels still has any > recarray/structured dtype > ? ? ? ? users. So, > ? ? ? ? ?>? ? ? ? ?we only have to remove our own uses in the > datasets > ? ? ? ? and unit tests. > ? ? ? ? ?> > ? ? ? ? ?>? ? ? ? ?Josef > ? ? ? ? ?> > ? ? ? ? ?> > ? ? ? ? ?> > ? ? ? ? ?> > ? ? ? ? ?>? ? ? ? ? ? ?St?fan > ? ? ? ? ?> > ? ? ? ? ?> > ?_______________________________________________ > ? ? ? ? ?>? ? ? ? ? ? ?NumPy-Discussion mailing list > ? ? ? ? ?> NumPy-Discussion at python.org > > ? ? ? ? > > ? ? ? ? > ? ? ? ? >> > ? ? ? ? ?> > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > ? ? ? ? > > ? > > >> > ? ? ? ? > > ? ? ? ? > > ? ? ? ? > > ? ? ? ? >? ? ? ? ?_______________________________________________ > ? ? ? ? >? ? ? ? ?NumPy-Discussion mailing list > ? ? ? ? ?> NumPy-Discussion at python.org > > ? ? ? ? > > ? ? ? ? > ? ? ? ? >> > ? ? ? ? ?> > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > ? ? ? ? > > ? > > >> > ? ? ? ? > > ? ? ? ? > > ? ? ? ? > > ? ? ? ? >? ? ?_______________________________________________ > ? ? ? ? >? ? ?NumPy-Discussion mailing list > ? ? ? ? ?> NumPy-Discussion at python.org > > ? ? ? ? > > ? ? ? ? > ? ? ? ? >> > ? ? ? ? ?> > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > ? ? ? ? ?> > ? > > >> > ? ? ? ? ?> > ? ? ? ? ?> > ? ? ? ? ?> > ? ? ? ? ?> > ? ? ? ? ?> _______________________________________________ > ? ? ? ? ?> NumPy-Discussion mailing list > ? ? ? ? ?> NumPy-Discussion at python.org > > > > ? ? ? ? ?> > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > ? ? ? ? ?> > > ? ? ? ? _______________________________________________ > ? ? ? ? NumPy-Discussion mailing list > NumPy-Discussion at python.org > > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Tue Jan 30 13:33:01 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Jan 2018 13:33:01 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: On Tue, Jan 30, 2018 at 12:28 PM, Allan Haldane wrote: > On 01/29/2018 11:50 PM, josef.pktd at gmail.com wrote: > >> >> >> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane > > wrote: >> >> On 01/29/2018 05:59 PM, josef.pktd at gmail.com >> wrote: >> >> >> >> On Mon, Jan 29, 2018 at 5:50 PM, > > >> wrote: >> >> >> >> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane >> >> >> >> wrote: >> >> On 01/29/2018 04:02 PM, josef.pktd at gmail.com >> >> > > wrote: >> > >> > >> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root >> >> > >> > > > >>> wrote: >> > >> > I <3 structured arrays. I love the fact that I >> can access data by >> > row and then by fieldname, or vice versa. There >> are times when I >> > need to pass just a column into a function, and >> there are times when >> > I need to process things row by row. Yes, pandas >> is nice if you want >> > the specialized indexing features, but it becomes >> a bear to deal >> > with if all you want is normal indexing, or even >> the ability to >> > easily loop over the dataset. >> > >> > >> > I don't think there is a doubt that structured >> arrays, arrays with >> > structured dtypes, are a useful container. The >> question is whether they >> > should be more or the foundation for more. >> > >> > For example, computing a mean, or reduce operation, >> over numeric element >> > ("columns"). Before padded views it was possible to >> index by selecting >> > the relevant "columns" and view them as standard >> array. With padded >> > views that breaks and AFAICS, there is no way in >> numpy 1.14.0 to compute >> > a mean of some "columns". (I don't have numpy 1.14 to >> try or find a >> > workaround, like maybe looping over all relevant >> columns.) >> > >> > Josef >> >> Just to clarify, structured types have always had >> padding bytes, >> that >> isn't new. >> >> What *is* new (which we are pushing to 1.15, I think) >> is that it >> may be >> somewhat more common to end up with padding than >> before, and >> only if you >> are specifically using multi-field indexing, which is a >> fairly >> specialized case. >> >> I think recfunctions already account properly for >> padding bytes. >> Except >> for the bug in #8100, which we will fix, padding-bytes in >> recarrays are >> more or less invisible to a non-expert who only cares >> about >> dataframe-like behavior. >> >> In other words, padding is no obstacle at all to >> computing a >> mean over a >> column, and single-field indexes in 1.15 behave >> identically as >> before. >> The only thing that will change in 1.15 is multi-field >> indexing, >> and it >> has never been possible to compute a mean (or any binary >> operation) on >> multiple fields. >> >> >> from the example in the other thread >> a[['b', 'c']].view(('f8', 2)).mean(0) >> >> >> (from the statsmodels usecase: >> read csv with genfromtext to get recarray or structured array >> select/index the numeric columns >> view them as standard array >> do whatever we can do with standard numpy arrays >> ) >> >> >> Oh ok, I misunderstood. I see your point: a mean over fields is more >> difficult than before. >> >> Or, to phrase it as a question: >> >> How do we get a standard array with homogeneous dtype from the >> corresponding elements of a structured dtype in numpy 1.14.0? >> >> Josef >> >> >> The answer may be that "numpy has never had a way to that", >> even if in a few special cases you might hack a workaround using >> views. >> >> That's what your example seems like to me. It uses an explicit view, >> which is an "expert" feature since views depend on the exact memory >> layout and binary representation of the array. Your example only >> works if the two fields have exactly the same dtype as each other >> and as the final dtype, and evidently breaks if there is byte >> padding for any reason. >> >> Pandas can do row means without these problems: >> >> >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0) >> >> Numpy is missing this functionality, so you or whoever wrote that >> example figured out a fragile workaround using views. >> >> >> Once upon a time (*) this wasn't fragile but the only and recommended >> way. Because dtypes were low level with clear memory layout and stayed that >> way, it was easy to check item size or whatever and get different views on >> it. >> e.g. https://mail.scipy.org/pipermail/numpy-discussion/2008- >> December/039340.html >> >> (*) pre-pandas, pre-stackoverflow on the mailing lists which was for me >> roughly 2008 to 2012 >> but a late thread https://mail.scipy.org/pipermail/numpy-discussion/2015- >> October/074014.html >> "What is now the recommended way of converting structured >> dtypes/recarrays to ndarrays?" >> >> >> >> >> I suggest that if we want to allow either means over fields, or >> conversion of a n-D structured array to an n+1-D regular ndarray, we >> should add a dedicated function to do so in numpy.lib.recfunctions >> which does not depend on the binary representation of the array. >> >> >> I don't really want to defend an obsolete (?) usecase of structured >> dtypes. >> >> However, I think there should be a decision about the future plans for >> whether dataframe like usages of structure dtypes or through higher level >> classes or functions are still supported, instead of removing slowly and >> silently (*) the foundation for this use case, either support this usage or >> say you will be dropping it. >> >> (*) I didn't read the details of the release notes >> >> >> And another footnote about obsolete: >> Given that I'm the only one arguing about the dataframe_like usecase of >> recarrays and structured dtypes, I think they are dead for this specific >> usecase and only my inertia and conservativeness kept them alive in >> statsmodels. >> >> >> Josef >> > > It's a bit of a stretch to say that we are "silently" dropping support for > dataframe-like use of structured arrays. > > First, we still allow pretty much all dataframe-like use we have supported > since numpy 1.7, limited as it may be. We are really only dropping one very > specialized, expert use involving an explicit view, which I still have > doubts was ever more than a hack. That 2008 mailing list message didn't > involve multi-field indexing, which didn't exist then (only introduced in > 2009), and we have wanted to make them views (not copies) since their > inception. > The 2008 mailing list thread introduced me to the working with views on structured arrays as the ONLY way to switch between structured and homogenous dtypes (if the underlying item size was homogeneous). The new stats.models started in 2009. > > Second, I don't think we are doing so silently: We have warned about this > in release notes since numpy 1.7 in 2012/2013, and it gets mention in most > releases since then. We have also raised FutureWarnings about it since 1.7. > Unfortunately we missed warning in your specific case for a while, but we > corrected this in 1.12 so you should have seen FutureWarnings since then. > If I see warnings in the test suite about getting a view instead copy from numpy, then the only/main consequence I think about is whether I need to watch out for inline modification. I didn't expect that the followup computation would change, and that it's a padded view and not a view on the selected memory. However, I just checked and padding is mentioned in the 1.12 release notes (which I never read before, ). AFAICS, one problem is that the padded view didn't come with the matching down stream usage support, the pack function as mentioned, an alternative way to convert to a standard ndarray, copy doesn't get rid of the padding and so on. eg. another mailing list thread I just found with the same problem http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html quoting Ralf: Question: is that really the recommended way to get an (N, 2) size float array from two columns of a larger record array? If so, why isn't there a better way? If you'd want to write to that (N, 2) array you have to append a copy, making it even uglier. Also, then there really should be tests for views in test_records.py. This "better way" never showed up, AFAIK. And it looks like we came back to this problem every few years. Josef > > I don't feel the need to officially declare that we are dropping support > for dataframe-like use of structured arrays. It's unclear where that use > ends and other uses of structured arrays begin. I think updating the docs > to warn that pandas/dask may be a better choice is enough, as I've been > doing, and then users can decide for themselves. > There is still the question about whether we should make > numpy.lib.recfunctions more official. I don't have a strong opinion. I > suppose it would be good to add a section to the structured array docs > which lists those methods and says something like > > "the submodule numpy.lib.recfunctions provides minimal functionality to > split, combine, and manipulate structured datatypes and arrays. In most > cases, we strongly recommend users use a dedicated module such as > pandas/xarray/dask instead of these methods, but they are provided for > occasional convenience." > > Allan > > > > Allan >> >> >> Josef >> >> >> Allan >> >> > >> > Cheers! >> > Ben Root >> > >> > On Mon, Jan 29, 2018 at 3:24 PM, >> >> > >> > > > >>> wrote: >> > >> > >> > >> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van >> der Walt >> > > > > >> > > >>> wrote: >> > >> > On Mon, 29 Jan 2018 14:10:56 -0500, >> josef.pktd at gmail.com >> > >> > > >> >> > >> wrote: >> > >> > Given that there is pandas, xarray, >> dask and >> more, numpy >> > could as well drop >> > any pretense of supporting >> dataframe_likes. >> Or, adjust >> > the recfunctions so >> > we can still work dataframe_like >> with structured >> > dtypes/recarrays/recfunctions. >> > >> > >> > I haven't been following the duckarray >> discussion >> carefully, >> > but could >> > this be an opportunity for a dataframe >> protocol, >> so that we >> > can have >> > libraries ingest structured arrays, record >> arrays, pandas >> > dataframes, >> > etc. without too much specialized code? >> > >> > >> > AFAIU while not being in the data handling >> area, >> pandas defines >> > the interface and other libraries provide >> pandas >> compatible >> > interfaces or implementations. >> > >> > statsmodels currently still has recarray >> support and >> usage. In >> > some interfaces we support pandas, recarrays >> and >> plain arrays, >> > or anything where asarray works correctly. >> > >> > But recarrays became messy to support, one >> rewrite of >> some >> > functions last year converts recarrays to >> pandas, >> does the >> > manipulation and then converts back to >> recarrays. >> > Also we need to adjust our recarray usage >> with new numpy >> > versions. But there is no real benefit >> because I >> doubt that >> > statsmodels still has any >> recarray/structured dtype >> users. So, >> > we only have to remove our own uses in the >> datasets >> and unit tests. >> > >> > Josef >> > >> > >> > >> > >> > St?fan >> > >> > _____________________________ >> __________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> >> > > >> > >> > >> >> > >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > an/listinfo/numpy-discussion >> > >> > > man/listinfo/numpy-discussion >> >> > an/listinfo/numpy-discussion >> >> >> > >> > >> > >> > _____________________________ >> __________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> >> > > >> > >> > >> >> > >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > an/listinfo/numpy-discussion >> > >> > > man/listinfo/numpy-discussion >> >> > an/listinfo/numpy-discussion >> >> >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> >> > > >> > >> > >> >> > >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > an/listinfo/numpy-discussion >> > >> > > man/listinfo/numpy-discussion >> >> > an/listinfo/numpy-discussion >> >> >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at python.org >> >> > > >> > >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > an/listinfo/numpy-discussion >> > >> > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> > > >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > an/listinfo/numpy-discussion >> > >> >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 30 14:42:41 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Jan 2018 14:42:41 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: On Tue, Jan 30, 2018 at 1:33 PM, wrote: > > > On Tue, Jan 30, 2018 at 12:28 PM, Allan Haldane > wrote: > >> On 01/29/2018 11:50 PM, josef.pktd at gmail.com wrote: >> >>> >>> >>> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane >> > wrote: >>> >>> On 01/29/2018 05:59 PM, josef.pktd at gmail.com >>> wrote: >>> >>> >>> >>> On Mon, Jan 29, 2018 at 5:50 PM, >> >> >> wrote: >>> >>> >>> >>> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane >>> >>> >> >>> wrote: >>> >>> On 01/29/2018 04:02 PM, josef.pktd at gmail.com >>> >>> >> > wrote: >>> > >>> > >>> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root >>> >>> > >>> > >> >> >>> wrote: >>> > >>> > I <3 structured arrays. I love the fact that I >>> can access data by >>> > row and then by fieldname, or vice versa. There >>> are times when I >>> > need to pass just a column into a function, and >>> there are times when >>> > I need to process things row by row. Yes, pandas >>> is nice if you want >>> > the specialized indexing features, but it becomes >>> a bear to deal >>> > with if all you want is normal indexing, or even >>> the ability to >>> > easily loop over the dataset. >>> > >>> > >>> > I don't think there is a doubt that structured >>> arrays, arrays with >>> > structured dtypes, are a useful container. The >>> question is whether they >>> > should be more or the foundation for more. >>> > >>> > For example, computing a mean, or reduce operation, >>> over numeric element >>> > ("columns"). Before padded views it was possible to >>> index by selecting >>> > the relevant "columns" and view them as standard >>> array. With padded >>> > views that breaks and AFAICS, there is no way in >>> numpy 1.14.0 to compute >>> > a mean of some "columns". (I don't have numpy 1.14 to >>> try or find a >>> > workaround, like maybe looping over all relevant >>> columns.) >>> > >>> > Josef >>> >>> Just to clarify, structured types have always had >>> padding bytes, >>> that >>> isn't new. >>> >>> What *is* new (which we are pushing to 1.15, I think) >>> is that it >>> may be >>> somewhat more common to end up with padding than >>> before, and >>> only if you >>> are specifically using multi-field indexing, which is a >>> fairly >>> specialized case. >>> >>> I think recfunctions already account properly for >>> padding bytes. >>> Except >>> for the bug in #8100, which we will fix, padding-bytes >>> in >>> recarrays are >>> more or less invisible to a non-expert who only cares >>> about >>> dataframe-like behavior. >>> >>> In other words, padding is no obstacle at all to >>> computing a >>> mean over a >>> column, and single-field indexes in 1.15 behave >>> identically as >>> before. >>> The only thing that will change in 1.15 is multi-field >>> indexing, >>> and it >>> has never been possible to compute a mean (or any binary >>> operation) on >>> multiple fields. >>> >>> >>> from the example in the other thread >>> a[['b', 'c']].view(('f8', 2)).mean(0) >>> >>> >>> (from the statsmodels usecase: >>> read csv with genfromtext to get recarray or structured >>> array >>> select/index the numeric columns >>> view them as standard array >>> do whatever we can do with standard numpy arrays >>> ) >>> >>> >>> Oh ok, I misunderstood. I see your point: a mean over fields is more >>> difficult than before. >>> >>> Or, to phrase it as a question: >>> >>> How do we get a standard array with homogeneous dtype from the >>> corresponding elements of a structured dtype in numpy 1.14.0? >>> >>> Josef >>> >>> >>> The answer may be that "numpy has never had a way to that", >>> even if in a few special cases you might hack a workaround using >>> views. >>> >>> That's what your example seems like to me. It uses an explicit view, >>> which is an "expert" feature since views depend on the exact memory >>> layout and binary representation of the array. Your example only >>> works if the two fields have exactly the same dtype as each other >>> and as the final dtype, and evidently breaks if there is byte >>> padding for any reason. >>> >>> Pandas can do row means without these problems: >>> >>> >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0) >>> >>> Numpy is missing this functionality, so you or whoever wrote that >>> example figured out a fragile workaround using views. >>> >>> >>> Once upon a time (*) this wasn't fragile but the only and recommended >>> way. Because dtypes were low level with clear memory layout and stayed that >>> way, it was easy to check item size or whatever and get different views on >>> it. >>> e.g. https://mail.scipy.org/pipermail/numpy-discussion/2008-Decem >>> ber/039340.html >>> >>> (*) pre-pandas, pre-stackoverflow on the mailing lists which was for me >>> roughly 2008 to 2012 >>> but a late thread https://mail.scipy.org/piperma >>> il/numpy-discussion/2015-October/074014.html >>> "What is now the recommended way of converting structured >>> dtypes/recarrays to ndarrays?" >>> >>> on final historical note (once upon a time users relied on cookbooks) http://scipy-cookbook.readthedocs.io/items/Recarray. html#Converting-to-regular-arrays-and-reshaping 2010-03-09 (last modified), 2008-06-27 (created) which I assume is broken in numpy 1.4.0 > >>> >>> >>> I suggest that if we want to allow either means over fields, or >>> conversion of a n-D structured array to an n+1-D regular ndarray, we >>> should add a dedicated function to do so in numpy.lib.recfunctions >>> which does not depend on the binary representation of the array. >>> >>> >>> I don't really want to defend an obsolete (?) usecase of structured >>> dtypes. >>> >>> However, I think there should be a decision about the future plans for >>> whether dataframe like usages of structure dtypes or through higher level >>> classes or functions are still supported, instead of removing slowly and >>> silently (*) the foundation for this use case, either support this usage or >>> say you will be dropping it. >>> >>> (*) I didn't read the details of the release notes >>> >>> >>> And another footnote about obsolete: >>> Given that I'm the only one arguing about the dataframe_like usecase of >>> recarrays and structured dtypes, I think they are dead for this specific >>> usecase and only my inertia and conservativeness kept them alive in >>> statsmodels. >>> >>> >>> Josef >>> >> >> It's a bit of a stretch to say that we are "silently" dropping support >> for dataframe-like use of structured arrays. >> >> First, we still allow pretty much all dataframe-like use we have >> supported since numpy 1.7, limited as it may be. We are really only >> dropping one very specialized, expert use involving an explicit view, which >> I still have doubts was ever more than a hack. That 2008 mailing list >> message didn't involve multi-field indexing, which didn't exist then (only >> introduced in 2009), and we have wanted to make them views (not copies) >> since their inception. >> > > The 2008 mailing list thread introduced me to the working with views on > structured arrays as the ONLY way to switch between structured and > homogenous dtypes (if the underlying item size was homogeneous). > The new stats.models started in 2009. > > >> >> Second, I don't think we are doing so silently: We have warned about this >> in release notes since numpy 1.7 in 2012/2013, and it gets mention in most >> releases since then. We have also raised FutureWarnings about it since 1.7. >> Unfortunately we missed warning in your specific case for a while, but we >> corrected this in 1.12 so you should have seen FutureWarnings since then. >> > > If I see warnings in the test suite about getting a view instead copy from > numpy, then the only/main consequence I think about is whether I need to > watch out for inline modification. > I didn't expect that the followup computation would change, and that it's > a padded view and not a view on the selected memory. However, I just > checked and padding is mentioned in the 1.12 release notes (which I never > read before, ). > > AFAICS, one problem is that the padded view didn't come with the matching > down stream usage support, the pack function as mentioned, an alternative > way to convert to a standard ndarray, copy doesn't get rid of the padding > and so on. > > eg. another mailing list thread I just found with the same problem > http://numpy-discussion.10968.n7.nabble.com/view-of-recarray > -issue-td32001.html > > quoting Ralf: > Question: is that really the recommended way to get an (N, 2) size float > array from two columns of a larger record array? If so, why isn't there a > better way? If you'd want to write to that (N, 2) array you have to append > a copy, making it even uglier. Also, then there really should be tests for > views in test_records.py. > > > This "better way" never showed up, AFAIK. And it looks like we came back > to this problem every few years. > > Josef > > >> >> I don't feel the need to officially declare that we are dropping support >> for dataframe-like use of structured arrays. It's unclear where that use >> ends and other uses of structured arrays begin. I think updating the docs >> to warn that pandas/dask may be a better choice is enough, as I've been >> doing, and then users can decide for themselves. > > >> There is still the question about whether we should make >> numpy.lib.recfunctions more official. I don't have a strong opinion. I >> suppose it would be good to add a section to the structured array docs >> which lists those methods and says something like >> >> "the submodule numpy.lib.recfunctions provides minimal functionality to >> split, combine, and manipulate structured datatypes and arrays. In most >> cases, we strongly recommend users use a dedicated module such as >> pandas/xarray/dask instead of these methods, but they are provided for >> occasional convenience." >> >> Allan >> >> >> >> Allan >>> >>> >>> Josef >>> >>> >>> Allan >>> >>> > >>> > Cheers! >>> > Ben Root >>> > >>> > On Mon, Jan 29, 2018 at 3:24 PM, >>> >>> > >>> > >> >> >>> wrote: >>> > >>> > >>> > >>> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van >>> der Walt >>> > >> >> > >>> >> >> >>> wrote: >>> > >>> > On Mon, 29 Jan 2018 14:10:56 -0500, >>> josef.pktd at gmail.com >>> > >>> > >> >>> >>> >> >> wrote: >>> > >>> > Given that there is pandas, xarray, >>> dask and >>> more, numpy >>> > could as well drop >>> > any pretense of supporting >>> dataframe_likes. >>> Or, adjust >>> > the recfunctions so >>> > we can still work dataframe_like >>> with structured >>> > dtypes/recarrays/recfunctions. >>> > >>> > >>> > I haven't been following the duckarray >>> discussion >>> carefully, >>> > but could >>> > this be an opportunity for a dataframe >>> protocol, >>> so that we >>> > can have >>> > libraries ingest structured arrays, >>> record >>> arrays, pandas >>> > dataframes, >>> > etc. without too much specialized code? >>> > >>> > >>> > AFAIU while not being in the data handling >>> area, >>> pandas defines >>> > the interface and other libraries provide >>> pandas >>> compatible >>> > interfaces or implementations. >>> > >>> > statsmodels currently still has recarray >>> support and >>> usage. In >>> > some interfaces we support pandas, recarrays >>> and >>> plain arrays, >>> > or anything where asarray works correctly. >>> > >>> > But recarrays became messy to support, one >>> rewrite of >>> some >>> > functions last year converts recarrays to >>> pandas, >>> does the >>> > manipulation and then converts back to >>> recarrays. >>> > Also we need to adjust our recarray usage >>> with new numpy >>> > versions. But there is no real benefit >>> because I >>> doubt that >>> > statsmodels still has any >>> recarray/structured dtype >>> users. So, >>> > we only have to remove our own uses in the >>> datasets >>> and unit tests. >>> > >>> > Josef >>> > >>> > >>> > >>> > >>> > St?fan >>> > >>> > _____________________________ >>> __________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> >>> >> > >>> >> >>> >> >> >>> > >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> an/listinfo/numpy-discussion >>> > >>> > >> man/listinfo/numpy-discussion >>> >>> >> an/listinfo/numpy-discussion >>> >> >>> > >>> > >>> > >>> > _____________________________ >>> __________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> >>> >> > >>> >> >>> >> >> >>> > >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> an/listinfo/numpy-discussion >>> > >>> > >> man/listinfo/numpy-discussion >>> >>> >> an/listinfo/numpy-discussion >>> >> >>> > >>> > >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> >>> >> > >>> >> >>> >> >> >>> > >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> an/listinfo/numpy-discussion >>> > >>> > >> man/listinfo/numpy-discussion >>> >>> >> an/listinfo/numpy-discussion >>> >> >>> > >>> > >>> > >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at python.org >>> >>> >> > >>> > >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> an/listinfo/numpy-discussion >>> > >>> > >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> >> > >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> an/listinfo/numpy-discussion >>> > >>> >>> >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jan 30 14:49:10 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Jan 2018 14:49:10 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <388385e2-163b-ad2e-6bf6-1ac68a709278@gmail.com> <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: On Tue, Jan 30, 2018 at 2:42 PM, wrote: > > > On Tue, Jan 30, 2018 at 1:33 PM, wrote: > >> >> >> On Tue, Jan 30, 2018 at 12:28 PM, Allan Haldane >> wrote: >> >>> On 01/29/2018 11:50 PM, josef.pktd at gmail.com wrote: >>> >>>> >>>> >>>> On Mon, Jan 29, 2018 at 10:44 PM, Allan Haldane >>> > wrote: >>>> >>>> On 01/29/2018 05:59 PM, josef.pktd at gmail.com >>>> wrote: >>>> >>>> >>>> >>>> On Mon, Jan 29, 2018 at 5:50 PM, >>> >>> >> wrote: >>>> >>>> >>>> >>>> On Mon, Jan 29, 2018 at 4:11 PM, Allan Haldane >>>> >>>> >>> >>> >>>> wrote: >>>> >>>> On 01/29/2018 04:02 PM, josef.pktd at gmail.com >>>> >>>> >>> > wrote: >>>> > >>>> > >>>> > On Mon, Jan 29, 2018 at 3:44 PM, Benjamin Root >>>> >>>> > >>>> > >>> >>> >>> wrote: >>>> > >>>> > I <3 structured arrays. I love the fact that I >>>> can access data by >>>> > row and then by fieldname, or vice versa. There >>>> are times when I >>>> > need to pass just a column into a function, and >>>> there are times when >>>> > I need to process things row by row. Yes, pandas >>>> is nice if you want >>>> > the specialized indexing features, but it becomes >>>> a bear to deal >>>> > with if all you want is normal indexing, or even >>>> the ability to >>>> > easily loop over the dataset. >>>> > >>>> > >>>> > I don't think there is a doubt that structured >>>> arrays, arrays with >>>> > structured dtypes, are a useful container. The >>>> question is whether they >>>> > should be more or the foundation for more. >>>> > >>>> > For example, computing a mean, or reduce operation, >>>> over numeric element >>>> > ("columns"). Before padded views it was possible to >>>> index by selecting >>>> > the relevant "columns" and view them as standard >>>> array. With padded >>>> > views that breaks and AFAICS, there is no way in >>>> numpy 1.14.0 to compute >>>> > a mean of some "columns". (I don't have numpy 1.14 to >>>> try or find a >>>> > workaround, like maybe looping over all relevant >>>> columns.) >>>> > >>>> > Josef >>>> >>>> Just to clarify, structured types have always had >>>> padding bytes, >>>> that >>>> isn't new. >>>> >>>> What *is* new (which we are pushing to 1.15, I think) >>>> is that it >>>> may be >>>> somewhat more common to end up with padding than >>>> before, and >>>> only if you >>>> are specifically using multi-field indexing, which is a >>>> fairly >>>> specialized case. >>>> >>>> I think recfunctions already account properly for >>>> padding bytes. >>>> Except >>>> for the bug in #8100, which we will fix, padding-bytes >>>> in >>>> recarrays are >>>> more or less invisible to a non-expert who only cares >>>> about >>>> dataframe-like behavior. >>>> >>>> In other words, padding is no obstacle at all to >>>> computing a >>>> mean over a >>>> column, and single-field indexes in 1.15 behave >>>> identically as >>>> before. >>>> The only thing that will change in 1.15 is multi-field >>>> indexing, >>>> and it >>>> has never been possible to compute a mean (or any >>>> binary >>>> operation) on >>>> multiple fields. >>>> >>>> >>>> from the example in the other thread >>>> a[['b', 'c']].view(('f8', 2)).mean(0) >>>> >>>> >>>> (from the statsmodels usecase: >>>> read csv with genfromtext to get recarray or structured >>>> array >>>> select/index the numeric columns >>>> view them as standard array >>>> do whatever we can do with standard numpy arrays >>>> ) >>>> >>>> >>>> Oh ok, I misunderstood. I see your point: a mean over fields is more >>>> difficult than before. >>>> >>>> Or, to phrase it as a question: >>>> >>>> How do we get a standard array with homogeneous dtype from the >>>> corresponding elements of a structured dtype in numpy 1.14.0? >>>> >>>> Josef >>>> >>>> >>>> The answer may be that "numpy has never had a way to that", >>>> even if in a few special cases you might hack a workaround using >>>> views. >>>> >>>> That's what your example seems like to me. It uses an explicit view, >>>> which is an "expert" feature since views depend on the exact memory >>>> layout and binary representation of the array. Your example only >>>> works if the two fields have exactly the same dtype as each other >>>> and as the final dtype, and evidently breaks if there is byte >>>> padding for any reason. >>>> >>>> Pandas can do row means without these problems: >>>> >>>> >>> pd.DataFrame(np.ones(10, dtype='i8,f8')).mean(axis=0) >>>> >>>> Numpy is missing this functionality, so you or whoever wrote that >>>> example figured out a fragile workaround using views. >>>> >>>> >>>> Once upon a time (*) this wasn't fragile but the only and recommended >>>> way. Because dtypes were low level with clear memory layout and stayed that >>>> way, it was easy to check item size or whatever and get different views on >>>> it. >>>> e.g. https://mail.scipy.org/pipermail/numpy-discussion/2008-Decem >>>> ber/039340.html >>>> >>>> (*) pre-pandas, pre-stackoverflow on the mailing lists which was for me >>>> roughly 2008 to 2012 >>>> but a late thread https://mail.scipy.org/piperma >>>> il/numpy-discussion/2015-October/074014.html >>>> "What is now the recommended way of converting structured >>>> dtypes/recarrays to ndarrays?" >>>> >>>> > on final historical note (once upon a time users relied on cookbooks) > http://scipy-cookbook.readthedocs.io/items/Recarray.html# > Converting-to-regular-arrays-and-reshaping > 2010-03-09 (last modified), 2008-06-27 (created) > which I assume is broken in numpy 1.4.0 > and a final grumpy note https://docs.scipy.org/doc/numpy-1.14.0/release.html#multiple-field-indexing-assignment-of-structured-arrays " which will affect code such as" = "which will break your code without offering an alternative" Josef > > > >> >>>> >>>> >>>> I suggest that if we want to allow either means over fields, or >>>> conversion of a n-D structured array to an n+1-D regular ndarray, we >>>> should add a dedicated function to do so in numpy.lib.recfunctions >>>> which does not depend on the binary representation of the array. >>>> >>>> >>>> I don't really want to defend an obsolete (?) usecase of structured >>>> dtypes. >>>> >>>> However, I think there should be a decision about the future plans for >>>> whether dataframe like usages of structure dtypes or through higher level >>>> classes or functions are still supported, instead of removing slowly and >>>> silently (*) the foundation for this use case, either support this usage or >>>> say you will be dropping it. >>>> >>>> (*) I didn't read the details of the release notes >>>> >>>> >>>> And another footnote about obsolete: >>>> Given that I'm the only one arguing about the dataframe_like usecase of >>>> recarrays and structured dtypes, I think they are dead for this specific >>>> usecase and only my inertia and conservativeness kept them alive in >>>> statsmodels. >>>> >>>> >>>> Josef >>>> >>> >>> It's a bit of a stretch to say that we are "silently" dropping support >>> for dataframe-like use of structured arrays. >>> >>> First, we still allow pretty much all dataframe-like use we have >>> supported since numpy 1.7, limited as it may be. We are really only >>> dropping one very specialized, expert use involving an explicit view, which >>> I still have doubts was ever more than a hack. That 2008 mailing list >>> message didn't involve multi-field indexing, which didn't exist then (only >>> introduced in 2009), and we have wanted to make them views (not copies) >>> since their inception. >>> >> >> The 2008 mailing list thread introduced me to the working with views on >> structured arrays as the ONLY way to switch between structured and >> homogenous dtypes (if the underlying item size was homogeneous). >> The new stats.models started in 2009. >> >> >>> >>> Second, I don't think we are doing so silently: We have warned about >>> this in release notes since numpy 1.7 in 2012/2013, and it gets mention in >>> most releases since then. We have also raised FutureWarnings about it since >>> 1.7. Unfortunately we missed warning in your specific case for a while, but >>> we corrected this in 1.12 so you should have seen FutureWarnings since then. >>> >> >> If I see warnings in the test suite about getting a view instead copy >> from numpy, then the only/main consequence I think about is whether I need >> to watch out for inline modification. >> I didn't expect that the followup computation would change, and that it's >> a padded view and not a view on the selected memory. However, I just >> checked and padding is mentioned in the 1.12 release notes (which I never >> read before, ). >> >> AFAICS, one problem is that the padded view didn't come with the matching >> down stream usage support, the pack function as mentioned, an alternative >> way to convert to a standard ndarray, copy doesn't get rid of the padding >> and so on. >> >> eg. another mailing list thread I just found with the same problem >> http://numpy-discussion.10968.n7.nabble.com/view-of-recarray >> -issue-td32001.html >> >> quoting Ralf: >> Question: is that really the recommended way to get an (N, 2) size float >> array from two columns of a larger record array? If so, why isn't there a >> better way? If you'd want to write to that (N, 2) array you have to append >> a copy, making it even uglier. Also, then there really should be tests for >> views in test_records.py. >> >> >> This "better way" never showed up, AFAIK. And it looks like we came back >> to this problem every few years. >> >> Josef >> >> >>> >>> I don't feel the need to officially declare that we are dropping support >>> for dataframe-like use of structured arrays. It's unclear where that use >>> ends and other uses of structured arrays begin. I think updating the docs >>> to warn that pandas/dask may be a better choice is enough, as I've been >>> doing, and then users can decide for themselves. >> >> >>> There is still the question about whether we should make >>> numpy.lib.recfunctions more official. I don't have a strong opinion. I >>> suppose it would be good to add a section to the structured array docs >>> which lists those methods and says something like >>> >>> "the submodule numpy.lib.recfunctions provides minimal functionality to >>> split, combine, and manipulate structured datatypes and arrays. In most >>> cases, we strongly recommend users use a dedicated module such as >>> pandas/xarray/dask instead of these methods, but they are provided for >>> occasional convenience." >>> >>> Allan >>> >>> >>> >>> Allan >>>> >>>> >>>> Josef >>>> >>>> >>>> Allan >>>> >>>> > >>>> > Cheers! >>>> > Ben Root >>>> > >>>> > On Mon, Jan 29, 2018 at 3:24 PM, >>>> >>>> > >>>> > >>> >>> >>> wrote: >>>> > >>>> > >>>> > >>>> > On Mon, Jan 29, 2018 at 2:55 PM, Stefan van >>>> der Walt >>>> > >>> >>> > >>>> >>> >>> >>> wrote: >>>> > >>>> > On Mon, 29 Jan 2018 14:10:56 -0500, >>>> josef.pktd at gmail.com >>>> > >>>> > >>> >>>> >>>> >>> >> wrote: >>>> > >>>> > Given that there is pandas, xarray, >>>> dask and >>>> more, numpy >>>> > could as well drop >>>> > any pretense of supporting >>>> dataframe_likes. >>>> Or, adjust >>>> > the recfunctions so >>>> > we can still work dataframe_like >>>> with structured >>>> > dtypes/recarrays/recfunctions. >>>> > >>>> > >>>> > I haven't been following the duckarray >>>> discussion >>>> carefully, >>>> > but could >>>> > this be an opportunity for a dataframe >>>> protocol, >>>> so that we >>>> > can have >>>> > libraries ingest structured arrays, >>>> record >>>> arrays, pandas >>>> > dataframes, >>>> > etc. without too much specialized code? >>>> > >>>> > >>>> > AFAIU while not being in the data handling >>>> area, >>>> pandas defines >>>> > the interface and other libraries provide >>>> pandas >>>> compatible >>>> > interfaces or implementations. >>>> > >>>> > statsmodels currently still has recarray >>>> support and >>>> usage. In >>>> > some interfaces we support pandas, >>>> recarrays and >>>> plain arrays, >>>> > or anything where asarray works correctly. >>>> > >>>> > But recarrays became messy to support, one >>>> rewrite of >>>> some >>>> > functions last year converts recarrays to >>>> pandas, >>>> does the >>>> > manipulation and then converts back to >>>> recarrays. >>>> > Also we need to adjust our recarray usage >>>> with new numpy >>>> > versions. But there is no real benefit >>>> because I >>>> doubt that >>>> > statsmodels still has any >>>> recarray/structured dtype >>>> users. So, >>>> > we only have to remove our own uses in the >>>> datasets >>>> and unit tests. >>>> > >>>> > Josef >>>> > >>>> > >>>> > >>>> > >>>> > St?fan >>>> > >>>> > _____________________________ >>>> __________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>>> >>>> >>> > >>>> >>> >>>> >>> >> >>>> > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> an/listinfo/numpy-discussion >>>> > >>>> > >>> man/listinfo/numpy-discussion >>>> >>>> >>> an/listinfo/numpy-discussion >>>> >> >>>> > >>>> > >>>> > >>>> > _____________________________ >>>> __________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>>> >>>> >>> > >>>> >>> >>>> >>> >> >>>> > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> an/listinfo/numpy-discussion >>>> > >>>> > >>> man/listinfo/numpy-discussion >>>> >>>> >>> an/listinfo/numpy-discussion >>>> >> >>>> > >>>> > >>>> > >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>>> >>>> >>> > >>>> >>> >>>> >>> >> >>>> > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> an/listinfo/numpy-discussion >>>> > >>>> > >>> man/listinfo/numpy-discussion >>>> >>>> >>> an/listinfo/numpy-discussion >>>> >> >>>> > >>>> > >>>> > >>>> > >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at python.org >>>> >>>> >>> > >>>> > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> an/listinfo/numpy-discussion >>>> > >>>> > >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>> > >>>> >>> > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> an/listinfo/numpy-discussion >>>> > >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>> > >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Tue Jan 30 15:21:20 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Tue, 30 Jan 2018 15:21:20 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> Message-ID: <1cd22704-e889-cea1-6aa8-a2126a9d2525@gmail.com> On 01/30/2018 01:33 PM, josef.pktd at gmail.com wrote: > AFAICS, one problem is that the padded view didn't come with the > matching down stream usage support, the pack function as mentioned, an > alternative way to convert to a standard ndarray, copy doesn't get rid > of the padding and so on. > > eg. another mailing list thread I just found with the same problem > http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html > > quoting Ralf: > Question: is that really the recommended way to get an (N, 2) size float > array from two columns of a larger record array? If so, why isn't there > a better way? If you'd want to write to that (N, 2) array you have to > append a copy, making it even uglier. Also, then there really should be > tests for views in test_records.py. > > > This "better way" never showed up, AFAIK. And it looks like we came back > to this problem every few years. > > Josef Since we are at least pushing off this change to a later release (1.15?), we have some time to prepare/catch up. What can we add to numpy.lib.recfunctions to make the multi-field copy->view change smoother? We have discussed at least two functions: * repack_fields - rearrange the memory layout of a structured array to add/remove padding between fields * structured_to_unstructured - turns a n-D structured array into an (n+1)-D unstructured ndarray, whose dtype is the highest common type of all the fields. May want the inverse function too. We might also consider * apply_along_fields(arr, method) - applies the method along the "field" axis, equivalent to something like method(struct_to_unstructured(arr), axis=-1) I think these are pretty minimal and shouldn't be too hard to implement. Allan From josef.pktd at gmail.com Tue Jan 30 16:54:23 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Jan 2018 16:54:23 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: <1cd22704-e889-cea1-6aa8-a2126a9d2525@gmail.com> References: <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> <1cd22704-e889-cea1-6aa8-a2126a9d2525@gmail.com> Message-ID: On Tue, Jan 30, 2018 at 3:21 PM, Allan Haldane wrote: > On 01/30/2018 01:33 PM, josef.pktd at gmail.com wrote: > > AFAICS, one problem is that the padded view didn't come with the > > matching down stream usage support, the pack function as mentioned, an > > alternative way to convert to a standard ndarray, copy doesn't get rid > > of the padding and so on. > > > > eg. another mailing list thread I just found with the same problem > > http://numpy-discussion.10968.n7.nabble.com/view-of- > recarray-issue-td32001.html > > > > quoting Ralf: > > Question: is that really the recommended way to get an (N, 2) size float > > array from two columns of a larger record array? If so, why isn't there > > a better way? If you'd want to write to that (N, 2) array you have to > > append a copy, making it even uglier. Also, then there really should be > > tests for views in test_records.py. > > > > > > This "better way" never showed up, AFAIK. And it looks like we came back > > to this problem every few years. > > > > Josef > > Since we are at least pushing off this change to a later release > (1.15?), we have some time to prepare/catch up. > > What can we add to numpy.lib.recfunctions to make the multi-field > copy->view change smoother? We have discussed at least two functions: > > * repack_fields - rearrange the memory layout of a structured array to > add/remove padding between fields > > * structured_to_unstructured - turns a n-D structured array into an > (n+1)-D unstructured ndarray, whose dtype is the highest common type of > all the fields. May want the inverse function too. > The only sticky point with statsmodels is to have an equivalent of a[['b', 'c']].view(('f8', 2)). Highest common dtype might be object, the main usecase for this is to select some elements of a specific dtype and then use them as standard,homogeneous ndarray. In our case and other cases that I have seen it is mainly to select a subset of the floating point numbers. Another case of this might be to combine two strings into one a[['b', 'c']].view(('S8')) if b is s5 and c is S3, but I don't think I used this in serious code. for inverse function: I guess it is still possible to view any standard homogenous ndarray with a structured dtype as long as the itemsize matches. Browsing through old mailing list threads, I saw that adding multiple fields or concatenating two arrays with structured dtypes into an array with a single combined dtype was missing and I guess still is. (IIRC this is the usecase where we go now the pandas detour in statsmodels.) > > > We might also consider > > * apply_along_fields(arr, method) - applies the method along the > "field" axis, equivalent to something like > method(struct_to_unstructured(arr), axis=-1) > If this works on a padded view of an existing array, then this would be an improvement over the current version of having to extract and copy the relevant fields of an existing structured dtype or loop over different numeric dtypes, ints, floats. In general there will need to be a way to apply `method` only to selected columns, or columns of a matching dtype. (e.g. We don't want the sum or mean of a string.) (e.g. we use ptp() on numeric fields to check if there is already a constant column in the array or dataframe) > > > I think these are pretty minimal and shouldn't be too hard to implement. > AFAICS, it would cover the statsmodels usage. Josef > > Allan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From renato.fabbri at gmail.com Tue Jan 30 17:32:22 2018 From: renato.fabbri at gmail.com (Renato Fabbri) Date: Tue, 30 Jan 2018 20:32:22 -0200 Subject: [Numpy-discussion] doc? music through mathematical relations between LPCM samples and musical elements/characteristics Message-ID: the half-shape suite: archive.org/details/ShapeSuite was completely synthesized using psychophysical relations for each resulting 16bit 44kHz samples: https://arxiv.org/abs/1412.6853 I am thinking about the ways in which to make the documentation at least to: * mass (music and audio in sample sequences): the framework. It might be considered a toolbox, but it is not a package: https://github.com/ttm/mass/ * music: a package for making music, including the routines used to make the half-shape suite: https://github.com/ttm/music == It seems reasonable at first to: * upload the package to PyPI (mass is not a package), music is there (beta but there). * Make available some summary of the file tree, including some automated code documentation. As I follow numpy's convention (or try to), and the package and framework are particularly useful for proficient numpy users, I used Sphinx. The very preliminary version is: https://pythonmusic.github.io/ * Make a nice readthedocs documentation. "music" project name was taken so I made: http://music-documentation.readthedocs.io/en/latest/ "mass" project name was also taken so I made: http://musicmass.readthedocs.io/en/latest/ And I found these URLs clumsy. Is there a standard or would you go with musicpackage.read.. and massframework.read... ? More importantly: * would you use readthedocs for the sphinx/doxigen output? * would you use readthedocs.org or a gitbook would be better or...? Should I contact the scipy community to make available a scikit or integrate it to numpy in any way beyond using and citing it appropriately?. BTW. I am using vimwiki with some constant attempt to organize and track my I/O: https://arxiv.org/abs/1712.06933 So maybe a unified solution for all these documentation instances using a wiki structure seems reasonable at the moment. Maybe upload the generated html from Sphinx and from Vimwiki to readthedocs...? Best, R. -- Renato Fabbri GNU/Linux User #479299 labmacambira.sourceforge.net -------------- next part -------------- An HTML attachment was scrubbed... URL: From allanhaldane at gmail.com Tue Jan 30 19:33:50 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Tue, 30 Jan 2018 19:33:50 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> <1cd22704-e889-cea1-6aa8-a2126a9d2525@gmail.com> Message-ID: On 01/30/2018 04:54 PM, josef.pktd at gmail.com wrote: > > > On Tue, Jan 30, 2018 at 3:21 PM, Allan Haldane > wrote: > > On 01/30/2018 01:33 PM, josef.pktd at gmail.com > wrote: > > AFAICS, one problem is that the padded view didn't come with the > > matching down stream usage support, the pack function as mentioned, an > > alternative way to convert to a standard ndarray, copy doesn't get rid > > of the padding and so on. > > > > eg. another mailing list thread I just found with the same problem > > http://numpy-discussion.10968.n7.nabble.com/view-of-recarray-issue-td32001.html > > > > > quoting Ralf: > > Question: is that really the recommended way to get an (N, 2) size float > > array from two columns of a larger record array? If so, why isn't there > > a better way? If you'd want to write to that (N, 2) array you have to > > append a copy, making it even uglier. Also, then there really should be > > tests for views in test_records.py. > > > > > > This "better way" never showed up, AFAIK. And it looks like we came back > > to this problem every few years. > > > > Josef > > Since we are at least pushing off this change to a later release > (1.15?), we have some time to prepare/catch up. > > What can we add to numpy.lib.recfunctions to make the multi-field > copy->view change smoother? We have discussed at least two functions: > > ?* repack_fields - rearrange the memory layout of a structured array to > add/remove padding between fields > > ?* structured_to_unstructured - turns a n-D structured array into an > (n+1)-D unstructured ndarray, whose dtype is the highest common type of > all the fields. May want the inverse function too. > > > The only sticky point with statsmodels is to have an equivalent of > a[['b', 'c']].view(('f8', 2)). > > Highest common dtype might be object, the main usecase for this is to > select some elements of a specific dtype and then use them as > standard,homogeneous ndarray. In our case and other cases that I have > seen it is mainly to select a subset of the floating point numbers. > Another case of this might be to combine two strings into one? a[['b', > 'c']].view(('S8'))? ? if b is s5 and c is S3, but I don't think I used > this in serious code. I implemented and put up a draft of these functions in https://github.com/numpy/numpy/pull/10411 I think they satisfy all your cases: code like >>> a = np.ones(3, dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')]) >>> a[['b', 'c']].view(('f8', 2))` becomes: >>> import numpy.lib.recfunctions as rf >>> rf.structured_to_unstructured(a[['b', 'c']]) array([[1., 1.], [1., 1.], [1., 1.]]) The highest common dtype is usually not "Object", since I use `np.result_type` to determine the output type. So two fields of 'S5' and 'S3' result in an 'S5' array. > > for inverse function: I guess it is still possible to view any standard > homogenous ndarray with a structured dtype as long as the itemsize matches. The inverse is implemented too. And it even supports varied field dtypes, nested fields, and subarrays, as you can see in the docstring examples. > Browsing through old mailing list threads, I saw that adding multiple > fields or concatenating two arrays with structured dtypes into an array > with a single combined dtype was missing and I guess still is. (IIRC > this is the usecase where we go now the pandas detour in statsmodels.) > > We might also consider > > ?* apply_along_fields(arr, method) - applies the method along the > "field" axis, equivalent to something like > method(struct_to_unstructured(arr), axis=-1) > > > If this works on a padded view of an existing array, then this would be > an improvement over the current version of having to extract and copy > the relevant fields of an existing structured dtype or loop over > different numeric dtypes, ints, floats. > > In general there will need to be a way to apply `method` only to > selected columns, or columns of a matching dtype. (e.g. We don't want > the sum or mean of a string.) > (e.g. we use ptp() on numeric fields to check if there is already a > constant column in the array or dataframe) Means over selected columns are accounted for using multi-field indexing. For example: >>> b = np.array([(1, 2, 5), (4, 5, 7), (7, 8 ,11), (10, 11, 12)], ... dtype=[('x', 'i4'), ('y', 'f4'), ('z', 'f8')]) >>> rf.apply_along_fields(np.mean, b) array([ 2.66666667, 5.33333333, 8.66666667, 11. ]) >>> rf.apply_along_fields(np.mean, b[['x', 'z']]) array([ 3. , 5.5, 9. , 11. ]) This is unaffected by the 1.14 to 1.15 changes. Allan > > ? > > > > I think these are pretty minimal and shouldn't be too hard to implement. > > > AFAICS, it would cover the statsmodels usage. > > > Josef > > ? > > > Allan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Tue Jan 30 22:09:11 2018 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Jan 2018 22:09:11 -0500 Subject: [Numpy-discussion] Setting custom dtypes and 1.14 In-Reply-To: References: <20180129195548.dtopwmanutp6ly4u@fastmail.com> <32dfc3bf-9197-2bf1-9767-25529e2d10fd@gmail.com> <1cd22704-e889-cea1-6aa8-a2126a9d2525@gmail.com> Message-ID: On Tue, Jan 30, 2018 at 7:33 PM, Allan Haldane wrote: > On 01/30/2018 04:54 PM, josef.pktd at gmail.com wrote: > > > > > > On Tue, Jan 30, 2018 at 3:21 PM, Allan Haldane > > wrote: > > > > On 01/30/2018 01:33 PM, josef.pktd at gmail.com > > wrote: > > > AFAICS, one problem is that the padded view didn't come with the > > > matching down stream usage support, the pack function as > mentioned, an > > > alternative way to convert to a standard ndarray, copy doesn't get > rid > > > of the padding and so on. > > > > > > eg. another mailing list thread I just found with the same problem > > > http://numpy-discussion.10968.n7.nabble.com/view-of- > recarray-issue-td32001.html > > recarray-issue-td32001.html> > > > > > > quoting Ralf: > > > Question: is that really the recommended way to get an (N, 2) size > float > > > array from two columns of a larger record array? If so, why isn't > there > > > a better way? If you'd want to write to that (N, 2) array you have > to > > > append a copy, making it even uglier. Also, then there really > should be > > > tests for views in test_records.py. > > > > > > > > > This "better way" never showed up, AFAIK. And it looks like we > came back > > > to this problem every few years. > > > > > > Josef > > > > Since we are at least pushing off this change to a later release > > (1.15?), we have some time to prepare/catch up. > > > > What can we add to numpy.lib.recfunctions to make the multi-field > > copy->view change smoother? We have discussed at least two functions: > > > > * repack_fields - rearrange the memory layout of a structured array > to > > add/remove padding between fields > > > > * structured_to_unstructured - turns a n-D structured array into an > > (n+1)-D unstructured ndarray, whose dtype is the highest common type > of > > all the fields. May want the inverse function too. > > > > > > The only sticky point with statsmodels is to have an equivalent of > > a[['b', 'c']].view(('f8', 2)). > > > > Highest common dtype might be object, the main usecase for this is to > > select some elements of a specific dtype and then use them as > > standard,homogeneous ndarray. In our case and other cases that I have > > seen it is mainly to select a subset of the floating point numbers. > > Another case of this might be to combine two strings into one a[['b', > > 'c']].view(('S8')) if b is s5 and c is S3, but I don't think I used > > this in serious code. > > I implemented and put up a draft of these functions in > https://github.com/numpy/numpy/pull/10411 Comments based on reading the last commit > > > I think they satisfy all your cases: code like > > >>> a = np.ones(3, dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')]) > >>> a[['b', 'c']].view(('f8', 2))` > > becomes: > > >>> import numpy.lib.recfunctions as rf > >>> rf.structured_to_unstructured(a[['b', 'c']]) > array([[1., 1.], > [1., 1.], > [1., 1.]]) > > The highest common dtype is usually not "Object", since I use > `np.result_type` to determine the output type. So two fields of 'S5' and > 'S3' result in an 'S5' array. > > structured_to_unstructured looks good to me > > > > > for inverse function: I guess it is still possible to view any standard > > homogenous ndarray with a structured dtype as long as the itemsize > matches. > > The inverse is implemented too. And it even supports varied field > dtypes, nested fields, and subarrays, as you can see in the docstring > examples. > > > > Browsing through old mailing list threads, I saw that adding multiple > > fields or concatenating two arrays with structured dtypes into an array > > with a single combined dtype was missing and I guess still is. (IIRC > > this is the usecase where we go now the pandas detour in statsmodels.) > > > > We might also consider > > > > * apply_along_fields(arr, method) - applies the method along the > > "field" axis, equivalent to something like > > method(struct_to_unstructured(arr), axis=-1) > > > > > > If this works on a padded view of an existing array, then this would be > > an improvement over the current version of having to extract and copy > > the relevant fields of an existing structured dtype or loop over > > different numeric dtypes, ints, floats. > > > > In general there will need to be a way to apply `method` only to > > selected columns, or columns of a matching dtype. (e.g. We don't want > > the sum or mean of a string.) > > (e.g. we use ptp() on numeric fields to check if there is already a > > constant column in the array or dataframe) > > Means over selected columns are accounted for using multi-field > indexing. For example: > > >>> b = np.array([(1, 2, 5), (4, 5, 7), (7, 8 ,11), (10, 11, 12)], > ... dtype=[('x', 'i4'), ('y', 'f4'), ('z', 'f8')]) > > >>> rf.apply_along_fields(np.mean, b) > array([ 2.66666667, 5.33333333, 8.66666667, 11. ]) > > >>> rf.apply_along_fields(np.mean, b[['x', 'z']]) > array([ 3. , 5.5, 9. , 11. ]) > actually, I would have expected apply_along_columns, i.e. reduce over all observations each field. This might need an axis argument. However, in the current form it is less practical than doing it ourselves with structured_to_unstructured because it makes a copy each time of all elements. e.g. rf.apply_along_fields(np.mean, b[['x', 'z']]) rf.apply_along_fields(np.std, b[['x', 'z']]) would do the same structured_to_unstructured copy of all array elements twice. Josef > > > This is unaffected by the 1.14 to 1.15 changes. > > Allan > > > > > > > > > > > > > I think these are pretty minimal and shouldn't be too hard to > implement. > > > > > > AFAICS, it would cover the statsmodels usage. > > > > > > Josef > > > > > > > > > > Allan > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at python.org > > https://mail.python.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jalnliu at lbl.gov Wed Jan 31 01:25:03 2018 From: jalnliu at lbl.gov (Jialin Liu) Date: Tue, 30 Jan 2018 22:25:03 -0800 Subject: [Numpy-discussion] Extending C with Python Message-ID: Hello, I'm extending C with python (which is opposite way of what people usually do, extending python with C), I'm currently stuck in passing a C array to python layer, could anyone plz advise? I have a C buffer in my C code and want to pass it to a python function. In the C code, I have: npy_intp dims [2]; > dims[0] = 10; > dims[1] = 20; > import_array(); > npy_intp m=2; > PyObject * py_dims = PyArray_SimpleNewFromData(1, &m, NPY_INT16 ,(void > *)dims ); // I also tried NPY_INT > PyObject_CallMethod(pInstance, method_name, "O", py_dims); In the Python code, I want to just print that array: def f(self, dims): print ("np array:%d,%d"%(dims[0],dims[1])) But it only prints the first number correctly, i.e., dims[0]. The second number is always 0. Best, Jialin LBNL/NERSC -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jan 31 01:52:00 2018 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 31 Jan 2018 15:52:00 +0900 Subject: [Numpy-discussion] Extending C with Python In-Reply-To: References: Message-ID: On Wed, Jan 31, 2018 at 3:25 PM, Jialin Liu wrote: > Hello, > I'm extending C with python (which is opposite way of what people usually > do, extending python with C), I'm currently stuck in passing a C array to > python layer, could anyone plz advise? > > I have a C buffer in my C code and want to pass it to a python function. > In the C code, I have: > > npy_intp dims [2]; >> dims[0] = 10; >> dims[1] = 20; >> import_array(); >> npy_intp m=2; >> PyObject * py_dims = PyArray_SimpleNewFromData(1, &m, NPY_INT16 ,(void >> *)dims ); // I also tried NPY_INT >> PyObject_CallMethod(pInstance, method_name, "O", py_dims); > > > In the Python code, I want to just print that array: > > def f(self, dims): > > print ("np array:%d,%d"%(dims[0],dims[1])) > > > > But it only prints the first number correctly, i.e., dims[0]. The second > number is always 0. > The correct typecode would be NPY_INTP. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From jalnliu at lbl.gov Wed Jan 31 01:57:12 2018 From: jalnliu at lbl.gov (Jialin Liu) Date: Tue, 30 Jan 2018 22:57:12 -0800 Subject: [Numpy-discussion] Extending C with Python In-Reply-To: References: Message-ID: Amazing! It works! Thank you Robert. I've been stuck with this many days. Best, Jialin LBNL/NERSC On Tue, Jan 30, 2018 at 10:52 PM, Robert Kern wrote: > On Wed, Jan 31, 2018 at 3:25 PM, Jialin Liu wrote: > >> Hello, >> I'm extending C with python (which is opposite way of what people usually >> do, extending python with C), I'm currently stuck in passing a C array to >> python layer, could anyone plz advise? >> >> I have a C buffer in my C code and want to pass it to a python function. >> In the C code, I have: >> >> npy_intp dims [2]; >>> dims[0] = 10; >>> dims[1] = 20; >>> import_array(); >>> npy_intp m=2; >>> PyObject * py_dims = PyArray_SimpleNewFromData(1, &m, NPY_INT16 ,(void >>> *)dims ); // I also tried NPY_INT >>> PyObject_CallMethod(pInstance, method_name, "O", py_dims); >> >> >> In the Python code, I want to just print that array: >> >> def f(self, dims): >> >> print ("np array:%d,%d"%(dims[0],dims[1])) >> >> >> >> But it only prints the first number correctly, i.e., dims[0]. The second >> number is always 0. >> > > The correct typecode would be NPY_INTP. > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solarjoe at posteo.org Wed Jan 31 02:06:39 2018 From: solarjoe at posteo.org (Joe) Date: Wed, 31 Jan 2018 08:06:39 +0100 Subject: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding) In-Reply-To: References: <87po6e234f.fsf@gmail.com> <878td2ny6m.fsf@otaria.sebmel.org> <8bd9f09ce2eb55140bc65775201ed47c@posteo.de> <88dbefc1-ab01-8efe-420c-102e13e3aaed@gmail.com> Message-ID: Does someone know of a function or a convenient way to automatically derive a dtype object from a C typedef struct string or a cffi.typeof()? Am 27.01.2018 10:30 schrieb Joe: > Thanks for your help on this! This solved my issue. > > > Am 25.01.2018 um 19:01 schrieb Allan Haldane: >> There is a new section discussing alignment in the numpy 1.14 >> structured >> array docs, which has some hints about interfacing with C structs. >> >> These new 1.14 docs are not online yet on scipy.org, but in the >> meantime >> you can view them here: >> https://ahaldane.github.io/user/basics.rec.html#automatic-byte-offsets-and-alignment >> >> (That links specifically to the discussion of alignments and padding). >> >> Allan >> >> On 01/25/2018 11:33 AM, Chris Barker - NOAA Federal wrote: >>> >>>> >>>> The numpy dtype constructor takes an ?align? keyword that will pad >>>> it >>>> for you. >>> >>> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html >>> >>> -CHB >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From allanhaldane at gmail.com Wed Jan 31 10:41:32 2018 From: allanhaldane at gmail.com (Allan Haldane) Date: Wed, 31 Jan 2018 10:41:32 -0500 Subject: [Numpy-discussion] Using np.frombuffer and cffi.buffer on array of C structs (problem with struct member padding) In-Reply-To: References: <87po6e234f.fsf@gmail.com> <878td2ny6m.fsf@otaria.sebmel.org> <8bd9f09ce2eb55140bc65775201ed47c@posteo.de> <88dbefc1-ab01-8efe-420c-102e13e3aaed@gmail.com> Message-ID: <28c401cd-715b-6763-36a6-d5c0da052091@gmail.com> On 01/31/2018 02:06 AM, Joe wrote: > Does someone know of a function or a convenient way to automatically > derive a dtype object from a C typedef struct string or a cffi.typeof()? I remember when I wrote those docs over a year ago, I searched for an established way to do this but didn't find one. If you find/come up with one, let us know and I might add a pointer or description of it in the docs. Searching just now, I came across this recent gist: https://gist.github.com/inactivist/4ef7058c2132fa16759d So it looks like we can do something like: # typemap from C type to numpy type may need some work... typemap = {'char': 'byte', 'short': 'short', 'int': 'intc', 'long long': 'longlong', 'size_t': 'intp', 'float': 'float', 'double': 'double'} def extract_field_info(tp): fields = [] for field, fieldtype in tp.fields: if fieldtype.type.kind == 'primitive': format = typemap[fieldtype.type.cname] else: format = extract_field_info(fieldtype.type) fields.append((field, format, fieldtype.offset)) return fields from cffi import FFI ffi = FFI() ffi.cdef("struct foo { int a; char b; double d;};") tp = ffi.typeof('struct foo') names, formats, offsets = zip(*extract_field_info(tp)) np.dtype({'names': names, 'formats': formats, 'offsets': offsets}) # compare to: np.dtype('i4,i1,f8', align=True) I just made that up now and it may be buggy and have missing corner cases. But if you agree it works, maybe we can mention it in the structured array docs instead of the vague sentence there now, that "some work may be needed to obtain correspondence with the C struct". Allan > Am 27.01.2018 10:30 schrieb Joe: >> Thanks for your help on this! This solved my issue. >> >> >> Am 25.01.2018 um 19:01 schrieb Allan Haldane: >>> There is a new section discussing alignment in the numpy 1.14 structured >>> array docs, which has some hints about interfacing with C structs. >>> >>> These new 1.14 docs are not online yet on scipy.org, but in the meantime >>> ? you can view them here: >>> https://ahaldane.github.io/user/basics.rec.html#automatic-byte-offsets-and-alignment >>> >>> >>> (That links specifically to the discussion of alignments and padding). >>> >>> Allan >>> >>> On 01/25/2018 11:33 AM, Chris Barker - NOAA Federal wrote: >>>> >>>>> >>>>> The numpy dtype constructor takes an ?align? keyword that will pad it >>>>> for you. >>>> >>>> https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.dtype.html >>>> >>>> >>>> -CHB >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at python.org >>>> https://mail.python.org/mailman/listinfo/numpy-discussion >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at python.org >>> https://mail.python.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion From renato.fabbri at gmail.com Wed Jan 31 17:38:22 2018 From: renato.fabbri at gmail.com (Renato Fabbri) Date: Wed, 31 Jan 2018 20:38:22 -0200 Subject: [Numpy-discussion] documentation, sound and music [was: doc? music through mathematical relations between LPCM samples and musical elements/characteristics] Message-ID: Dear Scipy-ers, If you think I should split the message so that things get more clear... But the things are: 1) How to document numpy-heavy projects? 2) How to better make these contributions available to the numpy/scipy community? Directions will be greatly appreciated. I suspect that this info is all gathered somewhere I did not find. PS. I apologize for the cross-post, this is a message sent to numpy list but without any replies, so I found fit to send the message to both lists. Thanks in advance, Best, R. ---------- Forwarded message ---------- From: Renato Fabbri Date: Tue, Jan 30, 2018 at 8:32 PM Subject: doc? music through mathematical relations between LPCM samples and musical elements/characteristics To: Discussion of Numerical Python the half-shape suite: archive.org/details/ShapeSuite was completely synthesized using psychophysical relations for each resulting 16bit 44kHz samples: https://arxiv.org/abs/1412.6853 I am thinking about the ways in which to make the documentation at least to: * mass (music and audio in sample sequences): the framework. It might be considered a toolbox, but it is not a package: https://github.com/ttm/mass/ * music: a package for making music, including the routines used to make the half-shape suite: https://github.com/ttm/music == It seems reasonable at first to: * upload the package to PyPI (mass is not a package), music is there (beta but there). * Make available some summary of the file tree, including some automated code documentation. As I follow numpy's convention (or try to), and the package and framework are particularly useful for proficient numpy users, I used Sphinx. The very preliminary version is: https://pythonmusic.github.io/ * Make a nice readthedocs documentation. "music" project name was taken so I made: http://music-documentation.readthedocs.io/en/latest/ "mass" project name was also taken so I made: http://musicmass.readthedocs.io/en/latest/ And I found these URLs clumsy. Is there a standard or would you go with musicpackage.read.. and massframework.read... ? More importantly: * would you use readthedocs for the sphinx/doxigen output? * would you use readthedocs.org or a gitbook would be better or...? Should I contact the scipy community to make available a scikit or integrate it to numpy in any way beyond using and citing it appropriately?. BTW. I am using vimwiki with some constant attempt to organize and track my I/O: https://arxiv.org/abs/1712.06933 So maybe a unified solution for all these documentation instances using a wiki structure seems reasonable at the moment. Maybe upload the generated html from Sphinx and from Vimwiki to readthedocs...? Best, R. -- Renato Fabbri GNU/Linux User #479299 labmacambira.sourceforge.net -- Renato Fabbri GNU/Linux User #479299 labmacambira.sourceforge.net -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jan 31 17:58:34 2018 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 31 Jan 2018 14:58:34 -0800 Subject: [Numpy-discussion] Extending C with Python In-Reply-To: References: Message-ID: I'm guessing you could use Cython to make this easier. It's usually used for calling C from Python, but can do the sandwich in both directions... Just a thought -- it will help with some of that boilerplate code... -CHB On Tue, Jan 30, 2018 at 10:57 PM, Jialin Liu wrote: > Amazing! It works! Thank you Robert. > > I've been stuck with this many days. > > Best, > Jialin > LBNL/NERSC > > On Tue, Jan 30, 2018 at 10:52 PM, Robert Kern > wrote: > >> On Wed, Jan 31, 2018 at 3:25 PM, Jialin Liu wrote: >> >>> Hello, >>> I'm extending C with python (which is opposite way of what people >>> usually do, extending python with C), I'm currently stuck in passing a C >>> array to python layer, could anyone plz advise? >>> >>> I have a C buffer in my C code and want to pass it to a python function. >>> In the C code, I have: >>> >>> npy_intp dims [2]; >>>> dims[0] = 10; >>>> dims[1] = 20; >>>> import_array(); >>>> npy_intp m=2; >>>> PyObject * py_dims = PyArray_SimpleNewFromData(1, &m, NPY_INT16 ,(void >>>> *)dims ); // I also tried NPY_INT >>>> PyObject_CallMethod(pInstance, method_name, "O", py_dims); >>> >>> >>> In the Python code, I want to just print that array: >>> >>> def f(self, dims): >>> >>> print ("np array:%d,%d"%(dims[0],dims[1])) >>> >>> >>> >>> But it only prints the first number correctly, i.e., dims[0]. The second >>> number is always 0. >>> >> >> The correct typecode would be NPY_INTP. >> >> -- >> Robert Kern >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at python.org >> https://mail.python.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at seefeld.name Wed Jan 31 18:11:52 2018 From: stefan at seefeld.name (Stefan Seefeld) Date: Wed, 31 Jan 2018 18:11:52 -0500 Subject: [Numpy-discussion] Extending C with Python In-Reply-To: References: Message-ID: On 31.01.2018 17:58, Chris Barker wrote: > I'm guessing you could use Cython to make this easier. ... or Boost.Python (http://boostorg.github.io/python), which has built-in support for NumPy (http://boostorg.github.io/python/doc/html/numpy/index.html), and supports both directions: extending Python with C++, as well as embedding Python into C++ applications. Stefan -- ...ich hab' noch einen Koffer in Berlin... -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.png Type: image/png Size: 1478 bytes Desc: not available URL: From robert.kern at gmail.com Wed Jan 31 18:20:48 2018 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 1 Feb 2018 08:20:48 +0900 Subject: [Numpy-discussion] documentation, sound and music [was: doc? music through mathematical relations between LPCM samples and musical elements/characteristics] In-Reply-To: References: Message-ID: On Thu, Feb 1, 2018 at 7:38 AM, Renato Fabbri wrote: > > Dear Scipy-ers, > > If you think I should split the message so that > things get more clear... > > But the things are: > 1) How to document numpy-heavy projects? There is nothing particularly special about numpy-heavy projects with respect to documentation. The tools used for numpy and scipy's documentation may (or may not) be useful to you. For example, you may want to embed matplotlib plots into your documentation. https://pypi.python.org/pypi/numpydoc But if not, don't worry about it. Sphinx is all most numpy-heavy projects need, like most Python projects. Hosting the documentation on either readthedocs or Github is fine. Do whatever's convenient for you. Either is just as convenient for us. > 2) How to better make these contributions available > to the numpy/scipy community? Having it up on Github and PyPI is all you need to do. Participate in relevant mailing list conversations. While we don't forbid release announcement emails on either of the lists, I wouldn't do much more than announce the initial release and maybe a really major update (i.e. not for every bugfix release). > Directions will be greatly appreciated. > I suspect that this info is all gathered somewhere > I did not find. Sorry this isn't gathered anywhere, but truly, the answer is "there is not much to it". You're doing everything right. :-) -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From renato.fabbri at gmail.com Wed Jan 31 19:29:32 2018 From: renato.fabbri at gmail.com (Renato Fabbri) Date: Wed, 31 Jan 2018 22:29:32 -0200 Subject: [Numpy-discussion] documentation, sound and music [was: doc? music through mathematical relations between LPCM samples and musical elements/characteristics] In-Reply-To: References: Message-ID: I think you gave me the answers: 0) It seems reasonable to just use sphinx, no jekyll or anything more, only if a need for it is found. I might use some HTML generated by vimwiki for pure convenience. 1) Make the main documentation with sphinx. That means joining: and 2) github.io, gitbook, readthedocs, all is just as fine. Maybe github.io makes it easier and has more features, while readthedocs is maybe more traditional for these kinds of python-related docs. 3) just realese on pypi, make the documentation fine and tell the list. If any major release deserves a message, it is ok to send. 3.1) A scikit to make psychoacoutic experiments, synthesize audio and music, is not absurd nor deeply compelling. 4) Am I loosing something? Being mindful of this question is always good, but that is it, we are on track. (the python and latex files are on github, so there is an issue tracker, a wiki etc etc there if anything comes up) tx++ On Wed, Jan 31, 2018 at 9:20 PM, Robert Kern wrote: > On Thu, Feb 1, 2018 at 7:38 AM, Renato Fabbri > wrote: > > > > Dear Scipy-ers, > > > > If you think I should split the message so that > > things get more clear... > > > > But the things are: > > 1) How to document numpy-heavy projects? > > There is nothing particularly special about numpy-heavy projects with > respect to documentation. The tools used for numpy and scipy's > documentation may (or may not) be useful to you. For example, you may want > to embed matplotlib plots into your documentation. > > https://pypi.python.org/pypi/numpydoc > > But if not, don't worry about it. Sphinx is all most numpy-heavy projects > need, like most Python projects. Hosting the documentation on either > readthedocs or Github is fine. Do whatever's convenient for you. Either is > just as convenient for us. > > > 2) How to better make these contributions available > > to the numpy/scipy community? > > Having it up on Github and PyPI is all you need to do. Participate in > relevant mailing list conversations. While we don't forbid release > announcement emails on either of the lists, I wouldn't do much more than > announce the initial release and maybe a really major update (i.e. not for > every bugfix release). > > > Directions will be greatly appreciated. > > I suspect that this info is all gathered somewhere > > I did not find. > > Sorry this isn't gathered anywhere, but truly, the answer is "there is not > much to it". You're doing everything right. :-) > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at python.org > https://mail.python.org/mailman/listinfo/numpy-discussion > > -- Renato Fabbri GNU/Linux User #479299 labmacambira.sourceforge.net -------------- next part -------------- An HTML attachment was scrubbed... URL: