Mailman 3 March 2016 - NumPy-Discussion

ANN: pandas v0.18.0 Final released
by Jeff Reback March 12, 2016

March 12, 2016

Hi, This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version. This was a release of 3.5 months with 381 commits by 100 authors encompassing 465 issues and 290 pull-requests. *What is it:* *pandas* is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational”… [View More]

1 0

Fwd: Looping and searching in numpy array
by Hedieh Ebrahimi March 10, 2016

March 10, 2016

Dear all, I need to loop over a numpy array and then do the following search. The following is taking almost 60(s) for an array (npArray1 and npArray2 in the example below) with around 300K values. In other words, I am looking for the index of the first occurence in npArray2 for every value of npArray1. for id in np.nditer(npArray1): newId=(np.where(npArray2==id))[0][0] Is there anyway I can make the above faster using numpy? I need to run the script above on much bigger arrays (… [View More]

1 0

ANN: pandas v0.18.0rc2 - RELEASE CANDIDATE
by Jeff Reback March 9, 2016

March 9, 2016

Hi, I'm pleased to announce the availability of the second release candidate of Pandas 0.18.0. Please try this RC and report any issues here: Pandas Issues <https://github.com/pydata/pandas/issues>. Compared to RC1, we have added updated read_sas and fixed float indexing. We will be releasing officially very shortly. THIS IS NOT A PRODUCTION RELEASE This is a major release from 0.17.1 and includes a small number of API changes, several new features, enhancements, and performance … [View More]

1 0

100 numpy exercises (80/100)
by Nicolas P. Rougier March 9, 2016

March 9, 2016

Hi all, I've just added some exercises to the collection at https://github.com/rougier/numpy-100 (and in the process, I've discovered np.argpartition... nice!) If you have some ideas/comments/corrections... Still 20 to go... Nicolas

2 1

Windows wheels, built, but should we deploy?
by Matthew Brett March 8, 2016

March 8, 2016

Hi, Summary: I propose that we upload Windows wheels to pypi. The wheels are likely to be stable and relatively easy to maintain, but will have slower performance than other versions of numpy linked against faster BLAS / LAPACK libraries. Background: There's a long discussion going on at issue github #5479 [1], where the old problem of Windows wheels for numpy came up. For those of you not following this issue, the current situation for community-built numpy Windows binaries is dire: * … [View More]We have not so far provided windows wheels on pypi, so `pip install numpy` on Windows will bring you a world of pain; * Until recently we did provide .exe "superpack" installers on sourceforge, but these became increasingly difficult to build and we gave up building them as of the latest (1.10.4) release. Despite this, popularity of Windows wheels on pypi is high. A few weeks ago, Donald Stufft ran a query for the binary wheels most often downloaded from pypi, for any platform [2] . The top five most downloaded were (n_downloads, name): 6646, numpy-1.10.4-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl 5445, cryptography-1.2.1-cp27-none-win_amd64.whl 5243, matplotlib-1.4.0-cp34-none-win32.whl 5241, scikit_learn-0.15.1-cp34-none-win32.whl 4573, pandas-0.17.1-cp27-none-win_amd64.whl So a) the OSX numpy wheel is very popular and b) despite the fact that we don't provide a numpy wheel for Windows, matplotlib, sckit_learn and pandas, that depend on numpy, are the 3rd, 4th and 5th most downloaded wheels as of a few weeks ago. So, there seems to be a large appetite for numpy wheels. Current proposal: I have now built numpy wheels, using the ATLAS blas / lapack library - the build is automatic and reproducible [3]. I chose ATLAS to build against, rather than, say OpenBLAS, because we've had some significant worries in the past about the reliability of OpenBLAS, and I thought it better to err on the side of correctness. However, these builds are relatively slow for matrix multiply and other linear algebra routines compared numpy built against OpenBLAS or MKL (which we cannot use because of its license) [4]. In my very crude array test of a dot product and matrix inversion, the ATLAS wheels were 2-3 times slower than MKL. Other benchmarks on Julia found about the same result for ATLAS vs OpenBLAS on 32-bit bit, but a much bigger difference on 64-bit (for an earlier version of ATLAS than we are currently using) [5]. So, our numpy wheels likely to be stable and give correct results, but will be somewhat slow for linear algebra. I propose that we upload these ATLAS wheels to pypi. The upside is that this gives our Windows users a much better experience with pip, and allows other developers to build Windows wheels that depend on numpy. The downside is that these will not be optimized for performance on modern processors. In order to signal that, I propose adding the following text to the numpy pypi front page: ``` All numpy wheels distributed from pypi are BSD licensed. Windows wheels are linked against the ATLAS BLAS / LAPACK library, restricted to SSE2 instructions, so may not give optimal linear algebra performance for your machine. See http://docs.scipy.org/doc/numpy/user/install.html for alternatives. ``` In a way this is very similar to our previous situation, in that the superpack installers also used ATLAS - in fact an older version of ATLAS. Once we are up and running with numpy wheels, we can consider whether we should switch to other BLAS libraries, such as OpenBLAS or BLIS (see [6]). I'm posting here hoping for your feedback... Cheers, Matthew [1] https://github.com/numpy/numpy/issues/5479 [2] https://gist.github.com/dstufft/1dda9a9f87ee7121e0ee [3] https://ci.appveyor.com/project/matthew-brett/np-wheel-builder [4] http://mingwpy.github.io/blas_lapack.html#intel-math-kernel-library [5] https://github.com/numpy/numpy/issues/5479#issuecomment-185033668 [6] https://github.com/numpy/numpy/issues/7372 [View Less]

7 9

tracemalloc + numpy?
by Neal Becker March 8, 2016

March 8, 2016

I'm trying tracemalloc to find memory usage. Will numpy array memory usage be counted by tracemalloc? (Doesn't seem to)

2 1

[ANN] bcolz 1.0.0 RC1 released
by Francesc Alted March 8, 2016

March 8, 2016

========================== Announcing bcolz 1.0.0 RC1 ========================== What's new ========== Yeah, 1.0.0 is finally here. We are not introducing any exciting new feature (just some optimizations and bug fixes), but bcolz is already 6 years old and it implements most of the capabilities that it was designed for, so I decided to release a 1.0.0 meaning that the format is declared stable and that people can be assured that future bcolz releases will be able to read bcolz 1.0 data … [View More]files (and probably much earlier ones too) for a long while. Such a format is fully described at: https://github.com/Blosc/bcolz/blob/master/DISK_FORMAT_v1.rst Also, a 1.0.0 release means that bcolz 1.x series will be based on C-Blosc 1.x series (https://github.com/Blosc/c-blosc). After C-Blosc 2.x (https://github.com/Blosc/c-blosc2) would be out, a new bcolz 2.x is expected taking advantage of shiny new features of C-Blosc2 (more compressors, more filters, native variable length support and the concept of super-chunks), which should be very beneficial for next bcolz generation. Important: this is a Release Candidate, so please test it as much as you can. If no issues would appear in a week or so, I will proceed to tag and release 1.0.0 final. Enjoy! For a more detailed change log, see: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst What it is ========== *bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Together, bcolz and the Blosc compressor, are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/queryi… Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel) which you can read more about by pointing your browser at the links below. * Visualfabriq: * *bquery*, A query and aggregation framework for Bcolz: * https://github.com/visualfabriq/bquery * Blaze: * Notebooks showing Blaze + Pandas + BColz interaction: * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ip… * Quantopian: * Using compressed data containers for faster backtesting at scale: * https://quantopian.github.io/talks/NeedForSpeed/slides.html * Scikit-Allel * Provides an alternative backend to work with compressed arrays * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html Installing ========== bcolz is in the PyPI repository, so installing it is easy:: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz(a)googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst ---- **Enjoy data!** -- Francesc Alted [View Less]

1 0

GSoC?
by Chris Barker March 6, 2016

March 6, 2016

ANyone interested in Google Summer of Code this year? I think the real challenge is having folks with the time to really put into mentoring, but if folks want to do it -- numpy could really benefit. Maybe as a python.org sub-project? https://wiki.python.org/moin/SummerOfCode/2016 Deadlines are approaching -- so I thought I'd ping the list and see if folks are interested. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) … [View More]

10 16

Ufunc identity for bitwise reduction of object arrays.
by Charles R Harris March 4, 2016

March 4, 2016

Hi All, There is currently some discussion <https://github.com/numpy/numpy/pull/7373> on whether or not object arrays should have an identity for bitwise reductions. Currently, they do not use the identity for non-empty arrays, so this would only affect reductions on empty arrays. Currently bitwise_or, bitwise_xor, and bitwise_and will return (bool_) 0, (bool_) 0, and (int) -1 respectively in that case. Note the non-object arrays work as they should, the question is only about object arrays. Thougts? Chuck

1 0

Weighted percentile / quantile
by Alex Rogozhnikov March 2, 2016

March 2, 2016

Hi, I know the topic was already raised a long ago: https://mail.scipy.org/pipermail/numpy-discussion/2010-July/051851.html There are also several questions on SO: http://stackoverflow.com/questions/20601872/numpy-or-scipy-to-calculate-wei… http://stackoverflow.com/questions/13546146/percentile-calculation-with-wei… http://stackoverflow.com/questions/26102867/python-weighted-median-algorith… The only working solution with numpy: http://stackoverflow.com/questions/21844024/weighted-… [View More]

2 2