From robbmcleod at gmail.com Wed Feb 1 03:28:12 2017 From: robbmcleod at gmail.com (Robert McLeod) Date: Wed, 1 Feb 2017 09:28:12 +0100 Subject: [Numpy-discussion] composing Euler rotation matrices In-Reply-To: <87d1f24ml4.fsf@otaria.sebmel.org> References: <87h94e4tkx.fsf@otaria.sebmel.org> <87d1f24ml4.fsf@otaria.sebmel.org> Message-ID: Instead of trying to decipher what someone wrote on Wikipedia, why don't you look at a working piece of source code? e.g. https://github.com/3dem/relion/blob/master/src/euler.cpp Robert On Wed, Feb 1, 2017 at 4:27 AM, Seb wrote: > On Tue, 31 Jan 2017 21:23:55 -0500, Joseph Fox-Rabinovitz wrote: > > Could you show what you are doing to get the statement "However, I cannot reproduce this matrix via composition; i.e. by multiplying the underlying rotation matrices."? I would guess something involving the `*` operator instead of `@`, but guessing probably won't help you solve your issue. > Sure, although composition is not something I can take credit for, as it's a well-described operation for generating linear transformations. It is the matrix multiplication of two or more transformation matrices. In the case of Euler transformations, it's matrices specifying rotations around 3 orthogonal axes by 3 given angles. I'm using `numpy.dot' to perform matrix multiplication on 2D arrays representing matrices. > However, it's not obvious from the link I provided what particular rotation matrices are multiplied and in what order (i.e. what composition) is used to arrive at the Z1Y2X3 rotation matrix shown. Perhaps I'm not understanding the conventions used therein. This is one of my attempts at reproducing that rotation matrix via composition:
>
> ---<---------------cut here---------------start--------------->---
> import numpy as np
>
> angles = np.radians(np.array([30, 20, 10]))
>
> def z1y2x3(alpha, beta, gamma):
>     """Z1Y2X3 rotation matrix given Euler angles"""
>     return np.array([[np.cos(alpha) * np.cos(beta),
>                       np.cos(alpha) * np.sin(beta) * np.sin(gamma) -
>                       np.cos(gamma) * np.sin(alpha),
>                       np.sin(alpha) * np.sin(gamma) +
>                       np.cos(alpha) * np.cos(gamma) * np.sin(beta)],
>                      [np.cos(beta) * np.sin(alpha),
>                       np.cos(alpha) * np.cos(gamma) +
>                       np.sin(alpha) * np.sin(beta) * np.sin(gamma),
>                       np.cos(gamma) * np.sin(alpha) * np.sin(beta) -
>                       np.cos(alpha) * np.sin(gamma)],
>                      [-np.sin(beta), np.cos(beta) * np.sin(gamma),
>                       np.cos(beta) * np.cos(gamma)]])
>
> euler_mat = z1y2x3(angles[0], angles[1], angles[2])
>
> ## Now via composition
>
> def rotation_matrix(theta, axis, active=False):
>     """Generate rotation matrix for a given axis
>
>     Parameters
>     ----------
>     theta: numeric
>         The angle (degrees) by which to perform the rotation.
>     axis: int
>         Axis around which to perform the rotation (x=0; y=1; z=2)
>     active: bool, optional
>         Whether to return the active transformation matrix.
>
>     Returns
>     -------
>     numpy.ndarray
>         3x3 rotation matrix
>     """
>     theta = np.radians(theta)
>     if axis == 0:
>         R_theta = np.array([[1, 0, 0],
>                             [0, np.cos(theta), -np.sin(theta)],
>                             [0, np.sin(theta), np.cos(theta)]])
>     elif axis == 1:
>         R_theta = np.array([[np.cos(theta), 0, np.sin(theta)],
>                             [0, 1, 0],
>                             [-np.sin(theta), 0, np.cos(theta)]])
>     else:
>         R_theta = np.array([[np.cos(theta), -np.sin(theta), 0],
>                             [np.sin(theta), np.cos(theta), 0],
>                             [0, 0, 1]])
>     if active:
>         R_theta = np.transpose(R_theta)
>     return R_theta
>
> ## The rotations are given as active
> xmat = rotation_matrix(angles[2], 0, active=True)
> ymat = rotation_matrix(angles[1], 1, active=True)
> zmat = rotation_matrix(angles[0], 2, active=True)
> ## The operation seems to imply this composition
> euler_comp_mat = np.dot(xmat, np.dot(ymat, zmat))
> ---<---------------cut here---------------end----------------->---
>
> I believe the matrices `euler_mat' and `euler_comp_mat' should be the same, but they aren't, so it's unclear to me what particular composition is meant to produce the matrix specified by this Z1Y2X3 transformation. What am I missing?
> -- Seb
> _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
-- Robert McLeod, Ph.D. Center for Cellular Imaging and Nano Analytics (C-CINA) Biozentrum der Universität Basel Mattenstrasse 26, 4058 Basel Work: +41.061.387.3225 robert.mcleod at unibas.ch robert.mcleod at bsse.ethz.ch robbmcleod at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From naresh.p at okdollar.com Wed Feb 1 03:31:27 2017 From: naresh.p at okdollar.com (Naresh P) Date: Wed, 1 Feb 2017 15:01:27 +0630 Subject: [Numpy-discussion] Fwd: Installation Problem In-Reply-To: References: Message-ID: *Warm Regards,* *Naresh* *Developer* *Consumer Goods Myanmar Limited (CGM), No. 15, Junction Square Complex, 37-G, Pyay Road, 13041 Kamayut Township, Yangon, Myanmar.* *Tel: +959979867914* http://www.cg-m.com ---------- Forwarded message ---------- From: Date: Wed, Feb 1, 2017 at 2:59 PM Subject: Fwd: Installation Problem To: naresh.p at okdollar.com This is a members-only list. Your message has been automatically rejected, since it came from a non-member's email address. Please make sure to use the email account that you used to join this list. ---------- Forwarded message ---------- From: Naresh P To: numpy-discussion at scipy.org Cc: Date: Wed, 1 Feb 2017 14:59:08 +0630 Subject: Fwd: Installation Problem *Warm Regards,* *Naresh* *Developer* *Consumer Goods Myanmar Limited (CGM), No. 15, Junction Square Complex, 37-G, Pyay Road, 13041 Kamayut Township, Yangon, Myanmar.* *Tel: +959979867914* http://www.cg-m.com On Wed, Feb 1, 2017 at 2:54 PM, wrote: > This is a members-only list. Your message has been automatically > rejected, since it came from a non-member's email address. Please > make sure to use the email account that you used to join this list. > > ---------- Forwarded message ---------- > From: Naresh P > To: numpy-discussion at scipy.org > Cc: > Date: Wed, 1 Feb 2017 14:54:21 +0630 > Subject: Fwd: Installation Problem > Hi Team, > I still have that problem; can you please help me?
> > Let me know; I don't know how to get membership. > > Thanks > > *Warm Regards,* > *Naresh* > *Developer* > *Consumer Goods Myanmar Limited (CGM), No. 15, Junction Square Complex, 37-G, Pyay Road, 13041 Kamayut Township, Yangon, Myanmar.* > *Tel: +959979867914* > http://www.cg-m.com > > ---------- Forwarded message ---------- > From: Naresh P > Date: Tue, Jan 31, 2017 at 3:20 PM > Subject: Installation Problem > To: numpy-discussion at scipy.org > > Hi, > I tried many times, but it does not install; the message below is displayed. > Please help me. > > [image: Inline image 1] > > *Warm Regards,* > *Naresh* > *Developer* > *Consumer Goods Myanmar Limited (CGM), No. 15, Junction Square Complex, 37-G, Pyay Road, 13041 Kamayut Township, Yangon, Myanmar.* > *Tel: +959979867914* > http://www.cg-m.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 27658 bytes Desc: not available URL: From mmwoodman at gmail.com Wed Feb 1 03:55:31 2017 From: mmwoodman at gmail.com (Marmaduke Woodman) Date: Wed, 1 Feb 2017 09:55:31 +0100 Subject: [Numpy-discussion] ANN: xarray v0.9 released In-Reply-To: References: Message-ID: > On 1 Feb 2017, at 05:19, Stephan Hoyer wrote: > > This release includes five months worth of enhancements and bug fixes from 24 contributors, including some significant enhancements to the data model that are not fully backwards compatible. Looks very nice; is the API stable or are you waiting for a v1.0 release? Is there significant overhead compared to plain ndarray? From matthew.brett at gmail.com Wed Feb 1 04:42:15 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 1 Feb 2017 09:42:15 +0000 Subject: [Numpy-discussion] composing Euler rotation matrices In-Reply-To: References: <87h94e4tkx.fsf@otaria.sebmel.org> <87d1f24ml4.fsf@otaria.sebmel.org> Message-ID: Hi, On Wed, Feb 1, 2017 at 8:28 AM, Robert McLeod wrote: > Instead of trying to decipher what someone wrote on Wikipedia, why don't > you look at a working piece of source code? > > e.g. > > https://github.com/3dem/relion/blob/master/src/euler.cpp Also - have a look at https://pypi.python.org/pypi/transforms3d - and in particular you might get some use from symbolic versions of the transformations, e.g. here : https://github.com/matthew-brett/transforms3d/blob/master/transforms3d/derivations/eulerangles.py It's really easy to mix up the conventions, as I'm sure you know - see http://matthew-brett.github.io/transforms3d/reference/transforms3d.euler.html Cheers, Matthew From shoyer at gmail.com Wed Feb 1 12:33:51 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 1 Feb 2017 09:33:51 -0800 Subject: [Numpy-discussion] ANN: xarray v0.9 released In-Reply-To: References: Message-ID: On Wed, Feb 1, 2017 at 12:55 AM, Marmaduke Woodman wrote: > Looks very nice; is the API stable or are you waiting for a v1.0 release? > We are pretty close to full API stability but not quite there yet. Enough people are using xarray in production that breaking changes are made with serious caution (and deprecation cycles whenever feasible). The only major backwards-incompatible change planned is an overhaul of indexing to use labeled broadcasting and alignment: https://github.com/pydata/xarray/issues/974 There are a few other "nice to have" features for v1.0 but that's the only one that has the potential to change functionality in a way that we can't cleanly deprecate.
> Is there significant overhead compared to plain ndarray? Xarray is implemented in Python (not C), so it does have significant overhead for every operation. Adding two arrays takes ~100 us, rather than <1 us in NumPy. So you don't want to use it in your inner loop. That said, the overhead is independent of the size of the array. So if you work with large arrays, it is negligible. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stuart at stuartreynolds.net Wed Feb 1 13:16:26 2017 From: stuart at stuartreynolds.net (Stuart Reynolds) Date: Wed, 1 Feb 2017 10:16:26 -0800 Subject: [Numpy-discussion] composing Euler rotation matrices In-Reply-To: References: <87h94e4tkx.fsf@otaria.sebmel.org> <87d1f24ml4.fsf@otaria.sebmel.org> Message-ID: [off topic] Nothing good ever comes from using Euler matrices. All the cool kids are using quaternions these days. They're (in some ways) simpler, can be interpolated easily, don't suffer from gimbal lock (discontinuity), and are not confused about which axis rotation is applied first (for Euler you must decide whether you want to apply x.y.z or z.y.x). They'd be a good addition to numpy. On Wed, Feb 1, 2017 at 1:42 AM, Matthew Brett wrote: > Hi, > > On Wed, Feb 1, 2017 at 8:28 AM, Robert McLeod > wrote: > > Instead of trying to decipher what someone wrote on Wikipedia, why don't > > you look at a working piece of source code? > > > > e.g. > > > > https://github.com/3dem/relion/blob/master/src/euler.cpp > > Also - have a look at https://pypi.python.org/pypi/transforms3d - and > in particular you might get some use from symbolic versions of the > transformations, e.g. here : > https://github.com/matthew-brett/transforms3d/blob/master/transforms3d/derivations/eulerangles.py > > It's really easy to mix up the conventions, as I'm sure you know - see > http://matthew-brett.github.io/transforms3d/reference/transforms3d.euler.html > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From spluque at gmail.com Wed Feb 1 13:31:51 2017 From: spluque at gmail.com (Seb) Date: Wed, 01 Feb 2017 12:31:51 -0600 Subject: [Numpy-discussion] composing Euler rotation matrices References: <87h94e4tkx.fsf@otaria.sebmel.org> <87d1f24ml4.fsf@otaria.sebmel.org> Message-ID: <87poj1wync.fsf@gmail.com> On Wed, 1 Feb 2017 09:42:15 +0000, Matthew Brett wrote: > Hi, > On Wed, Feb 1, 2017 at 8:28 AM, Robert McLeod wrote: >> Instead of trying to decipher what someone wrote on Wikipedia, why >> don't you look at a working piece of source code? >> e.g. >> https://github.com/3dem/relion/blob/master/src/euler.cpp > Also - have a look at https://pypi.python.org/pypi/transforms3d - and > in particular you might get some use from symbolic versions of the > transformations, e.g. here : > https://github.com/matthew-brett/transforms3d/blob/master/transforms3d/derivations/eulerangles.py > It's really easy to mix up the conventions, as I'm sure you know - see > http://matthew-brett.github.io/transforms3d/reference/transforms3d.euler.html Thank you very much for providing this package. It looks like this is exactly what I was trying to do (learn). The symbolic versions really help show what is going on in the derivations sub-package by showing how each of the 9 matrix elements is found. I'll try to hack it to use active rotations.
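For completeness, here is a minimal sketch (using the standard right-handed, active rotation matrices; `z1y2x3' is the function from the script earlier in this thread) suggesting that the Z1Y2X3 matrix is the product Rz * Ry * Rx, i.e. the reverse of the x-then-y-then-z multiplication order tried above. Note also two pitfalls in that script: `rotation_matrix' converts to radians internally even though `angles' is already in radians, and its active=True branch transposes matrices that are already the active ones:

import numpy as np

alpha, beta, gamma = np.radians([30, 20, 10])

def Rx(t):  # active rotation about the x axis
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t), np.cos(t)]])

def Ry(t):  # active rotation about the y axis
    return np.array([[np.cos(t), 0, np.sin(t)],
                     [0, 1, 0],
                     [-np.sin(t), 0, np.cos(t)]])

def Rz(t):  # active rotation about the z axis
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t), np.cos(t), 0],
                     [0, 0, 1]])

# Z1Y2X3 corresponds to Rz . Ry . Rx (intrinsic z-y'-x'' rotations)
composed = np.dot(Rz(alpha), np.dot(Ry(beta), Rx(gamma)))
print(np.allclose(composed, z1y2x3(alpha, beta, gamma)))  # True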
-- Seb From boukhdhiramal at yahoo.fr Tue Feb 7 17:40:46 2017 From: boukhdhiramal at yahoo.fr (Boukhdhir Amal) Date: Tue, 7 Feb 2017 22:40:46 +0000 (UTC) Subject: [Numpy-discussion] zero values in the output of PyArray_AsCArray( References: <187263153.61218.1486507246009.ref@mail.yahoo.com> Message-ID: <187263153.61218.1486507246009@mail.yahoo.com> Hi, I am trying to access an array as a C type using the function 'PyArray_AsCArray'. The problem is that I am getting many 0 values in the resulting C array. Some of the indexed values are correct. This is my code:

static PyObject* cos_func_np(PyObject* self, PyObject* args)
{
    PyObject *in_array_object;
    PyObject *out_array;
    int** segs_2d_array;

    /* Parse single numpy array argument */
    if (!PyArg_ParseTuple(args, "O", &in_array_object))
        return NULL;

    int typenum = NPY_INT64;
    PyArray_Descr *descr;
    descr = PyArray_DescrFromType(typenum);

    npy_intp dims[2];
    PyArray_AsCArray(&in_array_object, (void**) &segs_2d_array, dims, 2, descr);

    printf("\n-segs_2d_array: %d --\n", segs_2d_array[1][5]);

    /* return Py_BuildValue("O", in_array_object); */
}

For example: segs_2d_array[0][0] and segs_2d_array[1][2] output the correct values; however, segs_2d_array[1][3] and segs_2d_array[1][5] are equal to zero. What is wrong with this code, please? -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Feb 13 16:01:37 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 13 Feb 2017 14:01:37 -0700 Subject: [Numpy-discussion] Marten van Kerkwijk added to numpy team. Message-ID: Hi All, I'm pleased to welcome Marten to the numpy team. His reviews of PRs have been very useful in the past and I am happy that he has accepted our invitation to join the team. Cheers, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Feb 14 03:32:32 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 14 Feb 2017 21:32:32 +1300 Subject: [Numpy-discussion] Marten van Kerkwijk added to numpy team. In-Reply-To: References: Message-ID: On Tue, Feb 14, 2017 at 10:01 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Hi All, > > I'm pleased to welcome Marten to the numpy team. His reviews of PRs have > been very useful in the past and I am happy that he has accepted our > invitation to join the team. > Excellent, welcome Marten! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Feb 14 04:59:36 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 14 Feb 2017 22:59:36 +1300 Subject: [Numpy-discussion] Building external c modules with mingw64 / numpy In-Reply-To: <243DBD016692E54EB12F37B87C66E70E815DB8@didag1> References: <243DBD016692E54EB12F37B87C66E70E815DB8@didag1> Message-ID: On Sat, Jan 21, 2017 at 9:23 PM, Schnizer, Pierre < pierre.schnizer at helmholtz-berlin.de> wrote: > Dear all, > > I built an external c-module (pygsl) using mingw 64 from > msys2 mingw64-gcc compiler. > > This build required some changes to numpy.distutils to get the > 'python setup.py config' > and > 'python setup.py build' > working.
> In this process I replaced 2 files in numpy.distutils from the numpy git repository:
> - numpy.distutils.misc_util.py, version ec0e046 on 14 Dec 2016
> - numpy.distutils.mingw32ccompiler.py, version ec0e046 on 14 Dec 2016
>
> mingw32ccompiler.py required modification to get it working:
> - a preprocessor had to be defined, as I am using setup.py config
> - specifying the runtime library search path to the linker
> - the include path of the vcruntime
>
> I attached a patch reflecting the changes I had to make to the file mingw32ccompiler.py. If this information is useful I am happy to answer questions.
Thanks for the patch Pierre. For future reference: a pull request on GitHub or a link to a Gist is preferred for us and usually gets you a quicker response. Regarding your question in the patch on including Python's install directory: that shouldn't be necessary, and I'd be wary of applying your patch without understanding why the current numpy.distutils code doesn't work for you. But if your patch works for you then it can't hurt I think. Cheers, Ralf
> Sincerely yours
> Pierre
>
> PS Version infos:
> Python: Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32
> Numpy:
> >> help(numpy.version)
> Help on module numpy.version in numpy:
> DATA
> full_version = '1.12.0'
> git_revision = '561f1accf861ad8606ea2dd723d2be2b09a2dffa'
> release = True
> short_version = '1.12.0'
> version = '1.12.0'
> gcc.exe (Rev2, Built by MSYS2 project) 6.2.0
> ------------------------------
> Helmholtz-Zentrum Berlin für Materialien und Energie GmbH
> Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V.
> Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher
> Geschäftsführung: Prof. Dr. Anke Rita Kaysser-Pyzalla, Thomas Frederking
> Sitz Berlin, AG Charlottenburg, 89 HRB 5583
> Postadresse: Hahn-Meitner-Platz 1, D-14109 Berlin
> http://www.helmholtz-berlin.de
> _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part -------------- An HTML attachment was scrubbed... URL: From larsson at cs.uchicago.edu Tue Feb 14 18:34:32 2017 From: larsson at cs.uchicago.edu (Gustav Larsson) Date: Tue, 14 Feb 2017 15:34:32 -0800 Subject: [Numpy-discussion] Proposal to support __format__ Message-ID: Hi everyone! I want to discuss adding support for __format__ in ndarray and I am willing to contribute code-wise once consensus has been reached. It was briefly discussed on GitHub two years ago (https://github.com/numpy/numpy/issues/5543) and I will re-iterate some of the points made there and build off of that. I have been thinking about this a lot in the last few weeks and my thoughts turned into a fairly fleshed-out proposal. The discussion should probably start more high-level, so I apologize if the level of detail is inappropriate at this point in time. I decided on a gist, since the email got too long and clear formatting helps: https://gist.github.com/gustavla/2783543be1204d2b5d368f6a1fb4d069 OK, those are my thoughts for now. What do you think?
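For concreteness, here is a tiny proof-of-concept of the element-wise idea (a sketch only; the subclass and its name are illustrative and not part of the proposal, which targets ndarray itself):

import numpy as np

class FormattableArray(np.ndarray):
    """Sketch: apply a format spec to every element via np.array2string."""
    def __format__(self, spec):
        if not spec:  # empty spec: keep today's str() behavior
            return str(self)
        return np.array2string(
            self, formatter={'all': lambda v: format(v, spec)})

x = np.array([12.3456, 0.1234]).view(FormattableArray)
print('{:8.2f}'.format(x))  # [   12.35     0.12]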
Cheers, Gustav From shoyer at gmail.com Tue Feb 14 18:59:49 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 14 Feb 2017 15:59:49 -0800 Subject: [Numpy-discussion] Proposal to support __format__ In-Reply-To: References: Message-ID: On Tue, Feb 14, 2017 at 3:34 PM, Gustav Larsson wrote: > Hi everyone! > > I want to discuss adding support for __format__ in ndarray and I am > willing to > contribute code-wise once consensus has been reached. It was briefly > discussed on GitHub two years ago (https://github.com/numpy/numpy/issues/5543) > and I will re-iterate some of the points made there and build off of that. I > have been thinking about this a lot in the last few weeks and my thoughts > turned > into a fairly fleshed-out proposal. The discussion should probably start > more > high-level, so I apologize if the level of detail is inappropriate at this > point in time. > > I decided on a gist, since the email got too long and clear formatting > helps: > > https://gist.github.com/gustavla/2783543be1204d2b5d368f6a1fb4d069 This is a lovely and clearly written document. Thanks for taking the time to think through this! I encourage you to submit it as a pull request to the NumPy repository as a "NumPy Enhancement Proposal", either now or after we've discussed it: https://docs.scipy.org/doc/numpy-dev/neps/index.html > OK, those are my thoughts for now. What do you think? > Two thoughts for now: 1. For object arrays, I would default to calling format on each element (your "map principle") rather than raising an error. 2. It's absolutely OK to leave functionality unimplemented and not immediately nail down every edge case. As a default, I would suggest raising errors whenever non-empty type specifications are provided rather than raising errors in every case. -------------- next part -------------- An HTML attachment was scrubbed... URL: From larsson at cs.uchicago.edu Tue Feb 14 20:35:23 2017 From: larsson at cs.uchicago.edu (Gustav Larsson) Date: Tue, 14 Feb 2017 17:35:23 -0800 Subject: [Numpy-discussion] Proposal to support __format__ In-Reply-To: References: Message-ID: > I encourage you to submit it as a pull request to the NumPy repository as > a "NumPy Enhancement Proposal", either now or after we've discussed it: > https://docs.scipy.org/doc/numpy-dev/neps/index.html OK, I will let it go through one iteration of comments and then I'll submit one. Thanks! 1. For object arrays, I would default to calling format on each element > (your "map principle") rather than raising an error. I'm glad you brought this up as a possibility. It might be possible, but there are some issues that would need to be resolved. First of all, {} and {:} always work and give the same result they currently do. So, this only affects the situation where the format spec is non-empty. I think there are two main issues: Heterogeneity: Let's say we have x = np.array([12.3, True, 'string', Foo(10)], dtype=np.object). Then, presumably {:.1f} should cause a ValueError since the string does not support format type 'f'. This could create a lot of ValueError land mines for the user. For x[:2] however it should work and produce something like [12.3 1.0]. Note, the "map principle" still can't be strictly true. Let's say we have an array with type object and mostly string-like elements.
Then {:5s} will still not produce exactly {:5s} element-wise, because the string representations need to be repr-based inside the array (otherwise it could break for newlines and things like that and produce spaces that make the boundary between elements ambiguous). This brings me to the next issue. Str vs. repr: If we have a homogeneous object array with type Foo and Foo implements __format__, it would be great if this worked. However, one issue is that Foo.__format__ might return things like newline (or spaces), which would break (or confuse) the printed output (unless it is made incredibly smart to support "vertical alignment"). This issue is essentially the same as for strings in general, which is why they use repr instead. I can think of two solutions: 1) Try to sanitize (or repr-ify) the string returned by __format__ somehow; 2) Put the responsibility on the user and simply let the rendering break if Foo.__format__ does not play well. 2. It's absolutely OK to leave functionality unimplemented and not > immediately nail down every edge case. As a default, I would suggest > raising errors whenever non-empty type specifications are provided rather > than raising errors in every case. > I agree. Gustav On Tue, Feb 14, 2017 at 3:59 PM, Stephan Hoyer wrote: > On Tue, Feb 14, 2017 at 3:34 PM, Gustav Larsson > wrote: > >> Hi everyone! >> >> I want to discuss adding support for __format__ in ndarray and I am >> willing to >> contribute code-wise once consensus has been reached. It was briefly >> discussed on GitHub two years ago (https://github.com/numpy/numpy/issues/5543) >> and I will re-iterate some of the points made there and build off of >> that. I >> have been thinking about this a lot in the last few weeks and my thoughts >> turned >> into a fairly fleshed-out proposal. The discussion should probably start >> more >> high-level, so I apologize if the level of detail is inappropriate at this >> point in time. >> >> I decided on a gist, since the email got too long and clear formatting >> helps: >> >> https://gist.github.com/gustavla/2783543be1204d2b5d368f6a1fb4d069 > > This is a lovely and clearly written document. Thanks for taking the time > to think through this! > > I encourage you to submit it as a pull request to the NumPy repository as > a "NumPy Enhancement Proposal", either now or after we've discussed it: > https://docs.scipy.org/doc/numpy-dev/neps/index.html > > >> OK, those are my thoughts for now. What do you think? >> > > Two thoughts for now: > 1. For object arrays, I would default to calling format on each element > (your "map principle") rather than raising an error. > 2. It's absolutely OK to leave functionality unimplemented and not > immediately nail down every edge case. As a default, I would suggest > raising errors whenever non-empty type specifications are provided rather > than raising errors in every case. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Tue Feb 14 20:55:21 2017 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 14 Feb 2017 17:55:21 -0800 Subject: [Numpy-discussion] Proposal to support __format__ In-Reply-To: References: Message-ID: On Tue, Feb 14, 2017 at 5:35 PM, Gustav Larsson wrote: > 1.
For object arrays, I would default to calling format on each element >> (your "map principle") rather than raising an error. >> > > I'm glad you brought this up as a possibility. It might be possible, but > there are some issues that would need to be resolved. First of all, {} and > {:} always work and give the same result they currently do. So, this only > affects the situation where the format spec is non-empty. I think there are > two main issues: > > Heterogeneity: Let's say we have x = np.array([12.3, True, 'string', > Foo(10)], dtype=np.object). Then, presumably {:.1f} should cause a > ValueError since the string does not support format type 'f'. This could > create a lot of ValueError land mines for the user. > Things will absolutely break if you try to do complex operations on in-homogeneously typed arrays. I would put the onus on the user in such a case. > For x[:2] however it should work and produce something like [12.3 1.0]. > Note, the "map principle" still can't be strictly true. Let's say we have > an array with type object and mostly string-like elements. Then {:5s} will > still not produce exactly {:5s} element-wise, because the string > representations need to be repr-based inside the array (otherwise it could > break for newlines and things like that and produce spaces that make the > boundary between elements ambiguous). This brings me to the next issue. > Indeed, this will be a departure from the behavior without a format string, which just uses repr. In my mind, this is the strongest argument against using the map principle here, because there is a discontinuous shift between providing and not providing a format string. > Str vs. repr: If we have a homogeneous object array with type Foo and Foo > implements __format__, it would be great if this worked. However, one issue > is that Foo.__format__ might return things like newline (or spaces), which > would break (or confuse) the printed output (unless it is made incredibly > smart to support "vertical alignment"). This issue is essentially the same > as for strings in general, which is why they use repr instead. I can think > of two solutions: 1) Try to sanitize (or repr-ify) the string returned by > __format__ somehow; 2) Put the responsibility on the user and simply let > the rendering break if Foo.__format__ does not play well. > I wouldn't do anything fancy here to worry about line breaks. It's basically impossible to get this right for edge cases, so I would certainly put the responsibility on the user. On another note, about Python 2 vs 3: I would definitely take the approach of copying the Python 3 behavior on all versions of NumPy (when feasible) and not being too concerned about compatibility with format on Python 2. The future is Python 3. -------------- next part -------------- An HTML attachment was scrubbed... URL: From amit at haystackapp.net Tue Feb 14 23:24:20 2017 From: amit at haystackapp.net (Amit Bhosle) Date: Tue, 14 Feb 2017 20:24:20 -0800 Subject: [Numpy-discussion] ImportError: Importing the multiarray numpy extension module failed Message-ID: Hi, I'm struggling with a numpy issue and web search hasn't helped. I'm on Windows 10, and using Python27. I've tried reinstalling numpy, and also a few different versions, but without any luck. numpy was pulled in as a dependency of timezonefinder==1.5.7 that I need, and the numpy-1.12.0.dist-info distribution was installed. The error on my google-app-engine server's console is as below. Can someone please help? Thanks a bunch in advance.
AB File "timezonefinder\timezonefinder.py", line 8, in from numpy import array, empty, fromfile File "numpy\__init__.py", lin e 142, in from . import add_newdocs File "numpy\add_newdocs.py", line 13, in from numpy.lib import add_newdoc File "numpy\lib\__init__.py", line 8, in from .type_check import * File "numpy\lib\type_check.py ", line 11, in import numpy.core.numeric as _nx File "numpy\core\__init__.py" , line 24, in raise ImportError(msg) ImportError: Importing the multiarray numpy extension module failed. Most likely you are trying to import a failed build of numpy. If you're working with a numpy git repo, try `git clean -xdf` (removes all files not under version control). Otherwise reinstall numpy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Feb 15 00:01:09 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Feb 2017 21:01:09 -0800 Subject: [Numpy-discussion] ImportError: Importing the multiarray numpy extension module failed In-Reply-To: References: Message-ID: On Tue, Feb 14, 2017 at 8:24 PM, Amit Bhosle wrote: > Hi, > > I'm struggling with a numpy issue and web search hasn't helped. I'm on > windows 10, and using Python27. > > I've tried reinstalling numpy, and also a few different versions, but > without any luck. > > numpy was pulled in as dependency of timezonefinder==1.5.7 that i need, and > the numpy-1.12.0.dist-info distribution was installed.. > > The error on my google-app-engine server's console is as below.. > Can someone pls help? Are you using the app engine "standard environment"? That's a very weird Python environment that forbids the installation of all packages that contain C code. This obviously includes numpy, and would explain your error. They do provide a pre-installed super-ancient version of numpy with some features removed, which might work for you if you force-uninstall numpy. Otherwise you might need to switch to the "flexible environment". -n -- Nathaniel J. Smith -- https://vorpus.org From m.h.vankerkwijk at gmail.com Wed Feb 15 11:03:51 2017 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Wed, 15 Feb 2017 11:03:51 -0500 Subject: [Numpy-discussion] Proposal to support __format__ In-Reply-To: References: Message-ID: Hi Gustav, This is great! A few quick comments (mostly echo-ing Stephan's). 1. You basically have a NEP already! Making a PR from it allows to give line-by-line comments, so would help! 2. Don't worry about supporting python2 specifics; just try to ensure it doesn't break; I would not say more about it! 3. On `set_printoptions` -- ideally, it will become possible to use this as a context (i.e., `with set_printoption(...)`). It might make sense to have an `override_format` keyword argument to it. 4. Otherwise, my main suggestion is to start small with the more obvious ones, and not worry too much about format validation, but rather about getting the simple ones to work well (e.g., for an object array, just apply the format given; if it doesn't work, it will error out on its own, which is OK). 5. One bit of detail: the "g" one does confuse me. 
All the best, Marten From matthew.brett at gmail.com Wed Feb 15 14:02:35 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2017 19:02:35 +0000 Subject: [Numpy-discussion] PowerPC testing servers Message-ID: Hey, A recent post to the wheel-builders mailing list pointed out some links to places providing free PowerPC hosting for open source projects, if they agree to a submitted request: https://mail.python.org/pipermail/wheel-builders/2017-February/000257.html It would be good to get some testing going on these architectures. Shall we apply for hosting, as the numpy organization? Cheers, Matthew From ralf.gommers at gmail.com Wed Feb 15 14:37:06 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 16 Feb 2017 08:37:06 +1300 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: On Thu, Feb 16, 2017 at 8:02 AM, Matthew Brett wrote: > Hey, > > A recent post to the wheel-builders mailing list pointed out some > links to places providing free PowerPC hosting for open source > projects, if they agree to a submitted request: > > https://mail.python.org/pipermail/wheel-builders/2017-February/000257.html > > It would be good to get some testing going on these architectures. > Shall we apply for hosting, as the numpy organization? > Those are bare VMs it seems. Remembering the Buildbot and Mailman horrors, I think we should be very reluctant to take responsibility for maintaining CI on anything that's not hosted and can be controlled with a simple config file in our repo. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 15 14:45:49 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2017 19:45:49 +0000 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: On Wed, Feb 15, 2017 at 7:37 PM, Ralf Gommers wrote: > On Thu, Feb 16, 2017 at 8:02 AM, Matthew Brett wrote: >> Hey, >> >> A recent post to the wheel-builders mailing list pointed out some >> links to places providing free PowerPC hosting for open source >> projects, if they agree to a submitted request: >> >> https://mail.python.org/pipermail/wheel-builders/2017-February/000257.html >> >> It would be good to get some testing going on these architectures. >> Shall we apply for hosting, as the numpy organization? > > Those are bare VMs it seems. Remembering the Buildbot and Mailman horrors, I > think we should be very reluctant to take responsibility for maintaining CI > on anything that's not hosted and can be controlled with a simple config > file in our repo. Not sure what you mean about mailman - maybe the Enthought servers we didn't have access to? For buildbot, I've been maintaining about 12 crappy old machines for about 7 years now [1] - I'm happy to do the same job for a couple of properly hosted PPC machines. At least we'd have some way of testing on these machines, if we get stuck - even if that involved spinning up a VM and installing the stuff we needed from the command line.
Cheers, Matthew [1] http://nipy.bic.berkeley.edu/buildslaves From ralf.gommers at gmail.com Wed Feb 15 14:55:33 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 16 Feb 2017 08:55:33 +1300 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: On Thu, Feb 16, 2017 at 8:45 AM, Matthew Brett wrote: > On Wed, Feb 15, 2017 at 7:37 PM, Ralf Gommers wrote: > > On Thu, Feb 16, 2017 at 8:02 AM, Matthew Brett wrote: > >> Hey, > >> > >> A recent post to the wheel-builders mailing list pointed out some > >> links to places providing free PowerPC hosting for open source > >> projects, if they agree to a submitted request: > >> > >> https://mail.python.org/pipermail/wheel-builders/2017-February/000257.html > >> > >> It would be good to get some testing going on these architectures. > >> Shall we apply for hosting, as the numpy organization? > > > > Those are bare VMs it seems. Remembering the Buildbot and Mailman horrors, I > > think we should be very reluctant to take responsibility for maintaining > > CI on anything that's not hosted and can be controlled with a simple config > > file in our repo. > > Not sure what you mean about mailman - maybe the Enthought servers we > didn't have access to? We did have access (for most of the time), it's just that no one is interested in putting in lots of hours on sysadmin duties. > For buildbot, I've been maintaining about 12 > crappy old machines for about 7 years now [1] - I'm happy to do the > same job for a couple of properly hosted PPC machines. That's awesome persistence. The NumPy and SciPy buildbots certainly weren't maintained like that, half of them were offline or broken for long periods usually. > At least we'd > have some way of testing on these machines, if we get stuck - even if > that involved spinning up a VM and installing the stuff we needed from > the command line. I do see the value of testing on more platforms of course. It's just about logistics/responsibilities. If you're saying that you'll do the maintenance, and want to apply for resources using the NumPy name, that's much better I think than making "the numpy devs" collectively responsible. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 15 14:58:00 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2017 19:58:00 +0000 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: On Wed, Feb 15, 2017 at 7:55 PM, Ralf Gommers wrote: > On Thu, Feb 16, 2017 at 8:45 AM, Matthew Brett wrote: >> On Wed, Feb 15, 2017 at 7:37 PM, Ralf Gommers wrote: >> > On Thu, Feb 16, 2017 at 8:02 AM, Matthew Brett wrote: >> >> Hey, >> >> >> >> A recent post to the wheel-builders mailing list pointed out some >> >> links to places providing free PowerPC hosting for open source >> >> projects, if they agree to a submitted request: >> >> >> >> https://mail.python.org/pipermail/wheel-builders/2017-February/000257.html >> >> >> >> It would be good to get some testing going on these architectures. >> >> Shall we apply for hosting, as the numpy organization? >> > >> > Those are bare VMs it seems. Remembering the Buildbot and Mailman horrors, I >> > think we should be very reluctant to take responsibility for maintaining >> > CI on anything that's not hosted and can be controlled with a simple config >> > file in our repo.
>> Not sure what you mean about mailman - maybe the Enthought servers we >> didn't have access to? > > We did have access (for most of the time), it's just that no one is > interested in putting in lots of hours on sysadmin duties. > >> For buildbot, I've been maintaining about 12 >> crappy old machines for about 7 years now [1] - I'm happy to do the >> same job for a couple of properly hosted PPC machines. > > That's awesome persistence. The NumPy and SciPy buildbots certainly weren't > maintained like that, half of them were offline or broken for long periods > usually. Right - they do need persistence, and to have someone who takes responsibility for them. >> At least we'd >> have some way of testing on these machines, if we get stuck - even if >> that involved spinning up a VM and installing the stuff we needed from >> the command line. > > I do see the value of testing on more platforms of course. It's just about > logistics/responsibilities. If you're saying that you'll do the maintenance, > and want to apply for resources using the NumPy name, that's much better I > think than making "the numpy devs" collectively responsible. Yes, exactly. I'm happy to take responsibility for them, I just wanted to make sure that numpy devs could get at them if I'm not around for some reason. Matthew From evgeny.burovskiy at gmail.com Wed Feb 15 15:48:04 2017 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Wed, 15 Feb 2017 23:48:04 +0300 Subject: [Numpy-discussion] ANN: scipy 0.19.0 release candidate 1 Message-ID: Hi, I'm pleased to announce the availability of the first release candidate for scipy 0.19.0. It contains contributions from 120 people over the course of seven months. Please try this release and report any issues on the GitHub tracker, https://github.com/scipy/scipy, or the scipy-dev mailing list. Source tarballs and release notes are available from GitHub releases, https://github.com/scipy/scipy/releases/tag/v0.19.0rc1 Please note that this is a source-only release. We do not provide Windows binaries for this release. OS X and Linux wheels will be provided for the final release. The current release schedule is:

22 Feb : 0.19.0rc2, if needed
09 Mar : 0.19.0 final

Thanks to everyone who contributed to this release! Cheers, Evgeni A part of the release notes follows below:

==========================
SciPy 0.19.0 Release Notes
==========================

.. note:: Scipy 0.19.0 is not released yet!

.. contents::

SciPy 0.19.0 is the culmination of seven months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.19.x branch, and on adding new features on the master branch. This release requires Python 2.7 or 3.4-3.6 and NumPy 1.8.2 or greater.

Highlights of this release include:

- A unified foreign function interface layer, `scipy.LowLevelCallable` (see the sketch below).
- Cython API for scalar, typed versions of the universal functions from the `scipy.special` module, via `cimport scipy.special.cython_special`.
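A minimal usage sketch of the new callable interface (the shared library ``myfunc.so`` and its exported function ``f`` are hypothetical stand-ins for a user-compiled callback)::

    import ctypes
    from scipy import LowLevelCallable, integrate

    # Hypothetical user-compiled library exporting:
    #     double f(int n, double *xx)
    lib = ctypes.CDLL('./myfunc.so')
    lib.f.restype = ctypes.c_double
    lib.f.argtypes = (ctypes.c_int, ctypes.POINTER(ctypes.c_double))

    # Wrap the ctypes function pointer and pass it to a scipy routine.
    func = LowLevelCallable(lib.f)
    result, abserr = integrate.quad(func, 0.0, 1.0)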
New features
============

Foreign function interface improvements
---------------------------------------

`scipy.LowLevelCallable` provides a new unified interface for wrapping low-level compiled callback functions in the Python space. It supports Cython imported "api" functions, ctypes function pointers, CFFI function pointers, ``PyCapsules``, Numba jitted functions and more. See `gh-6509 <https://github.com/scipy/scipy/pull/6509>`_ for details.

`scipy.linalg` improvements
---------------------------

The function `scipy.linalg.solve` obtained two more keywords, ``assume_a`` and ``transposed``. The underlying LAPACK routines are replaced with "expert" versions and now can also be used to solve symmetric, hermitian and positive definite coefficient matrices. Moreover, ill-conditioned matrices now cause a warning to be emitted with the estimated condition number information. The old ``sym_pos`` keyword is kept for backwards compatibility reasons; however, it is identical to using ``assume_a='pos'``. Moreover, the ``debug`` keyword, which did nothing except print the ``overwrite_`` values, is deprecated.

The function `scipy.linalg.matrix_balance` was added to perform the so-called matrix balancing using the LAPACK xGEBAL routine family. This can be used to approximately equate the row and column norms through diagonal similarity transformations.

The functions `scipy.linalg.solve_continuous_are` and `scipy.linalg.solve_discrete_are` have numerically more stable algorithms. These functions can also solve generalized algebraic matrix Riccati equations. Moreover, both gained a ``balanced`` keyword to turn balancing on and off.

`scipy.spatial` improvements
----------------------------

`scipy.spatial.SphericalVoronoi.sort_vertices_of_regions` has been re-written in Cython to improve performance.

`scipy.spatial.SphericalVoronoi` can handle > 200 k points (at least 10 million) and has improved performance.

The function `scipy.spatial.distance.directed_hausdorff` was added to calculate the directed Hausdorff distance.

The ``count_neighbors`` method of `scipy.spatial.cKDTree` gained an ability to perform weighted pair counting via the new keywords ``weights`` and ``cumulative``. See `gh-5647 <https://github.com/scipy/scipy/pull/5647>`_ for details.

`scipy.ndimage` improvements
----------------------------

The callback function C API supports PyCapsules in Python 2.7.

Multidimensional filters now allow having different extrapolation modes for different axes.

`scipy.optimize` improvements
-----------------------------

The `scipy.optimize.basinhopping` global minimizer obtained a new keyword, `seed`, which can be used to seed the random number generator and obtain repeatable minimizations.

The keyword `sigma` in `scipy.optimize.curve_fit` was overloaded to also accept the covariance matrix of errors in the data.

`scipy.signal` improvements
---------------------------

The functions `scipy.signal.correlate` and `scipy.signal.convolve` have a new optional parameter `method`. The default value of `auto` estimates the faster of the two computation methods, the direct approach and the Fourier transform approach.

A new function has been added to choose the convolution/correlation method, `scipy.signal.choose_conv_method`, which may be appropriate if convolutions or correlations are performed on many arrays of the same size.

New functions have been added to calculate complex short-time Fourier transforms of an input signal, and to invert the transform to recover the original signal: `scipy.signal.stft` and `scipy.signal.istft`.
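For example, a clean analysis/synthesis round trip recovers the input signal (a minimal sketch; the default Hann window at 50% overlap satisfies the reconstruction condition)::

    >>> import numpy as np
    >>> from scipy.signal import stft, istft
    >>> x = np.random.randn(1024)
    >>> f, t, Zxx = stft(x, fs=1.0, nperseg=256)
    >>> t2, xrec = istft(Zxx, fs=1.0, nperseg=256)
    >>> np.allclose(x, xrec[:x.size])
    True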
This implementation also fixes the previously incorrect output of `scipy.signal.spectrogram` when complex output data were requested.

The function `scipy.signal.sosfreqz` was added to compute the frequency response from second-order sections.

The function `scipy.signal.unit_impulse` was added to conveniently generate an impulse function.

The function `scipy.signal.iirnotch` was added to design second-order IIR notch filters that can be used to remove a frequency component from a signal. The dual function `scipy.signal.iirpeak` was added to compute the coefficients of a second-order IIR peak (resonant) filter.

The function `scipy.signal.minimum_phase` was added to convert linear-phase FIR filters to minimum phase.

The functions `scipy.signal.upfirdn` and `scipy.signal.resample_poly` are now substantially faster when operating on some n-dimensional arrays when n > 1. The largest reduction in computation time is realized in cases where the size of the array is small (<1k samples or so) along the axis to be filtered.

`scipy.fftpack` improvements
----------------------------

Fast Fourier transform routines now accept `np.float16` inputs and upcast them to `np.float32`. Previously, they would raise an error.

`scipy.cluster` improvements
----------------------------

Methods ``"centroid"`` and ``"median"`` of `scipy.cluster.hierarchy.linkage` have been significantly sped up. Long-standing issues with using ``linkage`` on large input data (over 16 GB) have been resolved.

`scipy.sparse` improvements
---------------------------

The functions `scipy.sparse.save_npz` and `scipy.sparse.load_npz` were added, providing simple serialization for some sparse formats.

The `prune` method of classes `bsr_matrix`, `csc_matrix`, and `csr_matrix` was updated to reallocate backing arrays under certain conditions, reducing memory usage.

The methods `argmin` and `argmax` were added to classes `coo_matrix`, `csc_matrix`, `csr_matrix`, and `bsr_matrix`.

New function `scipy.sparse.csgraph.structural_rank` computes the structural rank of a graph with a given sparsity pattern.

New function `scipy.sparse.linalg.spsolve_triangular` solves a sparse linear system with a triangular left hand side matrix.

`scipy.special` improvements
----------------------------

Scalar, typed versions of universal functions from `scipy.special` are available in the Cython space via ``cimport`` from the new module `scipy.special.cython_special`. These scalar functions can be expected to be significantly faster than the universal functions for scalar arguments. See the `scipy.special` tutorial for details.

Better control over special-function errors is offered by the functions `scipy.special.geterr` and `scipy.special.seterr` and the context manager `scipy.special.errstate`.

The names of orthogonal polynomial root functions have been changed to be consistent with other functions relating to orthogonal polynomials. For example, `scipy.special.j_roots` has been renamed `scipy.special.roots_jacobi` for consistency with the related functions `scipy.special.jacobi` and `scipy.special.eval_jacobi`. To preserve back-compatibility the old names have been left as aliases.

The Wright Omega function is implemented as `scipy.special.wrightomega`.

`scipy.stats` improvements
--------------------------

The function `scipy.stats.weightedtau` was added. It provides a weighted version of Kendall's tau.

New class `scipy.stats.multinomial` implements the multinomial distribution.
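For example (a minimal usage sketch; the counts and probabilities are arbitrary)::

    >>> from scipy.stats import multinomial
    >>> rv = multinomial(8, [0.3, 0.2, 0.5])
    >>> rv.pmf([1, 3, 4])
    0.042...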
New class `scipy.stats.rv_histogram` constructs a continuous univariate distribution with a piecewise linear CDF from a binned data sample.

New class `scipy.stats.argus` implements the Argus distribution.

`scipy.interpolate` improvements
--------------------------------

New class `scipy.interpolate.BSpline` represents splines. ``BSpline`` objects contain knots and coefficients and can evaluate the spline. The format is consistent with FITPACK, so that one can do, for example::

    >>> t, c, k = splrep(x, y, s=0)
    >>> spl = BSpline(t, c, k)
    >>> np.allclose(spl(x), y)

``spl*`` functions, `scipy.interpolate.splev`, `scipy.interpolate.splint`, `scipy.interpolate.splder` and `scipy.interpolate.splantider`, accept both ``BSpline`` objects and ``(t, c, k)`` tuples for backwards compatibility.

For multidimensional splines, ``c.ndim > 1``, ``BSpline`` objects are consistent with piecewise polynomials, `scipy.interpolate.PPoly`. This means that ``BSpline`` objects are not immediately consistent with `scipy.interpolate.splprep`, and one *cannot* do ``>>> BSpline(*splprep([x, y])[0])``. Consult the `scipy.interpolate` test suite for examples of the precise equivalence.

In new code, prefer using ``scipy.interpolate.BSpline`` objects instead of manipulating ``(t, c, k)`` tuples directly.

New function `scipy.interpolate.make_interp_spline` constructs an interpolating spline given data points and boundary conditions.

New function `scipy.interpolate.make_lsq_spline` constructs a least-squares spline approximation given data points.

`scipy.integrate` improvements
------------------------------

Now `scipy.integrate.fixed_quad` supports vector-valued functions.

Deprecated features
===================

`scipy.interpolate.splmake`, `scipy.interpolate.spleval` and `scipy.interpolate.spline` are deprecated. The format used by `splmake/spleval` was inconsistent with `splrep/splev`, which was confusing to users.

`scipy.special.errprint` is deprecated. Improved functionality is available in `scipy.special.seterr`.

Backwards incompatible changes
==============================

The deprecated ``scipy.weave`` submodule was removed.

`scipy.spatial.distance.squareform` now returns arrays of the same dtype as the input, instead of always float64.

`scipy.special.errprint` now returns a boolean.

The function `scipy.signal.find_peaks_cwt` now returns an array instead of a list.

`scipy.stats.kendalltau` now computes the correct p-value in case the input contains ties. The p-value is also identical to that computed by `scipy.stats.mstats.kendalltau` and by R. If the input does not contain ties there is no change w.r.t. the previous implementation.

The function `scipy.linalg.block_diag` will not ignore zero-sized matrices anymore. Instead it will insert rows or columns of zeros of the appropriate size. See gh-4908 for more details.

Other changes
=============

SciPy wheels will now report their dependency on ``numpy`` on all platforms. This change was made because Numpy wheels are available, and because the pip upgrade behavior is finally changing for the better (use ``--upgrade-strategy=only-if-needed`` for ``pip >= 8.2``; that behavior will become the default in the next major version of ``pip``).

Numerical values returned by `scipy.interpolate.interp1d` with ``kind="cubic"`` and ``"quadratic"`` may change relative to previous scipy versions. If your code depended on specific numeric values (i.e., on implementation details of the interpolators), you may want to double-check your results.
Authors
=======

* @endolith * Max Argus + * Hervé Audren * Alessandro Pietro Bardelli + * Michael Benfield + * Felix Berkenkamp * Matthew Brett * Per Brodtkorb * Evgeni Burovski * Pierre de Buyl * CJ Carey * Brandon Carter + * Tim Cera * Klesk Chonkin * Christian Häggström + * Luca Citi * Peadar Coyle + * Daniel da Silva + * Greg Dooper + * John Draper + * drlvk + * David Ellis + * Yu Feng * Baptiste Fontaine + * Jed Frey + * Siddhartha Gandhi + * GiggleLiu + * Wim Glenn + * Akash Goel + * Ralf Gommers * Alexander Goncearenco + * Richard Gowers + * Alex Griffing * Radoslaw Guzinski + * Charles Harris * Callum Jacob Hays + * Ian Henriksen * Randy Heydon + * Lindsey Hiltner + * Gerrit Holl + * Hiroki IKEDA + * jfinkels + * Mher Kazandjian + * Thomas Keck + * keuj6 + * Kornel Kielczewski + * Sergey B Kirpichev + * Vasily Kokorev + * Eric Larson * Denis Laxalde * Gregory R. Lee * Josh Lefler + * Julien Lhermitte + * Evan Limanto + * Nikolay Mayorov * Geordie McBain + * Josue Melka + * Matthieu Melot * michaelvmartin15 + * Surhud More + * Brett M. Morris + * Chris Mutel + * Paul Nation * Andrew Nelson * David Nicholson + * Aaron Nielsen + * Joel Nothman * nrnrk + * Juan Nunez-Iglesias * Mikhail Pak + * Gavin Parnaby + * Thomas Pingel + * Ilhan Polat + * Aman Pratik + * Sebastian Pucilowski * Ted Pudlik * puenka + * Eric Quintero * Tyler Reddy * Joscha Reimer * Antonio Horta Ribeiro + * Edward Richards + * Roman Ring + * Rafael Rossi + * Colm Ryan + * Sami Salonen + * Alvaro Sanchez-Gonzalez + * Johannes Schmitz * Kari Schoonbee * Yurii Shevchuk + * Jonathan Siebert + * Jonathan Tammo Siebert + * Scott Sievert + * Sourav Singh * Byron Smith + * Srikiran + * Samuel St-Jean + * Yoni Teitelbaum + * Bhavika Tekwani * Martin Thoma * timbalam + * Svend Vanderveken + * Sebastiano Vigna + * Aditya Vijaykumar + * Santi Villalba + * Ze Vinicius * Pauli Virtanen * Matteo Visconti * Yusuke Watanabe + * Warren Weckesser * Phillip Weinberg + * Nils Werner * Jakub Wilk * Josh Wilson * wirew0rm + * David Wolever + * Nathan Woods * ybeltukov + * G Young * Evgeny Zhurko +

A total of 120 people contributed to this release. People with a "+" by their names contributed a patch for the first time. This list of names is automatically generated, and may not be fully complete. From larsson at cs.uchicago.edu Wed Feb 15 16:48:34 2017 From: larsson at cs.uchicago.edu (Gustav Larsson) Date: Wed, 15 Feb 2017 13:48:34 -0800 Subject: [Numpy-discussion] Proposal to support __format__ In-Reply-To: References: Message-ID: > This is great! Thanks! Glad to be met by enthusiasm about this. 1. You basically have a NEP already! Making a PR from it allows to > give line-by-line comments, so would help! I will do this soon. 2. Don't worry about supporting python2 specifics; just try to ensure > it doesn't break; I would not say more about it! Sounds good to me. 3. On `set_printoptions` -- ideally, it will become possible to use > this as a context (i.e., `with set_printoption(...)`). It might make > sense to have an `override_format` keyword argument to it. Having a `with np.printoptions(...)` context manager is a great idea. It does sound orthogonal to __format__ though, so it could be addressed separately. 4. Otherwise, my main suggestion is to start small with the more > obvious ones, and not worry too much about format validation, but > rather about getting the simple ones to work well (e.g., for an object > array, just apply the format given; if it doesn't work, it will error > out on its own, which is OK). Sounds good to me.
I was thinking of approaching the implementation by writing unit tests first and group them into different priority tiers. That way, the unit tests can go through another review before implementation gets going. I agree that __format__ doesn't have to check format validation if a ValueError is going to be raised anyway by sub-calls. 5. One bit of detail: the "g" one does confuse me. I will re-write this a bit to make it clearer. Basically, the 'g' with the mix of 'e'/'f' depending on max/min>1000 is all from the current numpy behavior, so it is not something I had much creative input on at all. Although, as it is written right now it may seem so. That is, the goal is to have {:} == {:g} for float arrays, analogous to how {:} == {:g} for built-in floats. Then, if the user departs a bit, like {:.2g}, it will simply be identical to calling np.set_printoptions(precision=2) first. Gustav On Wed, Feb 15, 2017 at 8:03 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Hi Gustav, > > This is great! A few quick comments (mostly echo-ing Stephan's). > > 1. You basically have a NEP already! Making a PR from it allows to > give line-by-line comments, so would help! > > 2. Don't worry about supporting python2 specifics; just try to ensure > it doesn't break; I would not say more about it! > > 3. On `set_printoptions` -- ideally, it will become possible to use > this as a context (i.e., `with set_printoption(...)`). It might make > sense to have an `override_format` keyword argument to it. > > 4. Otherwise, my main suggestion is to start small with the more > obvious ones, and not worry too much about format validation, but > rather about getting the simple ones to work well (e.g., for an object > array, just apply the format given; if it doesn't work, it will error > out on its own, which is OK). > > 5. One bit of detail: the "g" one does confuse me. > > All the best, > > Marten > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilhanpolat at gmail.com Wed Feb 15 17:05:00 2017 From: ilhanpolat at gmail.com (Ilhan Polat) Date: Wed, 15 Feb 2017 23:05:00 +0100 Subject: [Numpy-discussion] Proposal to support __format__ In-Reply-To: References: Message-ID: On the last item, do we really have to follow that strange, `d`,`g` and so on conventions on formatting? With all respect to the humongous historical baggage, I think that notation is pretty archaic and terminal like. If being pythonic is of a concern here, maybe it is better to use a more verbose syntax. Just throwing out an idea after 15 seconds of thought (so by no means an alternative suggestion) eng:6i5d -> engineering notation (always powers of ten of multiples of 3) 6 integral digits and 5 decimal digits. float (whatever the default is) float:4i2d (you get the idea) etc. FULL DISCLOSURE: I am a very displeased customer of `fprintf ` of matlab (and others) and this archaic formatting. I never got a hang of it so it might be the case that I don't quite get the rationale behind it and I almost always get it wrong. Maybe at least the rationale can be clarified. Lastly, repeating what others mentioned: thank you for this well prepared initiative On Wed, Feb 15, 2017 at 10:48 PM, Gustav Larsson wrote: > This is great! > > > Thanks! Glad to be met by enthusiasm about this. > > 1. You basically have a NEP already! 
Making a PR from it allows to >> give line-by-line comments, so would help! > > > I will do this soon. > > 2. Don't worry about supporting python2 specifics; just try to ensure >> it doesn't break; I would not say more about it! > > > Sounds good to me. > > 3. On `set_printoptions` -- ideally, it will become possible to use >> this as a context (i.e., `with set_printoption(...)`). It might make >> sense to have an `override_format` keyword argument to it. > > > Having a `with np.printoptions(...)` context manager is a great idea. It > does sound orthogonal to __format__ though, so it could be addressed > separately. > > 4. Otherwise, my main suggestion is to start small with the more >> obvious ones, and not worry too much about format validation, but >> rather about getting the simple ones to work well (e.g., for an object >> array, just apply the format given; if it doesn't work, it will error >> out on its own, which is OK). > > > Sounds good to me. I was thinking of approaching the implementation by > writing unit tests first and group them into different priority tiers. That > way, the unit tests can go through another review before implementation > gets going. I agree that __format__ doesn't have to check format validation > if a ValueError is going to be raised anyway by sub-calls. > > 5. One bit of detail: the "g" one does confuse me. > > > I will re-write this a bit to make it clearer. Basically, the 'g' with the > mix of 'e'/'f' depending on max/min>1000 is all from the current numpy > behavior, so it is not something I had much creative input on at all. > Although, as it is written right now it may seem so. That is, the goal is > to have {:} == {:g} for float arrays, analogous to how {:} == {:g} for > built-in floats. Then, if the user departs a bit, like {:.2g}, it will > simply be identical to calling np.set_printoptions(precision=2) first. > > Gustav > > On Wed, Feb 15, 2017 at 8:03 AM, Marten van Kerkwijk < > m.h.vankerkwijk at gmail.com> wrote: > >> Hi Gustav, >> >> This is great! A few quick comments (mostly echo-ing Stephan's). >> >> 1. You basically have a NEP already! Making a PR from it allows to >> give line-by-line comments, so would help! >> >> 2. Don't worry about supporting python2 specifics; just try to ensure >> it doesn't break; I would not say more about it! >> >> 3. On `set_printoptions` -- ideally, it will become possible to use >> this as a context (i.e., `with set_printoption(...)`). It might make >> sense to have an `override_format` keyword argument to it. >> >> 4. Otherwise, my main suggestion is to start small with the more >> obvious ones, and not worry too much about format validation, but >> rather about getting the simple ones to work well (e.g., for an object >> array, just apply the format given; if it doesn't work, it will error >> out on its own, which is OK). >> >> 5. One bit of detail: the "g" one does confuse me. >> >> All the best, >> >> Marten >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nathan12343 at gmail.com Wed Feb 15 17:14:42 2017 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 15 Feb 2017 16:14:42 -0600 Subject: [Numpy-discussion] Proposal to support __format__ In-Reply-To: References: Message-ID: On Wed, Feb 15, 2017 at 4:05 PM, Ilhan Polat wrote: > On the last item, do we really have to follow that strange, `d`,`g` and so > on conventions on formatting? With all respect to the humongous historical > baggage, I think that notation is pretty archaic and terminal like. If > being pythonic is of a concern here, maybe it is better to use a more > verbose syntax. Just throwing out an idea after 15 seconds of thought (so > by no means an alternative suggestion) > > eng:6i5d -> engineering notation (always powers of ten of multiples of 3) > 6 integral digits and 5 decimal digits. > float (whatever the default is) > float:4i2d (you get the idea) > > etc. > > While I agree with you that printf format codes are arcane, unfortunately they need to be used here since they are supported by Python: https://docs.python.org/3.1/library/string.html#formatspec > > FULL DISCLOSURE: I am a very displeased customer of `fprintf ` of matlab > (and others) and this archaic formatting. I never got a hang of it so it > might be the case that I don't quite get the rationale behind it and I > almost always get it wrong. Maybe at least the rationale can be clarified. > > > Lastly, repeating what others mentioned: thank you for this well prepared > initiative > > > > > On Wed, Feb 15, 2017 at 10:48 PM, Gustav Larsson > wrote: > >> This is great! >> >> >> Thanks! Glad to be met by enthusiasm about this. >> >> 1. You basically have a NEP already! Making a PR from it allows to >>> give line-by-line comments, so would help! >> >> >> I will do this soon. >> >> 2. Don't worry about supporting python2 specifics; just try to ensure >>> it doesn't break; I would not say more about it! >> >> >> Sounds good to me. >> >> 3. On `set_printoptions` -- ideally, it will become possible to use >>> this as a context (i.e., `with set_printoption(...)`). It might make >>> sense to have an `override_format` keyword argument to it. >> >> >> Having a `with np.printoptions(...)` context manager is a great idea. It >> does sound orthogonal to __format__ though, so it could be addressed >> separately. >> >> 4. Otherwise, my main suggestion is to start small with the more >>> obvious ones, and not worry too much about format validation, but >>> rather about getting the simple ones to work well (e.g., for an object >>> array, just apply the format given; if it doesn't work, it will error >>> out on its own, which is OK). >> >> >> Sounds good to me. I was thinking of approaching the implementation by >> writing unit tests first and group them into different priority tiers. That >> way, the unit tests can go through another review before implementation >> gets going. I agree that __format__ doesn't have to check format validation >> if a ValueError is going to be raised anyway by sub-calls. >> >> 5. One bit of detail: the "g" one does confuse me. >> >> >> I will re-write this a bit to make it clearer. Basically, the 'g' with >> the mix of 'e'/'f' depending on max/min>1000 is all from the current numpy >> behavior, so it is not something I had much creative input on at all. >> Although, as it is written right now it may seem so. That is, the goal is >> to have {:} == {:g} for float arrays, analogous to how {:} == {:g} for >> built-in floats. 
Then, if the user departs a bit, like {:.2g}, it will >> simply be identical to calling np.set_printoptions(precision=2) first. >> >> Gustav >> >> On Wed, Feb 15, 2017 at 8:03 AM, Marten van Kerkwijk < >> m.h.vankerkwijk at gmail.com> wrote: >> >>> Hi Gustav, >>> >>> This is great! A few quick comments (mostly echo-ing Stephan's). >>> >>> 1. You basically have a NEP already! Making a PR from it allows to >>> give line-by-line comments, so would help! >>> >>> 2. Don't worry about supporting python2 specifics; just try to ensure >>> it doesn't break; I would not say more about it! >>> >>> 3. On `set_printoptions` -- ideally, it will become possible to use >>> this as a context (i.e., `with set_printoption(...)`). It might make >>> sense to have an `override_format` keyword argument to it. >>> >>> 4. Otherwise, my main suggestion is to start small with the more >>> obvious ones, and not worry too much about format validation, but >>> rather about getting the simple ones to work well (e.g., for an object >>> array, just apply the format given; if it doesn't work, it will error >>> out on its own, which is OK). >>> >>> 5. One bit of detail: the "g" one does confuse me. >>> >>> All the best, >>> >>> Marten >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amit at haystackapp.net Wed Feb 15 17:16:43 2017 From: amit at haystackapp.net (Amit Bhosle) Date: Wed, 15 Feb 2017 14:16:43 -0800 Subject: [Numpy-discussion] ImportError: Importing the multiarray numpy extension module failed In-Reply-To: References: Message-ID: Hi Nathan, Thanks for the quick response. Yeah - looks like the Google app engine supports only 1.6.1.. Reverting to that version has fixed this issue. Thanks AB On Feb 14, 2017 21:01, "Nathaniel Smith" wrote: > On Tue, Feb 14, 2017 at 8:24 PM, Amit Bhosle wrote: > > Hi, > > > > I'm struggling with a numpy issue and web search hasn't helped. I'm on > > windows 10, and using Python27. > > > > I've tried reinstalling numpy, and also a few different versions, but > > without any luck. > > > > numpy was pulled in as dependency of timezonefinder==1.5.7 that i need, > and > > the numpy-1.12.0.dist-info distribution was installed.. > > > > The error on my google-app-engine server's console is as below.. > > Can someone pls help? > > Are you using the app engine "standard environment"? That's a very > weird Python environment that forbids the installation of all packages > that contain C code. This obviously includes numpy, and would explain > your error. They do provide a pre-installed super-ancient version of > numpy with some features removed, which might work for you if you > force-uninstall numpy. Otherwise you might need to switch to the > "flexible environment". > > -n > > -- > Nathaniel J. 
Smith -- https://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From morph at debian.org Wed Feb 15 21:53:42 2017 From: morph at debian.org (Sandro Tosi) Date: Wed, 15 Feb 2017 21:53:42 -0500 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: > A recent post to the wheel-builders mailing list pointed out some > links to places providing free PowerPC hosting for open source > projects, if they agree to a submitted request: The debian project has some powerpc machines (and we still build numpy on those boxes when i upload a new revision to our archives) and they also have hosts dedicated to let debian developers login and debug issues with their packages on that architecture. I can sponsor access to those machines for some of you, but it is not a place where you can host a CI instance. Just keep it in mind more broadly than powerpc, f.e. these are all the archs where numpy was built after the last upload https://buildd.debian.org/status/package.php?p=python-numpy&suite=unstable (the grayed out archs are the ones non release critical, so packages are built as best effort and if missing is not a big deal) -- Sandro "morph" Tosi My website: http://sandrotosi.me/ Me at Debian: http://wiki.debian.org/SandroTosi G+: https://plus.google.com/u/0/+SandroTosi From ralf.gommers at gmail.com Thu Feb 16 03:50:04 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 16 Feb 2017 21:50:04 +1300 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: On Thu, Feb 16, 2017 at 8:58 AM, Matthew Brett wrote: > On Wed, Feb 15, 2017 at 7:55 PM, Ralf Gommers > wrote: > > > > > > On Thu, Feb 16, 2017 at 8:45 AM, Matthew Brett > > wrote: > >> > >> On Wed, Feb 15, 2017 at 7:37 PM, Ralf Gommers > >> wrote: > >> > > >> > > >> > On Thu, Feb 16, 2017 at 8:02 AM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hey, > >> >> > >> >> A recent post to the wheel-builders mailing list pointed out some > >> >> links to places providing free PowerPC hosting for open source > >> >> projects, if they agree to a submitted request: > >> >> > >> >> > >> >> https://mail.python.org/pipermail/wheel-builders/2017- > February/000257.html > >> >> > >> >> It would be good to get some testing going on these architectures. > >> >> Shall we apply for hosting, as the numpy organization? > >> > > >> > > >> > Those are bare VMs it seems. Remembering the Buildbot and Mailman > >> > horrors, I > >> > think we should be very reluctant to taking responsibility for > >> > maintaining > >> > CI on anything that's not hosted and can be controlled with a simple > >> > config > >> > file in our repo. > >> > >> Not sure what you mean about mailman - maybe the Enthought servers we > >> didn't have access to? > > > > > > We did have access (for most of the time), it's just that no one is > > interested in putting in lots of hours on sysadmin duties. > > > >> > >> For buildbot, I've been maintaining about 12 > >> crappy old machines for about 7 years now [1] - I'm happy to do the > >> same job for a couple of properly hosted PPC machines. > > > > > > That's awesome persistence. The NumPy and SciPy buildbots certainly > weren't > > maintained like that, half of them were offline or broken for long > periods > > usually. 
> > Right - they do need persistence, and to have someone who takes > responsibility for them. > > >> > >> At least we'd > >> have some way of testing for these machines, if we get stuck - even if > >> that involved spinning up a VM and installing the stuff we needed from > >> the command line. > > > > > > I do see the value of testing on more platforms of course. It's just > about > > logistics/responsibilities. If you're saying that you'll do the > maintenance, > > and want to apply for resources using the NumPy name, that's much better > I > > think then making "the numpy devs" collectively responsible. > > Yes, exactly. I'm happy to take responsibility for them, I just > wanted to make sure that numpy devs could get at them if I'm not > around for some reason. > In that case, +1 from me! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Thu Feb 16 03:55:36 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 16 Feb 2017 21:55:36 +1300 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: On Thu, Feb 16, 2017 at 3:53 PM, Sandro Tosi wrote: > > A recent post to the wheel-builders mailing list pointed out some > > links to places providing free PowerPC hosting for open source > > projects, if they agree to a submitted request: > > The debian project has some powerpc machines (and we still build numpy > on those boxes when i upload a new revision to our archives) and they > also have hosts dedicated to let debian developers login and debug > issues with their packages on that architecture. I can sponsor access > to those machines for some of you, but it is not a place where you can > host a CI instance. > > Just keep it in mind more broadly than powerpc, f.e. these are all the > archs where numpy was built after the last upload > https://buildd.debian.org/status/package.php?p=python-numpy&suite=unstable > (the grayed out archs are the ones non release critical, so packages > are built as best effort and if missing is not a big deal) Thanks Sandro. It looks like even for the release-critical ones, it's just the build that has to succeed and failures are not detected? For example, armel is green but has 9 failures: https://buildd.debian.org/status/fetch.php?pkg=python-numpy&arch=armel&ver=1%3A1.12.0-2&stamp=1484889563&raw=0 Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Feb 16 12:52:16 2017 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2017 12:52:16 -0500 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: On Thu, Feb 16, 2017 at 3:55 AM, Ralf Gommers wrote: > > > On Thu, Feb 16, 2017 at 3:53 PM, Sandro Tosi wrote: > >> > A recent post to the wheel-builders mailing list pointed out some >> > links to places providing free PowerPC hosting for open source >> > projects, if they agree to a submitted request: >> >> The debian project has some powerpc machines (and we still build numpy >> on those boxes when i upload a new revision to our archives) and they >> also have hosts dedicated to let debian developers login and debug >> issues with their packages on that architecture. I can sponsor access >> to those machines for some of you, but it is not a place where you can >> host a CI instance. >> >> Just keep it in mind more broadly than powerpc, f.e. 
these are all the
>> archs where numpy was built after the last upload
>> https://buildd.debian.org/status/package.php?p=python-numpy&suite=unstable
>> (the grayed out archs are the ones non release critical, so packages
>> are built as best effort and if missing is not a big deal)
>
> Thanks Sandro. It looks like even for the release-critical ones, it's just
> the build that has to succeed and failures are not detected? For example,
> armel is green but has 9 failures:
> https://buildd.debian.org/status/fetch.php?pkg=python-numpy&arch=armel&ver=1%3A1.12.0-2&stamp=1484889563&raw=0
>
> Ralf
>

More general questions on this:

Are there any overviews of which packages in the python-for-science or
python-for-data-analysis areas work correctly on different platforms?
Are there any platforms/processors, besides the standard x32/x64, where
this is important?

For example, for statsmodels: in early releases of statsmodels, maybe 5
to 7 years ago, Yarik and I were still debugging problems on several
machines like ppc and s390x during Debian testing. Since then I haven't
heard much about specific problems.

The current status for statsmodels on Debian machines is pretty mixed.
On several of them some dependencies are not available, and in some
cases we have errors that might be caused by errors in dependencies,
e.g. cvxopt. The ppc64el test run for statsmodels has a large number of
failures, but checking scipy, it looks like it's also not working
properly:
https://buildd.debian.org/status/fetch.php?pkg=python-scipy&arch=ppc64el&ver=0.18.1-2&stamp=1477075663&raw=0
In those cases it would be impossible to start debugging if we had to
debug through the entire dependency chain.

CI testing for Windows, Apple and Linux, mainly on x64, seems to be
working pretty well, with some delays while version incompatibilities
are fixed. But anything that is not in a CI testing setup looks pretty
random to me.

(I'm mainly curious what the status of those machines is. I'm not
really eager to create more debugging work, but sometimes failures on a
machine point to code that is "fragile".)

Josef


>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From morph at debian.org  Thu Feb 16 13:52:50 2017
From: morph at debian.org (Sandro Tosi)
Date: Thu, 16 Feb 2017 13:52:50 -0500
Subject: [Numpy-discussion] PowerPC testing servers
In-Reply-To: 
References: 
Message-ID: 

On Thu, Feb 16, 2017 at 3:55 AM, Ralf Gommers  wrote:
> Thanks Sandro. It looks like even for the release-critical ones, it's just
> the build that has to succeed and failures are not detected? For example,
> armel is green but has 9 failures:
> https://buildd.debian.org/status/fetch.php?pkg=python-numpy&arch=armel&ver=1%3A1.12.0-2&stamp=1484889563&raw=0

I made errors in the test suite non-fatal so that we could collect
the errors and then report them back.
Sadly, I'm currently lacking the time to report all the errors on those
archs; I will try to get to that soon.

--
Sandro "morph" Tosi
My website: http://sandrotosi.me/
Me at Debian: http://wiki.debian.org/SandroTosi
G+: https://plus.google.com/u/0/+SandroTosi

From robbmcleod at gmail.com  Fri Feb 17 06:15:05 2017
From: robbmcleod at gmail.com (Robert McLeod)
Date: Fri, 17 Feb 2017 12:15:05 +0100
Subject: [Numpy-discussion] ANN: NumExpr3 Alpha
Message-ID: 

Hi everyone,

I'm pleased to announce that a new branch of NumExpr has been developed
that will hopefully lead to a new major version release in the future.
You can find the branch on the PyData GitHub repository, and
installation is as follows:

    git clone https://github.com/pydata/numexpr.git
    cd numexpr
    git checkout numexpr-3.0
    python setup.py install

What's new?
==========

Faster
---------

The operations were re-written in such a way that gcc can auto-vectorize
the loops to use SIMD instructions. Each operation now has a strided and
an aligned branch, which improves performance on aligned arrays by
~ 40 %. The setup time for threads has been reduced by removing an
unnecessary abstraction layer, and various other minor refactorings have
improved thread scaling.

The combination of speed-ups means that NumExpr3 often runs 200-500 %
faster than NumExpr2.6 on a machine with AVX2 support. The break-even
point with NumPy is now roughly 64k-element arrays, compared to
256-512k elements for NE2.

Plots of comparative performance for NumPy versus NE2 versus NE3 over a
range of array sizes are available at:

http://entropyproduction.blogspot.ch/2017/02/introduction-to-numexpr-3-alpha.html

More NumPy Datatypes
--------------------------------

The program was refactored from an ASCII-encoded byte code to a struct
array, so that the operation space is now 65535 instead of 128. As such,
support for uint8, int8, uint16, int16, uint32, uint64, and complex64
data types was added.

NumExpr3 now uses NumPy 'safe' casting rules. If an operation doesn't
return the same result as NumPy, it's a bug. In the future other casting
styles will be added if there is a demand for them.


More complete function set
------------------------------------

With the enhanced operation space, almost the entire C++11 cmath
function set is supported (where the compiler library has them; only C99
is expected). Bitwise operations were also added for all integer
datatypes. There are now 436 operations/functions in NE3, with more to
come, compared to 190 in NE2.

A library enum has also been added to the op keys, which allows multiple
backend libraries to be linked to the interpreter and changed on a
per-expression basis, rather than picking between GNU std and Intel VML
at compile time, for example.


More complete Python language support
------------------------------------------------------

The Python compiler was re-written from scratch to use the CPython `ast`
module and a functional programming approach. As such, NE3 now compiles
a wider subset of the Python language. It supports multi-line evaluation
and assignment with named temporaries. The new compiler spends
considerably less time in Python to compile expressions, about 200 us
for 'a*b' compared to 550 us for NE2.
Compare for example:

    out_ne2 = ne2.evaluate( 'exp( -sin(2*a**2) - cos(2*b**2) - 2*a**2*b**2 )' )

to:

    neObj = NumExpr( '''a2 = a*a; b2 = b*b
    out_magic = exp( -sin(2*a2) - cos(2*b2) - 2*a2*b2 )''' )

This is a contrived example, but the multi-line approach will allow for
cleaner code and more sophisticated algorithms to be encapsulated in a
single NumExpr call. The convention is that intermediate assignment
targets are named temporaries if they do not exist in the calling frame,
and full assignment targets if they do, which provides a method for
multiple returns. Single-level de-referencing (e.g. `self.data`) is also
supported for increased convenience and cleaner code. Slicing still
needs to be performed above the ne3.evaluate() or ne3.NumExpr() call.


More maintainable
-------------------------

The code base was generally refactored to increase the prevalence of
single-point declarations, such that modifications don't require
extensive knowledge of the code. In NE2 a lot of code was generated by
the pre-processor using nested #defines. That has been replaced by an
object-oriented Python code generator called by setup.py, which
generates about 15k lines of C code with 1k lines of Python. The use of
generated code with defined line numbers makes debugging threaded code
simpler.

The generator also builds the autotest portion of the test submodule,
for checking equivalence between NumPy and NumExpr3 operations and
functions.


What's TODO compared to NE2?
------------------------------------------

* strided complex functions
* Intel VML support (less necessary now with gcc auto-vectorization)
* bytes and unicode support
* reductions (mean, sum, prod, std)


What I'm looking for feedback on
--------------------------------------------

* String arrays: How do you use them? How would unicode differ from
  bytes strings?
* Interface: We now have a more object-oriented interface underneath the
  familiar evaluate() interface. How would you like to use this
  interface? Francesc suggested generator support, as currently it's
  more difficult to use NumExpr within a loop than it should be.


Ideas for the future
-------------------------

* vectorize real functions (such as exp, sqrt, log) similar to the
  complex_functions.hpp vectorization.
* Add a keyword (likely 'yield') to indicate that a token is intended to
  be changed by a generator inside a loop with each call to NumExpr.run()

If you have any thoughts or find any issues, please don't hesitate to
open an issue at the GitHub repo. Although unit tests have been run over
the operation space, there are undoubtedly a number of bugs to squash.

Sincerely,

Robert

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcleod at unibas.ch
robert.mcleod at bsse.ethz.ch
robbmcleod at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From faltet at gmail.com  Fri Feb 17 07:13:00 2017
From: faltet at gmail.com (Francesc Alted)
Date: Fri, 17 Feb 2017 13:13:00 +0100
Subject: [Numpy-discussion] ANN: NumExpr3 Alpha
In-Reply-To: 
References: 
Message-ID: 

Yay! This looks really exciting. Thanks for all the hard work!

Francesc

2017-02-17 12:15 GMT+01:00 Robert McLeod :

> Hi everyone,
>
> I'm pleased to announce that a new branch of NumExpr has been developed
> that will hopefully lead to a new major version release in the future.
You > can find the branch on the PyData github repository, and installation is as > follows: > > git clone https://github.com/pydata/numexpr.git > cd numexpr > git checkout numexpr-3.0 > python setup.py install > > What's new? > ========== > > Faster > --------- > > The operations were re-written in such a way that gcc can auto-vectorize > the loops to use SIMD instructions. Each operation now has a strided and > aligned branch, which improves performance on aligned arrays by ~ 40 %. The > setup time for threads has been reduced, by removing an unnecessary > abstraction layer, and various other minor re-factorizations, resulting in > improved thread scaling. > > The combination of speed-ups means that NumExpr3 often runs 200-500 % > faster than NumExpr2.6 on a machine with AVX2 support. The break-even point > with NumPy is now roughly arrays with 64k-elements, compared to > 256-512k-elements for NE2. > > Plot of comparative performance for NumPy versus NE2 versus NE3 over a > range of array sizes are available at: > > http://entropyproduction.blogspot.ch/2017/02/introduction- > to-numexpr-3-alpha.html > > More NumPy Datatypes > -------------------------------- > > The program was re-factorized from a ascii-encoded byte code to a struct > array, so that the operation space is now 65535 instead of 128. As such, > support for uint8, int8, uint16, int16, uint32, uint64, and complex64 data > types was added. > > NumExpr3 now uses NumPy 'safe' casting rules. If an operation doesn't > return the same result as NumPy, it's a bug. In the future other casting > styles will be added if there is a demand for them. > > > More complete function set > ------------------------------------ > > With the enhanced operation space, almost the entire C++11 cmath function > set is supported (if the compiler library has them; only C99 is expected). > Also bitwise operations were added for all integer datatypes. There are now > 436 operations/functions in NE3, with more to come, compared to 190 in NE2. > > Also a library-enum has been added to the op keys which allows multiple > backend libraries to be linked to the interpreter, and changed on a > per-expression basis, rather than picking between GNU std and Intel VML at > compile time, for example. > > > More complete Python language support > ------------------------------------------------------ > > The Python compiler was re-written from scratch to use the CPython `ast` > module and a functional programming approach. As such, NE3 now compiles a > wider subset of the Python language. It supports multi-line evaluation, and > assignment with named temporaries. The new compiler spends considerably > less time in Python to compile expressions, about 200 us for 'a*b' compared > to 550 us for NE2. > > Compare for example: > > out_ne2 = ne2.evaluate( 'exp( -sin(2*a**2) - cos(2*b**2) - > 2*a**2*b**2' ) > > to: > > neObj = NumExp( '''a2 = a*a; b2 = b*b > out_magic = exp( -sin(2*a2) - cos(2*b2) - 2*a2*b2''' ) > > This is a contrived example but the multi-line approach will allow for > cleaner code and more sophisticated algorithms to be encapsulated in a > single NumExpr call. The convention is that intermediate assignment targets > are named temporaries if they do not exist in the calling frame, and full > assignment targets if they do, which provides a method for multiple > returns. Single-level de-referencing (e.g. `self.data`) is also supported > for increased convenience and cleaner code. 
Slicing still needs to be > performed above the ne3.evaluate() or ne3.NumExpr() call. > > > More maintainable > ------------------------- > > The code base was generally refactored to increase the prevalence of > single-point declarations, such that modifications don't require extensive > knowledge of the code. In NE2 a lot of code was generated by the > pre-processor using nested #defines. That has been replaced by a > object-oriented Python code generator called by setup.py, which generates > about 15k lines of C code with 1k lines of Python. The use of generated > code with defined line numbers makes debugging threaded code simpler. > > The generator also builds the autotest portion of the test submodule, for > checking equivalence between NumPy and NumExpr3 operations and functions. > > > What's TODO compared to NE2? > ------------------------------------------ > > * strided complex functions > * Intel VML support (less necessary now with gcc auto-vectorization) > * bytes and unicode support > * reductions (mean, sum, prod, std) > > > What I'm looking for feedback on > -------------------------------------------- > > * String arrays: How do you use them? How would unicode differ from bytes > strings? > * Interface: We now have a more object-oriented interface underneath the > familiar > evaluate() interface. How would you like to use this interface? > Francesc suggested > generator support, as currently it's more difficult to use NumExpr > within a loop than > it should be. > > > Ideas for the future > ------------------------- > > * vectorize real functions (such as exp, sqrt, log) similar to the > complex_functions.hpp vectorization. > * Add a keyword (likely 'yield') to indicate that a token is intended to > be changed by a generator inside a loop with each call to NumExpr.run() > > If you have any thoughts or find any issues please don't hesitate to open > an issue at the Github repo. Although unit tests have been run over the > operation space there are undoubtedly a number of bugs to squash. > > Sincerely, > > Robert > > -- > Robert McLeod, Ph.D. > Center for Cellular Imaging and Nano Analytics (C-CINA) > Biozentrum der Universit?t Basel > Mattenstrasse 26, 4058 Basel > Work: +41.061.387.3225 <061%20387%2032%2025> > robert.mcleod at unibas.ch > robert.mcleod at bsse.ethz.ch > robbmcleod at gmail.com > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Fri Feb 17 10:34:11 2017 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Fri, 17 Feb 2017 16:34:11 +0100 Subject: [Numpy-discussion] ANN: NumExpr3 Alpha In-Reply-To: References: Message-ID: This is very nice indeed! On 17 February 2017 at 12:15, Robert McLeod wrote: > * bytes and unicode support > * reductions (mean, sum, prod, std) I use both a lot, maybe I can help you get them working. Also, regarding "Vectorization hasn't been done yet with cmath functions for real numbers (such as sqrt(), exp(), etc.), only for complex functions". What is the bottleneck? Is it in GCC or just someone has to sit down and adapt it? 
From pierre.schnizer at helmholtz-berlin.de Fri Feb 17 10:54:56 2017 From: pierre.schnizer at helmholtz-berlin.de (Schnizer, Pierre) Date: Fri, 17 Feb 2017 15:54:56 +0000 Subject: [Numpy-discussion] Building external c modules with mingw64 / numpy In-Reply-To: References: <243DBD016692E54EB12F37B87C66E70E815DB8@didag1> Message-ID: <243DBD016692E54EB12F37B87C66E70E819AB4@didag1> Dear Ralf, I made some further improvements as one problem was related to my setup file. I will use numpy git repository to cross check it and then report again. Sincerely yours Pierre Von: NumPy-Discussion [mailto:numpy-discussion-bounces at scipy.org] Im Auftrag von Ralf Gommers Gesendet: Dienstag, 14. Februar 2017 11:00 An: Discussion of Numerical Python Betreff: Re: [Numpy-discussion] Building external c modules with mingw64 / numpy On Sat, Jan 21, 2017 at 9:23 PM, Schnizer, Pierre > wrote: Dear all, I built an external c-module (pygsl) using mingw 64 from msys2 mingw64-gcc compiler. This built required some changes to numpy.distutils to get the ?python setup.py config? and ?python setup.py build? working. In this process I replaced 2 files in numpy.distutils from numpy git repository: - numpy.dist_utils.misc_utils.py version ec0e046 on 14 Dec 2016 - numpy.dist_utils. mingw32ccompiler.py version ec0e046 on 14 Dec 2016 mingw32ccompiler.py required to be modified to get it work ? preprocessor had to be defined as I am using setup.py config ? specifying the runtime library search path to the linker ? include path of the vcrtruntime I attached a patch reflecting the changes I had to make to file mingw32ccompile.py If this information is useful I am happy to answer questions Thanks for the patch Pierre. For future reference: a pull request on GitHub or a link to a Gist is preferred for us and usually gets you a response quicker. Regarding your question in the patch on including Python's install directory: that shouldn't be necessary, and I'd be wary of applying your patch without understanding why the current numpy.distutils code doesn't work for you. But if your patch works for you then it can't hurt I think. Cheers, Ralf Sincerely yours Pierre PS Version infos: Python: Python 3.6.0 (v3.6.0:41df79263a11, Dec 23 2016, 08:06:12) [MSC v.1900 64 bit (AMD64)] on win32 Numpy: >> help(numpy.version) Help on module numpy.version in numpy: DATA full_version = '1.12.0' git_revision = '561f1accf861ad8606ea2dd723d2be2b09a2dffa' release = True short_version = '1.12.0' version = '1.12.0' gcc.exe (Rev2, Built by MSYS2 project) 6.2.0 ________________________________ Helmholtz-Zentrum Berlin f?r Materialien und Energie GmbH Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher Gesch?ftsf?hrung: Prof. Dr. Anke Rita Kaysser-Pyzalla, Thomas Frederking Sitz Berlin, AG Charlottenburg, 89 HRB 5583 Postadresse: Hahn-Meitner-Platz 1 D-14109 Berlin http://www.helmholtz-berlin.de _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion ________________________________ Helmholtz-Zentrum Berlin f?r Materialien und Energie GmbH Mitglied der Hermann von Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. Aufsichtsrat: Vorsitzender Dr. Karl Eugen Huthmacher, stv. Vorsitzende Dr. Jutta Koch-Unterseher Gesch?ftsf?hrung: Prof. Dr. 
Anke Rita Kaysser-Pyzalla, Thomas Frederking Sitz Berlin, AG Charlottenburg, 89 HRB 5583 Postadresse: Hahn-Meitner-Platz 1 D-14109 Berlin http://www.helmholtz-berlin.de -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbmcleod at gmail.com Fri Feb 17 11:42:09 2017 From: robbmcleod at gmail.com (Robert McLeod) Date: Fri, 17 Feb 2017 17:42:09 +0100 Subject: [Numpy-discussion] ANN: NumExpr3 Alpha In-Reply-To: References: Message-ID: Hi David, Thanks for your comments, reply below the fold. On Fri, Feb 17, 2017 at 4:34 PM, Da?id wrote: > This is very nice indeed! > > On 17 February 2017 at 12:15, Robert McLeod wrote: > > * bytes and unicode support > > * reductions (mean, sum, prod, std) > > I use both a lot, maybe I can help you get them working. > > Also, regarding "Vectorization hasn't been done yet with cmath > functions for real numbers (such as sqrt(), exp(), etc.), only for > complex functions". What is the bottleneck? Is it in GCC or just > someone has to sit down and adapt it? I just haven't done it yet. Basically I'm moving from Switzerland to Canada in a week so this was the gap to push something out that's usable if not perfect. Rather I just import cmath functions, which are inlined but I suspect what's needed is to break them down into their components. For example, the complex arccos function looks like this: static void nc_acos( npy_intp n, npy_complex64 *x, npy_complex64 *r) { npy_complex64 a; for( npy_intp I = 0; I < n; I++ ) { a = x[I]; _inline_mul( x[I], x[I], r[I] ); _inline_sub( Z_1, r[I], r[I] ); _inline_sqrt( r[I], r[I] ); _inline_muli( r[I], r[I] ); _inline_add( a, r[I], r[I] ); _inline_log( r[I] , r[I] ); _inline_muli( r[I], r[I] ); _inline_neg( r[I], r[I]); } } I haven't sat down and inspected whether the cmath versions get vectorized, but there's not a huge speed difference between NE2 and 3 for such a function on float (but their is for complex), so my suspicion is they aren't. Another option would be to add a library such as Yeppp! as LIB_YEPPP or some other library that's faster than glib. For example the glib function "fma(a,b,c)" is slower than doing "a*b+c" in NE3, and that's not how it should be. Yeppp is also built with Python generating C code, so it could either be very easy or very hard. On bytes and unicode, I haven't seen examples for how people use it, so I'm not sure where to start. Since there's practically not a limitation on the number of operations now (the library is 1.3 MB now, compared to 1.2 MB for NE2 with gcc 5.4) the string functions could grow significantly from what we have in NE2. With regards to reductions, NumExpr never multi-threaded them, and could only do outer reductions, so in the end there was no speed advantage to be had compared to having NumPy do them on the result. I suspect the primary value there was in PyTables and Pandas where the expression had to do everything. One of the things I've moved away from in NE3 is doing output buffering (rather it pre-allocates the output array), so for reductions the understanding NumExpr has of broadcasting would have to be deeper. In any event contributions would certainly be welcome. Robert -- Robert McLeod, Ph.D. Center for Cellular Imaging and Nano Analytics (C-CINA) Biozentrum der Universit?t Basel Mattenstrasse 26, 4058 Basel Work: +41.061.387.3225 <061%20387%2032%2025> robert.mcleod at unibas.ch robert.mcleod at bsse.ethz.ch robbmcleod at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sat Feb 18 13:07:08 2017 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2017 11:07:08 -0700 Subject: [Numpy-discussion] Eric Wieser added to NumPy team. Message-ID: Hi All, I'm pleased to welcome Eric to the NumPy team. There is a pile of pending PRs that grows every day and we are counting on Eric will help us keep it in check ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jni.soma at gmail.com Sat Feb 18 22:19:19 2017 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Sun, 19 Feb 2017 14:19:19 +1100 Subject: [Numpy-discussion] ANN: NumExpr3 Alpha In-Reply-To: References: Message-ID: <7b857f99-49fb-452c-979b-10495b00faf4@Spark> Hi everyone, Thanks for this. It looks absolutely fantastic. I've been putting off using numexpr but it looks like I don't have a choice anymore. ;) Regarding feature requests, I've always found it off putting that I have to wrap my expressions in a string to speed them up. Has anyone explored the possibility of using Python 3.6's frame evaluation API to do this? I remember a vague discussion on this list a while back but I don't know whether anything came of it. Thanks! Juan. On 18 Feb 2017, 3:42 AM +1100, Robert McLeod , wrote: > Hi David, > > Thanks for your comments, reply below the fold. > > > On Fri, Feb 17, 2017 at 4:34 PM, Da?id wrote: > > > This is very nice indeed! > > > > > > On 17 February 2017 at 12:15, Robert McLeod wrote: > > > > * bytes and unicode support > > > > * reductions (mean, sum, prod, std) > > > > > > I use both a lot, maybe I can help you get them working. > > > > > > Also, regarding "Vectorization hasn't been done yet with cmath > > > functions for real numbers (such as sqrt(), exp(), etc.), only for > > > complex functions". What is the bottleneck? Is it in GCC or just > > > someone has to sit down and adapt it? > > > > I just haven't done it yet.? Basically I'm moving from Switzerland to Canada in a week so this was the gap to push something out that's usable if not perfect. Rather I just import cmath functions, which are inlined but I suspect what's needed is to break them down into their components. For example, the complex arccos function looks like this: > > > > static void > > nc_acos( npy_intp n, npy_complex64 *x, npy_complex64 *r) > > { > > ? ? npy_complex64 a; > > ? ? for( npy_intp I = 0; I < n; I++ ) { > > ? ? ? ? a = x[I]; > > ? ? ? ? _inline_mul( x[I], x[I], r[I] ); > > ? ? ? ? _inline_sub( Z_1, r[I], r[I] ); > > ? ? ? ? _inline_sqrt( r[I], r[I] ); > > ? ? ? ? _inline_muli( r[I], r[I] ); > > ? ? ? ? _inline_add( a, r[I], r[I] ); > > ? ? ? ? _inline_log( r[I] , r[I] ); > > ? ? ? ? _inline_muli( r[I], r[I] ); > > ? ? ? ? _inline_neg( r[I], r[I]); > > ? ? } > > } > > > > I haven't sat down and inspected whether the cmath versions get vectorized, but there's not a huge speed difference between NE2 and 3 for such a function on float (but their is for complex), so my suspicion is they aren't.? Another option would be to add a library such as Yeppp! as LIB_YEPPP or some other library that's faster than glib.? For example the glib function "fma(a,b,c)" is slower than doing "a*b+c" in NE3, and that's not how it should be.? Yeppp is also built with Python generating C code, so it could either be very easy or very hard. > > > > On bytes and unicode, I haven't seen examples for how people use it, so I'm not sure where to start. 
Since there's practically not a limitation on the number of operations now (the library is 1.3 MB now, compared to 1.2 MB for NE2 with gcc 5.4) the string functions could grow significantly from what we have in NE2. > > > > With regards to reductions, NumExpr never multi-threaded them, and could only do outer reductions, so in the end there was no speed advantage to be had compared to having NumPy do them on the result.? I suspect the primary value there was in PyTables and Pandas where the expression had to do everything.? One of the things I've moved away from in NE3 is doing output buffering (rather it pre-allocates the output array), so for reductions the understanding NumExpr has of broadcasting would have to be deeper. > > > > In any event contributions would certainly be welcome. > > > > Robert > > > -- > Robert McLeod, Ph.D. > Center for Cellular Imaging and Nano Analytics (C-CINA) > Biozentrum der Universit?t Basel > Mattenstrasse 26, 4058 Basel > Work: +41.061.387.3225 > robert.mcleod at unibas.ch > robert.mcleod at bsse.ethz.ch > robbmcleod at gmail.com > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Feb 19 05:00:47 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 19 Feb 2017 23:00:47 +1300 Subject: [Numpy-discussion] Eric Wieser added to NumPy team. In-Reply-To: References: Message-ID: On Sun, Feb 19, 2017 at 7:07 AM, Charles R Harris wrote: > Hi All, > > I'm pleased to welcome Eric to the NumPy team. There is a pile of pending > PRs that grows every day and we are counting on Eric will help us keep it > in check ;) > Welcome to the team Eric! Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbmcleod at gmail.com Sun Feb 19 08:41:18 2017 From: robbmcleod at gmail.com (Robert McLeod) Date: Sun, 19 Feb 2017 14:41:18 +0100 Subject: [Numpy-discussion] ANN: NumExpr3 Alpha In-Reply-To: <7b857f99-49fb-452c-979b-10495b00faf4@Spark> References: <7b857f99-49fb-452c-979b-10495b00faf4@Spark> Message-ID: Hi Juan, A guy on reddit suggested looking at SymPy for just such a thing. I know that Dask also represents its process as a graph. https://www.reddit.com/r/Python/comments/5um04m/numexpr3/ I'll think about it some more but it seems a little abstract still. To a certain extent the NE3 compiler already works this way. The compiler has a dictionary in which keys are `ast.Node` types, and each value is a function pointer, which knows how to handle that particular node. Providing an external interface to this would be the most natural extension. There's quite a few things to do before I would think about a functional interface. The things I mentioned in my original mail; pickling of the C-object so that it can be using within modules like `multiprocessing`; having a pre-allocated shared memory region shared among threads for temporaries and parameters, etc. If someone else wants to dabble in it they are welcome to. Robert On Sun, Feb 19, 2017 at 4:19 AM, Juan Nunez-Iglesias wrote: > Hi everyone, > > Thanks for this. It looks absolutely fantastic. I've been putting off > using numexpr but it looks like I don't have a choice anymore. ;) > > Regarding feature requests, I've always found it off putting that I have > to wrap my expressions in a string to speed them up. 
Has anyone explored > the possibility of using Python 3.6's frame evaluation API to do this? I > remember a vague discussion on this list a while back but I don't know > whether anything came of it. > > Thanks! > > Juan. > > On 18 Feb 2017, 3:42 AM +1100, Robert McLeod , > wrote: > > Hi David, > > Thanks for your comments, reply below the fold. > > On Fri, Feb 17, 2017 at 4:34 PM, Da?id wrote: > >> This is very nice indeed! >> >> On 17 February 2017 at 12:15, Robert McLeod wrote: >> > * bytes and unicode support >> > * reductions (mean, sum, prod, std) >> >> I use both a lot, maybe I can help you get them working. >> >> Also, regarding "Vectorization hasn't been done yet with cmath >> functions for real numbers (such as sqrt(), exp(), etc.), only for >> complex functions". What is the bottleneck? Is it in GCC or just >> someone has to sit down and adapt it? > > > I just haven't done it yet. Basically I'm moving from Switzerland to > Canada in a week so this was the gap to push something out that's usable if > not perfect. Rather I just import cmath functions, which are inlined but I > suspect what's needed is to break them down into their components. For > example, the complex arccos function looks like this: > > static void > nc_acos( npy_intp n, npy_complex64 *x, npy_complex64 *r) > { > npy_complex64 a; > for( npy_intp I = 0; I < n; I++ ) { > a = x[I]; > _inline_mul( x[I], x[I], r[I] ); > _inline_sub( Z_1, r[I], r[I] ); > _inline_sqrt( r[I], r[I] ); > _inline_muli( r[I], r[I] ); > _inline_add( a, r[I], r[I] ); > _inline_log( r[I] , r[I] ); > _inline_muli( r[I], r[I] ); > _inline_neg( r[I], r[I]); > } > } > > I haven't sat down and inspected whether the cmath versions get > vectorized, but there's not a huge speed difference between NE2 and 3 for > such a function on float (but their is for complex), so my suspicion is > they aren't. Another option would be to add a library such as Yeppp! as > LIB_YEPPP or some other library that's faster than glib. For example the > glib function "fma(a,b,c)" is slower than doing "a*b+c" in NE3, and that's > not how it should be. Yeppp is also built with Python generating C code, > so it could either be very easy or very hard. > > On bytes and unicode, I haven't seen examples for how people use it, so > I'm not sure where to start. Since there's practically not a limitation on > the number of operations now (the library is 1.3 MB now, compared to 1.2 MB > for NE2 with gcc 5.4) the string functions could grow significantly from > what we have in NE2. > > With regards to reductions, NumExpr never multi-threaded them, and could > only do outer reductions, so in the end there was no speed advantage to be > had compared to having NumPy do them on the result. I suspect the primary > value there was in PyTables and Pandas where the expression had to do > everything. One of the things I've moved away from in NE3 is doing output > buffering (rather it pre-allocates the output array), so for reductions the > understanding NumExpr has of broadcasting would have to be deeper. > > In any event contributions would certainly be welcome. > > Robert > > -- > Robert McLeod, Ph.D. 
> Center for Cellular Imaging and Nano Analytics (C-CINA)
> Biozentrum der Universität Basel
> Mattenstrasse 26, 4058 Basel
> Work: +41.061.387.3225
> robert.mcleod at unibas.ch
> robert.mcleod at bsse.ethz.ch
> robbmcleod at gmail.com
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcleod at unibas.ch
robert.mcleod at bsse.ethz.ch
robbmcleod at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From m.h.vankerkwijk at gmail.com  Sun Feb 19 12:21:38 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 19 Feb 2017 12:21:38 -0500
Subject: [Numpy-discussion] ANN: NumExpr3 Alpha
In-Reply-To: 
References: <7b857f99-49fb-452c-979b-10495b00faf4@Spark>
Message-ID: 

Hi All,

Just a side note that at a smaller scale some of the benefits of
numexpr are coming to numpy: Julian Taylor has been working on
identifying temporary arrays in
https://github.com/numpy/numpy/pull/7997. Julian also commented
(https://github.com/numpy/numpy/pull/7997#issuecomment-246118772) that
with PEP 523 in python 3.6, this should indeed become a lot easier.

All the best,

Marten

From ashwin.pathak at students.iiit.ac.in  Mon Feb 20 13:32:35 2017
From: ashwin.pathak at students.iiit.ac.in (ashwin.pathak)
Date: Tue, 21 Feb 2017 00:02:35 +0530
Subject: [Numpy-discussion] Numpy Development Queries
Message-ID: <8cbf1c9b6c561724e2018d77faadae41@students.iiit.ac.in>

Hello all,
I am new to this organization and wanted to start with some easy-fix
issues to get some knowledge about the source code. However, the issues
under the easy-fix label have already been solved or someone is already
working on them. Can someone help me find such issues?

From ralf.gommers at gmail.com  Tue Feb 21 00:01:55 2017
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 21 Feb 2017 18:01:55 +1300
Subject: [Numpy-discussion] Numpy Development Queries
In-Reply-To: <8cbf1c9b6c561724e2018d77faadae41@students.iiit.ac.in>
References: <8cbf1c9b6c561724e2018d77faadae41@students.iiit.ac.in>
Message-ID: 

On Tue, Feb 21, 2017 at 7:32 AM, ashwin.pathak <
ashwin.pathak at students.iiit.ac.in> wrote:

> Hello all,
> I am new to this organization and wanted to start with some easy-fix
> issues to get some knowledge about the source code. However, the issues
> under the easy-fix label have already been solved or someone is already
> working on them. Can someone help me find such issues?
>

Hi Ashwin, welcome. I don't want to seem discouraging, but I do want to
explain that NumPy is significantly harder to get started on than SciPy
(which you've started on already) as a newcomer to the scientific Python
ecosystem. So I'd encourage you to spend some more time on the SciPy
issues - there are more easy-fix ones there, and the process of
contributing (pull requests, reviews, finding your way around the
codebase) is similar for the two projects.

Cheers,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From faltet at gmail.com  Tue Feb 21 04:10:04 2017
From: faltet at gmail.com (Francesc Alted)
Date: Tue, 21 Feb 2017 10:10:04 +0100
Subject: [Numpy-discussion] ANN: NumExpr3 Alpha
In-Reply-To: 
References: <7b857f99-49fb-452c-979b-10495b00faf4@Spark>
Message-ID: 

Yes, Julian is doing amazing work on getting rid of temporaries inside
NumPy. However, NumExpr still has the advantage of using multi-threading
right out of the box, as well as integration with Intel VML. Hopefully
these features will eventually arrive in NumPy, but meanwhile there is
still value in pushing NumExpr.

Francesc

2017-02-19 18:21 GMT+01:00 Marten van Kerkwijk :

> Hi All,
>
> Just a side note that at a smaller scale some of the benefits of
> numexpr are coming to numpy: Julian Taylor has been working on
> identifying temporary arrays in
> https://github.com/numpy/numpy/pull/7997. Julian also commented
> (https://github.com/numpy/numpy/pull/7997#issuecomment-246118772) that
> with PEP 523 in python 3.6, this should indeed become a lot easier.
>
> All the best,
>
> Marten
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Francesc Alted
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From m.h.vankerkwijk at gmail.com  Tue Feb 21 13:52:05 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 21 Feb 2017 13:52:05 -0500
Subject: [Numpy-discussion] Could we simplify backporting?
Message-ID: 

Hi All,

In gh-8594, a question came up about how to mark things that should be
backported, and Chuck commented [1]:

> Our backport policy is still somewhat ad hoc, especially as I'm the only
one who has been doing releases. What I currently do is set the milestone
to the earlier version, so I will find the PR when looking for backports,
then do a backport, label it as such, set the milestone on the backported
version, and remove the milestone from the original. I'm not completely
happy with the process, so if you have better ideas I'd like to hear them.
One option I've considered is a `backported` label in addition to the
`backport` label, then use the latter for things to be backported.

It seems that continuing to set the milestone to a bug-release version
if required was a good idea; it is little effort for anyone and keeps
things clear. For the rest, might it be possible to make things more
automated? E.g., might it be possible to have some travis magic that
does a trial merge & test? Could one somehow merge to multiple
branches at the same time? I have no idea myself really how these
things work, but maybe some of you do!
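For concreteness, one rough sketch of such a trial-backport check (purely
hypothetical; no such NumPy or Travis setup exists, and the branch name
is made up):

```
# hypothetical CI helper: does this commit cherry-pick cleanly onto the
# maintenance branch? (assumes it runs inside a clean git checkout)
import subprocess

def applies_cleanly(commit, branch="maintenance/1.12.x"):
    subprocess.check_call(["git", "checkout", branch])
    # attempt the cherry-pick without creating a commit
    result = subprocess.run(["git", "cherry-pick", "--no-commit", commit])
    # clean up any sequencer state and staged changes before reporting;
    # errors from --quit when nothing is in progress are ignored
    subprocess.call(["git", "cherry-pick", "--quit"])
    subprocess.call(["git", "reset", "--hard"])
    return result.returncode == 0
```

An allowed-to-fail job could run something like this for PRs marked for
backport, and flag the ones that will need manual conflict resolution.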
All the best, Marten From matthew.brett at gmail.com Tue Feb 21 14:41:32 2017 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 21 Feb 2017 11:41:32 -0800 Subject: [Numpy-discussion] PowerPC testing servers In-Reply-To: References: Message-ID: Hi, On Thu, Feb 16, 2017 at 12:50 AM, Ralf Gommers wrote: > > > On Thu, Feb 16, 2017 at 8:58 AM, Matthew Brett > wrote: >> >> On Wed, Feb 15, 2017 at 7:55 PM, Ralf Gommers >> wrote: >> > >> > >> > On Thu, Feb 16, 2017 at 8:45 AM, Matthew Brett >> > wrote: >> >> >> >> On Wed, Feb 15, 2017 at 7:37 PM, Ralf Gommers >> >> wrote: >> >> > >> >> > >> >> > On Thu, Feb 16, 2017 at 8:02 AM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hey, >> >> >> >> >> >> A recent post to the wheel-builders mailing list pointed out some >> >> >> links to places providing free PowerPC hosting for open source >> >> >> projects, if they agree to a submitted request: >> >> >> >> >> >> >> >> >> >> >> >> https://mail.python.org/pipermail/wheel-builders/2017-February/000257.html >> >> >> >> >> >> It would be good to get some testing going on these architectures. >> >> >> Shall we apply for hosting, as the numpy organization? >> >> > >> >> > >> >> > Those are bare VMs it seems. Remembering the Buildbot and Mailman >> >> > horrors, I >> >> > think we should be very reluctant to taking responsibility for >> >> > maintaining >> >> > CI on anything that's not hosted and can be controlled with a simple >> >> > config >> >> > file in our repo. >> >> >> >> Not sure what you mean about mailman - maybe the Enthought servers we >> >> didn't have access to? >> > >> > >> > We did have access (for most of the time), it's just that no one is >> > interested in putting in lots of hours on sysadmin duties. >> > >> >> >> >> For buildbot, I've been maintaining about 12 >> >> crappy old machines for about 7 years now [1] - I'm happy to do the >> >> same job for a couple of properly hosted PPC machines. >> > >> > >> > That's awesome persistence. The NumPy and SciPy buildbots certainly >> > weren't >> > maintained like that, half of them were offline or broken for long >> > periods >> > usually. >> >> Right - they do need persistence, and to have someone who takes >> responsibility for them. >> >> >> >> >> At least we'd >> >> have some way of testing for these machines, if we get stuck - even if >> >> that involved spinning up a VM and installing the stuff we needed from >> >> the command line. >> > >> > >> > I do see the value of testing on more platforms of course. It's just >> > about >> > logistics/responsibilities. If you're saying that you'll do the >> > maintenance, >> > and want to apply for resources using the NumPy name, that's much better >> > I >> > think then making "the numpy devs" collectively responsible. >> >> Yes, exactly. I'm happy to take responsibility for them, I just >> wanted to make sure that numpy devs could get at them if I'm not >> around for some reason. > > > In that case, +1 from me! OK - IBM have kindly given me access to a testing machine, via my own SSH public key. Would it make sense to have a Numpy key, with several people having access to the private key and passphrase? Cheers, Matthew From alex.rogozhnikov at yandex.ru Tue Feb 21 18:05:07 2017 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Wed, 22 Feb 2017 02:05:07 +0300 Subject: [Numpy-discussion] Fortran order in recarray. 
Message-ID: 

Hi,
a question about numpy.recarray:
There is an `order` parameter in the constructor
(https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.recarray.html),
but it seems to have no effect:

import numpy

x = numpy.recarray(dtype=[('a', int), ('b', float)], shape=[1000], order='C')
y = numpy.recarray(dtype=[('a', int), ('b', float)], shape=[1000], order='F')

print numpy.array(x.ctypes.get_strides())  # [16]
print numpy.array(y.ctypes.get_strides())  # [16]

is this intended behavior or a bug?

Thanks,
Alex.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Tue Feb 21 18:10:23 2017
From: chris.barker at noaa.gov (Chris Barker)
Date: Tue, 21 Feb 2017 15:10:23 -0800
Subject: [Numpy-discussion] Fortran order in recarray.
In-Reply-To: 
References: 
Message-ID: 

On Tue, Feb 21, 2017 at 3:05 PM, Alex Rogozhnikov <
alex.rogozhnikov at yandex.ru> wrote:

> a question about numpy.recarray:
> There is an `order` parameter in the constructor
> (https://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.recarray.html),
> but it seems to have no effect:
> x = numpy.recarray(dtype=[('a', int), ('b', float)], shape=[1000],
> order='C')
>

you are creating a 1D array here -- there is no difference between Fortran
and C order for a 1D array. For 2D:

In [2]: x = numpy.recarray(dtype=[('a', int), ('b', float)], shape=[10,10],
order='C')

In [3]: x.strides
Out[3]: (160, 16)

In [4]: y = numpy.recarray(dtype=[('a', int), ('b', float)], shape=[10,10],
order='F')

In [5]: y.strides
Out[5]: (16, 160)

note the easier way to get the strides, too :-)

-CHB

--

Christopher Barker, Ph.D.
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Feb 21 19:53:07 2017 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 21 Feb 2017 16:53:07 -0800 Subject: [Numpy-discussion] Fortran order in recarray. In-Reply-To: References: Message-ID: On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" wrote: Ah, got it. Thanks, Chris! I thought recarray can be only one-dimensional (like tables with named columns). Maybe it's better to ask directly what I was looking for: something that works like a table with named columns (but no labelling for rows), and keeps data (of different dtypes) in a column-by-column way (and this is numpy, not pandas). Is there such a magic thing? Well, that's what pandas is for... A dict of arrays? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Feb 22 04:24:29 2017 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 22 Feb 2017 22:24:29 +1300 Subject: [Numpy-discussion] Could we simplify backporting? In-Reply-To: References: Message-ID: On Wed, Feb 22, 2017 at 7:52 AM, Marten van Kerkwijk < m.h.vankerkwijk at gmail.com> wrote: > Hi All, > > In gh-8594, a question came up how to mark things that should be > backported and Chuck commented [1]: > > > Our backport policy is still somewhat ad hoc, especially as I the only > one who has been doing release. What I currently do is set the milestone to > the earlier version, so I will find the PR when looking for backports, then > do a backport, label it as such, set the milestone on the backported > version, and remove the milestone from the original. I'm not completely > happy with the process, so if you have better ideas I'd like to hear them. > One option I've considered is a `backported` label in addition to the > `backport` label, then use the latter for things to be backported. > I really don't like the double work and the large amount of noise coming from backporting every other PR to NumPy very quickly. For SciPy the policy is: - anyone can set the "backport-candidate" label - the release manager backports, usually a bunch in one go - only important fixes get backported (involves some judging, but things like silencing warnings, doc fixes, etc. are not important enough) This works well, and I'd hope that we can make the NumPy approach similar. It seems that continuing to set the milestone to a bug-release version > if required was a good idea; it is little effort to anyone and keeps > things clear. +1 For the rest, might it be possible to make things more > automated? E.g., might it be possible to have some travis magic that > does a trial merge & test? Not sure how you would deal with merge conflicts on cherry-picks in an automatic backport thingy? Could one somehow merge to multiple > branches at the same time? > Don't think so. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.rogozhnikov at yandex.ru Wed Feb 22 06:45:14 2017 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Wed, 22 Feb 2017 14:45:14 +0300 Subject: [Numpy-discussion] Fortran order in recarray. 
In-Reply-To: 
References: 
Message-ID: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>

Hi Nathaniel,

> pandas

yup, the idea was to have minimal pandas.DataFrame-like storage (which I
was using for a long time), but without irritating problems with its row
indexing and some other problems like interaction with matplotlib.

> A dict of arrays?

that's what I've started from and implemented, but at some point I decided
that I'm reinventing the wheel and numpy has something already. In
principle, I can ignore this 'column-oriented' storage requirement, but
potentially it may turn out to be quite slow-ish if the dtype's size is
large.

Suggestions are welcome.

Another strange question:
in general, it is considered that once a numpy.array is created, its shape
is not changed. But if I want to keep the same recarray and change its
dtype and/or shape, is there a way to do this?

Thanks,
Alex.

> On 22 Feb 2017, at 3:53, Nathaniel Smith wrote:
>
> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" wrote:
> Ah, got it. Thanks, Chris!
> I thought recarray can be only one-dimensional (like tables with named
> columns).
>
> Maybe it's better to ask directly what I was looking for:
> something that works like a table with named columns (but no labelling
> for rows), and keeps data (of different dtypes) in a column-by-column way
> (and this is numpy, not pandas).
>
> Is there such a magic thing?
>
> Well, that's what pandas is for...
>
> A dict of arrays?
>
> -n
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From m.h.vankerkwijk at gmail.com  Wed Feb 22 08:49:59 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 22 Feb 2017 08:49:59 -0500
Subject: [Numpy-discussion] Could we simplify backporting?
In-Reply-To: 
References: 
Message-ID: 

Hi Ralf,

Yes, good to think about other policies. For astropy, we make the
decision by labelling with the bug-fix branch (with a policy that it
really should fix a bug), and inserting text in that release's bug-fix
notes (we really should automate that part...). Then, backporting is done
shortly before the bug-fix release and, as far as I can tell (not having
done it myself), outside of github. In rare cases with hard-to-resolve
merge conflicts, the original PR author gets a note asking for help.

As for a travis test: here I was mostly thinking of an allowed-to-fail
test that would at least alert one if backporting was going to be an
issue. I think travis runs again once one merges, correct? If so, on that
merge it could, in principle, do the backport too (if given enough
permission, of course; I'm not sure at all I'd want that, just pointing
out the possibility! E.g., it might trigger on a message in the merge
commit.).

All the best,

Marten

From faltet at gmail.com  Wed Feb 22 09:03:32 2017
From: faltet at gmail.com (Francesc Alted)
Date: Wed, 22 Feb 2017 15:03:32 +0100
Subject: [Numpy-discussion] Fortran order in recarray.
In-Reply-To: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>
References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>
Message-ID: 

Hi Alex,

2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov :

> Hi Nathaniel,
>
> > pandas
>
> yup, the idea was to have minimal pandas.DataFrame-like storage (which I
> was using for a long time), but without irritating problems with its row
> indexing and some other problems like interaction with matplotlib.
> > A dict of arrays?
>
> that's what I've started from and implemented, but at some point I
> decided that I'm reinventing the wheel and numpy has something already.
> In principle, I can ignore this 'column-oriented' storage requirement,
> but potentially it may turn out to be quite slow-ish if the dtype's size
> is large.
>
> Suggestions are welcome.

You may want to try bcolz:

https://github.com/Blosc/bcolz

bcolz is a columnar storage, basically as you require, but data is
compressed by default even when stored in-memory (although you can disable
compression if you want to).

> Another strange question:
> in general, it is considered that once a numpy.array is created, its
> shape is not changed.
> But if I want to keep the same recarray and change its dtype and/or
> shape, is there a way to do this?

You can change shapes of numpy arrays, but that usually involves copies of
the whole container. With bcolz you can change length and add/del columns
without copies. If your containers are large, it is better to inform bcolz
of its final estimated size. See:

http://bcolz.blosc.org/en/latest/opt-tips.html

Francesc

> Thanks,
> Alex.
>
>> On 22 Feb 2017, at 3:53, Nathaniel Smith wrote:
>>
>> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" wrote:
>> Ah, got it. Thanks, Chris!
>> I thought recarray can be only one-dimensional (like tables with named
>> columns).
>>
>> Maybe it's better to ask directly what I was looking for:
>> something that works like a table with named columns (but no labelling
>> for rows), and keeps data (of different dtypes) in a column-by-column
>> way (and this is numpy, not pandas).
>>
>> Is there such a magic thing?
>>
>> Well, that's what pandas is for...
>>
>> A dict of arrays?
>>
>> -n
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion

--
Francesc Alted
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From m.h.vankerkwijk at gmail.com  Wed Feb 22 09:31:54 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 22 Feb 2017 09:31:54 -0500
Subject: [Numpy-discussion] __numpy_ufunc__
In-Reply-To: 
References: 
Message-ID: 

Hi All,

I'd very much like to get `__array_ufunc__` in, and am willing to do some
work, but fear we need to get past the last sticking point. As I noted in
Chuck's PR [1], in python 3.6 there is now an explicit language change
[2], which I think is relevant:
```
It is now possible to set a special method to None to indicate that the
corresponding operation is not available. For example, if a class sets
__iter__() to None, the class is not iterable.
```
It seems to me entirely logical (but then it would, I suggested it
before...) that we allow opting out by setting `__array_ufunc__` to None;
in that case, binops return NotImplemented and ufuncs raise errors. (In
addition, or alternatively, one could allow setting `__array__` to None,
which would generally prevent something from being turned into an array
object.)

But I should note that I much prefer to get something in over waiting yet
another round! In astropy, there is now more and more clamouring to offer
options for pure ndarray functions where quantities are more logical,
because quantities are twice as slow -- this would instantly be solved
with __array_ufunc__...
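To make both directions concrete, a minimal sketch of what this could
look like under the proposal (hypothetical classes; the protocol is not
in a released numpy yet):

```
import numpy as np

class Wrapped:
    """Toy container that handles ufuncs via the proposed protocol."""
    def __init__(self, value):
        self.value = np.asarray(value)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # unwrap any Wrapped inputs, defer to the ufunc, re-wrap the result
        unwrapped = [i.value if isinstance(i, Wrapped) else i for i in inputs]
        return Wrapped(getattr(ufunc, method)(*unwrapped, **kwargs))

class OptedOut:
    # opting out as suggested above: binops would return NotImplemented
    # and ufuncs would raise, instead of silently coercing to ndarray
    __array_ufunc__ = None
```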
If we can decide on this, then I'd gladly help with the remaining issues
(e.g., the `ndarray.__array_ufunc__` method, so super can be used).

All the best,

Marten

[1] https://github.com/numpy/numpy/pull/8247
[2] https://docs.python.org/3.6/whatsnew/3.6.html#other-language-changes

From alex.rogozhnikov at yandex.ru  Wed Feb 22 10:23:40 2017
From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov)
Date: Wed, 22 Feb 2017 18:23:40 +0300
Subject: [Numpy-discussion] Fortran order in recarray.
In-Reply-To: 
References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>
Message-ID: <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru>

Hi Francesc,
thanks a lot for your reply and for your impressive job on bcolz!

Bcolz seems to put its emphasis on compression, which is not of much
interest to me, but the *ctable* and chunked operations look very
appropriate to me now. (Of course, I'll need to test it a lot before I
can say this for sure; that's my current impression.)

The strongest concern with bcolz so far is that it seems to be completely
non-trivial to install on Windows systems, while pip provides binaries
for most (or all?) OSes for numpy.
I didn't build pip binary wheels myself, but is it hard / impossible to
cook pip-installable binaries?

> You can change shapes of numpy arrays, but that usually involves copies
> of the whole container.

sure, but this is ok for me, as I plan to organize column editing in
'batches', so this should require seldom copying.
It would be nice to see an example to understand how deep I need to go
inside numpy.

Cheers,
Alex.

> On 22 Feb 2017, at 17:03, Francesc Alted wrote:
>
> Hi Alex,
>
> 2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov :
> Hi Nathaniel,
>
> > pandas
>
> yup, the idea was to have minimal pandas.DataFrame-like storage (which I
> was using for a long time), but without irritating problems with its row
> indexing and some other problems like interaction with matplotlib.
>
> > A dict of arrays?
>
> that's what I've started from and implemented, but at some point I
> decided that I'm reinventing the wheel and numpy has something already.
> In principle, I can ignore this 'column-oriented' storage requirement,
> but potentially it may turn out to be quite slow-ish if the dtype's size
> is large.
>
> Suggestions are welcome.
>
> You may want to try bcolz:
>
> https://github.com/Blosc/bcolz
>
> bcolz is a columnar storage, basically as you require, but data is
> compressed by default even when stored in-memory (although you can
> disable compression if you want to).
>
> Another strange question:
> in general, it is considered that once a numpy.array is created, its
> shape is not changed.
> But if I want to keep the same recarray and change its dtype and/or
> shape, is there a way to do this?
>
> You can change shapes of numpy arrays, but that usually involves copies
> of the whole container. With bcolz you can change length and add/del
> columns without copies. If your containers are large, it is better to
> inform bcolz of its final estimated size. See:
>
> http://bcolz.blosc.org/en/latest/opt-tips.html
>
> Francesc
>
> Thanks,
> Alex.
>
>> On 22 Feb 2017, at 3:53, Nathaniel Smith wrote:
>>
>> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" wrote:
>> Ah, got it. Thanks, Chris!
>> I thought recarray can be only one-dimensional (like tables with named
>> columns).
>> >> Maybe it's better to ask directly what I was looking for: >> something that works like a table with named columns (but no labelling for rows), and keeps data (of different dtypes) in a column-by-column way (and this is numpy, not pandas). >> >> Is there such a magic thing? >> >> Well, that's what pandas is for... >> >> A dict of arrays? >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > Francesc Alted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From kikocorreoso at gmail.com Wed Feb 22 10:30:18 2017 From: kikocorreoso at gmail.com (Kiko) Date: Wed, 22 Feb 2017 16:30:18 +0100 Subject: [Numpy-discussion] Fortran order in recarray. In-Reply-To: <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru> References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru> <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru> Message-ID: 2017-02-22 16:23 GMT+01:00 Alex Rogozhnikov : > Hi Francesc, > thanks a lot for you reply and for your impressive job on bcolz! > > Bcolz seems to make stress on compression, which is not of much interest > for me, but the *ctable*, and chunked operations look very appropriate to > me now. (Of course, I'll need to test it much before I can say this for > sure, that's current impression). > > The strongest concern with bcolz so far is that it seems to be completely > non-trivial to install on windows systems, while pip provides binaries for > most (or all?) OS for numpy. > I didn't build pip binary wheels myself, but is it hard / impossible to > cook pip-installabel binaries? > http://www.lfd.uci.edu/~gohlke/pythonlibs/#bcolz Check if the link solves the issue with installing. > > ?You can change shapes of numpy arrays, but that usually involves copies > of the whole container. > > sure, but this is ok for me, as I plan to organize column editing in > 'batches', so this should require seldom copying. > It would be nice to see an example to understand how deep I need to go > inside numpy. > > Cheers, > Alex. > > > > > 22 ????. 2017 ?., ? 17:03, Francesc Alted ???????(?): > > Hi Alex, > > 2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov : > >> Hi Nathaniel, >> >> >> pandas >> >> >> yup, the idea was to have minimal pandas.DataFrame-like storage (which I >> was using for a long time), >> but without irritating problems with its row indexing and some other >> problems like interaction with matplotlib. >> >> A dict of arrays? >> >> >> that's what I've started from and implemented, but at some point I >> decided that I'm reinventing the wheel and numpy has something already. In >> principle, I can ignore this 'column-oriented' storage requirement, but >> potentially it may turn out to be quite slow-ish if dtype's size is large. >> >> Suggestions are welcome. >> > > ?You may want to try bcolz: > > https://github.com/Blosc/bcolz > > bcolz is a columnar storage, basically as you require, but data is > compressed by default even when stored in-memory (although you can disable > compression if you want to).? 
> > > >> >> Another strange question: >> in general, it is considered that once numpy.array is created, it's shape >> not changed. >> But if i want to keep the same recarray and change it's dtype and/or >> shape, is there a way to do this? >> > > ?You can change shapes of numpy arrays, but that usually involves copies > of the whole container. With bcolz you can change length and add/del > columns without copies.? If your containers are large, it is better to > inform bcolz on its final estimated size. See: > > http://bcolz.blosc.org/en/latest/opt-tips.html > > ?Francesc? > > >> >> Thanks, >> Alex. >> >> >> >> 22 ????. 2017 ?., ? 3:53, Nathaniel Smith ???????(?): >> >> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" >> wrote: >> >> Ah, got it. Thanks, Chris! >> I thought recarray can be only one-dimensional (like tables with named >> columns). >> >> Maybe it's better to ask directly what I was looking for: >> something that works like a table with named columns (but no labelling >> for rows), and keeps data (of different dtypes) in a column-by-column way >> (and this is numpy, not pandas). >> >> Is there such a magic thing? >> >> >> Well, that's what pandas is for... >> >> A dict of arrays? >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Francesc Alted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbmcleod at gmail.com Wed Feb 22 10:34:00 2017 From: robbmcleod at gmail.com (Robert McLeod) Date: Wed, 22 Feb 2017 16:34:00 +0100 Subject: [Numpy-discussion] Fortran order in recarray. In-Reply-To: <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru> References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru> <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru> Message-ID: Just as a note, Appveyor supports uploading modules to "public websites": https://packaging.python.org/appveyor/ The main issue I would see from this, is the PyPi has my password stored on my machine in a plain text file. I'm not sure whether there's a way to provide Appveyor with a SSH key instead. On Wed, Feb 22, 2017 at 4:23 PM, Alex Rogozhnikov < alex.rogozhnikov at yandex.ru> wrote: > Hi Francesc, > thanks a lot for you reply and for your impressive job on bcolz! > > Bcolz seems to make stress on compression, which is not of much interest > for me, but the *ctable*, and chunked operations look very appropriate to > me now. (Of course, I'll need to test it much before I can say this for > sure, that's current impression). > > The strongest concern with bcolz so far is that it seems to be completely > non-trivial to install on windows systems, while pip provides binaries for > most (or all?) OS for numpy. > I didn't build pip binary wheels myself, but is it hard / impossible to > cook pip-installabel binaries? > > ?You can change shapes of numpy arrays, but that usually involves copies > of the whole container. 
> > sure, but this is ok for me, as I plan to organize column editing in > 'batches', so this should require seldom copying. > It would be nice to see an example to understand how deep I need to go > inside numpy. > > Cheers, > Alex. > > > > > 22 ????. 2017 ?., ? 17:03, Francesc Alted ???????(?): > > Hi Alex, > > 2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov : > >> Hi Nathaniel, >> >> >> pandas >> >> >> yup, the idea was to have minimal pandas.DataFrame-like storage (which I >> was using for a long time), >> but without irritating problems with its row indexing and some other >> problems like interaction with matplotlib. >> >> A dict of arrays? >> >> >> that's what I've started from and implemented, but at some point I >> decided that I'm reinventing the wheel and numpy has something already. In >> principle, I can ignore this 'column-oriented' storage requirement, but >> potentially it may turn out to be quite slow-ish if dtype's size is large. >> >> Suggestions are welcome. >> > > ?You may want to try bcolz: > > https://github.com/Blosc/bcolz > > bcolz is a columnar storage, basically as you require, but data is > compressed by default even when stored in-memory (although you can disable > compression if you want to).? > > > >> >> Another strange question: >> in general, it is considered that once numpy.array is created, it's shape >> not changed. >> But if i want to keep the same recarray and change it's dtype and/or >> shape, is there a way to do this? >> > > ?You can change shapes of numpy arrays, but that usually involves copies > of the whole container. With bcolz you can change length and add/del > columns without copies.? If your containers are large, it is better to > inform bcolz on its final estimated size. See: > > http://bcolz.blosc.org/en/latest/opt-tips.html > > ?Francesc? > > >> >> Thanks, >> Alex. >> >> >> >> 22 ????. 2017 ?., ? 3:53, Nathaniel Smith ???????(?): >> >> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" >> wrote: >> >> Ah, got it. Thanks, Chris! >> I thought recarray can be only one-dimensional (like tables with named >> columns). >> >> Maybe it's better to ask directly what I was looking for: >> something that works like a table with named columns (but no labelling >> for rows), and keeps data (of different dtypes) in a column-by-column way >> (and this is numpy, not pandas). >> >> Is there such a magic thing? >> >> >> Well, that's what pandas is for... >> >> A dict of arrays? >> >> -n >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Francesc Alted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Robert McLeod, Ph.D. Center for Cellular Imaging and Nano Analytics (C-CINA) Biozentrum der Universit?t Basel Mattenstrasse 26, 4058 Basel Work: +41.061.387.3225 robert.mcleod at unibas.ch robert.mcleod at bsse.ethz.ch robbmcleod at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From faltet at gmail.com  Wed Feb 22 10:37:07 2017
From: faltet at gmail.com (Francesc Alted)
Date: Wed, 22 Feb 2017 16:37:07 +0100
Subject: [Numpy-discussion] Fortran order in recarray.
In-Reply-To: 
References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>
 <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru>
Message-ID: 

2017-02-22 16:30 GMT+01:00 Kiko :

>
> 2017-02-22 16:23 GMT+01:00 Alex Rogozhnikov :
>
>> Hi Francesc,
>> thanks a lot for your reply and for your impressive job on bcolz!
>>
>> Bcolz seems to put its emphasis on compression, which is not of much
>> interest to me, but the *ctable* and chunked operations look very
>> appropriate to me now. (Of course, I'll need to test it a lot before I
>> can say this for sure; that's my current impression.)
>

You can disable compression for bcolz by default too:
http://bcolz.blosc.org/en/latest/defaults.html#list-of-default-values

>
>> The strongest concern with bcolz so far is that it seems to be
>> completely non-trivial to install on Windows systems, while pip provides
>> binaries for most (or all?) OSes for numpy.
>> I didn't build pip binary wheels myself, but is it hard / impossible to
>> cook pip-installable binaries?
>
> http://www.lfd.uci.edu/~gohlke/pythonlibs/#bcolz
> Check if the link solves the issue with installing.
>

Yeah. Also, there are binaries for conda:
http://bcolz.blosc.org/en/latest/install.html#installing-from-conda-forge

>
>> You can change shapes of numpy arrays, but that usually involves copies
>> of the whole container.
>>
>> sure, but this is ok for me, as I plan to organize column editing in
>> 'batches', so this should require seldom copying.
>> It would be nice to see an example to understand how deep I need to go
>> inside numpy.
>

Well, if copying is not a problem for you, then you can just create a new
numpy container and do the copy by yourself.

Francesc
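For the copy route with a structured array, a minimal sketch (field names
and sizes are made up):

```
import numpy as np

old = np.zeros(5, dtype=[('a', int), ('b', float)])
# allocate the new shape/dtype up front, then copy field by field
new = np.zeros(8, dtype=[('a', int), ('b', float), ('c', float)])
for name in old.dtype.names:
    new[name][:len(old)] = old[name]
```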
>> >> >>> >>> Thanks, >>> Alex. >>> >>> >>> >>> 22 ????. 2017 ?., ? 3:53, Nathaniel Smith ???????(?): >>> >>> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" >>> wrote: >>> >>> Ah, got it. Thanks, Chris! >>> I thought recarray can be only one-dimensional (like tables with named >>> columns). >>> >>> Maybe it's better to ask directly what I was looking for: >>> something that works like a table with named columns (but no labelling >>> for rows), and keeps data (of different dtypes) in a column-by-column way >>> (and this is numpy, not pandas). >>> >>> Is there such a magic thing? >>> >>> >>> Well, that's what pandas is for... >>> >>> A dict of arrays? >>> >>> -n >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> >> -- >> Francesc Alted >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Wed Feb 22 10:38:45 2017 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Wed, 22 Feb 2017 10:38:45 -0500 Subject: [Numpy-discussion] Fortran order in recarray. In-Reply-To: References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru> <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru> Message-ID: Alex, Can you please post some code showing exactly what you are trying to do and any issues you are having, particularly the "irritating problems with its row indexing and some other problems" you quote above? On Wed, Feb 22, 2017 at 10:34 AM, Robert McLeod wrote: > Just as a note, Appveyor supports uploading modules to "public websites": > > https://packaging.python.org/appveyor/ > > The main issue I would see from this, is the PyPi has my password stored > on my machine in a plain text file. I'm not sure whether there's a way to > provide Appveyor with a SSH key instead. > > On Wed, Feb 22, 2017 at 4:23 PM, Alex Rogozhnikov < > alex.rogozhnikov at yandex.ru> wrote: > >> Hi Francesc, >> thanks a lot for you reply and for your impressive job on bcolz! >> >> Bcolz seems to make stress on compression, which is not of much interest >> for me, but the *ctable*, and chunked operations look very appropriate >> to me now. (Of course, I'll need to test it much before I can say this for >> sure, that's current impression). >> >> The strongest concern with bcolz so far is that it seems to be completely >> non-trivial to install on windows systems, while pip provides binaries for >> most (or all?) OS for numpy. >> I didn't build pip binary wheels myself, but is it hard / impossible to >> cook pip-installabel binaries? >> >> ?You can change shapes of numpy arrays, but that usually involves copies >> of the whole container. 
>> >> sure, but this is ok for me, as I plan to organize column editing in >> 'batches', so this should require seldom copying. >> It would be nice to see an example to understand how deep I need to go >> inside numpy. >> >> Cheers, >> Alex. >> >> >> >> >> 22 ????. 2017 ?., ? 17:03, Francesc Alted ???????(?): >> >> Hi Alex, >> >> 2017-02-22 12:45 GMT+01:00 Alex Rogozhnikov : >> >>> Hi Nathaniel, >>> >>> >>> pandas >>> >>> >>> yup, the idea was to have minimal pandas.DataFrame-like storage (which I >>> was using for a long time), >>> but without irritating problems with its row indexing and some other >>> problems like interaction with matplotlib. >>> >>> A dict of arrays? >>> >>> >>> that's what I've started from and implemented, but at some point I >>> decided that I'm reinventing the wheel and numpy has something already. In >>> principle, I can ignore this 'column-oriented' storage requirement, but >>> potentially it may turn out to be quite slow-ish if dtype's size is large. >>> >>> Suggestions are welcome. >>> >> >> ?You may want to try bcolz: >> >> https://github.com/Blosc/bcolz >> >> bcolz is a columnar storage, basically as you require, but data is >> compressed by default even when stored in-memory (although you can disable >> compression if you want to).? >> >> >> >>> >>> Another strange question: >>> in general, it is considered that once numpy.array is created, it's >>> shape not changed. >>> But if i want to keep the same recarray and change it's dtype and/or >>> shape, is there a way to do this? >>> >> >> ?You can change shapes of numpy arrays, but that usually involves copies >> of the whole container. With bcolz you can change length and add/del >> columns without copies.? If your containers are large, it is better to >> inform bcolz on its final estimated size. See: >> >> http://bcolz.blosc.org/en/latest/opt-tips.html >> >> ?Francesc? >> >> >>> >>> Thanks, >>> Alex. >>> >>> >>> >>> 22 ????. 2017 ?., ? 3:53, Nathaniel Smith ???????(?): >>> >>> On Feb 21, 2017 3:24 PM, "Alex Rogozhnikov" >>> wrote: >>> >>> Ah, got it. Thanks, Chris! >>> I thought recarray can be only one-dimensional (like tables with named >>> columns). >>> >>> Maybe it's better to ask directly what I was looking for: >>> something that works like a table with named columns (but no labelling >>> for rows), and keeps data (of different dtypes) in a column-by-column way >>> (and this is numpy, not pandas). >>> >>> Is there such a magic thing? >>> >>> >>> Well, that's what pandas is for... >>> >>> A dict of arrays? >>> >>> -n >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> >> -- >> Francesc Alted >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Robert McLeod, Ph.D. 
> Center for Cellular Imaging and Nano Analytics (C-CINA) > Biozentrum der Universit?t Basel > Mattenstrasse 26, 4058 Basel > Work: +41.061.387.3225 <+41%2061%20387%2032%2025> > robert.mcleod at unibas.ch > robert.mcleod at bsse.ethz.ch > robbmcleod at gmail.com > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From harrigan.matthew at gmail.com Wed Feb 22 10:52:15 2017 From: harrigan.matthew at gmail.com (Matthew Harrigan) Date: Wed, 22 Feb 2017 10:52:15 -0500 Subject: [Numpy-discussion] Numpy Development Queries In-Reply-To: References: <8cbf1c9b6c561724e2018d77faadae41@students.iiit.ac.in> Message-ID: Ashwin, I don't know your background but perhaps it is similar to mine. I use numpy extensively in my day job and starting contributing to numpy a few months ago. From using numpy, I found some things that I thought should be added/improved. I researched them and the associated numpy code enough to be confident I would be capable of making the change (some of the C code is intense). Before I got too far along, I posted an issue and got feedback from the experts. Then I did it. On Tue, Feb 21, 2017 at 12:01 AM, Ralf Gommers wrote: > > > On Tue, Feb 21, 2017 at 7:32 AM, ashwin.pathak < > ashwin.pathak at students.iiit.ac.in> wrote: > >> Hello all, >> I am new to this organization and wanted to start with some easy-fix >> issues to get some knowledge about the soruce code. However, the issues >> under easy-fix labels have already been solved or someone is at it. Can >> someone help me find such issues? >> > > Hi Ashwin, welcome. I don't want to seem discouraging, but I do want to > explain that NumPy is significantly harder to get started on than SciPy > (which you've started on already) as a newcomer to the scientific Python > ecosystem. So I'd encourage you to spend some more time on the SciPy issues > - there are more easy-fix ones there, and the process of contributing (pull > requests, reviews, finding your way around the codebase) is similar for the > two projects. > > Cheers, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alex.rogozhnikov at yandex.ru Wed Feb 22 11:57:57 2017 From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov) Date: Wed, 22 Feb 2017 19:57:57 +0300 Subject: [Numpy-discussion] Fortran order in recarray. In-Reply-To: References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru> <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru> Message-ID: <04A70978-5E6B-43BB-B662-539DC018DAAA@yandex.ru> Hi Matthew, maybe it is not the best place to discuss problems of pandas, but to show that I am not missing something, let's consider a simple example. # simplest DataFrame x = pandas.DataFrame(dict(a=numpy.arange(10), b=numpy.arange(10, 20))) # simplest indexing. Can you predict results without looking at comments? x[:2] # returns two first rows, as expected x[[0, 1]] # returns copy of x, whole dataframe x[numpy.array(2)] # fails with IndexError: indices are out-of-bounds (can you guess why?) 
x[[0, 1], :] # unhashable type: list

Just in case: I know about .loc and .iloc, but when you write code with
many subroutines, you concentrate on numpy inputs, and at some point you
simply *forget* to convert some of the data you operated with to numpy,
and it *continues* to work, but it yields wrong results (you tested
everything, but you tested it with numpy inputs). Checking all the inputs
in each small subroutine is strange.

Ok, a bit more:

x[x['a'] > 5] # works as expected
x[x['a'] > 5, :] # 'Series' objects are mutable, thus they cannot be hashed
lookup = numpy.arange(10)
x[lookup[x['a']] > 5] # works as expected
x[lookup[x['a']] > 5, :] # TypeError: unhashable type: 'numpy.ndarray'

x[lookup]['a'] # IndexError
x['a'][lookup] # works as expected

Now let's go a bit further: train/test splitting the data for machine
learning (again, the most frequent operation)

from sklearn.model_selection import train_test_split
x1, x2 = train_test_split(x, random_state=42)
# compare the next operations with pandas.DataFrame
col = x1['a']
print col[:2] # first two elements
print col[[0, 1]] # doesn't fail (while there is no row with index 0), fills it with NaN
print col[numpy.arange(2)] # same as previous

print col[col > 4] # as expected
print col[col.values > 4] # as expected
print col.values[col > 4] # converts boolean to int, uses int indexing, but at least raises a warning

Mistakes caused by such silent misbehavior are not easy to detect (when
your data pipeline consists of several steps), it is quite hard to locate
the source of the problem, and it is almost impossible to be sure that
you indeed avoided all such caveats. Code review turns into a paranoid
process (if you care about the result, of course).

Things are even worse, because I've demonstrated this for my
installation, and probably if you run this with some other pandas
installation, you get some other results (those were really basic
operations). So things that worked ok in one version may work a different
way in another; this becomes completely intractable.

Pandas may be nice if you need a report, and you need to get it done
tomorrow. Then you'll throw away the code. When we initially used pandas
as the main data storage in yandex/rep, it looked like a good idea, but a
year later it was obvious this was a wrong decision. When you build a
data pipeline / research that should still work several years later
(using some other installation by someone else), usage of pandas should
be *minimal*.

That's why I am looking for a reliable pandas substitute, which should be:
- completely consistent with numpy and should fail when this wasn't
implemented / impossible
- fewer new abstractions; nobody wants to learn
one-more-way-to-manipulate-the-data, specifically other researchers
- it may be less convenient for interactive data munging
- in particular, fewer methods is ok
- written code should be interpretable, and hardly can be misinterpreted
- not super slow; 1-10 gigabyte datasets are a normal situation

Well, that's it.
Sorry for the long letter.

Alex.
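A minimal sketch of the kind of container described above (hypothetical
code, not an existing library) could be as simple as a dict of arrays
with deliberately narrow indexing:

```
import numpy as np

class StrictTable:
    """Named columns over plain ndarrays; numpy-only row selection."""
    def __init__(self, **columns):
        self.columns = {name: np.asarray(col) for name, col in columns.items()}
        if len({len(col) for col in self.columns.values()}) > 1:
            raise ValueError("all columns must have the same length")

    def __getitem__(self, name):
        # columns by string name only; anything else fails loudly
        if not isinstance(name, str):
            raise TypeError("use a column name; rows are selected with .take()")
        return self.columns[name]

    def take(self, indices):
        # rows via numpy semantics: boolean masks or integer arrays
        return StrictTable(**{name: col[indices]
                              for name, col in self.columns.items()})

t = StrictTable(a=np.arange(10), b=np.arange(10, 20))
t2 = t.take(t['a'] > 5)
```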
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From josef.pktd at gmail.com  Wed Feb 22 12:39:38 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 22 Feb 2017 12:39:38 -0500
Subject: [Numpy-discussion] Fortran order in recarray.
From josef.pktd at gmail.com  Wed Feb 22 12:39:38 2017
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 22 Feb 2017 12:39:38 -0500
Subject: [Numpy-discussion] Fortran order in recarray.
In-Reply-To: <04A70978-5E6B-43BB-B662-539DC018DAAA@yandex.ru>
References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>
 <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru>
 <04A70978-5E6B-43BB-B662-539DC018DAAA@yandex.ru>
Message-ID:

On Wed, Feb 22, 2017 at 11:57 AM, Alex Rogozhnikov <alex.rogozhnikov at yandex.ru> wrote:

> Hi Matthew,
> maybe this is not the best place to discuss problems of pandas, but to
> show that I am not missing something, let's consider a simple example.
>
> # simplest DataFrame
> x = pandas.DataFrame(dict(a=numpy.arange(10), b=numpy.arange(10, 20)))
> # simplest indexing. Can you predict the results without looking at the comments?
> x[:2]              # returns the first two rows, as expected
> x[[0, 1]]          # returns a copy of x, the whole dataframe
> x[numpy.array(2)]  # fails with IndexError: indices are out-of-bounds (can you guess why?)
> x[[0, 1], :]       # unhashable type: list
>
> just in case - I know about .loc and .iloc, but when you write code with
> many subroutines, you concentrate on numpy inputs, and at some point you
> simply *forget* to convert some of the data you operated on to numpy, and
> it *continues* to work, but it yields wrong results (you tested
> everything, but you tested it for numpy). Checking all the inputs in each
> small subroutine is strange.
>
> Ok, a bit more:
>
> x[x['a'] > 5]             # works as expected
> x[x['a'] > 5, :]          # 'Series' objects are mutable, thus they cannot be hashed
> lookup = numpy.arange(10)
> x[lookup[x['a']] > 5]     # works as expected
> x[lookup[x['a']] > 5, :]  # TypeError: unhashable type: 'numpy.ndarray'
>
> x[lookup]['a']  # IndexError
> x['a'][lookup]  # works as expected
>
> Now let's go a bit further: train/test splitting of the data for machine
> learning (again, a very frequent operation)
>
> from sklearn.model_selection import train_test_split
> x1, x2 = train_test_split(x, random_state=42)
> # compare the following operations on the pandas.DataFrame
> col = x1['a']
> print col[:2]               # first two elements
> print col[[0, 1]]           # doesn't fail (while there is no row with index 0), fills it with NaN
> print col[numpy.arange(2)]  # same as previous
>
> print col[col > 4]          # as expected
> print col[col.values > 4]   # as expected
> print col.values[col > 4]   # converts boolean to int, uses int indexing, but at least raises a warning
>
> Mistakes made through such silent misbehavior are not easy to detect
> (when your data pipeline consists of several steps), it is quite hard to
> locate the source of the problem, and it is almost impossible to be sure
> that you indeed avoided all such caveats. Code review turns into a
> paranoid process (if you care about the result, of course).
>
> Things are even worse, because I've demonstrated this for my
> installation, and probably if you run this with some other pandas
> installation, you get some other results (these were really basic
> operations). So things that worked OK in one version may work differently
> in another; this becomes completely intractable.
>
> Pandas may be nice if you need a report, and you need to get it done
> tomorrow. Then you'll throw away the code. When we initially used pandas
> as the main data storage in yandex/rep, it looked like a good idea, but a
> year later it was obvious this was a wrong decision. In cases where you
> build a data pipeline / research that should still be working several
> years later (using some other installation by someone else), the usage of
> pandas should be *minimal*.
>
> That's why I am looking for a reliable pandas substitute, which should be:
> - completely consistent with numpy, and it should fail when something
>   isn't implemented / is impossible
> - fewer new abstractions; nobody wants to learn
>   one-more-way-to-manipulate-the-data, specifically other researchers
> - it may be less convenient for interactive data munging
>   - in particular, fewer methods is ok
> - written code should be interpretable, and can hardly be misinterpreted
> - not super slow; 1-10 gigabyte datasets are a normal situation

Just on the pandas part:

statsmodels has supported pandas almost from the very beginning (or maybe
after 1.5 years), when the new pandas was still very young.

However, what I insisted on is that pandas stays in the wrapper/interface
code, and internally only numpy arrays are used. Besides the confusing
"magic" indexing of early pandas, there were a lot of details that
silently produced different results, e.g. default iteration on axis=1,
and ddof=1 in std and var instead of numpy's ddof=0.

Essentially, every interface corresponds to an np.asarray, but we store
the DataFrame information, mainly the index and column names, so we can
return the appropriate pandas object if a pandas object was used for the
input.

This has worked pretty well. Users can have their dataframes, and we have
pure numpy algorithms.

Recently we have started to use pandas inside a few functions or classes
that are less tightly integrated into the overall setup. We also use
pandas for some things that are not convenient or not available in numpy.
Our internal use of pandas groupby and similar will most likely increase
over time.
(One of the main issues we had was date and time indexes, because those
were a moving target in both numpy and pandas.)

One issue for computational efficiency that we do not control is whether
`asarray` creates a view or needs to make a copy: that depends on whether
the dtype and memory layout the user has in the data frame correspond to
what we need in the algorithms. If they match, then no copies should be
made except where explicitly needed.

The intention is to extend this over time to other array structures like
xarray and likely dask arrays.

Josef


> Well, that's it.
> Sorry for the large letter.
>
> Alex.
>
>> On 22 Feb 2017, at 18:38, Matthew Harrigan wrote:
>>
>> Alex,
>>
>> Can you please post some code showing exactly what you are trying to do
>> and any issues you are having, particularly the "irritating problems
>> with its row indexing and some other problems" you quote above?
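[Editor's example] A minimal sketch of the interface pattern described
above (the `demean` function is hypothetical, not statsmodels code):
convert at the boundary with np.asarray, compute on plain ndarrays, and
restore the index and column names on the way out.

    import numpy as np
    import pandas as pd

    def demean(data):
        """Subtract column means; pandas in, pandas out; numpy inside."""
        is_pandas = isinstance(data, pd.DataFrame)
        index = data.index if is_pandas else None
        columns = data.columns if is_pandas else None

        arr = np.asarray(data, dtype=float)  # a view if dtype/layout already match
        result = arr - arr.mean(axis=0)      # pure-numpy algorithm

        if is_pandas:
            return pd.DataFrame(result, index=index, columns=columns)
        return result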
From shoyer at gmail.com  Wed Feb 22 12:55:07 2017
From: shoyer at gmail.com (Stephan Hoyer)
Date: Wed, 22 Feb 2017 09:55:07 -0800
Subject: [Numpy-discussion] Fortran order in recarray.
In-Reply-To: <04A70978-5E6B-43BB-B662-539DC018DAAA@yandex.ru>
References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>
 <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru>
 <04A70978-5E6B-43BB-B662-539DC018DAAA@yandex.ru>
Message-ID:

On Wed, Feb 22, 2017 at 8:57 AM, Alex Rogozhnikov <alex.rogozhnikov at yandex.ru> wrote:

> Pandas may be nice if you need a report, and you need to get it done
> tomorrow. Then you'll throw away the code. When we initially used pandas
> as the main data storage in yandex/rep, it looked like a good idea, but a
> year later it was obvious this was a wrong decision. In cases where you
> build a data pipeline / research that should still be working several
> years later (using some other installation by someone else), the usage of
> pandas should be *minimal*.

The pandas development team (myself included) is well aware of these
issues. There are long-term plans/hopes to fix this, but there's a lot of
work to be done and some hard choices to make:
https://github.com/pandas-dev/pandas/issues/10000
https://github.com/pandas-dev/pandas/issues/13862

> That's why I am looking for a reliable pandas substitute, which should be:
> - completely consistent with numpy, and it should fail when something
>   isn't implemented / is impossible
> - fewer new abstractions; nobody wants to learn
>   one-more-way-to-manipulate-the-data, specifically other researchers
> - it may be less convenient for interactive data munging
>   - in particular, fewer methods is ok
> - written code should be interpretable, and can hardly be misinterpreted
> - not super slow; 1-10 gigabyte datasets are a normal situation

This has some overlap with our motivations for writing Xarray
(http://xarray.pydata.org), so I encourage you to take a look. It still
might be more complex than you're looking for, but we did try to clean up
the really ambiguous APIs from pandas, like indexing.

From alex.rogozhnikov at yandex.ru  Wed Feb 22 13:02:17 2017
From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov)
Date: Wed, 22 Feb 2017 21:02:17 +0300
Subject: [Numpy-discussion] Fortran order in recarray.
In-Reply-To:
References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>
 <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru>
 <04A70978-5E6B-43BB-B662-539DC018DAAA@yandex.ru>
Message-ID: <50E42B63-53F4-4D73-BAB5-97B1751C3D6B@yandex.ru>

> On 22 Feb 2017, at 20:39, josef.pktd at gmail.com wrote:
>
> Essentially, every interface corresponds to an np.asarray, but we store
> the DataFrame information, mainly the index and column names, so we can
> return the appropriate pandas object if a pandas object was used for the
> input.
Yes, it seems to be the best practice. But apart from libraries, there is
a lot of code for my research / my team's research, and we don't make such
checks all the time; moreover, many functions are intended to operate on
DataFrames (and use particular feature names). So the approach is not
fully applicable to research code, which is very diverse and has many
functions that are used only two or three times. It is irrational to make
all the code more complex to protect yourself from one library, because
all the benefit is lost (and as the user of a package you will need input
checks anyway, to protect against passing something inappropriate).
From alex.rogozhnikov at yandex.ru  Wed Feb 22 13:24:00 2017
From: alex.rogozhnikov at yandex.ru (Alex Rogozhnikov)
Date: Wed, 22 Feb 2017 21:24:00 +0300
Subject: [Numpy-discussion] Fortran order in recarray.
In-Reply-To:
References: <0902C347-89B2-41EE-9367-F0C7A4F864D4@yandex.ru>
 <917C8C1B-FA4C-4128-9991-82DC6AFC3EAA@yandex.ru>
 <04A70978-5E6B-43BB-B662-539DC018DAAA@yandex.ru>
Message-ID: <77BC87D6-D986-4E13-AC02-8303CD519A70@yandex.ru>

Hi Stephan,
thanks for the note. The progress over the last two years wasn't
impressive IMO, but I hope you'll manage. As you suggest, I'll have a look
at xarray too, since I see xarray.Dataset. I was sure that it doesn't work
with non-homogeneous data at all; clearly I need to refresh my opinion.

> On 22 Feb 2017, at 20:55, Stephan Hoyer wrote:
>
> This has some overlap with our motivations for writing Xarray
> (http://xarray.pydata.org), so I encourage you to take a look. It still
> might be more complex than you're looking for, but we did try to clean up
> the really ambiguous APIs from pandas, like indexing.
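[Editor's example] As a quick illustration of the point about
non-homogeneous data: an xarray Dataset can hold variables of different
dtypes along a shared dimension. A minimal sketch (names are made up):

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({
        'a': ('row', np.arange(10)),             # integer column
        'b': ('row', np.linspace(0.0, 1.0, 10))  # float column
    })
    print(ds['a'].dtype, ds['b'].dtype)  # int64 float64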
From m.h.vankerkwijk at gmail.com  Wed Feb 22 17:27:58 2017
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Wed, 22 Feb 2017 17:27:58 -0500
Subject: [Numpy-discussion] __numpy_ufunc__
In-Reply-To:
References:
Message-ID:

Hi Stephan,

Indeed, `__array_ufunc__` is None would be for classes that interact with
arrays only as if they were any other numeric type, and thus have no use
for ufuncs, but may need normal operations (astropy's `Unit` class is a
reasonably good example). Your example also makes clear that, indeed,
setting __array__ or __array_ufunc__ to None implies different things, so
concretely the proposal here is that if `__array_ufunc__` is None, ndarray
methods will return `NotImplemented`.

As an aside, I think that if we do not have something like that, we'll be
stuck with supporting `__array_priority__`. (Which is OK by me too, but it
might as well be a conscious choice.)

All the best,

Marten

From evgeny.burovskiy at gmail.com  Fri Feb 24 09:36:51 2017
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Fri, 24 Feb 2017 17:36:51 +0300
Subject: [Numpy-discussion] ANN: Scipy 0.19.0 release candidate 2
Message-ID:

Hi,

I'm pleased to announce the availability of the second release candidate
for Scipy 0.19.0. It contains contributions from 121 people over the
course of seven months.

The main difference from rc1 is that several Windows-specific issues have
been fixed (special thanks to Christoph Gohlke).

Please try this release and report any issues on the GitHub tracker,
https://github.com/scipy/scipy, or on the scipy-dev mailing list.

Source tarballs and release notes are available from the GitHub releases
page, https://github.com/scipy/scipy/releases/tag/v0.19.0rc2

We would appreciate it if you could both run the self-tests on your
hardware and test your code against this release. If no issues are
reported, this will graduate to Scipy 0.19.0 final on the 9th of March
2017.

Thanks to everyone who contributed to this release!
Cheers,

Evgeni


==========================
SciPy 0.19.0 Release Notes
==========================

.. note:: Scipy 0.19.0 is not released yet!

.. contents::

SciPy 0.19.0 is the culmination of seven months of hard work. It contains
many new features, numerous bug-fixes, improved test coverage and better
documentation. There have been a number of deprecations and API changes in
this release, which are documented below. All users are encouraged to
upgrade to this release, as there are a large number of bug-fixes and
optimizations. Moreover, our development attention will now shift to
bug-fix releases on the 0.19.x branch, and to adding new features on the
master branch.

This release requires Python 2.7 or 3.4-3.6 and NumPy 1.8.2 or greater.

Highlights of this release include:

- A unified foreign function interface layer, `scipy.LowLevelCallable`.
- Cython API for scalar, typed versions of the universal functions from
  the `scipy.special` module, via `cimport scipy.special.cython_special`.

New features
============

Foreign function interface improvements
---------------------------------------

`scipy.LowLevelCallable` provides a new unified interface for wrapping
low-level compiled callback functions in the Python space. It supports
Cython imported "api" functions, ctypes function pointers, CFFI function
pointers, ``PyCapsules``, Numba jitted functions and more. See
`gh-6509 `_ for details.

`scipy.linalg` improvements
---------------------------

The function `scipy.linalg.solve` obtained two more keywords, ``assume_a``
and ``transposed``. The underlying LAPACK routines are replaced with
"expert" versions and now can also be used to solve symmetric, hermitian
and positive definite coefficient matrices. Moreover, ill-conditioned
matrices now cause a warning to be emitted with the estimated condition
number information. The old ``sym_pos`` keyword is kept for backwards
compatibility reasons; however, it is identical to using
``assume_a='pos'``. Moreover, the ``debug`` keyword, which had no function
except printing the ``overwrite_`` values, is deprecated.

The function `scipy.linalg.matrix_balance` was added to perform the
so-called matrix balancing using the LAPACK xGEBAL routine family. This
can be used to approximately equate the row and column norms through
diagonal similarity transformations.

The functions `scipy.linalg.solve_continuous_are` and
`scipy.linalg.solve_discrete_are` have numerically more stable algorithms.
These functions can also solve generalized algebraic matrix Riccati
equations. Moreover, both gained a ``balanced`` keyword to turn balancing
on and off.

`scipy.spatial` improvements
----------------------------

`scipy.spatial.SphericalVoronoi.sort_vertices_of_regions` has been
re-written in Cython to improve performance.

`scipy.spatial.SphericalVoronoi` can handle > 200 k points (at least 10
million) and has improved performance.

The function `scipy.spatial.distance.directed_hausdorff` was added to
calculate the directed Hausdorff distance.

The ``count_neighbors`` method of `scipy.spatial.cKDTree` gained the
ability to perform weighted pair counting via the new keywords ``weights``
and ``cumulative``. See `gh-5647 `_ for details.

`scipy.spatial.distance.pdist` and `scipy.spatial.distance.cdist` now
support non-double custom metrics.
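[Editor's example] A small usage sketch of the new ``assume_a`` keyword
described above (illustrative; the matrix is constructed to be symmetric
positive definite)::

    import numpy as np
    from scipy.linalg import solve

    rng = np.random.RandomState(0)
    a = rng.rand(4, 4)
    spd = np.dot(a, a.T) + 4 * np.eye(4)   # symmetric positive definite
    b = rng.rand(4)

    x = solve(spd, b, assume_a='pos')      # uses the "expert" positive-definite path
    print(np.allclose(np.dot(spd, x), b))  # True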
`scipy.ndimage` improvements
----------------------------

The callback function C API supports PyCapsules in Python 2.7.

Multidimensional filters now allow having different extrapolation modes
for different axes.

`scipy.optimize` improvements
-----------------------------

The `scipy.optimize.basinhopping` global minimizer obtained a new keyword,
`seed`, which can be used to seed the random number generator and obtain
repeatable minimizations.

The keyword `sigma` in `scipy.optimize.curve_fit` was overloaded to also
accept the covariance matrix of errors in the data.

`scipy.signal` improvements
---------------------------

The functions `scipy.signal.correlate` and `scipy.signal.convolve` have a
new optional parameter `method`. The default value of `auto` estimates the
faster of the two computation methods, the direct approach and the Fourier
transform approach.

A new function has been added to choose the convolution/correlation
method, `scipy.signal.choose_conv_method`, which may be appropriate if
convolutions or correlations are performed on many arrays of the same
size.

New functions have been added to calculate complex short-time Fourier
transforms of an input signal, and to invert the transform to recover the
original signal: `scipy.signal.stft` and `scipy.signal.istft`. This
implementation also fixes the previously incorrect output of
`scipy.signal.spectrogram` when complex output data were requested.

The function `scipy.signal.sosfreqz` was added to compute the frequency
response from second-order sections.

The function `scipy.signal.unit_impulse` was added to conveniently
generate an impulse function.

The function `scipy.signal.iirnotch` was added to design second-order IIR
notch filters that can be used to remove a frequency component from a
signal. The dual function `scipy.signal.iirpeak` was added to compute the
coefficients of a second-order IIR peak (resonant) filter.

The function `scipy.signal.minimum_phase` was added to convert
linear-phase FIR filters to minimum phase.

The functions `scipy.signal.upfirdn` and `scipy.signal.resample_poly` are
now substantially faster when operating on some n-dimensional arrays when
n > 1. The largest reduction in computation time is realized in cases
where the size of the array is small (<1k samples or so) along the axis to
be filtered.

`scipy.fftpack` improvements
----------------------------

Fast Fourier transform routines now accept `np.float16` inputs and upcast
them to `np.float32`. Previously, they would raise an error.

`scipy.cluster` improvements
----------------------------

Methods ``"centroid"`` and ``"median"`` of
`scipy.cluster.hierarchy.linkage` have been significantly sped up.
Long-standing issues with using ``linkage`` on large input data (over 16
GB) have been resolved.

`scipy.sparse` improvements
---------------------------

The functions `scipy.sparse.save_npz` and `scipy.sparse.load_npz` were
added, providing simple serialization for some sparse formats.

The `prune` method of classes `bsr_matrix`, `csc_matrix`, and `csr_matrix`
was updated to reallocate backing arrays under certain conditions,
reducing memory usage.

The methods `argmin` and `argmax` were added to classes `coo_matrix`,
`csc_matrix`, `csr_matrix`, and `bsr_matrix`.

New function `scipy.sparse.csgraph.structural_rank` computes the
structural rank of a graph with a given sparsity pattern.

New function `scipy.sparse.linalg.spsolve_triangular` solves a sparse
linear system with a triangular left-hand-side matrix.
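[Editor's example] A round-trip sketch of the new serialization helpers
mentioned above (illustrative)::

    import numpy as np
    import scipy.sparse

    m = scipy.sparse.csr_matrix(np.eye(3))
    scipy.sparse.save_npz('matrix.npz', m)   # new in 0.19.0
    m2 = scipy.sparse.load_npz('matrix.npz')
    print(np.allclose(m.toarray(), m2.toarray()))  # True: round-trip preserved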
`scipy.special` improvements
----------------------------

Scalar, typed versions of universal functions from `scipy.special` are
available in the Cython space via ``cimport`` from the new module
`scipy.special.cython_special`. These scalar functions can be expected to
be significantly faster than the universal functions for scalar arguments.
See the `scipy.special` tutorial for details.

Better control over special-function errors is offered by the functions
`scipy.special.geterr` and `scipy.special.seterr` and the context manager
`scipy.special.errstate`.

The names of orthogonal polynomial root functions have been changed to be
consistent with other functions relating to orthogonal polynomials. For
example, `scipy.special.j_roots` has been renamed
`scipy.special.roots_jacobi` for consistency with the related functions
`scipy.special.jacobi` and `scipy.special.eval_jacobi`. To preserve
backward compatibility, the old names have been left as aliases.

The Wright Omega function is implemented as `scipy.special.wrightomega`.

`scipy.stats` improvements
--------------------------

The function `scipy.stats.weightedtau` was added. It provides a weighted
version of Kendall's tau.

New class `scipy.stats.multinomial` implements the multinomial
distribution.

New class `scipy.stats.rv_histogram` constructs a continuous univariate
distribution with a piecewise linear CDF from a binned data sample.

New class `scipy.stats.argus` implements the Argus distribution.

`scipy.interpolate` improvements
--------------------------------

New class `scipy.interpolate.BSpline` represents splines. ``BSpline``
objects contain knots and coefficients and can evaluate the spline. The
format is consistent with FITPACK, so that one can do, for example::

    >>> t, c, k = splrep(x, y, s=0)
    >>> spl = BSpline(t, c, k)
    >>> np.allclose(spl(x), y)

``spl*`` functions, `scipy.interpolate.splev`, `scipy.interpolate.splint`,
`scipy.interpolate.splder` and `scipy.interpolate.splantider`, accept both
``BSpline`` objects and ``(t, c, k)`` tuples for backwards compatibility.

For multidimensional splines, ``c.ndim > 1``, ``BSpline`` objects are
consistent with piecewise polynomials, `scipy.interpolate.PPoly`. This
means that ``BSpline`` objects are not immediately consistent with
`scipy.interpolate.splprep`, and one *cannot* do
``>>> BSpline(*splprep([x, y])[0])``. Consult the `scipy.interpolate` test
suite for examples of the precise equivalence.

In new code, prefer using ``scipy.interpolate.BSpline`` objects instead of
manipulating ``(t, c, k)`` tuples directly.

New function `scipy.interpolate.make_interp_spline` constructs an
interpolating spline given data points and boundary conditions.

New function `scipy.interpolate.make_lsq_spline` constructs a
least-squares spline approximation given data points.

`scipy.integrate` improvements
------------------------------

`scipy.integrate.fixed_quad` now supports vector-valued functions.

Deprecated features
===================

`scipy.interpolate.splmake`, `scipy.interpolate.spleval` and
`scipy.interpolate.spline` are deprecated. The format used by
`splmake/spleval` was inconsistent with `splrep/splev`, which was
confusing to users.

`scipy.special.errprint` is deprecated. Improved functionality is
available in `scipy.special.seterr`.

Calling `scipy.spatial.distance.pdist` or `scipy.spatial.distance.cdist`
with arguments not needed by the chosen metric is deprecated. Also,
metrics `"old_cosine"` and `"old_cos"` are deprecated.
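[Editor's example] Expanding the ``BSpline`` snippet above into a runnable
example (illustrative data)::

    import numpy as np
    from scipy.interpolate import splrep, BSpline

    x = np.linspace(0, 2 * np.pi, 20)
    y = np.sin(x)

    t, c, k = splrep(x, y, s=0)    # FITPACK-style knots, coefficients, degree
    spl = BSpline(t, c, k)         # the new object-oriented wrapper
    print(np.allclose(spl(x), y))  # True: the spline interpolates the data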
Backwards incompatible changes
==============================

The deprecated ``scipy.weave`` submodule was removed.

`scipy.spatial.distance.squareform` now returns arrays of the same dtype
as the input, instead of always float64.

`scipy.special.errprint` now returns a boolean.

The function `scipy.signal.find_peaks_cwt` now returns an array instead of
a list.

`scipy.stats.kendalltau` now computes the correct p-value in case the
input contains ties. The p-value is also identical to that computed by
`scipy.stats.mstats.kendalltau` and by R. If the input does not contain
ties there is no change w.r.t. the previous implementation.

The function `scipy.linalg.block_diag` will not ignore zero-sized matrices
anymore. Instead it will insert rows or columns of zeros of the
appropriate size. See gh-4908 for more details.

Other changes
=============

SciPy wheels will now report their dependency on ``numpy`` on all
platforms. This change was made because Numpy wheels are available, and
because the pip upgrade behavior is finally changing for the better (use
``--upgrade-strategy=only-if-needed`` for ``pip >= 8.2``; that behavior
will become the default in the next major version of ``pip``).

Numerical values returned by `scipy.interpolate.interp1d` with
``kind="cubic"`` and ``"quadratic"`` may change relative to previous scipy
versions. If your code depended on specific numeric values (i.e., on
implementation details of the interpolators), you may want to double-check
your results.

Authors
=======

* @endolith
* Max Argus +
* Hervé Audren
* Alessandro Pietro Bardelli +
* Michael Benfield +
* Felix Berkenkamp
* Matthew Brett
* Per Brodtkorb
* Evgeni Burovski
* Pierre de Buyl
* CJ Carey
* Brandon Carter +
* Tim Cera
* Klesk Chonkin
* Christian Häggström +
* Luca Citi
* Peadar Coyle +
* Daniel da Silva +
* Greg Dooper +
* John Draper +
* drlvk +
* David Ellis +
* Yu Feng
* Baptiste Fontaine +
* Jed Frey +
* Siddhartha Gandhi +
* GiggleLiu +
* Wim Glenn +
* Akash Goel +
* Christoph Gohlke
* Ralf Gommers
* Alexander Goncearenco +
* Richard Gowers +
* Alex Griffing
* Radoslaw Guzinski +
* Charles Harris
* Callum Jacob Hays +
* Ian Henriksen
* Randy Heydon +
* Lindsey Hiltner +
* Gerrit Holl +
* Hiroki IKEDA +
* jfinkels +
* Mher Kazandjian +
* Thomas Keck +
* keuj6 +
* Kornel Kielczewski +
* Sergey B Kirpichev +
* Vasily Kokorev +
* Eric Larson
* Denis Laxalde
* Gregory R. Lee
* Josh Lefler +
* Julien Lhermitte +
* Evan Limanto +
* Nikolay Mayorov
* Geordie McBain +
* Josue Melka +
* Matthieu Melot
* michaelvmartin15 +
* Surhud More +
* Brett M.
Morris + * Chris Mutel + * Paul Nation * Andrew Nelson * David Nicholson + * Aaron Nielsen + * Joel Nothman * nrnrk + * Juan Nunez-Iglesias * Mikhail Pak + * Gavin Parnaby + * Thomas Pingel + * Ilhan Polat + * Aman Pratik + * Sebastian Pucilowski * Ted Pudlik * puenka + * Eric Quintero * Tyler Reddy * Joscha Reimer * Antonio Horta Ribeiro + * Edward Richards + * Roman Ring + * Rafael Rossi + * Colm Ryan + * Sami Salonen + * Alvaro Sanchez-Gonzalez + * Johannes Schmitz * Kari Schoonbee * Yurii Shevchuk + * Jonathan Siebert + * Jonathan Tammo Siebert + * Scott Sievert + * Sourav Singh * Byron Smith + * Srikiran + * Samuel St-Jean + * Yoni Teitelbaum + * Bhavika Tekwani * Martin Thoma * timbalam + * Svend Vanderveken + * Sebastiano Vigna + * Aditya Vijaykumar + * Santi Villalba + * Ze Vinicius * Pauli Virtanen * Matteo Visconti * Yusuke Watanabe + * Warren Weckesser * Phillip Weinberg + * Nils Werner * Jakub Wilk * Josh Wilson * wirew0rm + * David Wolever + * Nathan Woods * ybeltukov + * G Young * Evgeny Zhurko + A total of 121 people contributed to this release. People with a "+" by their names contributed a patch for the first time. This list of names is automatically generated, and may not be fully complete. Issues closed for 0.19.0 - ------------------------ - - `#1767 `__: Function definitions in __fitpack.h should be moved. (Trac #1240) - - `#1774 `__: _kmeans chokes on large thresholds (Trac #1247) - - `#2089 `__: Integer overflows cause segfault in linkage function with large... - - `#2190 `__: Are odd-length window functions supposed to be always symmetrical?... - - `#2251 `__: solve_discrete_are in scipy.linalg does (sometimes) not solve... - - `#2580 `__: scipy.interpolate.UnivariateSpline (or a new superclass of it)... - - `#2592 `__: scipy.stats.anderson assumes gumbel_l - - `#3054 `__: scipy.linalg.eig does not handle infinite eigenvalues - - `#3160 `__: multinomial pmf / logpmf - - `#3904 `__: scipy.special.ellipj dn wrong values at quarter period - - `#4044 `__: Inconsistent code book initialization in kmeans - - `#4234 `__: scipy.signal.flattop documentation doesn't list a source for... - - `#4831 `__: Bugs in C code in __quadpack.h - - `#4908 `__: bug: unnessesary validity check for block dimension in scipy.sparse.block_diag - - `#4917 `__: BUG: indexing error for sparse matrix with ix_ - - `#4938 `__: Docs on extending ndimage need to be updated. - - `#5056 `__: sparse matrix element-wise multiplying dense matrix returns dense... - - `#5337 `__: Formula in documentation for correlate is wrong - - `#5537 `__: use OrderedDict in io.netcdf - - `#5750 `__: [doc] missing data index value in KDTree, cKDTree - - `#5755 `__: p-value computation in scipy.stats.kendalltau() in broken in... - - `#5757 `__: BUG: Incorrect complex output of signal.spectrogram - - `#5964 `__: ENH: expose scalar versions of scipy.special functions to cython - - `#6107 `__: scipy.cluster.hierarchy.single segmentation fault with 2**16... - - `#6278 `__: optimize.basinhopping should take a RandomState object - - `#6296 `__: InterpolatedUnivariateSpline: check_finite fails when w is unspecified - - `#6306 `__: Anderson-Darling bad results - - `#6314 `__: scipy.stats.kendaltau() p value not in agreement with R, SPSS... - - `#6340 `__: Curve_fit bounds and maxfev - - `#6377 `__: expm_multiply, complex matrices not working using start,stop,ect... - - `#6382 `__: optimize.differential_evolution stopping criterion has unintuitive... - - `#6391 `__: Global Benchmarking times out at 600s. 
- - `#6397 `__: mmwrite errors with large (but still 64-bit) integers - - `#6413 `__: scipy.stats.dirichlet computes multivariate gaussian differential... - - `#6428 `__: scipy.stats.mstats.mode modifies input - - `#6440 `__: Figure out ABI break policy for scipy.special Cython API - - `#6441 `__: Using Qhull for halfspace intersection : segfault - - `#6442 `__: scipy.spatial : In incremental mode volume is not recomputed - - `#6451 `__: Documentation for scipy.cluster.hierarchy.to_tree is confusing... - - `#6490 `__: interp1d (kind=zero) returns wrong value for rightmost interpolation... - - `#6521 `__: scipy.stats.entropy does *not* calculate the KL divergence - - `#6530 `__: scipy.stats.spearmanr unexpected NaN handling - - `#6541 `__: Test runner does not run scipy._lib/tests? - - `#6552 `__: BUG: misc.bytescale returns unexpected results when using cmin/cmax... - - `#6556 `__: RectSphereBivariateSpline(u, v, r) fails if min(v) >= pi - - `#6559 `__: Differential_evolution maxiter causing memory overflow - - `#6565 `__: Coverage of spectral functions could be improved - - `#6628 `__: Incorrect parameter name in binomial documentation - - `#6634 `__: Expose LAPACK's xGESVX family for linalg.solve ill-conditioned... - - `#6657 `__: Confusing documentation for `scipy.special.sph_harm` - - `#6676 `__: optimize: Incorrect size of Jacobian returned by `minimize(...,... - - `#6681 `__: add a new context manager to wrap `scipy.special.seterr` - - `#6700 `__: BUG: scipy.io.wavfile.read stays in infinite loop, warns on wav... - - `#6721 `__: scipy.special.chebyt(N) throw a 'TypeError' when N > 64 - - `#6727 `__: Documentation for scipy.stats.norm.fit is incorrect - - `#6764 `__: Documentation for scipy.spatial.Delaunay is partially incorrect - - `#6811 `__: scipy.spatial.SphericalVoronoi fails for large number of points - - `#6841 `__: spearmanr fails when nan_policy='omit' is set - - `#6869 `__: Currently in gaussian_kde, the logpdf function is calculated... - - `#6875 `__: SLSQP inconsistent handling of invalid bounds - - `#6876 `__: Python stopped working (Segfault?) with minimum/maximum filter... - - `#6889 `__: dblquad gives different results under scipy 0.17.1 and 0.18.1 - - `#6898 `__: BUG: dblquad ignores error tolerances - - `#6901 `__: Solving sparse linear systems in CSR format with complex values - - `#6903 `__: issue in spatial.distance.pdist docstring - - `#6917 `__: Problem in passing drop_rule to scipy.sparse.linalg.spilu - - `#6926 `__: signature mismatches for LowLevelCallable - - `#6961 `__: Scipy contains shebang pointing to /usr/bin/python and /bin/bash... 
- - `#6972 `__: BUG: special: `generate_ufuncs.py` is broken - - `#6984 `__: Assert raises test failure for test_ill_condition_warning - - `#6990 `__: BUG: sparse: Bad documentation of the `k` argument in `sparse.linalg.eigs` - - `#6991 `__: Division by zero in linregress() - - `#7011 `__: possible speed improvment in rv_continuous.fit() - - `#7015 `__: Test failure with Python 3.5 and numpy master Pull requests for 0.19.0 - ------------------------ - - `#2908 `__: Scipy 1.0 Roadmap - - `#3174 `__: add b-splines - - `#4606 `__: ENH: Add a unit impulse waveform function - - `#5608 `__: Adds keyword argument to choose faster convolution method - - `#5647 `__: ENH: Faster count_neighour in cKDTree / + weighted input data - - `#6021 `__: Netcdf append - - `#6058 `__: ENH: scipy.signal - Add stft and istft - - `#6059 `__: ENH: More accurate signal.freqresp for zpk systems - - `#6195 `__: ENH: Cython interface for special - - `#6234 `__: DOC: Fixed a typo in ward() help - - `#6261 `__: ENH: add docstring and clean up code for signal.normalize - - `#6270 `__: MAINT: special: add tests for cdflib - - `#6271 `__: Fix for scipy.cluster.hierarchy.is_isomorphic - - `#6273 `__: optimize: rewrite while loops as for loops - - `#6279 `__: MAINT: Bessel tweaks - - `#6291 `__: Fixes gh-6219: remove runtime warning from genextreme distribution - - `#6294 `__: STY: Some PEP8 and cleaning up imports in stats/_continuous_distns.py - - `#6297 `__: Clarify docs in misc/__init__.py - - `#6300 `__: ENH: sparse: Loosen input validation for `diags` with empty inputs - - `#6301 `__: BUG: standardizes check_finite behavior re optional weights,... - - `#6303 `__: Fixing example in _lazyselect docstring. - - `#6307 `__: MAINT: more improvements to gammainc/gammaincc - - `#6308 `__: Clarified documentation of hypergeometric distribution. - - `#6309 `__: BUG: stats: Improve calculation of the Anderson-Darling statistic. - - `#6315 `__: ENH: Descending order of x in PPoly - - `#6317 `__: ENH: stats: Add support for nan_policy to stats.median_test - - `#6321 `__: TST: fix a typo in test name - - `#6328 `__: ENH: sosfreqz - - `#6335 `__: Define LinregressResult outside of linregress - - `#6337 `__: In anderson test, added support for right skewed gumbel distribution. - - `#6341 `__: Accept several spellings for the curve_fit max number of function... - - `#6342 `__: DOC: cluster: clarify hierarchy.linkage usage - - `#6352 `__: DOC: removed brentq from its own 'see also' - - `#6362 `__: ENH: stats: Use explicit formulas for sf, logsf, etc in weibull... - - `#6369 `__: MAINT: special: add a comment to hyp0f1_complex - - `#6375 `__: Added the multinomial distribution. - - `#6387 `__: MAINT: special: improve accuracy of ellipj's `dn` at quarter... - - `#6388 `__: BenchmarkGlobal - getting it to work in Python3 - - `#6394 `__: ENH: scipy.sparse: add save and load functions for sparse matrices - - `#6400 `__: MAINT: moves global benchmark run from setup_cache to track_all - - `#6403 `__: ENH: seed kwd for basinhopping. Closes #6278 - - `#6404 `__: ENH: signal: added irrnotch and iirpeak functions. - - `#6406 `__: ENH: special: extend `sici`/`shichi` to complex arguments - - `#6407 `__: ENH: Window functions should not accept non-integer or negative... 
- - `#6408 `__: MAINT: _differentialevolution now uses _lib._util.check_random_state - - `#6427 `__: MAINT: Fix gmpy build & test that mpmath uses gmpy - - `#6439 `__: MAINT: ndimage: update callback function c api - - `#6443 `__: BUG: Fix volume computation in incremental mode - - `#6447 `__: Fixes issue #6413 - Minor documentation fix in the entropy function... - - `#6448 `__: ENH: Add halfspace mode to Qhull - - `#6449 `__: ENH: rtol and atol for differential_evolution termination fixes... - - `#6453 `__: DOC: Add some See Also links between similar functions - - `#6454 `__: DOC: linalg: clarify callable signature in `ordqz` - - `#6457 `__: ENH: spatial: enable non-double dtypes in squareform - - `#6459 `__: BUG: Complex matrices not handled correctly by expm_multiply... - - `#6465 `__: TST DOC Window docs, tests, etc. - - `#6469 `__: ENH: linalg: better handling of infinite eigenvalues in `eig`/`eigvals` - - `#6475 `__: DOC: calling interp1d/interp2d with NaNs is undefined - - `#6477 `__: Document magic numbers in optimize.py - - `#6481 `__: TST: Supress some warnings from test_windows - - `#6485 `__: DOC: spatial: correct typo in procrustes - - `#6487 `__: Fix Bray-Curtis formula in pdist docstring - - `#6493 `__: ENH: Add covariance functionality to scipy.optimize.curve_fit - - `#6494 `__: ENH: stats: Use log1p() to improve some calculations. - - `#6495 `__: BUG: Use MST algorithm instead of SLINK for single linkage clustering - - `#6497 `__: MRG: Add minimum_phase filter function - - `#6505 `__: reset scipy.signal.resample window shape to 1-D - - `#6507 `__: BUG: linkage: Raise exception if y contains non-finite elements - - `#6509 `__: ENH: _lib: add common machinery for low-level callback functions - - `#6520 `__: scipy.sparse.base.__mul__ non-numpy/scipy objects with 'shape'... - - `#6522 `__: Replace kl_div by rel_entr in entropy - - `#6524 `__: DOC: add next_fast_len to list of functions - - `#6527 `__: DOC: Release notes to reflect the new covariance feature in optimize.curve_fit - - `#6532 `__: ENH: Simplify _cos_win, document it, add symmetric/periodic arg - - `#6535 `__: MAINT: sparse.csgraph: updating old cython loops - - `#6540 `__: DOC: add to documentation of orthogonal polynomials - - `#6544 `__: TST: Ensure tests for scipy._lib are run by scipy.test() - - `#6546 `__: updated docstring of stats.linregress - - `#6553 `__: commited changes that I originally submitted for scipy.signal.cspline? - - `#6561 `__: BUG: modify signal.find_peaks_cwt() to return array and accept... - - `#6562 `__: DOC: Negative binomial distribution clarification - - `#6563 `__: MAINT: be more liberal in requiring numpy - - `#6567 `__: MAINT: use xrange for iteration in differential_evolution fixes... - - `#6572 `__: BUG: "sp.linalg.solve_discrete_are" fails for random data - - `#6578 `__: BUG: misc: allow both cmin/cmax and low/high params in bytescale - - `#6581 `__: Fix some unfortunate typos - - `#6582 `__: MAINT: linalg: make handling of infinite eigenvalues in `ordqz`... - - `#6585 `__: DOC: interpolate: correct seealso links to ndimage - - `#6588 `__: Update docstring of scipy.spatial.distance_matrix - - `#6592 `__: DOC: Replace 'first' by 'smallest' in mode - - `#6593 `__: MAINT: remove scipy.weave submodule - - `#6594 `__: DOC: distance.squareform: fix html docs, add note about dtype... 
- - `#6598 `__: [DOC] Fix incorrect error message in medfilt2d - - `#6599 `__: MAINT: linalg: turn a `solve_discrete_are` test back on - - `#6600 `__: DOC: Add SOS goals to roadmap - - `#6601 `__: DEP: Raise minimum numpy version to 1.8.2 - - `#6605 `__: MAINT: 'new' module is deprecated, don't use it - - `#6607 `__: DOC: add note on change in wheel dependency on numpy and pip... - - `#6609 `__: Fixes #6602 - Typo in docs - - `#6616 `__: ENH: generalization of continuous and discrete Riccati solvers... - - `#6621 `__: DOC: improve cluster.hierarchy docstrings. - - `#6623 `__: CS matrix prune method should copy data from large unpruned arrays - - `#6625 `__: DOC: special: complete documentation of `eval_*` functions - - `#6626 `__: TST: special: silence some deprecation warnings - - `#6631 `__: fix parameter name doc for discrete distributions - - `#6632 `__: MAINT: stats: change some instances of `special` to `sc` - - `#6633 `__: MAINT: refguide: py2k long integers are equal to py3k integers - - `#6638 `__: MAINT: change type declaration in cluster.linkage, prevent overflow - - `#6640 `__: BUG: fix issue with duplicate values used in cluster.vq.kmeans - - `#6641 `__: BUG: fix corner case in cluster.vq.kmeans for large thresholds - - `#6643 `__: MAINT: clean up truncation modes of dendrogram - - `#6645 `__: MAINT: special: rename `*_roots` functions - - `#6646 `__: MAINT: clean up mpmath imports - - `#6647 `__: DOC: add sqrt to Mahalanobis description for pdist - - `#6648 `__: DOC: special: add a section on `cython_special` to the tutorial - - `#6649 `__: ENH: Added scipy.spatial.distance.directed_hausdorff - - `#6650 `__: DOC: add Sphinx roles for DOI and arXiv links - - `#6651 `__: BUG: mstats: make sure mode(..., None) does not modify its input - - `#6652 `__: DOC: special: add section to tutorial on functions not in special - - `#6653 `__: ENH: special: add the Wright Omega function - - `#6656 `__: ENH: don't coerce input to double with custom metric in cdist... - - `#6658 `__: Faster/shorter code for computation of discordances - - `#6659 `__: DOC: special: make __init__ summaries and html summaries match - - `#6661 `__: general.rst: Fix a typo - - `#6664 `__: TST: Spectral functions' window correction factor - - `#6665 `__: [DOC] Conditions on v in RectSphereBivariateSpline - - `#6668 `__: DOC: Mention negative masses for center of mass - - `#6675 `__: MAINT: special: remove outdated README - - `#6677 `__: BUG: Fixes computation of p-values. - - `#6679 `__: BUG: optimize: return correct Jacobian for method 'SLSQP' in... - - `#6680 `__: ENH: Add structural rank to sparse.csgraph - - `#6686 `__: TST: Added Airspeed Velocity benchmarks for SphericalVoronoi - - `#6687 `__: DOC: add section "deciding on new features" to developer guide. - - `#6691 `__: ENH: Clearer error when fmin_slsqp obj doesn't return scalar - - `#6702 `__: TST: Added airspeed velocity benchmarks for scipy.spatial.distance.cdist - - `#6707 `__: TST: interpolate: test fitpack wrappers, not _impl - - `#6709 `__: TST: fix a number of test failures on 32-bit systems - - `#6711 `__: MAINT: move function definitions from __fitpack.h to _fitpackmodule.c - - `#6712 `__: MAINT: clean up wishlist in stats.morestats, and copyright statement. - - `#6715 `__: DOC: update the release notes with BSpline et al. - - `#6716 `__: MAINT: scipy.io.wavfile: No infinite loop when trying to read... - - `#6717 `__: some style cleanup - - `#6723 `__: BUG: special: cast to float before in-place multiplication in... 
- #6726: address performance regressions in interp1d
- #6728: DOC: made code examples in `integrate` tutorial copy-pasteable
- #6731: DOC: scipy.optimize: Added an example for wrapping complex-valued...
- #6732: MAINT: cython_special: remove `errprint`
- #6733: MAINT: special: fix some pyflakes warnings
- #6734: DOC: sparse.linalg: fixed matrix description in `bicgstab` doc
- #6737: BLD: update `cythonize.py` to detect changes in pxi files
- #6740: DOC: special: some small fixes to docstrings
- #6741: MAINT: remove dead code in interpolate.py
- #6742: BUG: fix ``linalg.block_diag`` to support zero-sized matrices.
- #6744: ENH: interpolate: make PPoly.from_spline accept BSpline objects
- #6746: DOC: special: clarify use of Condon-Shortley phase in `sph_harm`/`lpmv`
- #6750: ENH: sparse: avoid densification on broadcasted elem-wise mult
- #6751: sinm doc explained cosm
- #6753: ENH: special: allow for more fine-tuned error handling
- #6759: Move logsumexp and pade from scipy.misc to scipy.special and...
- #6761: ENH: argmax and argmin methods for sparse matrices
- #6762: DOC: Improve docstrings of sparse matrices
- #6763: ENH: Weighted tau
- #6768: ENH: cythonized spherical Voronoi region polygon vertex sorting
- #6770: Correction of Delaunay class' documentation
- #6775: ENH: Integrating LAPACK "expert" routines with conditioning warnings...
- #6776: MAINT: Removing the trivial f2py warnings
- #6777: DOC: Update rv_continuous.fit doc.
- #6778: MAINT: cluster.hierarchy: Improved wording of error msgs
- #6786: BLD: increase minimum Cython version to 0.23.4
- #6787: DOC: expand on ``linalg.block_diag`` changes in 0.19.0 release...
- #6789: ENH: Add further documentation for norm.fit
- #6790: MAINT: Fix a potential problem in nn_chain linkage algorithm
- #6791: DOC: Add examples to scipy.ndimage.fourier
- #6792: DOC: fix some numpydoc / Sphinx issues.
- #6793: MAINT: fix circular import after moving functions out of misc
- #6796: TST: test importing each submodule. Regression test for gh-6793.
- #6799: ENH: stats: Argus distribution
- #6801: ENH: stats: Histogram distribution
- #6803: TST: make sure tests for ``_build_utils`` are run.
- #6804: MAINT: more fixes in `loggamma`
- #6806: ENH: Faster linkage for 'centroid' and 'median' methods
- #6810: ENH: speed up upfirdn and resample_poly for n-dimensional arrays
- #6812: TST: Added ConvexHull asv benchmark code
- #6814: ENH: Different extrapolation modes for different dimensions in...
- #6826: Signal spectral window default fix
- #6828: BUG: SphericalVoronoi Space Complexity (Fixes #6811)
- #6830: RealData docstring correction
- #6834: DOC: Added reference for skewtest function. See #6829
- #6836: DOC: Added mode='mirror' in the docstring for the functions accepting...
- #6838: MAINT: sparse: start removing old BSR methods
- #6844: handle incompatible dimensions when input is not an ndarray in...
- #6847: Added maxiter to golden search.
- #6850: BUG: added check for optional param scipy.stats.spearmanr
- #6858: MAINT: Removing redundant tests
- #6861: DEP: Fix escape sequences deprecated in Python 3.6.
- #6862: DOC: dx should be float, not int
- #6863: updated documentation curve_fit
- #6866: DOC: added some documentation to j1 referring to spherical_jn
- #6867: DOC: cdist move long examples list into Notes section
- #6868: BUG: Make stats.mode return a ModeResult namedtuple on empty...
- #6871: Corrected documentation.
- #6874: ENH: gaussian_kde.logpdf based on logsumexp
- #6877: BUG: ndimage: guard against footprints of all zeros
- #6881: python 3.6
- #6885: Vectorized integrate.fixed_quad
- #6886: fixed typo
- #6891: TST: fix failures for linalg.dare/care due to tightened test...
- #6892: DOC: fix a bunch of Sphinx errors.
- #6894: TST: Added asv benchmarks for scipy.spatial.Voronoi
- #6908: BUG: Fix return dtype for complex input in spsolve
- #6909: ENH: fftpack: use float32 routines for float16 inputs.
- #6911: added min/max support to binned_statistic
- #6913: Fix 6875: SLSQP raise ValueError for all invalid bounds.
- #6914: DOCS: GH6903 updating docs of Spatial.distance.pdist
- #6916: MAINT: fix some issues for 32-bit Python
- #6924: BLD: update Bento build for scipy.LowLevelCallable
- #6932: ENH: Use OrderedDict in io.netcdf. Closes gh-5537
- #6933: BUG: fix LowLevelCallable issue on 32-bit Python.
- #6936: BUG: sparse: handle size-1 2D indexes correctly
- #6938: TST: fix test failures in special on 32-bit Python.
- #6939: Added attributes list to cKDTree docstring
- #6940: improve efficiency of dok_matrix.tocoo
- #6942: DOC: add link to liac-arff package in the io.arff docstring.
- #6943: MAINT: Docstring fixes and an additional test for linalg.solve
- #6944: DOC: Add example of odeint with a banded Jacobian to the integrate...
- #6946: ENH: hypergeom.logpmf in terms of betaln
- #6947: TST: speedup distance tests
- #6948: DEP: Deprecate the keyword "debug" from linalg.solve
- #6950: BUG: Correctly treat large integers in MMIO (fixes #6397)
- #6952: ENH: Minor user-friendliness cleanup in LowLevelCallable
- #6956: DOC: improve description of 'output' keyword for convolve
- #6957: ENH more informative error in sparse.bmat
- #6962: Shebang fixes
- #6964: DOC: note argmin/argmax addition
- #6965: BUG: Fix issues passing error tolerances in dblquad and tplquad.
- #6971: fix the docstring of signaltools.correlate
- #6973: Silence expected numpy warnings in scipy.ndimage.interpolation.zoom()
- #6975: BUG: special: fix regex in `generate_ufuncs.py`
- #6976: Update docstring for griddata
- #6978: Avoid division by zero in zoom factor calculation
- #6979: BUG: ARE solvers did not check the generalized case carefully
- #6985: ENH: sparse: add scipy.sparse.linalg.spsolve_triangular
- #6994: MAINT: spatial: updates to plotting utils
- #6995: DOC: Bad documentation of k in sparse.linalg.eigs See #6990
- #6997: TST: Changed the test with a less singular example
- #7000: DOC: clarify interp1d 'zero' argument
- #7007: BUG: Fix division by zero in linregress() for 2 data points
- #7009: BUG: Fix problem in passing drop_rule to scipy.sparse.linalg.spilu
- #7012: speed improvment in _distn_infrastructure.py
- #7014: Fix Typo: add a single quotation mark to fix a slight typo
- #7021: MAINT: stats: use machine constants from np.finfo, not machar
- #7026: MAINT: update .mailmap
- #7032: Fix layout of rv_histogram docs
- #7035: DOC: update 0.19.0 release notes
- #7036: ENH: Add more boundary options to signal.stft
- #7040: TST: stats: skip too slow tests
- #7042: MAINT: sparse: speed up setdiag tests
- #7043: MAINT: refactory and code cleaning Xdist
- #7053: Fix msvc 9 and 10 compile errors
- #7060: DOC: updated release notes with #7043 and #6656
- #7062: MAINT: Change defaut STFT boundary kwarg to "zeros"
- #7064: Fix ValueError: path is on mount 'X:', start on mount 'D:' on...
- #7067: TST: Fix PermissionError: [Errno 13] Permission denied on Windows
- #7068: TST: Fix UnboundLocalError: local variable 'data' referenced...
- #7069: Fix OverflowError: Python int too large to convert to C long...
- #7071: TST: silence RuntimeWarning for nan test of stats.spearmanr
- #7072: Fix OverflowError: Python int too large to convert to C long...
- #7084: TST: linalg: bump tolerance in test_falker

Checksums
=========

MD5
~~~

a1d4a08cb0408fda554787e502a99d89  scipy-0.19.0rc2.tar.gz
e5531359c06c6cccb544e0fcb728d832  scipy-0.19.0rc2.tar.xz
aad1a08d3eee196a6d3f850b61d8aec5  scipy-0.19.0rc2.zip

SHA256
~~~~~~

a9f978f8cc569138d16f0b115476da8355fd149cbacefed7f283682d52540d05  scipy-0.19.0rc2.tar.gz
643d47551d16d965efe4f16613ee71ac6396481df649df3ad3a63848776cc084  scipy-0.19.0rc2.tar.xz
e8f80a26a1089b35b7f410509fa703598a4cd74dd42636f22aa1b65085beaf9f  scipy-0.19.0rc2.zip

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBCAAGBQJYsEDoAAoJEIp0pQ0zQcu+UjAH/1gErT0trZxgNqtxEtIbQ0RP
V9EIzQ3kj1QJHx7KKgheswHT4Ee6LYhA5MXzMAIDRx2TIMzE3ByBktK/So6+8U96
jCkceIHMwHgVnzxA8lkoB6AILHWMaaN/jNKk0u9Y6IKwUY4S1kQkYysAlMOw3gaG
g1fOJzw1Bn4Yq0dEXw+tAAFRVPSAYKohJgc7U6AvW435VI3QmOqsCCbwh2ua1oN6
egaIBD0QBOGVYb7vlhOMiKPKKkIhEY2DVXnA7EVCiJMykq/dc+pReIL8YNZ+xI6F
WUNv1KUAmVfKXU5MfSwTcxLLH89HJ7WfE8asCmVpgavhqdk0i7D4zVzN20uGe18=
=f4kz
-----END PGP SIGNATURE-----
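A downloaded release artifact can be checked against the SHA256 table above
with coreutils; a minimal sketch, assuming the tarball sits in the current
working directory (the hash is copied from the table, and note the two
spaces between hash and filename that sha256sum's check format expects):

    echo "a9f978f8cc569138d16f0b115476da8355fd149cbacefed7f283682d52540d05  scipy-0.19.0rc2.tar.gz" | sha256sum -c -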
From evgeny.burovskiy at gmail.com Fri Feb 24 10:00:10 2017
From: evgeny.burovskiy at gmail.com (Evgeni Burovski)
Date: Fri, 24 Feb 2017 18:00:10 +0300
Subject: [Numpy-discussion] Could we simplify backporting?

> I really don't like the double work and the large amount of noise coming
> from backporting every other PR to NumPy very quickly. For SciPy the policy
> is:
> - anyone can set the "backport-candidate" label
> - the release manager backports, usually a bunch in one go
> - only important fixes get backported (involves some judging, but things
> like silencing warnings, doc fixes, etc. are not important enough)
>
> This works well, and I'd hope that we can make the NumPy approach similar.

Just to add to what Ralf is saying:

* people sometimes send PRs against maintenance branches instead of
master. In scipy we just label these as backport-candidate, and then
the RM sorts them out: which ones to forward port and which ones to
backport. This works OK on scipy scale (I had just trawled through a
half dozen or so). If numpy needs more backport activity, it might
make sense to have separate labels for backport-candidate and
needs-forward-port.

* A while ago Julian was advocating for some git magic of basing PRs
on the common merge base for master and maintenance branches, so that
a commit can be merged directly without a cherry-pick (I think). This
seems to be beyond common git-fu (beyond mine for sure!). What I did
in scipy, I just edited the commit messages after cherry-picking to
add a reference to the original PR a commit was cherry-picked from.

Cheers,

Evgeni

From jtaylor.debian at googlemail.com Fri Feb 24 10:11:38 2017
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Fri, 24 Feb 2017 16:11:38 +0100
Subject: [Numpy-discussion] Could we simplify backporting?
Message-ID: <0379a08c-0029-5fc3-06e6-4574f2f8f586@googlemail.com>

On 24.02.2017 16:00, Evgeni Burovski wrote:
>> I really don't like the double work and the large amount of noise coming
>> from backporting every other PR to NumPy very quickly. For SciPy the policy
>> is:
>> - anyone can set the "backport-candidate" label
>> - the release manager backports, usually a bunch in one go
>> - only important fixes get backported (involves some judging, but things
>> like silencing warnings, doc fixes, etc. are not important enough)
>>
>> This works well, and I'd hope that we can make the NumPy approach similar.
>
> * A while ago Julian was advocating for some git magic of basing PRs
> on the common merge base for master and maintenance branches, so that
> a commit can be merged directly without a cherry-pick (I think). This
> seems to be beyond common git-fu (beyond mine for sure!). What I did
> in scipy, I just edited the commit messages after cherry-picking to
> add a reference to the original PR a commit was cherry-picked from.

from the bugfix branch:

git rebase --onto $(git merge-base master maintenance) HEAD^
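To spell out the two recipes in this thread, here is a minimal sketch; the
branch name `maintenance` follows Julian's command above, and the commit
hash is a placeholder, not a real numpy commit:

    # Cherry-pick route: -x appends a "(cherry picked from commit ...)"
    # line to the message, one way to keep the reference Evgeni adds by
    # hand after cherry-picking.
    git checkout maintenance            # placeholder maintenance branch
    git cherry-pick -x <commit-sha>     # placeholder commit hash

    # Julian's route: rebase the fix onto the common merge base of master
    # and the maintenance branch, so the same commit can then be merged
    # into both branches without any cherry-pick.
    git rebase --onto $(git merge-base master maintenance) HEAD^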
From charlesr.harris at gmail.com Fri Feb 24 10:22:43 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 24 Feb 2017 08:22:43 -0700
Subject: [Numpy-discussion] Could we simplify backporting?

On Fri, Feb 24, 2017 at 8:00 AM, Evgeni Burovski wrote:

>> I really don't like the double work and the large amount of noise coming
>> from backporting every other PR to NumPy very quickly. For SciPy the policy
>> is:
>> - anyone can set the "backport-candidate" label
>> - the release manager backports, usually a bunch in one go
>> - only important fixes get backported (involves some judging, but things
>> like silencing warnings, doc fixes, etc. are not important enough)
>>
>> This works well, and I'd hope that we can make the NumPy approach similar.
>
> Just to add to what Ralf is saying:
>
> * people sometimes send PRs against maintenance branches instead of
> master. In scipy we just label these as backport-candidate, and then
> the RM sorts them out: which ones to forward port and which ones to
> backport. This works OK on scipy scale (I had just trawled through a
> half dozen or so). If numpy needs more backport activity, it might
> make sense to have separate labels for backport-candidate and
> needs-forward-port.
>
> * A while ago Julian was advocating for some git magic of basing PRs
> on the common merge base for master and maintenance branches, so that
> a commit can be merged directly without a cherry-pick (I think). This
> seems to be beyond common git-fu (beyond mine for sure!). What I did
> in scipy, I just edited the commit messages after cherry-picking to
> add a reference to the original PR a commit was cherry-picked from.

Cherry-picking is easier, especially when there are only a few backports
without conflicts.

Chuck

From matthew.brett at gmail.com Sat Feb 25 02:13:00 2017
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 24 Feb 2017 23:13:00 -0800
Subject: [Numpy-discussion] PowerPC testing servers

On Wed, Feb 15, 2017 at 6:53 PM, Sandro Tosi wrote:
>> A recent post to the wheel-builders mailing list pointed out some
>> links to places providing free PowerPC hosting for open source
>> projects, if they agree to a submitted request:
>
> The debian project has some powerpc machines (and we still build numpy
> on those boxes when I upload a new revision to our archives) and they
> also have hosts dedicated to let debian developers login and debug
> issues with their packages on that architecture. I can sponsor access
> to those machines for some of you, but it is not a place where you can
> host a CI instance.
>
> Just keep it in mind more broadly than powerpc, f.e. these are all the
> archs where numpy was built after the last upload
> https://buildd.debian.org/status/package.php?p=python-numpy&suite=unstable
> (the grayed out archs are the ones non release critical, so packages
> are built as best effort and if missing is not a big deal)

Numpy master now passes all tests on PPC64el. Still a couple of
remaining failures for PPC64 (big-endian):

* https://github.com/numpy/numpy/pull/8566
* https://github.com/numpy/numpy/issues/8325

Cheers,

Matthew
From jtaylor.debian at googlemail.com Sat Feb 25 05:17:55 2017
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Sat, 25 Feb 2017 11:17:55 +0100
Subject: [Numpy-discussion] automatically avoiding temporary arrays
In-Reply-To: <283e3000-0b9c-886c-e322-1ff4d2e8cb26@googlemail.com>
References: <283e3000-0b9c-886c-e322-1ff4d2e8cb26@googlemail.com>

hi,
This PR has now been merged. As it has the potential to break stuff,
please test your applications, in particular ones that use cython and
C-extensions, against master.

It will only do something when working on large arrays (> 256 KiB) on
platforms providing the backtrace() function. So Linux and Mac, but
probably not Windows (though Windows could probably be added if someone
volunteers to write the equivalent winapi code).

If you compile from source and want to see for sure if it is doing
anything, you can set the NPY_ELIDE_DEBUG define in
./numpy/core/src/multiarray/temp_elide.c to 1 or 2 and it will print out
when it is eliding something.

cheers,
Julian

On 30.09.2016 15:38, Julian Taylor wrote:
> hi,
> Temporary arrays generated in expressions are expensive, as they imply
> extra memory bandwidth, which is the bottleneck in most numpy operations.
> For example:
>
> r = a + b + c
>
> creates the b + c temporary and then adds a to it.
> This can be rewritten to be more efficient using inplace operations:
>
> r = b + c
> r += a
>
> This saves some memory bandwidth and can speed up the operation by 50%
> for very large arrays, or even more if the inplace operation allows it to
> be completed entirely in the CPU cache.
>
> The problem is that inplace operations are a lot less readable, so they
> are often only used in well optimized code. But due to Python's
> refcounting semantics we can actually do some inplace conversions
> transparently.
> If an operand in python has a reference count of one it must be a
> temporary, so we can use it as the destination array. CPython itself does
> this optimization for string concatenations.
>
> In numpy we have the issue that we can be called from the C-API directly,
> where the reference count may be one for other reasons.
> To solve this we can check the backtrace until the python frame
> evaluation function. If there are only numpy and python functions in
> between that and our entry point, we should be able to elide the temporary.
>
> This PR implements this:
> https://github.com/numpy/numpy/pull/7997
>
> It currently only supports Linux with glibc (which has reliable
> backtraces via unwinding) and maybe MacOS depending on how good their
> backtrace is. On windows the backtrace APIs are different and I don't
> know them, but in theory it could also be done there.
>
> A problem is that checking the backtrace is quite expensive, so it should
> only be enabled when the involved arrays are large enough for it to be
> worthwhile. In my testing this seems to be around 180-300KiB sized
> arrays, basically where they start spilling out of the CPU L2 cache.
>
> I made a little crappy benchmark script to test this cutoff in this branch:
> https://github.com/juliantaylor/numpy/tree/elide-bench
>
> If you are interested you can run it with:
> python setup.py build_ext -j 4 --inplace
> ipython --profile=null check.ipy
>
> At the end it will plot the ratio between elided and non-elided runtime.
> It should get larger than one around 180KiB on most cpus.
>
> If no one points out some flaw in the approach, I'm hoping to get this
> into the next numpy version.
>
> cheers,
> Julian
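To make the transformation concrete, this is the rewrite the elision
machinery performs automatically, done here by hand; a small sketch, with
an arbitrary array size chosen to be well above the 256 KiB cutoff
mentioned above:

    import numpy as np

    n = 1000000                  # 8 MB per float64 array, above the cutoff
    a = np.full(n, 1.0)
    b = np.full(n, 2.0)
    c = np.full(n, 3.0)

    # Naive form: (a + b) is materialized as a temporary, then c is added.
    r1 = a + b + c

    # Hand-written in-place form: one temporary fewer, same result.
    r2 = a + b
    r2 += c

    assert np.array_equal(r1, r2)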
From charlesr.harris at gmail.com Sat Feb 25 10:21:42 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 25 Feb 2017 08:21:42 -0700
Subject: [Numpy-discussion] Removal of some numpy files

Hi All,

While looking through the numpy tools directory I noticed some scripts
that look outdated and might be candidates for removal:

1. tools/numpy-macosx-installer/
2. tools/win32build/

Does anyone know if either of those is still relevant?

Cheers,

Chuck

From cournape at gmail.com Sat Feb 25 10:48:43 2017
From: cournape at gmail.com (David Cournapeau)
Date: Sat, 25 Feb 2017 15:48:43 +0000
Subject: [Numpy-discussion] Removal of some numpy files

tools/win32build is used to build the so-called superpack installers,
which we don't build anymore AFAIK.

tools/numpy-macosx-installer is used to build the .dmg for numpy (also
not used anymore AFAIK).

On Sat, Feb 25, 2017 at 3:21 PM, Charles R Harris wrote:

> While looking through the numpy tools directory I noticed some scripts
> that look outdated and might be candidates for removal:
>
> 1. tools/numpy-macosx-installer/
> 2. tools/win32build/
>
> Does anyone know if either of those is still relevant?

From matthew.brett at gmail.com Sat Feb 25 16:34:10 2017
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 25 Feb 2017 13:34:10 -0800
Subject: [Numpy-discussion] Removal of some numpy files

On Sat, Feb 25, 2017 at 7:48 AM, David Cournapeau wrote:
> tools/win32build is used to build the so-called superpack installers,
> which we don't build anymore AFAIK.
>
> tools/numpy-macosx-installer is used to build the .dmg for numpy (also
> not used anymore AFAIK).

No, we aren't using the .dmg script anymore; dmg installers have been
fully replaced by wheels.

Cheers,

Matthew

From charlesr.harris at gmail.com Sat Feb 25 17:22:19 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 25 Feb 2017 15:22:19 -0700
Subject: [Numpy-discussion] Removal of some numpy files

On Sat, Feb 25, 2017 at 2:34 PM, Matthew Brett wrote:

> No, we aren't using the .dmg script anymore; dmg installers have been
> fully replaced by wheels.

I've put up a PR, #8695, to do this.

Chuck
From ben.v.root at gmail.com Mon Feb 27 13:43:23 2017
From: ben.v.root at gmail.com (Benjamin Root)
Date: Mon, 27 Feb 2017 13:43:23 -0500
Subject: [Numpy-discussion] automatically avoiding temporary arrays
References: <283e3000-0b9c-886c-e322-1ff4d2e8cb26@googlemail.com>

What's the timeline for the next release? I have the perfect use case for
this (a Haversine calculation on large arrays that takes up ~33% of one of
my processing scripts). However, to test it out, I have a huge dependency
mess to wade through first, and there are no resources devoted to that
project for at least a few weeks. I want to make sure I get feedback to
y'all.

Cheers!
Ben Root

On Sat, Feb 25, 2017 at 5:17 AM, Julian Taylor wrote:

> hi,
> This PR has now been merged. As it has the potential to break stuff,
> please test your applications, in particular ones that use cython and
> C-extensions, against master.
>
> It will only do something when working on large arrays (> 256 KiB) on
> platforms providing the backtrace() function. So Linux and Mac, but
> probably not Windows (though Windows could probably be added if someone
> volunteers to write the equivalent winapi code).
>
> If you compile from source and want to see for sure if it is doing
> anything, you can set the NPY_ELIDE_DEBUG define in
> ./numpy/core/src/multiarray/temp_elide.c to 1 or 2 and it will print out
> when it is eliding something.
>
> cheers,
> Julian
From charlesr.harris at gmail.com Mon Feb 27 14:27:48 2017
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 27 Feb 2017 12:27:48 -0700
Subject: [Numpy-discussion] automatically avoiding temporary arrays
References: <283e3000-0b9c-886c-e322-1ff4d2e8cb26@googlemail.com>

On Mon, Feb 27, 2017 at 11:43 AM, Benjamin Root wrote:

> What's the timeline for the next release? I have the perfect use case for
> this (a Haversine calculation on large arrays that takes up ~33% of one of
> my processing scripts). However, to test it out, I have a huge dependency
> mess to wade through first, and there are no resources devoted to that
> project for at least a few weeks. I want to make sure I get feedback to
> y'all.

I'd like to branch 1.13.x at the end of March. The planned features that
still need to go in are the `__array_ufunc__` work and the `lapack_lite`
update. The first RC should not take much longer. I believe Matthew is
building wheels for testing on the fly, but I don't know where you can
find them.

Chuck

From matthew.brett at gmail.com Mon Feb 27 14:31:53 2017
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 27 Feb 2017 11:31:53 -0800
Subject: [Numpy-discussion] automatically avoiding temporary arrays

Hi,

On Mon, Feb 27, 2017 at 11:27 AM, Charles R Harris wrote:
> I'd like to branch 1.13.x at the end of March. The planned features that
> still need to go in are the `__array_ufunc__` work and the `lapack_lite`
> update. The first RC should not take much longer. I believe Matthew is
> building wheels for testing on the fly, but I don't know where you can
> find them.

Latest wheels at:
https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com/

PRE_URL=https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com
pip install -f $PRE_URL --pre numpy

Cheers,

Matthew
From ben.v.root at gmail.com Mon Feb 27 14:46:18 2017
From: ben.v.root at gmail.com (Benjamin Root)
Date: Mon, 27 Feb 2017 14:46:18 -0500
Subject: [Numpy-discussion] automatically avoiding temporary arrays

Ah, that wheel would be a huge help. Most of the packages I have as
dependencies for this project were compiled against v1.10, so I am hoping
that there won't be too big of a problem. Are the manylinux wheels still
compatible with CentOS5, or has the base image been bumped to CentOS6?

Cheers!
Ben Root

On Mon, Feb 27, 2017 at 2:31 PM, Matthew Brett wrote:

> Latest wheels at:
> https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com/
>
> PRE_URL=https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com
> pip install -f $PRE_URL --pre numpy

From matthew.brett at gmail.com Mon Feb 27 14:52:31 2017
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 27 Feb 2017 11:52:31 -0800
Subject: [Numpy-discussion] automatically avoiding temporary arrays

Hi,

On Mon, Feb 27, 2017 at 11:46 AM, Benjamin Root wrote:
> Ah, that wheel would be a huge help. Most of the packages I have as
> dependencies for this project were compiled against v1.10, so I am hoping
> that there won't be too big of a problem.

I don't think so - the problems generally arise when you try to
import a package compiled against a more recent version of numpy than
the one you have installed.

> Are the manylinux wheels still
> compatible with CentOS5, or has the base image been bumped to CentOS6?

Yup, still CentOS5. When it shifts to CentOS6, probably not very
soon, you'll likely see a version number bump to manylinux2
(manylinux1 is the current version).

Cheers,

Matthew

From matthew.brett at gmail.com Tue Feb 28 03:43:47 2017
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 28 Feb 2017 00:43:47 -0800
Subject: [Numpy-discussion] Daily numpy wheel builds - prefer to per-commit builds?

Hi,

I've been working to get daily travis-ci cron-job manylinux builds
working for numpy and scipy wheels. They are now working OK:

https://travis-ci.org/MacPython/numpy-wheels
https://travis-ci.org/MacPython/scipy-wheels
https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com

These are daily builds of the respective numpy and scipy master branches.

Numpy already has a system, kindly worked up by Olivier Grisel, which
uploads wheel builds from each travis-ci run. travis-ci uploads these
wheels for every commit, at the "travis-dev-wheels" container on
Rackspace, visible at
https://f66d8a5767b134cb96d3-4ffdece11fd3f72855e4665bc61c7445.ssl.cf2.rackcdn.com
. These builds are specific to the Linux they were built on - in this
case Ubuntu 12.04.

Some projects use these per-commit builds for testing compatibility
with development Numpy code.

Now I'm wondering what the relationship should be between the current
every-commit builds and the new wheel builds.
I think that we should prefer the new wheel builds and deprecate the
previous per-commit builds, because:

* the new manylinux wheels are self-contained, and so can be installed
  with pip without extra lines of `apt` installs;
* the manylinux builds work on any travis container, not just the
  current 12.04 container;
* manylinux builds should be faster, as they are linked against OpenBLAS;
* manylinux wheels are closer to the wheels we distribute for releases,
  and therefore more useful for testing against.

What do y'all think?

Cheers,

Matthew
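As a sketch of how a downstream project might pick these wheels up in its
own CI, reusing the pip incantation from earlier in this digest (the
container URL is the one quoted above; --pre lets pip select the
development-versioned builds):

    PRE_URL=https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com
    pip install --pre --upgrade -f $PRE_URL numpy
    python -c "import numpy; print(numpy.__version__)"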
From ralf.gommers at gmail.com Tue Feb 28 03:50:42 2017
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 28 Feb 2017 21:50:42 +1300
Subject: [Numpy-discussion] Daily numpy wheel builds - prefer to per-commit builds?

On Tue, Feb 28, 2017 at 9:43 PM, Matthew Brett wrote:

> Now I'm wondering what the relationship should be between the current
> every-commit builds and the new wheel builds.
>
> I think that we should prefer the new wheel builds and deprecate the
> previous per-commit builds [...]
>
> What do y'all think?

Uploading your daily Linux and OS X builds and just turning off uploads of
the per-commit builds sounds like an improvement to me.

Ralf

From cimrman3 at ntc.zcu.cz Tue Feb 28 05:51:05 2017
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Tue, 28 Feb 2017 11:51:05 +0100
Subject: [Numpy-discussion] ANN: SfePy 2017.1
Message-ID: <8f48385c-0507-1820-c8d4-8758c3b14c11@ntc.zcu.cz>

I am pleased to announce release 2017.1 of SfePy.

Description
-----------

SfePy (simple finite elements in Python) is a software for solving
systems of coupled partial differential equations by the finite element
method or by isogeometric analysis (limited support). It is distributed
under the new BSD license.

Home page: http://sfepy.org
Mailing list: http://groups.google.com/group/sfepy-devel
Git (source) repository, issue tracker: https://github.com/sfepy/sfepy

Highlights of this release
--------------------------

- spline-box parametrization of an arbitrary field
- conda-forge recipe (thanks to Daniel Wheeler)
- fixes for Python 3.6

For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1
(rather long and technical).

Cheers,
Robert Cimrman

---

Contributors to this release in alphabetical order:

Siwei Chen
Robert Cimrman
Jan Heczko
Vladimir Lukes
Matyas Novak

From rmay31 at gmail.com Tue Feb 28 12:32:10 2017
From: rmay31 at gmail.com (Ryan May)
Date: Tue, 28 Feb 2017 10:32:10 -0700
Subject: [Numpy-discussion] Arrays and format()

Hi,

Can someone take a look at: https://github.com/numpy/numpy/issues/7978

The crux of the issue is that this:

# This works
a = "%0.3g" % np.array(2)
a
'2'

# This does not
a = "{0:0.3g}".format(np.array(2))
TypeError: non-empty format string passed to object.__format__

I've now hit this in my code. If someone can even point me in the general
direction of the code to dig into for this (please let it be python,
please let it be python...), I'll dig in more.

Ryan

--
Ryan May
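Until ndarray grows a __format__ implementation (a proposal is discussed
in the replies below), a couple of workarounds exist for the
zero-dimensional case; a sketch, assuming the array really is 0-d and
numeric:

    import numpy as np

    x = np.array(2)

    # Extract a native Python scalar first; .item() does this for 0-d arrays.
    print("{0:0.3g}".format(x.item()))   # -> 2

    # An explicit float() conversion also works for numeric 0-d arrays.
    print("{0:0.3g}".format(float(x)))   # -> 2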
From nathan12343 at gmail.com Tue Feb 28 12:38:02 2017
From: nathan12343 at gmail.com (Nathan Goldbaum)
Date: Tue, 28 Feb 2017 17:38:02 +0000
Subject: [Numpy-discussion] Arrays and format()

See this issue:

https://github.com/numpy/numpy/issues/5543

There was also a very thorough discussion of this recently on this
mailing list:

http://numpy-discussion.10968.n7.nabble.com/Proposal-to-support-format-td43931.html

On Tue, Feb 28, 2017 at 11:32 AM Ryan May wrote:

> Can someone take a look at: https://github.com/numpy/numpy/issues/7978
>
> The crux of the issue is that this:
>
> # This works
> a = "%0.3g" % np.array(2)
> a
> '2'
>
> # This does not
> a = "{0:0.3g}".format(np.array(2))
> TypeError: non-empty format string passed to object.__format__

From larsson at cs.uchicago.edu Tue Feb 28 14:59:27 2017
From: larsson at cs.uchicago.edu (Gustav Larsson)
Date: Tue, 28 Feb 2017 11:59:27 -0800
Subject: [Numpy-discussion] Arrays and format()

I am hoping to submit a PR for a __format__ numpy enhancement proposal
this weekend. It will be a slightly revised version of my original draft
posted here two weeks ago. Ryan, if you have any thoughts on the writeup
so far, I'd love to hear them.

On Tue, Feb 28, 2017 at 9:38 AM, Nathan Goldbaum wrote:

> See this issue:
>
> https://github.com/numpy/numpy/issues/5543
>
> There was also a very thorough discussion of this recently on this
> mailing list:
>
> http://numpy-discussion.10968.n7.nabble.com/Proposal-to-support-format-td43931.html

From rmay31 at gmail.com Tue Feb 28 15:32:05 2017
From: rmay31 at gmail.com (Ryan May)
Date: Tue, 28 Feb 2017 13:32:05 -0700
Subject: [Numpy-discussion] Arrays and format()

Gustav,

I had seen this discussion, but completely blanked when I posted my
problem. I looked over the proposal and nothing jumped out at me on a
quick read-through; it seemed straightforward and would meet my needs.
I'll try to carve out some time to think a bit more about it and let you
know if anything jumps out.

Ryan

On Tue, Feb 28, 2017 at 12:59 PM, Gustav Larsson wrote:

> I am hoping to submit a PR for a __format__ numpy enhancement proposal
> this weekend. It will be a slightly revised version of my original draft
> posted here two weeks ago. Ryan, if you have any thoughts on the writeup
> so far, I'd love to hear them.

--
Ryan May
From sebastiankaster at googlemail.com Tue Feb 28 16:47:16 2017
From: sebastiankaster at googlemail.com (Sebastian K)
Date: Tue, 28 Feb 2017 22:47:16 +0100
Subject: [Numpy-discussion] Numpy Overhead

Hello everyone,

I'm interested in the numpy project and have experimented a lot with the
numpy array. I'm wondering what is actually done that there is so much
overhead when I call a function in Numpy. What is the reason?
Thanks in advance.

Regards

Sebastian Kaster

From ben.v.root at gmail.com Tue Feb 28 17:03:18 2017
From: ben.v.root at gmail.com (Benjamin Root)
Date: Tue, 28 Feb 2017 17:03:18 -0500
Subject: [Numpy-discussion] Numpy Overhead

You are going to need to provide much more context than that. Overhead
compared to what? And where (io, cpu, etc.)? What are the size of your
arrays, and what sort of operations are you doing? Finally, how much
overhead are you seeing?

There can be all sorts of reasons for overhead, and some can easily be
mitigated, and others not so much.

Cheers!
Ben Root

On Tue, Feb 28, 2017 at 4:47 PM, Sebastian K wrote:

> I'm interested in the numpy project and have experimented a lot with the
> numpy array. I'm wondering what is actually done that there is so much
> overhead when I call a function in Numpy. What is the reason?

From sebastiankaster at googlemail.com Tue Feb 28 17:12:16 2017
From: sebastiankaster at googlemail.com (Sebastian K)
Date: Tue, 28 Feb 2017 23:12:16 +0100
Subject: [Numpy-discussion] Numpy Overhead

Thank you for your answer.

For example, a very simple algorithm is a matrix multiplication. I can see
that the heap peak is much higher for the numpy version in comparison to a
pure python 3 implementation.

The heap is measured with libmemusage from libc:

heap peak
    Maximum of all size arguments of malloc(3), all products
    of nmemb*size of calloc(3), all size arguments of
    realloc(3), length arguments of mmap(2), and new_size
    arguments of mremap(2).

Regards

Sebastian

On 28 Feb 2017 11:03 p.m., "Benjamin Root" wrote:

> You are going to need to provide much more context than that. Overhead
> compared to what? And where (io, cpu, etc.)? What are the size of your
> arrays, and what sort of operations are you doing? Finally, how much
> overhead are you seeing?
>
> There can be all sorts of reasons for overhead, and some can easily be
> mitigated, and others not so much.
>
> Cheers!
> Ben Root
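For measuring allocation peaks without libmemusage, the standard library's
tracemalloc module (Python 3.4+) is one alternative; a sketch — note that
it only sees allocations made through Python's allocator, so absolute
numbers will differ from malloc-level tools, but relative comparisons are
often what matters:

    import tracemalloc
    import numpy as np

    tracemalloc.start()
    a = np.ones((100, 100))
    b = np.ones((100, 100))
    c = a.dot(b)
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print("current: %d bytes, peak: %d bytes" % (current, peak))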
From jfoxrabinovitz at gmail.com Tue Feb 28 17:17:33 2017
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Tue, 28 Feb 2017 17:17:33 -0500
Subject: [Numpy-discussion] Numpy Overhead

It would really help to see the code you are using in both cases, as well
as some heap usage numbers...

-Joe

On Tue, Feb 28, 2017 at 5:12 PM, Sebastian K wrote:

> For example, a very simple algorithm is a matrix multiplication. I can see
> that the heap peak is much higher for the numpy version in comparison to a
> pure python 3 implementation.
>
> The heap is measured with libmemusage from libc:
>
> heap peak
>     Maximum of all size arguments of malloc(3), all products
>     of nmemb*size of calloc(3), all size arguments of
>     realloc(3), length arguments of mmap(2), and new_size
>     arguments of mremap(2).
From matthew.brett at gmail.com Tue Feb 28 17:17:15 2017
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 28 Feb 2017 14:17:15 -0800
Subject: [Numpy-discussion] Numpy Overhead

Hi,

On Tue, Feb 28, 2017 at 2:12 PM, Sebastian K wrote:
> Thank you for your answer.
> For example, a very simple algorithm is a matrix multiplication. I can see
> that the heap peak is much higher for the numpy version in comparison to a
> pure python 3 implementation.

Could you post the exact code you're comparing?

I think you'll find that a naive Python 3 matrix multiplication method
is much, much slower than the same thing with Numpy, with arrays of
any reasonable size.

Cheers,

Matthew

From sebastiankaster at googlemail.com Tue Feb 28 17:57:49 2017
From: sebastiankaster at googlemail.com (Sebastian K)
Date: Tue, 28 Feb 2017 23:57:49 +0100
Subject: [Numpy-discussion] Numpy Overhead

Yes, it is true that the execution time is much faster with the numpy
function.

The code for the numpy version:

import numpy as np

def createMatrix(n):
    Matrix = np.empty(shape=(n,n), dtype='float64')
    for x in range(n):
        for y in range(n):
            Matrix[x, y] = 0.1 + ((x*y)%1000)/1000.0
    return Matrix


if __name__ == '__main__':
    n = getDimension()
    if n > 0:
        A = createMatrix(n)
        B = createMatrix(n)
        C = np.empty(shape=(n,n), dtype='float64')
        C = np.dot(A,B)

        #print(C)

In the pure python version I am just implementing the multiplication with
three for-loops.

Measured data with libmemusage:

dimension of matrix: 100x100
heap peak pure python3:   1060565
heap peak numpy function: 4917180

2017-02-28 23:17 GMT+01:00 Matthew Brett:

> Could you post the exact code you're comparing?
>
> I think you'll find that a naive Python 3 matrix multiplication method
> is much, much slower than the same thing with Numpy, with arrays of
> any reasonable size.
From jfoxrabinovitz at gmail.com Tue Feb 28 18:00:07 2017
From: jfoxrabinovitz at gmail.com (Joseph Fox-Rabinovitz)
Date: Tue, 28 Feb 2017 18:00:07 -0500
Subject: [Numpy-discussion] Numpy Overhead

For one thing, `C = np.empty(shape=(n,n), dtype='float64')` allocates 10^4
extra elements before being immediately discarded.

-Joe

On Tue, Feb 28, 2017 at 5:57 PM, Sebastian K wrote:

> The code for the numpy version:
>
> import numpy as np
>
> def createMatrix(n):
>     Matrix = np.empty(shape=(n,n), dtype='float64')
>     for x in range(n):
>         for y in range(n):
>             Matrix[x, y] = 0.1 + ((x*y)%1000)/1000.0
>     return Matrix
>
>
> if __name__ == '__main__':
>     n = getDimension()
>     if n > 0:
>         A = createMatrix(n)
>         B = createMatrix(n)
>         C = np.empty(shape=(n,n), dtype='float64')
>         C = np.dot(A,B)
>
>         #print(C)
>
> Measured data with libmemusage:
>
> dimension of matrix: 100x100
> heap peak pure python3:   1060565
> heap peak numpy function: 4917180

From sebastiankaster at googlemail.com Wed Mar 1 00:04:19 2017
From: sebastiankaster at googlemail.com (Sebastian K)
Date: Wed, 1 Mar 2017 00:04:19 +0100
Subject: [Numpy-discussion] Numpy Overhead

Yes, you are right. There is no need to add that line. I deleted it, but
the measured heap peak is still the same.

2017-03-01 0:00 GMT+01:00 Joseph Fox-Rabinovitz:

> For one thing, `C = np.empty(shape=(n,n), dtype='float64')` allocates 10^4
> extra elements before being immediately discarded.
From matthew.brett at gmail.com Tue Feb 28 18:18:07 2017
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 28 Feb 2017 15:18:07 -0800
Subject: [Numpy-discussion] Numpy Overhead

Hi,

On Tue, Feb 28, 2017 at 3:04 PM, Sebastian K wrote:
> Yes, you are right. There is no need to add that line. I deleted it, but
> the measured heap peak is still the same.

You're applying the naive matrix multiplication algorithm, which is
ideal for minimizing memory use during the computation, but terrible
for speed-related stuff like keeping values in the CPU cache:

https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm

The Numpy version is likely calling into a highly optimized compiled
routine for matrix multiplication, which can load chunks of the
matrices at a time, to speed up computation. If you really need
minimum memory heap usage and don't care about the order(s) of
magnitude slowdown, then you might need to use the naive method,
maybe implemented in Cython / C.

Cheers,

Matthew
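For completeness, a side-by-side sketch of the two implementations being
compared in this thread, using the same fill pattern as the createMatrix
function above; the timings are indicative only, but at n = 100 the
BLAS-backed np.dot typically wins by several orders of magnitude, at the
cost of the larger working set Matthew describes:

    import time
    import numpy as np

    def matmul_naive(A, B):
        # Triple-loop product on lists of lists: minimal extra memory,
        # but no cache blocking, so it is very slow.
        n = len(A)
        C = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                s = 0.0
                for k in range(n):
                    s += A[i][k] * B[k][j]
                C[i][j] = s
        return C

    n = 100
    A = [[0.1 + ((i * j) % 1000) / 1000.0 for j in range(n)] for i in range(n)]
    B = [[0.1 + ((i * j) % 1000) / 1000.0 for j in range(n)] for i in range(n)]

    t0 = time.time()
    C1 = matmul_naive(A, B)
    t1 = time.time()
    C2 = np.dot(np.array(A), np.array(B))
    t2 = time.time()

    print("naive: %.3fs  numpy: %.6fs" % (t1 - t0, t2 - t1))
    assert np.allclose(C1, C2)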
From sebastiankaster at googlemail.com Wed Mar 1 00:24:53 2017
From: sebastiankaster at googlemail.com (Sebastian K)
Date: Wed, 1 Mar 2017 00:24:53 +0100
Subject: [Numpy-discussion] Numpy Overhead

Thank you! That is the information I needed.

2017-03-01 0:18 GMT+01:00 Matthew Brett:

> You're applying the naive matrix multiplication algorithm, which is
> ideal for minimizing memory use during the computation, but terrible
> for speed-related stuff like keeping values in the CPU cache:
>
> https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
>
> The Numpy version is likely calling into a highly optimized compiled
> routine for matrix multiplication, which can load chunks of the
> matrices at a time, to speed up computation. If you really need
> minimum memory heap usage and don't care about the order(s) of
> magnitude slowdown, then you might need to use the naive method,
> maybe implemented in Cython / C.

From olivier.grisel at ensta.org Tue Feb 28 18:50:01 2017
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Wed, 1 Mar 2017 00:50:01 +0100
Subject: [Numpy-discussion] Daily numpy wheel builds - prefer to per-commit builds?

+1 as well.

--
Olivier

From njs at pobox.com Tue Feb 28 23:15:20 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 28 Feb 2017 20:15:20 -0800
Subject: [Numpy-discussion] Numpy Overhead

On Feb 28, 2017 2:57 PM, "Sebastian K" wrote:

> Measured data with libmemusage:
>
> dimension of matrix: 100x100
> heap peak pure python3:   1060565
> heap peak numpy function: 4917180

4 megabytes is less than the memory needed just to load numpy :-). Try a
1000x1000 array (or even bigger), and I think you'll see more reasonable
results.

-n