From Catherine.M.Moroney at jpl.nasa.gov Tue Jul 1 19:46:32 2014 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (398D)) Date: Tue, 1 Jul 2014 23:46:32 +0000 Subject: [Numpy-discussion] numpy.histogram not giving expected results Message-ID: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> Hello, I'm trying to calculate a 1-d histogram of a distribution that contains mostly zeros, and I'm having problems with examples where the values to be histogrammed fall exactly on the bin boundaries: For example, this gives me the expected results (entering the exact bin values): >>> data array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05]) >>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1]) >>> (counts, edges) = numpy.histogram(data, bins=bins_list) >>> counts array([ 0, 1, 10, 1]) >>> edges array([-0.1 , -0.05, 0. , 0.05, 0.1 ]) but this does not (generating the bin values via bumpy.arange): >>> bins_arange = numpy.arange(-0.1, 0.101, 0.05) >>> data array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05]) >>> bins_arange array([-0.1 , -0.05, 0. , 0.05, 0.1 ]) >>> (counts, edges) = numpy.histogram(data, bins=bins_arange) >>> counts array([ 0, 1, 11, 0]) I'm assuming this is due to slight rounding in the calculation of bins_arange, as compared to the manually entered values in bins_list. What is the recommended way of getting the first set of results, without having to manually enter all the values in the "bins" argument? The following also gives me unexpected results: >>> data array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05]) counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4) >>> counts array([ 0, 1, 11, 0]) Thank you for any advice, Catherine From chris.barker at noaa.gov Tue Jul 1 20:05:50 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 1 Jul 2014 17:05:50 -0700 Subject: [Numpy-discussion] numpy.histogram not giving expected results In-Reply-To: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> References: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> Message-ID: A few thoughts: 1) don't use arange() for flaoting point numbers, use linspace(). 2) histogram1d is a floating point function, and you shouldn't expect exact results for floating point -- in particular, values exactly at the bin boundaries are likely to be "uncertain" -- not quite the right word, but you get the idea. 3) if you expect have a lot of certain specific values, say, integers, or zeros -- then you don't want your bin boundaries to be exactly at the value -- they should be between the expected values. 4) remember that histogramming is inherently sensitive to bin position anyway -- if these small bin-boundary differences matter, than you may not be using teh best approach. -HTH, -Chris > >>> data > array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , > 0. , 0.05, -0.05]) > >>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1]) > >>> (counts, edges) = numpy.histogram(data, bins=bins_list) > >>> counts > array([ 0, 1, 10, 1]) > >>> edges > array([-0.1 , -0.05, 0. , 0.05, 0.1 ]) > > > > but this does not (generating the bin values via bumpy.arange): > > >>> bins_arange = numpy.arange(-0.1, 0.101, 0.05) > >>> data > array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , > 0. , 0.05, -0.05]) > >>> bins_arange > array([-0.1 , -0.05, 0. 
, 0.05, 0.1 ]) > >>> (counts, edges) = numpy.histogram(data, bins=bins_arange) > >>> counts > array([ 0, 1, 11, 0]) > > I'm assuming this is due to slight rounding in the calculation of > bins_arange, > as compared to the manually entered values in bins_list. > > What is the recommended way of getting the first set of results, without > having to manually enter all the values in the "bins" argument? > > The following also gives me unexpected results: > > >>> data > array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , > 0. , 0.05, -0.05]) > counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4) > >>> counts > array([ 0, 1, 11, 0]) > > > > Thank you for any advice, > > Catherine > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Wed Jul 2 03:24:44 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 2 Jul 2014 09:24:44 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi Matthew and Ralf, Has anyone managed to build working whl packages for numpy and scipy on win32 using the static mingw-w64 toolchain? -- Olivier From mszepien at gmail.com Wed Jul 2 04:07:04 2014 From: mszepien at gmail.com (Mark Szepieniec) Date: Wed, 2 Jul 2014 10:07:04 +0200 Subject: [Numpy-discussion] numpy.histogram not giving expected results In-Reply-To: References: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> Message-ID: Hi Catherine, I can't reproduce your issue with bins_list vs. bins_arange, but passing both range and number of bins to np.histogram does give the same strange behavior for me: In [16]: data = np.array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.05, -0.05]) In [17]: bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1]) In [18]: np.histogram(data, bins=bins_list) Out[18]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ])) In [19]: bins_arange = np.arange(-0.1, 0.101, 0.05) In [20]: np.histogram(data, bins=bins_arange) Out[20]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ])) In [21]: np.histogram(data, range=(-0.1, 0.1), bins=4) Out[21]: (array([ 0, 1, 11, 0]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ])) In [22]: np.version.version Out[22]: '1.8.1' Looks like the 0.05 value of data is being binned differently in the last case, but I'm not sure why either... Mark On Wed, Jul 2, 2014 at 2:05 AM, Chris Barker wrote: > A few thoughts: > > 1) don't use arange() for flaoting point numbers, use linspace(). > > 2) histogram1d is a floating point function, and you shouldn't expect > exact results for floating point -- in particular, values exactly at the > bin boundaries are likely to be "uncertain" -- not quite the right word, > but you get the idea. > > 3) if you expect have a lot of certain specific values, say, integers, or > zeros -- then you don't want your bin boundaries to be exactly at the value > -- they should be between the expected values. 
> > 4) remember that histogramming is inherently sensitive to bin position > anyway -- if these small bin-boundary differences matter, than you may not > be using teh best approach. > > -HTH, > -Chris > > > > > > >> >>> data >> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , >> 0. , 0.05, -0.05]) >> >>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1]) >> >>> (counts, edges) = numpy.histogram(data, bins=bins_list) >> >>> counts >> array([ 0, 1, 10, 1]) >> >>> edges >> array([-0.1 , -0.05, 0. , 0.05, 0.1 ]) >> >> >> >> but this does not (generating the bin values via bumpy.arange): >> >> >>> bins_arange = numpy.arange(-0.1, 0.101, 0.05) >> >>> data >> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , >> 0. , 0.05, -0.05]) >> >>> bins_arange >> array([-0.1 , -0.05, 0. , 0.05, 0.1 ]) >> >>> (counts, edges) = numpy.histogram(data, bins=bins_arange) >> >>> counts >> array([ 0, 1, 11, 0]) >> >> I'm assuming this is due to slight rounding in the calculation of >> bins_arange, >> as compared to the manually entered values in bins_list. >> >> What is the recommended way of getting the first set of results, without >> having to manually enter all the values in the "bins" argument? >> >> The following also gives me unexpected results: >> >> >>> data >> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , >> 0. , 0.05, -0.05]) >> counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4) >> >>> counts >> array([ 0, 1, 11, 0]) >> >> >> >> Thank you for any advice, >> >> Catherine >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jul 2 04:49:07 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 2 Jul 2014 09:49:07 +0100 Subject: [Numpy-discussion] Fwd: [Python-ideas] PEP pre-draft: Support for indexing with keyword arguments In-Reply-To: <53B33800.1030300@ferrara.linux.it> References: <53B33800.1030300@ferrara.linux.it> Message-ID: There's some discussion on python-ideas about making it possible for python indexing to accept kwargs, eg arr[1:2, foo=bar] Since numpy is a very heavy user of indexing which might benefit from this, I thought I should forward it here. If we have clear use cases for such a feature then that may strongly affect the discussion. I admit I can't actually think of any features this would enable for us though... -n ---------- Forwarded message ---------- From: "Stefano Borini" Date: 2 Jul 2014 00:17 Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword arguments To: "python-ideas at python.org" , "Joseph Martinot-Lagarde" Cc: Dear all, after the first mailing list feedback, and further private discussion with Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for keyword arguments in indexing. The document is available here. https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt The document is not in final form when it comes to specifications. 
In fact, it requires additional discussion about the best strategy to achieve the desired result. Particular attention has been devoted to present alternative implementation strategies, their pros and cons. I will examine all feedback tomorrow morning European time (in approx 10 hrs), and apply any pull requests or comments you may have. When the specification is finalized, or this community suggests that the PEP is in a form suitable for official submission despite potential open issues, I will submit it to the editor panel for further discussion, and deploy an actual implementation according to the agreed specification for a working test run. I apologize for potential mistakes in the PEP drafting and submission process, as this is my first PEP. Kind Regards, Stefano Borini _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Wed Jul 2 05:36:44 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Wed, 2 Jul 2014 11:36:44 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi all, I do regulary builds for python-2.7. Due to my limited resources I didn't build for 3.3 or 3.4 right now. I didn't updated my toolchhain from february, but I do regulary builds of OpenBLAS. OpenBLAS is under heavy development right now, thanks to Werner Saar, see: https://github.com/wernsaar/OpenBLAS . A lot of bugs have been canceled out at the cost of performance, see the kernel TODO list: https://github.com/xianyi/OpenBLAS/wiki/Fixed-optimized-kernels-To-do-List . Many bugs related to Windows have been corrected. A very weird bug i.e.: https://github.com/xianyi/OpenBLAS/issues/394 and https://github.com/JuliaLang/julia/issues/5574 . I got the impression, that the Julia community (and maybe the R and octave community) is very interested getting towards a stable Windows OpenBLAS. OpenBLAS is the only free OSS optimized BLAS/Lapack solution maintained for Windows today. Atlas seems not to be maintained for Windows anymore (is this true Matthew?) somewhat older test wheels for python-2.7 can be downloaded here: see: http://figshare.com/articles/search?q=numpy&quick=1&x=0&y=0 (2014-06-10) numpy and scipy wheels for py-2.7 The scipy test suite (amd64) emits segfaults with multithreaded OpenBLAS, but is stable with single thread (see the log files). I didn't dig into this further. Win32 works with MT OpenBLAS, but has some test failures with atan2 and hypot. The is more or less the status today. I can upload new wheels linked against a recent OpenBLAS, maybe tomorrow on Binstar. Regards, Carl 2014-07-02 9:24 GMT+02:00 Olivier Grisel : > Hi Matthew and Ralf, > > Has anyone managed to build working whl packages for numpy and scipy > on win32 using the static mingw-w64 toolchain? > > -- > Olivier > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mads.ipsen at gmail.com Wed Jul 2 06:15:25 2014 From: mads.ipsen at gmail.com (Mads Ipsen) Date: Wed, 02 Jul 2014 12:15:25 +0200 Subject: [Numpy-discussion] Accessing irregular sized array data from C Message-ID: <53B3DBBD.8030202@gmail.com> Hi, If you setup an M x N array like this a = 1.0*numpy.arange(24).reshape(8,3) you can access the data from a C function like this void foo(PyObject * numpy_data) { // Get dimension and data pointer int const m = static_cast(PyArray_DIMS(numpy_data)[0]); int const n = static_cast(PyArray_DIMS(numpy_data)[1]); double * const data = (double *) PyArray_DATA(numpy_data); // Access data ... } Now, suppose I have an irregular shaped numpy array like this a1 = numpy.array([ 1.0, 2.0, 3.0]) a2 = numpy.array([-2.0, 4.0]) a3 = numpy.array([5.0]) b = numpy.array([a1,a2,a3]) How can open up the doors to the array data of b on the C-side? Best regards, Mads -- +---------------------------------------------------------+ | Mads Ipsen | +----------------------+----------------------------------+ | G?seb?ksvej 7, 4. tv | phone: +45-29716388 | | DK-2500 Valby | email: mads.ipsen at gmail.com | | Denmark | map : www.tinyurl.com/ns52fpa | +----------------------+----------------------------------+ From matthew.brett at gmail.com Wed Jul 2 06:29:07 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 2 Jul 2014 11:29:07 +0100 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, On Wed, Jul 2, 2014 at 10:36 AM, Carl Kleffner wrote: > Hi all, > > I do regulary builds for python-2.7. Due to my limited resources I didn't > build for 3.3 or 3.4 right now. I didn't updated my toolchhain from > february, but I do regulary builds of OpenBLAS. OpenBLAS is under heavy > development right now, thanks to Werner Saar, see: > https://github.com/wernsaar/OpenBLAS . > A lot of bugs have been canceled out at the cost of performance, see the > kernel TODO list: > https://github.com/xianyi/OpenBLAS/wiki/Fixed-optimized-kernels-To-do-List . > Many bugs related to Windows have been corrected. A very weird bug i.e.: > https://github.com/xianyi/OpenBLAS/issues/394 and > https://github.com/JuliaLang/julia/issues/5574 . > I got the impression, that the Julia community (and maybe the R and octave > community) is very interested getting towards a stable Windows OpenBLAS. > OpenBLAS is the only free OSS optimized BLAS/Lapack solution maintained for > Windows today. Atlas seems not to be maintained for Windows anymore (is this > true Matthew?) No, it's not true, but it's not really false either. Clint Whaley is the ATLAS maintainer and his interests are firmly in high-performance-computing so he is much more interested in exotic new chips than in Windows. But, he does aim to make the latest stable release buildable on Windows, and he's helped me do that for the latest stable, with some hope he'll continue to work on the 64-bit Windows kernels which are hobbled at the moment because of differences in the Windows / other OS 64-bit ABI. Builds here: https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/ > somewhat older test wheels for python-2.7 can be downloaded here: > see: http://figshare.com/articles/search?q=numpy&quick=1&x=0&y=0 > (2014-06-10) numpy and scipy wheels for py-2.7 > The scipy test suite (amd64) emits segfaults with multithreaded OpenBLAS, > but is stable with single thread (see the log files). I didn't dig into this > further. 
Win32 works with MT OpenBLAS, but has some test failures with atan2 > and hypot. The is more or less the status today. I can upload new wheels > linked against a recent OpenBLAS, maybe tomorrow on Binstar. I built some 64-bit wheels against Carl's toolchain and the ATLAS above, I think they don't have any threading issues, but the scipy wheel fails one scipy test due to some very small precision differences in the mingw runtime. I think we agreed this failure wasn't important. https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl Cheers, Matthew From matthew.brett at gmail.com Wed Jul 2 06:37:16 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 2 Jul 2014 11:37:16 +0100 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, On Wed, Jul 2, 2014 at 11:29 AM, Matthew Brett wrote: > Hi, > > On Wed, Jul 2, 2014 at 10:36 AM, Carl Kleffner wrote: >> Hi all, >> >> I do regulary builds for python-2.7. Due to my limited resources I didn't >> build for 3.3 or 3.4 right now. I didn't updated my toolchhain from >> february, but I do regulary builds of OpenBLAS. OpenBLAS is under heavy >> development right now, thanks to Werner Saar, see: >> https://github.com/wernsaar/OpenBLAS . >> A lot of bugs have been canceled out at the cost of performance, see the >> kernel TODO list: >> https://github.com/xianyi/OpenBLAS/wiki/Fixed-optimized-kernels-To-do-List . >> Many bugs related to Windows have been corrected. A very weird bug i.e.: >> https://github.com/xianyi/OpenBLAS/issues/394 and >> https://github.com/JuliaLang/julia/issues/5574 . >> I got the impression, that the Julia community (and maybe the R and octave >> community) is very interested getting towards a stable Windows OpenBLAS. >> OpenBLAS is the only free OSS optimized BLAS/Lapack solution maintained for >> Windows today. Atlas seems not to be maintained for Windows anymore (is this >> true Matthew?) > > No, it's not true, but it's not really false either. Clint Whaley is > the ATLAS maintainer and his interests are firmly in > high-performance-computing so he is much more interested in exotic new > chips than in Windows. But, he does aim to make the latest stable > release buildable on Windows, and he's helped me do that for the > latest stable, with some hope he'll continue to work on the 64-bit > Windows kernels which are hobbled at the moment because of differences > in the Windows / other OS 64-bit ABI. Builds here: > > https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/ > >> somewhat older test wheels for python-2.7 can be downloaded here: >> see: http://figshare.com/articles/search?q=numpy&quick=1&x=0&y=0 >> (2014-06-10) numpy and scipy wheels for py-2.7 >> The scipy test suite (amd64) emits segfaults with multithreaded OpenBLAS, >> but is stable with single thread (see the log files). I didn't dig into this >> further. Win32 works with MT OpenBLAS, but has some test failures with atan2 >> and hypot. The is more or less the status today. I can upload new wheels >> linked against a recent OpenBLAS, maybe tomorrow on Binstar. > > I built some 64-bit wheels against Carl's toolchain and the ATLAS > above, I think they don't have any threading issues, but the scipy > wheel fails one scipy test due to some very small precision > differences in the mingw runtime. I think we agreed this failure > wasn't important. 
> > https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl > https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl Sorry - I wasn't paying attention - you asked about 32-bit wheels. Honestly, using the same toolchain, they wouldn't be at all hard to build. One issue is that the ATLAS builds depend on SSE2. That isn't an issue for 64 bit builds because the 64-bit ABI requires SSE2, but it is an issue for 32-bit where we have no such guarantee. It looks like 99% of Windows users do have SSE2 though [1]. So I think what is required is * Build the wheels for 32-bit (easy) * Patch the wheels to check and give helpful error in absence of SSE2 (fairly easy) * Get agreement these should go up on pypi and be maintained (feedback anyone?) Cheers, Matthew [1] https://github.com/numpy/numpy/wiki/Windows-versions#sse--sse2 From jtaylor.debian at googlemail.com Wed Jul 2 06:46:36 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 2 Jul 2014 12:46:36 +0200 Subject: [Numpy-discussion] Accessing irregular sized array data from C In-Reply-To: <53B3DBBD.8030202@gmail.com> References: <53B3DBBD.8030202@gmail.com> Message-ID: On Wed, Jul 2, 2014 at 12:15 PM, Mads Ipsen wrote: > Hi, > > If you setup an M x N array like this > > a = 1.0*numpy.arange(24).reshape(8,3) > > you can access the data from a C function like this > > void foo(PyObject * numpy_data) > { > // Get dimension and data pointer > int const m = static_cast(PyArray_DIMS(numpy_data)[0]); > int const n = static_cast(PyArray_DIMS(numpy_data)[1]); > double * const data = (double *) PyArray_DATA(numpy_data); > > // Access data > ... > } > > Now, suppose I have an irregular shaped numpy array like this > > a1 = numpy.array([ 1.0, 2.0, 3.0]) > a2 = numpy.array([-2.0, 4.0]) > a3 = numpy.array([5.0]) > b = numpy.array([a1,a2,a3]) > > How can open up the doors to the array data of b on the C-side? > numpy does not directly support irregular shaped arrays (or ragged arrays). If you look at the result of your example you will see this: In [5]: b Out[5]: array([array([ 1., 2., 3.]), array([-2., 4.]), array([ 5.])], dtype=object) b has datatype object, this means it is a 1d array containing more array objects. Numpy does not directly know about the shapes or types the sub arrays. It is not necessarily homogeneous anymore, but compared to a regular python list you still have elementwise operations (if the contained python objects support them) and it can have multiple dimensions. In C you would access such an array it like this: PyArrayObject * const data = (PyArrayObject *) PyArray_DATA(numpy_data); for (i=0; i < PyArray_DIMS(numpy_data)[0]; i++) { assert(PyArray_Check(data[i])); double * const sub_data = (double *) PyArray_DATA(data[i]); } From cmkleffner at gmail.com Wed Jul 2 07:18:07 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Wed, 2 Jul 2014 13:18:07 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, The mingw-w64 based wheels (Atlas and openBLAS) are based on a patched numpy version, that hasn't been published as numpy pull for revision until now (my failure). I could try to do this tomorrow in the evening. Another important point is, that the toolchain, that is capable to compile numpy/scipy was adapted to allow for MSVC / mingw runtime compatibility and does not create any gcc/mingw runtime dependency anymore. 
OpenBLAS has one advantage over Atlas: numpy/scipy are linked dynamically against OpenBLAS. Statically linked BLAS like MKL or ATLAS creates huge python extensions and have considerable higher memory consumption compared to dynamically linkage. On the other hand correctness is more important, so ATLAS has to be preferred now. Users with non SEE processors could be provided with wheels distributed on binstar. Regards Carl 2014-07-02 12:37 GMT+02:00 Matthew Brett : > Hi, > > On Wed, Jul 2, 2014 at 11:29 AM, Matthew Brett > wrote: > > Hi, > > > > On Wed, Jul 2, 2014 at 10:36 AM, Carl Kleffner > wrote: > >> Hi all, > >> > >> I do regulary builds for python-2.7. Due to my limited resources I > didn't > >> build for 3.3 or 3.4 right now. I didn't updated my toolchhain from > >> february, but I do regulary builds of OpenBLAS. OpenBLAS is under heavy > >> development right now, thanks to Werner Saar, see: > >> https://github.com/wernsaar/OpenBLAS . > >> A lot of bugs have been canceled out at the cost of performance, see the > >> kernel TODO list: > >> > https://github.com/xianyi/OpenBLAS/wiki/Fixed-optimized-kernels-To-do-List > . > >> Many bugs related to Windows have been corrected. A very weird bug i.e.: > >> https://github.com/xianyi/OpenBLAS/issues/394 and > >> https://github.com/JuliaLang/julia/issues/5574 . > >> I got the impression, that the Julia community (and maybe the R and > octave > >> community) is very interested getting towards a stable Windows OpenBLAS. > >> OpenBLAS is the only free OSS optimized BLAS/Lapack solution maintained > for > >> Windows today. Atlas seems not to be maintained for Windows anymore (is > this > >> true Matthew?) > > > > No, it's not true, but it's not really false either. Clint Whaley is > > the ATLAS maintainer and his interests are firmly in > > high-performance-computing so he is much more interested in exotic new > > chips than in Windows. But, he does aim to make the latest stable > > release buildable on Windows, and he's helped me do that for the > > latest stable, with some hope he'll continue to work on the 64-bit > > Windows kernels which are hobbled at the moment because of differences > > in the Windows / other OS 64-bit ABI. Builds here: > > > > https://nipy.bic.berkeley.edu/scipy_installers/atlas_builds/ > > > >> somewhat older test wheels for python-2.7 can be downloaded here: > >> see: http://figshare.com/articles/search?q=numpy&quick=1&x=0&y=0 > >> (2014-06-10) numpy and scipy wheels for py-2.7 > >> The scipy test suite (amd64) emits segfaults with multithreaded > OpenBLAS, > >> but is stable with single thread (see the log files). I didn't dig into > this > >> further. Win32 works with MT OpenBLAS, but has some test failures with > atan2 > >> and hypot. The is more or less the status today. I can upload new wheels > >> linked against a recent OpenBLAS, maybe tomorrow on Binstar. > > > > I built some 64-bit wheels against Carl's toolchain and the ATLAS > > above, I think they don't have any threading issues, but the scipy > > wheel fails one scipy test due to some very small precision > > differences in the mingw runtime. I think we agreed this failure > > wasn't important. > > > > > https://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.1-cp27-none-win_amd64.whl > > > https://nipy.bic.berkeley.edu/scipy_installers/scipy-0.13.3-cp27-none-win_amd64.whl > > Sorry - I wasn't paying attention - you asked about 32-bit wheels. > Honestly, using the same toolchain, they wouldn't be at all hard to > build. 
> > One issue is that the ATLAS builds depend on SSE2. That isn't an > issue for 64 bit builds because the 64-bit ABI requires SSE2, but it > is an issue for 32-bit where we have no such guarantee. It looks like > 99% of Windows users do have SSE2 though [1]. So I think what is > required is > > * Build the wheels for 32-bit (easy) > * Patch the wheels to check and give helpful error in absence of SSE2 > (fairly easy) > * Get agreement these should go up on pypi and be maintained (feedback > anyone?) > > Cheers, > > Matthew > > [1] https://github.com/numpy/numpy/wiki/Windows-versions#sse--sse2 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 2 07:35:07 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 2 Jul 2014 12:35:07 +0100 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, On Wed, Jul 2, 2014 at 12:18 PM, Carl Kleffner wrote: > Hi, > > The mingw-w64 based wheels (Atlas and openBLAS) are based on a patched numpy > version, that hasn't been published as numpy pull for revision until now (my > failure). I could try to do this tomorrow in the evening. That would be really good. I'll try and help with review if I can. > Another important > point is, that the toolchain, that is capable to compile numpy/scipy was > adapted to allow for MSVC / mingw runtime compatibility and does not create > any gcc/mingw runtime dependency anymore. > > OpenBLAS has one advantage over Atlas: numpy/scipy are linked dynamically > against OpenBLAS. Statically linked BLAS like MKL or ATLAS creates huge > python extensions and have considerable higher memory consumption compared > to dynamically linkage. On the other hand correctness is more important, so > ATLAS has to be preferred now. Do you have any index of what the memory cost is? If it's in the order of 20M presumably that won't have much practical impact? > Users with non SEE processors could be provided with wheels distributed on > binstar. The last plan we seemed to have was to continue making the 'superpack' exe installers which contain no-SSE, SSE2 and SSE3 builds where the installer selects which one to install at runtime. The warning from the wheel would point to these installers as the backup option. If we did want to produce alternative wheels, I guess a specific static https directory would be easiest; otherwise the user would get the odd effect that they'd get a hobbled wheel by default when installing from binstar (assuming they did in fact have SSE2). I mean, this pip install -f https://somewhere.org/no_sse_wheels --no-index numpy seems to make more sense as an alternative install command for non-SSE, than this: pip install -i http://binstar.org numpy because in the former case, you can see what is special about the command. 
Cheers, Matthew From mads.ipsen at gmail.com Wed Jul 2 07:44:45 2014 From: mads.ipsen at gmail.com (Mads Ipsen) Date: Wed, 02 Jul 2014 13:44:45 +0200 Subject: [Numpy-discussion] Accessing irregular sized array data from C In-Reply-To: References: <53B3DBBD.8030202@gmail.com> Message-ID: <53B3F0AD.70700@gmail.com> On 02/07/14 12:46, Julian Taylor wrote: > On Wed, Jul 2, 2014 at 12:15 PM, Mads Ipsen wrote: >> Hi, >> >> If you setup an M x N array like this >> >> a = 1.0*numpy.arange(24).reshape(8,3) >> >> you can access the data from a C function like this >> >> void foo(PyObject * numpy_data) >> { >> // Get dimension and data pointer >> int const m = static_cast(PyArray_DIMS(numpy_data)[0]); >> int const n = static_cast(PyArray_DIMS(numpy_data)[1]); >> double * const data = (double *) PyArray_DATA(numpy_data); >> >> // Access data >> ... >> } >> >> Now, suppose I have an irregular shaped numpy array like this >> >> a1 = numpy.array([ 1.0, 2.0, 3.0]) >> a2 = numpy.array([-2.0, 4.0]) >> a3 = numpy.array([5.0]) >> b = numpy.array([a1,a2,a3]) >> >> How can open up the doors to the array data of b on the C-side? >> > > numpy does not directly support irregular shaped arrays (or ragged arrays). > If you look at the result of your example you will see this: > In [5]: b > Out[5]: array([array([ 1., 2., 3.]), array([-2., 4.]), array([ > 5.])], dtype=object) > > b has datatype object, this means it is a 1d array containing more > array objects. Numpy does not directly know about the shapes or types > the sub arrays. It is not necessarily homogeneous anymore, but > compared to a regular python list you still have elementwise > operations (if the contained python objects support them) and it can > have multiple dimensions. > > In C you would access such an array it like this: > > PyArrayObject * const data = (PyArrayObject *) PyArray_DATA(numpy_data); > for (i=0; i < PyArray_DIMS(numpy_data)[0]; i++) { > assert(PyArray_Check(data[i])); > double * const sub_data = (double *) PyArray_DATA(data[i]); > } > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Thanks - that'll get me going! Best, Mads -- +---------------------------------------------------------+ | Mads Ipsen | +----------------------+----------------------------------+ | G?seb?ksvej 7, 4. tv | phone: +45-29716388 | | DK-2500 Valby | email: mads.ipsen at gmail.com | | Denmark | map : www.tinyurl.com/ns52fpa | +----------------------+----------------------------------+ From olivier.grisel at ensta.org Wed Jul 2 07:47:27 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 2 Jul 2014 13:47:27 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi Carl, All the items you suggest would be very appreciated. Don't hesitate to ping me if you need me to test new packages. Also the sklearn project has a free Rackspace Cloud account that Matthew is already using to make travis upload OSX wheels for the master branch of various scipy stack projects. Rackspace cloud can also be used to start windows VMs if needed. Please tell me if you want a some user credentials and API key. 
Myself I use the Rackspace Cloud account to build sklearn wheels following those instructions: https://github.com/scikit-learn/scikit-learn/wiki/How-to-make-a-release#building-windows-binary-packages We are using msvc express (but only for 32bit Python) right now. I have yet to try to build sklearn with your mingw-w64 static toolchain. Rackspace granted us $2000 worth of cloud resource per month (e.g. bandwith and VM time) so there is plenty of resource left to help with upstream projects such as numpy and scipy. Best, -- Olivier From cmkleffner at gmail.com Wed Jul 2 09:24:13 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Wed, 2 Jul 2014 15:24:13 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, personally I don't have a preference of Binstar over somewhere.org. More important is that one has to agree where to find the binaries. Binstar has the concept of channels and allow wheels. So one could provide a channel for NOSSE and more channels for other specialized builds: ATLAS/OpenBLAS/RefBLAS, SSE4/AVX and so on. A generic binary should be build with generic optimizing GCC switches and SSE2 per default. I propose to provide generic binaries for PYPI instead of superbinaries. and specialized binaries on Binstar or somewhere else. Just thinking two or three steps ahead. Regards Carl 2014-07-02 13:35 GMT+02:00 Matthew Brett : > Hi, > > On Wed, Jul 2, 2014 at 12:18 PM, Carl Kleffner > wrote: > > Hi, > > > > The mingw-w64 based wheels (Atlas and openBLAS) are based on a patched > numpy > > version, that hasn't been published as numpy pull for revision until now > (my > > failure). I could try to do this tomorrow in the evening. > > That would be really good. I'll try and help with review if I can. > > > Another important > > point is, that the toolchain, that is capable to compile numpy/scipy was > > adapted to allow for MSVC / mingw runtime compatibility and does not > create > > any gcc/mingw runtime dependency anymore. > > > > OpenBLAS has one advantage over Atlas: numpy/scipy are linked dynamically > > against OpenBLAS. Statically linked BLAS like MKL or ATLAS creates huge > > python extensions and have considerable higher memory consumption > compared > > to dynamically linkage. On the other hand correctness is more important, > so > > ATLAS has to be preferred now. > > Do you have any index of what the memory cost is? If it's in the > order of 20M presumably that won't have much practical impact? > > > Users with non SEE processors could be provided with wheels distributed > on > > binstar. > > The last plan we seemed to have was to continue making the 'superpack' > exe installers which contain no-SSE, SSE2 and SSE3 builds where the > installer selects which one to install at runtime. The warning from > the wheel would point to these installers as the backup option. > > If we did want to produce alternative wheels, I guess a specific > static https directory would be easiest; otherwise the user would get > the odd effect that they'd get a hobbled wheel by default when > installing from binstar (assuming they did in fact have SSE2). I > mean, this > > pip install -f https://somewhere.org/no_sse_wheels --no-index numpy > > seems to make more sense as an alternative install command for > non-SSE, than this: > > pip install -i http://binstar.org numpy > > because in the former case, you can see what is special about the command. 
> > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 2 09:36:57 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 2 Jul 2014 14:36:57 +0100 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, On Wed, Jul 2, 2014 at 2:24 PM, Carl Kleffner wrote: > Hi, > > personally I don't have a preference of Binstar over somewhere.org. More > important is that one has to agree where to find the binaries. Binstar has > the concept of channels and allow wheels. So one could provide a channel for > NOSSE and more channels for other specialized builds: > ATLAS/OpenBLAS/RefBLAS, SSE4/AVX and so on. Having a noSSE channel would make sense. > A generic binary should be build with generic optimizing GCC switches and > SSE2 per default. I propose to provide generic binaries for PYPI instead of > superbinaries. and specialized binaries on Binstar or somewhere else. The exe superbinary installers can also go on pypi without causing confusion to pip at least, but it would be good to have wheels as well. > Just thinking two or three steps ahead. It's good to have a plan :) Cheers, Matthew From mszepien at gmail.com Wed Jul 2 10:57:29 2014 From: mszepien at gmail.com (Mark Szepieniec) Date: Wed, 2 Jul 2014 16:57:29 +0200 Subject: [Numpy-discussion] numpy.histogram not giving expected results In-Reply-To: References: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> Message-ID: Looks this could be a float32 vs float64 problem: In [19]: data32 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.05, -0.05], dtype=np.float32) In [20]: data64 = np.array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.05, -0.05], dtype=np.float64) In [21]: bins32 = np.arange(-0.1, 0.101, 0.05, dtype=np.float32) In [22]: bins64 = np.arange(-0.1, 0.101, 0.05, dtype=np.float64) In [23]: np.histogram(data32, bins32) Out[23]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ], dtype=float32)) In [24]: np.histogram(data32, bins64) Out[24]: (array([ 1, 0, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ])) In [25]: np.histogram(data64, bins32) Out[25]: (array([ 0, 1, 11, 0]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ], dtype=float32)) In [26]: np.histogram(data64, bins64) Out[26]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 ])) I guess users always be very careful when mixing floating point types, but should numpy prevent (or warn) the user from doing so in this case? On Wed, Jul 2, 2014 at 10:07 AM, Mark Szepieniec wrote: > Hi Catherine, > > I can't reproduce your issue with bins_list vs. bins_arange, but passing > both range and number of bins to np.histogram does give the same strange > behavior for me: > > In [16]: data = np.array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , > 0. , 0. , > 0. , 0.05, -0.05]) > > In [17]: bins_list = np.array([-0.1, -0.05, 0.0, 0.05, 0.1]) > > In [18]: np.histogram(data, bins=bins_list) > Out[18]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. , 0.05, 0.1 > ])) > > In [19]: bins_arange = np.arange(-0.1, 0.101, 0.05) > > In [20]: np.histogram(data, bins=bins_arange) > Out[20]: (array([ 0, 1, 10, 1]), array([-0.1 , -0.05, 0. 
, 0.05, 0.1 > ])) > > In [21]: np.histogram(data, range=(-0.1, 0.1), bins=4) > Out[21]: (array([ 0, 1, 11, 0]), array([-0.1 , -0.05, 0. , 0.05, 0.1 > ])) > > In [22]: np.version.version > Out[22]: '1.8.1' > > Looks like the 0.05 value of data is being binned differently in the last > case, but I'm not sure why either... > > Mark > > > On Wed, Jul 2, 2014 at 2:05 AM, Chris Barker > wrote: > >> A few thoughts: >> >> 1) don't use arange() for flaoting point numbers, use linspace(). >> >> 2) histogram1d is a floating point function, and you shouldn't expect >> exact results for floating point -- in particular, values exactly at the >> bin boundaries are likely to be "uncertain" -- not quite the right word, >> but you get the idea. >> >> 3) if you expect have a lot of certain specific values, say, integers, or >> zeros -- then you don't want your bin boundaries to be exactly at the value >> -- they should be between the expected values. >> >> 4) remember that histogramming is inherently sensitive to bin position >> anyway -- if these small bin-boundary differences matter, than you may not >> be using teh best approach. >> >> -HTH, >> -Chris >> >> >> >> >> >> >>> >>> data >>> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , >>> 0. , 0.05, -0.05]) >>> >>> bins_list = numpy.array([-0.1, -0.05, 0.0, 0.05, 0.1]) >>> >>> (counts, edges) = numpy.histogram(data, bins=bins_list) >>> >>> counts >>> array([ 0, 1, 10, 1]) >>> >>> edges >>> array([-0.1 , -0.05, 0. , 0.05, 0.1 ]) >>> >>> >>> >>> but this does not (generating the bin values via bumpy.arange): >>> >>> >>> bins_arange = numpy.arange(-0.1, 0.101, 0.05) >>> >>> data >>> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , >>> 0. , 0.05, -0.05]) >>> >>> bins_arange >>> array([-0.1 , -0.05, 0. , 0.05, 0.1 ]) >>> >>> (counts, edges) = numpy.histogram(data, bins=bins_arange) >>> >>> counts >>> array([ 0, 1, 11, 0]) >>> >>> I'm assuming this is due to slight rounding in the calculation of >>> bins_arange, >>> as compared to the manually entered values in bins_list. >>> >>> What is the recommended way of getting the first set of results, without >>> having to manually enter all the values in the "bins" argument? >>> >>> The following also gives me unexpected results: >>> >>> >>> data >>> array([ 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , >>> 0. , 0.05, -0.05]) >>> counts, edges) = numpy.histogram(data, range=(-0.1, 0.1), bins=4) >>> >>> counts >>> array([ 0, 1, 11, 0]) >>> >>> >>> >>> Thank you for any advice, >>> >>> Catherine >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Wed Jul 2 13:24:53 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 2 Jul 2014 10:24:53 -0700 Subject: [Numpy-discussion] Accessing irregular sized array data from C In-Reply-To: References: <53B3DBBD.8030202@gmail.com> Message-ID: On Wed, Jul 2, 2014 at 3:46 AM, Julian Taylor wrote: > numpy does not directly support irregular shaped arrays (or ragged arrays). > If you look at the result of your example you will see this: > In [5]: b > Out[5]: array([array([ 1., 2., 3.]), array([-2., 4.]), array([ > 5.])], dtype=object) > > b has datatype object, this means it is a 1d array containing more > array objects. Numpy does not directly know about the shapes or types > the sub arrays. It is not necessarily homogeneous anymore, but > compared to a regular python list you still have elementwise > operations (if the contained python objects support them) and it can > have multiple dimensions. > All true, but a few notes: 1) you probably want to look at Cython for making this sort of thing easier. 2) a numpy-based ragged array implementation might make sense as well. You essentially store the data in a rank-1 shaped numpy array, and provide custom indexing to get the "rows" out. This would allow you to have all the data in a single memory block available to C (or Cython), so that you could fully optimize indexing and access, and have a data structure that makes sense in pure C. I've enclosed a start of such a class (I honestly can't remember how far I got with it, but it was at least useful for one project of mine.) HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ragged_array.py Type: text/x-python-script Size: 4305 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: test_ragged_array.py Type: text/x-python-script Size: 3068 bytes Desc: not available URL: From chris.barker at noaa.gov Wed Jul 2 13:29:17 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 2 Jul 2014 10:29:17 -0700 Subject: [Numpy-discussion] numpy.histogram not giving expected results In-Reply-To: References: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> Message-ID: On Wed, Jul 2, 2014 at 7:57 AM, Mark Szepieniec wrote: > Looks this could be a float32 vs float64 problem: > that would explain it. > I guess users always be very careful when mixing floating point types, but > should numpy prevent (or warn) the user from doing so in this case? > I don't think so -- this "uncertainty" is very much the nature of histogramming, particularly with floating point values -- you should expect to get different results with different data precisions. As you should for ANY floating point computation. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chris.barker at noaa.gov Wed Jul 2 13:34:40 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 2 Jul 2014 10:34:40 -0700 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: On Wed, Jul 2, 2014 at 3:37 AM, Matthew Brett wrote: > It looks like > 99% of Windows users do have SSE2 though [1]. So I think what is > required is > > * Build the wheels for 32-bit (easy) > * Patch the wheels to check and give helpful error in absence of SSE2 > (fairly easy) > * Get agreement these should go up on pypi and be maintained (feedback > anyone?) > +Inf It would benefit the community a LOT to have binary wheels up on PyPi, and the very small number of failures due to old hardware will be no big deal, as long as the users get a meaningful message, rather than a hard crash. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Jul 2 13:36:25 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 02 Jul 2014 19:36:25 +0200 Subject: [Numpy-discussion] numpy.histogram not giving expected results In-Reply-To: References: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> Message-ID: <53B44319.4000007@googlemail.com> On 02.07.2014 19:29, Chris Barker wrote: > On Wed, Jul 2, 2014 at 7:57 AM, Mark Szepieniec > wrote: > > Looks this could be a float32 vs float64 problem: > > > that would explain it. > > > I guess users always be very careful when mixing floating point > types, but should numpy prevent (or warn) the user from doing so in > this case? > > > I don't think so -- this "uncertainty" is very much the nature of > histogramming, particularly with floating point values -- you should > expect to get different results with different data precisions. As you > should for ANY floating point computation. > we recently fixed a float32/float64 issue in histogram. https://github.com/numpy/numpy/issues/4799 I think it boils down to the use of round() in histogram which is not so great in python as its based on decimals not significant figures (so it does nothing for float32 values > 1e7). Though this one seems different as it still occurs in git master. From jtaylor.debian at googlemail.com Wed Jul 2 13:38:08 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 02 Jul 2014 19:38:08 +0200 Subject: [Numpy-discussion] Accessing irregular sized array data from C In-Reply-To: <53B3F0AD.70700@gmail.com> References: <53B3DBBD.8030202@gmail.com> <53B3F0AD.70700@gmail.com> Message-ID: <53B44380.4030901@googlemail.com> On 02.07.2014 13:44, Mads Ipsen wrote: > > > On 02/07/14 12:46, Julian Taylor wrote: >> On Wed, Jul 2, 2014 at 12:15 PM, Mads Ipsen wrote: >>> Hi, >>> >>> If you setup an M x N array like this >>> >>> a = 1.0*numpy.arange(24).reshape(8,3) >>> >>> you can access the data from a C function like this >>> >>> void foo(PyObject * numpy_data) >>> { >>> // Get dimension and data pointer >>> int const m = static_cast(PyArray_DIMS(numpy_data)[0]); >>> int const n = static_cast(PyArray_DIMS(numpy_data)[1]); >>> double * const data = (double *) PyArray_DATA(numpy_data); >>> >>> // Access data >>> ... 
>>> } >>> >>> Now, suppose I have an irregular shaped numpy array like this >>> >>> a1 = numpy.array([ 1.0, 2.0, 3.0]) >>> a2 = numpy.array([-2.0, 4.0]) >>> a3 = numpy.array([5.0]) >>> b = numpy.array([a1,a2,a3]) >>> >>> How can open up the doors to the array data of b on the C-side? >>> >> >> numpy does not directly support irregular shaped arrays (or ragged arrays). >> If you look at the result of your example you will see this: >> In [5]: b >> Out[5]: array([array([ 1., 2., 3.]), array([-2., 4.]), array([ >> 5.])], dtype=object) >> >> b has datatype object, this means it is a 1d array containing more >> array objects. Numpy does not directly know about the shapes or types >> the sub arrays. It is not necessarily homogeneous anymore, but >> compared to a regular python list you still have elementwise >> operations (if the contained python objects support them) and it can >> have multiple dimensions. >> >> In C you would access such an array it like this: >> >> PyArrayObject * const data = (PyArrayObject *) PyArray_DATA(numpy_data); >> for (i=0; i < PyArray_DIMS(numpy_data)[0]; i++) { >> assert(PyArray_Check(data[i])); >> double * const sub_data = (double *) PyArray_DATA(data[i]); >> } >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Thanks - that'll get me going! > another thing, don't use int as the index to the array, use npy_intp which is large enough to also index arrays > 4GB if the platform supports it. Also note that object arrays are not very well optimized in numpy, so numerous operations can be slow. From chris.barker at noaa.gov Wed Jul 2 13:55:41 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 2 Jul 2014 10:55:41 -0700 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: On Wed, Jul 2, 2014 at 6:36 AM, Matthew Brett wrote: > > Having a noSSE channel would make sense. > > Indeed -- the default (i.e. what you get with pip install numpy) should be SSE2 -- I'd much rather have a few folks with old hardware have to go through some hoops than have most people get something that is "much slower than MATLAB". > The exe superbinary installers can also go on pypi without causing > confusion to pip at least, but it would be good to have wheels as > well. > it doesn't hurt to have them, but we really need to get Windows away from the exe installers into the pip / virtualenv / etc world. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Jul 2 14:01:23 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 2 Jul 2014 11:01:23 -0700 Subject: [Numpy-discussion] Fwd: [Python-ideas] PEP pre-draft: Support for indexing with keyword arguments In-Reply-To: References: <53B33800.1030300@ferrara.linux.it> Message-ID: NumPy doesn't have named axes, but perhaps it should. See, for example, Fernando Perez's datarray prototype (https://github.com/fperez/datarray) or my project, xray (https://github.com/xray/xray). Syntactical support for indexing an axis by name would make using named axes much more readable. For example, compare: gridValues[x=3, y=5, z=0:8] = 0 vs.
gridValues.set_items(dict(x=3, y=5, z=slice(0, 8)), 0) This is case 2 in the draft PEP. I am less sure about the other cases. For some of these, such as get with a default, using a function call is a perfectly fine substitute. Best, Stephan On Wed, Jul 2, 2014 at 1:49 AM, Nathaniel Smith wrote: > There's some discussion on python-ideas about making it possible for > python indexing to accept kwargs, eg > > arr[1:2, foo=bar] > > Since numpy is a very heavy user of indexing which might benefit from > this, I thought I should forward it here. If we have clear use cases for > such a feature then that may strongly affect the discussion. > > I admit I can't actually think of any features this would enable for us > though... > > -n > ---------- Forwarded message ---------- > From: "Stefano Borini" > Date: 2 Jul 2014 00:17 > Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword > arguments > To: "python-ideas at python.org" , "Joseph > Martinot-Lagarde" > Cc: > > Dear all, > > after the first mailing list feedback, and further private discussion with > Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for keyword > arguments in indexing. The document is available here. > > https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt > > The document is not in final form when it comes to specifications. In > fact, it requires additional discussion about the best strategy to achieve > the desired result. Particular attention has been devoted to present > alternative implementation strategies, their pros and cons. I will examine > all feedback tomorrow morning European time (in approx 10 hrs), and apply > any pull requests or comments you may have. > > When the specification is finalized, or this community suggests that the > PEP is in a form suitable for official submission despite potential open > issues, I will submit it to the editor panel for further discussion, and > deploy an actual implementation according to the agreed specification for a > working test run. > > I apologize for potential mistakes in the PEP drafting and submission > process, as this is my first PEP. > > Kind Regards, > > Stefano Borini > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jul 2 14:04:42 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 2 Jul 2014 11:04:42 -0700 Subject: [Numpy-discussion] numpy.histogram not giving expected results In-Reply-To: <53B44319.4000007@googlemail.com> References: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> <53B44319.4000007@googlemail.com> Message-ID: On Wed, Jul 2, 2014 at 10:36 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: we recently fixed a float32/float64 issue in histogram. > https://github.com/numpy/numpy/issues/4799 It's a good idea to keep the edges in the same dtype as the input data, it will make for fewer surprises, but I'm not sure that it's necessarily any more "correct". A value within an eps of a bin could arbitrarily end up on either side -- that's simply the nature of floating point. 
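A minimal sketch of the edge effect described above, distilled from the interpreter sessions earlier in the thread (the counts in the comments are the ones Mark and Catherine reported; the exact split can vary across numpy versions and platforms):

import numpy as np

# The nearest float32 to 0.05 is a slightly different number from the
# nearest float64, so a value sitting exactly on a bin boundary can land
# on either side of it depending on how the edges were produced.
print(repr(float(np.float32(0.05))))   # 0.05000000074505806 -- just above 0.05
print(repr(float(np.float64(0.05))))   # 0.05

data = np.array([0.0] * 10 + [0.05, -0.05])            # float64 values
bins_f64 = np.array([-0.1, -0.05, 0.0, 0.05, 0.1])     # float64 edges
bins_f32 = np.arange(-0.1, 0.101, 0.05, dtype=np.float32)

# With float64 edges, 0.05 equals the edge and falls into the [0.05, 0.1] bin.
print(np.histogram(data, bins=bins_f64)[0])   # [ 0  1 10  1]
# With float32 edges, 0.05 sits just below the upcast edge and drops a bin lower.
print(np.histogram(data, bins=bins_f32)[0])   # [ 0  1 11  0]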
> I think it boils down to the use of round() in histogram which is not so > great in python as its based on decimals not significant figures (so it > does nothing for float32 values > 1e7). > Using decimals rather than sig-figs is a problem regardless of precision, and isn't that the same problem with C libmath round() ? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Jul 2 14:17:53 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 02 Jul 2014 20:17:53 +0200 Subject: [Numpy-discussion] numpy.histogram not giving expected results In-Reply-To: References: <19894208-1D97-461B-86EB-CF4394176CEE@jpl.nasa.gov> <53B44319.4000007@googlemail.com> Message-ID: <53B44CD1.8000107@googlemail.com> On 02.07.2014 20:04, Chris Barker wrote: > On Wed, Jul 2, 2014 at 10:36 AM, Julian Taylor > > > wrote: > > we recently fixed a float32/float64 issue in histogram. > https://github.com/numpy/numpy/issues/4799 > > > It's a good idea to keep the edges in the same dtype as the input data, > it will make for fewer surprises, but I'm not sure that it's necessarily > any more "correct". A value within an eps of a bin could arbitrarily end > up on either side -- that's simply the nature of floating point. > > > > I think it boils down to the use of round() in histogram which is not so > great in python as its based on decimals not significant figures (so it > does nothing for float32 values > 1e7). > > > Using decimals rather than sig-figs is a problem regardless of > precision, and isn't that the same problem with C libmath round() ? > C round just rounds to the nearest integer and the result is still a float. numpy/python is different and implements round as round(d * 10**decimal) / 10**decimal From sturla.molden at gmail.com Wed Jul 2 15:12:17 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 2 Jul 2014 19:12:17 +0000 (UTC) Subject: [Numpy-discussion] Accessing irregular sized array data from C References: <53B3DBBD.8030202@gmail.com> <53B3F0AD.70700@gmail.com> <53B44380.4030901@googlemail.com> Message-ID: <801964251426020918.364158sturla.molden-gmail.com@news.gmane.org> Julian Taylor wrote: > another thing, don't use int as the index to the array, use npy_intp > which is large enough to also index arrays > 4GB if the platform > supports it. With double* a 32-bit int can index 16 GB, a 32-bit unsigned int can index 32 GB. With char* a 32-bit int can only index 2 GB. Sturla From njs at pobox.com Wed Jul 2 15:16:54 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 2 Jul 2014 20:16:54 +0100 Subject: [Numpy-discussion] Accessing irregular sized array data from C In-Reply-To: <801964251426020918.364158sturla.molden-gmail.com@news.gmane.org> References: <53B3DBBD.8030202@gmail.com> <53B3F0AD.70700@gmail.com> <53B44380.4030901@googlemail.com> <801964251426020918.364158sturla.molden-gmail.com@news.gmane.org> Message-ID: On 2 Jul 2014 20:12, "Sturla Molden" wrote: > > Julian Taylor wrote: > > > another thing, don't use int as the index to the array, use npy_intp > > which is large enough to also index arrays > 4GB if the platform > > supports it. > > With double* a 32-bit int can index 16 GB, a 32-bit unsigned int can index > 32 GB. > > With char* a 32-bit int can only index 2 GB. 
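(As a quick aside, spelling out that arithmetic: the reachable range scales with the element size, because such an index counts elements rather than bytes.)

>>> 2**31 * 8 / 2.**30   # signed 32-bit int over double*
16.0
>>> 2**32 * 8 / 2.**30   # unsigned 32-bit int over double*
32.0
>>> 2**31 * 1 / 2.**30   # signed 32-bit int over char*
2.0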
Per dimension, if we're talking about addressing. Numpy internally does all index/stride calculations in units of bytes, though, so if accessing the data array directly and using strides, the only reliable approach is to use intp or equivalent. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Jul 2 15:20:52 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 2 Jul 2014 19:20:52 +0000 (UTC) Subject: [Numpy-discussion] Accessing irregular sized array data from C References: <53B3DBBD.8030202@gmail.com> Message-ID: <268090784426021268.412829sturla.molden-gmail.com@news.gmane.org> Chris Barker wrote: > 2) a numpy-based ragged array implementation might make sense as well. You > essentially store the data in a rank-1 shaped numpy array, and provide > custom indexing to get the "rows" out. This would allow you to have all the > data in a single memory block available to C (or Cython), so that you could > fully optimize indexing and access, and have a data structure that makes > sense in pure C. If the sub-arrays are contiguous, an ndarray of ndarrays is not inherently slower in C than the common double** idiom. As with double** the performance depends on iterating along the contiguous sub-arrays in the innermost loop. From the Python side it will be more hurtful, yes, but not when working with the NumPy C API. Sturla From sturla.molden at gmail.com Wed Jul 2 15:33:20 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 2 Jul 2014 19:33:20 +0000 (UTC) Subject: [Numpy-discussion] Accessing irregular sized array data from C References: <53B3DBBD.8030202@gmail.com> <53B3F0AD.70700@gmail.com> <53B44380.4030901@googlemail.com> <801964251426020918.364158sturla.molden-gmail.com@news.gmane.org> Message-ID: <1406948660426021705.905843sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > Numpy internally does all index/stride calculations in units of bytes, > though, so if accessing the data array directly and using strides, the only > reliable approach is to use intp or equivalent. If we use PyArray_STRIDES we should use npy_intp, yes, because we are computing the address directly from a char*. It depends on how much we know about the array in advance. Also a C standard pedant would point out we can only assume an int will be at least 16 bit, and we should use long to make sure it is at least 32 bit. Sturla From fperez.net at gmail.com Wed Jul 2 22:17:19 2014 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 2 Jul 2014 19:17:19 -0700 Subject: [Numpy-discussion] Fwd: [Python-ideas] PEP pre-draft: Support for indexing with keyword arguments In-Reply-To: References: <53B33800.1030300@ferrara.linux.it> Message-ID: Added to the py3 BoF ideas page: https://github.com/ipython/ipython/wiki/Sprints:-SciPy2014-Py3-BoF Thanks for this heads-up! On Wed, Jul 2, 2014 at 11:01 AM, Stephan Hoyer wrote: > NumPy doesn't have named axes, but perhaps it should. See, for example, > Fernando Perez's datarray prototype (https://github.com/fperez/datarray) > or my project, xray (https://github.com/xray/xray). > > Syntactical support for indexing an axis by name would make using named > axes much more readable. For example, compare: > > gridValues[x=3, y=5, z=0:8] = 0 > > vs. > > gridValues.set_items(dict(x=3, y=5, z=slice(0, 8)), 0) > > This is case 2 in the draft PEP. > > I am less sure about the other cases. For some of these, such as get with > a default, using a function call is a perfectly fine substitute.
> > Best, > Stephan > > > > > On Wed, Jul 2, 2014 at 1:49 AM, Nathaniel Smith wrote: > >> There's some discussion on python-ideas about making it possible for >> python indexing to accept kwargs, eg >> >> arr[1:2, foo=bar] >> >> Since numpy is a very heavy user of indexing which might benefit from >> this, I thought I should forward it here. If we have clear use cases for >> such a feature then that may strongly affect the discussion. >> >> I admit I can't actually think of any features this would enable for us >> though... >> >> -n >> ---------- Forwarded message ---------- >> From: "Stefano Borini" >> Date: 2 Jul 2014 00:17 >> Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword >> arguments >> To: "python-ideas at python.org" , "Joseph >> Martinot-Lagarde" >> Cc: >> >> Dear all, >> >> after the first mailing list feedback, and further private discussion >> with Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for >> keyword arguments in indexing. The document is available here. >> >> https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt >> >> The document is not in final form when it comes to specifications. In >> fact, it requires additional discussion about the best strategy to achieve >> the desired result. Particular attention has been devoted to present >> alternative implementation strategies, their pros and cons. I will examine >> all feedback tomorrow morning European time (in approx 10 hrs), and apply >> any pull requests or comments you may have. >> >> When the specification is finalized, or this community suggests that the >> PEP is in a form suitable for official submission despite potential open >> issues, I will submit it to the editor panel for further discussion, and >> deploy an actual implementation according to the agreed specification for a >> working test run. >> >> I apologize for potential mistakes in the PEP drafting and submission >> process, as this is my first PEP. >> >> Kind Regards, >> >> Stefano Borini >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Jul 2 23:56:17 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 03 Jul 2014 05:56:17 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: On 02/07/14 19:55, Chris Barker wrote: > > Indeed -- the default (i.e what you get with pip install numpy) should > be SSE2 -- I":d much rather have a few folks with old hardware have to > go through some hoops that n have most people get something that is > "much slower than MATLAB". I think we should use SSE3 as default. It is already ten years old. 
Most users (99.999 %) who want binary wheels have an SSE3 capable CPU. According to Wikipedia: AMD: Athlon 64 (since Venice Stepping E3 and San Diego Stepping E4) Athlon 64 X2 Athlon 64 FX (since San Diego Stepping E4) Opteron (since Stepping E4) Sempron (since Palermo, Stepping E3) Phenom Phenom II Athlon II Turion 64 Turion 64 X2 Turion X2 Turion X2 Ultra Turion II X2 Mobile Turion II X2 Ultra APU FX Series Intel: Celeron D Celeron (starting with Core microarchitecture) Pentium 4 (since Prescott) Pentium D Pentium Extreme Edition (but NOT Pentium 4 Extreme Edition) Pentium Dual-Core Pentium (starting with Core microarchitecture) Core Xeon (since Nocona) Atom If you have a Pentium II, you can build your own NumPy... Sturla From jtaylor.debian at googlemail.com Thu Jul 3 03:42:41 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 03 Jul 2014 09:42:41 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: <53B50971.7040407@googlemail.com> On 03.07.2014 05:56, Sturla Molden wrote: > On 02/07/14 19:55, Chris Barker wrote: > >> >> Indeed -- the default (i.e. what you get with pip install numpy) should >> be SSE2 -- I'd much rather have a few folks with old hardware have to >> go through some hoops than have most people get something that is >> "much slower than MATLAB". > > > I think we should use SSE3 as default. It is already ten years old. Most > users (99.999 %) who want binary wheels have an SSE3 capable CPU. > While it is true that pretty much all cpus currently around have it, there is no technical requirement for even new cpus to have SSE3. Compared to SSE2 you do not have to implement it to sell a compatible 64 bit cpu. Not even the new x32 ABI requires it. In practice I think we could easily get away with using SSE3 as the default, but I would still like to see if it makes any performance difference in benchmarks. In my experience (which is exclusively on pre-haswell machines) the horizontal operations it offers tend to be slower than other solutions. From m.hulsman at tudelft.nl Thu Jul 3 04:51:31 2014 From: m.hulsman at tudelft.nl (Marc Hulsman) Date: Thu, 03 Jul 2014 10:51:31 +0200 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? Message-ID: <53B51993.7080207@tudelft.nl> Hello, In my application I use nested, sometimes variable-length lists, e.g. [[1,2], [1,2,3], ...]. These can also become double nested, etc. up to arbitrary complexity. I would like to use numpy indexing on the outer list, i.e. I want to create: array([[1, 2], [1, 2, 3]], dtype=object) However, because numpy likes to 'walk' through the nested lists, this becomes rather slow when the nested lists are large, e.g. k = [range(i) for i in range(10000)] %timeit numpy.array(k) 1 loops, best of 3: 2.11 s per loop Compared to shorter lists, e.g.: k2 = [range(numpy.random.randint(0,10)) for i in range(10000)] %timeit numpy.array(k2) 100 loops, best of 3: 2.7 ms per loop As I know beforehand that numpy does not have to descend into these objects, I would just like to create a 1-dimensional array.
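That 'walking' is essentially dimension and dtype discovery: numpy inspects every nested element before deciding whether the input forms a regular n-d array or has to fall back to a 1-d object array. A minimal sketch of the two outcomes:

>>> import numpy as np
>>> np.array([[1, 2], [3, 4]]).shape                   # equal lengths: numpy descends and builds a 2-d array
(2, 2)
>>> np.array([[1, 2], [1, 2, 3]], dtype=object).shape  # ragged: a 1-d array holding the lists as objects
(2,)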
I thought about using fromiter, but this fails with: ValueError: cannot create object arrays from iterator A second approach I tried is to create an empty array, and then fill it: x = numpy.empty(len(k), dtype=object) %timeit x[:] = k 1000 loops, best of 3: 220 µs per loop This already works much, much better, but the loop still takes time to 'descend' into the objects if they have a fixed size, e.g.: k3 = [[range(10) for i in range(100)] for i in range(10000)] %timeit x[:] = k3 10 loops, best of 3: 45.6 ms per loop A python loop is in these cases even faster: %timeit for pos, e in enumerate(k3): x[pos] = e 1000 loops, best of 3: 1.02 ms per loop This piece of code is quite time-critical in my application, and I observe slowdowns due to this behaviour. My question therefore is whether there is a fast way to simply convert a list into a 1-dimensional object array, without each object being descended into. More generally, if I create an array with numpy.array(k), would it be possible to indicate that it should search only 1, 2, ... nested levels deep into k? Thanks for any advice, Marc From pablopg at computer.org Thu Jul 3 05:14:31 2014 From: pablopg at computer.org (=?UTF-8?B?UGFibG8gUMOpcmV6IEdhcmPDrWE=?=) Date: Thu, 3 Jul 2014 11:14:31 +0200 Subject: [Numpy-discussion] Numpy and debug symbols Message-ID: Hello, I'm a newcomer and I have a question I did not manage to solve yet, I posted it into these two stack-overflow entries: http://stackoverflow.com/questions/24529811/compiling-numpy-for-windows-python-2-7-7 http://stackoverflow.com/questions/24548485/using-numpy-on-an-embedded-python-interpreter-using-vs2008-under-windows-7 Thank you very much in advance! -- Pablo Pérez García -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Thu Jul 3 05:22:57 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 3 Jul 2014 11:22:57 +0200 Subject: [Numpy-discussion] Numpy and debug symbols In-Reply-To: References: Message-ID: On Thu, Jul 3, 2014 at 11:14 AM, Pablo Pérez García wrote: > Hello, I'm a newcomer and I have a question I did not manage to solve yet, I > posted it into these two stack-overflow entries: > > http://stackoverflow.com/questions/24529811/compiling-numpy-for-windows-python-2-7-7 > > http://stackoverflow.com/questions/24548485/using-numpy-on-an-embedded-python-interpreter-using-vs2008-under-windows-7 > I don't know how it works on windows, but on linux/mac, in order to import debug builds of binary extensions you need to use a debug build of python, which is a different runtime. I guess on windows you either have to download a special installer with the debug build or build it yourself (configure --with-pydebug) From jtaylor.debian at googlemail.com Thu Jul 3 05:30:33 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 3 Jul 2014 11:30:33 +0200 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? In-Reply-To: <53B51993.7080207@tudelft.nl> References: <53B51993.7080207@tudelft.nl> Message-ID: numpy descends into the lists even if you request an object dtype, as it treats object arrays containing nested lists of equal size as n-dimensional: np.array([[1,2], [3,4]], dtype=object).ndim 2 I don't think we have a constructor that limits the maximum dimension, only one for the minimum dimension. I guess we could add one, e.g.
np.array(nested_list, dtype=object, ndmax=1) But I'm not sure if it's really worth it; can't you somehow move the array construction out of your tight loops? From jtaylor.debian at googlemail.com Thu Jul 3 05:43:20 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 3 Jul 2014 11:43:20 +0200 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? In-Reply-To: References: <53B51993.7080207@tudelft.nl> Message-ID: On Thu, Jul 3, 2014 at 11:30 AM, Julian Taylor wrote: > numpy descends into the lists even if you request an object dtype, as it > treats object arrays containing nested lists of equal size as > n-dimensional: > > np.array([[1,2], [3,4]], dtype=object).ndim > 2 > > I don't think we have a constructor that limits the maximum dimension, > only one for the minimum dimension. > I guess we could add one, e.g. np.array(nested_list, dtype=object, ndmax=1) > But I'm not sure if it's really worth it; can't you somehow move the > array construction out of your tight loops? On second thought, I guess adding a short circuit to the dimension discovery on mismatching list lengths with object dtype should solve the issue too. A bit more information on the use case would still be useful: why do you need to use numpy arrays for this in the first place? From matthew.brett at gmail.com Thu Jul 3 06:06:39 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 3 Jul 2014 11:06:39 +0100 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, On Thu, Jul 3, 2014 at 4:56 AM, Sturla Molden wrote: > On 02/07/14 19:55, Chris Barker wrote: > >> >> Indeed -- the default (i.e. what you get with pip install numpy) should >> be SSE2 -- I'd much rather have a few folks with old hardware have to >> go through some hoops than have most people get something that is >> "much slower than MATLAB". > > > I think we should use SSE3 as default. It is already ten years old. Most > users (99.999 %) who want binary wheels have an SSE3 capable CPU. The 99% for SSE2 comes from the Firefox crash reports, where the large majority are for very recent Firefox downloads. If you can identify SSE3 machines from the reported CPU string (as the Firefox people did for SSE2), please do have a look and see if you can get a count for SSE3 in the Firefox crash reports; if it's close to 99% that would make a strong argument: https://github.com/numpy/numpy/wiki/Windows-versions#sse--sse2 https://gist.github.com/matthew-brett/9cb5274f7451a3eb8fc0 Cheers, Matthew From cmkleffner at gmail.com Thu Jul 3 06:33:35 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 3 Jul 2014 12:33:35 +0200 Subject: [Numpy-discussion] Numpy and debug symbols In-Reply-To: References: Message-ID: Hi, to trace this error, you can try to run your program with the dependency walker http://www.dependencywalker.com/ . In the menu there is a profiling option. With 'Start profiling' you get messages of all accesses to DLLs and Python extensions. Most likely a DLL is not found. Be aware: for 64-bit development you need a dedicated zip-file for the dependency walker.
Regards Carl 2014-07-03 11:22 GMT+02:00 Julian Taylor : > On Thu, Jul 3, 2014 at 11:14 AM, Pablo Pérez García > wrote: > > Hello, I'm a newcomer and I have a question I did not manage to solve > yet, I > > posted it into these two stack-overflow entries: > > > > > http://stackoverflow.com/questions/24529811/compiling-numpy-for-windows-python-2-7-7 > > > > > http://stackoverflow.com/questions/24548485/using-numpy-on-an-embedded-python-interpreter-using-vs2008-under-windows-7 > > > > I don't know how it works on windows, but on linux/mac, in order to > import debug builds of binary extensions you need to use a debug build > of python, which is a different runtime. I guess on windows you either > have to download a special installer with the debug build or build it > yourself (configure --with-pydebug) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jul 3 06:46:23 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 3 Jul 2014 11:46:23 +0100 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: I guess this one's mainly for Carl: On Thu, Jul 3, 2014 at 11:06 AM, Matthew Brett wrote: > Hi, > > On Thu, Jul 3, 2014 at 4:56 AM, Sturla Molden wrote: >> On 02/07/14 19:55, Chris Barker wrote: >> >>> >>> Indeed -- the default (i.e. what you get with pip install numpy) should >>> be SSE2 -- I'd much rather have a few folks with old hardware have to >>> go through some hoops than have most people get something that is >>> "much slower than MATLAB". >> >> >> I think we should use SSE3 as default. It is already ten years old. Most >> users (99.999 %) who want binary wheels have an SSE3 capable CPU. > > The 99% for SSE2 comes from the Firefox crash reports, where the large > majority are for very recent Firefox downloads. > > If you can identify SSE3 machines from the reported CPU string (as the > Firefox people did for SSE2), please do have a look and see if you can > get a count for SSE3 in the Firefox crash reports; if it's close to > 99% that would make a strong argument: > > https://github.com/numpy/numpy/wiki/Windows-versions#sse--sse2 > https://gist.github.com/matthew-brett/9cb5274f7451a3eb8fc0 Jonathan Helmus recently pointed out https://ci.appveyor.com in a discussion on the scikit-image mailing list. The scikit-image team are trying to get builds and tests working there. The configuration file allows arbitrary cmd and powershell commands to be executed in a clean Windows virtual machine. Do you think it would be possible to get the wheel builds working on something like that? That would be a big step forward, just because the current procedure is rather fiddly, even if not very difficult. Any news on the pull request to numpy? Waiting eagerly :) Cheers, Matthew From pablopg at computer.org Thu Jul 3 06:51:35 2014 From: pablopg at computer.org (=?UTF-8?B?UGFibG8gUMOpcmV6IEdhcmPDrWE=?=) Date: Thu, 3 Jul 2014 12:51:35 +0200 Subject: [Numpy-discussion] Numpy and debug symbols In-Reply-To: References: Message-ID: Hello, I was able to run Dependency Walker and I noticed that in Debug mode the following libraries are not loaded: "MULTIARRAY.PYD", "UMATH.PYD" Also, in debug mode Python27_D is loaded and in release mode Python27, which sounds good to me...
but for some reason debug mode cannot load the necessary dependencies. I attach both files. By the way, I like this community! 2014-07-03 12:33 GMT+02:00 Carl Kleffner : > Hi, > > to trace this error, you can try to run your program with the dependency > walker http://www.dependencywalker.com/ . In the menu there is a > profiling option. With 'Start profiling' you get messages of all accesses > to DLLs and Python extensions. Most likely a DLL is not found. > Be aware: for 64-bit development you need a dedicated zip-file for the > dependency walker. > > Regards > > Carl > > > 2014-07-03 11:22 GMT+02:00 Julian Taylor : > > On Thu, Jul 3, 2014 at 11:14 AM, Pablo Pérez García >> wrote: >> > Hello, I'm a newcomer and I have a question I did not manage to solve >> yet, I >> > posted it into these two stack-overflow entries: >> > >> > >> http://stackoverflow.com/questions/24529811/compiling-numpy-for-windows-python-2-7-7 >> > >> > >> http://stackoverflow.com/questions/24548485/using-numpy-on-an-embedded-python-interpreter-using-vs2008-under-windows-7 >> > >> >> I don't know how it works on windows, but on linux/mac, in order to >> import debug builds of binary extensions you need to use a debug build >> of python, which is a different runtime. I guess on windows you either >> have to download a special installer with the debug build or build it >> yourself (configure --with-pydebug) >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Pablo Pérez García -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- #Dependency Walker for DEBUG. Warning: At least one delay-load dependency module was not found. Warning: At least one module has an unresolved import due to a missing export function in a delay-load dependent module. -------------------------------------------------------------------------------- Starting profile on 03/07/2014 at 12:41:42 Options Selected: Simulate ShellExecute by inserting any App Paths directories into the PATH environment variable. Log DllMain calls for process attach and process detach messages. Hook the process to gather more detailed dependency information. Log LoadLibrary function calls. Log GetProcAddress function calls. Log debug output messages. Automatically open and profile child processes. -------------------------------------------------------------------------------- Started "DEMO_FOR_PYTHON.EXE" (process 0xBD4) at address 0x00DD0000. Successfully hooked module. Loaded "NTDLL.DLL" at address 0x77D50000. Successfully hooked module. Loaded "KERNEL32.DLL" at address 0x75910000. Successfully hooked module. Loaded "KERNELBASE.DLL" at address 0x77900000. Successfully hooked module. DllMain(0x77900000, DLL_PROCESS_ATTACH, 0x00000000) in "KERNELBASE.DLL" called. DllMain(0x77900000, DLL_PROCESS_ATTACH, 0x00000000) in "KERNELBASE.DLL" returned 1 (0x1). DllMain(0x75910000, DLL_PROCESS_ATTACH, 0x00000000) in "KERNEL32.DLL" called. DllMain(0x75910000, DLL_PROCESS_ATTACH, 0x00000000) in "KERNEL32.DLL" returned 1 (0x1). Injected "DEPENDS.DLL" at address 0x08370000. DllMain(0x08370000, DLL_PROCESS_ATTACH, 0x00000000) in "DEPENDS.DLL" called.
DllMain(0x08370000, DLL_PROCESS_ATTACH, 0x00000000) in "DEPENDS.DLL" returned 1 (0x1). Loaded "MSVCR90D.DLL" at address 0x634F0000. Successfully hooked module. Loaded "PYTHON27_D.DLL" at address 0x1E000000. Successfully hooked module. Loaded "USER32.DLL" at address 0x76560000. Successfully hooked module. Loaded "GDI32.DLL" at address 0x75A60000. Successfully hooked module. Loaded "LPK.DLL" at address 0x764F0000. Successfully hooked module. Loaded "USP10.DLL" at address 0x77850000. Successfully hooked module. Loaded "MSVCRT.DLL" at address 0x75830000. Successfully hooked module. Loaded "ADVAPI32.DLL" at address 0x75DB0000. Successfully hooked module. Loaded "SECHOST.DLL" at address 0x760D0000. Successfully hooked module. Loaded "RPCRT4.DLL" at address 0x75B80000. Successfully hooked module. Loaded "SSPICLI.DLL" at address 0x75750000. Successfully hooked module. Loaded "CRYPTBASE.DLL" at address 0x75740000. Successfully hooked module. Loaded "SHELL32.DLL" at address 0x76790000. Successfully hooked module. Loaded "SHLWAPI.DLL" at address 0x76490000. Successfully hooked module. Entrypoint reached. All implicit modules have been loaded. DllMain(0x634F0000, DLL_PROCESS_ATTACH, 0x002DF840) in "MSVCR90D.DLL" called. GetProcAddress(0x75910000 [KERNEL32.DLL], "FlsAlloc") called from "MSVCR90D.DLL" at address 0x6352E339 and returned 0x75924EF3. GetProcAddress(0x75910000 [KERNEL32.DLL], "FlsGetValue") called from "MSVCR90D.DLL" at address 0x6352E34D and returned 0x75921252. GetProcAddress(0x75910000 [KERNEL32.DLL], "FlsSetValue") called from "MSVCR90D.DLL" at address 0x6352E361 and returned 0x759241D0. GetProcAddress(0x75910000 [KERNEL32.DLL], "FlsFree") called from "MSVCR90D.DLL" at address 0x6352E375 and returned 0x7592355F. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E5DB and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E5F3 and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "IsProcessorFeaturePresent") called from "MSVCR90D.DLL" at address 0x635E5A0B and returned 0x759251FD. GetProcAddress(0x75910000 [KERNEL32.DLL], "FindActCtxSectionStringW") called from "MSVCR90D.DLL" at address 0x6352CA3A and returned 0x7592A6D8. 
DllMain(0x634F0000, DLL_PROCESS_ATTACH, 0x002DF840) in "MSVCR90D.DLL" returned 1 (0x1). DllMain(0x75830000, DLL_PROCESS_ATTACH, 0x002DF840) in "MSVCRT.DLL" called. DllMain(0x75830000, DLL_PROCESS_ATTACH, 0x002DF840) in "MSVCRT.DLL" returned 1 (0x1). DllMain(0x77850000, DLL_PROCESS_ATTACH, 0x002DF840) in "USP10.DLL" called. LoadLibraryA("gdi32.dll") called from "USP10.DLL" at address 0x77866020. LoadLibraryA("gdi32.dll") returned 0x75A60000. GetProcAddress(0x75A60000 [GDI32.DLL], "GetCharABCWidthsI") called from "USP10.DLL" at address 0x77866055 and returned 0x75A799A3. DllMain(0x77850000, DLL_PROCESS_ATTACH, 0x002DF840) in "USP10.DLL" returned 1 (0x1). DllMain(0x764F0000, DLL_PROCESS_ATTACH, 0x002DF840) in "LPK.DLL" called. DllMain(0x764F0000, DLL_PROCESS_ATTACH, 0x002DF840) in "LPK.DLL" returned 1 (0x1). DllMain(0x75A60000, DLL_PROCESS_ATTACH, 0x002DF840) in "GDI32.DLL" called. DllMain(0x75A60000, DLL_PROCESS_ATTACH, 0x002DF840) in "GDI32.DLL" returned 1 (0x1). DllMain(0x75740000, DLL_PROCESS_ATTACH, 0x002DF840) in "CRYPTBASE.DLL" called. DllMain(0x75740000, DLL_PROCESS_ATTACH, 0x002DF840) in "CRYPTBASE.DLL" returned 1 (0x1). DllMain(0x75750000, DLL_PROCESS_ATTACH, 0x002DF840) in "SSPICLI.DLL" called. DllMain(0x75750000, DLL_PROCESS_ATTACH, 0x002DF840) in "SSPICLI.DLL" returned 1 (0x1). DllMain(0x75B80000, DLL_PROCESS_ATTACH, 0x002DF840) in "RPCRT4.DLL" called. DllMain(0x75B80000, DLL_PROCESS_ATTACH, 0x002DF840) in "RPCRT4.DLL" returned 1975101185 (0x75B9A701). DllMain(0x760D0000, DLL_PROCESS_ATTACH, 0x002DF840) in "SECHOST.DLL" called. DllMain(0x760D0000, DLL_PROCESS_ATTACH, 0x002DF840) in "SECHOST.DLL" returned 1 (0x1). DllMain(0x75DB0000, DLL_PROCESS_ATTACH, 0x002DF840) in "ADVAPI32.DLL" called. DllMain(0x75DB0000, DLL_PROCESS_ATTACH, 0x002DF840) in "ADVAPI32.DLL" returned 1 (0x1). DllMain(0x76560000, DLL_PROCESS_ATTACH, 0x002DF840) in "USER32.DLL" called. LoadLibraryW("C:\Windows\system32\IMM32.DLL") called from "USER32.DLL" at address 0x7657CF0E. Loaded "IMM32.DLL" at address 0x76500000. Successfully hooked module. Loaded "MSCTF.DLL" at address 0x75FF0000. Successfully hooked module. DllMain(0x75FF0000, DLL_PROCESS_ATTACH, 0x00000000) in "MSCTF.DLL" called. DllMain(0x75FF0000, DLL_PROCESS_ATTACH, 0x00000000) in "MSCTF.DLL" returned 1 (0x1). DllMain(0x76500000, DLL_PROCESS_ATTACH, 0x00000000) in "IMM32.DLL" called. GetProcAddress(0x76500000 [IMM32.DLL], "ImmWINNLSEnableIME") called from "USER32.DLL" at address 0x7657C312 and returned 0x7651F637. GetProcAddress(0x76500000 [IMM32.DLL], "ImmWINNLSGetEnableStatus") called from "USER32.DLL" at address 0x7657C327 and returned 0x7651F65E. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSendIMEMessageExW") called from "USER32.DLL" at address 0x7657C33C and returned 0x7651F8EC. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSendIMEMessageExA") called from "USER32.DLL" at address 0x7657C351 and returned 0x7651F907. GetProcAddress(0x76500000 [IMM32.DLL], "ImmIMPGetIMEW") called from "USER32.DLL" at address 0x7657C366 and returned 0x7651FB65. GetProcAddress(0x76500000 [IMM32.DLL], "ImmIMPGetIMEA") called from "USER32.DLL" at address 0x7657C37B and returned 0x7651FB99. GetProcAddress(0x76500000 [IMM32.DLL], "ImmIMPQueryIMEW") called from "USER32.DLL" at address 0x7657C390 and returned 0x7651F9CA. GetProcAddress(0x76500000 [IMM32.DLL], "ImmIMPQueryIMEA") called from "USER32.DLL" at address 0x7657C3A5 and returned 0x7651FAD6. GetProcAddress(0x76500000 [IMM32.DLL], "ImmIMPSetIMEW") called from "USER32.DLL" at address 0x7657C3BA and returned 0x7651F746. 
GetProcAddress(0x76500000 [IMM32.DLL], "ImmIMPSetIMEA") called from "USER32.DLL" at address 0x7657C3CF and returned 0x7651F86E. GetProcAddress(0x76500000 [IMM32.DLL], "ImmAssociateContext") called from "USER32.DLL" at address 0x7657C3E4 and returned 0x76513540. GetProcAddress(0x76500000 [IMM32.DLL], "ImmEscapeA") called from "USER32.DLL" at address 0x7657C3F9 and returned 0x76519327. GetProcAddress(0x76500000 [IMM32.DLL], "ImmEscapeW") called from "USER32.DLL" at address 0x7657C40E and returned 0x765195A9. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetCompositionStringA") called from "USER32.DLL" at address 0x7657C423 and returned 0x76517A37. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetCompositionStringW") called from "USER32.DLL" at address 0x7657C438 and returned 0x7651420C. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetCompositionWindow") called from "USER32.DLL" at address 0x7657C44D and returned 0x76512E79. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetContext") called from "USER32.DLL" at address 0x7657C462 and returned 0x76512084. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetDefaultIMEWnd") called from "USER32.DLL" at address 0x7657C477 and returned 0x76511F9D. GetProcAddress(0x76500000 [IMM32.DLL], "ImmIsIME") called from "USER32.DLL" at address 0x7657C48C and returned 0x76512FC7. GetProcAddress(0x76500000 [IMM32.DLL], "ImmReleaseContext") called from "USER32.DLL" at address 0x7657C4A1 and returned 0x765121A2. GetProcAddress(0x76500000 [IMM32.DLL], "ImmRegisterClient") called from "USER32.DLL" at address 0x7657C4B6 and returned 0x76511346. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetCompositionFontW") called from "USER32.DLL" at address 0x7657C4CB and returned 0x765168C8. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetCompositionFontA") called from "USER32.DLL" at address 0x7657C4E0 and returned 0x7651682C. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetCompositionFontW") called from "USER32.DLL" at address 0x7657C4F5 and returned 0x76513938. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetCompositionFontA") called from "USER32.DLL" at address 0x7657C50A and returned 0x76516964. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetCompositionWindow") called from "USER32.DLL" at address 0x7657C51F and returned 0x765138AA. GetProcAddress(0x76500000 [IMM32.DLL], "ImmNotifyIME") called from "USER32.DLL" at address 0x7657C534 and returned 0x76513C6C. GetProcAddress(0x76500000 [IMM32.DLL], "ImmLockIMC") called from "USER32.DLL" at address 0x7657C549 and returned 0x76511E7D. GetProcAddress(0x76500000 [IMM32.DLL], "ImmUnlockIMC") called from "USER32.DLL" at address 0x7657C55E and returned 0x76511E95. GetProcAddress(0x76500000 [IMM32.DLL], "ImmLoadIME") called from "USER32.DLL" at address 0x7657C573 and returned 0x7651197A. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetOpenStatus") called from "USER32.DLL" at address 0x7657C588 and returned 0x76513FF3. GetProcAddress(0x76500000 [IMM32.DLL], "ImmFreeLayout") called from "USER32.DLL" at address 0x7657C59D and returned 0x765197EF. GetProcAddress(0x76500000 [IMM32.DLL], "ImmActivateLayout") called from "USER32.DLL" at address 0x7657C5B2 and returned 0x76518DF5. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetCandidateWindow") called from "USER32.DLL" at address 0x7657C5C7 and returned 0x76512EBC. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetCandidateWindow") called from "USER32.DLL" at address 0x7657C5DC and returned 0x76513E02. 
GetProcAddress(0x76500000 [IMM32.DLL], "ImmConfigureIMEW") called from "USER32.DLL" at address 0x7657C5F1 and returned 0x7651913F. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetConversionStatus") called from "USER32.DLL" at address 0x7657C606 and returned 0x765124E9. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetConversionStatus") called from "USER32.DLL" at address 0x7657C61B and returned 0x76513EE6. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetStatusWindowPos") called from "USER32.DLL" at address 0x7657C630 and returned 0x76516A7C. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetImeInfoEx") called from "USER32.DLL" at address 0x7657C645 and returned 0x765114D8. GetProcAddress(0x76500000 [IMM32.DLL], "ImmLockImeDpi") called from "USER32.DLL" at address 0x7657C65A and returned 0x76512025. GetProcAddress(0x76500000 [IMM32.DLL], "ImmUnlockImeDpi") called from "USER32.DLL" at address 0x7657C66F and returned 0x76511FD8. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetOpenStatus") called from "USER32.DLL" at address 0x7657C684 and returned 0x76513DCF. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetActiveContext") called from "USER32.DLL" at address 0x7657C699 and returned 0x76512246. GetProcAddress(0x76500000 [IMM32.DLL], "ImmTranslateMessage") called from "USER32.DLL" at address 0x7657C6AE and returned 0x7651F27F. GetProcAddress(0x76500000 [IMM32.DLL], "ImmLoadLayout") called from "USER32.DLL" at address 0x7657C6C3 and returned 0x76519E79. GetProcAddress(0x76500000 [IMM32.DLL], "ImmProcessKey") called from "USER32.DLL" at address 0x7657C6D8 and returned 0x76513A3C. GetProcAddress(0x76500000 [IMM32.DLL], "ImmPutImeMenuItemsIntoMappedFile") called from "USER32.DLL" at address 0x7657C6ED and returned 0x76524E96. GetProcAddress(0x76500000 [IMM32.DLL], "ImmGetProperty") called from "USER32.DLL" at address 0x7657C702 and returned 0x76513BB8. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetCompositionStringA") called from "USER32.DLL" at address 0x7657C717 and returned 0x765183C2. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSetCompositionStringW") called from "USER32.DLL" at address 0x7657C72C and returned 0x765183E9. GetProcAddress(0x76500000 [IMM32.DLL], "ImmEnumInputContext") called from "USER32.DLL" at address 0x7657C741 and returned 0x765131DD. GetProcAddress(0x76500000 [IMM32.DLL], "ImmSystemHandler") called from "USER32.DLL" at address 0x7657C756 and returned 0x7651B1CF. GetProcAddress(0x76500000 [IMM32.DLL], "CtfImmTIMActivate") called from "USER32.DLL" at address 0x7657C767 and returned 0x76511888. GetProcAddress(0x76500000 [IMM32.DLL], "CtfImmRestoreToolbarWnd") called from "USER32.DLL" at address 0x7657C778 and returned 0x76525114. GetProcAddress(0x76500000 [IMM32.DLL], "CtfImmHideToolbarWnd") called from "USER32.DLL" at address 0x7657C789 and returned 0x7652514B. GetProcAddress(0x76500000 [IMM32.DLL], "CtfImmDispatchDefImeMessage") called from "USER32.DLL" at address 0x7657C79A and returned 0x7651163C. GetProcAddress(0x76500000 [IMM32.DLL], "CtfImmNotify") called from "USER32.DLL" at address 0x7657C7AB and returned 0x765115D0. GetProcAddress(0x76500000 [IMM32.DLL], "CtfImmSetDefaultRemoteKeyboardLayout") called from "USER32.DLL" at address 0x7657C7BC and returned 0x765253CC. GetProcAddress(0x76500000 [IMM32.DLL], "CtfImmGetCompatibleKeyboardLayout") called from "USER32.DLL" at address 0x7657C7CD and returned 0x765253DC. DllMain(0x76500000, DLL_PROCESS_ATTACH, 0x00000000) in "IMM32.DLL" returned 1 (0x1). LoadLibraryW("C:\Windows\system32\IMM32.DLL") returned 0x76500000. 
GetProcAddress(0x764F0000 [LPK.DLL], "LpkTabbedTextOut") called from "GDI32.DLL" at address 0x75A76970 and returned 0x764F48A0. GetProcAddress(0x764F0000 [LPK.DLL], "LpkPSMTextOut") called from "GDI32.DLL" at address 0x75A7697B and returned 0x764F1430. GetProcAddress(0x764F0000 [LPK.DLL], "LpkDrawTextEx") called from "GDI32.DLL" at address 0x75A76986 and returned 0x764F13D0. GetProcAddress(0x764F0000 [LPK.DLL], "LpkEditControl") called from "GDI32.DLL" at address 0x75A76991 and returned 0x764F7000. DllMain(0x76560000, DLL_PROCESS_ATTACH, 0x002DF840) in "USER32.DLL" returned 1 (0x1). DllMain(0x76490000, DLL_PROCESS_ATTACH, 0x002DF840) in "SHLWAPI.DLL" called. DllMain(0x76490000, DLL_PROCESS_ATTACH, 0x002DF840) in "SHLWAPI.DLL" returned 1 (0x1). DllMain(0x76790000, DLL_PROCESS_ATTACH, 0x002DF840) in "SHELL32.DLL" called. DllMain(0x76790000, DLL_PROCESS_ATTACH, 0x002DF840) in "SHELL32.DLL" returned 1 (0x1). DllMain(0x1E000000, DLL_PROCESS_ATTACH, 0x002DF840) in "PYTHON27_D.DLL" called. GetProcAddress(0x75910000 [KERNEL32.DLL], "GetCurrentActCtx") called from "PYTHON27_D.DLL" at address 0x1E180A17 and returned 0x7593D521. GetProcAddress(0x75910000 [KERNEL32.DLL], "ActivateActCtx") called from "PYTHON27_D.DLL" at address 0x1E180A34 and returned 0x75925458. GetProcAddress(0x75910000 [KERNEL32.DLL], "DeactivateActCtx") called from "PYTHON27_D.DLL" at address 0x1E180A48 and returned 0x75925424. GetProcAddress(0x75910000 [KERNEL32.DLL], "AddRefActCtx") called from "PYTHON27_D.DLL" at address 0x1E180A5C and returned 0x7593D510. GetProcAddress(0x75910000 [KERNEL32.DLL], "ReleaseActCtx") called from "PYTHON27_D.DLL" at address 0x1E180A70 and returned 0x75925489. DllMain(0x1E000000, DLL_PROCESS_ATTACH, 0x002DF840) in "PYTHON27_D.DLL" returned 1 (0x1). DllMain(0x76500000, DLL_PROCESS_DETACH, 0x00000001) in "IMM32.DLL" called. DllMain(0x76500000, DLL_PROCESS_DETACH, 0x00000001) in "IMM32.DLL" returned 1 (0x1). DllMain(0x75FF0000, DLL_PROCESS_DETACH, 0x00000001) in "MSCTF.DLL" called. DllMain(0x75FF0000, DLL_PROCESS_DETACH, 0x00000001) in "MSCTF.DLL" returned 1 (0x1). DllMain(0x1E000000, DLL_PROCESS_DETACH, 0x00000001) in "PYTHON27_D.DLL" called. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. 
GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90D.DLL" at address 0x6352E1DC and returned 0x77D89DD5. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90D.DLL" at address 0x6352E0DC and returned 0x77D9107B. DllMain(0x1E000000, DLL_PROCESS_DETACH, 0x00000001) in "PYTHON27_D.DLL" returned 1 (0x1). DllMain(0x76790000, DLL_PROCESS_DETACH, 0x00000001) in "SHELL32.DLL" called. DllMain(0x76790000, DLL_PROCESS_DETACH, 0x00000001) in "SHELL32.DLL" returned 1 (0x1). DllMain(0x76490000, DLL_PROCESS_DETACH, 0x00000001) in "SHLWAPI.DLL" called. DllMain(0x76490000, DLL_PROCESS_DETACH, 0x00000001) in "SHLWAPI.DLL" returned 1 (0x1). DllMain(0x76560000, DLL_PROCESS_DETACH, 0x00000001) in "USER32.DLL" called. DllMain(0x76560000, DLL_PROCESS_DETACH, 0x00000001) in "USER32.DLL" returned 1 (0x1). DllMain(0x75DB0000, DLL_PROCESS_DETACH, 0x00000001) in "ADVAPI32.DLL" called. DllMain(0x75DB0000, DLL_PROCESS_DETACH, 0x00000001) in "ADVAPI32.DLL" returned 1 (0x1). DllMain(0x760D0000, DLL_PROCESS_DETACH, 0x00000001) in "SECHOST.DLL" called. DllMain(0x760D0000, DLL_PROCESS_DETACH, 0x00000001) in "SECHOST.DLL" returned 1 (0x1). DllMain(0x75B80000, DLL_PROCESS_DETACH, 0x00000001) in "RPCRT4.DLL" called. DllMain(0x75B80000, DLL_PROCESS_DETACH, 0x00000001) in "RPCRT4.DLL" returned 1 (0x1). DllMain(0x75750000, DLL_PROCESS_DETACH, 0x00000001) in "SSPICLI.DLL" called. DllMain(0x75750000, DLL_PROCESS_DETACH, 0x00000001) in "SSPICLI.DLL" returned 1 (0x1). DllMain(0x75740000, DLL_PROCESS_DETACH, 0x00000001) in "CRYPTBASE.DLL" called. DllMain(0x75740000, DLL_PROCESS_DETACH, 0x00000001) in "CRYPTBASE.DLL" returned 1 (0x1). DllMain(0x75A60000, DLL_PROCESS_DETACH, 0x00000001) in "GDI32.DLL" called. DllMain(0x75A60000, DLL_PROCESS_DETACH, 0x00000001) in "GDI32.DLL" returned 1 (0x1). DllMain(0x764F0000, DLL_PROCESS_DETACH, 0x00000001) in "LPK.DLL" called. DllMain(0x764F0000, DLL_PROCESS_DETACH, 0x00000001) in "LPK.DLL" returned 1 (0x1). DllMain(0x77850000, DLL_PROCESS_DETACH, 0x00000001) in "USP10.DLL" called. DllMain(0x77850000, DLL_PROCESS_DETACH, 0x00000001) in "USP10.DLL" returned 1 (0x1). DllMain(0x75830000, DLL_PROCESS_DETACH, 0x00000001) in "MSVCRT.DLL" called. DllMain(0x75830000, DLL_PROCESS_DETACH, 0x00000001) in "MSVCRT.DLL" returned 1 (0x1). DllMain(0x634F0000, DLL_PROCESS_DETACH, 0x00000001) in "MSVCR90D.DLL" called. DllMain(0x634F0000, DLL_PROCESS_DETACH, 0x00000001) in "MSVCR90D.DLL" returned 1 (0x1). DllMain(0x08370000, DLL_PROCESS_DETACH, 0x00000001) in "DEPENDS.DLL" called. DllMain(0x08370000, DLL_PROCESS_DETACH, 0x00000001) in "DEPENDS.DLL" returned 1 (0x1). DllMain(0x75910000, DLL_PROCESS_DETACH, 0x00000001) in "KERNEL32.DLL" called. DllMain(0x75910000, DLL_PROCESS_DETACH, 0x00000001) in "KERNEL32.DLL" returned 1 (0x1). DllMain(0x77900000, DLL_PROCESS_DETACH, 0x00000001) in "KERNELBASE.DLL" called. DllMain(0x77900000, DLL_PROCESS_DETACH, 0x00000001) in "KERNELBASE.DLL" returned 1 (0x1). Exited "DEMO_FOR_PYTHON.EXE" (process 0xBD4) with code 0 (0x0). -------------- next part -------------- Warning: At least one delay-load dependency module was not found. Warning: At least one module has an unresolved import due to a missing export function in a delay-load dependent module. 
-------------------------------------------------------------------------------- Starting profile on 03/07/2014 at 12:38:08 Options Selected: Simulate ShellExecute by inserting any App Paths directories into the PATH environment variable. Log DllMain calls for process attach and process detach messages. Hook the process to gather more detailed dependency information. Log LoadLibrary function calls. Log GetProcAddress function calls. Log debug output messages. Automatically open and profile child processes. -------------------------------------------------------------------------------- Started "DEMO_FOR_PYTHON.EXE" (process 0x1FE0) at address 0x00220000. Successfully hooked module. Loaded "NTDLL.DLL" at address 0x77D50000. Successfully hooked module. Loaded "KERNEL32.DLL" at address 0x75910000. Successfully hooked module. Loaded "KERNELBASE.DLL" at address 0x77900000. Successfully hooked module. DllMain(0x77900000, DLL_PROCESS_ATTACH, 0x00000000) in "KERNELBASE.DLL" called. DllMain(0x77900000, DLL_PROCESS_ATTACH, 0x00000000) in "KERNELBASE.DLL" returned 1 (0x1). DllMain(0x75910000, DLL_PROCESS_ATTACH, 0x00000000) in "KERNEL32.DLL" called. DllMain(0x75910000, DLL_PROCESS_ATTACH, 0x00000000) in "KERNEL32.DLL" returned 1 (0x1). Injected "DEPENDS.DLL" at address 0x08370000. DllMain(0x08370000, DLL_PROCESS_ATTACH, 0x00000000) in "DEPENDS.DLL" called. DllMain(0x08370000, DLL_PROCESS_ATTACH, 0x00000000) in "DEPENDS.DLL" returned 1 (0x1). Loaded "MSVCR90.DLL" at address 0x74160000. Successfully hooked module. Loaded "PYTHON27.DLL" at address 0x1E000000. Successfully hooked module. Loaded "USER32.DLL" at address 0x76560000. Successfully hooked module. Loaded "GDI32.DLL" at address 0x75A60000. Successfully hooked module. Loaded "LPK.DLL" at address 0x764F0000. Successfully hooked module. Loaded "USP10.DLL" at address 0x77850000. Successfully hooked module. Loaded "MSVCRT.DLL" at address 0x75830000. Successfully hooked module. Loaded "ADVAPI32.DLL" at address 0x75DB0000. Successfully hooked module. Loaded "SECHOST.DLL" at address 0x760D0000. Successfully hooked module. Loaded "RPCRT4.DLL" at address 0x75B80000. Successfully hooked module. Loaded "SSPICLI.DLL" at address 0x75750000. Successfully hooked module. Loaded "CRYPTBASE.DLL" at address 0x75740000. Successfully hooked module. Loaded "SHELL32.DLL" at address 0x76790000. Successfully hooked module. Loaded "SHLWAPI.DLL" at address 0x76490000. Successfully hooked module. Entrypoint reached. All implicit modules have been loaded. DllMain(0x74160000, DLL_PROCESS_ATTACH, 0x0045FAE4) in "MSVCR90.DLL" called. GetProcAddress(0x75910000 [KERNEL32.DLL], "FlsAlloc") called from "MSVCR90.DLL" at address 0x74183ACC and returned 0x75924EF3. GetProcAddress(0x75910000 [KERNEL32.DLL], "FlsGetValue") called from "MSVCR90.DLL" at address 0x74183AD9 and returned 0x75921252. GetProcAddress(0x75910000 [KERNEL32.DLL], "FlsSetValue") called from "MSVCR90.DLL" at address 0x74183AE6 and returned 0x759241D0. GetProcAddress(0x75910000 [KERNEL32.DLL], "FlsFree") called from "MSVCR90.DLL" at address 0x74183AF3 and returned 0x7592355F. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90.DLL" at address 0x741835E2 and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90.DLL" at address 0x741835E2 and returned 0x77D9107B. GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90.DLL" at address 0x741835E2 and returned 0x77D9107B. 
GetProcAddress(0x75910000 [KERNEL32.DLL], "EncodePointer") called from "MSVCR90.DLL" at address 0x741835E2 and returned 0x77D9107B.
GetProcAddress(0x75910000 [KERNEL32.DLL], "DecodePointer") called from "MSVCR90.DLL" at address 0x74183667 and returned 0x77D89DD5.
[... repeated EncodePointer/DecodePointer lookups and per-function import resolutions (USP10, LPK Lpk*, IMM32 Imm*/CtfImm*) omitted ...]
DllMain(0x74160000, DLL_PROCESS_ATTACH, 0x0045FAE4) in "MSVCR90.DLL" returned 1 (0x1).
[... MSVCRT.DLL, USP10.DLL, LPK.DLL, GDI32.DLL, CRYPTBASE.DLL, SSPICLI.DLL, RPCRT4.DLL, SECHOST.DLL, ADVAPI32.DLL, USER32.DLL (loading IMM32.DLL and MSCTF.DLL), SHLWAPI.DLL and SHELL32.DLL all receive DLL_PROCESS_ATTACH and return 1 ...]
DllMain(0x1E000000, DLL_PROCESS_ATTACH, 0x0045FAE4) in "PYTHON27.DLL" returned 1 (0x1).
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\core\multiarray.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x10000000.
GetProcAddress(0x10000000 [MULTIARRAY.PYD], "initmultiarray") called from "PYTHON27.DLL" at address 0x1E0F98BF and returned 0x100915D0.
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\core\umath.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x023A0000.
GetProcAddress(0x023A0000 [UMATH.PYD], "initumath") called from "PYTHON27.DLL" at address 0x1E0F98BF and returned 0x023A7CE0.
Loaded "LIBIOMP5MD.DLL" at address 0x028F0000. Successfully hooked module.
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\core\_dotblas.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x02400000.
GetProcAddress(0x02400000 [_DOTBLAS.PYD], "init_dotblas") called from "PYTHON27.DLL" at address 0x1E0F98BF and returned 0x02403560.
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\core\scalarmath.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x00330000.
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\lib\_compiled_base.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x00150000.
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\linalg\lapack_lite.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x02E60000.
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\linalg\_umath_linalg.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x03780000.
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\fft\fftpack_lite.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x00520000.
LoadLibraryExA("C:\Python27\lib\site-packages\numpy\random\mtrand.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x048E0000.
Loaded "CRYPTSP.DLL" at address 0x70AF0000. Successfully hooked module.
Loaded "RSAENH.DLL" at address 0x70A70000. Successfully hooked module.
LoadLibraryExA("C:\Python27\DLLs\_ctypes.pyd", 0x00000000, LOAD_WITH_ALTERED_SEARCH_PATH) returned 0x1D1A0000.
Loaded "OLE32.DLL" at address 0x774B0000. Successfully hooked module.
Loaded "OLEAUT32.DLL" at address 0x75AF0000. Successfully hooked module.
GetProcAddress(0x1D1A0000 [_CTYPES.PYD], "init_ctypes") called from "PYTHON27.DLL" at address 0x1E0F98BF and returned 0x1D1A7130.
[... process detach begins; _CTYPES.PYD, OLEAUT32.DLL, OLE32.DLL, RSAENH.DLL, CRYPTSP.DLL, MTRAND.PYD, FFTPACK_LITE.PYD, _UMATH_LINALG.PYD, LAPACK_LITE.PYD, _COMPILED_BASE.PYD, SCALARMATH.PYD, _DOTBLAS.PYD, LIBIOMP5MD.DLL, UMATH.PYD, MULTIARRAY.PYD, IMM32.DLL, MSCTF.DLL, PYTHON27.DLL and SHELL32.DLL each receive DLL_PROCESS_DETACH and return 1; the interleaved EncodePointer/DecodePointer lookups are omitted ...]
[... the remaining system DLLs (SHLWAPI.DLL, USER32.DLL, ADVAPI32.DLL, SECHOST.DLL, RPCRT4.DLL, SSPICLI.DLL, CRYPTBASE.DLL, GDI32.DLL, LPK.DLL, USP10.DLL, MSVCRT.DLL, MSVCR90.DLL, DEPENDS.DLL, KERNEL32.DLL, KERNELBASE.DLL) receive DLL_PROCESS_DETACH and return 1 ...]
Exited "DEMO_FOR_PYTHON.EXE" (process 0x1FE0) with code 0 (0x0).

From cmkleffner at gmail.com Thu Jul 3 07:17:31 2014
From: cmkleffner at gmail.com (Carl Kleffner)
Date: Thu, 3 Jul 2014 13:17:31 +0200
Subject: [Numpy-discussion] Numpy and debug symbols
In-Reply-To: References: Message-ID:

Hi,

numpy extensions are linked against python27.dll. I have no idea if it works to copy python27.dll side by side to python27_d.dll (I guess not). But you can try it anyway. The clean way is to get or compile a debug numpy version linked against python27_d.dll.

Regards

Carl

2014-07-03 12:51 GMT+02:00 Pablo Pérez García :

> Hello,
>
> I was able to run Dependency Walker and I noticed that in Debug mode the
> following type of libraries are not loaded:
>
> "MULTIARRAY.PYD", "UMATH.PYD"
>
> Also in debug mode Python27_D is loaded and in release mode Python27 which
> sounds good to me... but for some reason debug mode cannot load necessary
> dependencies.
>
> I attach both files.
>
> By the way, I like this community!
> > > > 2014-07-03 12:33 GMT+02:00 Carl Kleffner : > > Hi, >> >> to trace this error, you can try to run your programm with the dependency >> walker http://www.dependencywalker.com/ . In the menu there is a >> profiling option. With 'Start profiling' you get messages of all accesses >> to DLLs and Python extensions. Most likely a DLL is not found. >> Be aware: for 64bit development you need a dedicated zip-file for the >> dependency walker. >> >> Regards >> >> Carl >> >> >> 2014-07-03 11:22 GMT+02:00 Julian Taylor : >> >> On Thu, Jul 3, 2014 at 11:14 AM, Pablo P?rez Garc?a >>> wrote: >>> > Hello, I'm a newcomer and I have a question I did not manage to solve >>> yet, I >>> > posted it into these two stack-overflow entries: >>> > >>> > >>> http://stackoverflow.com/questions/24529811/compiling-numpy-for-windows-python-2-7-7 >>> > >>> > >>> http://stackoverflow.com/questions/24548485/using-numpy-on-an-embedded-python-interpreter-using-vs2008-under-windows-7 >>> > >>> >>> I don't know how it works on windows but on linux/mac in order to >>> import debug builds of binary extensions you need to use debug build >>> of python which is a different runtime. I guess on windows you either >>> have to download a special installer with the debug build or build it >>> yourself (configure --with-pydebug) >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Pablo P?rez Garc?a > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Thu Jul 3 07:51:56 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 3 Jul 2014 13:51:56 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi Matthew, I can make it in the late evening (MEZ timezone), so you have to wait a bit ... I also will try to create new numpy/scipy wheels. I now have the latest OpenBLAS version ready. Olivier gaves me access to rackspace. I wil try it out on the weekend. Regards Carl 2014-07-03 12:46 GMT+02:00 Matthew Brett : > I guess this one's mainly for Carl: > > On Thu, Jul 3, 2014 at 11:06 AM, Matthew Brett > wrote: > > Hi, > > > > On Thu, Jul 3, 2014 at 4:56 AM, Sturla Molden > wrote: > >> On 02/07/14 19:55, Chris Barker wrote: > >> > >>> > >>> Indeed -- the default (i.e what you get with pip install numpy) should > >>> be SSE2 -- I":d much rather have a few folks with old hardware have to > >>> go through some hoops that n have most people get something that is > >>> "much slower than MATLAB". > >> > >> > >> I think we should use SSE3 as default. It is already ten years old. Most > >> users (99.999 %) who want binary wheels have an SSE3 capable CPU. > > > > The 99% for SSE2 comes from the Firefox crash reports, where the large > > majority are for very recent Firefox downloads. 
> > > > If you can identify SSE3 machines from the reported CPU string (as the > > Firefox people did for SSE2), please do have a look a see if you can > > get a count for SSE3 in the Firefox crash reports; if it's close to > > 99% that would make a strong argument: > > > > https://github.com/numpy/numpy/wiki/Windows-versions#sse--sse2 > > https://gist.github.com/matthew-brett/9cb5274f7451a3eb8fc0 > > Jonathan Helmus recently pointed out https://ci.appveyor.com in a > discussion on the scikit-image mailing list. The scikit-image team > are trying to get builds and tests working there. The configuration > file allows arbitrary cmd and powershell commands executed in a clean > Windows virtual machine. Do you think it would be possible to get the > wheel builds working on something like that? That would be a big step > forward, just because the current procedure is rather fiddly, even if > not very difficult. > > Any news on the pull request to numpy? Waiting eagerly :) > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.hulsman at tudelft.nl Thu Jul 3 08:36:17 2014 From: m.hulsman at tudelft.nl (Marc Hulsman) Date: Thu, 03 Jul 2014 14:36:17 +0200 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? In-Reply-To: References: <53B51993.7080207@tudelft.nl> Message-ID: <53B54E41.8090309@tudelft.nl> On 07/03/2014 11:43 AM, Julian Taylor wrote: > On second though I guess adding a short circuit to the dimension > discovery on mismatching list length with object type should solve the > issue too. A bit more information on the use case would still be > useful, why do you need to use numpy arrays for this in the first place? I use numpy as the base for a prototype data handling language (which matches dimensions not on position as in numpy, but by identity). This allows SQL like operations on complex data structures. The code has to be generic, to handle the corner cases. Numpy is used as it provides the fast indicing/ufuncs. Input is often formatted using regular Python constructs. This input data is 'unpacked' to a certain depth, which means that it is converted to numpy arrays, to allow for generic query operations. This can however go wrong. Say that we have nested variable length lists, what sometimes happens is that part of the data has (by chance) only fixed length nested lists, while another part has variable length nested lists. If we then unpack, numpy will for the first case construct a multi-dimensional array, while for the second case it will construct a single-dimensional array of nested lists. If we then want to e.g. concatenate this data using a generic operation, it will have trouble to handle the mix of multi-dimensional and 1-dimensional arrays. The code becomes quite a bit simpler if I know at forehand that I can expect just e.g. a 1-dimensional array. This is maybe somewhat of a corner case :) However, I was still wondering why, when assigning x[:] = k, k is still 'descended into' further than needed given the limited dimension of x. This seems unnecessary? Also, it is also not really clear to me why fromiter does not work using object dtypes. A solution for these two more general problems would already help me a lot. 
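For concreteness, a minimal sketch of the two behaviours described above, run against the NumPy 1.8-era releases discussed in this thread (the exact error messages are an assumption and vary between versions):

import numpy as np

# Pre-allocated 1-d object array: the "unpacked" container discussed above.
x = np.empty(2, dtype=object)

# Ragged sublists assign fine: each one is stored as a plain Python object.
x[:] = [[1, 2], [3, 4, 5]]

# Equal-length sublists are descended into first, so the right-hand side
# becomes a (2, 2) array that cannot be broadcast into shape (2,).
try:
    x[:] = [[1, 2], [3, 4]]
except ValueError as err:
    print("slice assignment:", err)

# fromiter rejects object dtypes outright on these versions.
try:
    np.fromiter(iter([[1, 2], [3, 4]]), dtype=object)
except (TypeError, ValueError) as err:
    print("fromiter:", err)

# The workaround that always yields a 1-d object array: element-wise assignment.
for i, item in enumerate([[1, 2], [3, 4]]):
    x[i] = item
print(x.shape, x[0], x[1])   # (2,) [1, 2] [3, 4]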
The generic solution of adding an nmaxdim parameter to numpy.array would of course be even more ideal :) From sebastian at sipsolutions.net Thu Jul 3 08:44:19 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 03 Jul 2014 14:44:19 +0200 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? In-Reply-To: <53B54E41.8090309@tudelft.nl> References: <53B51993.7080207@tudelft.nl> <53B54E41.8090309@tudelft.nl> Message-ID: <1404391459.13834.8.camel@sebastian-t440> On Do, 2014-07-03 at 14:36 +0200, Marc Hulsman wrote: > On 07/03/2014 11:43 AM, Julian Taylor wrote: > > On second though I guess adding a short circuit to the dimension > > discovery on mismatching list length with object type should solve the > > issue too. A bit more information on the use case would still be > > useful, why do you need to use numpy arrays for this in the first place? > > I use numpy as the base for a prototype data handling language (which > matches dimensions not on position as in numpy, but by identity). > This allows SQL like operations on complex data structures. The code has > to be generic, to handle the corner cases. Numpy is used as it > provides the fast indicing/ufuncs. > > Input is often formatted using regular Python constructs. This input > data is 'unpacked' to a certain depth, which means > that it is converted to numpy arrays, to allow for generic query > operations. > > This can however go wrong. Say that we have nested variable length > lists, what sometimes happens is that part of the data has > (by chance) only fixed length nested lists, while another part has > variable length nested lists. If we then unpack, numpy will for > the first case construct a multi-dimensional array, while for the second > case it will construct a single-dimensional > array of nested lists. If we then want to e.g. concatenate this data > using a generic operation, it will have trouble to handle the mix of > multi-dimensional and 1-dimensional arrays. The code becomes quite a > bit simpler if I know at forehand that I can expect just e.g. > a 1-dimensional array. > > This is maybe somewhat of a corner case :) However, I was still > wondering why, when assigning x[:] = k, k is still 'descended into' > further than needed given the limited dimension of x. This seems > unnecessary? Also, it is also not really clear to me why fromiter > does not work using object dtypes. A solution for these two more general > problems would already help me a lot. True and true. I don't see a problem with fromiter being more general, just someone has to sit down and add new error checks/cleanup stuff for the object case. The assignment could probably also be optimized, not sure how hard that is, I would expect it isn't that hard. As usually, someone just needs to find time and the interest to actually do it ;). - Sebastian > > The generic solution of adding an nmaxdim parameter to numpy.array would > of course be even more ideal :) > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sturla.molden at gmail.com Thu Jul 3 09:27:24 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 3 Jul 2014 13:27:24 +0000 (UTC) Subject: [Numpy-discussion] [Python-ideas] PEP pre-draft: Support for indexing with keyword arguments References: Message-ID: <295992745426086730.054539sturla.molden-gmail.com@news.gmane.org> Pandas might have more use for this than NumPy. Database interfaces might also have use for this. Sturla Nathaniel Smith wrote: > There's some discussion on python-ideas about making it possible for python > indexing to accept kwargs, eg > > arr[1:2, foo=bar] > > Since numpy is a very heavy user of indexing which might benefit from this, > I thought I should forward it here. If we have clear use cases for such a > feature then that may strongly affect the discussion. > > I admit I can't actually think of any features this would enable for us > though... > > -n > ---------- Forwarded message ---------- > From: "Stefano Borini" > Date: 2 Jul 2014 00:17 > Subject: [Python-ideas] PEP pre-draft: Support for indexing with keyword > arguments > To: "python-ideas at python.org" , "Joseph > Martinot-Lagarde" > Cc: > > Dear all, > > after the first mailing list feedback, and further private discussion with > Joseph Martinot-Lagarde, I drafted a first iteration of a PEP for keyword > arguments in indexing. The document is available here. > > href="https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt">https://github.com/stefanoborini/pep-keyword/blob/master/PEP-XXX.txt > > The document is not in final form when it comes to specifications. In fact, > it requires additional discussion about the best strategy to achieve the > desired result. Particular attention has been devoted to present > alternative implementation strategies, their pros and cons. I will examine > all feedback tomorrow morning European time (in approx 10 hrs), and apply > any pull requests or comments you may have. > > When the specification is finalized, or this community suggests that the > PEP is in a form suitable for official submission despite potential open > issues, I will submit it to the editor panel for further discussion, and > deploy an actual implementation according to the agreed specification for a > working test run. > > I apologize for potential mistakes in the PEP drafting and submission > process, as this is my first PEP. > > Kind Regards, > > Stefano Borini > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > href="https://mail.python.org/mailman/listinfo/python-ideas">https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: href="http://python.org/psf/codeofconduct/">http://python.org/psf/codeofconduct/ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > href="http://mail.scipy.org/mailman/listinfo/numpy-discussion">http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Thu Jul 3 10:43:51 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 3 Jul 2014 15:43:51 +0100 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, On Thu, Jul 3, 2014 at 12:51 PM, Carl Kleffner wrote: > Hi Matthew, > > I can make it in the late evening (MEZ timezone), so you have to wait a bit > ... 
I also will try to create new numpy/scipy wheels. I now have the latest > OpenBLAS version ready. Olivier gaves me access to rackspace. I wil try it > out on the weekend. Great - thanks a lot, Matthew From chris.barker at noaa.gov Thu Jul 3 11:59:00 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 3 Jul 2014 08:59:00 -0700 Subject: [Numpy-discussion] Teaching Scipy BoF at SciPy Message-ID: HI Folks, I will be hosting a "Teaching the SciPy Stack" BoF at SciPy this year: https://conference.scipy.org/scipy2014/schedule/presentation/1762/ (Actually, I proposed it for the conference, but would be more than happy to have other folks join me in facilitating, hosting, etc.) I've put up a Wiki page to collect ideas for topics. Please take a look and add your $0.02: https://github.com/numpy/numpy/wiki/TeachingSciPy-BoF-at-Scipy-2014 See you there, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ted.sandler at gmail.com Thu Jul 3 13:17:10 2014 From: ted.sandler at gmail.com (Ted Sandler) Date: Thu, 3 Jul 2014 10:17:10 -0700 Subject: [Numpy-discussion] parsing dtype descriptors Message-ID: Hi all, is there a spec or grammar for valid values of numpy dtype descriptor strings? I am writing code to parse ".npy" files from Java and want to be able to handle the range of ndarray descriptor strings. I came across this code: dtype = numpy.dtype(d['descr']) at line 267 in format.py: https://github.com/numpy/numpy/blob/master/numpy/lib/format.py However, I can't seem to find where it's implemented. Help is appreciated. Thanks! Ted -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Thu Jul 3 13:30:01 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 3 Jul 2014 10:30:01 -0700 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? In-Reply-To: <53B54E41.8090309@tudelft.nl> References: <53B51993.7080207@tudelft.nl> <53B54E41.8090309@tudelft.nl> Message-ID: On Thu, Jul 3, 2014 at 5:36 AM, Marc Hulsman wrote: > This can however go wrong. Say that we have nested variable length > lists, what sometimes happens is that part of the data has > (by chance) only fixed length nested lists, while another part has > variable length nested lists. If we then unpack, numpy will for > the first case construct a multi-dimensional array, while for the second > case it will construct a single-dimensional > array of nested lists. If we then want to e.g. concatenate this data > using a generic operation, it will have trouble to handle the mix of > multi-dimensional and 1-dimensional arrays. The code becomes quite a > bit simpler if I know at forehand that I can expect just e.g. > a 1-dimensional array. > Pandas has a couple of awkward work-arounds to do just that (creating object arrays). Might be worth taking a look: https://github.com/pydata/pandas/blob/master/pandas/lib.pyx#L315 https://github.com/pydata/pandas/blob/master/pandas/core/common.py#L2124 Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... 
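[As a concrete illustration of the workaround being discussed for forcing a 1-d object array, a sketch rather than pandas' actual code; the helper name to_object_1d is made up for the example. Preallocating the object array and filling it element by element gives a 1-d result whether or not the nested lists happen to have equal lengths.]

    import numpy as np

    def to_object_1d(seq):
        # Illustrative helper: numpy never gets a chance to discover extra
        # dimensions, because each element is assigned into an object slot.
        out = np.empty(len(seq), dtype=object)
        for i, item in enumerate(seq):
            out[i] = item
        return out

    to_object_1d([[1, 2], [3, 4]]).shape      # (2,), even for the "rectangular" input
    to_object_1d([[1, 2], [3, 4, 5]]).shape   # (2,)
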
URL: From valentin at haenel.co Thu Jul 3 15:35:06 2014 From: valentin at haenel.co (Valentin Haenel) Date: Thu, 3 Jul 2014 21:35:06 +0200 Subject: [Numpy-discussion] parsing dtype descriptors In-Reply-To: References: Message-ID: <20140703193506.GA25653@kudu.in-berlin.de> Dear Ted, * Ted Sandler [2014-07-03]: > Hi all, is there a spec or grammar for valid values of numpy dtype > descriptor strings? > > I am writing code to parse ".npy" files from Java and want to be able to > handle the range of ndarray descriptor strings. I came across this code: > > dtype = numpy.dtype(d['descr']) > > at line 267 in format.py: > > https://github.com/numpy/numpy/blob/master/numpy/lib/format.py > > However, I can't seem to find where it's implemented. Not sure exactly, what you are looking for, but maybe the following helps: https://github.com/numpy/numpy/blob/master/numpy/lib/format.py#L210 best, V- From ted.sandler at gmail.com Thu Jul 3 17:53:51 2014 From: ted.sandler at gmail.com (Ted Sandler) Date: Thu, 3 Jul 2014 14:53:51 -0700 Subject: [Numpy-discussion] parsing dtype descriptors In-Reply-To: <20140703193506.GA25653@kudu.in-berlin.de> References: <20140703193506.GA25653@kudu.in-berlin.de> Message-ID: Thanks. No, it's not what I'm looking for. I'm looking for the code that parses the string "f8' '=f4' 'float32' '>c16' ... Ideally, I want the exhaustive list of valid input strings that describe standard ndarrays (i.e. ndarrays with simple entries as opposed to records or subarrays). Lacking an exhaustive list or spec, I'd like the source code that does the parsing for them. This stackoverflow post is worth looking at: http://stackoverflow.com/questions/13997087/what-are-the-available-datatypes-for-dtype-with-numpys-loadtxt-an-genfromtxt Thanks again, Ted On Thu, Jul 3, 2014 at 12:35 PM, Valentin Haenel wrote: > Dear Ted, > > * Ted Sandler [2014-07-03]: > > Hi all, is there a spec or grammar for valid values of numpy dtype > > descriptor strings? > > > > I am writing code to parse ".npy" files from Java and want to be able to > > handle the range of ndarray descriptor strings. I came across this code: > > > > dtype = numpy.dtype(d['descr']) > > > > at line 267 in format.py: > > > > https://github.com/numpy/numpy/blob/master/numpy/lib/format.py > > > > However, I can't seem to find where it's implemented. > > Not sure exactly, what you are looking for, but maybe the following > helps: > > https://github.com/numpy/numpy/blob/master/numpy/lib/format.py#L210 > > best, > > V- > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jul 3 18:54:46 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 3 Jul 2014 16:54:46 -0600 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? In-Reply-To: References: <53B51993.7080207@tudelft.nl> Message-ID: On Thu, Jul 3, 2014 at 3:30 AM, Julian Taylor wrote: > numpy descends into the lists even if you request a object dtype as it > treats object arrays containing nested lists of equal size as > ndimensional: > > np.array([[1,2], [3,4]], dtype=object).ndim > 2 > > I don't think we have a constructor that limits the maximum dimension, > only one the minimum dimension. > There was discussion of such some years ago specifically for the object case. I think it would be useful. 
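[Returning to the descriptor strings asked about earlier in this thread: for simple, non-record dtypes the 'descr' field is just a byte-order character, a type-kind character and an item size in bytes, and it round-trips through numpy.dtype. A short illustrative session, assuming a little-endian machine:]

    import numpy as np

    for spec in ['>f8', '=f4', 'float32', '>c16', '<i4', '|b1']:
        dt = np.dtype(spec)
        print("%-8s -> %s  kind=%s  itemsize=%d" % (spec, dt.str, dt.kind, dt.itemsize))

    # e.g. 'float32' -> '<f4' on a little-endian machine; for simple dtypes
    # dt.str is the canonical form written into the .npy header's 'descr' field.
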
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Fri Jul 4 02:03:12 2014 From: toddrjen at gmail.com (Todd) Date: Fri, 4 Jul 2014 08:03:12 +0200 Subject: [Numpy-discussion] Fwd: [Python-ideas] PEP pre-draft: Support for indexing with keyword arguments In-Reply-To: References: <53B33800.1030300@ferrara.linux.it> Message-ID: On Jul 2, 2014 10:49 AM, "Nathaniel Smith" wrote: > > I admit I can't actually think of any features this would enable for us though... Could it be useful for structured arrays? -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Jul 4 04:39:33 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 04 Jul 2014 10:39:33 +0200 Subject: [Numpy-discussion] Fwd: [Python-ideas] PEP pre-draft: Support for indexing with keyword arguments In-Reply-To: References: <53B33800.1030300@ferrara.linux.it> Message-ID: <1404463173.2714.4.camel@sebastian-t440> On Fr, 2014-07-04 at 08:03 +0200, Todd wrote: > > On Jul 2, 2014 10:49 AM, "Nathaniel Smith" wrote: > > > > I admit I can't actually think of any features this would enable for > us though... > > Could it be useful for structured arrays? Not sure how. The named columns seem like a decent point to me. For toggling indexing options, I wonder if usually function calls or temporary object construction (at least for numpy) ala: arr.ox[...] arr.indx(option)[...] are not better in any case. - Sebastian > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From valentin at haenel.co Fri Jul 4 04:53:09 2014 From: valentin at haenel.co (Valentin Haenel) Date: Fri, 4 Jul 2014 10:53:09 +0200 Subject: [Numpy-discussion] parsing dtype descriptors In-Reply-To: References: <20140703193506.GA25653@kudu.in-berlin.de> Message-ID: <20140704085309.GB30233@kudu.in-berlin.de> Dear Ted, * Ted Sandler [2014-07-03]: > Thanks. No, it's not what I'm looking for. > > I'm looking for the code that parses the string " header's descriptor: > > {'descr': ' > There are many different descriptor strings, e.g.: > > '>f8' > '=f4' > 'float32' > '>c16' > ... > > Ideally, I want the exhaustive list of valid input strings that describe > standard ndarrays (i.e. ndarrays with simple entries as opposed to records > or subarrays). Lacking an exhaustive list or spec, I'd like the source code > that does the parsing for them. This stackoverflow post is worth looking > at: > > > http://stackoverflow.com/questions/13997087/what-are-the-available-datatypes-for-dtype-with-numpys-loadtxt-an-genfromtxt The only thing I could find in this direction, was: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html But since you have mentioned the stackoverflow post, I presume you have already discovered this page. best, V- From robert.kern at gmail.com Fri Jul 4 04:53:36 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 4 Jul 2014 09:53:36 +0100 Subject: [Numpy-discussion] parsing dtype descriptors In-Reply-To: References: <20140703193506.GA25653@kudu.in-berlin.de> Message-ID: On Thu, Jul 3, 2014 at 10:53 PM, Ted Sandler wrote: > Thanks. No, it's not what I'm looking for. 
> > I'm looking for the code that parses the string " header's descriptor: > > {'descr': ' > There are many different descriptor strings, e.g.: > > '>f8' > '=f4' > 'float32' > '>c16' > ... > > Ideally, I want the exhaustive list of valid input strings that describe > standard ndarrays (i.e. ndarrays with simple entries as opposed to records > or subarrays). Lacking an exhaustive list or spec, I'd like the source code > that does the parsing for them. https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/descriptor.c#L1321 https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/conversion_utils.c#L1000 https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L97 -- Robert Kern From valentin at haenel.co Fri Jul 4 09:49:54 2014 From: valentin at haenel.co (Valentin Haenel) Date: Fri, 4 Jul 2014 15:49:54 +0200 Subject: [Numpy-discussion] About the npz format In-Reply-To: <53515EE1.4080101@googlemail.com> References: <535030C4.9020700@googlemail.com> <20140417202635.GB22624@kudu.in-berlin.de> <20140417205627.GA4192@kudu.in-berlin.de> <20140418162927.GB1837@kudu.in-berlin.de> <53515EE1.4080101@googlemail.com> Message-ID: <20140704134954.GB31861@kudu.in-berlin.de> sorry, for the top-post, but should we add this as an issue on the github tracker? I'd like to revisit it this summer. V- * Julian Taylor [2014-04-18]: > On 18.04.2014 18:29, Valentin Haenel wrote: > > Hi, > > > > * Valentin Haenel [2014-04-17]: > >> * Valentin Haenel [2014-04-17]: > >>> * Julian Taylor [2014-04-17]: > >>>> On 17.04.2014 21:30, onefire wrote: > >>>>> Thanks for the suggestion. I did profile the program before, just not > >>>>> using Python. > >>>> > >>>> one problem of npz is that the zipfile module does not support streaming > >>>> data in (or if it does now we aren't using it). > >>>> So numpy writes the file uncompressed to disk and then zips it which is > >>>> horrible for performance and disk usage. > >>> > >>> As a workaround may also be possible to write the temporary NPY files to > >>> cStringIO instances and then use ``ZipFile.writestr`` with the > >>> ``getvalue()`` of the cStringIO object. However that approach may > >>> require some memory. In python 2.7, for each array: one copy inside the > >>> cStringIO instance and then another copy of when calling getvalue on the > >>> cString, I believe. > >> > >> There is a proof-of-concept implementation here: > >> > >> https://github.com/esc/numpy/compare/feature;npz_no_temp_file > > > > Anybody interested in me fixing this up (unit tests, API, etc..) for > > inclusion? > > > > I wonder if it would be better to instead use a fifo to avoid the memory > doubling. Windows probably hasn't got them (exposed via python) but one > can slap a platform check in front. > attached a proof of concept without proper error handling (which is > unfortunately the tricky part) > >From 472b4c0a44804b65d0774147010ec7a931a1c52d Mon Sep 17 00:00:00 2001 > From: Julian Taylor > Date: Thu, 17 Apr 2014 23:01:47 +0200 > Subject: [PATCH] use a pipe for savez > > --- > numpy/lib/npyio.py | 25 +++++++++++-------------- > 1 file changed, 11 insertions(+), 14 deletions(-) > > diff --git a/numpy/lib/npyio.py b/numpy/lib/npyio.py > index 98b4b6e..baafa9d 100644 > --- a/numpy/lib/npyio.py > +++ b/numpy/lib/npyio.py > @@ -585,22 +585,19 @@ def _savez(file, args, kwds, compress): > zipf = zipfile_factory(file, mode="w", compression=compression) > > # Stage arrays in a temporary file on disk, before writing to zip. 
> - fd, tmpfile = tempfile.mkstemp(suffix='-numpy.npy') > - os.close(fd) > - try: > + import threading > + with tempfile.TemporaryDirectory() as td: > + fifoname = os.path.join(td, "fifo") > + os.mkfifo(fifoname) > for key, val in namedict.items(): > fname = key + '.npy' > - fid = open(tmpfile, 'wb') > - try: > - format.write_array(fid, np.asanyarray(val)) > - fid.close() > - fid = None > - zipf.write(tmpfile, arcname=fname) > - finally: > - if fid: > - fid.close() > - finally: > - os.remove(tmpfile) > + def mywrite(pipe, val): > + with open(pipe, "wb") as wpipe: > + format.write_array(wpipe, np.asanyarray(val)) > + t = threading.Thread(target=mywrite, args=(fifoname, val)) > + t.start() > + zipf.write(fifoname, arcname=fname) > + t.join() > > zipf.close() > > -- > 1.9.1 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pelson.pub at gmail.com Fri Jul 4 10:01:52 2014 From: pelson.pub at gmail.com (Phil Elson) Date: Fri, 4 Jul 2014 15:01:52 +0100 Subject: [Numpy-discussion] Teaching Scipy BoF at SciPy In-Reply-To: References: Message-ID: Nice idea. Just a repository of courses would be a great first step. For example, I know Jake Vanderplas's course at https://github.com/jakevdp/2013_fall_ASTR599 is useful, and I have a few introduction (3hr) courses at https://github.com/SciTools/courses. On 3 July 2014 16:59, Chris Barker wrote: > HI Folks, > > I will be hosting a "Teaching the SciPy Stack" BoF at SciPy this year: > > https://conference.scipy.org/scipy2014/schedule/presentation/1762/ > > (Actually, I proposed it for the conference, but would be more than happy > to have other folks join me in facilitating, hosting, etc.) > > I've put up a Wiki page to collect ideas for topics. Please take a look > and add your $0.02: > > https://github.com/numpy/numpy/wiki/TeachingSciPy-BoF-at-Scipy-2014 > > See you there, > > -Chris > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.hulsman at tudelft.nl Fri Jul 4 11:32:41 2014 From: m.hulsman at tudelft.nl (Marc Hulsman) Date: Fri, 04 Jul 2014 17:32:41 +0200 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? In-Reply-To: <1404391459.13834.8.camel@sebastian-t440> References: <53B51993.7080207@tudelft.nl> <53B54E41.8090309@tudelft.nl> <1404391459.13834.8.camel@sebastian-t440> Message-ID: <53B6C919.4010806@tudelft.nl> On 07/03/2014 02:44 PM, Sebastian Berg wrote: > True and true. I don't see a problem with fromiter being more general, > just someone has to sit down and add new error checks/cleanup stuff > for the object case. The assignment could probably also be optimized, > not sure how hard that is, I would expect it isn't that hard. As > usually, someone just needs to find time and the interest to actually > do it ;). - Sebastian I looked at the code of FromIter below. /* * We would need to alter the memory RENEW code to decrement any * reference counts before throwing away any memory. 
*/ if (PyDataType_REFCHK(dtype)) { PyErr_SetString(PyExc_ValueError, "cannot create object arrays from iterator"); goto done; } However, the memory renew code (which just reallocs the memory to increase the array size) uses a simple realloc. It seems to me that it is not necessary to adapt reference counts in this case (as the incref from the new memory compensates the decref from the memory that is removed)? For the addition of elements to the array, everything seems to be ok anyway, as setitem is used, which does the incref already. So I think it should be possible to just remove this check? I did not yet look at the assignment issue, had some difficulty finding the correct place in the code, does does anyone have any pointers were to look? >> The generic solution of adding an nmaxdim parameter to numpy.array would >> of course be even more ideal :) >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Fri Jul 4 15:42:41 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Jul 2014 13:42:41 -0600 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 Message-ID: Sebastian Seberg has fixed one class of test failures due to the indexing changes in numpy 1.9.0b1. There are some remaining errors, and in the case of the Matplotlib failures, they look to me to be Matplotlib bugs. The 2-d arrays that cause the error are returned by the overloaded _interpolate_single_key function in CubicTriInterpolator that is documented in the base class to return a 1-d array, whereas the actual dimensions are of the form (n, 1). The question is, what is the best work around here for these sorts errors? Can we afford to break Matplotlib and other packages on account of a bug that was previously accepted by Numpy? Perhaps a FutureWarning rather than an error would be more appropriate at this point, and that modification would be easy to make. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 4 16:00:06 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Jul 2014 14:00:06 -0600 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris wrote: > Sebastian Seberg has fixed one class of test failures due to the indexing > changes in numpy 1.9.0b1. There are some remaining errors, and in the case > of the Matplotlib failures, they look to me to be Matplotlib bugs. The 2-d > arrays that cause the error are returned by the overloaded > _interpolate_single_key function in CubicTriInterpolator that is > documented in the base class to return a 1-d array, whereas the actual > dimensions are of the form (n, 1). The question is, what is the best work > around here for these sorts errors? Can we afford to break Matplotlib and > other packages on account of a bug that was previously accepted by Numpy? > Perhaps a FutureWarning rather than an error would be more appropriate at > this point, and that modification would be easy to make. > > Thoughts? 
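[To make the class of failure concrete, a constructed sketch rather than matplotlib's actual code path: under the stricter 1.9 rules an (n, 1) value array no longer broadcasts to the 1-d result of a boolean or fancy index, while an explicit ravel() is unambiguous on old and new releases alike.]

    import numpy as np

    out = np.zeros(4)
    mask = np.array([True, False, True, True])
    vals = np.ones((3, 1))        # documented as 1-d, actually shaped (n, 1)

    # out[mask] = vals            # raises a shape-mismatch ValueError under the
    #                             # 1.9 rules: (3, 1) vs indexing result (3,)
    out[mask] = vals.ravel()      # explicit 1-d view, works everywhere
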
> > I'll add that all of the remaining test failures, with the possible exception of the Tables errors, look like bugs to me. The Tables errors result from the fact that in fancy indexing assignment into 1-d array the right hand side used to be repeated until sufficient values for the assignment were available. Not sure what to do about that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Jul 4 16:02:29 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 4 Jul 2014 22:02:29 +0200 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris wrote: > > > > On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Sebastian Seberg has fixed one class of test failures due to the indexing >> changes in numpy 1.9.0b1. There are some remaining errors, and in the case >> of the Matplotlib failures, they look to me to be Matplotlib bugs. The 2-d >> arrays that cause the error are returned by the overloaded >> _interpolate_single_key function in CubicTriInterpolator that is >> documented in the base class to return a 1-d array, whereas the actual >> dimensions are of the form (n, 1). The question is, what is the best >> work around here for these sorts errors? Can we afford to break Matplotlib >> and other packages on account of a bug that was previously accepted by >> Numpy? >> > It depends how bad the break is, but in principle I'd say that breaking Matplotlib is not OK. > Perhaps a FutureWarning rather than an error would be more appropriate at >> this point, and that modification would be easy to make. >> > Sounds like a good idea then. Ralf > >> Thoughts? >> >> > I'll add that all of the remaining test failures, with the possible > exception of the Tables errors, look like bugs to me. The Tables errors > result from the fact that in fancy indexing assignment into 1-d array the > right hand side used to be repeated until sufficient values for the > assignment were available. Not sure what to do about that. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 4 16:09:45 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jul 2014 21:09:45 +0100 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers wrote: > > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris > wrote: >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris >> wrote: >>> >>> Sebastian Seberg has fixed one class of test failures due to the indexing >>> changes in numpy 1.9.0b1. There are some remaining errors, and in the case >>> of the Matplotlib failures, they look to me to be Matplotlib bugs. The 2-d >>> arrays that cause the error are returned by the overloaded >>> _interpolate_single_key function in CubicTriInterpolator that is documented >>> in the base class to return a 1-d array, whereas the actual dimensions are >>> of the form (n, 1). The question is, what is the best work around here for >>> these sorts errors? Can we afford to break Matplotlib and other packages on >>> account of a bug that was previously accepted by Numpy? 
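[A sketch of the Tables-style breakage mentioned above, as a constructed example rather than PyTables code: older releases silently repeated a too-short right-hand side in fancy-index assignment, whereas 1.9 raises, so the old behaviour has to be spelled out explicitly.]

    import numpy as np

    a = np.zeros(6)
    idx = np.arange(6)
    vals = np.array([1.0, 2.0])

    # a[idx] = vals                       # numpy < 1.9 tiled vals to fill all six
    #                                     # slots; 1.9 raises a shape-mismatch error
    a[idx] = np.resize(vals, idx.shape)   # explicit tiling: [1, 2, 1, 2, 1, 2]
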
> > > It depends how bad the break is, but in principle I'd say that breaking > Matplotlib is not OK. I agree. If it's easy to hack around it and issue a warning for now, and doesn't have other negative consequences, then IMO we should give matplotlib a release or so worth of grace period to fix things. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Fri Jul 4 16:33:02 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Jul 2014 14:33:02 -0600 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith wrote: > On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers > wrote: > > > > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris > > wrote: > >> > >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris > >> wrote: > >>> > >>> Sebastian Seberg has fixed one class of test failures due to the > indexing > >>> changes in numpy 1.9.0b1. There are some remaining errors, and in the > case > >>> of the Matplotlib failures, they look to me to be Matplotlib bugs. The > 2-d > >>> arrays that cause the error are returned by the overloaded > >>> _interpolate_single_key function in CubicTriInterpolator that is > documented > >>> in the base class to return a 1-d array, whereas the actual dimensions > are > >>> of the form (n, 1). The question is, what is the best work around here > for > >>> these sorts errors? Can we afford to break Matplotlib and other > packages on > >>> account of a bug that was previously accepted by Numpy? > > > > > > It depends how bad the break is, but in principle I'd say that breaking > > Matplotlib is not OK. > > I agree. If it's easy to hack around it and issue a warning for now, > and doesn't have other negative consequences, then IMO we should give > matplotlib a release or so worth of grace period to fix things. > Here is another example, from skimage. ====================================================================== ERROR: test_join.test_relabel_sequential_offset1 ---------------------------------------------------------------------- Traceback (most recent call last): File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in runTest self.test(*self.arg) File "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", line 30, in test_relabel_sequential_offset1 ar_relab, fw, inv = relabel_sequential(ar) File "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", line 127, in relabel_sequential forward_map[labels0] = np.arange(offset, offset + len(labels0) + 1) ValueError: shape mismatch: value array of shape (6,) could not be broadcast to indexing result of shape (5,) Which is pretty clearly a coding error. Unfortunately, the error is in the package rather than the test. The only easy way to fix all of these sorts of things is to revert the indexing changes, and I'm loathe to do that. Grrr... Chuck > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
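[The skimage-style variant from the traceback above, reduced to a few lines; this is a sketch of the same pattern, not scikit-image code, and the real fix belongs in that package. The right-hand side is one element longer than the indexing result, which older numpy tolerated and 1.9 rejects.]

    import numpy as np

    forward_map = np.zeros(10, dtype=np.intp)
    labels0 = np.array([1, 3, 4, 7, 9])   # 5 labels
    offset = 1

    # forward_map[labels0] = np.arange(offset, offset + len(labels0) + 1)
    #   -> ValueError under 1.9: value array of shape (6,) vs indexing result (5,)
    forward_map[labels0] = np.arange(offset, offset + len(labels0))   # lengths match
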
URL: From njs at pobox.com Fri Jul 4 16:41:46 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jul 2014 21:41:46 +0100 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris wrote: > > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith wrote: >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers >> wrote: >> > >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris >> > wrote: >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris >> >> wrote: >> >>> >> >>> Sebastian Seberg has fixed one class of test failures due to the >> >>> indexing >> >>> changes in numpy 1.9.0b1. There are some remaining errors, and in the >> >>> case >> >>> of the Matplotlib failures, they look to me to be Matplotlib bugs. The >> >>> 2-d >> >>> arrays that cause the error are returned by the overloaded >> >>> _interpolate_single_key function in CubicTriInterpolator that is >> >>> documented >> >>> in the base class to return a 1-d array, whereas the actual dimensions >> >>> are >> >>> of the form (n, 1). The question is, what is the best work around here >> >>> for >> >>> these sorts errors? Can we afford to break Matplotlib and other >> >>> packages on >> >>> account of a bug that was previously accepted by Numpy? >> > >> > >> > It depends how bad the break is, but in principle I'd say that breaking >> > Matplotlib is not OK. >> >> I agree. If it's easy to hack around it and issue a warning for now, >> and doesn't have other negative consequences, then IMO we should give >> matplotlib a release or so worth of grace period to fix things. > > > Here is another example, from skimage. > > ====================================================================== > ERROR: test_join.test_relabel_sequential_offset1 > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in > runTest > self.test(*self.arg) > File > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", > line 30, in test_relabel_sequential_offset1 > ar_relab, fw, inv = relabel_sequential(ar) > File "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", > line 127, in relabel_sequential > forward_map[labels0] = np.arange(offset, offset + len(labels0) + 1) > ValueError: shape mismatch: value array of shape (6,) could not be broadcast > to indexing result of shape (5,) > > Which is pretty clearly a coding error. Unfortunately, the error is in the > package rather than the test. > > The only easy way to fix all of these sorts of things is to revert the > indexing changes, and I'm loathe to do that. Grrr... Ugh, that's pretty bad :-/. Do you really think we can't use a band-aid over the new indexing code, though? -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Fri Jul 4 16:48:39 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Jul 2014 14:48:39 -0600 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith wrote: > On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris > wrote: > > > > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith wrote: > >> > >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers > >> wrote: > >> > > >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris > >> > wrote: > >> >> > >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris > >> >> wrote: > >> >>> > >> >>> Sebastian Seberg has fixed one class of test failures due to the > >> >>> indexing > >> >>> changes in numpy 1.9.0b1. There are some remaining errors, and in > the > >> >>> case > >> >>> of the Matplotlib failures, they look to me to be Matplotlib bugs. > The > >> >>> 2-d > >> >>> arrays that cause the error are returned by the overloaded > >> >>> _interpolate_single_key function in CubicTriInterpolator that is > >> >>> documented > >> >>> in the base class to return a 1-d array, whereas the actual > dimensions > >> >>> are > >> >>> of the form (n, 1). The question is, what is the best work around > here > >> >>> for > >> >>> these sorts errors? Can we afford to break Matplotlib and other > >> >>> packages on > >> >>> account of a bug that was previously accepted by Numpy? > >> > > >> > > >> > It depends how bad the break is, but in principle I'd say that > breaking > >> > Matplotlib is not OK. > >> > >> I agree. If it's easy to hack around it and issue a warning for now, > >> and doesn't have other negative consequences, then IMO we should give > >> matplotlib a release or so worth of grace period to fix things. > > > > > > Here is another example, from skimage. > > > > ====================================================================== > > ERROR: test_join.test_relabel_sequential_offset1 > > ---------------------------------------------------------------------- > > Traceback (most recent call last): > > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in > > runTest > > self.test(*self.arg) > > File > > > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", > > line 30, in test_relabel_sequential_offset1 > > ar_relab, fw, inv = relabel_sequential(ar) > > File "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", > > line 127, in relabel_sequential > > forward_map[labels0] = np.arange(offset, offset + len(labels0) + 1) > > ValueError: shape mismatch: value array of shape (6,) could not be > broadcast > > to indexing result of shape (5,) > > > > Which is pretty clearly a coding error. Unfortunately, the error is in > the > > package rather than the test. > > > > The only easy way to fix all of these sorts of things is to revert the > > indexing changes, and I'm loathe to do that. Grrr... > > Ugh, that's pretty bad :-/. Do you really think we can't use a > band-aid over the new indexing code, though? > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with the code, so it may take a while... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Fri Jul 4 17:15:01 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jul 2014 22:15:01 +0100 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris wrote: > > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith wrote: >> >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris >> wrote: >> > >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith wrote: >> >> >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers >> >> wrote: >> >> > >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris >> >> > wrote: >> >> >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris >> >> >> wrote: >> >> >>> >> >> >>> Sebastian Seberg has fixed one class of test failures due to the >> >> >>> indexing >> >> >>> changes in numpy 1.9.0b1. There are some remaining errors, and in >> >> >>> the >> >> >>> case >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib bugs. >> >> >>> The >> >> >>> 2-d >> >> >>> arrays that cause the error are returned by the overloaded >> >> >>> _interpolate_single_key function in CubicTriInterpolator that is >> >> >>> documented >> >> >>> in the base class to return a 1-d array, whereas the actual >> >> >>> dimensions >> >> >>> are >> >> >>> of the form (n, 1). The question is, what is the best work around >> >> >>> here >> >> >>> for >> >> >>> these sorts errors? Can we afford to break Matplotlib and other >> >> >>> packages on >> >> >>> account of a bug that was previously accepted by Numpy? >> >> > >> >> > >> >> > It depends how bad the break is, but in principle I'd say that >> >> > breaking >> >> > Matplotlib is not OK. >> >> >> >> I agree. If it's easy to hack around it and issue a warning for now, >> >> and doesn't have other negative consequences, then IMO we should give >> >> matplotlib a release or so worth of grace period to fix things. >> > >> > >> > Here is another example, from skimage. >> > >> > ====================================================================== >> > ERROR: test_join.test_relabel_sequential_offset1 >> > ---------------------------------------------------------------------- >> > Traceback (most recent call last): >> > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in >> > runTest >> > self.test(*self.arg) >> > File >> > >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", >> > line 30, in test_relabel_sequential_offset1 >> > ar_relab, fw, inv = relabel_sequential(ar) >> > File >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", >> > line 127, in relabel_sequential >> > forward_map[labels0] = np.arange(offset, offset + len(labels0) + 1) >> > ValueError: shape mismatch: value array of shape (6,) could not be >> > broadcast >> > to indexing result of shape (5,) >> > >> > Which is pretty clearly a coding error. Unfortunately, the error is in >> > the >> > package rather than the test. >> > >> > The only easy way to fix all of these sorts of things is to revert the >> > indexing changes, and I'm loathe to do that. Grrr... >> >> Ugh, that's pretty bad :-/. Do you really think we can't use a >> band-aid over the new indexing code, though? > > > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with the > code, so it may take a while... Fair enough! 
I guess that if what are (arguably) bugs in matplotlib and scikit-image are holding up the numpy release, then it's worth CC'ing their mailing lists in case someone feels like volunteering to fix it... ;-). -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Fri Jul 4 17:31:55 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Jul 2014 15:31:55 -0600 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 3:15 PM, Nathaniel Smith wrote: > On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris > wrote: > > > > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith wrote: > >> > >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris > >> wrote: > >> > > >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith > wrote: > >> >> > >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers > > >> >> wrote: > >> >> > > >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris > >> >> > wrote: > >> >> >> > >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris > >> >> >> wrote: > >> >> >>> > >> >> >>> Sebastian Seberg has fixed one class of test failures due to the > >> >> >>> indexing > >> >> >>> changes in numpy 1.9.0b1. There are some remaining errors, and > in > >> >> >>> the > >> >> >>> case > >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib > bugs. > >> >> >>> The > >> >> >>> 2-d > >> >> >>> arrays that cause the error are returned by the overloaded > >> >> >>> _interpolate_single_key function in CubicTriInterpolator that is > >> >> >>> documented > >> >> >>> in the base class to return a 1-d array, whereas the actual > >> >> >>> dimensions > >> >> >>> are > >> >> >>> of the form (n, 1). The question is, what is the best work around > >> >> >>> here > >> >> >>> for > >> >> >>> these sorts errors? Can we afford to break Matplotlib and other > >> >> >>> packages on > >> >> >>> account of a bug that was previously accepted by Numpy? > >> >> > > >> >> > > >> >> > It depends how bad the break is, but in principle I'd say that > >> >> > breaking > >> >> > Matplotlib is not OK. > >> >> > >> >> I agree. If it's easy to hack around it and issue a warning for now, > >> >> and doesn't have other negative consequences, then IMO we should give > >> >> matplotlib a release or so worth of grace period to fix things. > >> > > >> > > >> > Here is another example, from skimage. > >> > > >> > ====================================================================== > >> > ERROR: test_join.test_relabel_sequential_offset1 > >> > ---------------------------------------------------------------------- > >> > Traceback (most recent call last): > >> > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in > >> > runTest > >> > self.test(*self.arg) > >> > File > >> > > >> > > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", > >> > line 30, in test_relabel_sequential_offset1 > >> > ar_relab, fw, inv = relabel_sequential(ar) > >> > File > >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", > >> > line 127, in relabel_sequential > >> > forward_map[labels0] = np.arange(offset, offset + len(labels0) + > 1) > >> > ValueError: shape mismatch: value array of shape (6,) could not be > >> > broadcast > >> > to indexing result of shape (5,) > >> > > >> > Which is pretty clearly a coding error. Unfortunately, the error is in > >> > the > >> > package rather than the test. 
> >> > > >> > The only easy way to fix all of these sorts of things is to revert the > >> > indexing changes, and I'm loathe to do that. Grrr... > >> > >> Ugh, that's pretty bad :-/. Do you really think we can't use a > >> band-aid over the new indexing code, though? > > > > > > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with the > > code, so it may take a while... > > Fair enough! > > I guess that if what are (arguably) bugs in matplotlib and > scikit-image are holding up the numpy release, then it's worth CC'ing > their mailing lists in case someone feels like volunteering to fix > it... ;-). > I can do that ;) Doesn't help with the release though unless we want to document the errors in the release notes and tell folks to wait on the next release of the packages. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 4 17:33:13 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 4 Jul 2014 22:33:13 +0100 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 10:31 PM, Charles R Harris wrote: > > On Fri, Jul 4, 2014 at 3:15 PM, Nathaniel Smith wrote: >> >> On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris >> wrote: >> > >> > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith wrote: >> >> >> >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris >> >> wrote: >> >> > >> >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith >> >> > wrote: >> >> >> >> >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers >> >> >> >> >> >> wrote: >> >> >> > >> >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris >> >> >> > wrote: >> >> >> >> >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris >> >> >> >> wrote: >> >> >> >>> >> >> >> >>> Sebastian Seberg has fixed one class of test failures due to the >> >> >> >>> indexing >> >> >> >>> changes in numpy 1.9.0b1. There are some remaining errors, and >> >> >> >>> in >> >> >> >>> the >> >> >> >>> case >> >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib >> >> >> >>> bugs. >> >> >> >>> The >> >> >> >>> 2-d >> >> >> >>> arrays that cause the error are returned by the overloaded >> >> >> >>> _interpolate_single_key function in CubicTriInterpolator that is >> >> >> >>> documented >> >> >> >>> in the base class to return a 1-d array, whereas the actual >> >> >> >>> dimensions >> >> >> >>> are >> >> >> >>> of the form (n, 1). The question is, what is the best work >> >> >> >>> around >> >> >> >>> here >> >> >> >>> for >> >> >> >>> these sorts errors? Can we afford to break Matplotlib and other >> >> >> >>> packages on >> >> >> >>> account of a bug that was previously accepted by Numpy? >> >> >> > >> >> >> > >> >> >> > It depends how bad the break is, but in principle I'd say that >> >> >> > breaking >> >> >> > Matplotlib is not OK. >> >> >> >> >> >> I agree. If it's easy to hack around it and issue a warning for now, >> >> >> and doesn't have other negative consequences, then IMO we should >> >> >> give >> >> >> matplotlib a release or so worth of grace period to fix things. >> >> > >> >> > >> >> > Here is another example, from skimage. 
>> >> > >> >> > >> >> > ====================================================================== >> >> > ERROR: test_join.test_relabel_sequential_offset1 >> >> > >> >> > ---------------------------------------------------------------------- >> >> > Traceback (most recent call last): >> >> > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in >> >> > runTest >> >> > self.test(*self.arg) >> >> > File >> >> > >> >> > >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", >> >> > line 30, in test_relabel_sequential_offset1 >> >> > ar_relab, fw, inv = relabel_sequential(ar) >> >> > File >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", >> >> > line 127, in relabel_sequential >> >> > forward_map[labels0] = np.arange(offset, offset + len(labels0) + >> >> > 1) >> >> > ValueError: shape mismatch: value array of shape (6,) could not be >> >> > broadcast >> >> > to indexing result of shape (5,) >> >> > >> >> > Which is pretty clearly a coding error. Unfortunately, the error is >> >> > in >> >> > the >> >> > package rather than the test. >> >> > >> >> > The only easy way to fix all of these sorts of things is to revert >> >> > the >> >> > indexing changes, and I'm loathe to do that. Grrr... >> >> >> >> Ugh, that's pretty bad :-/. Do you really think we can't use a >> >> band-aid over the new indexing code, though? >> > >> > >> > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with >> > the >> > code, so it may take a while... >> >> Fair enough! >> >> I guess that if what are (arguably) bugs in matplotlib and >> scikit-image are holding up the numpy release, then it's worth CC'ing >> their mailing lists in case someone feels like volunteering to fix >> it... ;-). > > I can do that ;) Doesn't help with the release though unless we want to > document the errors in the release notes and tell folks to wait on the next > release of the packages. Oh, I meant, in case they want to fix numpy so that their packages don't break :-). -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Fri Jul 4 19:07:22 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Jul 2014 17:07:22 -0600 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 3:33 PM, Nathaniel Smith wrote: > On Fri, Jul 4, 2014 at 10:31 PM, Charles R Harris > wrote: > > > > On Fri, Jul 4, 2014 at 3:15 PM, Nathaniel Smith wrote: > >> > >> On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris > >> wrote: > >> > > >> > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith > wrote: > >> >> > >> >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris > >> >> wrote: > >> >> > > >> >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith > >> >> > wrote: > >> >> >> > >> >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers > >> >> >> > >> >> >> wrote: > >> >> >> > > >> >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris > >> >> >> >> wrote: > >> >> >> >>> > >> >> >> >>> Sebastian Seberg has fixed one class of test failures due to > the > >> >> >> >>> indexing > >> >> >> >>> changes in numpy 1.9.0b1. There are some remaining errors, > and > >> >> >> >>> in > >> >> >> >>> the > >> >> >> >>> case > >> >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib > >> >> >> >>> bugs. 
> >> >> >> >>> The > >> >> >> >>> 2-d > >> >> >> >>> arrays that cause the error are returned by the overloaded > >> >> >> >>> _interpolate_single_key function in CubicTriInterpolator that > is > >> >> >> >>> documented > >> >> >> >>> in the base class to return a 1-d array, whereas the actual > >> >> >> >>> dimensions > >> >> >> >>> are > >> >> >> >>> of the form (n, 1). The question is, what is the best work > >> >> >> >>> around > >> >> >> >>> here > >> >> >> >>> for > >> >> >> >>> these sorts errors? Can we afford to break Matplotlib and > other > >> >> >> >>> packages on > >> >> >> >>> account of a bug that was previously accepted by Numpy? > >> >> >> > > >> >> >> > > >> >> >> > It depends how bad the break is, but in principle I'd say that > >> >> >> > breaking > >> >> >> > Matplotlib is not OK. > >> >> >> > >> >> >> I agree. If it's easy to hack around it and issue a warning for > now, > >> >> >> and doesn't have other negative consequences, then IMO we should > >> >> >> give > >> >> >> matplotlib a release or so worth of grace period to fix things. > >> >> > > >> >> > > >> >> > Here is another example, from skimage. > >> >> > > >> >> > > >> >> > > ====================================================================== > >> >> > ERROR: test_join.test_relabel_sequential_offset1 > >> >> > > >> >> > > ---------------------------------------------------------------------- > >> >> > Traceback (most recent call last): > >> >> > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, > in > >> >> > runTest > >> >> > self.test(*self.arg) > >> >> > File > >> >> > > >> >> > > >> >> > > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", > >> >> > line 30, in test_relabel_sequential_offset1 > >> >> > ar_relab, fw, inv = relabel_sequential(ar) > >> >> > File > >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", > >> >> > line 127, in relabel_sequential > >> >> > forward_map[labels0] = np.arange(offset, offset + len(labels0) > + > >> >> > 1) > >> >> > ValueError: shape mismatch: value array of shape (6,) could not be > >> >> > broadcast > >> >> > to indexing result of shape (5,) > >> >> > > >> >> > Which is pretty clearly a coding error. Unfortunately, the error is > >> >> > in > >> >> > the > >> >> > package rather than the test. > >> >> > > >> >> > The only easy way to fix all of these sorts of things is to revert > >> >> > the > >> >> > indexing changes, and I'm loathe to do that. Grrr... > >> >> > >> >> Ugh, that's pretty bad :-/. Do you really think we can't use a > >> >> band-aid over the new indexing code, though? > >> > > >> > > >> > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with > >> > the > >> > code, so it may take a while... > >> > >> Fair enough! > >> > >> I guess that if what are (arguably) bugs in matplotlib and > >> scikit-image are holding up the numpy release, then it's worth CC'ing > >> their mailing lists in case someone feels like volunteering to fix > >> it... ;-). > > > > I can do that ;) Doesn't help with the release though unless we want to > > document the errors in the release notes and tell folks to wait on the > next > > release of the packages. > > Oh, I meant, in case they want to fix numpy so that their packages > don't break :-). > > I've filed issues with all the affected projects. Here is the current status. matplotlib -- Reported, being fixed, should be in 1.4 in a few days. skimage -- Reported. scikit-learn -- Reported. tables -- Reported. statsmodels -- Reported, fixed in master. 
bottleneck -- Reported. IIRC, kwgoodman already knew of the changes. pyfits -- Reported to astropy. milk -- Reported. pandas -- Reportedly fixed in master. If the issues are fixed in matplotlib and pandas I'd be inclined to release as is with a mention of versions in the release notes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Fri Jul 4 19:14:06 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Fri, 4 Jul 2014 19:14:06 -0400 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: <190AABD2-4BF4-46AB-BCB6-9E8A2BCE00E7@gmail.com> ok from pandas we test with numpy master on Travis (which does pick up things!) thanks > On Jul 4, 2014, at 7:07 PM, Charles R Harris wrote: > > > > >> On Fri, Jul 4, 2014 at 3:33 PM, Nathaniel Smith wrote: >> On Fri, Jul 4, 2014 at 10:31 PM, Charles R Harris >> wrote: >> > >> > On Fri, Jul 4, 2014 at 3:15 PM, Nathaniel Smith wrote: >> >> >> >> On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris >> >> wrote: >> >> > >> >> > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith wrote: >> >> >> >> >> >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris >> >> >> wrote: >> >> >> > >> >> >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith >> >> >> > wrote: >> >> >> >> >> >> >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers >> >> >> >> >> >> >> >> wrote: >> >> >> >> > >> >> >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris >> >> >> >> > wrote: >> >> >> >> >> >> >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris >> >> >> >> >> wrote: >> >> >> >> >>> >> >> >> >> >>> Sebastian Seberg has fixed one class of test failures due to the >> >> >> >> >>> indexing >> >> >> >> >>> changes in numpy 1.9.0b1. There are some remaining errors, and >> >> >> >> >>> in >> >> >> >> >>> the >> >> >> >> >>> case >> >> >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib >> >> >> >> >>> bugs. >> >> >> >> >>> The >> >> >> >> >>> 2-d >> >> >> >> >>> arrays that cause the error are returned by the overloaded >> >> >> >> >>> _interpolate_single_key function in CubicTriInterpolator that is >> >> >> >> >>> documented >> >> >> >> >>> in the base class to return a 1-d array, whereas the actual >> >> >> >> >>> dimensions >> >> >> >> >>> are >> >> >> >> >>> of the form (n, 1). The question is, what is the best work >> >> >> >> >>> around >> >> >> >> >>> here >> >> >> >> >>> for >> >> >> >> >>> these sorts errors? Can we afford to break Matplotlib and other >> >> >> >> >>> packages on >> >> >> >> >>> account of a bug that was previously accepted by Numpy? >> >> >> >> > >> >> >> >> > >> >> >> >> > It depends how bad the break is, but in principle I'd say that >> >> >> >> > breaking >> >> >> >> > Matplotlib is not OK. >> >> >> >> >> >> >> >> I agree. If it's easy to hack around it and issue a warning for now, >> >> >> >> and doesn't have other negative consequences, then IMO we should >> >> >> >> give >> >> >> >> matplotlib a release or so worth of grace period to fix things. >> >> >> > >> >> >> > >> >> >> > Here is another example, from skimage. 
>> >> >> > >> >> >> > >> >> >> > ====================================================================== >> >> >> > ERROR: test_join.test_relabel_sequential_offset1 >> >> >> > >> >> >> > ---------------------------------------------------------------------- >> >> >> > Traceback (most recent call last): >> >> >> > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in >> >> >> > runTest >> >> >> > self.test(*self.arg) >> >> >> > File >> >> >> > >> >> >> > >> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", >> >> >> > line 30, in test_relabel_sequential_offset1 >> >> >> > ar_relab, fw, inv = relabel_sequential(ar) >> >> >> > File >> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", >> >> >> > line 127, in relabel_sequential >> >> >> > forward_map[labels0] = np.arange(offset, offset + len(labels0) + >> >> >> > 1) >> >> >> > ValueError: shape mismatch: value array of shape (6,) could not be >> >> >> > broadcast >> >> >> > to indexing result of shape (5,) >> >> >> > >> >> >> > Which is pretty clearly a coding error. Unfortunately, the error is >> >> >> > in >> >> >> > the >> >> >> > package rather than the test. >> >> >> > >> >> >> > The only easy way to fix all of these sorts of things is to revert >> >> >> > the >> >> >> > indexing changes, and I'm loathe to do that. Grrr... >> >> >> >> >> >> Ugh, that's pretty bad :-/. Do you really think we can't use a >> >> >> band-aid over the new indexing code, though? >> >> > >> >> > >> >> > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with >> >> > the >> >> > code, so it may take a while... >> >> >> >> Fair enough! >> >> >> >> I guess that if what are (arguably) bugs in matplotlib and >> >> scikit-image are holding up the numpy release, then it's worth CC'ing >> >> their mailing lists in case someone feels like volunteering to fix >> >> it... ;-). >> > >> > I can do that ;) Doesn't help with the release though unless we want to >> > document the errors in the release notes and tell folks to wait on the next >> > release of the packages. >> >> Oh, I meant, in case they want to fix numpy so that their packages >> don't break :-). > > I've filed issues with all the affected projects. Here is the current status. > > matplotlib -- Reported, being fixed, should be in 1.4 in a few days. > skimage -- Reported. > scikit-learn -- Reported. > tables -- Reported. > statsmodels -- Reported, fixed in master. > bottleneck -- Reported. IIRC, kwgoodman already knew of the changes. > pyfits -- Reported to astropy. > milk -- Reported. > pandas -- Reportedly fixed in master. > > If the issues are fixed in matplotlib and pandas I'd be inclined to release as is with a mention of versions in the release notes. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Fri Jul 4 19:41:17 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jul 2014 00:41:17 +0100 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On 5 Jul 2014 00:07, "Charles R Harris" wrote: > > > > > On Fri, Jul 4, 2014 at 3:33 PM, Nathaniel Smith wrote: >> >> On Fri, Jul 4, 2014 at 10:31 PM, Charles R Harris >> wrote: >> > >> > On Fri, Jul 4, 2014 at 3:15 PM, Nathaniel Smith wrote: >> >> >> >> On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris >> >> wrote: >> >> > >> >> > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith wrote: >> >> >> >> >> >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris >> >> >> wrote: >> >> >> > >> >> >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith >> >> >> > wrote: >> >> >> >> >> >> >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers >> >> >> >> >> >> >> >> wrote: >> >> >> >> > >> >> >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris >> >> >> >> > wrote: >> >> >> >> >> >> >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris >> >> >> >> >> wrote: >> >> >> >> >>> >> >> >> >> >>> Sebastian Seberg has fixed one class of test failures due to the >> >> >> >> >>> indexing >> >> >> >> >>> changes in numpy 1.9.0b1. There are some remaining errors, and >> >> >> >> >>> in >> >> >> >> >>> the >> >> >> >> >>> case >> >> >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib >> >> >> >> >>> bugs. >> >> >> >> >>> The >> >> >> >> >>> 2-d >> >> >> >> >>> arrays that cause the error are returned by the overloaded >> >> >> >> >>> _interpolate_single_key function in CubicTriInterpolator that is >> >> >> >> >>> documented >> >> >> >> >>> in the base class to return a 1-d array, whereas the actual >> >> >> >> >>> dimensions >> >> >> >> >>> are >> >> >> >> >>> of the form (n, 1). The question is, what is the best work >> >> >> >> >>> around >> >> >> >> >>> here >> >> >> >> >>> for >> >> >> >> >>> these sorts errors? Can we afford to break Matplotlib and other >> >> >> >> >>> packages on >> >> >> >> >>> account of a bug that was previously accepted by Numpy? >> >> >> >> > >> >> >> >> > >> >> >> >> > It depends how bad the break is, but in principle I'd say that >> >> >> >> > breaking >> >> >> >> > Matplotlib is not OK. >> >> >> >> >> >> >> >> I agree. If it's easy to hack around it and issue a warning for now, >> >> >> >> and doesn't have other negative consequences, then IMO we should >> >> >> >> give >> >> >> >> matplotlib a release or so worth of grace period to fix things. >> >> >> > >> >> >> > >> >> >> > Here is another example, from skimage. 
>> >> >> > >> >> >> > >> >> >> > ====================================================================== >> >> >> > ERROR: test_join.test_relabel_sequential_offset1 >> >> >> > >> >> >> > ---------------------------------------------------------------------- >> >> >> > Traceback (most recent call last): >> >> >> > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in >> >> >> > runTest >> >> >> > self.test(*self.arg) >> >> >> > File >> >> >> > >> >> >> > >> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", >> >> >> > line 30, in test_relabel_sequential_offset1 >> >> >> > ar_relab, fw, inv = relabel_sequential(ar) >> >> >> > File >> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", >> >> >> > line 127, in relabel_sequential >> >> >> > forward_map[labels0] = np.arange(offset, offset + len(labels0) + >> >> >> > 1) >> >> >> > ValueError: shape mismatch: value array of shape (6,) could not be >> >> >> > broadcast >> >> >> > to indexing result of shape (5,) >> >> >> > >> >> >> > Which is pretty clearly a coding error. Unfortunately, the error is >> >> >> > in >> >> >> > the >> >> >> > package rather than the test. >> >> >> > >> >> >> > The only easy way to fix all of these sorts of things is to revert >> >> >> > the >> >> >> > indexing changes, and I'm loathe to do that. Grrr... >> >> >> >> >> >> Ugh, that's pretty bad :-/. Do you really think we can't use a >> >> >> band-aid over the new indexing code, though? >> >> > >> >> > >> >> > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with >> >> > the >> >> > code, so it may take a while... >> >> >> >> Fair enough! >> >> >> >> I guess that if what are (arguably) bugs in matplotlib and >> >> scikit-image are holding up the numpy release, then it's worth CC'ing >> >> their mailing lists in case someone feels like volunteering to fix >> >> it... ;-). >> > >> > I can do that ;) Doesn't help with the release though unless we want to >> > document the errors in the release notes and tell folks to wait on the next >> > release of the packages. >> >> Oh, I meant, in case they want to fix numpy so that their packages >> don't break :-). >> > > I've filed issues with all the affected projects. Here is the current status. > > matplotlib -- Reported, being fixed, should be in 1.4 in a few days. > skimage -- Reported. > scikit-learn -- Reported. > tables -- Reported. > statsmodels -- Reported, fixed in master. > bottleneck -- Reported. IIRC, kwgoodman already knew of the changes. > pyfits -- Reported to astropy. > milk -- Reported. > pandas -- Reportedly fixed in master. That is a massive pile of affected projects :-(. My worry is that if all these projects we know about are broken, then how many other codebases that we aren't testing are also broken? > If the issues are fixed in matplotlib and pandas I'd be inclined to release as is with a mention of versions in the release notes. Even if it's fixed in pandas master, how long until it's in user's hands? -n > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jeffreback at gmail.com Fri Jul 4 19:43:28 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Fri, 4 Jul 2014 19:43:28 -0400 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: <2DCD0E92-DC55-4662-BAA2-184FEAEEE471@gmail.com> pandas 0.14.1 scheduled for end of next week (was waiting to see schedule for numpy 1.9) but works either way > On Jul 4, 2014, at 7:41 PM, Nathaniel Smith wrote: > > On 5 Jul 2014 00:07, "Charles R Harris" wrote: > > > > > > > > > > On Fri, Jul 4, 2014 at 3:33 PM, Nathaniel Smith wrote: > >> > >> On Fri, Jul 4, 2014 at 10:31 PM, Charles R Harris > >> wrote: > >> > > >> > On Fri, Jul 4, 2014 at 3:15 PM, Nathaniel Smith wrote: > >> >> > >> >> On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris > >> >> wrote: > >> >> > > >> >> > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith wrote: > >> >> >> > >> >> >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris > >> >> >> wrote: > >> >> >> > > >> >> >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith > >> >> >> > wrote: > >> >> >> >> > >> >> >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers > >> >> >> >> > >> >> >> >> wrote: > >> >> >> >> > > >> >> >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris > >> >> >> >> > wrote: > >> >> >> >> >> > >> >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris > >> >> >> >> >> wrote: > >> >> >> >> >>> > >> >> >> >> >>> Sebastian Seberg has fixed one class of test failures due to the > >> >> >> >> >>> indexing > >> >> >> >> >>> changes in numpy 1.9.0b1. There are some remaining errors, and > >> >> >> >> >>> in > >> >> >> >> >>> the > >> >> >> >> >>> case > >> >> >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib > >> >> >> >> >>> bugs. > >> >> >> >> >>> The > >> >> >> >> >>> 2-d > >> >> >> >> >>> arrays that cause the error are returned by the overloaded > >> >> >> >> >>> _interpolate_single_key function in CubicTriInterpolator that is > >> >> >> >> >>> documented > >> >> >> >> >>> in the base class to return a 1-d array, whereas the actual > >> >> >> >> >>> dimensions > >> >> >> >> >>> are > >> >> >> >> >>> of the form (n, 1). The question is, what is the best work > >> >> >> >> >>> around > >> >> >> >> >>> here > >> >> >> >> >>> for > >> >> >> >> >>> these sorts errors? Can we afford to break Matplotlib and other > >> >> >> >> >>> packages on > >> >> >> >> >>> account of a bug that was previously accepted by Numpy? > >> >> >> >> > > >> >> >> >> > > >> >> >> >> > It depends how bad the break is, but in principle I'd say that > >> >> >> >> > breaking > >> >> >> >> > Matplotlib is not OK. > >> >> >> >> > >> >> >> >> I agree. If it's easy to hack around it and issue a warning for now, > >> >> >> >> and doesn't have other negative consequences, then IMO we should > >> >> >> >> give > >> >> >> >> matplotlib a release or so worth of grace period to fix things. > >> >> >> > > >> >> >> > > >> >> >> > Here is another example, from skimage. 
> >> >> >> > > >> >> >> > > >> >> >> > ====================================================================== > >> >> >> > ERROR: test_join.test_relabel_sequential_offset1 > >> >> >> > > >> >> >> > ---------------------------------------------------------------------- > >> >> >> > Traceback (most recent call last): > >> >> >> > File "X:\Python27-x64\lib\site-packages\nose\case.py", line 197, in > >> >> >> > runTest > >> >> >> > self.test(*self.arg) > >> >> >> > File > >> >> >> > > >> >> >> > > >> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", > >> >> >> > line 30, in test_relabel_sequential_offset1 > >> >> >> > ar_relab, fw, inv = relabel_sequential(ar) > >> >> >> > File > >> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", > >> >> >> > line 127, in relabel_sequential > >> >> >> > forward_map[labels0] = np.arange(offset, offset + len(labels0) + > >> >> >> > 1) > >> >> >> > ValueError: shape mismatch: value array of shape (6,) could not be > >> >> >> > broadcast > >> >> >> > to indexing result of shape (5,) > >> >> >> > > >> >> >> > Which is pretty clearly a coding error. Unfortunately, the error is > >> >> >> > in > >> >> >> > the > >> >> >> > package rather than the test. > >> >> >> > > >> >> >> > The only easy way to fix all of these sorts of things is to revert > >> >> >> > the > >> >> >> > indexing changes, and I'm loathe to do that. Grrr... > >> >> >> > >> >> >> Ugh, that's pretty bad :-/. Do you really think we can't use a > >> >> >> band-aid over the new indexing code, though? > >> >> > > >> >> > > >> >> > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with > >> >> > the > >> >> > code, so it may take a while... > >> >> > >> >> Fair enough! > >> >> > >> >> I guess that if what are (arguably) bugs in matplotlib and > >> >> scikit-image are holding up the numpy release, then it's worth CC'ing > >> >> their mailing lists in case someone feels like volunteering to fix > >> >> it... ;-). > >> > > >> > I can do that ;) Doesn't help with the release though unless we want to > >> > document the errors in the release notes and tell folks to wait on the next > >> > release of the packages. > >> > >> Oh, I meant, in case they want to fix numpy so that their packages > >> don't break :-). > >> > > > > I've filed issues with all the affected projects. Here is the current status. > > > > matplotlib -- Reported, being fixed, should be in 1.4 in a few days. > > skimage -- Reported. > > scikit-learn -- Reported. > > tables -- Reported. > > statsmodels -- Reported, fixed in master. > > bottleneck -- Reported. IIRC, kwgoodman already knew of the changes. > > pyfits -- Reported to astropy. > > milk -- Reported. > > pandas -- Reportedly fixed in master. > > That is a massive pile of affected projects :-(. > > My worry is that if all these projects we know about are broken, then how many other codebases that we aren't testing are also broken? > > > If the issues are fixed in matplotlib and pandas I'd be inclined to release as is with a mention of versions in the release notes. > > Even if it's fixed in pandas master, how long until it's in user's hands? 
> > -n > > > Chuck > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 4 22:25:45 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Jul 2014 20:25:45 -0600 Subject: [Numpy-discussion] Remove bento from numpy Message-ID: Ralf likes the speed of bento, but it is not currently maintained and does not properly build numpy with all the optimizations added by Julian. I find the usual setup.py method fast enough and it has the advantage that all the numpy developers can deal with it. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 4 22:56:18 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 4 Jul 2014 20:56:18 -0600 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Fri, Jul 4, 2014 at 5:07 PM, Charles R Harris wrote: > > > > On Fri, Jul 4, 2014 at 3:33 PM, Nathaniel Smith wrote: > >> On Fri, Jul 4, 2014 at 10:31 PM, Charles R Harris >> wrote: >> > >> > On Fri, Jul 4, 2014 at 3:15 PM, Nathaniel Smith wrote: >> >> >> >> On Fri, Jul 4, 2014 at 9:48 PM, Charles R Harris >> >> wrote: >> >> > >> >> > On Fri, Jul 4, 2014 at 2:41 PM, Nathaniel Smith >> wrote: >> >> >> >> >> >> On Fri, Jul 4, 2014 at 9:33 PM, Charles R Harris >> >> >> wrote: >> >> >> > >> >> >> > On Fri, Jul 4, 2014 at 2:09 PM, Nathaniel Smith >> >> >> > wrote: >> >> >> >> >> >> >> >> On Fri, Jul 4, 2014 at 9:02 PM, Ralf Gommers >> >> >> >> >> >> >> >> wrote: >> >> >> >> > >> >> >> >> > On Fri, Jul 4, 2014 at 10:00 PM, Charles R Harris >> >> >> >> > wrote: >> >> >> >> >> >> >> >> >> >> On Fri, Jul 4, 2014 at 1:42 PM, Charles R Harris >> >> >> >> >> wrote: >> >> >> >> >>> >> >> >> >> >>> Sebastian Seberg has fixed one class of test failures due to >> the >> >> >> >> >>> indexing >> >> >> >> >>> changes in numpy 1.9.0b1. There are some remaining errors, >> and >> >> >> >> >>> in >> >> >> >> >>> the >> >> >> >> >>> case >> >> >> >> >>> of the Matplotlib failures, they look to me to be Matplotlib >> >> >> >> >>> bugs. >> >> >> >> >>> The >> >> >> >> >>> 2-d >> >> >> >> >>> arrays that cause the error are returned by the overloaded >> >> >> >> >>> _interpolate_single_key function in CubicTriInterpolator >> that is >> >> >> >> >>> documented >> >> >> >> >>> in the base class to return a 1-d array, whereas the actual >> >> >> >> >>> dimensions >> >> >> >> >>> are >> >> >> >> >>> of the form (n, 1). The question is, what is the best work >> >> >> >> >>> around >> >> >> >> >>> here >> >> >> >> >>> for >> >> >> >> >>> these sorts errors? Can we afford to break Matplotlib and >> other >> >> >> >> >>> packages on >> >> >> >> >>> account of a bug that was previously accepted by Numpy? >> >> >> >> > >> >> >> >> > >> >> >> >> > It depends how bad the break is, but in principle I'd say that >> >> >> >> > breaking >> >> >> >> > Matplotlib is not OK. >> >> >> >> >> >> >> >> I agree. 
If it's easy to hack around it and issue a warning for >> now, >> >> >> >> and doesn't have other negative consequences, then IMO we should >> >> >> >> give >> >> >> >> matplotlib a release or so worth of grace period to fix things. >> >> >> > >> >> >> > >> >> >> > Here is another example, from skimage. >> >> >> > >> >> >> > >> >> >> > >> ====================================================================== >> >> >> > ERROR: test_join.test_relabel_sequential_offset1 >> >> >> > >> >> >> > >> ---------------------------------------------------------------------- >> >> >> > Traceback (most recent call last): >> >> >> > File "X:\Python27-x64\lib\site-packages\nose\case.py", line >> 197, in >> >> >> > runTest >> >> >> > self.test(*self.arg) >> >> >> > File >> >> >> > >> >> >> > >> >> >> > >> "X:\Python27-x64\lib\site-packages\skimage\segmentation\tests\test_join.py", >> >> >> > line 30, in test_relabel_sequential_offset1 >> >> >> > ar_relab, fw, inv = relabel_sequential(ar) >> >> >> > File >> >> >> > "X:\Python27-x64\lib\site-packages\skimage\segmentation\_join.py", >> >> >> > line 127, in relabel_sequential >> >> >> > forward_map[labels0] = np.arange(offset, offset + >> len(labels0) + >> >> >> > 1) >> >> >> > ValueError: shape mismatch: value array of shape (6,) could not be >> >> >> > broadcast >> >> >> > to indexing result of shape (5,) >> >> >> > >> >> >> > Which is pretty clearly a coding error. Unfortunately, the error >> is >> >> >> > in >> >> >> > the >> >> >> > package rather than the test. >> >> >> > >> >> >> > The only easy way to fix all of these sorts of things is to revert >> >> >> > the >> >> >> > indexing changes, and I'm loathe to do that. Grrr... >> >> >> >> >> >> Ugh, that's pretty bad :-/. Do you really think we can't use a >> >> >> band-aid over the new indexing code, though? >> >> > >> >> > >> >> > Yeah, we can. But Sebastian doesn't have time and I'm unfamiliar with >> >> > the >> >> > code, so it may take a while... >> >> >> >> Fair enough! >> >> >> >> I guess that if what are (arguably) bugs in matplotlib and >> >> scikit-image are holding up the numpy release, then it's worth CC'ing >> >> their mailing lists in case someone feels like volunteering to fix >> >> it... ;-). >> > >> > I can do that ;) Doesn't help with the release though unless we want to >> > document the errors in the release notes and tell folks to wait on the >> next >> > release of the packages. >> >> Oh, I meant, in case they want to fix numpy so that their packages >> don't break :-). >> >> > I've filed issues with all the affected projects. Here is the current > status. > > matplotlib -- Reported, being fixed, should be in 1.4 in a few days. > skimage -- Reported. > scikit-learn -- Reported. > tables -- Reported. > statsmodels -- Reported, fixed in master. > bottleneck -- Reported. IIRC, kwgoodman already knew of the changes. > pyfits -- Reported to astropy. > milk -- Reported. > pandas -- Reportedly fixed in master. > skimage is now fixed in master. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Jul 5 04:13:46 2014 From: cournape at gmail.com (David Cournapeau) Date: Sat, 5 Jul 2014 17:13:46 +0900 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris wrote: > Ralf likes the speed of bento, but it is not currently maintained > What exactly is not maintained ? David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Sat Jul 5 04:14:21 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 5 Jul 2014 10:14:21 +0200 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 1:41 AM, Nathaniel Smith wrote: > On 5 Jul 2014 00:07, "Charles R Harris" wrote: > > I've filed issues with all the affected projects. Here is the current > status. > > > > matplotlib -- Reported, being fixed, should be in 1.4 in a few days. > > skimage -- Reported. > > scikit-learn -- Reported. > > tables -- Reported. > > statsmodels -- Reported, fixed in master. > > bottleneck -- Reported. IIRC, kwgoodman already knew of the changes. > > pyfits -- Reported to astropy. > > milk -- Reported. > > pandas -- Reportedly fixed in master. > > That is a massive pile of affected projects :-(. > > My worry is that if all these projects we know about are broken, then how > many other codebases that we aren't testing are also broken? > Same worry here. If a major change in numpy breaks ~half of the projects that make up a typical scipy stack, that change should not be made without at least one release that emits warnings first. We would have caught this much earlier had we had something like https://github.com/matthew-brett/scipy-stack-osx-testing. Maybe a good idea to have that as a separate repo in the numpy org, add a few more projects to it, and then regularly run numpy master (or a PR) against the latest releases of those projects. Ralf > If the issues are fixed in matplotlib and pandas I'd be inclined to > release as is with a mention of versions in the release notes. > > Even if it's fixed in pandas master, how long until it's in user's hands? > > -n > > > Chuck > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jul 5 04:23:26 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 5 Jul 2014 10:23:26 +0200 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau wrote: > > > > On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Ralf likes the speed of bento, but it is not currently maintained >> > > What exactly is not maintained ? > The issue is that Julian made some slightly nontrivial changes to core/setup.py and didn't want to update core/bscript. No one else has taken the time either to make those changes. That didn't bother me enough yet to go fix it, because they're all optional features and using Bento builds works just fine at the moment (and is part of the Travis CI test runs, so it'll keep working). I don't think the above is a good reason to remove Bento support. The much faster builds alone are a good reason to keep it. 
And the assertion that all numpy devs understand numpy.distutils is more than a little questionable:) Ralf > > David > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Jul 5 04:44:38 2014 From: cournape at gmail.com (David Cournapeau) Date: Sat, 5 Jul 2014 17:44:38 +0900 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 5:23 PM, Ralf Gommers wrote: > > > > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau > wrote: > >> >> >> >> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Ralf likes the speed of bento, but it is not currently maintained >>> >> >> What exactly is not maintained ? >> > > The issue is that Julian made some slightly nontrivial changes to > core/setup.py and didn't want to update core/bscript. No one else has taken > the time either to make those changes. That didn't bother me enough yet to > go fix it, because they're all optional features and using Bento builds > works just fine at the moment (and is part of the Travis CI test runs, so > it'll keep working). > What are those changes ? David -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jul 5 05:02:14 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 5 Jul 2014 11:02:14 +0200 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 10:44 AM, David Cournapeau wrote: > > > > On Sat, Jul 5, 2014 at 5:23 PM, Ralf Gommers > wrote: > >> >> >> >> On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau >> wrote: >> >>> >>> >>> >>> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> Ralf likes the speed of bento, but it is not currently maintained >>>> >>> >>> What exactly is not maintained ? >>> >> >> The issue is that Julian made some slightly nontrivial changes to >> core/setup.py and didn't want to update core/bscript. No one else has taken >> the time either to make those changes. That didn't bother me enough yet to >> go fix it, because they're all optional features and using Bento builds >> works just fine at the moment (and is part of the Travis CI test runs, so >> it'll keep working). >> > > What are those changes? > Comment in bscript: # TODO: add OPTIONAL_HEADERS, OPTIONAL_INTRINSICS and # OPTIONAL_GCC_ATTRIBUTES (see setup.py and gh-3766). These are # performance optimizations for GCC. Plus the changes in https://github.com/numpy/numpy/pull/4692, that apparently weren't documented in bscript as TODO. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jtaylor.debian at googlemail.com Sat Jul 5 07:32:49 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 05 Jul 2014 13:32:49 +0200 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: <53B7E261.3010801@googlemail.com> On 05.07.2014 11:02, Ralf Gommers wrote: > > > > On Sat, Jul 5, 2014 at 10:44 AM, David Cournapeau > wrote: > > > > > On Sat, Jul 5, 2014 at 5:23 PM, Ralf Gommers > wrote: > > > > > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau > > wrote: > > > > > On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris > > wrote: > > Ralf likes the speed of bento, but it is not currently > maintained > > > What exactly is not maintained ? > > > The issue is that Julian made some slightly nontrivial changes > to core/setup.py and didn't want to update core/bscript. No one > else has taken the time either to make those changes. That > didn't bother me enough yet to go fix it, because they're all > optional features and using Bento builds works just fine at the > moment (and is part of the Travis CI test runs, so it'll keep > working). > > > What are those changes? > > > Comment in bscript: > > # TODO: add OPTIONAL_HEADERS, OPTIONAL_INTRINSICS and > # OPTIONAL_GCC_ATTRIBUTES (see setup.py and gh-3766). These are > # performance optimizations for GCC. > > Plus the changes in https://github.com/numpy/numpy/pull/4692, that > apparently weren't documented in bscript as TODO. > + bento builds in debug mode which is could be slower because I sprinkled asserts in lots of places From njs at pobox.com Sat Jul 5 07:54:06 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jul 2014 12:54:06 +0100 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On 5 Jul 2014 09:23, "Ralf Gommers" wrote: > > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau wrote: >> >> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: >>> >>> Ralf likes the speed of bento, but it is not currently maintained >> >> >> What exactly is not maintained ? > > > The issue is that Julian made some slightly nontrivial changes to core/setup.py and didn't want to update core/bscript. No one else has taken the time either to make those changes. That didn't bother me enough yet to go fix it, because they're all optional features and using Bento builds works just fine at the moment (and is part of the Travis CI test runs, so it'll keep working). Perhaps a compromise would be to declare it officially unsupported and remove it from Travis CI, while leaving the files in place to be used on an at-your-own-risk basis? As long as it's in Travis, the default is that anyone who breaks it has to fix it. If it's not in Travis, then the default is that the people (person?) who use bento are responsible for keeping it working for their needs. > I don't think the above is a good reason to remove Bento support. The much faster builds alone are a good reason to keep it. And the assertion that all numpy devs understand numpy.distutils is more than a little questionable:) They surely don't. But thousands of people use setup.py, and one or two use bento. Yet supporting both requires twice as much energy and attention as supporting just one. We've probably spent more person-hours talking about this, documenting the missing bscript bits, etc. than you've saved on those fast builds. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Sat Jul 5 09:32:03 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 5 Jul 2014 15:32:03 +0200 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 1:54 PM, Nathaniel Smith wrote: > On 5 Jul 2014 09:23, "Ralf Gommers" wrote: > > > > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau > wrote: > >> > >> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >>> > >>> Ralf likes the speed of bento, but it is not currently maintained > >> > >> > >> What exactly is not maintained ? > > > > > > The issue is that Julian made some slightly nontrivial changes to > core/setup.py and didn't want to update core/bscript. No one else has taken > the time either to make those changes. That didn't bother me enough yet to > go fix it, because they're all optional features and using Bento builds > works just fine at the moment (and is part of the Travis CI test runs, so > it'll keep working). > > Perhaps a compromise would be to declare it officially unsupported and > remove it from Travis CI, while leaving the files in place to be used on an > at-your-own-risk basis? As long as it's in Travis, the default is that > anyone who breaks it has to fix it. If it's not in Travis, then the default > is that the people (person?) who use bento are responsible for keeping it > working for their needs. > -1 that just means that simple changes like adding a new extension will not get made before PRs get merged, and bento support will be in a broken state much more often. > > I don't think the above is a good reason to remove Bento support. The > much faster builds alone are a good reason to keep it. And the assertion > that all numpy devs understand numpy.distutils is more than a little > questionable:) > > They surely don't. But thousands of people use setup.py, and one or two > use bento. > I'm getting a little tired of these assertions. It's clear that David and I use it. A cursory search on Github reveals that Stefan, Fabian, Jonas and @aksarkar do (or did) as well: https://github.com/scipy/scipy/commit/74d823b3 https://github.com/numpy/numpy/issues/2993 https://github.com/numpy/numpy/pull/3606 https://github.com/numpy/numpy/issues/3889 For every user you can measure there's usually a number of users that you don't hear about. > Yet supporting both requires twice as much energy and attention as > supporting just one. > That's of course not true. For most changes the differences in where and how to update the build systems are small. Only for unusual changes like Julian patches to make use of optional GCC features, Bento and distutils may require very different changes. > We've probably spent more person-hours talking about this, documenting the > missing bscript bits, etc. than you've saved on those fast builds. > Then maybe stop talking about it:) Besides the fast builds, which is only one example of why I like Bento better, there's also the fundamental question of what we do with build tools in the long term. It's clear that distutils is a dead end. All the PEPs related to packaging move in the direction of supporting tools like Bento better. If in the future we need significant new features in our build tool, Bento is a much better base to build on than numpy.distutils. It's unfortunate that at the moment there's no one that works on improving our build situation, but that is what it is. Removing Bento support is a step in the wrong direction imho. 
Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jul 5 10:17:27 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jul 2014 15:17:27 +0100 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 2:32 PM, Ralf Gommers wrote: > > On Sat, Jul 5, 2014 at 1:54 PM, Nathaniel Smith wrote: >> >> On 5 Jul 2014 09:23, "Ralf Gommers" wrote: >> > >> > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau >> > wrote: >> >> >> >> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris >> >> wrote: >> >>> >> >>> Ralf likes the speed of bento, but it is not currently maintained >> >> >> >> >> >> What exactly is not maintained ? >> > >> > >> > The issue is that Julian made some slightly nontrivial changes to >> > core/setup.py and didn't want to update core/bscript. No one else has taken >> > the time either to make those changes. That didn't bother me enough yet to >> > go fix it, because they're all optional features and using Bento builds >> > works just fine at the moment (and is part of the Travis CI test runs, so >> > it'll keep working). >> >> Perhaps a compromise would be to declare it officially unsupported and >> remove it from Travis CI, while leaving the files in place to be used on an >> at-your-own-risk basis? As long as it's in Travis, the default is that >> anyone who breaks it has to fix it. If it's not in Travis, then the default >> is that the people (person?) who use bento are responsible for keeping it >> working for their needs. > > -1 that just means that simple changes like adding a new extension will not > get made before PRs get merged, and bento support will be in a broken state > much more often. Yes, and then the handful of people who care about this would fix it or not. Your -1 is attempting to veto other people's *not* paying attention to this build system. I... don't think -1's work that way :-( >> > I don't think the above is a good reason to remove Bento support. The >> > much faster builds alone are a good reason to keep it. And the assertion >> > that all numpy devs understand numpy.distutils is more than a little >> > questionable:) >> >> They surely don't. But thousands of people use setup.py, and one or two >> use bento. > > I'm getting a little tired of these assertions. It's clear that David and I > use it. A cursory search on Github reveals that Stefan, Fabian, Jonas and > @aksarkar do (or did) as well: > https://github.com/scipy/scipy/commit/74d823b3 > https://github.com/numpy/numpy/issues/2993 > https://github.com/numpy/numpy/pull/3606 > https://github.com/numpy/numpy/issues/3889 > For every user you can measure there's usually a number of users that you > don't hear about. I apologize for forgetting before that you do use Bento, but these patches you're finding don't really change the overall picture. Let's assume that there are 100 people using Bento, who would be slightly inconvenienced if they had to use setup.py instead, or got stuck patching the bento build themselves to keep it working. 100 is probably an order of magnitude too high, but whatever. OTOH numpy has almost 7 million downloads on PyPI+sf.net, of which approximately every one used setup.py one way or another, plus all the people get it from alternative channels like distros, which also AFAIK universally use setup.py. Software development is all about trade-offs. 
Time that numpy developers spend messing about with bento to benefit those hundred users is time that could instead be spent on improvements that benefit many orders of magnitudes more users. Why do you want us to spend our time producing x units of value when we could instead be producing 100*x units of value for the same effort? >> Yet supporting both requires twice as much energy and attention as >> supporting just one. > > That's of course not true. For most changes the differences in where and how > to update the build systems are small. Only for unusual changes like Julian > patches to make use of optional GCC features, Bento and distutils may > require very different changes. >> >> We've probably spent more person-hours talking about this, documenting the >> missing bscript bits, etc. than you've saved on those fast builds. > > Then maybe stop talking about it:) > > Besides the fast builds, which is only one example of why I like Bento > better, there's also the fundamental question of what we do with build tools > in the long term. It's clear that distutils is a dead end. All the PEPs > related to packaging move in the direction of supporting tools like Bento > better. If in the future we need significant new features in our build tool, > Bento is a much better base to build on than numpy.distutils. It's > unfortunate that at the moment there's no one that works on improving our > build situation, but that is what it is. Removing Bento support is a step in > the wrong direction imho. "We must do something! This is something!" Bento is pre-alpha software whose last upstream commit was in July 2013. It's own CI tests have been failing since Feb. 2013, almost a year and a half ago. Bento build support was added to numpy in early 2011, and 3.5 years later it still hasn't convinced most of the core team that it provides any value at all, yet it continues to take up time and attention. Maybe bento will revive and take over the new python packaging world! Maybe not. Maybe something else will. I don't see how our support for it will really affect these outcomes in any way. And I especially don't see why it's important to spend time *now* on keeping bento working, just in case it becomes useful *later*. If it proves valuable later, we can always fix our bscripts then. They won't dissolve irrecoverably out of history no matter what we do. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cournape at gmail.com Sat Jul 5 10:21:24 2014 From: cournape at gmail.com (David Cournapeau) Date: Sat, 5 Jul 2014 23:21:24 +0900 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith wrote: > On Sat, Jul 5, 2014 at 2:32 PM, Ralf Gommers > wrote: > > > > On Sat, Jul 5, 2014 at 1:54 PM, Nathaniel Smith wrote: > >> > >> On 5 Jul 2014 09:23, "Ralf Gommers" wrote: > >> > > >> > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau > > >> > wrote: > >> >> > >> >> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris > >> >> wrote: > >> >>> > >> >>> Ralf likes the speed of bento, but it is not currently maintained > >> >> > >> >> > >> >> What exactly is not maintained ? > >> > > >> > > >> > The issue is that Julian made some slightly nontrivial changes to > >> > core/setup.py and didn't want to update core/bscript. No one else has > taken > >> > the time either to make those changes. 
That didn't bother me enough > yet to > >> > go fix it, because they're all optional features and using Bento > builds > >> > works just fine at the moment (and is part of the Travis CI test > runs, so > >> > it'll keep working). > >> > >> Perhaps a compromise would be to declare it officially unsupported and > >> remove it from Travis CI, while leaving the files in place to be used > on an > >> at-your-own-risk basis? As long as it's in Travis, the default is that > >> anyone who breaks it has to fix it. If it's not in Travis, then the > default > >> is that the people (person?) who use bento are responsible for keeping > it > >> working for their needs. > > > > -1 that just means that simple changes like adding a new extension will > not > > get made before PRs get merged, and bento support will be in a broken > state > > much more often. > > Yes, and then the handful of people who care about this would fix it > or not. Your -1 is attempting to veto other people's *not* paying > attention to this build system. I... don't think -1's work that way > :-( > > >> > I don't think the above is a good reason to remove Bento support. The > >> > much faster builds alone are a good reason to keep it. And the > assertion > >> > that all numpy devs understand numpy.distutils is more than a little > >> > questionable:) > >> > >> They surely don't. But thousands of people use setup.py, and one or two > >> use bento. > > > > I'm getting a little tired of these assertions. It's clear that David > and I > > use it. A cursory search on Github reveals that Stefan, Fabian, Jonas and > > @aksarkar do (or did) as well: > > https://github.com/scipy/scipy/commit/74d823b3 > > https://github.com/numpy/numpy/issues/2993 > > https://github.com/numpy/numpy/pull/3606 > > https://github.com/numpy/numpy/issues/3889 > > For every user you can measure there's usually a number of users that you > > don't hear about. > > I apologize for forgetting before that you do use Bento, but these > patches you're finding don't really change the overall picture. Let's > assume that there are 100 people using Bento, who would be slightly > inconvenienced if they had to use setup.py instead, or got stuck > patching the bento build themselves to keep it working. 100 is > probably an order of magnitude too high, but whatever. OTOH numpy has > almost 7 million downloads on PyPI+sf.net, of which approximately > every one used setup.py one way or another, plus all the people get it > from alternative channels like distros, which also AFAIK universally > use setup.py. Software development is all about trade-offs. Time that > numpy developers spend messing about with bento to benefit those > hundred users is time that could instead be spent on improvements that > benefit many orders of magnitudes more users. Why do you want us to > spend our time producing x units of value when we could instead be > producing 100*x units of value for the same effort? > > >> Yet supporting both requires twice as much energy and attention as > >> supporting just one. > > > > That's of course not true. For most changes the differences in where and > how > > to update the build systems are small. Only for unusual changes like > Julian > > patches to make use of optional GCC features, Bento and distutils may > > require very different changes. > >> > >> We've probably spent more person-hours talking about this, documenting > the > >> missing bscript bits, etc. than you've saved on those fast builds. 
> > > > Then maybe stop talking about it:) > > > > Besides the fast builds, which is only one example of why I like Bento > > better, there's also the fundamental question of what we do with build > tools > > in the long term. It's clear that distutils is a dead end. All the PEPs > > related to packaging move in the direction of supporting tools like Bento > > better. If in the future we need significant new features in our build > tool, > > Bento is a much better base to build on than numpy.distutils. It's > > unfortunate that at the moment there's no one that works on improving our > > build situation, but that is what it is. Removing Bento support is a > step in > > the wrong direction imho. > > "We must do something! This is something!" > > Bento is pre-alpha software whose last upstream commit was in July > 2013. It's own CI tests have been failing since Feb. 2013, almost a > year and a half ago. Bento build support was added to numpy in early > 2011, and 3.5 years later it still hasn't convinced most of the core > team that it provides any value at all, yet it continues to take up > time and attention. > > Maybe bento will revive and take over the new python packaging world! > Maybe not. Maybe something else will. I don't see how our support for > it will really affect these outcomes in any way. And I especially > don't see why it's important to spend time *now* on keeping bento > working, just in case it becomes useful *later*. But it is working right now, so that argument is moot. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Jul 5 10:28:16 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 5 Jul 2014 15:28:16 +0100 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau wrote: > > > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith wrote: >> >> On Sat, Jul 5, 2014 at 2:32 PM, Ralf Gommers >> wrote: >> > >> > On Sat, Jul 5, 2014 at 1:54 PM, Nathaniel Smith wrote: >> >> >> >> On 5 Jul 2014 09:23, "Ralf Gommers" wrote: >> >> > >> >> > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau >> >> > >> >> > wrote: >> >> >> >> >> >> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris >> >> >> wrote: >> >> >>> >> >> >>> Ralf likes the speed of bento, but it is not currently maintained >> >> >> >> >> >> >> >> >> What exactly is not maintained ? >> >> > >> >> > >> >> > The issue is that Julian made some slightly nontrivial changes to >> >> > core/setup.py and didn't want to update core/bscript. No one else has >> >> > taken >> >> > the time either to make those changes. That didn't bother me enough >> >> > yet to >> >> > go fix it, because they're all optional features and using Bento >> >> > builds >> >> > works just fine at the moment (and is part of the Travis CI test >> >> > runs, so >> >> > it'll keep working). >> >> >> >> Perhaps a compromise would be to declare it officially unsupported and >> >> remove it from Travis CI, while leaving the files in place to be used >> >> on an >> >> at-your-own-risk basis? As long as it's in Travis, the default is that >> >> anyone who breaks it has to fix it. If it's not in Travis, then the >> >> default >> >> is that the people (person?) who use bento are responsible for keeping >> >> it >> >> working for their needs. 
>> > >> > -1 that just means that simple changes like adding a new extension will >> > not >> > get made before PRs get merged, and bento support will be in a broken >> > state >> > much more often. >> >> Yes, and then the handful of people who care about this would fix it >> or not. Your -1 is attempting to veto other people's *not* paying >> attention to this build system. I... don't think -1's work that way >> :-( >> >> >> > I don't think the above is a good reason to remove Bento support. The >> >> > much faster builds alone are a good reason to keep it. And the >> >> > assertion >> >> > that all numpy devs understand numpy.distutils is more than a little >> >> > questionable:) >> >> >> >> They surely don't. But thousands of people use setup.py, and one or two >> >> use bento. >> > >> > I'm getting a little tired of these assertions. It's clear that David >> > and I >> > use it. A cursory search on Github reveals that Stefan, Fabian, Jonas >> > and >> > @aksarkar do (or did) as well: >> > https://github.com/scipy/scipy/commit/74d823b3 >> > https://github.com/numpy/numpy/issues/2993 >> > https://github.com/numpy/numpy/pull/3606 >> > https://github.com/numpy/numpy/issues/3889 >> > For every user you can measure there's usually a number of users that >> > you >> > don't hear about. >> >> I apologize for forgetting before that you do use Bento, but these >> patches you're finding don't really change the overall picture. Let's >> assume that there are 100 people using Bento, who would be slightly >> inconvenienced if they had to use setup.py instead, or got stuck >> patching the bento build themselves to keep it working. 100 is >> probably an order of magnitude too high, but whatever. OTOH numpy has >> almost 7 million downloads on PyPI+sf.net, of which approximately >> every one used setup.py one way or another, plus all the people get it >> from alternative channels like distros, which also AFAIK universally >> use setup.py. Software development is all about trade-offs. Time that >> numpy developers spend messing about with bento to benefit those >> hundred users is time that could instead be spent on improvements that >> benefit many orders of magnitudes more users. Why do you want us to >> spend our time producing x units of value when we could instead be >> producing 100*x units of value for the same effort? >> >> >> Yet supporting both requires twice as much energy and attention as >> >> supporting just one. >> > >> > That's of course not true. For most changes the differences in where and >> > how >> > to update the build systems are small. Only for unusual changes like >> > Julian >> > patches to make use of optional GCC features, Bento and distutils may >> > require very different changes. >> >> >> >> We've probably spent more person-hours talking about this, documenting >> >> the >> >> missing bscript bits, etc. than you've saved on those fast builds. >> > >> > Then maybe stop talking about it:) >> > >> > Besides the fast builds, which is only one example of why I like Bento >> > better, there's also the fundamental question of what we do with build >> > tools >> > in the long term. It's clear that distutils is a dead end. All the PEPs >> > related to packaging move in the direction of supporting tools like >> > Bento >> > better. If in the future we need significant new features in our build >> > tool, >> > Bento is a much better base to build on than numpy.distutils. 
It's >> > unfortunate that at the moment there's no one that works on improving >> > our >> > build situation, but that is what it is. Removing Bento support is a >> > step in >> > the wrong direction imho. >> >> "We must do something! This is something!" >> >> Bento is pre-alpha software whose last upstream commit was in July >> 2013. It's own CI tests have been failing since Feb. 2013, almost a >> year and a half ago. Bento build support was added to numpy in early >> 2011, and 3.5 years later it still hasn't convinced most of the core >> team that it provides any value at all, yet it continues to take up >> time and attention. >> >> Maybe bento will revive and take over the new python packaging world! >> Maybe not. Maybe something else will. I don't see how our support for >> it will really affect these outcomes in any way. And I especially >> don't see why it's important to spend time *now* on keeping bento >> working, just in case it becomes useful *later*. > > > But it is working right now, so that argument is moot. Why don't we wait until there is a significant problem with getting the Bento builds to work, and revisit then. Cheers, Matthew From charlesr.harris at gmail.com Sat Jul 5 10:51:56 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 5 Jul 2014 08:51:56 -0600 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 8:28 AM, Matthew Brett wrote: > On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau > wrote: > > > > > > > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith wrote: > >> > >> On Sat, Jul 5, 2014 at 2:32 PM, Ralf Gommers > >> wrote: > >> > > >> > On Sat, Jul 5, 2014 at 1:54 PM, Nathaniel Smith > wrote: > >> >> > >> >> On 5 Jul 2014 09:23, "Ralf Gommers" wrote: > >> >> > > >> >> > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris > >> >> >> wrote: > >> >> >>> > >> >> >>> Ralf likes the speed of bento, but it is not currently maintained > >> >> >> > >> >> >> > >> >> >> What exactly is not maintained ? > >> >> > > >> >> > > >> >> > The issue is that Julian made some slightly nontrivial changes to > >> >> > core/setup.py and didn't want to update core/bscript. No one else > has > >> >> > taken > >> >> > the time either to make those changes. That didn't bother me enough > >> >> > yet to > >> >> > go fix it, because they're all optional features and using Bento > >> >> > builds > >> >> > works just fine at the moment (and is part of the Travis CI test > >> >> > runs, so > >> >> > it'll keep working). > >> >> > >> >> Perhaps a compromise would be to declare it officially unsupported > and > >> >> remove it from Travis CI, while leaving the files in place to be used > >> >> on an > >> >> at-your-own-risk basis? As long as it's in Travis, the default is > that > >> >> anyone who breaks it has to fix it. If it's not in Travis, then the > >> >> default > >> >> is that the people (person?) who use bento are responsible for > keeping > >> >> it > >> >> working for their needs. > >> > > >> > -1 that just means that simple changes like adding a new extension > will > >> > not > >> > get made before PRs get merged, and bento support will be in a broken > >> > state > >> > much more often. > >> > >> Yes, and then the handful of people who care about this would fix it > >> or not. Your -1 is attempting to veto other people's *not* paying > >> attention to this build system. I... 
don't think -1's work that way > >> :-( > >> > >> >> > I don't think the above is a good reason to remove Bento support. > The > >> >> > much faster builds alone are a good reason to keep it. And the > >> >> > assertion > >> >> > that all numpy devs understand numpy.distutils is more than a > little > >> >> > questionable:) > >> >> > >> >> They surely don't. But thousands of people use setup.py, and one or > two > >> >> use bento. > >> > > >> > I'm getting a little tired of these assertions. It's clear that David > >> > and I > >> > use it. A cursory search on Github reveals that Stefan, Fabian, Jonas > >> > and > >> > @aksarkar do (or did) as well: > >> > https://github.com/scipy/scipy/commit/74d823b3 > >> > https://github.com/numpy/numpy/issues/2993 > >> > https://github.com/numpy/numpy/pull/3606 > >> > https://github.com/numpy/numpy/issues/3889 > >> > For every user you can measure there's usually a number of users that > >> > you > >> > don't hear about. > >> > >> I apologize for forgetting before that you do use Bento, but these > >> patches you're finding don't really change the overall picture. Let's > >> assume that there are 100 people using Bento, who would be slightly > >> inconvenienced if they had to use setup.py instead, or got stuck > >> patching the bento build themselves to keep it working. 100 is > >> probably an order of magnitude too high, but whatever. OTOH numpy has > >> almost 7 million downloads on PyPI+sf.net, of which approximately > >> every one used setup.py one way or another, plus all the people get it > >> from alternative channels like distros, which also AFAIK universally > >> use setup.py. Software development is all about trade-offs. Time that > >> numpy developers spend messing about with bento to benefit those > >> hundred users is time that could instead be spent on improvements that > >> benefit many orders of magnitudes more users. Why do you want us to > >> spend our time producing x units of value when we could instead be > >> producing 100*x units of value for the same effort? > >> > >> >> Yet supporting both requires twice as much energy and attention as > >> >> supporting just one. > >> > > >> > That's of course not true. For most changes the differences in where > and > >> > how > >> > to update the build systems are small. Only for unusual changes like > >> > Julian > >> > patches to make use of optional GCC features, Bento and distutils may > >> > require very different changes. > >> >> > >> >> We've probably spent more person-hours talking about this, > documenting > >> >> the > >> >> missing bscript bits, etc. than you've saved on those fast builds. > >> > > >> > Then maybe stop talking about it:) > >> > > >> > Besides the fast builds, which is only one example of why I like Bento > >> > better, there's also the fundamental question of what we do with build > >> > tools > >> > in the long term. It's clear that distutils is a dead end. All the > PEPs > >> > related to packaging move in the direction of supporting tools like > >> > Bento > >> > better. If in the future we need significant new features in our build > >> > tool, > >> > Bento is a much better base to build on than numpy.distutils. It's > >> > unfortunate that at the moment there's no one that works on improving > >> > our > >> > build situation, but that is what it is. Removing Bento support is a > >> > step in > >> > the wrong direction imho. > >> > >> "We must do something! This is something!" 
> >> > >> Bento is pre-alpha software whose last upstream commit was in July > >> 2013. It's own CI tests have been failing since Feb. 2013, almost a > >> year and a half ago. Bento build support was added to numpy in early > >> 2011, and 3.5 years later it still hasn't convinced most of the core > >> team that it provides any value at all, yet it continues to take up > >> time and attention. > >> > >> Maybe bento will revive and take over the new python packaging world! > >> Maybe not. Maybe something else will. I don't see how our support for > >> it will really affect these outcomes in any way. And I especially > >> don't see why it's important to spend time *now* on keeping bento > >> working, just in case it becomes useful *later*. > > > > > > But it is working right now, so that argument is moot. > > Why don't we wait until there is a significant problem with getting > the Bento builds to work, and revisit then. > > My feeling is that it is deceptive, as most folks who might use bento won't know that some optimizations are missing from the result. David, I have pinged you a number of times about getting the numpy bento build updated. The fact that bento builds numpy without failing is not the same as bento building numpy in the best way. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Jul 5 11:05:25 2014 From: cournape at gmail.com (David Cournapeau) Date: Sun, 6 Jul 2014 00:05:25 +0900 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 11:51 PM, Charles R Harris wrote: > > > > On Sat, Jul 5, 2014 at 8:28 AM, Matthew Brett > wrote: > >> On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau >> wrote: >> > >> > >> > >> > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith wrote: >> >> >> >> On Sat, Jul 5, 2014 at 2:32 PM, Ralf Gommers >> >> wrote: >> >> > >> >> > On Sat, Jul 5, 2014 at 1:54 PM, Nathaniel Smith >> wrote: >> >> >> >> >> >> On 5 Jul 2014 09:23, "Ralf Gommers" wrote: >> >> >> > >> >> >> > On Sat, Jul 5, 2014 at 10:13 AM, David Cournapeau >> >> >> > >> >> >> > wrote: >> >> >> >> >> >> >> >> On Sat, Jul 5, 2014 at 11:25 AM, Charles R Harris >> >> >> >> wrote: >> >> >> >>> >> >> >> >>> Ralf likes the speed of bento, but it is not currently >> maintained >> >> >> >> >> >> >> >> >> >> >> >> What exactly is not maintained ? >> >> >> > >> >> >> > >> >> >> > The issue is that Julian made some slightly nontrivial changes to >> >> >> > core/setup.py and didn't want to update core/bscript. No one else >> has >> >> >> > taken >> >> >> > the time either to make those changes. That didn't bother me >> enough >> >> >> > yet to >> >> >> > go fix it, because they're all optional features and using Bento >> >> >> > builds >> >> >> > works just fine at the moment (and is part of the Travis CI test >> >> >> > runs, so >> >> >> > it'll keep working). >> >> >> >> >> >> Perhaps a compromise would be to declare it officially unsupported >> and >> >> >> remove it from Travis CI, while leaving the files in place to be >> used >> >> >> on an >> >> >> at-your-own-risk basis? As long as it's in Travis, the default is >> that >> >> >> anyone who breaks it has to fix it. If it's not in Travis, then the >> >> >> default >> >> >> is that the people (person?) who use bento are responsible for >> keeping >> >> >> it >> >> >> working for their needs. 
>> >> > >> >> > -1 that just means that simple changes like adding a new extension >> will >> >> > not >> >> > get made before PRs get merged, and bento support will be in a broken >> >> > state >> >> > much more often. >> >> >> >> Yes, and then the handful of people who care about this would fix it >> >> or not. Your -1 is attempting to veto other people's *not* paying >> >> attention to this build system. I... don't think -1's work that way >> >> :-( >> >> >> >> >> > I don't think the above is a good reason to remove Bento support. >> The >> >> >> > much faster builds alone are a good reason to keep it. And the >> >> >> > assertion >> >> >> > that all numpy devs understand numpy.distutils is more than a >> little >> >> >> > questionable:) >> >> >> >> >> >> They surely don't. But thousands of people use setup.py, and one or >> two >> >> >> use bento. >> >> > >> >> > I'm getting a little tired of these assertions. It's clear that David >> >> > and I >> >> > use it. A cursory search on Github reveals that Stefan, Fabian, Jonas >> >> > and >> >> > @aksarkar do (or did) as well: >> >> > https://github.com/scipy/scipy/commit/74d823b3 >> >> > https://github.com/numpy/numpy/issues/2993 >> >> > https://github.com/numpy/numpy/pull/3606 >> >> > https://github.com/numpy/numpy/issues/3889 >> >> > For every user you can measure there's usually a number of users that >> >> > you >> >> > don't hear about. >> >> >> >> I apologize for forgetting before that you do use Bento, but these >> >> patches you're finding don't really change the overall picture. Let's >> >> assume that there are 100 people using Bento, who would be slightly >> >> inconvenienced if they had to use setup.py instead, or got stuck >> >> patching the bento build themselves to keep it working. 100 is >> >> probably an order of magnitude too high, but whatever. OTOH numpy has >> >> almost 7 million downloads on PyPI+sf.net, of which approximately >> >> every one used setup.py one way or another, plus all the people get it >> >> from alternative channels like distros, which also AFAIK universally >> >> use setup.py. Software development is all about trade-offs. Time that >> >> numpy developers spend messing about with bento to benefit those >> >> hundred users is time that could instead be spent on improvements that >> >> benefit many orders of magnitudes more users. Why do you want us to >> >> spend our time producing x units of value when we could instead be >> >> producing 100*x units of value for the same effort? >> >> >> >> >> Yet supporting both requires twice as much energy and attention as >> >> >> supporting just one. >> >> > >> >> > That's of course not true. For most changes the differences in where >> and >> >> > how >> >> > to update the build systems are small. Only for unusual changes like >> >> > Julian >> >> > patches to make use of optional GCC features, Bento and distutils may >> >> > require very different changes. >> >> >> >> >> >> We've probably spent more person-hours talking about this, >> documenting >> >> >> the >> >> >> missing bscript bits, etc. than you've saved on those fast builds. >> >> > >> >> > Then maybe stop talking about it:) >> >> > >> >> > Besides the fast builds, which is only one example of why I like >> Bento >> >> > better, there's also the fundamental question of what we do with >> build >> >> > tools >> >> > in the long term. It's clear that distutils is a dead end. All the >> PEPs >> >> > related to packaging move in the direction of supporting tools like >> >> > Bento >> >> > better. 
If in the future we need significant new features in our >> build >> >> > tool, >> >> > Bento is a much better base to build on than numpy.distutils. It's >> >> > unfortunate that at the moment there's no one that works on improving >> >> > our >> >> > build situation, but that is what it is. Removing Bento support is a >> >> > step in >> >> > the wrong direction imho. >> >> >> >> "We must do something! This is something!" >> >> >> >> Bento is pre-alpha software whose last upstream commit was in July >> >> 2013. It's own CI tests have been failing since Feb. 2013, almost a >> >> year and a half ago. Bento build support was added to numpy in early >> >> 2011, and 3.5 years later it still hasn't convinced most of the core >> >> team that it provides any value at all, yet it continues to take up >> >> time and attention. >> >> >> >> Maybe bento will revive and take over the new python packaging world! >> >> Maybe not. Maybe something else will. I don't see how our support for >> >> it will really affect these outcomes in any way. And I especially >> >> don't see why it's important to spend time *now* on keeping bento >> >> working, just in case it becomes useful *later*. >> > >> > >> > But it is working right now, so that argument is moot. >> >> Why don't we wait until there is a significant problem with getting >> the Bento builds to work, and revisit then. >> >> > My feeling is that it is deceptive, as most folks who might use bento > won't know that some optimizations are missing from the result. > > David, I have pinged you a number of times about getting the numpy bento > build updated. The fact that bento builds numpy without failing is not the > same as bento building numpy in the best way. > Fair enough, let me look at it now, looks fairly trivial to fix David -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sat Jul 5 11:11:03 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 05 Jul 2014 17:11:03 +0200 Subject: [Numpy-discussion] Fast way to convert (nested) list to numpy object array? In-Reply-To: <53B6C919.4010806@tudelft.nl> References: <53B51993.7080207@tudelft.nl> <53B54E41.8090309@tudelft.nl> <1404391459.13834.8.camel@sebastian-t440> <53B6C919.4010806@tudelft.nl> Message-ID: <1404573063.3423.5.camel@sebastian-t440> On Fr, 2014-07-04 at 17:32 +0200, Marc Hulsman wrote: > On 07/03/2014 02:44 PM, Sebastian Berg wrote: > > True and true. I don't see a problem with fromiter being more general, > > just someone has to sit down and add new error checks/cleanup stuff > > for the object case. The assignment could probably also be optimized, > > not sure how hard that is, I would expect it isn't that hard. As > > usually, someone just needs to find time and the interest to actually > > do it ;). - Sebastian > > I looked at the code of FromIter below. > > /* > * We would need to alter the memory RENEW code to decrement any > * reference counts before throwing away any memory. > */ > if (PyDataType_REFCHK(dtype)) { > PyErr_SetString(PyExc_ValueError, > "cannot create object arrays from iterator"); > goto done; > } > > > However, the memory renew code (which just reallocs the memory to > increase the array size) uses > a simple realloc. It seems to me that it is not necessary to adapt > reference counts in this case (as the incref > from the new memory compensates the decref from the memory that is > removed)? 
For the addition of elements > to the array, everything seems to be ok anyway, as setitem is used, > which does the incref already. > So I think it should be possible to just remove this check? > Yes and no. I agree that the comment was just being overly careful, since the renew will copy the pointers without calling Py_INCREF. However, you *will* need to add new error cleanup logic in case the iterator throws an error, or you run out of memory. Since then you need to decref everything again. > I did not yet look at the assignment issue, had some difficulty finding > the correct place in the code, does does > anyone have any pointers were to look? > This is handled by PyArray_CopyObject in arrayobject.c. The actual logic is probably done by PyArray_GetArrayParamsFromObject in ctors.c, that is a public function, so my guess is, you would have to create a new one which allows passing in a maximum ndim and then make the old one call that one with NPY_MAXDIMS (or whatever it was) - Sebastian > > > > >> The generic solution of adding an nmaxdim parameter to numpy.array would > >> of course be even more ideal :) > >> > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Sat Jul 5 11:24:36 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 05 Jul 2014 17:24:36 +0200 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: Message-ID: <1404573876.3423.9.camel@sebastian-t440> On Sa, 2014-07-05 at 00:41 +0100, Nathaniel Smith wrote: > On 5 Jul 2014 00:07, "Charles R Harris" > > That is a massive pile of affected projects :-(. > > My worry is that if all these projects we know about are broken, then > how many other codebases that we aren't testing are also broken? > Yeah, I would imagine quite a few might be. It isn't that I guess many used the "feature" deliberately, but it is easy to just code it and assume that the code is correct since it works. So I think I will just need to fix it. The pull request *should* already do this with a band aid-solution, by just falling back to the old funky stuff if there is a failure. If someone is good with python exception handling and string formatting in C, please feel free to have a look ;). - Sebastian > > If the issues are fixed in matplotlib and pandas I'd be inclined to > release as is with a mention of versions in the release notes. > > Even if it's fixed in pandas master, how long until it's in user's > hands? 
> > -n > > > Chuck > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Sat Jul 5 12:38:02 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 5 Jul 2014 17:38:02 +0100 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau wrote: > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith wrote: >> >> Maybe bento will revive and take over the new python packaging world! >> Maybe not. Maybe something else will. I don't see how our support for >> it will really affect these outcomes in any way. And I especially >> don't see why it's important to spend time *now* on keeping bento >> working, just in case it becomes useful *later*. > > But it is working right now, so that argument is moot. My suggestion was that we should drop the rule that a patch has to keep bento working to be merged. We're talking about future breakages and future effort. The fact that it's working now doesn't say anything about whether it's worth continuing to invest time in it. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cournape at gmail.com Sat Jul 5 12:40:17 2014 From: cournape at gmail.com (David Cournapeau) Date: Sun, 6 Jul 2014 01:40:17 +0900 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: The efforts are on average less demanding than this discussion. We are talking about adding entries to a list in most cases... Also, while adding the optimization support for bento, I've noticed that a lot of the related distutils code is broken, and does not work as expected on at least OS X + clang. David On Sun, Jul 6, 2014 at 1:38 AM, Nathaniel Smith wrote: > On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau > wrote: > > > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith wrote: > >> > >> Maybe bento will revive and take over the new python packaging world! > >> Maybe not. Maybe something else will. I don't see how our support for > >> it will really affect these outcomes in any way. And I especially > >> don't see why it's important to spend time *now* on keeping bento > >> working, just in case it becomes useful *later*. > > > > But it is working right now, so that argument is moot. > > My suggestion was that we should drop the rule that a patch has to > keep bento working to be merged. We're talking about future breakages > and future effort. The fact that it's working now doesn't say anything > about whether it's worth continuing to invest time in it. > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jtaylor.debian at googlemail.com Sat Jul 5 12:55:04 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 05 Jul 2014 18:55:04 +0200 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: <53B82DE8.7090905@googlemail.com> On 05.07.2014 18:40, David Cournapeau wrote: > The efforts are on average less demanding than this discussion. We are > talking about adding entries to a list in most cases... > > Also, while adding the optimization support for bento, I've noticed that > a lot of the related distutils code is broken, and does not work as > expected on at least OS X + clang. It just spits out a lot of warnings but they are harmless. We could remove them by using with -Werror=attribute for the conftests if it really bothers someone. Or do you mean something else? > > David > > > On Sun, Jul 6, 2014 at 1:38 AM, Nathaniel Smith > wrote: > > On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau > wrote: > > > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith > wrote: > >> > >> Maybe bento will revive and take over the new python packaging world! > >> Maybe not. Maybe something else will. I don't see how our support for > >> it will really affect these outcomes in any way. And I especially > >> don't see why it's important to spend time *now* on keeping bento > >> working, just in case it becomes useful *later*. > > > > But it is working right now, so that argument is moot. > > My suggestion was that we should drop the rule that a patch has to > keep bento working to be merged. We're talking about future breakages > and future effort. The fact that it's working now doesn't say anything > about whether it's worth continuing to invest time in it. > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Sat Jul 5 13:11:30 2014 From: cournape at gmail.com (David Cournapeau) Date: Sun, 6 Jul 2014 02:11:30 +0900 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: <53B82DE8.7090905@googlemail.com> References: <53B82DE8.7090905@googlemail.com> Message-ID: On Sun, Jul 6, 2014 at 1:55 AM, Julian Taylor wrote: > On 05.07.2014 18:40, David Cournapeau wrote: > > The efforts are on average less demanding than this discussion. We are > > talking about adding entries to a list in most cases... > > > > Also, while adding the optimization support for bento, I've noticed that > > a lot of the related distutils code is broken, and does not work as > > expected on at least OS X + clang. > > It just spits out a lot of warnings but they are harmless. > Adding lots of warnings are not harmless as they render the compiler warning system near useless (too many false alarms). I will fix the checks for both distutils and bento (using the autoconf macros setup, which should be more reliable than what we use for builtin and __attribute__-related checks) David > We could remove them by using with -Werror=attribute for the conftests > if it really bothers someone. > Or do you mean something else? 
> > > > > David > > > > > > On Sun, Jul 6, 2014 at 1:38 AM, Nathaniel Smith > > wrote: > > > > On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau > > wrote: > > > > > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith > > wrote: > > >> > > >> Maybe bento will revive and take over the new python packaging > world! > > >> Maybe not. Maybe something else will. I don't see how our support > for > > >> it will really affect these outcomes in any way. And I especially > > >> don't see why it's important to spend time *now* on keeping bento > > >> working, just in case it becomes useful *later*. > > > > > > But it is working right now, so that argument is moot. > > > > My suggestion was that we should drop the rule that a patch has to > > keep bento working to be merged. We're talking about future breakages > > and future effort. The fact that it's working now doesn't say > anything > > about whether it's worth continuing to invest time in it. > > > > -- > > Nathaniel J. Smith > > Postdoctoral researcher - Informatics - University of Edinburgh > > http://vorpus.org > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Sat Jul 5 13:24:55 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 05 Jul 2014 19:24:55 +0200 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: <53B82DE8.7090905@googlemail.com> Message-ID: <53B834E7.4060109@googlemail.com> On 05.07.2014 19:11, David Cournapeau wrote: > On Sun, Jul 6, 2014 at 1:55 AM, Julian Taylor > > > wrote: > > On 05.07.2014 18:40, David Cournapeau wrote: > > The efforts are on average less demanding than this discussion. We are > > talking about adding entries to a list in most cases... > > > > Also, while adding the optimization support for bento, I've > noticed that > > a lot of the related distutils code is broken, and does not work as > > expected on at least OS X + clang. > > It just spits out a lot of warnings but they are harmless. > > > Adding lots of warnings are not harmless as they render the compiler > warning system near useless (too many false alarms). > true but until now we haven't received a single complaint nor fixes so probably not many developers are actually using macs/clang to work on numpy C code. But I do agree its bad and I have fixing that on my todo list, I didn't get around to it yet. > I will fix the checks for both distutils and bento (using the autoconf > macros setup, which should be more reliable than what we use for builtin > and __attribute__-related checks) > > David > > > We could remove them by using with -Werror=attribute for the conftests > if it really bothers someone. > Or do you mean something else? 
> > > > > David > > > > > > On Sun, Jul 6, 2014 at 1:38 AM, Nathaniel Smith > > >> wrote: > > > > On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau > > > >> wrote: > > > > > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith > > > >> wrote: > > >> > > >> Maybe bento will revive and take over the new python > packaging world! > > >> Maybe not. Maybe something else will. I don't see how our > support for > > >> it will really affect these outcomes in any way. And I > especially > > >> don't see why it's important to spend time *now* on keeping > bento > > >> working, just in case it becomes useful *later*. > > > > > > But it is working right now, so that argument is moot. > > > > My suggestion was that we should drop the rule that a patch has to > > keep bento working to be merged. We're talking about future > breakages > > and future effort. The fact that it's working now doesn't say > anything > > about whether it's worth continuing to invest time in it. > > > > -- > > Nathaniel J. Smith > > Postdoctoral researcher - Informatics - University of Edinburgh > > http://vorpus.org > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Sat Jul 5 13:28:14 2014 From: cournape at gmail.com (David Cournapeau) Date: Sun, 6 Jul 2014 02:28:14 +0900 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: <53B834E7.4060109@googlemail.com> References: <53B82DE8.7090905@googlemail.com> <53B834E7.4060109@googlemail.com> Message-ID: On Sun, Jul 6, 2014 at 2:24 AM, Julian Taylor wrote: > On 05.07.2014 19:11, David Cournapeau wrote: > > On Sun, Jul 6, 2014 at 1:55 AM, Julian Taylor > > > > > wrote: > > > > On 05.07.2014 18:40, David Cournapeau wrote: > > > The efforts are on average less demanding than this discussion. We > are > > > talking about adding entries to a list in most cases... > > > > > > Also, while adding the optimization support for bento, I've > > noticed that > > > a lot of the related distutils code is broken, and does not work as > > > expected on at least OS X + clang. > > > > It just spits out a lot of warnings but they are harmless. > > > > > > Adding lots of warnings are not harmless as they render the compiler > > warning system near useless (too many false alarms). > > > > true but until now we haven't received a single complaint nor fixes so > probably not many developers are actually using macs/clang to work on > numpy C code. > Not many people are working on numpy C code period :) FWIW, clang is now the standard OS X compiler since Maverick (Apple in all its wisdom made gcc an alias to clang...). David > But I do agree its bad and I have fixing that on my todo list, I didn't > get around to it yet. 
> > > I will fix the checks for both distutils and bento (using the autoconf > > macros setup, which should be more reliable than what we use for builtin > > and __attribute__-related checks) > > > > David > > > > > > We could remove them by using with -Werror=attribute for the > conftests > > if it really bothers someone. > > Or do you mean something else? > > > > > > > > David > > > > > > > > > On Sun, Jul 6, 2014 at 1:38 AM, Nathaniel Smith > > > > >> wrote: > > > > > > On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau > > > > > >> > wrote: > > > > > > > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith > > > > > >> wrote: > > > >> > > > >> Maybe bento will revive and take over the new python > > packaging world! > > > >> Maybe not. Maybe something else will. I don't see how our > > support for > > > >> it will really affect these outcomes in any way. And I > > especially > > > >> don't see why it's important to spend time *now* on keeping > > bento > > > >> working, just in case it becomes useful *later*. > > > > > > > > But it is working right now, so that argument is moot. > > > > > > My suggestion was that we should drop the rule that a patch > has to > > > keep bento working to be merged. We're talking about future > > breakages > > > and future effort. The fact that it's working now doesn't say > > anything > > > about whether it's worth continuing to invest time in it. > > > > > > -- > > > Nathaniel J. Smith > > > Postdoctoral researcher - Informatics - University of Edinburgh > > > http://vorpus.org > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > NumPy-Discussion at scipy.org>> > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Sat Jul 5 15:41:18 2014 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Sat, 5 Jul 2014 12:41:18 -0700 Subject: [Numpy-discussion] Teaching Scipy BoF at SciPy In-Reply-To: References: Message-ID: <5468590778974524027@unknownmsgid> On Jul 4, 2014, at 7:02 AM, Phil Elson wrote: Nice idea. Just a repository of courses would be a great first step. Yup -- or really even a curated page of links and refrrences. Maybe we can get first draft of such a thing put together during the BoF. Feel free to add this idea to the Wiki :-) I hope you can come to the BoF, I know there are a number of others at the same time. -CHB For example, I know Jake Vanderplas's course at https://github.com/jakevdp/2013_fall_ASTR599 is useful, and I have a few introduction (3hr) courses at https://github.com/SciTools/courses. 
On 3 July 2014 16:59, Chris Barker wrote: > HI Folks, > > I will be hosting a "Teaching the SciPy Stack" BoF at SciPy this year: > > https://conference.scipy.org/scipy2014/schedule/presentation/1762/ > > (Actually, I proposed it for the conference, but would be more than happy > to have other folks join me in facilitating, hosting, etc.) > > I've put up a Wiki page to collect ideas for topics. Please take a look > and add your $0.02: > > https://github.com/numpy/numpy/wiki/TeachingSciPy-BoF-at-Scipy-2014 > > See you there, > > -Chris > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Jul 5 18:42:41 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 6 Jul 2014 00:42:41 +0200 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 4:17 PM, Nathaniel Smith wrote: > On Sat, Jul 5, 2014 at 2:32 PM, Ralf Gommers > wrote: > > > > On Sat, Jul 5, 2014 at 1:54 PM, Nathaniel Smith wrote: > >> > >> On 5 Jul 2014 09:23, "Ralf Gommers" wrote: > >> Perhaps a compromise would be to declare it officially unsupported and > >> remove it from Travis CI, while leaving the files in place to be used > on an > >> at-your-own-risk basis? As long as it's in Travis, the default is that > >> anyone who breaks it has to fix it. If it's not in Travis, then the > default > >> is that the people (person?) who use bento are responsible for keeping > it > >> working for their needs. > > > > -1 that just means that simple changes like adding a new extension will > not > > get made before PRs get merged, and bento support will be in a broken > state > > much more often. > > Yes, and then the handful of people who care about this would fix it > or not. What next, we give Alan Isaac commit rights and then it's OK to break numpy.matrix when that's convenient? > Your -1 is attempting to veto other people's *not* paying > attention to this build system. I... don't think -1's work that way > :-( > You're proposing it'll be OK for others to break stuff that the people before them put quite some effort into implementing. I damn well have the right to give that a -1. David is fixing the few existing problems now, so there should be zero issues here. You're deliberately mischaracterizing the situation (pre-alpha, lot of effort, etc.), so I'm not going to bother responding to the rest, I'm annoyed enough as is. Ralf P.S. if anyone wants to spend some productive energy on the build situation, MSVC 2010 support for Python 3.x would be nice: https://github.com/numpy/numpy/issues/4245 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.isaac at gmail.com Sat Jul 5 19:13:25 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 05 Jul 2014 19:13:25 -0400 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: <53B88695.3020802@gmail.com> On 7/5/2014 6:42 PM, Ralf Gommers wrote: > What next, we give Alan Isaac commit rights and then it's OK to break numpy.matrix when that's convenient? I always wondered what I would do with commit rights ... Alan From ben.root at ou.edu Sat Jul 5 22:13:18 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 5 Jul 2014 22:13:18 -0400 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: <1404573876.3423.9.camel@sebastian-t440> References: <1404573876.3423.9.camel@sebastian-t440> Message-ID: Drats... I actually know those two topics... and I might have free time tomorrow afternoon at SciPy. Maybe I could take a peek at it? Ben On Sat, Jul 5, 2014 at 11:24 AM, Sebastian Berg wrote: > On Sa, 2014-07-05 at 00:41 +0100, Nathaniel Smith wrote: > > On 5 Jul 2014 00:07, "Charles R Harris" > > > > > > That is a massive pile of affected projects :-(. > > > > My worry is that if all these projects we know about are broken, then > > how many other codebases that we aren't testing are also broken? > > > > Yeah, I would imagine quite a few might be. It isn't that I guess many > used the "feature" deliberately, but it is easy to just code it and > assume that the code is correct since it works. So I think I will just > need to fix it. The pull request *should* already do this with a band > aid-solution, by just falling back to the old funky stuff if there is a > failure. If someone is good with python exception handling and string > formatting in C, please feel free to have a look ;). > > - Sebastian > > > > If the issues are fixed in matplotlib and pandas I'd be inclined to > > release as is with a mention of versions in the release notes. > > > > Even if it's fixed in pandas master, how long until it's in user's > > hands? > > > > -n > > > > > Chuck > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bryanv at continuum.io Sun Jul 6 01:12:08 2014 From: bryanv at continuum.io (Bryan Van de Ven) Date: Sun, 6 Jul 2014 01:12:08 -0400 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: <439ACFB7-AE6F-40CA-8483-BC837BE10520@continuum.io> Speaking as someone who started but then stopped dabbling in the NumPy C core, having to think about two build system is a huge turn-off. Getting into the NumPy C code is hard enough without having to worry about multiple build systems. 
Bryan On Jul 5, 2014, at 6:42 PM, Ralf Gommers wrote: > > > > On Sat, Jul 5, 2014 at 4:17 PM, Nathaniel Smith wrote: > On Sat, Jul 5, 2014 at 2:32 PM, Ralf Gommers wrote: > > > > On Sat, Jul 5, 2014 at 1:54 PM, Nathaniel Smith wrote: > >> > >> On 5 Jul 2014 09:23, "Ralf Gommers" wrote: > >> Perhaps a compromise would be to declare it officially unsupported and > >> remove it from Travis CI, while leaving the files in place to be used on an > >> at-your-own-risk basis? As long as it's in Travis, the default is that > >> anyone who breaks it has to fix it. If it's not in Travis, then the default > >> is that the people (person?) who use bento are responsible for keeping it > >> working for their needs. > > > > -1 that just means that simple changes like adding a new extension will not > > get made before PRs get merged, and bento support will be in a broken state > > much more often. > > Yes, and then the handful of people who care about this would fix it > or not. > > What next, we give Alan Isaac commit rights and then it's OK to break numpy.matrix when that's convenient? > > Your -1 is attempting to veto other people's *not* paying > attention to this build system. I... don't think -1's work that way > :-( > > You're proposing it'll be OK for others to break stuff that the people before them put quite some effort into implementing. I damn well have the right to give that a -1. > > David is fixing the few existing problems now, so there should be zero issues here. You're deliberately mischaracterizing the situation (pre-alpha, lot of effort, etc.), so I'm not going to bother responding to the rest, I'm annoyed enough as is. > > Ralf > > P.S. if anyone wants to spend some productive energy on the build situation, MSVC 2010 support for Python 3.x would be nice: https://github.com/numpy/numpy/issues/4245 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Sun Jul 6 02:30:34 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 06 Jul 2014 08:30:34 +0200 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: <1404573876.3423.9.camel@sebastian-t440> Message-ID: <1404628234.12836.1.camel@sebastian-t440> On Sa, 2014-07-05 at 22:13 -0400, Benjamin Root wrote: > Drats... I actually know those two topics... and I might have free > time tomorrow afternoon at SciPy. Maybe I could take a peek at it? > Maybe if you have time. It is just the attempt_1d_fallback function in the pull request https://github.com/numpy/numpy/pull/4804 This is called only after the normal indexing code gave an exception already and maybe we can make the warnings more informative. - Sebastian > > Ben > > > > On Sat, Jul 5, 2014 at 11:24 AM, Sebastian Berg > wrote: > On Sa, 2014-07-05 at 00:41 +0100, Nathaniel Smith wrote: > > On 5 Jul 2014 00:07, "Charles R Harris" > > > > > > > > That is a massive pile of affected projects :-(. > > > > My worry is that if all these projects we know about are > broken, then > > how many other codebases that we aren't testing are also > broken? > > > > > Yeah, I would imagine quite a few might be. It isn't that I > guess many > used the "feature" deliberately, but it is easy to just code > it and > assume that the code is correct since it works. So I think I > will just > need to fix it. 
The pull request *should* already do this with > a band > aid-solution, by just falling back to the old funky stuff if > there is a > failure. If someone is good with python exception handling and > string > formatting in C, please feel free to have a look ;). > > - Sebastian > > > > If the issues are fixed in matplotlib and pandas I'd be > inclined to > > release as is with a mention of versions in the release > notes. > > > > Even if it's fixed in pandas master, how long until it's in > user's > > hands? > > > > -n > > > > > Chuck > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sturla.molden at gmail.com Sun Jul 6 03:52:53 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 6 Jul 2014 07:52:53 +0000 (UTC) Subject: [Numpy-discussion] About the npz format References: <20140704134954.GB31861@kudu.in-berlin.de> Message-ID: <940174683426325795.593859sturla.molden-gmail.com@news.gmane.org> There is no os.mkfifo on Windows. Sturla Valentin Haenel wrote: > sorry, for the top-post, but should we add this as an issue on the > github tracker? I'd like to revisit it this summer. > > V- > > * Julian Taylor [2014-04-18]: >> On 18.04.2014 18:29, Valentin Haenel wrote: >>> Hi, >>> >>> * Valentin Haenel [2014-04-17]: >>>> * Valentin Haenel [2014-04-17]: >>>>> * Julian Taylor [2014-04-17]: >>>>>> On 17.04.2014 21:30, onefire wrote: >>>>>>> Thanks for the suggestion. I did profile the program before, just not >>>>>>> using Python. >>>>>> >>>>>> one problem of npz is that the zipfile module does not support streaming >>>>>> data in (or if it does now we aren't using it). >>>>>> So numpy writes the file uncompressed to disk and then zips it which is >>>>>> horrible for performance and disk usage. >>>>> >>>>> As a workaround may also be possible to write the temporary NPY files to >>>>> cStringIO instances and then use ``ZipFile.writestr`` with the >>>>> ``getvalue()`` of the cStringIO object. However that approach may >>>>> require some memory. In python 2.7, for each array: one copy inside the >>>>> cStringIO instance and then another copy of when calling getvalue on the >>>>> cString, I believe. >>>> >>>> There is a proof-of-concept implementation here: >>>> >>>> https://github.com/esc/numpy/compare/feature;npz_no_temp_file >>> >>> Anybody interested in me fixing this up (unit tests, API, etc..) for >>> inclusion? >>> >> >> I wonder if it would be better to instead use a fifo to avoid the memory >> doubling. Windows probably hasn't got them (exposed via python) but one >> can slap a platform check in front. 
>> attached a proof of concept without proper error handling (which is >> unfortunately the tricky part) > >>> From 472b4c0a44804b65d0774147010ec7a931a1c52d Mon Sep 17 00:00:00 2001 >> From: Julian Taylor >> Date: Thu, 17 Apr 2014 23:01:47 +0200 >> Subject: [PATCH] use a pipe for savez >> >> --- >> numpy/lib/npyio.py | 25 +++++++++++-------------- >> 1 file changed, 11 insertions(+), 14 deletions(-) >> >> diff --git a/numpy/lib/npyio.py b/numpy/lib/npyio.py >> index 98b4b6e..baafa9d 100644 >> --- a/numpy/lib/npyio.py >> +++ b/numpy/lib/npyio.py >> @@ -585,22 +585,19 @@ def _savez(file, args, kwds, compress): >> zipf = zipfile_factory(file, mode="w", compression=compression) >> >> # Stage arrays in a temporary file on disk, before writing to zip. >> - fd, tmpfile = tempfile.mkstemp(suffix='-numpy.npy') >> - os.close(fd) >> - try: >> + import threading >> + with tempfile.TemporaryDirectory() as td: >> + fifoname = os.path.join(td, "fifo") >> + os.mkfifo(fifoname) >> for key, val in namedict.items(): >> fname = key + '.npy' >> - fid = open(tmpfile, 'wb') >> - try: >> - format.write_array(fid, np.asanyarray(val)) >> - fid.close() >> - fid = None >> - zipf.write(tmpfile, arcname=fname) >> - finally: >> - if fid: >> - fid.close() >> - finally: >> - os.remove(tmpfile) >> + def mywrite(pipe, val): >> + with open(pipe, "wb") as wpipe: >> + format.write_array(wpipe, np.asanyarray(val)) >> + t = threading.Thread(target=mywrite, args=(fifoname, val)) >> + t.start() >> + zipf.write(fifoname, arcname=fname) >> + t.join() >> >> zipf.close() >> >> -- >> 1.9.1 >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla.molden at gmail.com Sun Jul 6 04:35:55 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 6 Jul 2014 08:35:55 +0000 (UTC) Subject: [Numpy-discussion] [Python-ideas] PEP pre-draft: Support for indexing with keyword arguments References: <1404463173.2714.4.camel@sebastian-t440> Message-ID: <1324974989426327476.518333sturla.molden-gmail.com@news.gmane.org> Sebastian Berg wrote: >> Could it be useful for structured arrays? > > Not sure how. The named columns seem like a decent point to me. NumPy is naming the fields, not the axes, so it might be more useful for Pandas than NumPy. For example if we have an image with r,g,b data, NumPy would not name a 'color' axis with indexes 'r', 'g' and 'b'. But conceptually image[i:m, j:n, field='r'] could be faster than image[i:m, j:n]['r'] or image['r'][i:m, j:n], and perhaps also slightly more readable. I am note sure about nested dtypes in record arrays though... If the possibility of keyword indexing are supported in Python, there is nothing that prevents this Pandas like extension to NumPy arrays: image[i:m, j:n, color='r'] it would require an extension of the current dtype descriptors, in order to tell NumPy among which fields the keyword "color" would select, but it shouldn't be undoable. 
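For concreteness, a minimal sketch of the field-plus-slice access being discussed, using today's structured-array syntax; the dtype and field names here are invented for illustration, and the keyword form appears only as a comment because it is not implemented in NumPy:

import numpy as np

# toy "image" as a structured array with named color fields (illustrative only)
image = np.zeros((4, 4), dtype=[('r', 'f8'), ('g', 'f8'), ('b', 'f8')])
image['r'] = 1.0

# what works today: pick the field, then slice, or slice, then pick the field
red_block = image['r'][1:3, 0:2]
same_block = image[1:3, 0:2]['r']
assert np.array_equal(red_block, same_block)

# the proposal sketched above would allow something like
#     image[1:3, 0:2, color='r']
# which is not valid syntax for NumPy arrays today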
Sturla From cournape at gmail.com Sun Jul 6 05:02:08 2014 From: cournape at gmail.com (David Cournapeau) Date: Sun, 6 Jul 2014 18:02:08 +0900 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: <53B834E7.4060109@googlemail.com> References: <53B82DE8.7090905@googlemail.com> <53B834E7.4060109@googlemail.com> Message-ID: On Sun, Jul 6, 2014 at 2:24 AM, Julian Taylor wrote: > On 05.07.2014 19:11, David Cournapeau wrote: > > On Sun, Jul 6, 2014 at 1:55 AM, Julian Taylor > > > > > wrote: > > > > On 05.07.2014 18:40, David Cournapeau wrote: > > > The efforts are on average less demanding than this discussion. We > are > > > talking about adding entries to a list in most cases... > > > > > > Also, while adding the optimization support for bento, I've > > noticed that > > > a lot of the related distutils code is broken, and does not work as > > > expected on at least OS X + clang. > > > > It just spits out a lot of warnings but they are harmless. > > > > > > Adding lots of warnings are not harmless as they render the compiler > > warning system near useless (too many false alarms). > > > > true but until now we haven't received a single complaint nor fixes so > probably not many developers are actually using macs/clang to work on > numpy C code. > But I do agree its bad and I have fixing that on my todo list, I didn't > get around to it yet. > Here is an attempt: https://github.com/numpy/numpy/pull/4842 It uses a vile hack, but I did not see any other simple way. It fixes the warnings on osx, once travis-ci confirms the tests pass ok on linux, I will test it on msvc. David > > > I will fix the checks for both distutils and bento (using the autoconf > > macros setup, which should be more reliable than what we use for builtin > > and __attribute__-related checks) > > > > David > > > > > > We could remove them by using with -Werror=attribute for the > conftests > > if it really bothers someone. > > Or do you mean something else? > > > > > > > > David > > > > > > > > > On Sun, Jul 6, 2014 at 1:38 AM, Nathaniel Smith > > > > >> wrote: > > > > > > On Sat, Jul 5, 2014 at 3:21 PM, David Cournapeau > > > > > >> > wrote: > > > > > > > > On Sat, Jul 5, 2014 at 11:17 PM, Nathaniel Smith > > > > > >> wrote: > > > >> > > > >> Maybe bento will revive and take over the new python > > packaging world! > > > >> Maybe not. Maybe something else will. I don't see how our > > support for > > > >> it will really affect these outcomes in any way. And I > > especially > > > >> don't see why it's important to spend time *now* on keeping > > bento > > > >> working, just in case it becomes useful *later*. > > > > > > > > But it is working right now, so that argument is moot. > > > > > > My suggestion was that we should drop the rule that a patch > has to > > > keep bento working to be merged. We're talking about future > > breakages > > > and future effort. The fact that it's working now doesn't say > > anything > > > about whether it's worth continuing to invest time in it. > > > > > > -- > > > Nathaniel J. 
Smith > > > Postdoctoral researcher - Informatics - University of Edinburgh > > > http://vorpus.org > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > NumPy-Discussion at scipy.org>> > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sun Jul 6 13:54:27 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 6 Jul 2014 13:54:27 -0400 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: <1404628234.12836.1.camel@sebastian-t440> References: <1404573876.3423.9.camel@sebastian-t440> <1404628234.12836.1.camel@sebastian-t440> Message-ID: I see that a solution has already been found and merged. Are there any remaining issues for matplotlib to resolve? On Sun, Jul 6, 2014 at 2:30 AM, Sebastian Berg wrote: > On Sa, 2014-07-05 at 22:13 -0400, Benjamin Root wrote: > > Drats... I actually know those two topics... and I might have free > > time tomorrow afternoon at SciPy. Maybe I could take a peek at it? > > > > Maybe if you have time. It is just the attempt_1d_fallback function in > the pull request https://github.com/numpy/numpy/pull/4804 > This is called only after the normal indexing code gave an exception > already and maybe we can make the warnings more informative. > > - Sebastian > > > > > Ben > > > > > > > > On Sat, Jul 5, 2014 at 11:24 AM, Sebastian Berg > > wrote: > > On Sa, 2014-07-05 at 00:41 +0100, Nathaniel Smith wrote: > > > On 5 Jul 2014 00:07, "Charles R Harris" > > > > > > > > > > > > > > That is a massive pile of affected projects :-(. > > > > > > My worry is that if all these projects we know about are > > broken, then > > > how many other codebases that we aren't testing are also > > broken? > > > > > > > > > Yeah, I would imagine quite a few might be. It isn't that I > > guess many > > used the "feature" deliberately, but it is easy to just code > > it and > > assume that the code is correct since it works. So I think I > > will just > > need to fix it. The pull request *should* already do this with > > a band > > aid-solution, by just falling back to the old funky stuff if > > there is a > > failure. If someone is good with python exception handling and > > string > > formatting in C, please feel free to have a look ;). > > > > - Sebastian > > > > > > If the issues are fixed in matplotlib and pandas I'd be > > inclined to > > > release as is with a mention of versions in the release > > notes. > > > > > > Even if it's fixed in pandas master, how long until it's in > > user's > > > hands? 
> > > > > > -n > > > > > > > Chuck > > > > > > > > > > > > _______________________________________________ > > > > NumPy-Discussion mailing list > > > > NumPy-Discussion at scipy.org > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jul 6 14:07:11 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 Jul 2014 12:07:11 -0600 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: <1404573876.3423.9.camel@sebastian-t440> <1404628234.12836.1.camel@sebastian-t440> Message-ID: On Sun, Jul 6, 2014 at 11:54 AM, Benjamin Root wrote: > I see that a solution has already been found and merged. Are there any > remaining issues for matplotlib to resolve? > > You might take a look at the fixes in the matplotlib PR. They struck me as a bit hasty rather than fixes for the underlying problems, especially in the cubic interpolation case. The other mismatched assignment might be fixable with a `.flat` on the lhs rather than a reshape on the rhs. At least that was one suggested fix, I don't know if it works... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sun Jul 6 14:40:01 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 6 Jul 2014 14:40:01 -0400 Subject: [Numpy-discussion] Cython requirement? Message-ID: When did Cython become a build requirement? I remember discussing the use of Cython a while back, and IIRC the agreement was that both the cython code and the generated C files would be included in version control so that cython wouldn't be a build requirement, only a developer requirement when modifying those files. I just did a git clean -fxd and rebase to current master, and I am getting a message indicating that I need Cython 0.19 to build numpy (I haven't updated cython in ages on this particular machine). ben at tigger:~/Programs/numpy$ python setup.py install --user Running from numpy source directory. 
Cythonizing sources Processing numpy/random/mtrand/mtrand.pyx Traceback (most recent call last): File "/home/ben/Programs/numpy/tools/cythonize.py", line 199, in main() File "/home/ben/Programs/numpy/tools/cythonize.py", line 195, in main find_process_files(root_dir) File "/home/ben/Programs/numpy/tools/cythonize.py", line 187, in find_process_files process(cur_dir, fromfile, tofile, function, hash_db) File "/home/ben/Programs/numpy/tools/cythonize.py", line 161, in process processor_function(fromfile, tofile) File "/home/ben/Programs/numpy/tools/cythonize.py", line 59, in process_pyx raise Exception('Building %s requires Cython >= 0.19' % VENDOR) Exception: Building NumPy requires Cython >= 0.19 Traceback (most recent call last): File "setup.py", line 251, in setup_package() File "setup.py", line 239, in setup_package generate_cython() File "setup.py", line 191, in generate_cython raise RuntimeError("Running cythonize failed!") RuntimeError: Running cythonize failed! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Sun Jul 6 14:53:46 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sun, 6 Jul 2014 20:53:46 +0200 Subject: [Numpy-discussion] Cython requirement? In-Reply-To: References: Message-ID: On 6 July 2014 20:40, Benjamin Root wrote: > When did Cython become a build requirement? I remember discussing the use > of Cython a while back, and IIRC the agreement was that both the cython > code and the generated C files would be included in version control so that > cython wouldn't be a build requirement, only a developer requirement when > modifying those files. The policy was changed to not include them in VC, but to them in the releases, not to pollute the repository and avoid having C files not matching the pyx, IIRC. The change was fairly recent, I was only able to dig this email mentioning it: http://numpy-discussion.10968.n7.nabble.com/numpy-git-master-requiring-cython-for-build-td37250.html /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sun Jul 6 14:54:51 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 06 Jul 2014 20:54:51 +0200 Subject: [Numpy-discussion] Questions about fixes for 1.9.0rc2 In-Reply-To: References: <1404573876.3423.9.camel@sebastian-t440> <1404628234.12836.1.camel@sebastian-t440> Message-ID: <1404672891.14324.2.camel@sebastian-t440> On So, 2014-07-06 at 12:07 -0600, Charles R Harris wrote: > > > > On Sun, Jul 6, 2014 at 11:54 AM, Benjamin Root > wrote: > I see that a solution has already been found and merged. Are > there any remaining issues for matplotlib to resolve? > > > > > > You might take a look at the fixes in the matplotlib PR. They struck > me as a bit hasty rather than fixes for the underlying problems, > especially in the cubic interpolation case. The other mismatched > assignment might be fixable with a `.flat` on the lhs rather than a > reshape on the rhs. At least that was one suggested fix, I don't know > if it works... > Frankly, I wouldn't necessarily suggest using .flat assignments instead. `.flat` will basically enforce the old behavior, which is not necessarily better... - Sebastian > > > > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
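As an aside on the `.flat`-versus-reshape point above, here is a minimal sketch of the two spellings for assigning a same-size but differently-shaped value; the array names and shapes are invented for illustration, and this is not the matplotlib code under discussion:

import numpy as np

dest = np.zeros(6)
src = np.arange(6.0).reshape(2, 3)   # same number of elements, different shape

# reshape the right-hand side explicitly ...
dest[:] = src.reshape(dest.shape)

# ... or assign through .flat on the left-hand side, which fills dest in
# flat (C) order and so keeps the old, more permissive behaviour
dest.flat = src

# either way dest ends up as [ 0.  1.  2.  3.  4.  5.]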
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From robert.kern at gmail.com Sun Jul 6 15:00:24 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 6 Jul 2014 20:00:24 +0100 Subject: [Numpy-discussion] Cython requirement? In-Reply-To: References: Message-ID: On Sun, Jul 6, 2014 at 7:40 PM, Benjamin Root wrote: > When did Cython become a build requirement? I remember discussing the use of > Cython a while back, and IIRC the agreement was that both the cython code > and the generated C files would be included in version control so that > cython wouldn't be a build requirement, only a developer requirement when > modifying those files. It's a build requirement for building from the git checkout, but not the distributed source tarballs. The change was not too long ago, but it was discussed here. -- Robert Kern From ben.root at ou.edu Sun Jul 6 15:32:30 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 6 Jul 2014 15:32:30 -0400 Subject: [Numpy-discussion] indexed assignment testcases Message-ID: While trying to wrap my head around the issues with matplotlib's tri module and the new numpy indexing, I have made some test cases where I wonder if warnings should be issued. import numpy as np a = np.ones((10,)) all_false = np.zeros((10,), dtype=bool) a[all_false] = np.array([2.0]) # the shapes don't match here mask_in = np.array([False]*8 + [True, True]) a[mask_in] = np.array([]) # raises ValueError as expected a[mask_in] = np.array([[]]) # no exception because it is 2-D, for some reason (on master, but not release-0.9b1) a[mask_in] = np.array([2.0]) # This works and repeats 2.0 twice. I thought this wasn't supposed to happen anymore? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sun Jul 6 15:33:38 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 6 Jul 2014 15:33:38 -0400 Subject: [Numpy-discussion] Cython requirement? In-Reply-To: References: Message-ID: Ok, must have missed that discussion. I don't like the reasoning, but that boat has sailed. On Sun, Jul 6, 2014 at 3:00 PM, Robert Kern wrote: > On Sun, Jul 6, 2014 at 7:40 PM, Benjamin Root wrote: > > When did Cython become a build requirement? I remember discussing the > use of > > Cython a while back, and IIRC the agreement was that both the cython code > > and the generated C files would be included in version control so that > > cython wouldn't be a build requirement, only a developer requirement when > > modifying those files. > > It's a build requirement for building from the git checkout, but not > the distributed source tarballs. The change was not too long ago, but > it was discussed here. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sun Jul 6 15:58:36 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 06 Jul 2014 21:58:36 +0200 Subject: [Numpy-discussion] indexed assignment testcases In-Reply-To: References: Message-ID: <1404676716.14324.6.camel@sebastian-t440> On So, 2014-07-06 at 15:32 -0400, Benjamin Root wrote: > While trying to wrap my head around the issues with matplotlib's tri > module and the new numpy indexing, I have made some test cases where I > wonder if warnings should be issued. 
> > > import numpy as np > > a = np.ones((10,)) > > all_false = np.zeros((10,), dtype=bool) > > a[all_false] = np.array([2.0]) # the shapes don't match here > The shapes match using broadcasting. Values shape of (1,) can be broadcast to indexing result shape of (0,). > > mask_in = np.array([False]*8 + [True, True]) > > a[mask_in] = np.array([]) # raises ValueError as expected > > a[mask_in] = np.array([[]]) # no exception because it is 2-D, for > some reason (on master, but not release-0.9b1) > Gives a (maybe not good) deprecation warning in master. But those are typically invisible... > > a[mask_in] = np.array([2.0]) # This works and repeats 2.0 twice. I > thought this wasn't supposed to happen anymore? > Again, broadcasting of values onto out shape. - Sebastian > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Sun Jul 6 15:59:00 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 Jul 2014 13:59:00 -0600 Subject: [Numpy-discussion] indexed assignment testcases In-Reply-To: References: Message-ID: On Sun, Jul 6, 2014 at 1:32 PM, Benjamin Root wrote: > While trying to wrap my head around the issues with matplotlib's tri > module and the new numpy indexing, I have made some test cases where I > wonder if warnings should be issued. > > import numpy as np > a = np.ones((10,)) > all_false = np.zeros((10,), dtype=bool) > a[all_false] = np.array([2.0]) # the shapes don't match here > It broadcasts because the leading dimension is 1. > > mask_in = np.array([False]*8 + [True, True]) > a[mask_in] = np.array([]) # raises ValueError as expected > a[mask_in] = np.array([[]]) # no exception because it is 2-D, for some > reason (on master, but not release-0.9b1) > Now falls back to old behavior and raises a DeprecationWarning. You don't see that by default. > > a[mask_in] = np.array([2.0]) # This works and repeats 2.0 twice. I thought > this wasn't supposed to happen anymore? > Broadcasting again. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sun Jul 6 16:14:36 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 6 Jul 2014 16:14:36 -0400 Subject: [Numpy-discussion] indexed assignment testcases In-Reply-To: References: Message-ID: re: deprecation warnings... that's what I get when I am working on my non-dev box because I am at the conference, and have gotten too used to the setup of my dev box... as for the broadcasting issue, I can see it for the second case, but the first case still doesn't sit right with me. My understanding of broadcasting is to effectively *expand* an array to match the shape of another array (or some target shape). In this case, the array is being effectively *contracted* in shape. That makes zero sense to me. Ben On Sun, Jul 6, 2014 at 3:59 PM, Charles R Harris wrote: > > > > On Sun, Jul 6, 2014 at 1:32 PM, Benjamin Root wrote: > >> While trying to wrap my head around the issues with matplotlib's tri >> module and the new numpy indexing, I have made some test cases where I >> wonder if warnings should be issued. 
>> >> import numpy as np >> a = np.ones((10,)) >> all_false = np.zeros((10,), dtype=bool) >> a[all_false] = np.array([2.0]) # the shapes don't match here >> > > It broadcasts because the leading dimension is 1. > > >> >> mask_in = np.array([False]*8 + [True, True]) >> a[mask_in] = np.array([]) # raises ValueError as expected >> a[mask_in] = np.array([[]]) # no exception because it is 2-D, for some >> reason (on master, but not release-0.9b1) >> > > Now falls back to old behavior and raises a DeprecationWarning. You don't > see that by default. > > >> >> a[mask_in] = np.array([2.0]) # This works and repeats 2.0 twice. I >> thought this wasn't supposed to happen anymore? >> > > Broadcasting again. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From var.mail.daniel at gmail.com Sun Jul 6 16:35:39 2014 From: var.mail.daniel at gmail.com (Daniel da Silva) Date: Sun, 6 Jul 2014 16:35:39 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style Message-ID: The idea is that there be a short-hand for creating arrays as there is for matrices: np.mat('.2 .7 .1; .3 .5 .2; .1 .1 .9') It was suggested in GitHub issue #4817 in light that it would be beneficial to beginners and to presenters during demonstrations. In GitHub pull request #484 , I implemented this as the np.arr function. Does anyone have any feedback on the API details? Some examples from my implementation follow. >>> np.arr('3; 4; 5') array([[3], [4], [5]]) >>> np.arr('3; 4; 5', dtype=float) array([[ 3.], [ 4.], [ 5.]]) >>> np.arr('1 0 0; 0 1 0; 0 0 1') array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) >>> np.arr('4, 5; 6, 7') array([[4, 5], [6, 7]]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sun Jul 6 17:04:04 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 06 Jul 2014 23:04:04 +0200 Subject: [Numpy-discussion] indexed assignment testcases In-Reply-To: References: Message-ID: <1404680644.16951.2.camel@sebastian-t440> On So, 2014-07-06 at 16:14 -0400, Benjamin Root wrote: > re: deprecation warnings... that's what I get when I am working on my > non-dev box because I am at the conference, and have gotten too used > to the setup of my dev box... > > > as for the broadcasting issue, I can see it for the second case, but > the first case still doesn't sit right with me. My understanding of > broadcasting is to effectively *expand* an array to match the shape of > another array (or some target shape). In this case, the array is being > effectively *contracted* in shape. That makes zero sense to me. > Well, from a technical point of view, it is more like changing the shape to whatever fits while setting the stride to 0. I am sure there are a few places where the doc is not clear. From a practical point of view, it makes sense if you consider this: arr[arr < 0] = 0 Where it might be that the array has no elements smaller 0. Though I admit I would write 0 here, and not [0]. - Sebastian > > Ben > > > > On Sun, Jul 6, 2014 at 3:59 PM, Charles R Harris > wrote: > > > > On Sun, Jul 6, 2014 at 1:32 PM, Benjamin Root > wrote: > While trying to wrap my head around the issues with > matplotlib's tri module and the new numpy indexing, I > have made some test cases where I wonder if warnings > should be issued. 
> > > import numpy as np > > a = np.ones((10,)) > > all_false = np.zeros((10,), dtype=bool) > > a[all_false] = np.array([2.0]) # the shapes don't > match here > > > > It broadcasts because the leading dimension is 1. > > > > > mask_in = np.array([False]*8 + [True, True]) > > a[mask_in] = np.array([]) # raises ValueError as > expected > > a[mask_in] = np.array([[]]) # no exception because it > is 2-D, for some reason (on master, but not > release-0.9b1) > > > Now falls back to old behavior and raises a > DeprecationWarning. You don't see that by default. > > > > > a[mask_in] = np.array([2.0]) # This works and repeats > 2.0 twice. I thought this wasn't supposed to happen > anymore? > > > > Broadcasting again. > > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Sun Jul 6 17:43:10 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 6 Jul 2014 22:43:10 +0100 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: Message-ID: On Sun, Jul 6, 2014 at 9:35 PM, Daniel da Silva wrote: > The idea is that there be a short-hand for creating arrays as there is for > matrices: > > np.mat('.2 .7 .1; .3 .5 .2; .1 .1 .9') > > It was suggested in GitHub issue #4817 in light that it would be beneficial > to beginners and to presenters during demonstrations. In GitHub pull > request #484, I implemented this as the np.arr function. > > Does anyone have any feedback on the API details? Some examples from my > implementation follow. > >>>> np.arr('3; 4; 5') > array([[3], > [4], > [5]]) > >>>> np.arr('3; 4; 5', dtype=float) > array([[ 3.], > [ 4.], > [ 5.]]) > >>>> np.arr('1 0 0; 0 1 0; 0 0 1') > array([[1, 0, 0], > [0, 1, 0], > [0, 0, 1]]) > >>>> np.arr('4, 5; 6, 7') > array([[4, 5], > [6, 7]]) It occurs to me that np.mat always returns a 2d matrix, but for arrays there are more options. What should np.arr('1 2 3') return? a 1d array or a 2d row vector? (Maybe np.arr('1 2 3;') should give the row-vector?) Should there be some way to write 3d or higher-d arrays? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Sun Jul 6 17:48:13 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 6 Jul 2014 22:48:13 +0100 Subject: [Numpy-discussion] indexed assignment testcases In-Reply-To: References: Message-ID: On Sun, Jul 6, 2014 at 9:14 PM, Benjamin Root wrote: > as for the broadcasting issue, I can see it for the second case, but the > first case still doesn't sit right with me. My understanding of broadcasting > is to effectively *expand* an array to match the shape of another array (or > some target shape). In this case, the array is being effectively > *contracted* in shape. That makes zero sense to me. 
That's how it's always worked though, in all cases of broadcasting; nothing special about indexing: In [8]: a = np.zeros((3, 0)) In [9]: a + 1 Out[9]: array([], shape=(3, 0), dtype=float64) In [10]: a + [[1], [2], [3]] Out[10]: array([], shape=(3, 0), dtype=float64) IME it's extremely useful in practice for avoiding special cases when some axis has a vary size that can be zero. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ben.root at ou.edu Sun Jul 6 17:57:37 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 6 Jul 2014 17:57:37 -0400 Subject: [Numpy-discussion] indexed assignment testcases In-Reply-To: References: Message-ID: I guess I always treated scalars as something special when it comes to broadcasting. Seeing these examples, I can see how my grokking of broadcasting was incomplete. I still think that the assignment of an array of values (as opposed to a scalar) to nothing could potentially mask deeper issues, but now I see that it may be impossible to distinguish from the perfectly normal case. Cheers! Ben Root On Sun, Jul 6, 2014 at 5:48 PM, Nathaniel Smith wrote: > On Sun, Jul 6, 2014 at 9:14 PM, Benjamin Root wrote: > > as for the broadcasting issue, I can see it for the second case, but the > > first case still doesn't sit right with me. My understanding of > broadcasting > > is to effectively *expand* an array to match the shape of another array > (or > > some target shape). In this case, the array is being effectively > > *contracted* in shape. That makes zero sense to me. > > That's how it's always worked though, in all cases of broadcasting; > nothing special about indexing: > > In [8]: a = np.zeros((3, 0)) > > In [9]: a + 1 > Out[9]: array([], shape=(3, 0), dtype=float64) > > In [10]: a + [[1], [2], [3]] > Out[10]: array([], shape=(3, 0), dtype=float64) > > IME it's extremely useful in practice for avoiding special cases when > some axis has a vary size that can be zero. > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Jul 6 18:06:25 2014 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 06 Jul 2014 12:06:25 -1000 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: Message-ID: <53B9C861.3090809@hawaii.edu> On 2014/07/06, 11:43 AM, Nathaniel Smith wrote: > On Sun, Jul 6, 2014 at 9:35 PM, Daniel da Silva > wrote: >> The idea is that there be a short-hand for creating arrays as there is for >> matrices: >> >> np.mat('.2 .7 .1; .3 .5 .2; .1 .1 .9') >> >> It was suggested in GitHub issue #4817 in light that it would be beneficial >> to beginners and to presenters during demonstrations. In GitHub pull >> request #484, I implemented this as the np.arr function. >> >> Does anyone have any feedback on the API details? Some examples from my >> implementation follow. 
>> >>>>> np.arr('3; 4; 5') >> array([[3], >> [4], >> [5]]) >> >>>>> np.arr('3; 4; 5', dtype=float) >> array([[ 3.], >> [ 4.], >> [ 5.]]) >> >>>>> np.arr('1 0 0; 0 1 0; 0 0 1') >> array([[1, 0, 0], >> [0, 1, 0], >> [0, 0, 1]]) >> >>>>> np.arr('4, 5; 6, 7') >> array([[4, 5], >> [6, 7]]) > > It occurs to me that np.mat always returns a 2d matrix, but for arrays > there are more options. > > What should np.arr('1 2 3') return? a 1d array or a 2d row vector? I would say 1d array. This is numpy, not numpy.matrix. > (Maybe np.arr('1 2 3;') should give the row-vector?) Yes, it is reasonable that a semicolon should trigger 2d. > > Should there be some way to write 3d or higher-d arrays? No, there should not. This is for quick demos and that sort of thing. It is not a substitute for np.array(). (I'm not entirely convinced np.arr() is a good idea at all; but if it is, it must be kept simple.) A possible downside for beginners is that this might delay their understanding that the commas are needed for np.array([1, 2, 3]). Eric > > -n > From ted.sandler at gmail.com Sun Jul 6 18:47:47 2014 From: ted.sandler at gmail.com (Ted Sandler) Date: Sun, 6 Jul 2014 15:47:47 -0700 Subject: [Numpy-discussion] parsing dtype descriptors In-Reply-To: References: <20140703193506.GA25653@kudu.in-berlin.de> Message-ID: Thanks! On Fri, Jul 4, 2014 at 1:53 AM, Robert Kern wrote: > On Thu, Jul 3, 2014 at 10:53 PM, Ted Sandler > wrote: > > Thanks. No, it's not what I'm looking for. > > > > I'm looking for the code that parses the string " array > > header's descriptor: > > > > {'descr': ' > > > There are many different descriptor strings, e.g.: > > > > '>f8' > > '=f4' > > 'float32' > > '>c16' > > ... > > > > Ideally, I want the exhaustive list of valid input strings that describe > > standard ndarrays (i.e. ndarrays with simple entries as opposed to > records > > or subarrays). Lacking an exhaustive list or spec, I'd like the source > code > > that does the parsing for them. > > > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/descriptor.c#L1321 > > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/conversion_utils.c#L1000 > > https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/ndarraytypes.h#L97 > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Sun Jul 6 22:27:21 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Sun, 6 Jul 2014 22:27:21 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <53B9C861.3090809@hawaii.edu> References: <53B9C861.3090809@hawaii.edu> Message-ID: On Sun, Jul 6, 2014 at 6:06 PM, Eric Firing wrote: > (I'm not entirely convinced > np.arr() is a good idea at all; but if it is, it must be kept simple.) > If you are going to introduce this functionality, please don't call it np.arr. Right now, np.a presents you with a whopping 53 completion choices. Adding "r", narrows that to 21, but np.arr completes to np.array right away. Please don't introduce another bump in this road. "Namespaces are one honking great idea -- let's do more of those!" I would suggest calling it something like np.array_simple or np.array_from_string, but the best choice IMO, would be np.ndarray.from_string (a static constructor method). 
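For the sake of discussion, a constructor along those lines can be sketched on top of plain numpy in a handful of lines (nothing like this exists yet; the name and the parsing rules below are purely illustrative):

import numpy as np

def from_string(s, dtype=float):
    # rows are separated by ';', elements by whitespace and/or commas
    rows = [r for r in s.split(';') if r.strip()]
    data = [[dtype(x) for x in r.replace(',', ' ').split()] for r in rows]
    if ';' not in s:
        # no ';' at all -> plain 1-d array
        return np.array(data[0], dtype=dtype)
    return np.array(data, dtype=dtype)

>>> from_string('1 2 3')
array([ 1.,  2.,  3.])
>>> from_string('1 2; 3 4')
array([[ 1.,  2.],
       [ 3.,  4.]])

With that convention a trailing ';' (as in '1 2 3;') forces a 2-d result, which is exactly the kind of API detail that still needs to be settled.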
-------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Jul 6 22:59:45 2014 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 06 Jul 2014 16:59:45 -1000 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> Message-ID: <53BA0D21.5050508@hawaii.edu> On 2014/07/06, 4:27 PM, Alexander Belopolsky wrote: > > On Sun, Jul 6, 2014 at 6:06 PM, Eric Firing > wrote: > > (I'm not entirely convinced > np.arr() is a good idea at all; but if it is, it must be kept simple.) > > > If you are going to introduce this functionality, please don't call it > np.arr. > > Right now, np.a presents you with a whopping 53 completion choices. > Adding "r", narrows that to 21, but np.arr completes to np.array > right away. Please don't introduce another bump in this road. > > "Namespaces are one honking great idea -- let's do more of those!" > > I would suggest calling it something like np.array_simple or > np.array_from_string, but the best choice IMO, would be > np.ndarray.from_string (a static constructor method). I think the problem is that this defeats the point: minimizing typing when doing an off-the-cuff demo or test. I don't know that this use case justifies the clutter, regardless of what it is called; but evidently there is some demand for it. Eric > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ndarray at mac.com Mon Jul 7 00:29:33 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Mon, 7 Jul 2014 00:29:33 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <53BA0D21.5050508@hawaii.edu> References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> Message-ID: On Sun, Jul 6, 2014 at 10:59 PM, Eric Firing wrote: > > I would suggest calling it something like np.array_simple or > > np.array_from_string, but the best choice IMO, would be > > np.ndarray.from_string (a static constructor method). > > > I think the problem is that this defeats the point: minimizing typing > when doing an off-the-cuff demo or test. You can always put np.arr = np.ndarray.from_string or even arr = np.ndarray.from_string right next to the line where you define np. (Which makes me wonder if something like this belongs to ipython magic.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jul 7 01:53:22 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 6 Jul 2014 23:53:22 -0600 Subject: [Numpy-discussion] 1.10-devel is open Message-ID: Just so. The fixes for 1.9.0b1 are now in that branch ready for the next beta. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.M.Hoekstra at tudelft.nl Mon Jul 7 02:48:52 2014 From: J.M.Hoekstra at tudelft.nl (Jacco Hoekstra - LR) Date: Mon, 7 Jul 2014 06:48:52 +0000 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> Message-ID: <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> How about using the old name np.mat() for this type of array creation? So the: A = np.mat(?1 2;3 4?) creates a two dimensional array. But then resulting in an array A instead of the matrix type? It might at least provide some partial downward compatibility. 
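In the meantime, one way to get that behaviour with what is already there is to wrap the existing parser (the helper name below is only for illustration):

import numpy as np

def mat_as_array(s):
    # np.mat already understands the '1 2; 3 4' string syntax;
    # np.asarray then drops the matrix subclass and returns a plain ndarray
    return np.asarray(np.mat(s))

>>> mat_as_array('1 2; 3 4')
array([[1, 2],
       [3, 4]])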
Best regards, Jacco Hoekstra From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Alexander Belopolsky Sent: maandag 7 juli 2014 6:30 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style On Sun, Jul 6, 2014 at 10:59 PM, Eric Firing > wrote: > I would suggest calling it something like np.array_simple or > np.array_from_string, but the best choice IMO, would be > np.ndarray.from_string (a static constructor method). I think the problem is that this defeats the point: minimizing typing when doing an off-the-cuff demo or test. You can always put np.arr = np.ndarray.from_string or even arr = np.ndarray.from_string right next to the line where you define np. (Which makes me wonder if something like this belongs to ipython magic.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Mon Jul 7 04:02:13 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Mon, 07 Jul 2014 10:02:13 +0200 Subject: [Numpy-discussion] 1.10-devel is open In-Reply-To: References: Message-ID: <53BA5405.5000508@googlemail.com> On 07.07.2014 07:53, Charles R Harris wrote: > Just so. The fixes for 1.9.0b1 are now in that branch ready for the next > beta. > how did you do that without a merge commit? however you did it you have git has lost ancestry which is not so nice for backporting. If there are no objections I'd like to rewind the maintenance branch back to beta1 and merge master in properly. From jtaylor.debian at googlemail.com Mon Jul 7 04:33:10 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Mon, 07 Jul 2014 10:33:10 +0200 Subject: [Numpy-discussion] 1.10-devel is open In-Reply-To: <53BA5405.5000508@googlemail.com> References: <53BA5405.5000508@googlemail.com> Message-ID: <53BA5B46.4010306@googlemail.com> On 07.07.2014 10:02, Julian Taylor wrote: > On 07.07.2014 07:53, Charles R Harris wrote: >> Just so. The fixes for 1.9.0b1 are now in that branch ready for the next >> beta. >> > > how did you do that without a merge commit? > however you did it you have git has lost ancestry which is not so nice > for backporting. > If there are no objections I'd like to rewind the maintenance branch > back to beta1 and merge master in properly. > I went ahead with the and rewind + merge [0], please reset your branches to the origin in case you updated the maintenance/1.9.x branch in last few hours and get merge errors when running git pull. [0] https://github.com/numpy/numpy/pull/4849 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From olivier.grisel at ensta.org Mon Jul 7 05:18:42 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 7 Jul 2014 11:18:42 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi! I gave appveyor a try this WE so as to build a minimalistic Python 3 project with a Cython extension. It works both with 32 and 64 bit MSVC++ and can generate wheel packages. See: https://github.com/ogrisel/python-appveyor-demo However 2008 is not (yet) installed so it cannot be used for Python 2.7. The Feodor Fitsner seems to be open to install older versions of MSVC++ on the worker VM image so this might be possible in the future. Let's see. 
Off-course for numpy / scipy this does not solve the fortran compiler issue, so Carl's static mingw-w64 toolchain still looks like a very promising solution (and could probably be run on the appveyor infra as well). Best, -- Olivier From davidmenhur at gmail.com Mon Jul 7 07:17:16 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Mon, 7 Jul 2014 13:17:16 +0200 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> Message-ID: On 7 July 2014 08:48, Jacco Hoekstra - LR wrote: > How about using the old name np.mat() for this type of array creation? How about a new one? np.matarray, for MATLAB array. /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Mon Jul 7 08:25:25 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 07 Jul 2014 08:25:25 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> Message-ID: <53BA91B5.6010604@gmail.com> On 7/7/2014 7:17 AM, Da?id wrote: > How about a new one? np.matarray, for MATLAB array. How about `str2arr` or even `build`, since teaching appears to be a focus. Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become 2d. It seems unambiguous to allow '1 2 3;;' to be 3d, or even '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting that, not urging that it be implemented. Alan Isaac From charlesr.harris at gmail.com Mon Jul 7 08:34:18 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 Jul 2014 06:34:18 -0600 Subject: [Numpy-discussion] 1.10-devel is open In-Reply-To: <53BA5405.5000508@googlemail.com> References: <53BA5405.5000508@googlemail.com> Message-ID: On Mon, Jul 7, 2014 at 2:02 AM, Julian Taylor wrote: > On 07.07.2014 07:53, Charles R Harris wrote: > > Just so. The fixes for 1.9.0b1 are now in that branch ready for the next > > beta. > > > > how did you do that without a merge commit? > git branch tmp maintenance/1.9.x git co tmp git branch -f maintenance/1.9.x d244ec7 git rebase -p --onto tmp 10098da maintenance/1.9.x > however you did it you have git has lost ancestry which is not so nice > for backporting. > Same changesets, I believe. If '-p' is omitted the merges are omitted. > If there are no objections I'd like to rewind the maintenance branch > back to beta1 and merge master in properly. > I thought this somewhat cleaner than a merge :0 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Mon Jul 7 08:46:13 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Mon, 7 Jul 2014 14:46:13 +0200 Subject: [Numpy-discussion] 1.10-devel is open In-Reply-To: References: <53BA5405.5000508@googlemail.com> Message-ID: On Mon, Jul 7, 2014 at 2:34 PM, Charles R Harris wrote: > On Mon, Jul 7, 2014 at 2:02 AM, Julian Taylor > wrote: >> >> On 07.07.2014 07:53, Charles R Harris wrote: >> > Just so. The fixes for 1.9.0b1 are now in that branch ready for the next >> > beta. >> > >> >> how did you do that without a merge commit? 
> > > git branch tmp maintenance/1.9.x > git co tmp > git branch -f maintenance/1.9.x d244ec7 > git rebase -p --onto tmp 10098da maintenance/1.9.x > >> >> however you did it you have git has lost ancestry which is not so nice >> for backporting. > > > Same changesets, I believe. If '-p' is omitted the merges are omitted. > >> >> If there are no objections I'd like to rewind the maintenance branch >> back to beta1 and merge master in properly. > > > I thought this somewhat cleaner than a merge :0 > By rebasing or cherry-picking git loses the information that the changeset originates from another branch. So when you try to merge or cherrypick more changes from the branch the changes are coming from the automerging bails or is at least less useful. So if you are moving changes from one branch to another one should merge whenever possible. Now that both branches have diverged, 1.9 by the release commit, and 1.10 by the opening commit, there is no easy way for git to track the origins of a changeset and we have to do the usual cherry picking, as to my knowledge git does not have partial merges. From sebastian at sipsolutions.net Mon Jul 7 09:11:47 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 07 Jul 2014 15:11:47 +0200 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <53BA91B5.6010604@gmail.com> References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> <53BA91B5.6010604@gmail.com> Message-ID: <1404738707.25854.13.camel@sebastian-t440> On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote: > On 7/7/2014 7:17 AM, Da?id wrote: > > How about a new one? np.matarray, for MATLAB array. > > > How about `str2arr` or even `build`, since teaching appears to be a focus. > Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become 2d. > It seems unambiguous to allow '1 2 3;;' to be 3d, or even > '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting > that, not urging that it be implemented. > Probably overdoing it, but if we plan on more then just this, what about banning such functions to something like numpy.interactive/numpy.helpers which you can then import * (or better specific functions) from? I think the fact that you need many imports on startup should rather be fixed by an ipython scientific mode or other startup imports. - Sebastian > Alan Isaac > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Mon Jul 7 09:12:50 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 7 Jul 2014 07:12:50 -0600 Subject: [Numpy-discussion] 1.10-devel is open In-Reply-To: References: <53BA5405.5000508@googlemail.com> Message-ID: On Mon, Jul 7, 2014 at 6:46 AM, Julian Taylor wrote: > On Mon, Jul 7, 2014 at 2:34 PM, Charles R Harris > wrote: > > On Mon, Jul 7, 2014 at 2:02 AM, Julian Taylor > > wrote: > >> > >> On 07.07.2014 07:53, Charles R Harris wrote: > >> > Just so. The fixes for 1.9.0b1 are now in that branch ready for the > next > >> > beta. > >> > > >> > >> how did you do that without a merge commit? 
> > > > > > git branch tmp maintenance/1.9.x > > git co tmp > > git branch -f maintenance/1.9.x d244ec7 > > git rebase -p --onto tmp 10098da maintenance/1.9.x > > > >> > >> however you did it you have git has lost ancestry which is not so nice > >> for backporting. > > > > > > Same changesets, I believe. If '-p' is omitted the merges are omitted. > > > >> > >> If there are no objections I'd like to rewind the maintenance branch > >> back to beta1 and merge master in properly. > > > > > > I thought this somewhat cleaner than a merge :0 > > > > By rebasing or cherry-picking git loses the information that the > changeset originates from another branch. > So when you try to merge or cherrypick more changes from the branch > the changes are coming from the automerging bails or is at least less > useful. > So if you are moving changes from one branch to another one should > merge whenever possible. > > Now that both branches have diverged, 1.9 by the release commit, and > 1.10 by the opening commit, there is no easy way for git to track the > origins of a changeset and we have to do the usual cherry picking, as > to my knowledge git does not have partial merges. > Yes, what I did was like one big cherry-pick. But I think we end up in the same place with two divergent branches. I think git history is just a string of changesets and each changeset has a hash. Same hash, same changeset, and I think that was preserved, so in that sense history was preserved. The 1.9.x branch pushed without trouble. Anyway, six of one, half dozen of the other. I was going to do the merge route originally, even did the merge. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jul 7 09:25:58 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 7 Jul 2014 14:25:58 +0100 Subject: [Numpy-discussion] 1.10-devel is open In-Reply-To: References: <53BA5405.5000508@googlemail.com> Message-ID: On 7 Jul 2014 14:12, "Charles R Harris" wrote:. > > Yes, what I did was like one big cherry-pick. But I think we end up in the same place with two divergent branches. I think git history is just a string of changesets and each changeset has a hash. Same hash, same changeset, and I think that was preserved, so in that sense history was preserved. No, git history hashes are effectively a hash of . So when you rebase, you keep the same changes but move them to be based on a different base revision. This means that the rebased changes get new hashes, and git has no idea that the changes are related. If you merge, then git marks the original changes as being parents of the merge node, so it can answer questions like "what changes have been applied to maintenance/1.9.x since it branched from master?", and can do better cherrypicks because it can use a more recent common ancestor. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Mon Jul 7 09:30:59 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Mon, 7 Jul 2014 15:30:59 +0200 Subject: [Numpy-discussion] 1.10-devel is open In-Reply-To: References: <53BA5405.5000508@googlemail.com> Message-ID: On Mon, Jul 7, 2014 at 3:12 PM, Charles R Harris wrote: > On Mon, Jul 7, 2014 at 6:46 AM, Julian Taylor > wrote: >> >> On Mon, Jul 7, 2014 at 2:34 PM, Charles R Harris >> wrote: >> > On Mon, Jul 7, 2014 at 2:02 AM, Julian Taylor >> > wrote: >> >> >> >> On 07.07.2014 07:53, Charles R Harris wrote: >> >> > Just so. 
The fixes for 1.9.0b1 are now in that branch ready for the >> >> > next >> >> > beta. >> >> > >> >> >> >> how did you do that without a merge commit? >> > >> > >> > git branch tmp maintenance/1.9.x >> > git co tmp >> > git branch -f maintenance/1.9.x d244ec7 >> > git rebase -p --onto tmp 10098da maintenance/1.9.x >> > >> >> >> >> however you did it you have git has lost ancestry which is not so nice >> >> for backporting. >> > >> > >> > Same changesets, I believe. If '-p' is omitted the merges are omitted. >> > >> >> >> >> If there are no objections I'd like to rewind the maintenance branch >> >> back to beta1 and merge master in properly. >> > >> > >> > I thought this somewhat cleaner than a merge :0 >> > >> >> By rebasing or cherry-picking git loses the information that the >> changeset originates from another branch. >> So when you try to merge or cherrypick more changes from the branch >> the changes are coming from the automerging bails or is at least less >> useful. >> So if you are moving changes from one branch to another one should >> merge whenever possible. >> >> Now that both branches have diverged, 1.9 by the release commit, and >> 1.10 by the opening commit, there is no easy way for git to track the >> origins of a changeset and we have to do the usual cherry picking, as >> to my knowledge git does not have partial merges. > > > Yes, what I did was like one big cherry-pick. But I think we end up in the > same place with two divergent branches. I think git history is just a string > of changesets and each changeset has a hash. Same hash, same changeset, and > I think that was preserved, so in that sense history was preserved. The > 1.9.x branch pushed without trouble. Anyway, six of one, half dozen of the > other. I was going to do the merge route originally, even did the merge. > the rebase does not preserve hashes, it rewrites the commits (minimal change is changing the commiter). Your approach brings us to this state: R the maintenance release commit, D the master 1.10 opening commit A - > B -> C -> D < master \ R -> B' -> C' < maintenance whereas a merge is this: A -> B -> C -> D < master \ merge \ R -------- \ M < maintenance the difference is when you now want to merge D into the maintenance branch. In the first case git tries to merge the D changeset into the branch, it tracks down the anchestry of D and C' figures This leads to the merge base A, and git needs to merge B, C, D, R, B', C' now merging B and B' and C and C' will conflict as they change the same lines (in an ideal world git would realize that the diffs are actually equal, but it does not do that in my experience) and asks the user for help. now in the merge case its different. You want to move D into the branch it tracks down the ancestry of D and R This leads to the merge base A, and both branches have the same commit B and C. So now it only needs to merge D and R (leading to M) which will be automatic if they do not conflict. 
From josef.pktd at gmail.com Mon Jul 7 09:50:35 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 7 Jul 2014 09:50:35 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <1404738707.25854.13.camel@sebastian-t440> References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> <53BA91B5.6010604@gmail.com> <1404738707.25854.13.camel@sebastian-t440> Message-ID: On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg wrote: > On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote: > > On 7/7/2014 7:17 AM, Da?id wrote: > > > How about a new one? np.matarray, for MATLAB array. > > > > > > How about `str2arr` or even `build`, since teaching appears to be a > focus. > > Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become 2d. > > It seems unambiguous to allow '1 2 3;;' to be 3d, or even > > '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting > > that, not urging that it be implemented. > > > > Probably overdoing it, but if we plan on more then just this, what about > banning such functions to something like numpy.interactive/numpy.helpers > which you can then import * (or better specific functions) from? > > I think the fact that you need many imports on startup should rather be > fixed by an ipython scientific mode or other startup imports. > Is this whole thing really worth it? We get back to a numpy pylab. First users learn the dirty shortcuts, and then they have to learn how to do it "properly". (I'm using quite often string split and reshape for copy-pasted text tables.) Josef > > - Sebastian > > > Alan Isaac > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jul 7 10:28:09 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 07 Jul 2014 16:28:09 +0200 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> <53BA91B5.6010604@gmail.com> <1404738707.25854.13.camel@sebastian-t440> Message-ID: <1404743289.25854.23.camel@sebastian-t440> On Mo, 2014-07-07 at 09:50 -0400, josef.pktd at gmail.com wrote: > > > > On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg > wrote: > On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote: > > On 7/7/2014 7:17 AM, Da?id wrote: > > > How about a new one? np.matarray, for MATLAB array. > > > > > > How about `str2arr` or even `build`, since teaching appears > to be a focus. > > Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become > 2d. > > It seems unambiguous to allow '1 2 3;;' to be 3d, or even > > '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting > > that, not urging that it be implemented. > > > > > Probably overdoing it, but if we plan on more then just this, > what about > banning such functions to something like > numpy.interactive/numpy.helpers > which you can then import * (or better specific functions) > from? 
> > I think the fact that you need many imports on startup should > rather be > fixed by an ipython scientific mode or other startup imports. > > > > > Is this whole thing really worth it? We get back to a numpy pylab. > > > First users learn the dirty shortcuts, and then they have to learn how > to do it "properly". > Yeah, you are right. Just a bit afraid of creating too many such functions that I am not sure are very useful/used much. For example I am not sure that many use np.r_ or np.c_ > > > (I'm using quite often string split and reshape for copy-pasted text > tables.) > > > Josef > > > > > - Sebastian > > > Alan Isaac > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Mon Jul 7 13:58:32 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 7 Jul 2014 18:58:32 +0100 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <1404743289.25854.23.camel@sebastian-t440> References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> <53BA91B5.6010604@gmail.com> <1404738707.25854.13.camel@sebastian-t440> <1404743289.25854.23.camel@sebastian-t440> Message-ID: On Mon, Jul 7, 2014 at 3:28 PM, Sebastian Berg wrote: > On Mo, 2014-07-07 at 09:50 -0400, josef.pktd at gmail.com wrote: >> >> On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg >> wrote: >> On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote: >> > On 7/7/2014 7:17 AM, Da?id wrote: >> > > How about a new one? np.matarray, for MATLAB array. >> > >> > >> > How about `str2arr` or even `build`, since teaching appears >> to be a focus. >> > Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become >> 2d. >> > It seems unambiguous to allow '1 2 3;;' to be 3d, or even >> > '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting >> > that, not urging that it be implemented. >> > >> >> Probably overdoing it, but if we plan on more then just this, >> what about >> banning such functions to something like >> numpy.interactive/numpy.helpers >> which you can then import * (or better specific functions) >> from? >> >> I think the fact that you need many imports on startup should >> rather be >> fixed by an ipython scientific mode or other startup imports. >> >> >> >> >> Is this whole thing really worth it? We get back to a numpy pylab. >> >> >> First users learn the dirty shortcuts, and then they have to learn how >> to do it "properly". >> > > Yeah, you are right. Just a bit afraid of creating too many such > functions that I am not sure are very useful/used much. For example I am > not sure that many use np.r_ or np.c_ Yeah, we definitely have too many random bits of API around overall. But I think this one is probably worthwhile. 
It doesn't add any real complexity (no new types, trivial for readers to understand the first time they encounter it, etc.), and it addresses a recurring perceived shortcoming of numpy that people run into in the first 5 minutes of use, at a time when it's pretty easy to give up and go back to Matlab. And, it removes one of the perceived advantages of np.matrix over np.ndarray, so it smooths our way for eventually phasing out np.matrix. I'm not sure that preserving np.arr is that important ( here only saves 1 character!), but some possible alternatives for short names: np.marr ("matlab-like array construction") np.sarr ("string array") np.parse -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Mon Jul 7 14:15:33 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 7 Jul 2014 14:15:33 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> <53BA91B5.6010604@gmail.com> <1404738707.25854.13.camel@sebastian-t440> <1404743289.25854.23.camel@sebastian-t440> Message-ID: On Mon, Jul 7, 2014 at 1:58 PM, Nathaniel Smith wrote: > On Mon, Jul 7, 2014 at 3:28 PM, Sebastian Berg > wrote: > > On Mo, 2014-07-07 at 09:50 -0400, josef.pktd at gmail.com wrote: > >> > >> On Mon, Jul 7, 2014 at 9:11 AM, Sebastian Berg > >> wrote: > >> On Mo, 2014-07-07 at 08:25 -0400, Alan G Isaac wrote: > >> > On 7/7/2014 7:17 AM, Da?id wrote: > >> > > How about a new one? np.matarray, for MATLAB array. > >> > > >> > > >> > How about `str2arr` or even `build`, since teaching appears > >> to be a focus. > >> > Also, I agree '1 2 3' shd become 1d and '1 2 3;' shd become > >> 2d. > >> > It seems unambiguous to allow '1 2 3;;' to be 3d, or even > >> > '1 2;3 4;;5 6;7 8' (two 2d arrays), but I'm just noting > >> > that, not urging that it be implemented. > >> > > >> > >> Probably overdoing it, but if we plan on more then just this, > >> what about > >> banning such functions to something like > >> numpy.interactive/numpy.helpers > >> which you can then import * (or better specific functions) > >> from? > >> > >> I think the fact that you need many imports on startup should > >> rather be > >> fixed by an ipython scientific mode or other startup imports. > >> > >> > >> > >> > >> Is this whole thing really worth it? We get back to a numpy pylab. > >> > >> > >> First users learn the dirty shortcuts, and then they have to learn how > >> to do it "properly". > >> > > > > Yeah, you are right. Just a bit afraid of creating too many such > > functions that I am not sure are very useful/used much. For example I am > > not sure that many use np.r_ or np.c_ > > Yeah, we definitely have too many random bits of API around overall. > But I think this one is probably worthwhile. It doesn't add any real > complexity (no new types, trivial for readers to understand the first > time they encounter it, etc.), and it addresses a recurring perceived > shortcoming of numpy that people run into in the first 5 minutes of > use, at a time when it's pretty easy to give up and go back to Matlab. > And, it removes one of the perceived advantages of np.matrix over > np.ndarray, so it smooths our way for eventually phasing out > np.matrix. 
> > I'm not sure that preserving np.arr is that important ( here > only saves 1 character!), but some possible alternatives for short > names: > > np.marr ("matlab-like array construction") > np.sarr ("string array") > np.parse > short like np.s (didn't know there is already s_) something long like >>> np.fromstring('1 2', sep=' ') array([ 1., 2.]) >>> np.fromstring2d('1 2 3; 5 3.4 7') array([[ 1. , 2. , 3. ], [ 5. , 3.4, 7. ]]) Josef > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Mon Jul 7 14:20:30 2014 From: faltet at gmail.com (Francesc Alted) Date: Mon, 07 Jul 2014 20:20:30 +0200 Subject: [Numpy-discussion] ANN: python-blosc 1.2.7 released Message-ID: <53BAE4EE.5090909@gmail.com> ============================= Announcing python-blosc 1.2.4 ============================= What is new? ============ This is a maintenance release, where included c-blosc sources have been updated to 1.4.0. This adds support for non-Intel architectures, most specially those not supporting unaligned access. For more info, you can have a look at the release notes in: https://github.com/Blosc/python-blosc/wiki/Release-notes More docs and examples are available in the documentation site: http://python-blosc.blosc.org What is it? =========== Blosc (http://www.blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound (http://www.blosc.org/docs/StarvingCPUs.pdf). See http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on how much speed it can achieve in some datasets. Blosc works well for compressing numerical arrays that contains data with relatively low entropy, like sparse data, time series, grids with regular-spaced values, etc. python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library. There is also a handy command line and Python library for Blosc called Bloscpack (https://github.com/Blosc/bloscpack) that allows you to compress large binary datafiles on-disk. Installing ========== python-blosc is in PyPI repository, so installing it is easy: $ pip install -U blosc # yes, you should omit the python- prefix Download sources ================ The sources are managed through github services at: http://github.com/Blosc/python-blosc Documentation ============= There is Sphinx-based documentation site at: http://python-blosc.blosc.org/ Mailing list ============ There is an official mailing list for Blosc at: blosc at googlegroups.com http://groups.google.es/group/blosc Licenses ======== Both Blosc and its Python wrapper are distributed using the MIT license. See: https://github.com/Blosc/python-blosc/blob/master/LICENSES for more details. 
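A minimal round-trip sketch with a numpy array (the exact compression ratio will of course depend on the data):

import numpy as np
import blosc

a = np.arange(1e6)                 # some easily compressible data
packed = blosc.compress(a.tostring(), typesize=a.itemsize)
restored = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)
assert (a == restored).all()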
---- **Enjoy data!** -- Francesc Alted From faltet at gmail.com Mon Jul 7 14:28:31 2014 From: faltet at gmail.com (Francesc Alted) Date: Mon, 07 Jul 2014 20:28:31 +0200 Subject: [Numpy-discussion] [CORRECTION] python-blosc 1.2.4 released (Was: ANN: python-blosc 1.2.7 released) In-Reply-To: <53BAE4EE.5090909@gmail.com> References: <53BAE4EE.5090909@gmail.com> Message-ID: <53BAE6CF.3070008@gmail.com> Indeed it was 1.2.4 the version just released and not 1.2.7. Sorry for the typo! Francesc On 7/7/14, 8:20 PM, Francesc Alted wrote: > ============================= > Announcing python-blosc 1.2.4 > ============================= > > What is new? > ============ > > This is a maintenance release, where included c-blosc sources have been > updated to 1.4.0. This adds support for non-Intel architectures, most > specially those not supporting unaligned access. > > For more info, you can have a look at the release notes in: > > https://github.com/Blosc/python-blosc/wiki/Release-notes > > More docs and examples are available in the documentation site: > > http://python-blosc.blosc.org > > > What is it? > =========== > > Blosc (http://www.blosc.org) is a high performance compressor > optimized for binary data. It has been designed to transmit data to > the processor cache faster than the traditional, non-compressed, > direct memory fetch approach via a memcpy() OS call. > > Blosc is the first compressor that is meant not only to reduce the size > of large datasets on-disk or in-memory, but also to accelerate object > manipulations that are memory-bound > (http://www.blosc.org/docs/StarvingCPUs.pdf). See > http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on > how much speed it can achieve in some datasets. > > Blosc works well for compressing numerical arrays that contains data > with relatively low entropy, like sparse data, time series, grids with > regular-spaced values, etc. > > python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for > the Blosc compression library. > > There is also a handy command line and Python library for Blosc called > Bloscpack (https://github.com/Blosc/bloscpack) that allows you to > compress large binary datafiles on-disk. > > > Installing > ========== > > python-blosc is in PyPI repository, so installing it is easy: > > $ pip install -U blosc # yes, you should omit the python- prefix > > > Download sources > ================ > > The sources are managed through github services at: > > http://github.com/Blosc/python-blosc > > > Documentation > ============= > > There is Sphinx-based documentation site at: > > http://python-blosc.blosc.org/ > > > Mailing list > ============ > > There is an official mailing list for Blosc at: > > blosc at googlegroups.com > http://groups.google.es/group/blosc > > > Licenses > ======== > > Both Blosc and its Python wrapper are distributed using the MIT license. > See: > > https://github.com/Blosc/python-blosc/blob/master/LICENSES > > for more details. 
> > ---- > > **Enjoy data!** > -- Francesc Alted From valentin at haenel.co Mon Jul 7 14:30:44 2014 From: valentin at haenel.co (Valentin Haenel) Date: Mon, 7 Jul 2014 20:30:44 +0200 Subject: [Numpy-discussion] ANN: python-blosc 1.2.7 released In-Reply-To: <53BAE4EE.5090909@gmail.com> References: <53BAE4EE.5090909@gmail.com> Message-ID: <20140707183044.GB13382@kudu.in-berlin.de> Hi, * Francesc Alted [2014-07-07]: [snip] > There is also a handy command line and Python library for Blosc called > Bloscpack (https://github.com/Blosc/bloscpack) that allows you to > compress large binary datafiles on-disk. For this list, you might be interested to know, that Bloscpack also supports compressing/decompressing Numpy arrays out-of-the-box via a Python API: https://github.com/Blosc/bloscpack#numpy best, V- From chris.barker at noaa.gov Mon Jul 7 14:32:12 2014 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 7 Jul 2014 11:32:12 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> Message-ID: <-2968451659458027190@unknownmsgid> If you are going to introduce this functionality, please don't call it np.arr. I agree, but.., I would suggest calling it something like np.array_simple or np.array_from_string, but the best choice IMO, would be np.ndarray.from_string (a static constructor method). Except the entire point of his is that it's easy to type... -1 on the whole idea -- this isn't Matlab, I'd saving a little typing worth it? CHB _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Jul 7 15:02:45 2014 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 7 Jul 2014 12:02:45 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <1404743289.25854.23.camel@sebastian-t440> References: <53B9C861.3090809@hawaii.edu> <53BA0D21.5050508@hawaii.edu> <245AC908B39361438CFA2299B0DD50E438FAB9BD@SRV361.tudelft.net> <53BA91B5.6010604@gmail.com> <1404738707.25854.13.camel@sebastian-t440> <1404743289.25854.23.camel@sebastian-t440> Message-ID: <-337347367645528086@unknownmsgid> On Jul 7, 2014, at 7:28 AM, Sebastian Berg wrote: > not sure that many use np.r_ or np.c_ I actually really like those ;-) -Chris From pav at iki.fi Tue Jul 8 09:09:17 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 08 Jul 2014 16:09:17 +0300 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <-2968451659458027190@unknownmsgid> References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: 07.07.2014 21:32, Chris Barker - NOAA Federal kirjoitti: > If you are going to introduce this functionality, please don't call it > np.arr. It might be appropriate for pirate versions of Numpy. *** Seriously though, having a variant of `mat` that returns arrays could be useful, so weak +0. Preferably, the name should be quite short to type. On the other hand, unlike r_ and c_, I haven't seen or used mat() in real code. 
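(For reference, those two give things like

>>> import numpy as np
>>> np.r_[1:4, 10]
array([ 1,  2,  3, 10])
>>> np.c_[np.array([1, 2, 3]), np.array([4, 5, 6])]
array([[1, 4],
       [2, 5],
       [3, 6]])

i.e. quick row/column building rather than string parsing.)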
-- Pauli Virtanen From joseluismietta at yahoo.com.ar Tue Jul 8 20:29:55 2014 From: joseluismietta at yahoo.com.ar (=?iso-8859-1?Q?Jos=E8_Luis_Mietta?=) Date: Tue, 8 Jul 2014 17:29:55 -0700 Subject: [Numpy-discussion] Number of elements in a intersection graph Message-ID: <1404865795.90882.YahooMailNeo@web142302.mail.bf1.yahoo.com> Hi experts!! I am studying the intersection between line segments (sticks). I have an Numpy array (M) corresponding to the intersection graph of the system (the element Mij = 1 if the sticks' i 'and' 'j' intersect, and Mij = 0 if not intersect). I want to determine the number of elements that form the path that connects two sticks (N and K), i.e.: the number of sticks that form the spanning cluster between stick N and K. How I can do?? Please explain step by step. Best regards! Thanks a lot. Jos? Luis -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Jul 9 06:41:17 2014 From: stefan at sun.ac.za (=?UTF-8?Q?St=C3=A9fan_van_der_Walt?=) Date: Wed, 9 Jul 2014 12:41:17 +0200 Subject: [Numpy-discussion] Remove bento from numpy In-Reply-To: References: Message-ID: On Sat, Jul 5, 2014 at 6:40 PM, David Cournapeau wrote: > The efforts are on average less demanding than this discussion. We are > talking about adding entries to a list in most cases... In scikit-image we use the following script to check for the most basic discrepancies: https://github.com/scikit-image/scikit-image/blob/master/check_bento_build.py St?fan From olivier.grisel at ensta.org Wed Jul 9 10:00:34 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Wed, 9 Jul 2014 16:00:34 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Feodor updated the AppVeyor nodes to have the Windows SDK matching MSVC 2008 Express for Python 2. I have updated my sample scripts and we now have a working example of a free CI system for: Python 2 and 3 both for 32 and 64 bit architectures. https://github.com/ogrisel/python-appveyor-demo Best, -- Olivier From bryanv at continuum.io Wed Jul 9 11:13:58 2014 From: bryanv at continuum.io (Bryan Van de Ven) Date: Wed, 9 Jul 2014 10:13:58 -0500 Subject: [Numpy-discussion] ANN: Bokeh 0.5 released Message-ID: <565B058A-583F-4D6E-B6B2-5C7FDB724F2E@continuum.io> I am very happy to announce the release of Bokeh version 0.5! (http://continuum.io/blog/bokeh-0.5) Bokeh is a Python library for visualizing large and realtime datasets on the web. This release includes many new features: weekly dev releases, a new plot frame, a click tool, "always on" hover tool, multiple axes, log axes, minor ticks, gears and gauges glyphs, and an NPM BokehJS package. Several usability enhancements have been made to the plotting.py interface to make it even easier to use. The Bokeh tutorial also now includes exercises in IPython notebook form. Of course, we've made many little bug fixes - see the CHANGELOG for full details. The biggest news is all the long-term and architectural goals landing in Bokeh 0.5: * Widgets! Build apps and dashboards with Bokeh * Very high level bokeh.charts interface * Initial Abstract Rendering support for big data visualizations * Tighter Pandas integration * Simpler, easier plot embedding options Expect dynamic, data-driven layouts, including ggplot style auto-faceting in upcoming releases, as well as R language bindings, more statistical plot types in bokeh.charts, and cloud hosting for Bokeh apps. 
Check out the full documentation, interactive gallery, and tutorial at http://bokeh.pydata.org as well as the new Bokeh IPython notebook nbviewer index (including all the tutorials) at: http://nbviewer.ipython.org/github/ContinuumIO/bokeh-notebooks/blob/master/index.ipynb If you are using Anaconda, you can install with conda: conda install bokeh Alternatively, you can install with pip: pip install bokeh BokehJS is also available by CDN for use in standalone javascript applications: http://cdn.pydata.org/bokeh-0.5.min.js http://cdn.pydata.org/bokeh-0.5.min.css Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/continuumio/bokeh Questions can be directed to the Bokeh mailing list: bokeh at continuum.io If you have interest in helping to develop Bokeh, please get involved! Special thanks to recent contributors: Tabish Chasmawala, Samuel Colvin, Christina Doig, Tarun Gaba, Maggie Mari, Amy Troschinetz, Ben Zaitlen. Bryan Van de Ven Continuum Analytics http://continuum.io From robert.kern at gmail.com Wed Jul 9 15:31:35 2014 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 9 Jul 2014 20:31:35 +0100 Subject: [Numpy-discussion] Number of elements in a intersection graph In-Reply-To: <1404865795.90882.YahooMailNeo@web142302.mail.bf1.yahoo.com> References: <1404865795.90882.YahooMailNeo@web142302.mail.bf1.yahoo.com> Message-ID: On Wed, Jul 9, 2014 at 1:29 AM, Jos? Luis Mietta wrote: > Hi experts!! > > I am studying the intersection between line segments (sticks). I have an > Numpy array (M) corresponding to the intersection graph of the system (the > element Mij = 1 if the sticks' i 'and' 'j' intersect, and Mij = 0 if not > intersect). > > I want to determine the number of elements that form the path that connects > two sticks (N and K), i.e.: the number of sticks that form the spanning > cluster between stick N and K. > How I can do?? Please explain step by step. The last time you asked a question about this project, we pointed you to the networkx package. http://networkx.github.io/documentation/latest/reference/algorithms.shortest_paths.html You can make a networkx.Graph object from your adjacency matrix very simply: graph = networkx.Graph(M) -- Robert Kern From rmcgibbo at gmail.com Wed Jul 9 18:53:26 2014 From: rmcgibbo at gmail.com (Robert McGibbon) Date: Wed, 9 Jul 2014 15:53:26 -0700 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: This is an awesome resource for tons of projects. Thanks Olivier! -Robert On Wed, Jul 9, 2014 at 7:00 AM, Olivier Grisel wrote: > Feodor updated the AppVeyor nodes to have the Windows SDK matching > MSVC 2008 Express for Python 2. I have updated my sample scripts and > we now have a working example of a free CI system for: > > Python 2 and 3 both for 32 and 64 bit architectures. > > https://github.com/ogrisel/python-appveyor-demo > > Best, > > -- > Olivier > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ted.sandler at gmail.com Wed Jul 9 20:29:04 2014 From: ted.sandler at gmail.com (Ted Sandler) Date: Wed, 9 Jul 2014 17:29:04 -0700 Subject: [Numpy-discussion] Number of elements in a intersection graph In-Reply-To: References: <1404865795.90882.YahooMailNeo@web142302.mail.bf1.yahoo.com> Message-ID: Use NetworkX + breadth first search and you are done. On Wed, Jul 9, 2014 at 12:31 PM, Robert Kern wrote: > On Wed, Jul 9, 2014 at 1:29 AM, Jos? Luis Mietta > wrote: > > Hi experts!! > > > > I am studying the intersection between line segments (sticks). I have an > > Numpy array (M) corresponding to the intersection graph of the system > (the > > element Mij = 1 if the sticks' i 'and' 'j' intersect, and Mij = 0 if not > > intersect). > > > > I want to determine the number of elements that form the path that > connects > > two sticks (N and K), i.e.: the number of sticks that form the spanning > > cluster between stick N and K. > > How I can do?? Please explain step by step. > > The last time you asked a question about this project, we pointed you > to the networkx package. > > > http://networkx.github.io/documentation/latest/reference/algorithms.shortest_paths.html > > You can make a networkx.Graph object from your adjacency matrix very > simply: > > graph = networkx.Graph(M) > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Thu Jul 10 03:46:00 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 10 Jul 2014 09:46:00 +0200 Subject: [Numpy-discussion] numpy.partition and the ICC compiler Message-ID: <53BE44B8.3070908@googlemail.com> hi, there seems to be some issue with the newish selection code when compiling numpy with the ICC compiler. See this issue: https://github.com/numpy/numpy/issues/4836 I cannot reproduce the problem even when compiling with ICC myself. I have also tried valgrind and GCC's undefined behavior sanitizer without any results. Can somebody with debugging experience please try the posted testcase and if it is reproduce-able provide the information needed to fix this. It should also affect 1.8.0 if you replace the percentile call with np.partition(imc[i], (0, 1, 1027603, 1027604)) I would need a backtrace, the current register state, the local variables, disassembly and ideally the few steps until the crash/wrong result with variable state. Cheers, Julian From jtaylor.debian at googlemail.com Fri Jul 11 03:39:46 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 11 Jul 2014 09:39:46 +0200 Subject: [Numpy-discussion] np.zeros of structured array of array of objects Message-ID: <53BF94C2.7000407@googlemail.com> Hi, looking at https://github.com/numpy/numpy/issues/4857 I noticed that np.zeros of a structured array of array of objects only initializes the first element of if the embedded array to zero and leaves the rest None: In [1]: a = numpy.zeros(10, dtype=[('multiple objects', object, 2)]); a Out[1]: array([([0, None],), ([0, None],), ([0, None],), ([0, None],), ([0, None],), ([0, None],), ([0, None],), ([0, None],), ([0, None],), ([0, None],)], dtype=[('multiple objects', 'O', (2,))]) Is this the intented behavior? I would have expected all fields to be set to an int-object 0. If not can we change it or is it too likely people rely on this behavior? 
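In the meantime, a minimal workaround (assuming an integer 0 in every slot is what is wanted) is to fill the object field explicitly after creation:

    import numpy

    a = numpy.zeros(10, dtype=[('multiple objects', object, 2)])
    a['multiple objects'].fill(0)   # every element of the field is now the int 0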
From michael.lehn at uni-ulm.de Fri Jul 11 02:21:29 2014 From: michael.lehn at uni-ulm.de (Dr. Michael Lehn) Date: Fri, 11 Jul 2014 08:21:29 +0200 Subject: [Numpy-discussion] The BLAS problem (was: Re: Wiki page for building numerical stuff on Windows) In-Reply-To: References: <46818810418925962.495791sturla.molden-gmail.com@news.gmane.org> <517271708418928107.376969sturla.molden-gmail.com@news.gmane.org> Message-ID: Am 29.04.2014 um 02:01 schrieb Nathaniel Smith : > On Tue, Apr 29, 2014 at 12:52 AM, Sturla Molden wrote: >> On 29/04/14 01:30, Nathaniel Smith wrote: >> >>> I finally read this paper: >>> >>> http://www.cs.utexas.edu/users/flame/pubs/blis2_toms_rev2.pdf >>> >>> and I have to say that I'm no longer so convinced that OpenBLAS is the >>> right starting point. >> >> I think OpenBLAS in the long run is doomed as an OSS project. Having >> huge portions of the source in assembly is not sustainable in 2014. >> OpenBLAS (like GotoBLAS2 before it) runs a high risk of becoming >> abandonware. > > Have you read the paper I linked? I really recommend it. BLIS is > apparently 95% straight-up-C, plus a slot where you stick in a tiny > CPU-specific super-optimized kernel [1]. So this localizes the nasty > stuff to one tiny function, plus most of the kernels that have been > written so far do in fact use intrinsics [2]. > > [1] https://code.google.com/p/blis/wiki/KernelsHowTo > [2] https://code.google.com/p/blis/wiki/HardwareSupport > I was teaching this summer an undergraduate class ?Software Basics on HPC?. Of course on topic was the efficient implementation of the matrix-matrix product GEMM. The BLIS paper [1] is a great source for that. In my opinion having your own hands-on experience is very important for actually understanding this concepts. That in particular means that we implemented our own matrix-matrix product. The pure C (ANSI C) implementation has less than 450 lines of code. The code consists of several function and students developed these functions one by one from one assignment to the other. You can see the result here: http://apfel.mathematik.uni-ulm.de/~lehn/sghpc/gemm/page02/index.html#toc4 Other assignments where about improving the micro kernel with SSE instructions. You can travers through the pages to see how we where doing so step by step. Please understand that this course material is still work in progress and needs some polish here and there. Still it could be useful for others and even a starting point for a simple BLAS implementation. Cheers, Michael [1]: http://www.cs.utexas.edu/users/flame/pubs/BLISTOMSrev2.pdf ----------------------------------------------------------------------------------- Dr. Michael Lehn University of Ulm, Institute for Numerical Mathematics Helmholtzstr. 20 D-89069 Ulm, Germany Phone: (+49) 731 50-23534, Fax: (+49) 731 50-23548 ----------------------------------------------------------------------------------- From olivier.grisel at ensta.org Fri Jul 11 06:30:40 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 11 Jul 2014 12:30:40 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: 2014-07-10 0:53 GMT+02:00 Robert McGibbon : > This is an awesome resource for tons of projects. Thanks. 
FYI here is the PR for sklearn to use AppVeyor CI: https://github.com/scikit-learn/scikit-learn/pull/3363 It's slightly different from the minimalistic sample I wrote for python-appveyor-demo in the sense that for sklearn I decided to actually install the generated wheel package and run the tests on the resulting installed library rather than on the project source folder. -- Olivier From jeffreback at gmail.com Fri Jul 11 07:56:38 2014 From: jeffreback at gmail.com (Jeff) Date: Fri, 11 Jul 2014 04:56:38 -0700 (PDT) Subject: [Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1 In-Reply-To: References: Message-ID: <78326bb6-44e0-41c4-8fbe-526b01cec592@googlegroups.com> Matthew, we posted the release of 0.14.1 last night. Are these picked up and build here automatically? https://nipy.bic.berkeley.edu/scipy_installers/ thanks Jeff On Saturday, May 17, 2014 7:22:00 AM UTC-4, Jeff wrote: > > Hi, > > I'm pleased to announce the availability of the first release candidate of > Pandas 0.14.0. > Please try this RC and report any issues here: Pandas Issues > > We will be releasing officially in about 2 weeks or so. > > This is a major release from 0.13.1 and includes a small number of API > changes, several new features, enhancements, and > performance improvements along with a large number of bug fixes. > > Highlights include: > > - Officially support Python 3.4 > - SQL interfaces updated to use sqlalchemy, > - Display interface changes > - MultiIndexing Using Slicers > - Ability to join a singly-indexed DataFrame with a multi-indexed > DataFrame > - More consistency in groupby results and more flexible groupby > specifications > - Holiday calendars are now supported in CustomBusinessDay > - Several improvements in plotting functions, including: hexbin, area > and pie plots. > - Performance doc section on I/O operations > > Since there are some significant changes in the default way DataFrames are > displayed. I have put > up a comment issue looking for some feedback here > > > Here are the full whatsnew and documentation links: > > v0.14.0 Whatsnew > > > v0.14.0 Documentation Page > > > Source tarballs, and windows builds are available here: > > Pandas v0.14rc1 Release > > A big thank you to everyone who contributed to this release! > > Jeff > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Fri Jul 11 09:31:15 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Fri, 11 Jul 2014 09:31:15 -0400 Subject: [Numpy-discussion] ANN: pandas 0.14.1 released Message-ID: Hello, We are proud to announce v0.14.1 of pandas, a minor release from 0.14.0. This release includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. This was 1.5 months of work with 244 commits by 45 authors encompassing 306 issues. We recommend that all users upgrade to this version. *Highlights:* - New method select_dtypes() to select columns based on the dtype - New method sem() to calculate the standard error of the mean. - Support for dateutil timezones (see *docs* ). - Support for ignoring full line comments in the read_csv() text parser. - New documentation section on *Options and Settings* . - Lots of bug fixes For a more a full description of Whatsnew for v0.14.1 here: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html *What is it:* *pandas* is a Python package providing fast, flexible, and expressive data structures designed to make working with ?relational? or ?labeled? 
data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. Documentation: http://pandas.pydata.org/pandas-docs/stable/ Source tarballs, windows binaries are available on PyPI: https://pypi.python.org/pypi/pandas windows binaries are courtesy of Christoph Gohlke and are built on Numpy 1.8 macosx wheels will be available soon, courtesy of Matthew Brett Please report any issues here: https://github.com/pydata/pandas/issues Thanks The Pandas Development Team Contributors to the 0.14.1 release - Andrew Rosenfeld - Andy Hayden - Benjamin Adams - Benjamin M. Gross - Brian Quistorff - Brian Wignall - bwignall - clham - Daniel Waeber - David Bew - David Stephens - DSM - dsm054 - helger - immerrr - Jacob Schaer - jaimefrio - Jan Schulz - John David Reaver - John W. O?Brien - Joris Van den Bossche - jreback - Julien Danjou - Kevin Sheppard - K.-Michael Aye - Kyle Meyer - lexual - Matthew Brett - Matt Wittmann - Michael Mueller - Mortada Mehyar - onesandzeroes - Phillip Cloud - Rob Levy - rockg - sanguineturtle - Schaer, Jacob C - seth-p - sinhrks - Stephan Hoyer - Thomas Kluyver - Todd Jennings - TomAugspurger - unknown - yelite -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 11 10:35:48 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 11 Jul 2014 15:35:48 +0100 Subject: [Numpy-discussion] np.zeros of structured array of array of objects In-Reply-To: <53BF94C2.7000407@googlemail.com> References: <53BF94C2.7000407@googlemail.com> Message-ID: On Fri, Jul 11, 2014 at 8:39 AM, Julian Taylor wrote: > Hi, > looking at https://github.com/numpy/numpy/issues/4857 I noticed that > np.zeros of a structured array of array of objects only initializes the > first element of if the embedded array to zero and leaves the rest None: > > In [1]: a = numpy.zeros(10, dtype=[('multiple objects', object, 2)]); a > Out[1]: > array([([0, None],), ([0, None],), ([0, None],), ([0, None],), > ([0, None],), ([0, None],), ([0, None],), ([0, None],), > ([0, None],), ([0, None],)], > dtype=[('multiple objects', 'O', (2,))]) > > > Is this the intented behavior? I would have expected all fields to be > set to an int-object 0. > If not can we change it or is it too likely people rely on this behavior? Looks like a bug to me, and I can't off-hand think of any reason why anyone would be relying on this... I vote that unless someone speaks up we just fix it. If it really does break anything then we can always catch that in beta... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From var.mail.daniel at gmail.com Fri Jul 11 16:30:36 2014 From: var.mail.daniel at gmail.com (Daniel da Silva) Date: Fri, 11 Jul 2014 16:30:36 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: I think the idea at hand is not that it would be used everyday, but it would be there when needed. What people do everyday is with *real* data. They are using functions to load the data. Where this would come in useful would be presentations and tutorials. If leading a presentation on scientific computing in Python to beginners, which would look better on a bullet in a slide? 
- np.build('.2 .7 .1; .3 .5 .2; .1 .1 .9')) - np.array([[.2, .7, .1], [.3, .5, .2], [.1, .1, .9]]) The default way of defining contrived arrays by passing lists of lists is awkward for beginners. While lists of lists are not a hard concept, it's not something you want to force on someone who doesn't know the Python language yet. The second bullet above doesn't represent the readability of the Python world. I would suggest that this be named np.build() (or np.helpers.build()) in light of it providing a simple interface to building arrays. Again, when you work with real data you are taking an extra step to think about how you load that data. That's not what you need to think about when being introduced to NumPy. On Tue, Jul 8, 2014 at 9:09 AM, Pauli Virtanen wrote: > 07.07.2014 21:32, Chris Barker - NOAA Federal kirjoitti: > > If you are going to introduce this functionality, please don't call it > > np.arr. > > It might be appropriate for pirate versions of Numpy. > > *** > > Seriously though, having a variant of `mat` that returns arrays could be > useful, so weak +0. Preferably, the name should be quite short to type. > > On the other hand, unlike r_ and c_, I haven't seen or used mat() in > real code. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Fri Jul 11 12:10:30 2014 From: rays at blue-cove.com (RayS) Date: Fri, 11 Jul 2014 09:10:30 -0700 Subject: [Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1 In-Reply-To: <78326bb6-44e0-41c4-8fbe-526b01cec592@googlegroups.com> References: <78326bb6-44e0-41c4-8fbe-526b01cec592@googlegroups.com> Message-ID: <201407111610.s6BGAXia005296@blue-cove.com> At 04:56 AM 7/11/2014, you wrote: >Matthew, we posted the release of 0.14.1 last night. Are these >picked up and build here automatically? >https://nipy.bic.berkeley.edu/scipy_installers/ I see it's at http://www.lfd.uci.edu/~gohlke/pythonlibs/#pandas - Ray From jeffreback at gmail.com Sat Jul 12 06:33:37 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Sat, 12 Jul 2014 06:33:37 -0400 Subject: [Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1 In-Reply-To: <201407111610.s6BGAXia005296@blue-cove.com> References: <78326bb6-44e0-41c4-8fbe-526b01cec592@googlegroups.com> <201407111610.s6BGAXia005296@blue-cove.com> Message-ID: <2C2898F6-6754-4798-A2D0-7BAD32FE57AA@gmail.com> Ray Matthew builds Mac osx wheels for scipy stack (those are windows binaries) thanks anyhow > On Jul 11, 2014, at 12:10 PM, RayS wrote: > > At 04:56 AM 7/11/2014, you wrote: >> Matthew, we posted the release of 0.14.1 last night. Are these >> picked up and build here automatically? >> https://nipy.bic.berkeley.edu/scipy_installers/ > > I see it's at http://www.lfd.uci.edu/~gohlke/pythonlibs/#pandas > > - Ray > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sat Jul 12 13:17:14 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 12 Jul 2014 12:17:14 -0500 Subject: [Numpy-discussion] String type again. Message-ID: As previous posts have pointed out, Numpy's `S` type is currently treated as a byte string, which leads to more complicated code in python3. 
OTOH, the unicode type is stored as UCS4, which consumes a lot of space, especially for ascii strings. This note proposes to adapt the currently existing 'a' type letter, currently aliased to 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte internal representations for unicode strings, ascii and latin1. Ascii has the advantage that it is a subset of UTF-8, whereas latin1 has a few more symbols. Another possibility is to just make it an UTF-8 encoding, but I think this would involve more overhead as Python would need to determine the maximum character size. These are just preliminary thoughts, comments are welcome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From joseluismietta at yahoo.com.ar Sat Jul 12 12:53:45 2014 From: joseluismietta at yahoo.com.ar (=?iso-8859-1?Q?Jos=E8_Luis_Mietta?=) Date: Sat, 12 Jul 2014 09:53:45 -0700 Subject: [Numpy-discussion] plt.show() and plt.draw() doesnt work Message-ID: <1405184025.93121.YahooMailNeo@web142302.mail.bf1.yahoo.com> Hi experts! I have a numpy array M. I generate a graph using NetworkX and then I want to draw this graph: ??? import networkx as nx ??? import matplotlib.pyplot as plt ??? G=nx.graph(M) ??? nx.draw(G) ??? plt.draw() Doing this, no picture appears. In addition, if I do `plt.show()` no picture appears. Please help! Best regards -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Sat Jul 12 17:32:24 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sat, 12 Jul 2014 23:32:24 +0200 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: On 11 July 2014 22:30, Daniel da Silva wrote: > I think the idea at hand is not that it would be used everyday, but it > would be there when needed. What people do everyday is with *real* data. > They are using functions to load the data. > But sometimes we have to hard-code a few values, and it is true that making a list (or nested list) is quite verbose; one example are unittests. Having a MATLAB-style array creation would be convenient for those cases. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jul 12 20:02:37 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 13 Jul 2014 01:02:37 +0100 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On 12 Jul 2014 23:06, "Charles R Harris" wrote: > > As previous posts have pointed out, Numpy's `S` type is currently treated as a byte string, which leads to more complicated code in python3. OTOH, the unicode type is stored as UCS4, which consumes a lot of space, especially for ascii strings. This note proposes to adapt the currently existing 'a' type letter, currently aliased to 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte internal representations for unicode strings, ascii and latin1. Ascii has the advantage that it is a subset of UTF-8, whereas latin1 has a few more symbols. Another possibility is to just make it an UTF-8 encoding, but I think this would involve more overhead as Python would need to determine the maximum character size. These are just preliminary thoughts, comments are welcome. I feel like for most purposes, what we *really* want is a variable length string dtype (I.e., where each element can be a different length.). 
Pandas pays quite some price in overhead to fake this right now. Adding such a thing will cause some problems regarding compatibility (what to do with array(["foo"])) and education, but I think it's worth it in the long run. A variable length string with out of band storage also would allow for a lot of py3.3-style storage tricks of we want then. Given that, though, I'm a little dubious about adding a third fixed length string type, since it seems like it might be a temporary patch, yet raises the prospect of having to indefinitely support *5* distinct string types (3 of which will map to py3 str)... OTOH, fixed length nul padded latin1 would be useful for various flat file reading tasks. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun Jul 13 08:11:07 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 13 Jul 2014 14:11:07 +0200 Subject: [Numpy-discussion] plt.show() and plt.draw() doesnt work In-Reply-To: <1405184025.93121.YahooMailNeo@web142302.mail.bf1.yahoo.com> References: <1405184025.93121.YahooMailNeo@web142302.mail.bf1.yahoo.com> Message-ID: On Sat, Jul 12, 2014 at 6:53 PM, Jos? Luis Mietta < joseluismietta at yahoo.com.ar> wrote: > Hi experts! > > I have a numpy array M. I generate a graph using NetworkX and then I want > to draw this graph: > > import networkx as nx > import matplotlib.pyplot as plt > G=nx.graph(M) > nx.draw(G) > plt.draw() > > Doing this, no picture appears. In addition, if I do `plt.show()` no > picture appears. > You're getting a TypeError I guess? The third line is incorrect, should be G = nx.graph.Graph(M) If that's not the issue and it's really about plotting, you should ask on the matplotlib users list. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Sun Jul 13 13:05:48 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Sun, 13 Jul 2014 13:05:48 -0400 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith wrote: > I feel like for most purposes, what we *really* want is a variable length > string dtype (I.e., where each element can be a different length.). I've been toying with the idea of creating an array type for interned strings. In many applications dealing with large arrays of variable size strings, the strings come from a relatively short set of names. Arrays of interned strings can be manipulated very efficiently because in may respects they are just like arrays of integers. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Sun Jul 13 13:13:57 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Sun, 13 Jul 2014 13:13:57 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: On Fri, Jul 11, 2014 at 4:30 PM, Daniel da Silva wrote: > If leading a presentation on scientific computing in Python to beginners, > which would look better on a bullet in a slide? > > - > > np.build('.2 .7 .1; .3 .5 .2; .1 .1 .9')) > > - > > np.array([[.2, .7, .1], [.3, .5, .2], [.1, .1, .9]]) > > > np.array([[.2, .7, .1], [.3, .5, .2], [.1, .1, .9]]) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ndarray at mac.com Sun Jul 13 13:31:14 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Sun, 13 Jul 2014 13:31:14 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: Also, the use of strings will confuse most syntax highlighters. Compare the two options in this screenshot: [image: Inline image 2] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2014-07-13 at 1.29.20 PM.png Type: image/png Size: 26129 bytes Desc: not available URL: From ben.root at ou.edu Mon Jul 14 09:23:22 2014 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 14 Jul 2014 09:23:22 -0400 Subject: [Numpy-discussion] plt.show() and plt.draw() doesnt work In-Reply-To: <1405184025.93121.YahooMailNeo@web142302.mail.bf1.yahoo.com> References: <1405184025.93121.YahooMailNeo@web142302.mail.bf1.yahoo.com> Message-ID: Please send this question to the matplotlib-users mailing list (if you haven't already, I am still going through a huge backlog). This is the NumPy list. Ben Root On Sat, Jul 12, 2014 at 12:53 PM, Jos? Luis Mietta < joseluismietta at yahoo.com.ar> wrote: > Hi experts! > > I have a numpy array M. I generate a graph using NetworkX and then I want > to draw this graph: > > import networkx as nx > import matplotlib.pyplot as plt > G=nx.graph(M) > nx.draw(G) > plt.draw() > > Doing this, no picture appears. In addition, if I do `plt.show()` no > picture appears. > > Please help! > > Best regards > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Mon Jul 14 13:00:45 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 14 Jul 2014 19:00:45 +0200 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: 2014-07-13 19:05 GMT+02:00 Alexander Belopolsky : > > On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith wrote: >> >> I feel like for most purposes, what we *really* want is a variable length >> string dtype (I.e., where each element can be a different length.). > > > > I've been toying with the idea of creating an array type for interned > strings. In many applications dealing with large arrays of variable size > strings, the strings come from a relatively short set of names. Arrays of > interned strings can be manipulated very efficiently because in may respects > they are just like arrays of integers. +1 I think this is why pandas is using dtype=object to load string data: in many cases short string values are used to represent categorical variables with a comparatively small cardinality of possible values for a dataset with comparatively numerous records. In that case the dtype=object is not that bad as it just stores pointer on string objects managed by Python. It's possible to intern the strings manually at load time (I don't know if pandas or python already do it automatically in that case). The integer semantics is good for that case. Having an explicit dtype might be even better. 
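As a rough sketch of that integer semantics with plain numpy today (no new dtype involved), a string column can be factored into integer codes plus a lookup table:

    import numpy

    names = numpy.array(['spam', 'egg', 'spam', 'ham', 'egg', 'spam'], dtype=object)
    levels, codes = numpy.unique(names, return_inverse=True)
    # 'codes' is an ordinary int array; levels[codes] reconstructs the strings,
    # and operations such as counting reduce to integer work:
    counts = numpy.bincount(codes)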
-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From andrew.collette at gmail.com Mon Jul 14 13:39:41 2014 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 14 Jul 2014 11:39:41 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: Hi Chuck, > This note proposes to adapt the currently existing 'a' > type letter, currently aliased to 'S', as a new fixed encoding dtype. Python > 3.3 introduced two one byte internal representations for unicode strings, > ascii and latin1. Ascii has the advantage that it is a subset of UTF-8, > whereas latin1 has a few more symbols. Another possibility is to just make > it an UTF-8 encoding, but I think this would involve more overhead as Python > would need to determine the maximum character size. For storing data in HDF5 (PyTables or h5py), it would be somewhat cleaner if either ASCII or UTF-8 are used, as these are the only two charsets officially supported by the library. Latin-1 would require a custom read/write converter, which isn't the end of the world but would be tricky to do in a correct way, and likely somewhat slow. We'd also run into truncation issues since certain latin-1 chars become multibyte sequences in UTF8. I assume 'a' strings would still be null-padded? Andrew From charlesr.harris at gmail.com Mon Jul 14 14:22:43 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 14 Jul 2014 12:22:43 -0600 Subject: [Numpy-discussion] __numpy_ufunc__ Message-ID: Hi All, Julian has raised the question of including numpy_ufunc in numpy 1.9. I don't feel strongly one way or the other, but it doesn't seem to be finished yet and 1.10 might be a better place to work out the remaining problems along with the astropy folks testing possible uses. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Jul 14 16:13:00 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 14 Jul 2014 13:13:00 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Sat, Jul 12, 2014 at 10:17 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > As previous posts have pointed out, Numpy's `S` type is currently treated > as a byte string, which leads to more complicated code in python3. > Also, a byte string in py3 is not, in fact the same as the py2 string type. So we have a problem -- if we want 'S' to mean what it essentially does in py2, what do we map it to in pure-python land? I propose we embrace the py3 model as fully as possible: There is text data, and there is binary data. In py3, that is 'str' and 'bytes'. So numpy should have dtypes to match these. We're a bit stuck, however, because 'S' mapped to the py2 string type, which no longer exists in py3. Sorry not running py3 to see what 'S' does now, but I know it's bit broken, and may be too late to change it. But: it is certainly a common case in the scientific world to have 1-byte-per-character string data, and care about store size. So a 1-byte-per-character text data types may be a good idea: As for a bytes type -- do we need it, or are we fine with simply using uint8 arrays? (or, even the most common case, converting directly to the type that is actually stored in those bytes... > especially for ascii strings. This note proposes to adapt the currently > existing 'a' type letter, currently aliased to 'S', as a new fixed encoding > dtype. 
>
+1

> Python 3.3 introduced two one byte internal representations for unicode
> strings, ascii and latin1. Ascii has the advantage that it is a subset of
> UTF-8, whereas latin1 has a few more symbols.
>
+1 for latin-1 -- those extra symbols are handy. Also, at least with Python's stdlib encoding, you can round-trip any binary data through latin-1 -- kind of making it act like a bytes object....

> Another possibility is to just make it an UTF-8 encoding, but I think this
> would involve more overhead as Python would need to determine the maximum
> character size.
>
yeah -- that is a) overhead, and b) breaks the numpy fixed size dtype model. And it's trickier for numpy arrays, 'cause they are mutable -- python strings can do OK, as they don't need to accommodate potentially changing sizes of strings.

On Sat, Jul 12, 2014 at 5:02 PM, Nathaniel Smith wrote:

> I feel like for most purposes, what we *really* want is a variable length
> string dtype (I.e., where each element can be a different length.).

well, that is fundamentally different than the usual numpy data model -- it would require that the array store pointers and dereference them on use -- is there anywhere else in numpy (other than the object dtype) that does that? And if we did -- would it end up having any advantage over putting strings in an object array? Or for that matter, using a list of strings instead?

> Pandas pays quite some price in overhead to fake this right now. Adding
> such a thing will cause some problems regarding compatibility (what to do
> with array(["foo"])) and education, but I think it's worth it in the long
> run.

i.e. do you use the fixed-length type or the variable-length type? I'm not sure it's too much of a killer to have a default and let the user set a dtype if they want something else.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hodgson.neil at yahoo.co.uk Tue Jul 15 05:22:56 2014
From: hodgson.neil at yahoo.co.uk (Neil Hodgson)
Date: Tue, 15 Jul 2014 10:22:56 +0100
Subject: [Numpy-discussion] Bug in np.cross for 2D vectors
Message-ID: <1405416176.45058.YahooMailNeo@web133104.mail.ir2.yahoo.com>

Hi,

We came across this bug while using np.cross on 3D arrays of 2D vectors. The first example shows the problem, and we looked at the source for np.cross and believe we found the bug - an unnecessary swapaxes when returning the output (comment inserted in the code).

Thanks
Neil

# Example
shape = (3,5,7,2)
# These are effectively 3D arrays (3*5*7) of 2D vectors
data1 = np.random.randn(*shape)
data2 = np.random.randn(*shape)
# The cross product of data1 and data2 should produce a (3*5*7) array of scalars
cross_product_longhand = data1[:,:,:,0]*data2[:,:,:,1]-data1[:,:,:,1]*data2[:,:,:,0]
print 'longhand output shape:',cross_product_longhand.shape # and it does
cross_product_numpy = np.cross(data1,data2)
print 'numpy output shape:',cross_product_numpy.shape
# It seems to have transposed the last 2 dimensions
if (cross_product_longhand == np.transpose(cross_product_numpy, (0,2,1))).all():
    print 'Unexpected transposition in numpy.cross (numpy version %s)'%np.__version__

# np.cross L1464
if axis is not None:
    axisa, axisb, axisc = (axis,)*3
a = asarray(a).swapaxes(axisa, 0)
b = asarray(b).swapaxes(axisb, 0)
msg = "incompatible dimensions for cross product\n"\
      "(dimension must be 2 or 3)"
if (a.shape[0] not in [2, 3]) or (b.shape[0] not in [2, 3]):
    raise ValueError(msg)
if a.shape[0] == 2:
    if (b.shape[0] == 2):
        cp = a[0]*b[1] - a[1]*b[0]
        if cp.ndim == 0:
            return cp
        else:
            ## WE SHOULD NOT SWAPAXES HERE!
            ## For 2D vectors the first axis has been
            ## collapsed during the cross product
            return cp.swapaxes(0, axisc)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From J.M.Hoekstra at tudelft.nl Tue Jul 15 02:33:30 2014
From: J.M.Hoekstra at tudelft.nl (Jacco Hoekstra - LR)
Date: Tue, 15 Jul 2014 06:33:30 +0000
Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
In-Reply-To:
References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid>
Message-ID: <245AC908B39361438CFA2299B0DD50E438FC7B7C@SRV361.tudelft.net>

Well, I do not see the confusion here (only due to the use of the array function, maybe). It is a string, after all, so it should be colour-coded as such.

I would love to keep this feature of np.mat in somehow, named np.txt2arr or something. We, linear algebraists, will already lose the .I method for matrix inversion, the * for matrix multiplication, let's keep at least one of the many handy features of the matrix type in.

It is simply a very useful, short-hand way, probably a separate function, to make a 2D array. If you think it's ugly, don't use it. But it certainly is faster to type, and former Matlab users will love it as well.

Just my 2 cts.

From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Alexander Belopolsky
Sent: zondag 13 juli 2014 19:31
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] Short-hand array creation in `numpy.mat` style

Also, the use of strings will confuse most syntax highlighters. Compare the two options in this screenshot:
[Inline image 2]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image002.jpg
Type: image/jpeg
Size: 7699 bytes
Desc: image002.jpg
URL:

From njs at pobox.com Tue Jul 15 06:55:13 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 15 Jul 2014 11:55:13 +0100
Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style
In-Reply-To:
References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid>
Message-ID:

On Sun, Jul 13, 2014 at 6:31 PM, Alexander Belopolsky wrote:

> Also, the use of strings will confuse most syntax highlighters. Compare
> the two options in this screenshot:
>
> [image: Inline image 2]
>

I guess this is a minor issue for "real" code, but even IPython doesn't (yet?) provide syntax highlighting for lines as they're typed, and this is a tool intended mainly for interactive use.

That screenshot also I think illustrates why people have such a preference for the first syntax. The second line looks nice, but try typing it quickly and getting all the commas located correctly inside versus outside of each of the triply-nested brackets...

No-one's come up with any names for this that are nearly as good as "arr". Is it really that bad to have to type one extra character, np.array instead of np.arr?

-n

--
Nathaniel J.
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2014-07-13 at 1.29.20 PM.png Type: image/png Size: 26129 bytes Desc: not available URL: From jeffreback at gmail.com Tue Jul 15 06:56:11 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Tue, 15 Jul 2014 06:56:11 -0400 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: <5A29EC4A-CFE7-4B16-9C0A-4541B5544D62@gmail.com> in 0.15.0 pandas will have full fledged support for categoricals which in effect allow u 2 map a smaller number of strings to integers this is now in pandas master http://pandas-docs.github.io/pandas-docs-travis/categorical.html feedback welcome! > On Jul 14, 2014, at 1:00 PM, Olivier Grisel wrote: > > 2014-07-13 19:05 GMT+02:00 Alexander Belopolsky : >> >>> On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith wrote: >>> >>> I feel like for most purposes, what we *really* want is a variable length >>> string dtype (I.e., where each element can be a different length.). >> >> >> >> I've been toying with the idea of creating an array type for interned >> strings. In many applications dealing with large arrays of variable size >> strings, the strings come from a relatively short set of names. Arrays of >> interned strings can be manipulated very efficiently because in may respects >> they are just like arrays of integers. > > +1 I think this is why pandas is using dtype=object to load string > data: in many cases short string values are used to represent > categorical variables with a comparatively small cardinality of > possible values for a dataset with comparatively numerous records. > > In that case the dtype=object is not that bad as it just stores > pointer on string objects managed by Python. It's possible to intern > the strings manually at load time (I don't know if pandas or python > already do it automatically in that case). The integer semantics is > good for that case. Having an explicit dtype might be even better. > > -- > Olivier > http://twitter.com/ogrisel - http://github.com/ogrisel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Tue Jul 15 07:26:30 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 15 Jul 2014 13:26:30 +0200 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: <1405423590.8281.7.camel@sebastian-t440> On Sa, 2014-07-12 at 12:17 -0500, Charles R Harris wrote: > As previous posts have pointed out, Numpy's `S` type is currently > treated as a byte string, which leads to more complicated code in > python3. OTOH, the unicode type is stored as UCS4, which consumes a > lot of space, especially for ascii strings. This note proposes to > adapt the currently existing 'a' type letter, currently aliased to > 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte > internal representations for unicode strings, ascii and latin1. Ascii > has the advantage that it is a subset of UTF-8, whereas latin1 has a > few more symbols. Another possibility is to just make it an UTF-8 > encoding, but I think this would involve more overhead as Python would > need to determine the maximum character size. 
These are just > preliminary thoughts, comments are welcome. > Just wondering, couldn't we have a type which actually has an (arbitrary, python supported) encoding (and "bytes" might even just be a special case of no encoding)? Basically storing bytes and on access do element[i].decode(specified_encoding) and on storing element[i] = value.encode(specified_encoding). There is always the never ending small issue of trailing null bytes. If we want to be fully compatible, such a type would have to store the string length explicitly to support trailing null bytes. - Sebastian > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jaime.frio at gmail.com Tue Jul 15 07:41:57 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 15 Jul 2014 04:41:57 -0700 Subject: [Numpy-discussion] Bug in np.cross for 2D vectors In-Reply-To: <1405416176.45058.YahooMailNeo@web133104.mail.ir2.yahoo.com> References: <1405416176.45058.YahooMailNeo@web133104.mail.ir2.yahoo.com> Message-ID: On Tue, Jul 15, 2014 at 2:22 AM, Neil Hodgson wrote: > Hi, > > We came across this bug while using np.cross on 3D arrays of 2D vectors. > What version of numpy are you using? This should already be solved in numpy master, and be part of the 1.9 release. Here's the relevant commit, although the code has been cleaned up a bit in later ones: https://github.com/numpy/numpy/commit/b9454f50f23516234c325490913224c3a69fb122 Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jul 15 11:15:17 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 15 Jul 2014 09:15:17 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: <1405423590.8281.7.camel@sebastian-t440> References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: On Tue, Jul 15, 2014 at 5:26 AM, Sebastian Berg wrote: > On Sa, 2014-07-12 at 12:17 -0500, Charles R Harris wrote: > > As previous posts have pointed out, Numpy's `S` type is currently > > treated as a byte string, which leads to more complicated code in > > python3. OTOH, the unicode type is stored as UCS4, which consumes a > > lot of space, especially for ascii strings. This note proposes to > > adapt the currently existing 'a' type letter, currently aliased to > > 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte > > internal representations for unicode strings, ascii and latin1. Ascii > > has the advantage that it is a subset of UTF-8, whereas latin1 has a > > few more symbols. Another possibility is to just make it an UTF-8 > > encoding, but I think this would involve more overhead as Python would > > need to determine the maximum character size. These are just > > preliminary thoughts, comments are welcome. > > > > Just wondering, couldn't we have a type which actually has an > (arbitrary, python supported) encoding (and "bytes" might even just be a > special case of no encoding)? Basically storing bytes and on access do > element[i].decode(specified_encoding) and on storing element[i] = > value.encode(specified_encoding). > > There is always the never ending small issue of trailing null bytes. 
If > we want to be fully compatible, such a type would have to store the > string length explicitly to support trailing null bytes. > UTF-8 encoding works with null bytes. That is one of the reasons it is so popular. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jul 15 11:29:13 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 15 Jul 2014 09:29:13 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: On Tue, Jul 15, 2014 at 9:15 AM, Charles R Harris wrote: > > > > On Tue, Jul 15, 2014 at 5:26 AM, Sebastian Berg < > sebastian at sipsolutions.net> wrote: > >> On Sa, 2014-07-12 at 12:17 -0500, Charles R Harris wrote: >> > As previous posts have pointed out, Numpy's `S` type is currently >> > treated as a byte string, which leads to more complicated code in >> > python3. OTOH, the unicode type is stored as UCS4, which consumes a >> > lot of space, especially for ascii strings. This note proposes to >> > adapt the currently existing 'a' type letter, currently aliased to >> > 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte >> > internal representations for unicode strings, ascii and latin1. Ascii >> > has the advantage that it is a subset of UTF-8, whereas latin1 has a >> > few more symbols. Another possibility is to just make it an UTF-8 >> > encoding, but I think this would involve more overhead as Python would >> > need to determine the maximum character size. These are just >> > preliminary thoughts, comments are welcome. >> > >> >> Just wondering, couldn't we have a type which actually has an >> (arbitrary, python supported) encoding (and "bytes" might even just be a >> special case of no encoding)? Basically storing bytes and on access do >> element[i].decode(specified_encoding) and on storing element[i] = >> value.encode(specified_encoding). >> >> There is always the never ending small issue of trailing null bytes. If >> we want to be fully compatible, such a type would have to store the >> string length explicitly to support trailing null bytes. >> > > UTF-8 encoding works with null bytes. That is one of the reasons it is so > popular. > > Thinking more about it, the easiest thing to do might be to make the S dtype a UTF-8 encoding. Most of the machinery to deal with that is already in place. That change might affect some users though, and we might need to do some work to make it backwards compatible with python 2. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Jul 15 12:18:30 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 15 Jul 2014 09:18:30 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Mon, Jul 14, 2014 at 10:39 AM, Andrew Collette wrote: > For storing data in HDF5 (PyTables or h5py), it would be somewhat > cleaner if either ASCII or UTF-8 are used, as these are the only two > charsets officially supported by the library. 
good argument for ASCII, but utf-8 is a bad idea, as there is no 1:1 correspondence between length of string in bytes and length in characters -- as numpy needs to pre-allocate a defined number of bytes for a dtype, there is a disconnect between the user and numpy as to how long a string is being stored...this isn't a problem for immutable strings, and less of a problem for HDF, as you can determine how many bytes you need before you write the file (or does HDF support var-length elements?) > Latin-1 would require a > custom read/write converter, which isn't the end of the world "custom"? it would be an encoding operation -- which you'd need to go from utf-8 to/from unicode anyway. So you would lose the ability to have a nice 1:1 binary representation map between numpy and HDF... good argument for ASCII, I guess. Or for HDF to use latin-1 ;-) Does HDF enforce ascii-only? what does it do with the > 127 values? > would be tricky to do in a correct way, and likely somewhat slow. > We'd also run into truncation issues since certain latin-1 chars > become multibyte sequences in UTF8. > that's the whole issue with UTF-8 -- it needs to be addressed somewhere, and the numpy-HDF interface seems like a smarter place to put it than the numpy-user interface! I assume 'a' strings would still be null-padded? yup. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Jul 15 12:32:00 2014 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 15 Jul 2014 12:32:00 -0400 Subject: [Numpy-discussion] __numpy_ufunc__ In-Reply-To: References: Message-ID: Perhaps a bit of context might be useful? How is numpy_ufunc different from the ufuncs that we know and love? What are the known implications? What are the known shortcomings? Are there ABI and/or API concerns between 1.9 and 1.10? Ben Root On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris wrote: > Hi All, > > Julian has raised the question of including numpy_ufunc in numpy 1.9. I > don't feel strongly one way or the other, but it doesn't seem to be > finished yet and 1.10 might be a better place to work out the remaining > problems along with the astropy folks testing possible uses. > > Thoughts? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue Jul 15 14:06:26 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 15 Jul 2014 20:06:26 +0200 Subject: [Numpy-discussion] __numpy_ufunc__ and 1.9 release Message-ID: <53C56DA2.40402@googlemail.com> hi, as you may know we want to release numpy 1.9 soon. We should have solved most indexing regressions the first beta showed. The remaining blockers are finishing the new __numpy_ufunc__ feature. This feature should allow for alternative method to overriding the behavior of ufuncs from subclasses. 
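Roughly, the hook lets an object intercept ufunc calls that involve it; a bare-bones sketch follows (the method signature is the one given in the NEP referenced just below, everything else is simplified and not meant as the final semantics):

    import numpy as np

    class Wrapped(object):
        def __init__(self, data):
            self.data = np.asarray(data)

        def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
            # 'method' is e.g. '__call__' or 'reduce'; 'i' is our position in 'inputs'
            if method != '__call__':
                return NotImplemented
            args = [x.data if isinstance(x, Wrapped) else x for x in inputs]
            return Wrapped(ufunc(*args, **kwargs))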
It is described here: https://github.com/numpy/numpy/blob/master/doc/neps/ufunc-overrides.rst The current blocker issues are: https://github.com/numpy/numpy/issues/4753 https://github.com/numpy/numpy/pull/4815 I'm not to familiar with all the complications of subclassing so I can't really say how hard this is to solve. My issue is that it there still seems to be debate on how to handle operator overriding correctly and I am opposed to releasing a numpy with yet another experimental feature that may or may not be finished sometime later. Having datetime in infinite experimental state is bad enough. I think nobody is served well if we release 1.9 with the feature prematurely based on a not representative set of users and the later after more users showed up see we have to change its behavior. So I'm wondering if we should delay the introduction of this feature to 1.10 or is it important enough to wait until there is a consensus on the remaining issues? From shoyer at gmail.com Tue Jul 15 14:21:39 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 15 Jul 2014 11:21:39 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Mon, Jul 14, 2014 at 10:00 AM, Olivier Grisel wrote: > 2014-07-13 19:05 GMT+02:00 Alexander Belopolsky : > > I've been toying with the idea of creating an array type for interned > > strings. In many applications dealing with large arrays of variable size > > strings, the strings come from a relatively short set of names. Arrays > of > > interned strings can be manipulated very efficiently because in may > respects > > they are just like arrays of integers. > > +1 I think this is why pandas is using dtype=object to load string > data: in many cases short string values are used to represent > categorical variables with a comparatively small cardinality of > possible values for a dataset with comparatively numerous records. > Pandas has a new "categorical" type (just merged into master) which is pretty similar to interned strings: https://github.com/pydata/pandas/pull/7217 http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html Of course, it would be ideal for numpy itself to natively support categoricals and variables length strings. Best, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Tue Jul 15 14:40:58 2014 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Tue, 15 Jul 2014 14:40:58 -0400 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith wrote: > On 12 Jul 2014 23:06, "Charles R Harris" > wrote: > > > > As previous posts have pointed out, Numpy's `S` type is currently > treated as a byte string, which leads to more complicated code in python3. > OTOH, the unicode type is stored as UCS4, which consumes a lot of space, > especially for ascii strings. This note proposes to adapt the currently > existing 'a' type letter, currently aliased to 'S', as a new fixed encoding > dtype. Python 3.3 introduced two one byte internal representations for > unicode strings, ascii and latin1. Ascii has the advantage that it is a > subset of UTF-8, whereas latin1 has a few more symbols. Another possibility > is to just make it an UTF-8 encoding, but I think this would involve more > overhead as Python would need to determine the maximum character size. > These are just preliminary thoughts, comments are welcome. 
> > I feel like for most purposes, what we *really* want is a variable length > string dtype (I.e., where each element can be a different length.). Pandas > pays quite some price in overhead to fake this right now. Adding such a > thing will cause some problems regarding compatibility (what to do with > array(["foo"])) and education, but I think it's worth it in the long run. A > variable length string with out of band storage also would allow for a lot > of py3.3-style storage tricks of we want then. > > Given that, though, I'm a little dubious about adding a third fixed length > string type, since it seems like it might be a temporary patch, yet raises > the prospect of having to indefinitely support *5* distinct string types (3 > of which will map to py3 str)... > > OTOH, fixed length nul padded latin1 would be useful for various flat file > reading tasks. > As one of the original agitators for this, let me re-iterate that what the astronomical community *really* wants is the original proposal as described by Chris Barker [1] and essentially what Charles said. We have large data archives that have ASCII string data in binary formats like FITS and HDF5. The current readers for those datasets present users with numpy S data types, which in Python 3 cannot be compared to str (unicode) literals. In many cases those datasets are large, and in my case I regularly deal with multi-Gb sized bytestring arrays. Converting those to a U dtype is not practical. This issue is the sole blocker that I personally have in beginning to move our operations code base to be Python 3 compatible, and eventually actually baselining Python 3. A variable length string would be great, but it feels like a different (and more difficult) problem to me. If, however, this can be the solution to the problem I described, and it can be implemented in a finite time, then I'm all for it! :-) I hate begging for features with no chance of contributing much to the implementation (lacking the necessary expertise in numpy internals). I would be happy to draft a NEP if that will help the process. Cheers, Tom [1]: http://mail.scipy.org/pipermail/numpy-discussion/2014-January/068622.html > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.collette at gmail.com Tue Jul 15 15:11:41 2014 From: andrew.collette at gmail.com (Andrew Collette) Date: Tue, 15 Jul 2014 13:11:41 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: Hi, > good argument for ASCII, but utf-8 is a bad idea, as there is no 1:1 correspondence between length of string in bytes and length in characters -- as numpy needs to pre-allocate a defined number of bytes for a dtype, there is a disconnect between the user and numpy as to how long a string is being stored...this isn't a problem for immutable strings, and less of a problem for HDF, as you can determine how many bytes you need before you write the file (or does HDF support var-length elements?) There is an HDF5 variable-length type, which we currently read and write as Python str objects (using NumPy's object type). But HDF5 additionally has a fixed-storage-width UTF8 type, so we could map to a NumPy fixed-storage-width type trivially. When determining the HDF5 data type, unfortunately all you have to go on is the NumPy dtype... 
creating an HDF5 dataset is done separately from writing the data. > "custom"? it would be an encoding operation -- which you'd need to go from utf-8 to/from unicode anyway. So you would lose the ability to have a nice 1:1 binary representation map between numpy and HDF... good argument for ASCII, I guess. Or for HDF to use latin-1 ;-) "Custom" in this context means a user-created HDF5 data-conversion filter, which is necessary since all data conversion is handled inside the HDF5 library. We've written several for things like the NumPy bool type, etc: https://github.com/h5py/h5py/blob/master/h5py/_conv.pyx As far as generic Unicode goes, we currently don't support the NumPy "U" dtype in h5py for similar reasons; there's no destination type in HDF5 which (1) would preserve the dtype for round-trip write/read operations and (2) doesn't risk truncation. A Latin-1 based 'a' type would have similar problems. > Does HDF enforce ascii-only? what does it do with the > 127 values? Unfortunately/fortunately the charset is not enforced for either ASCII or UTF-8, although the HDF Group has been thinking about it. > that's the whole issue with UTF-8 -- it needs to be addressed somewhere, and the numpy-HDF interface seems like a smarter place to put it than the numpy-user interface! I agree fixed-storage-width UTF-8 is likely too complex to use as a native NumPy type. Ideally, NumPy would support variable-length strings, in which case all these headaches would go away. But I imagine that's also somewhat complicated. :) Andrew From chris.barker at noaa.gov Tue Jul 15 16:45:41 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 15 Jul 2014 13:45:41 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: <1405423590.8281.7.camel@sebastian-t440> References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: On Tue, Jul 15, 2014 at 4:26 AM, Sebastian Berg wrote: > Just wondering, couldn't we have a type which actually has an > (arbitrary, python supported) encoding (and "bytes" might even just be a > special case of no encoding)? well, then we're back to the core issue here: numpy dtypes need to be a pre-specified length encoded bytes are an arbitrary length. This leads us to wanting to use only fixed-number-of-bytes-per-character encodings: - ascii - latin-a - UCS-4 (or UTF-32..I get a bit confused about the names) maybe UCS-2 (NOT UTF-16) would be worth considering, for a compromise between space and fraction of unicode supported. Basically storing bytes and on access do > element[i].decode(specified_encoding) and on storing element[i] = > value.encode(specified_encoding). > this really doesn't seem that different than just using python strings -- is there a point to having a pointer-to-python-string type as a less generalized version of the currently possible python strings in object arrays? There is always the never ending small issue of trailing null bytes. If > we want to be fully compatible, such a type would have to store the > string length explicitly to support trailing null bytes. > are null bytes legal (as something other than a terminator) in some encodings? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
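To make the width bookkeeping in the discussion above concrete, here is a small Python 3 sketch (nothing beyond current numpy is assumed, and the sample string is only an illustration) of why a fixed byte width tracks character count under latin-1 but not under utf-8:

import numpy as np

s = "degr\xe9"                           # "degre" with an accent: 5 characters
print(len(s), len(s.encode("utf-8")))    # 5 6  -- utf-8 needs 2 bytes for the accented char
print(len(s), len(s.encode("latin-1")))  # 5 5  -- latin-1 stays at 1 byte per character

# An "S6" field therefore holds any 6 latin-1 characters, but only some
# 6-character strings once utf-8 encoded -- the pre-allocation/truncation
# mismatch described above. The workaround available today is to keep
# bytes in 'S' and decode on access:
arr = np.array([s.encode("latin-1")], dtype="S6")
print(arr[0].decode("latin-1"))          # round-trips as long as the text fits in 6 bytes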
URL: From tsyu80 at gmail.com Wed Jul 16 00:37:13 2014 From: tsyu80 at gmail.com (Tony Yu) Date: Tue, 15 Jul 2014 23:37:13 -0500 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` Message-ID: Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail: np.testing.assert_allclose(0, 1e-14) Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior. https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. Thanks, -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Jul 16 03:06:07 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 16 Jul 2014 09:06:07 +0200 Subject: [Numpy-discussion] __numpy_ufunc__ In-Reply-To: References: Message-ID: On Mon, Jul 14, 2014 at 8:22 PM, Charles R Harris wrote: > Hi All, > > Julian has raised the question of including numpy_ufunc in numpy 1.9. I > don't feel strongly one way or the other, but it doesn't seem to be > finished yet and 1.10 might be a better place to work out the remaining > problems along with the astropy folks testing possible uses. > > Thoughts? > It's already in, so do you mean not using? Would help to know what the issue is, because it's finished enough that it's already used in a released version of scipy (in sparse matrices). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jul 16 04:07:40 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Jul 2014 09:07:40 +0100 Subject: [Numpy-discussion] __numpy_ufunc__ In-Reply-To: References: Message-ID: Weirdly, I never received Chuck's original email in this thread. Should some list admin be informed? I also am not sure what/where Julian's comments were, so I second the call for context :-). Putting it off until 1.10 doesn't seem like an obviously bad idea to me, but specifics would help... (__numpy_ufunc__ is the new system for allowing arbitrary third party objects to override how ufuncs are applied to them, i.e. it means np.sin(sparsemat) and np.sin(gpuarray) can be defined to do something sensible. Conceptually it replaces the old __array_prepare__/__array_wrap__ system, which was limited to ndarray subclasses and has major limits on what you can do. Of course __array_prepare/wrap__ will also continue to be supported for compatibility.) -n On 16 Jul 2014 00:10, "Benjamin Root" wrote: > Perhaps a bit of context might be useful? How is numpy_ufunc different > from the ufuncs that we know and love? What are the known implications? > What are the known shortcomings? Are there ABI and/or API concerns between > 1.9 and 1.10? > > Ben Root > > > On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> Julian has raised the question of including numpy_ufunc in numpy 1.9. 
I >> don't feel strongly one way or the other, but it doesn't seem to be >> finished yet and 1.10 might be a better place to work out the remaining >> problems along with the astropy folks testing possible uses. >> >> Thoughts? >> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jul 16 06:48:09 2014 From: toddrjen at gmail.com (Todd) Date: Wed, 16 Jul 2014 12:48:09 +0200 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Jul 16, 2014 11:43 AM, "Chris Barker" wrote: > So numpy should have dtypes to match these. We're a bit stuck, however, because 'S' mapped to the py2 string type, which no longer exists in py3. Sorry not running py3 to see what 'S' does now, but I know it's bit broken, and may be too late to change it In py3 a 'S' dtype is converted to a python bytes object. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Wed Jul 16 09:18:44 2014 From: chaoyuejoy at gmail.com (Chao YUE) Date: Wed, 16 Jul 2014 15:18:44 +0200 Subject: [Numpy-discussion] Rounding float to integer while minizing the difference between the two arrays? Message-ID: Dear all, I have two arrays with both float type, let's say X and Y. I want to round the X to integers (intX) according to some decimal threshold, at the same time I want to limit the following difference as small: diff = np.sum(X*Y) - np.sum(intX*Y) I don't have to necessarily minimize the "diff" variable (If with this demand the computation time is too long). But I would like to limit the "diff" to, let's say ten percent within np.sum(X*Y). I have tried to write some functions, but I don't know where to start the opitimization. def convert_integer(x,threshold=0): """ This fucntion converts the float number x to integer according to the threshold. """ if abs(x-0) < 1e5: return 0 else: pdec,pint = math.modf(x) if pdec > threshold: return int(math.ceil(pint)+1) else: return int(math.ceil(pint)) def convert_arr(arr,threshold=0): out = arr.copy() for i,num in enumerate(arr): out[i] = convert_integer(num,threshold=threshold) return out In [147]: convert_arr(np.array([0.14,1.14,0.12]),0.13) Out[147]: array([1, 2, 0]) Now my problem is, how can I minimize or limit the following? diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y) Because it's the first time I encounter such kind of question, so please give me some clue to start :p Thanks a lot in advance. Best, Chao -- please visit: http://www.globalcarbonatlas.org/ *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... 
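For reference while reading the replies that follow, the defaults Tony describes can be written out as a short sketch (default tolerances as of numpy 1.8; only documented numpy and numpy.testing calls are used):

import numpy as np

print(np.allclose(0, 1e-14))                 # True: allclose defaults to rtol=1e-5, atol=1e-8

try:
    np.testing.assert_allclose(0, 1e-14)     # assert_allclose defaults to rtol=1e-7, atol=0
except AssertionError:
    print("fails: 1e-14 exceeds atol + rtol*|desired| = 0 + 1e-7*1e-14")

np.testing.assert_allclose(0, 1e-14, atol=1e-8)   # passes once atol is given explicitly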
URL: From njs at pobox.com Wed Jul 16 09:52:31 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 16 Jul 2014 14:52:31 +0100 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On 16 Jul 2014 10:26, "Tony Yu" wrote: > > Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail: > > np.testing.assert_allclose(0, 1e-14) > > Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior. > > https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf > > It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal. What you say makes sense to me, and loosening the default tolerances won't break any existing tests. (And I'm not too worried about people who were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that much to you exactly what tolerance you test, you should be setting the tolerance explicitly!) I vote that unless someone comes up with some terrible objection in the next few days then you should submit a PR :-) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Wed Jul 16 10:01:45 2014 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Wed, 16 Jul 2014 10:01:45 -0400 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: On Tue, Jul 15, 2014 at 11:15 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > > On Tue, Jul 15, 2014 at 5:26 AM, Sebastian Berg < > sebastian at sipsolutions.net> wrote: > >> On Sa, 2014-07-12 at 12:17 -0500, Charles R Harris wrote: >> > As previous posts have pointed out, Numpy's `S` type is currently >> > treated as a byte string, which leads to more complicated code in >> > python3. OTOH, the unicode type is stored as UCS4, which consumes a >> > lot of space, especially for ascii strings. This note proposes to >> > adapt the currently existing 'a' type letter, currently aliased to >> > 'S', as a new fixed encoding dtype. Python 3.3 introduced two one byte >> > internal representations for unicode strings, ascii and latin1. Ascii >> > has the advantage that it is a subset of UTF-8, whereas latin1 has a >> > few more symbols. Another possibility is to just make it an UTF-8 >> > encoding, but I think this would involve more overhead as Python would >> > need to determine the maximum character size. These are just >> > preliminary thoughts, comments are welcome. >> > >> >> Just wondering, couldn't we have a type which actually has an >> (arbitrary, python supported) encoding (and "bytes" might even just be a >> special case of no encoding)? Basically storing bytes and on access do >> element[i].decode(specified_encoding) and on storing element[i] = >> value.encode(specified_encoding). >> >> There is always the never ending small issue of trailing null bytes. If >> we want to be fully compatible, such a type would have to store the >> string length explicitly to support trailing null bytes. >> > > UTF-8 encoding works with null bytes. 
That is one of the reasons it is so > popular. > > > Thinking more about it, the easiest thing to do might be to make the S > dtype a UTF-8 encoding. Most of the machinery to deal with that is already > in place. That change might affect some users though, and we might need to > do some work to make it backwards compatible with python 2. > > Chuck Are you saying that numpy S dtypes would be exported to Py3 as str? This would work in my use case, though it seems it would break things for the (few-ish) people using the numpy S type in Py3 since it would now look like a Python str instead of bytes object. One other thought is that one *might* finesse the fixed width vs. utf-8 variable length issue by using the exact same rules that currently apply to strings in Py2: - When setting an array from input like a list of strings (unicode in Py3), make the array wide enough to handle the widest (in bytes) entry. - When setting an element in an existing array, truncate any characters that don't fit in the existing width. In the second point note that the truncation would be full unicode characters, not bytes. This could be a point of confusion in some cases, but it's simple to implement and formally consistent with current behavior. - Tom p.s. Strangely enough the mail I quoted from Chuck beginning with "Thinking about it more .." never got to my email and I only happened to have seen it in the archives. > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Wed Jul 16 11:26:05 2014 From: chaoyuejoy at gmail.com (Chao YUE) Date: Wed, 16 Jul 2014 17:26:05 +0200 Subject: [Numpy-discussion] Rounding float to integer while minizing the difference between the two arrays? In-Reply-To: References: Message-ID: Sorry, there is one error in this part of code, it should be: def convert_integer(x,threshold=0): """ This fucntion converts the float number x to integer according to the threshold. """ if abs(x-0) < 1e-5: return 0 else: pdec,pint = math.modf(x) if pdec > threshold: return int(math.ceil(pint)+1) else: return int(math.ceil(pint)) On Wed, Jul 16, 2014 at 3:18 PM, Chao YUE wrote: > Dear all, > > I have two arrays with both float type, let's say X and Y. I want to round > the X to integers (intX) according to some decimal threshold, at the same > time I want to limit the following difference as small: > > diff = np.sum(X*Y) - np.sum(intX*Y) > > I don't have to necessarily minimize the "diff" variable (If with this > demand the computation time is too long). But I would like to limit the > "diff" to, let's say ten percent within np.sum(X*Y). > > I have tried to write some functions, but I don't know where to start the > opitimization. > > def convert_integer(x,threshold=0): > """ > This fucntion converts the float number x to integer according to the > threshold. > """ > if abs(x-0) < 1e5: > return 0 > else: > pdec,pint = math.modf(x) > if pdec > threshold: > return int(math.ceil(pint)+1) > else: > return int(math.ceil(pint)) > > def convert_arr(arr,threshold=0): > out = arr.copy() > for i,num in enumerate(arr): > out[i] = convert_integer(num,threshold=threshold) > return out > > In [147]: > convert_arr(np.array([0.14,1.14,0.12]),0.13) > > Out[147]: > array([1, 2, 0]) > > Now my problem is, how can I minimize or limit the following? 
> diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y) > > Because it's the first time I encounter such kind of question, so please > give me some clue to start :p Thanks a lot in advance. > > Best, > > Chao > > -- > please visit: > http://www.globalcarbonatlas.org/ > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > -- please visit: http://www.globalcarbonatlas.org/ *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Wed Jul 16 13:16:13 2014 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 16 Jul 2014 20:16:13 +0300 Subject: [Numpy-discussion] __numpy_ufunc__ and 1.9 release In-Reply-To: <53C56DA2.40402@googlemail.com> References: <53C56DA2.40402@googlemail.com> Message-ID: <53C6B35D.9020609@iki.fi> Hi, 15.07.2014 21:06, Julian Taylor kirjoitti: [clip: __numpy_ufunc__] > So I'm wondering if we should delay the introduction of this > feature to 1.10 or is it important enough to wait until there is a > consensus on the remaining issues? My 10c: The feature is not so much in hurry that it alone should delay 1.9. Moreover, it's best for everyone that it is bug-free on the first go, and it gets some real-world testing before the release. Better safe than sorry. I'd pull it out from 1.9.x branch, and iron out the remaining wrinkles before 1.10. Pauli From aldcroft at head.cfa.harvard.edu Wed Jul 16 13:32:44 2014 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Wed, 16 Jul 2014 13:32:44 -0400 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 6:48 AM, Todd wrote: > On Jul 16, 2014 11:43 AM, "Chris Barker" wrote: > > So numpy should have dtypes to match these. We're a bit stuck, however, > because 'S' mapped to the py2 string type, which no longer exists in py3. > Sorry not running py3 to see what 'S' does now, but I know it's bit broken, > and may be too late to change it > > In py3 a 'S' dtype is converted to a python bytes object. > As a slightly philosophical aside, at some point during Scipy, Nick Coghlan said that the core Python team had stopped recommending the use of `from __future__ import unicode_literals` for Python 2 / 3 compatible code. I have some experience now with writing 2 / 3 code for astropy and I came to the same conclusion. The point is that `str` is the "natural" text class that is used by default for both 2 and 3. Most scientific Py2 code is written to this model. Following this to the Py3 end, that would imply that the most natural convention for numpy S dtype in Py3 would be that it gets to Python as a utf-8 `str`, as Chuck suggested. I think the variable-length encoding issue is not such a problem if you follow the existing numpy convention of truncating overflowing strings on assignment. 
Using utf-8 like this would (I think) make most Py2 code that uses HDF5 and FITS ASCII string data just work out of the box on Py3, which would be super. - Tom > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Jul 16 14:47:24 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 16 Jul 2014 20:47:24 +0200 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu wrote: > Is there any reason why the defaults for `allclose` and `assert_allclose` > differ? This makes debugging a broken test much more difficult. More > importantly, using an absolute tolerance of 0 causes failures for some > common cases. For example, if two values are very close to zero, a test > will fail: > > np.testing.assert_allclose(0, 1e-14) > > Git blame suggests the change was made in the following commit, but I > guess that change only reverted to the original behavior. > > > https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf > Indeed, was reverting a change that crept into https://github.com/numpy/numpy/commit/f527b49a > > It seems like the defaults for `allclose` and `assert_allclose` should > match, and an absolute tolerance of 0 is probably not ideal. I guess this > is a pretty big behavioral change, but the current default for > `assert_allclose` doesn't seem ideal. > I agree, current behavior quite annoying. It would make sense to change the atol default to 1e-8, but technically it's a backwards compatibility break. Would probably have a very minor impact though. Changing the default for rtol in one of the functions may be much more painful though, I don't think that should be done. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Jul 16 14:53:32 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 16 Jul 2014 20:53:32 +0200 Subject: [Numpy-discussion] __numpy_ufunc__ In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 10:07 AM, Nathaniel Smith wrote: > Weirdly, I never received Chuck's original email in this thread. Should > some list admin be informed? > Also weirdly, my reply didn't show up on gmane. Not sure if it got through, so re-sending: It's already in, so do you mean not using? Would help to know what the issue is, because it's finished enough that it's already used in a released version of scipy (in sparse matrices). Ralf I also am not sure what/where Julian's comments were, so I second the call > for context :-). Putting it off until 1.10 doesn't seem like an obviously > bad idea to me, but specifics would help... > > (__numpy_ufunc__ is the new system for allowing arbitrary third party > objects to override how ufuncs are applied to them, i.e. it means > np.sin(sparsemat) and np.sin(gpuarray) can be defined to do something > sensible. Conceptually it replaces the old __array_prepare__/__array_wrap__ > system, which was limited to ndarray subclasses and has major limits on > what you can do. Of course __array_prepare/wrap__ will also continue to be > supported for compatibility.) > -n > On 16 Jul 2014 00:10, "Benjamin Root" wrote: > >> Perhaps a bit of context might be useful? How is numpy_ufunc different >> from the ufuncs that we know and love? 
What are the known implications? >> What are the known shortcomings? Are there ABI and/or API concerns between >> 1.9 and 1.10? >> >> Ben Root >> >> >> On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> Julian has raised the question of including numpy_ufunc in numpy 1.9. I >>> don't feel strongly one way or the other, but it doesn't seem to be >>> finished yet and 1.10 might be a better place to work out the remaining >>> problems along with the astropy folks testing possible uses. >>> >>> Thoughts? >>> >>> Chuck >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jul 16 16:51:39 2014 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Wed, 16 Jul 2014 13:51:39 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: <-4597269384285942771@unknownmsgid> > But HDF5 > additionally has a fixed-storage-width UTF8 type, so we could map to a > NumPy fixed-storage-width type trivially. Sure -- this is why *nix uses utf-8 for filenames -- it can just be a char*. But that just punts the problem to client code. I think a UTF-8 string type does not match the numpy model well, and I don't think we should support it just because it would be easier for the HDF 5 wrappers. ( to be fair, there are probably other similar systems numpy wants to interface with that cod use this...) It seems if you want a 1:1 binary mapping between HDF and numpy for utf strings, then a bytes type in numpy makes more sense. Numpy could/should have encode and decode methods for converting byte arrays to/from Unicode arrays (does it already? ). > "Custom" in this context means a user-created HDF5 data-conversion > filter, which is necessary since all data conversion is handled inside > the HDF5 library. > As far as generic Unicode goes, we currently don't support the NumPy > "U" dtype in h5py for similar reasons; there's no destination type in > HDF5 which (1) would preserve the dtype for round-trip write/read > operations and (2) doesn't risk truncation. It sounds to like HDF5 simply doesn't support Unicode. Calling an array of bytes utf-8 simple pushes the problem on to client libs. As that's where the problem lies, then the PyHDF may be the place to address it. If we put utf-8 in numpy, we have the truncation problem there instead -- which is exactly what I think we should avoid. > A Latin-1 based 'a' type > would have similar problems. Maybe not -- latin1 is fixed width. >> Does HDF enforce ascii-only? what does it do with the > 127 values? > > Unfortunately/fortunately the charset is not enforced for either ASCII So you can dump Latin-1 into and out of the HDF 'ASCII' type -- it's essentially the old char* / py2 string. An ugly situation, but why not use it? > or UTF-8, So ASCII and utf-8 are really the same thing, with different meta-data... > although the HDF Group has been thinking about it. 
I wonder if they would consider going Latin-1 instead of ASCII -- similarly to utf-8 it's backward compatible with ASCII, but gives you a little more. I don't know that there is another 1byte encoding worth using -- it maybe be my English bias, but it seems Latin-1 gives us ASCII+some extra stuff handy for science ( I use the degree symbol a lot, for instance) with nothing lost. > Ideally, NumPy would support variable-length > strings, in which case all these headaches would go away. Would they? That would push the problem back to PyHDF -- which I'm arguing is where it belongs, but I didn't think you were ;-) > > But I > imagine that's also somewhat complicated. :) That's a whole other kettle of fish, yes. -Chris From chaoyuejoy at gmail.com Wed Jul 16 16:59:32 2014 From: chaoyuejoy at gmail.com (Chao YUE) Date: Wed, 16 Jul 2014 22:59:32 +0200 Subject: [Numpy-discussion] Rounding float to integer while minizing the difference between the two arrays? In-Reply-To: References: Message-ID: Dear all, A bit sorry, this is not difficult. scipy.optimize.minimize_scalar seems to solve my problem. Thanks anyway, for this great tool. Cheers, Chao On Wed, Jul 16, 2014 at 3:18 PM, Chao YUE wrote: > Dear all, > > I have two arrays with both float type, let's say X and Y. I want to round > the X to integers (intX) according to some decimal threshold, at the same > time I want to limit the following difference as small: > > diff = np.sum(X*Y) - np.sum(intX*Y) > > I don't have to necessarily minimize the "diff" variable (If with this > demand the computation time is too long). But I would like to limit the > "diff" to, let's say ten percent within np.sum(X*Y). > > I have tried to write some functions, but I don't know where to start the > opitimization. > > def convert_integer(x,threshold=0): > """ > This fucntion converts the float number x to integer according to the > threshold. > """ > if abs(x-0) < 1e5: > return 0 > else: > pdec,pint = math.modf(x) > if pdec > threshold: > return int(math.ceil(pint)+1) > else: > return int(math.ceil(pint)) > > def convert_arr(arr,threshold=0): > out = arr.copy() > for i,num in enumerate(arr): > out[i] = convert_integer(num,threshold=threshold) > return out > > In [147]: > convert_arr(np.array([0.14,1.14,0.12]),0.13) > > Out[147]: > array([1, 2, 0]) > > Now my problem is, how can I minimize or limit the following? > diff = np.sum(X*Y) - np.sum(convert_arr(X,threshold=?)*Y) > > Because it's the first time I encounter such kind of question, so please > give me some clue to start :p Thanks a lot in advance. > > Best, > > Chao > > -- > please visit: > http://www.globalcarbonatlas.org/ > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > -- please visit: http://www.globalcarbonatlas.org/ *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... 
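A rough sketch of the approach Chao describes, with a vectorized stand-in for his convert_arr helper and made-up X, Y data (both are assumptions, not code from this thread). Note that the objective is piecewise constant in the threshold, so a bounded scalar search is only a heuristic; scanning the distinct fractional parts of X would be exhaustive:

import numpy as np
from scipy.optimize import minimize_scalar

def convert_arr(arr, threshold=0.0):
    # round up when the fractional part exceeds the threshold
    # (nonnegative input assumed; the near-zero special case is dropped)
    pdec, pint = np.modf(arr)
    return np.where(pdec > threshold, pint + 1, pint).astype(int)

def objective(threshold, X, Y):
    return abs(np.sum(X * Y) - np.sum(convert_arr(X, threshold) * Y))

rng = np.random.RandomState(0)
X, Y = rng.rand(100) * 5, rng.rand(100)

res = minimize_scalar(objective, bounds=(0.0, 1.0), args=(X, Y), method="bounded")
print(res.x, objective(res.x, X, Y))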
URL: From jtaylor.debian at googlemail.com Wed Jul 16 18:20:40 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 17 Jul 2014 00:20:40 +0200 Subject: [Numpy-discussion] parallel distutils extensions build? use gcc -flto Message-ID: <53C6FAB8.40107@googlemail.com> hi, I have been playing around a bit with gccs link time optimization feature and found that using it actually speeds up a from scratch build of numpy due to its ability to perform parallel optimization and linking. As a bonus you also should get faster binaries due to the better optimizations lto allows. As compiling with lto does require some possibly lesser know details I wanted to share it. Prerequesits are a working gcc toolchain of at least gcc-4.8 and binutils > 2.21, gcc 4.9 is better as its faster. First of all numpy checks the long double representation by compiling a file and looking at the binary, this won't work as the od -b reimplementation here does not understand lto objects, so on x86 we must short circuit that: --- a/numpy/core/setup_common.py +++ b/numpy/core/setup_common.py @@ -174,6 +174,7 @@ def check_long_double_representation(cmd): # We need to use _compile because we need the object filename src, object = cmd._compile(body, None, None, 'c') try: + return 'IEEE_DOUBLE_LE' type = long_double_representation(pyod(object)) return type finally: Next we build numpy as usual but override the compiler, linker and ar to add our custom flags. The setup.py call would look like this: CC='gcc -fno-fat-lto-objects -flto=4 -fuse-linker-plugin -O3' \ LDSHARED='gcc -fno-fat-lto-objects -flto=4 -fuse-linker-plugin -shared -O3' AR=gcc-ar \ python setup.py build_ext Some explanation: The ar override is needed as numpy builds a static library and ar needs to know about lto objects. gcc-ar does exactly that. -flto=4 the main flag tell gcc to perform link time optimizations using 4 parallel processes. -fno-fat-lto-objects tells gcc to only build lto objects, normally it builds both an lto object and a normal object for toolchain compatibilty. If our toolchain can handle lto objects this is just a waste of time and we skip it. (The flag is default in gcc-4.9 but not 4.8) -fuse-linker-plugin directs it to run its link time optimizer plugin in the linking step, the linker must support plugins, both bfd (> 2.21) and gold linker do so. This allows for more optimizations. -O3 has to be added to the linker too as thats where the optimization occurs. In general a problem with lto is that the compiler options of all steps much match the flags used for linking. If you are using c++ or gfortran you also have to override that to use lto (CXX and FF(?)) See https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html for a lot more details. For some numbers on my machine a from scratch numpy build with no caching takes 1min55s, with lto on 4 it only takes 55s. Pretty neat for a much more involved optimization process. Concerning the speed gain we get by this, I ran our benchmark suite with this build, there were no really significant gains which is somewhat expected as numpy is simple C code with most function bottlenecks already inlined. So conclusion: flto seems to work well with recent gccs and allows for faster builds using the limited distutils. While probably not useful for development where compiler caching (ccache) is of utmost importance it is still interesting for projects doing one shot uncached builds (travis like CI) and have huge objects (e.g. 
swig or cython) and don't want to change to proper parallel build systems like bento. PS: So far I know clang also supports lto but I never used it PPS: using NPY_SEPARATE_COMPILATION=0 crashes gcc-4.9, time for a bug report. Cheers, Julian From fperez.net at gmail.com Wed Jul 16 23:08:58 2014 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 16 Jul 2014 20:08:58 -0700 Subject: [Numpy-discussion] Numpy BoF at SciPy 2014 - quick report Message-ID: Hi all, sorry for not posting earlier, post-conference InboxInfinity blues and all that... The BoF did go as planned, and it was a good discussion, mostly following the tentative agenda outlined here: https://github.com/numpy/numpy/wiki/Numpy-BoF-at-Scipy-2014 Various folks were kind enough to take notes during the conversation on an Etherpad instance: https://scipy2014.etherpad.mozilla.org/35 For the sake of completeness and future reference, below I'm including a copy of the notes in this email. Other than what's in the notes, my take home from the discussion is mostly that: - we probably needed a longer slot than 45 minutes to have a chance to dig in a little deeper. - it would have been more productive if a focused numpy sprint had been also planned, so that there could be more structured follow-up on the ideas that came up. It would be great to hear from others who were present at the conference. In particular, Chris Barker brought up a number of things regarding datetime and planned on following up during the sprints, but I'm not sure what ended up happening. Thanks to everyone who participated! Cheers f #### Copy of Etherpad notes as of 7/16/2014: Notes from BoF: 1:30, July 19, 2014 Working with topics on this page: https://github.com/numpy/numpy/wiki/Numpy-BoF-at-Scipy-2014 chuck: where do we go from here? -- what is the role of numpy now? Generalized ufuncs -- still some more to do -- (LA stuff - norms) - some ufuncs don't impliment array interface -- which are those -- sprint topic? - zeros_like, ones_like, more... (duplicate) github issue: https://github.com/numpy/numpy/issues/4862 Here's the original issue: https://github.com/numpy/numpy/issues/3602 Implementation of @ (matrix multiplication) - will be in 3.5 ~ 18months - no work started yet -- have to make sure we do it. - @@ was not added. - The PEP for numpy is well-defined. Not much thinking to be done. (Good for a sprint) Datetime: - Can it be done? -- too many calendars -- to many time scales, etc. - Can we cover most applications? - DynND -- higher abstraction -- convert to back end implimentation - Also look at what R and Julia do? - Maybe fix up the little issues in datetime64, first? - Pandas does not use numpy machinery - uses a array of objects: those objects are subclassed form datetime.datetime - does use int64, but gets unboxed on storage. - Root cause is using UTC, rather than a naive time. - Naive is not associated with a time zone. Can be interpreted in any way. - Ripping out the locale timezone on I/O would help. - More often than not, using the locale timezone is not desired. - For example, many experimental data do not attach time zones. (Or wrong timezone) - Consider laboratory time (stopwatch rather than a clock). (timedelta) - The C++ committee is standardizing this. - A key feature which is missing, is being able to choose your epoch. New DTypes - Example: quad float types. A solution for missing values? Adding units support. - Record & structured arrays play around with dtypes. Needs to be easier to use these. - Improve documentation. 
- How to extend to support things like labeled arrays? - This is orthogonal to dtypes. - Would rather access time column instead of 3rd column. - Would provide a better foundation for pandas. - Key is to keep inputs simple. - Finish the DataArray push? - We are very closely there. It has been sitting there for a while. - If interested, talk at sprints on July 10. Missing values? - maybe improve masked array. - give up for now. Inheriting ndarray - introduces many bugs. - should discourage this, but make it easier to work with it. Dynd - The issues discussed so far were motivation for starting dynd - for example, a pluggable type system - adding a categorical type in numpy (at Continuum) broke lots. Easier in dynd. - Commitment for dynd is to give it a numpy-like API - Both need to evolve together. - Find ways to make things more uniform (in numpy) - Dynd is more an experimental phase, changing quickly. - Can we import dynd as np? - Not a goal. More exploratory in this phase. - Adding a layer like that at a later time would be good. Not there, yet. - Do not want to repeat py2->py3 debacle. - Buffer protocol: - Supported, but dynd extends it. - As a pure C++ library, goal is to freeze once stable so systems beyond Python can depend on it as a stable interface for working with array data. Boost::Python - Nothing official from numpy for using numpy arrays in C++ - Not prioritized. - Numpy has gotten better about namespace pollution? - It kind of works already. Talk to Mike Droettboom -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... URL: From joseph.martinot-lagarde at m4x.org Wed Jul 16 14:57:26 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Wed, 16 Jul 2014 20:57:26 +0200 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: <53C6CB16.2060503@m4x.org> Le 15/07/2014 18:18, Chris Barker a ?crit : > (or does HDF support var-length > elements?) > It does: http://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html From sebastian at sipsolutions.net Tue Jul 15 07:16:34 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 15 Jul 2014 13:16:34 +0200 Subject: [Numpy-discussion] Bug in np.cross for 2D vectors In-Reply-To: <1405416176.45058.YahooMailNeo@web133104.mail.ir2.yahoo.com> References: <1405416176.45058.YahooMailNeo@web133104.mail.ir2.yahoo.com> Message-ID: <1405422994.8281.1.camel@sebastian-t440> On Di, 2014-07-15 at 10:22 +0100, Neil Hodgson wrote: > Hi, > > We came across this bug while using np.cross on 3D arrays of 2D > vectors. Hi, which numpy version are you using? Until recently, the cross product simply did *not* work in a broadcasting manner (3d arrays of 2d vectors), it did something, but usually not the right thing. This is fixed in recent versions (not sure if 1.8 or only now with 1.9) - Sebastian > The first example shows the problem and we looked at the source for > np.cross and believe we found the bug - an unnecessary swapaxes when > returning the output (comment inserted in the code). 
> > Thanks > Neil > > # Example > > shape = (3,5,7,2) > > > # These are effectively 3D arrays (3*5*7) of 2D vectors > data1 = np.random.randn(*shape) > data2 = np.random.randn(*shape) > > > # The cross product of data1 and data2 should produce a (3*5*7) array > of scalars > cross_product_longhand = > data1[:,:,:,0]*data2[:,:,:,1]-data1[:,:,:,1]*data2[:,:,:,0] > print 'longhand output shape:',cross_product_longhand.shape # and it > does > > > cross_product_numpy = np.cross(data1,data2) > print 'numpy output shape:',cross_product_numpy.shape # It seems to > have transposed the last 2 dimensions > > > if (cross_product_longhand == np.transpose(cross_product_numpy, > (0,2,1))).all(): > print 'Unexpected transposition in numpy.cross (numpy version %s)'% > np.__version__ > > > # np.cross L1464 > if axis is not None: > axisa, axisb, axisc=(axis,)*3 > a = asarray(a).swapaxes(axisa, 0) > b = asarray(b).swapaxes(axisb, 0) > msg = "incompatible dimensions for cross product\n"\ > "(dimension must be 2 or 3)" > if (a.shape[0] not in [2, 3]) or (b.shape[0] not in [2, 3]): > raise ValueError(msg) > if a.shape[0] == 2: > if (b.shape[0] == 2): > cp = a[0]*b[1] - a[1]*b[0] > if cp.ndim == 0: > return cp > else: > ## WE SHOULD NOT SWAPAXES HERE! > ## For 2D vectors the first axis has been > > ## collapsed during the cross product > return cp.swapaxes(0, axisc) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Wed Jul 16 05:14:10 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 16 Jul 2014 11:14:10 +0200 Subject: [Numpy-discussion] __numpy_ufunc__ In-Reply-To: References: Message-ID: <1405502050.6657.0.camel@sebastian-t440> On Mi, 2014-07-16 at 09:07 +0100, Nathaniel Smith wrote: > Weirdly, I never received Chuck's original email in this thread. > Should some list admin be informed? > I send some mails yesterday and they never arrived... Not sure if it is a problem on my side or not. > I also am not sure what/where Julian's comments were, so I second the > call for context :-). Putting it off until 1.10 doesn't seem like an > obviously bad idea to me, but specifics would help... > > (__numpy_ufunc__ is the new system for allowing arbitrary third party > objects to override how ufuncs are applied to them, i.e. it means > np.sin(sparsemat) and np.sin(gpuarray) can be defined to do something > sensible. Conceptually it replaces the old > __array_prepare__/__array_wrap__ system, which was limited to ndarray > subclasses and has major limits on what you can do. Of course > __array_prepare/wrap__ will also continue to be supported for > compatibility.) > > -n > > On 16 Jul 2014 00:10, "Benjamin Root" wrote: > Perhaps a bit of context might be useful? How is numpy_ufunc > different from the ufuncs that we know and love? What are the > known implications? What are the known shortcomings? Are there > ABI and/or API concerns between 1.9 and 1.10? > > > Ben Root > > > > On Mon, Jul 14, 2014 at 2:22 PM, Charles R Harris > wrote: > Hi All, > > Julian has raised the question of including > numpy_ufunc in numpy 1.9. I don't feel strongly one > way or the other, but it doesn't seem to be finished > yet and 1.10 might be a better place to work out the > remaining problems along with the astropy folks > testing possible uses. > > > Thoughts? 
> > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Thu Jul 17 07:04:04 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jul 2014 12:04:04 +0100 Subject: [Numpy-discussion] Mailing list slowdown (was Re: __numpy_ufunc__) Message-ID: On 17 Jul 2014 11:51, "Sebastian Berg" wrote: > > On Mi, 2014-07-16 at 09:07 +0100, Nathaniel Smith wrote: > > Weirdly, I never received Chuck's original email in this thread. > > Should some list admin be informed? > > > > I send some mails yesterday and they never arrived... Not sure if it is > a problem on my side or not. I did eventually get Chuck's original message, but not until several days later. CC'ing postmaster at enthought.com in case they have some insight into what's going on! -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From hodgson.neil at yahoo.co.uk Wed Jul 16 16:25:44 2014 From: hodgson.neil at yahoo.co.uk (Neil Hodgson) Date: Wed, 16 Jul 2014 21:25:44 +0100 Subject: [Numpy-discussion] Bug in np.cross for 2D vectors Message-ID: <1405542344.31622.YahooMailNeo@web133105.mail.ir2.yahoo.com> > Hi, > > We came across this bug while using np.cross on 3D arrays of 2D vectors. > > What version of numpy are you using? This should already be solved in numpy > master, and be part of the 1.9 release. Here's the relevant commit, > although the code has been cleaned up a bit in later ones: > https://github.com/numpy/numpy/commit/b9454f50f23516234c325490913224c3a69fb122 > Jaime Yes, we are using 1.8 - sorry I should have checked! Thanks Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From hodgson.neil at yahoo.co.uk Thu Jul 17 08:06:45 2014 From: hodgson.neil at yahoo.co.uk (Neil Hodgson) Date: Thu, 17 Jul 2014 13:06:45 +0100 Subject: [Numpy-discussion] Bug in np.cross for 2D vectors In-Reply-To: <1405542344.31622.YahooMailNeo@web133105.mail.ir2.yahoo.com> References: <1405542344.31622.YahooMailNeo@web133105.mail.ir2.yahoo.com> Message-ID: <1405598805.57307.YahooMailNeo@web133104.mail.ir2.yahoo.com> > Hi, > > We came across this bug while using np.cross on 3D arrays of 2D vectors. > > What version of numpy are you using? This should already be solved in numpy > master, and be part of the 1.9 release. Here's the relevant commit, > although the code has been cleaned up a bit in later ones: > https://github.com/numpy/numpy/commit/b9454f50f23516234c325490913224c3a69fb122 > Jaime >Hi, > >which numpy version are you using? Until recently, the cross product >simply did *not* work in a broadcasting manner (3d arrays of 2d >vectors), it did something, but usually not the right thing. This is >fixed in recent versions (not sure if 1.8 or only now with 1.9) >- Sebastian Hi, I thought I replied, but I don't see it on the list, so here goes again... Yes, we are using 1.8, will confirm it's ok with 1.9 Thanks Neil -------------- next part -------------- An HTML attachment was scrubbed... 
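A condensed version of Neil's shape check, for anyone wanting to reproduce it (results as reported in this thread: consistent shapes on numpy >= 1.9, transposed last axes on 1.8):

import numpy as np

shape = (3, 5, 7, 2)                   # stacks of 2D vectors
a = np.random.randn(*shape)
b = np.random.randn(*shape)

longhand = a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]
cross = np.cross(a, b)                 # 2D inputs give the scalar z-component per vector pair

print(longhand.shape, cross.shape)     # both (3, 5, 7) with the broadcasting fix
print(np.allclose(longhand, cross))    # True on numpy >= 1.9 (on 1.8 the shapes already disagree)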
URL: From njs at pobox.com Thu Jul 17 11:37:24 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jul 2014 16:37:24 +0100 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 7:47 PM, Ralf Gommers wrote: > > On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu wrote: >> It seems like the defaults for `allclose` and `assert_allclose` should >> match, and an absolute tolerance of 0 is probably not ideal. I guess this is >> a pretty big behavioral change, but the current default for >> `assert_allclose` doesn't seem ideal. > > I agree, current behavior quite annoying. It would make sense to change the > atol default to 1e-8, but technically it's a backwards compatibility break. > Would probably have a very minor impact though. Changing the default for > rtol in one of the functions may be much more painful though, I don't think > that should be done. Currently we have: allclose: rtol=1e-5, atol=1e-8 assert_allclose: rtol=1e-7, atol=0 Why would it be painful to change assert_allclose to match allclose? It would weaken some tests, but no code would break. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Thu Jul 17 11:48:19 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jul 2014 16:48:19 +0100 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: On Tue, Jul 15, 2014 at 4:29 PM, Charles R Harris wrote: > Thinking more about it, the easiest thing to do might be to make the S dtype > a UTF-8 encoding. Most of the machinery to deal with that is already in > place. That change might affect some users though, and we might need to do > some work to make it backwards compatible with python 2. I'd be very concerned about backcompat for existing code that uses e.g. "S128" as a dtype to mean "128 arbitrary bytes". An example is this file format reading code: https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L123 The file format says there are 128 bytes there, and their interpretation depends on other fields in the header -- but in one case, for "large montages", there's an encoding where every 3 bytes represents 4 characters using an ad hoc 6-bit character set: https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L133 Perhaps this case could be handled better by using a u8 subarray or something (that code also goes to some efforts to work around nul padding), and that particular project hasn't been ported to py3 yet so technically wouldn't be affected if we changed the meaning of "S" on py3. But it does seem useful to have a "fixed length bytes" dtype even in py3, and if we declare that be "S" then it avoids breaking any existing code depending on it... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Thu Jul 17 11:52:59 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jul 2014 16:52:59 +0100 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Tue, Jul 15, 2014 at 7:40 PM, Aldcroft, Thomas wrote: > > On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith wrote: >> >> OTOH, fixed length nul padded latin1 would be useful for various flat file >> reading tasks. 
> > As one of the original agitators for this, let me re-iterate that what the > astronomical community *really* wants is the original proposal as described > by Chris Barker [1] and essentially what Charles said. We have large data > archives that have ASCII string data in binary formats like FITS and HDF5. > The current readers for those datasets present users with numpy S data > types, which in Python 3 cannot be compared to str (unicode) literals. In > many cases those datasets are large, and in my case I regularly deal with > multi-Gb sized bytestring arrays. Converting those to a U dtype is not > practical. This is feedback is *super* useful, thanks. Can you elaborate a bit more on your requirements? I get that: - You have data that is treated as text, so it is convenient to be able to use Python strings for things like equality tests, np.sum(arr == "green") etc. - Your data uses only ASCII characters, and you don't want to spend more than 1 byte of memory per character. Do you ever have 8 bit characters, and if so, what encoding do you use? Does it matter to you that the memory layout for these 1-byte-per-char strings remain fixed-width nul-padded concatenated strings (e.g., because you are mmap'ing files that have this format)? Or do FITS/HDF5 handle layout details internally and you don't care so long as the above requirements are met? Does the fixed-width nature of numpy strings cause problems in the above setting? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Thu Jul 17 12:11:11 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 17 Jul 2014 17:11:11 +0100 Subject: [Numpy-discussion] [SciPy-Dev] __numpy_ufunc__ and 1.9 release In-Reply-To: <53C56DA2.40402@googlemail.com> References: <53C56DA2.40402@googlemail.com> Message-ID: On Tue, Jul 15, 2014 at 7:06 PM, Julian Taylor wrote: > hi, > as you may know we want to release numpy 1.9 soon. We should have solved > most indexing regressions the first beta showed. > > The remaining blockers are finishing the new __numpy_ufunc__ feature. > This feature should allow for alternative method to overriding the > behavior of ufuncs from subclasses. > It is described here: > https://github.com/numpy/numpy/blob/master/doc/neps/ufunc-overrides.rst > > The current blocker issues are: > https://github.com/numpy/numpy/issues/4753 > https://github.com/numpy/numpy/pull/4815 > > I'm not to familiar with all the complications of subclassing so I can't > really say how hard this is to solve. > My issue is that it there still seems to be debate on how to handle > operator overriding correctly and I am opposed to releasing a numpy with > yet another experimental feature that may or may not be finished > sometime later. Having datetime in infinite experimental state is bad > enough. > I think nobody is served well if we release 1.9 with the feature > prematurely based on a not representative set of users and the later > after more users showed up see we have to change its behavior. > > So I'm wondering if we should delay the introduction of this feature to > 1.10 or is it important enough to wait until there is a consensus on the > remaining issues? -1 on delaying the release (but you knew I'd say that) I don't have a strong feeling about whether or not we should disable __numpy_ufunc__ for the 1.9 release based on those bugs. They don't seem obviously catastrophic to me, but you make a good point about datetime. I think it's your call as release manager... 
-n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Thu Jul 17 16:07:03 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 17 Jul 2014 16:07:03 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith wrote: > On 16 Jul 2014 10:26, "Tony Yu" wrote: > > > > Is there any reason why the defaults for `allclose` and > `assert_allclose` differ? This makes debugging a broken test much more > difficult. More importantly, using an absolute tolerance of 0 causes > failures for some common cases. For example, if two values are very close > to zero, a test will fail: > > > > np.testing.assert_allclose(0, 1e-14) > > > > Git blame suggests the change was made in the following commit, but I > guess that change only reverted to the original behavior. > > > > > https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf > > > > It seems like the defaults for `allclose` and `assert_allclose` should > match, and an absolute tolerance of 0 is probably not ideal. I guess this > is a pretty big behavioral change, but the current default for > `assert_allclose` doesn't seem ideal. > > What you say makes sense to me, and loosening the default tolerances won't > break any existing tests. (And I'm not too worried about people who were > counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that > much to you exactly what tolerance you test, you should be setting the > tolerance explicitly!) I vote that unless someone comes up with some > terrible objection in the next few days then you should submit a PR :-) > If you mean by this to add atol=1e-8 as default, then I'm against it. At least it will change the meaning of many of our tests in statsmodels. I'm using rtol to check for correct 1e-15 or 1e-30, which would be completely swamped if you change the default atol=0. Adding atol=0 to all assert_allclose that currently use only rtol is a lot of work. I think I almost never use a default rtol, but I often leave atol at the default = 0. If we have zeros, then I don't think it's too much work to decide whether this should be atol=1e-20, or 1e-8. Josef > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jul 17 16:21:33 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 17 Jul 2014 16:21:33 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Thu, Jul 17, 2014 at 4:07 PM, wrote: > > > > On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith wrote: > >> On 16 Jul 2014 10:26, "Tony Yu" wrote: >> > >> > Is there any reason why the defaults for `allclose` and >> `assert_allclose` differ? This makes debugging a broken test much more >> difficult. More importantly, using an absolute tolerance of 0 causes >> failures for some common cases. For example, if two values are very close >> to zero, a test will fail: >> > >> > np.testing.assert_allclose(0, 1e-14) >> > >> > Git blame suggests the change was made in the following commit, but I >> guess that change only reverted to the original behavior. 
>> > >> > >> https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf >> > >> > It seems like the defaults for `allclose` and `assert_allclose` should >> match, and an absolute tolerance of 0 is probably not ideal. I guess this >> is a pretty big behavioral change, but the current default for >> `assert_allclose` doesn't seem ideal. >> >> What you say makes sense to me, and loosening the default tolerances >> won't break any existing tests. (And I'm not too worried about people who >> were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters >> that much to you exactly what tolerance you test, you should be setting the >> tolerance explicitly!) I vote that unless someone comes up with some >> terrible objection in the next few days then you should submit a PR :-) >> > > If you mean by this to add atol=1e-8 as default, then I'm against it. > > At least it will change the meaning of many of our tests in statsmodels. > > I'm using rtol to check for correct 1e-15 or 1e-30, which would be > completely swamped if you change the default atol=0. > Adding atol=0 to all assert_allclose that currently use only rtol is a lot > of work. > I think I almost never use a default rtol, but I often leave atol at the > default = 0. > > If we have zeros, then I don't think it's too much work to decide whether > this should be atol=1e-20, or 1e-8. > Just to explain, p-values, sf of the distributions are usually accurate at 1e-30 or 1e-50 or something like that. And when we test the tails of the distributions we use that the relative error is small and the absolute error is "tiny". We would need to do a grep to see how many cases there actually are in scipy and statsmodels, before we change it because for some use cases we only get atol 1e-5 or 1e-7 (e.g. nonlinear optimization). Linear algebra is usually atol or rtol 1e-11 to 1e-14 in my cases, AFAIR. Josef > > Josef > > > >> -n >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jul 17 17:01:36 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 17 Jul 2014 17:01:36 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Thu, Jul 17, 2014 at 4:21 PM, wrote: > > > > On Thu, Jul 17, 2014 at 4:07 PM, wrote: > >> >> >> >> On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith wrote: >> >>> On 16 Jul 2014 10:26, "Tony Yu" wrote: >>> > >>> > Is there any reason why the defaults for `allclose` and >>> `assert_allclose` differ? This makes debugging a broken test much more >>> difficult. More importantly, using an absolute tolerance of 0 causes >>> failures for some common cases. For example, if two values are very close >>> to zero, a test will fail: >>> >> And one more comment: I debug "broken tests" pretty often. My favorites in pdb are np.max(np.abs(x - y)) and np.max(np.abs(x / y - 1)) to see how much I would have to adjust atol and rtol in assert_allclose in the tests to make them pass, and to decide whether this is an acceptable numerical difference or a bug. allclose doesn't tell me anything and I almost never use it. Josef > > >>> > np.testing.assert_allclose(0, 1e-14) >>> > >>> > Git blame suggests the change was made in the following commit, but I >>> guess that change only reverted to the original behavior. 
>>> > >>> > >>> https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf >>> > >>> > It seems like the defaults for `allclose` and `assert_allclose` >>> should match, and an absolute tolerance of 0 is probably not ideal. I guess >>> this is a pretty big behavioral change, but the current default for >>> `assert_allclose` doesn't seem ideal. >>> >>> What you say makes sense to me, and loosening the default tolerances >>> won't break any existing tests. (And I'm not too worried about people who >>> were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters >>> that much to you exactly what tolerance you test, you should be setting the >>> tolerance explicitly!) I vote that unless someone comes up with some >>> terrible objection in the next few days then you should submit a PR :-) >>> >> >> If you mean by this to add atol=1e-8 as default, then I'm against it. >> >> At least it will change the meaning of many of our tests in statsmodels. >> >> I'm using rtol to check for correct 1e-15 or 1e-30, which would be >> completely swamped if you change the default atol=0. >> Adding atol=0 to all assert_allclose that currently use only rtol is a >> lot of work. >> I think I almost never use a default rtol, but I often leave atol at the >> default = 0. >> >> If we have zeros, then I don't think it's too much work to decide whether >> this should be atol=1e-20, or 1e-8. >> > > Just to explain, p-values, sf of the distributions are usually accurate at > 1e-30 or 1e-50 or something like that. And when we test the tails of the > distributions we use that the relative error is small and the absolute > error is "tiny". > > We would need to do a grep to see how many cases there actually are in > scipy and statsmodels, before we change it because for some use cases we > only get atol 1e-5 or 1e-7 (e.g. nonlinear optimization). > Linear algebra is usually atol or rtol 1e-11 to 1e-14 in my cases, AFAIR. > > Josef > > >> >> Josef >> >> >> >>> -n >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Thu Jul 17 17:05:26 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 17 Jul 2014 14:05:26 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 3:48 AM, Todd wrote: > On Jul 16, 2014 11:43 AM, "Chris Barker" wrote: > > So numpy should have dtypes to match these. We're a bit stuck, however, > because 'S' mapped to the py2 string type, which no longer exists in py3. > Sorry not running py3 to see what 'S' does now, but I know it's bit broken, > and may be too late to change it > > In py3 a 'S' dtype is converted to a python bytes object. > right -- thanks. That's the source of the problems. A bit of a higher-level view of the issues at hand. Python has three relevant data types: A unicode type (unicode in py2, str in py3) A one-byte-per-char stringtype (py2 string) A bytes type The big problem is that py2 only has the unicode and py2string types, and py3 only has the unicode and bytes type. numpy has 'S' and 'U' types: which map naturally to the py2string and unicode types. but since py3 has no py2string type, we have a problem. If numpy were to embrace the py3 model, then 'S' should have mapped to py3's string, aka unicode. 
But: 1) then there would be no bytes type, which is a problem, as people do need to pass collections of bytes around. I've always figured numpy's uint8 should suffice for that, but "strings of bytes" are useful, and it seems to be awkward, or maybe impossible to construct such a beast with the usual dtype machinery 2) there is a need (or at least a desire), to have a compact, one-byte-per-character text type in numpy. Thinking of it in this framework leads me to the conclusion that numpy should have three types: 1) A unicode type --no change here 2) A bytes type -- almost the current 'S' type - A bytes type would map to/from py3 bytes objects (and py2 bytes objects, which are the same as py2strings) - one way it would differ from a py2str is that there would be no assumption of null-termination (not sure where that is now) 3) A one-byte-per-char text type -- more or less Chuck's current proposal. - it would map to/from the py3 string -- it is text after all - it would be null-terminated - it would have a one-byte per-char encoding: ascii, latin-1 or settable (TBA) It would be nice if numpy had built-in encoding/decoding to/from the unicode type to/from the bytes type (tricky due to not knowing how many bytes a given string will encode to without encoding it). Which leaves us with the decisions: * what does 'S' map to? - currently it's almost a bytes type, and maps to bytes in py3 -- so maybe keep that status quo. Except that it really doesn't act like text anymore, so 2 to 3 transition is kind of ugly, and the name is misleading. * what encoding to use for the one-byte-per-char text type? - I think latin-1 is the way to go -- you could use it like ascii if you want, but if you need a few other characters they are there. And you can even store binary data in it, though that's a "bad idea" anyway. - ascii would solve common use cases, but I see no reason to restrict folks to 127 characters -- you can use those if you like. If the binary data needs to get passed to something that really needs to be ascii-only, it could be checked at that point. - perhaps the best option is for client code to be able to choose an encoding -- but more code, maybe a more confusing interface? worth it? * Do we have a utf-8 type?: I think not -- it simply does not map to both unicode and numpy's fixed-length requirement. If all this gets done, we have some transition issues, but I think it would solve everyone's problems (though maybe not as cleanly as we'd like...). For instance, if someone needs to map numpy arrays to utf-8 data (i.e. HDF5), then they can either use the bytes type and let the user decode, or encode/decode to unicode on i/o. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From charles at crunch.io Thu Jul 17 18:10:14 2014 From: charles at crunch.io (Charles G. Waldman) Date: Thu, 17 Jul 2014 15:10:14 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: -1 on the 'arr' name. I think if we're going to support this function at all (which I'm not convinced is a good idea), it should be np.fromsomething like the other from* functions. Maybe frommatlab? I think that 'arr' is just too generic and too close to 'array'.
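For reference, a rough sketch of the spellings being compared -- np.mat is the existing function, while np.arr below is only the proposed name, not something that exists today:

import numpy as np

np.mat('1 2; 3 4')            # matrix([[1, 2], [3, 4]]) -- existing MATLAB-style string shorthand
np.array([[1, 2], [3, 4]])    # the spelled-out ndarray equivalent
# np.arr('1 2; 3 4')          # the proposed ndarray-returning shorthand (name still under debate)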
On Tue, Jul 15, 2014 at 3:55 AM, Nathaniel Smith wrote: > On Sun, Jul 13, 2014 at 6:31 PM, Alexander Belopolsky > wrote: > >> Also, the use of strings will confuse most syntax highlighters. Compare >> the two options in this screenshot: >> >> [image: Inline image 2] >> > > I guess this is a minor issue for "real" code, but even IPython doesn't > (yet?) provide syntax highlighting for lines as they're typed, and this is > a tool intended mainly for interactive use. > > That screenshot also I think illustrates why people have such a preference > for the first syntax. The second line looks nice, but try typing it quickly > and getting all the commas located correctly inside versus outside of each > of the triply-nested brackets... > > No-one's come up with any names for this that are nearly as good as "arr". > Is it really that bad to have to type one extra character, np.array instead > of np.arr? > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Screen Shot 2014-07-13 at 1.29.20 PM.png Type: image/png Size: 26129 bytes Desc: not available URL: From tsyu80 at gmail.com Fri Jul 18 00:33:49 2014 From: tsyu80 at gmail.com (Tony Yu) Date: Thu, 17 Jul 2014 23:33:49 -0500 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 1:47 PM, Ralf Gommers wrote: > > > > On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu wrote: > >> Is there any reason why the defaults for `allclose` and `assert_allclose` >> differ? This makes debugging a broken test much more difficult. More >> importantly, using an absolute tolerance of 0 causes failures for some >> common cases. For example, if two values are very close to zero, a test >> will fail: >> >> np.testing.assert_allclose(0, 1e-14) >> >> Git blame suggests the change was made in the following commit, but I >> guess that change only reverted to the original behavior. >> >> >> https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf >> > > Indeed, was reverting a change that crept into > https://github.com/numpy/numpy/commit/f527b49a > > >> >> It seems like the defaults for `allclose` and `assert_allclose` should >> match, and an absolute tolerance of 0 is probably not ideal. I guess this >> is a pretty big behavioral change, but the current default for >> `assert_allclose` doesn't seem ideal. >> > > I agree, current behavior quite annoying. It would make sense to change > the atol default to 1e-8, but technically it's a backwards compatibility > break. Would probably have a very minor impact though. Changing the default > for rtol in one of the functions may be much more painful though, I don't > think that should be done. > > Ralf > Thanks for the feedback. I've opened up a PR here: https://github.com/numpy/numpy/pull/4880 Best, -Tony -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rhl at astro.princeton.edu Thu Jul 17 09:48:16 2014 From: rhl at astro.princeton.edu (Robert Lupton the Good) Date: Thu, 17 Jul 2014 09:48:16 -0400 Subject: [Numpy-discussion] Numpy BoF at SciPy 2014 - quick report In-Reply-To: References: Message-ID: <3A0037EB-BF51-4943-8E27-EDDFC8F09456@astro.princeton.edu> Having just re-read the PEP I'm concerned that this proposal leaves at least one major (?) trap for naive users, namely x = np.array([1, 10]) print x.T @ x which will print 101, not [[1, 10], [10, 100]] Yes, I know why this is happening but it's still a problem -- the user said, "I'm thinking matrices" when they wrote @ but the x.T had done the "wrong" thing before the @ kicked in. And yes, a savvy user would have written x = np.array([[1, 10]]) (but then np.dot(x, x.T) isn't a scalar). This is the way things are at present, but with the new @ syntax coming in I think we should consider fixing it. I can think of three possibilities: 1. Leave this as a trap for the unwary, and a reason for people to stick to np.matrix (np.matrix([1, 10]) behaves "correctly") 2. Make x.T a syntax error for 1-D arrays. It's a no-op and IMHO a trap. 3. Make x.T promote the shape == (2,) array to (1, 2) and return a (2, 1) array. This may be too magic, but it's my preferred solution. R > Implementation of @ (matrix multiplication) > - will be in 3.5 ~ 18months > - no work started yet -- have to make sure we do it. > - @@ was not added. > - The PEP for numpy is well-defined. Not much thinking to be done. (Good for a sprint) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 495 bytes Desc: Message signed with OpenPGP using GPGMail URL: From sebastian at sipsolutions.net Fri Jul 18 04:03:59 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 18 Jul 2014 10:03:59 +0200 Subject: [Numpy-discussion] Numpy BoF at SciPy 2014 - quick report In-Reply-To: <3A0037EB-BF51-4943-8E27-EDDFC8F09456@astro.princeton.edu> References: <3A0037EB-BF51-4943-8E27-EDDFC8F09456@astro.princeton.edu> Message-ID: <1405670639.6974.4.camel@sebastian-t440> On Do, 2014-07-17 at 09:48 -0400, Robert Lupton the Good wrote: > Having just re-read the PEP I'm concerned that this proposal leaves at least one major (?) trap for naive users, namely > x = np.array([1, 10]) > print x.T @ x > which will print 101, not [[1, 10], [10, 100]] > > Yes, I know why this is happening but it's still a problem -- the user said, "I'm thinking matrices" when they wrote @ but the x.T had done the "wrong" thing before the @ kicked in. And yes, a savvy user would have written x = np.array([[1, 10]]) (but then np.dot(x, x.T) isn't a scalar). > > This is the way things are at present, but with the new @ syntax coming in I think we should consider fixing it. > > I can think of three possibilities: > 1. Leave this as a trap for the unwary, and a reason for people to stick to np.matrix (np.matrix([1, 10]) behaves "correctly") > 2. Make x.T a syntax error for 1-D arrays. It's a no-op and IMHO a trap. > 3. Make x.T promote the shape == (2,) array to (1, 2) and return a (2, 1) array. This may be too magic, but it's my preferred solution. > Making it a warning may be another option. Changing `.T` to promote to 2-d (also maybe to actually only transpose the last two axes for higher D arrays), could be nice, but getting there might take quite a long FutureWarning or even Error -> new feature cycle...
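For concreteness, a minimal sketch of the trap being described, written with np.dot since the @ operator itself only arrives with Python 3.5:

import numpy as np

x = np.array([1, 10])        # shape (2,); .T is a no-op on 1-d arrays
x.T.shape                    # -> (2,), unchanged
np.dot(x.T, x)               # -> 101, the inner product, not a matrix

row = np.array([[1, 10]])    # shape (1, 2); an explicit 2-d row vector
np.dot(row.T, row)           # -> array([[  1,  10],
                             #           [ 10, 100]])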
- Sebastian > R > > > Implementation of @ (matrix multiplication) > > - will be in 3.5 ~ 18months > > - no work started yet -- have to make sure we do it. > > - @@ was not added. > > - The PEP for numpy is well-defined. Not much thinking to be done. (Good for a sprint) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Fri Jul 18 06:31:13 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 18 Jul 2014 11:31:13 +0100 Subject: [Numpy-discussion] Numpy BoF at SciPy 2014 - quick report In-Reply-To: <1405670639.6974.4.camel@sebastian-t440> References: <3A0037EB-BF51-4943-8E27-EDDFC8F09456@astro.princeton.edu> <1405670639.6974.4.camel@sebastian-t440> Message-ID: On Fri, Jul 18, 2014 at 9:03 AM, Sebastian Berg wrote: > On Do, 2014-07-17 at 09:48 -0400, Robert Lupton the Good wrote: >> Having just re-read the PEP I'm concerned that this proposal leaves at least one major (?) trap for naive users, namely >> x = np.array([1, 10]) >> print X.T at x >> which will print 101, not [[1, 10], [10, 100]] >> >> Yes, I know why this is happening but it's still a problem -- the user said, "I'm thinking matrices" when they wrote @ but the x.T had done the "wrong" thing before the @ kicked in. And yes, a savvy user would have written x = np.ones([[1, 10]]) (but then np.dot(x, x.T) isn't a scalar). >> >> This is the way things are at present, but with the new @ syntax coming in I think we should consider fixing it. >> >> I can think of three possibilities: >> 1. Leave this as a trap for the unwary, and a reason for people to stick to np.matrix (np.matrix([1, 10]) behaves "correctly") >> 2. Make x.T a syntax error for 1-D arrays. It's a no-op and IMHO a trap. >> 3. Make x.T promote the shape == (2,) array to (1, 2) and return a (2, 1) array. This may be too magic, but it's my preferred solution. > > Making it a warning may be another option. Changing `.T` to promote to > 2-d (also maybe to actually only transpose the last two axes for higher > D arrays), could be nice, but getting there might take quite a long > FutureWarning or even Error -> new feature cycle... Hmm, just the other day I wrote some code that relies on the current behavior. I was writing a function that could work both on 3-vectors and arrays of 3-vectors. To unpack the input into the separate components, I did: x, y, z = vector.T Which works correctly whether `vector` is shaped (3,) or (N, 3). -- Robert Kern From njs at pobox.com Fri Jul 18 06:33:00 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 11:33:00 +0100 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Thu, Jul 17, 2014 at 10:05 PM, Chris Barker wrote: > A bit of a higher-level view of the issues at hand. > > Python has three relevant data types: > > A unicode type (unicode in py2, str in py3) > A one-byte-per-char stringtype (py2 string) > A bytes type > > The big problem is that py2 only has the unicode and py2string types, and > py3 only has the unicode and bytes type. > > numpy has 'S' and 'U' types: which map naturally to the py2string and > unicode types. > > but since py3 has no py2string type, we have a problem. > > If numpy were to embrace the py3 model, then 'S' should have mapped to py3's > string, aka unicode. > > But: > > 1) then there would be no bytes type, which is a problem, as people do need > to a pass collections of bytes around. 
I"ve alwyas figured numpy's uint8 > should suffice for that, but "strings of bytes" are useful, and it seem to > be awkward, or maybe impossible to construct such a beast with the usual > dtype machinery > > 2) there is a need (or at least a desire), to have a compact, > one-byte-per-charater text type in numpy. > > Thinking of it in this framework leads me to the conclusion that numpy > should have three types: This sounds pretty reasonable to me. > 1) A unicode type --no change here > > 2) A bytes types -- almost the current 'S' type > - A bytes type would map to/from py3 bytes objects (and py2 bytes > objects, which are the same as py2strings) > - one way is would differ from a py2str is that there would be no > assumption of null-termination (not sure where that is now) AFAICT this is *exactly* the same as the current 'S' type. What differences do you see? > 3) A one-byte-per-char text type -- more or less Chuck's current proposal. > - it would map to/from the py3 string -- it is text after all > - it would be null-terminated Numpy strings types are never null-terminated ATM. They're null-padded, which is slightly different. When storing data in an S5, for instance, strings of length 5 have no nulls appending, strings of length 4 have 1 null appended, strings of length 3 have 2 nulls appended, etc. When reading data out of an S5, then all trailing nulls are stripped. So, they may not be null terminated (if the length of the string exactly matches the length of the dtype), and the strings being stored can contain internal nulls ("foo\x00bar" is fine), but they cannot contain trailing nulls ("foo\x00" will come back as just "foo"). Do you actually care about null-termination specifically? Or did you just mean "it should work like the other ones, which I vaguely remember involves nulls"? ;-) > - it would have a one-byte per-char encoding: ascii, latin-1 or settable > (TBA) Settable is technically very difficult until we redo the dtype machinery to allow parametrized types. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Fri Jul 18 06:37:46 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 11:37:46 +0100 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: On Thu, Jul 17, 2014 at 11:10 PM, Charles G. Waldman wrote: > > -1 on the 'arr' name. I think if we're going to support this function at all (which I'm not convinced is a good idea), it should be np.fromsomething like the other from* functions. > > Maybe frommatlab? > > I think that 'arr' is just too generic and too close to 'array'. Well, it's definitely not a good idea if we name it something like that :-). The whole motivation is to provide a quick way to type 2d arrays interactively, hence the current name "np.mat". (The fact that it happens to match matlab syntax is a nice bonus, because stealing is always better than inventing when it works.) -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Fri Jul 18 06:44:49 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 11:44:49 +0100 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Thu, Jul 17, 2014 at 9:07 PM, wrote: > On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith wrote: >> What you say makes sense to me, and loosening the default tolerances won't >> break any existing tests. (And I'm not too worried about people who were >> counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that >> much to you exactly what tolerance you test, you should be setting the >> tolerance explicitly!) I vote that unless someone comes up with some >> terrible objection in the next few days then you should submit a PR :-) > > If you mean by this to add atol=1e-8 as default, then I'm against it. > > At least it will change the meaning of many of our tests in statsmodels. > > I'm using rtol to check for correct 1e-15 or 1e-30, which would be > completely swamped if you change the default atol=0. > Adding atol=0 to all assert_allclose that currently use only rtol is a lot > of work. > I think I almost never use a default rtol, but I often leave atol at the > default = 0. > > If we have zeros, then I don't think it's too much work to decide whether > this should be atol=1e-20, or 1e-8. This is a compelling use-case, but there are also lots of compelling usecases that want some non-zero atol (i.e., comparing stuff to 0). Saying that allclose is for one of those use cases and assert_allclose is for the other is... not a very felicitious API design, I think. So we really should do *something*. Are there really any cases where you want non-zero atol= that don't involve comparing something against a 'desired' value of zero? It's a little wacky, but I'm wondering if we ought to change the rule (for all versions of allclose) to if desired == 0: tol = atol else: tol = rtol * desired In particular, means that np.allclose(x, 1e-30) would reject x values of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Fri Jul 18 07:07:36 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 07:07:36 -0400 Subject: [Numpy-discussion] problems with mailing list ? Message-ID: Are the problems with sending out the messages with the mailing lists? I'm getting some replies without original messages, and in some threads I don't get replies, missing part of the discussions. Josef -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jul 18 07:38:21 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 07:38:21 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Thu, Jul 17, 2014 at 4:07 PM, wrote: > > > > On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith wrote: > >> On 16 Jul 2014 10:26, "Tony Yu" wrote: >> > >> > Is there any reason why the defaults for `allclose` and >> `assert_allclose` differ? This makes debugging a broken test much more >> difficult. More importantly, using an absolute tolerance of 0 causes >> failures for some common cases. 
For example, if two values are very close >> to zero, a test will fail: >> > >> > np.testing.assert_allclose(0, 1e-14) >> > >> > Git blame suggests the change was made in the following commit, but I >> guess that change only reverted to the original behavior. >> > >> > >> https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6bf >> > >> > It seems like the defaults for `allclose` and `assert_allclose` should >> match, and an absolute tolerance of 0 is probably not ideal. I guess this >> is a pretty big behavioral change, but the current default for >> `assert_allclose` doesn't seem ideal. >> >> What you say makes sense to me, and loosening the default tolerances >> won't break any existing tests. (And I'm not too worried about people who >> were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters >> that much to you exactly what tolerance you test, you should be setting the >> tolerance explicitly!) I vote that unless someone comes up with some >> terrible objection in the next few days then you should submit a PR :-) >> > > If you mean by this to add atol=1e-8 as default, then I'm against it. > > At least it will change the meaning of many of our tests in statsmodels. > > I'm using rtol to check for correct 1e-15 or 1e-30, which would be > completely swamped if you change the default atol=0. > Adding atol=0 to all assert_allclose that currently use only rtol is a lot > of work. > I think I almost never use a default rtol, but I often leave atol at the > default = 0. > > If we have zeros, then I don't think it's too much work to decide whether > this should be atol=1e-20, or 1e-8. > copied from http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html since I didn't get any messages here This is a compelling use-case, but there are also lots of compelling usecases that want some non-zero atol (i.e., comparing stuff to 0). Saying that allclose is for one of those use cases and assert_allclose is for the other is... not a very felicitious API design, I think. So we really should do *something*. Are there really any cases where you want non-zero atol= that don't involve comparing something against a 'desired' value of zero? It's a little wacky, but I'm wondering if we ought to change the rule (for all versions of allclose) to if desired == 0: tol = atol else: tol = rtol * desired In particular, means that np.allclose(x, 1e-30) would reject x values of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30. -n That's much too confusing. I don't know what the usecases for np.allclose are since I don't have any. assert_allclose is one of our (statsmodels) most frequently used numpy function this is not informative: `np.allclose(x, 1e-30)` since there are keywords either np.assert_allclose(x, atol=1e-30) if I want to be "close" to zero or np.assert_allclose(x, rtol=1e-11, atol=1e-25) if we have a mix of large numbers and "zeros" in an array. Making the behavior of assert_allclose depending on whether desired is exactly zero or 1e-20 looks too difficult to remember, and which desired I use would depend on what I get out of R or Stata. atol=1e-8 is not close to zero in most cases in my experience. The numpy.testing assert functions are some of the most useful functions in numpy, and heavily used "code". 
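For reference, a minimal sketch of the difference in defaults being discussed (using the rtol/atol values quoted in this thread; numpy 1.8 era behaviour):

import numpy as np

# np.allclose(a, b)                 uses rtol=1e-5, atol=1e-8
# np.testing.assert_allclose(a, b)  uses rtol=1e-7, atol=0

np.allclose(0, 1e-14)                            # True: 1e-14 is within the default atol of 1e-8
np.testing.assert_allclose(0, 1e-14, atol=1e-8)  # passes once atol is given explicitly
np.testing.assert_allclose(0, 1e-14)             # raises AssertionError: atol defaults to 0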
Josef > > Josef > > > >> -n >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jul 18 07:41:57 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 07:41:57 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Thu, Jul 17, 2014 at 11:37 AM, Nathaniel Smith wrote: > On Wed, Jul 16, 2014 at 7:47 PM, Ralf Gommers > wrote: > > > > On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu wrote: > >> It seems like the defaults for `allclose` and `assert_allclose` should > >> match, and an absolute tolerance of 0 is probably not ideal. I guess > this is > >> a pretty big behavioral change, but the current default for > >> `assert_allclose` doesn't seem ideal. > > > > I agree, current behavior quite annoying. It would make sense to change > the > > atol default to 1e-8, but technically it's a backwards compatibility > break. > > Would probably have a very minor impact though. Changing the default for > > rtol in one of the functions may be much more painful though, I don't > think > > that should be done. > > Currently we have: > > allclose: rtol=1e-5, atol=1e-8 > assert_allclose: rtol=1e-7, atol=0 > > Why would it be painful to change assert_allclose to match allclose? > It would weaken some tests, but no code would break. > We might break our code, if suddenly our test suite doesn't do what it is supposed to do. (rough guess: 40% of the statsmodels code are unit tests.) Josef > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 18 09:18:59 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 18 Jul 2014 07:18:59 -0600 Subject: [Numpy-discussion] Numpy BoF at SciPy 2014 - quick report In-Reply-To: <1405670639.6974.4.camel@sebastian-t440> References: <3A0037EB-BF51-4943-8E27-EDDFC8F09456@astro.princeton.edu> <1405670639.6974.4.camel@sebastian-t440> Message-ID: On Fri, Jul 18, 2014 at 2:03 AM, Sebastian Berg wrote: > On Do, 2014-07-17 at 09:48 -0400, Robert Lupton the Good wrote: > > Having just re-read the PEP I'm concerned that this proposal leaves at > least one major (?) trap for naive users, namely > > x = np.array([1, 10]) > > print X.T at x > > which will print 101, not [[1, 10], [10, 100]] > > > > Yes, I know why this is happening but it's still a problem -- the user > said, "I'm thinking matrices" when they wrote @ but the x.T had done the > "wrong" thing before the @ kicked in. And yes, a savvy user would have > written x = np.ones([[1, 10]]) (but then np.dot(x, x.T) isn't a scalar). > > > > This is the way things are at present, but with the new @ syntax coming > in I think we should consider fixing it. > > > > I can think of three possibilities: > > 1. Leave this as a trap for the unwary, and a reason for people to > stick to np.matrix (np.matrix([1, 10]) behaves "correctly") > > 2. Make x.T a syntax error for 1-D arrays. It's a no-op and IMHO > a trap. > > 3. 
Make x.T promote the shape == (2,) array to (1, 2) and return a > (2, 1) array. This may be too magic, but it's my preferred solution. > > > > Making it a warning may be another option. Changing `.T` to promote to > 2-d (also maybe to actually only transpose the last two axes for higher > D arrays), could be nice, but getting there might take quite a long > FutureWarning or even Error -> new feature cycle... > I've toyed some with the idea of adding a flag bit for transpose of 1-d arrays. It would flip with every transpose and be ignored for non 1-d arrays. A bit of a hack, but would allow for a column/row vector distinction. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From andy.terrel at gmail.com Fri Jul 18 09:51:04 2014 From: andy.terrel at gmail.com (Andy Ray Terrel) Date: Fri, 18 Jul 2014 09:51:04 -0400 Subject: [Numpy-discussion] problems with mailing list ? In-Reply-To: References: Message-ID: Yes I've filed a ticket with Enthought. On Fri, Jul 18, 2014 at 7:07 AM, wrote: > Are the problems with sending out the messages with the mailing lists? > > I'm getting some replies without original messages, and in some threads I > don't get replies, missing part of the discussions. > > > Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charles at crunch.io Fri Jul 18 10:02:06 2014 From: charles at crunch.io (Charles G. Waldman) Date: Fri, 18 Jul 2014 07:02:06 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: I greatly prefer "np.mat" to "np.arr" for this, FWIW On Fri, Jul 18, 2014 at 3:37 AM, Nathaniel Smith wrote: > On Thu, Jul 17, 2014 at 11:10 PM, Charles G. Waldman wrote: >> >> -1 on the 'arr' name. I think if we're going to support this function at all (which I'm not convinced is a good idea), it should be np.fromsomething like the other from* functions. >> >> Maybe frommatlab? >> >> I think that 'arr' is just too generic and too close to 'array'. > > Well, it's definitely not a good idea if we name it something like that :-). > > The whole motivation is to provide a quick way to type 2d arrays > interactively, hence the current name "np.mat". (The fact that it > happens to match matlab syntax is a nice bonus, because stealing is > always better than inventing when it works.) > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From aldcroft at head.cfa.harvard.edu Fri Jul 18 10:04:22 2014 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Fri, 18 Jul 2014 10:04:22 -0400 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Thu, Jul 17, 2014 at 11:52 AM, Nathaniel Smith wrote: > On Tue, Jul 15, 2014 at 7:40 PM, Aldcroft, Thomas > wrote: > > > > On Sat, Jul 12, 2014 at 8:02 PM, Nathaniel Smith wrote: > >> > >> OTOH, fixed length nul padded latin1 would be useful for various flat > file > >> reading tasks. 
> > > > As one of the original agitators for this, let me re-iterate that what > the > > astronomical community *really* wants is the original proposal as > described > > by Chris Barker [1] and essentially what Charles said. We have large > data > > archives that have ASCII string data in binary formats like FITS and > HDF5. > > The current readers for those datasets present users with numpy S data > > types, which in Python 3 cannot be compared to str (unicode) literals. > In > > many cases those datasets are large, and in my case I regularly deal with > > multi-Gb sized bytestring arrays. Converting those to a U dtype is not > > practical. > > This is feedback is *super* useful, thanks. Can you elaborate a bit > more on your requirements? > > I get that: > - You have data that is treated as text, so it is convenient to be > able to use Python strings for things like equality tests, np.sum(arr > == "green") etc. > - Your data uses only ASCII characters, and you don't want to spend > more than 1 byte of memory per character. > > Do you ever have 8 bit characters, and if so, what encoding do you use? > No. > > Does it matter to you that the memory layout for these 1-byte-per-char > strings remain fixed-width nul-padded concatenated strings (e.g., > because you are mmap'ing files that have this format)? Or do FITS/HDF5 > handle layout details internally and you don't care so long as the > above requirements are met? > Yes, memory layout matters since mmap'ing files is a key feature in FITS. > > Does the fixed-width nature of numpy strings cause problems in the > above setting? > No. In particular FITS is ubiquitous as the binary data transport format in astronomy, and it specifies fixed width strings, so fixed width in numpy is a good thing in this case. More generally legacy (or even modern high-performance) Fortran / C will commonly handle string arrays as arrays of fixed width characters. In the majority of cases these codes (that I'm aware of) know nothing about unicode. This all works transparently with Python 2 + Numpy, so the goal is to have that same "it just works" capability in Python 3 with minimal code changes. Thanks, Tom > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Jul 18 10:13:53 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 18 Jul 2014 16:13:53 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes Message-ID: hi, I have been doing a lot of backporting for the last few bugfix releases and noticed that our current approach committing to master and cherrypicking is not so good for the git history. When cherry picking a bugfix from master to a maintenance branch both branches contain a commit with the same content and git knows of no relation between them. This causes unnecessary merge conflicts when cherry picking two changes that modify the same file. The git version (1.9.1) I am using is not smart enough too figure out the changesets in both leaf commits is the same. Additionally the output of `git log maintenance/1.9.x..master` becomes very large as all already backported issues appear again in master. 
[0] To help with this I want to propose new best practices for pull requests of bugfixes suitable for backporting. Instead of basing the bugfix on the head commit of master, base it on the merge base between master and the latest maintenance branch. This allows merging the PR into both master and the maintenance branch without pulling in any extra changes from either branch. Then both branches contain the same commit, git's automerging can work better, and git log will only show you the commits that are really on one branch or the other. In practice this is very simple. You can still develop your bugfix on master but before you push it you just run: git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^ In most bugfix PRs this should work without conflict as they should be relatively small. If you get a merge conflict during this operation, just do git rebase --abort and do a normal pull request; in that case the backporter should worry about the conflict. Does this sound like a reasonable procedure? Cheers, Julian [0] git cherry is supposed to help with that, but it never really worked properly for me From jtaylor.debian at googlemail.com Fri Jul 18 11:10:53 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 18 Jul 2014 17:10:53 +0200 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: On Thu, Jul 17, 2014 at 5:48 PM, Nathaniel Smith wrote: > On Tue, Jul 15, 2014 at 4:29 PM, Charles R Harris > wrote: >> Thinking more about it, the easiest thing to do might be to make the S dtype >> a UTF-8 encoding. Most of the machinery to deal with that is already in >> place. That change might affect some users though, and we might need to do >> some work to make it backwards compatible with python 2. > > I'd be very concerned about backcompat for existing code that uses > e.g. "S128" as a dtype to mean "128 arbitrary bytes". An example is > this file format reading code: > https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L123 > The file format says there are 128 bytes there, and their > interpretation depends on other fields in the header -- but in one > case, for "large montages", there's an encoding where every 3 bytes > represents 4 characters using an ad hoc 6-bit character set: > https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L133 > > Perhaps this case could be handled better by using a u8 subarray or > something (that code also goes to some efforts to work around nul > padding), and that particular project hasn't been ported to py3 yet so > technically wouldn't be affected if we changed the meaning of "S" on > py3. But it does seem useful to have a "fixed length bytes" dtype even > in py3, and if we declare that be "S" then it avoids breaking any > existing code depending on it... > > We break code either way. Either we break applications using S as string type, but now it becomes bytes in python3. Or we break applications treating S as byte type and we change it to string in python3. Unfortunately we missed the opportunity when adding python3 support to fix the exact same bytes/text boundary issue which is the main reason why python3 exists in the first place. We should have made porting to numpy3 an intentionally(!) backward incompatible change just like python itself did. Now we are stuck with deciding which option breaks less. On the one hand, that S is bytes in python3 is somewhat established by now and lots of workarounds are already in place.
On the other hand, I think code that relies on S being bytes is in the minority and python3 usage is probably still insignificant in this area. Unfortunately getting actual numbers and not wild guesses on this is probably not easy. From charlesr.harris at gmail.com Fri Jul 18 11:20:31 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 18 Jul 2014 09:20:31 -0600 Subject: [Numpy-discussion] __numpy_ufunc__ In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 12:53 PM, Ralf Gommers wrote: > > > > On Wed, Jul 16, 2014 at 10:07 AM, Nathaniel Smith wrote: > >> Weirdly, I never received Chuck's original email in this thread. Should >> some list admin be informed? >> > Also weirdly, my reply didn't show up on gmane. Not sure if it got > through, so re-sending: > > It's already in, so do you mean not using? Would help to know what the > issue is, because it's finished enough that it's already used in a released > version of scipy (in sparse matrices). > My own feeling is that we should leave it in as it is fairly useable and just needs to have some problematic case worked out. The fact that scipy already uses it is a strong argument to keep it in. I think Julian's concern is that they won't be worked out. Julian has started another thread on the topic and that is probably where the conversation should continue. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 18 11:38:02 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 18 Jul 2014 09:38:02 -0600 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 8:13 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > hi, > I have been doing a lot of backporting for the last few bugfix > releases and noticed that our current approach committing to master > and cherrypicking is not so good for the git history. > When cherry picking a bugfix from master to a maintenance branch both > branches contain a commit with the same content and git knows of no > relation between them. This causes unnecessary merge conflicts when > cherry picking two changes that modify the same file. The git version > (1.9.1) I am using is not smart enough too figure out the changesets > in both leaf commits is the same. > Additionally the output of `git log maintenance/1.9.x..master` becomes > very large as all already backported issues appear again in master. > [0] > > To help with this I want to propose new best practices for pull > requests of bugfixes suitable for backporting. > Instead of basing the bugfix on the head commit of the master, base > them on the merge base between master and the latest maintenance > branch. > This allows merging the PR into both master and the maintenance branch > without pulling in any extra changes from either branches. > Then both branches contain the same commit and gits automerging can > work better and git log will only show you the commits that are only > really on one branch or the other. > > In practice this is very simple. You can still develop your bugfix on > master but before you push it you just run: > > git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^ > > In most bugfix PRs this should work without conflict as they should be > relatively small. 
> If you get a merge conflict during this operation, just do git rebase > --abort and do a normal pull request, in that case the backporter > should worry about the conflict. > > Does this sound like a reasonable procedure? > Cheers, > Julian > > [0] git cherry is supposed to help with that, but it never really > worked properly for me > Arrived here promptly. This looks OK to me, but with the understanding that a number of folks won't know what is going on. It should be documented in doc/source/dev/gitwash/development_workflow.rst and perhaps a command alias in .git/config would help, something like npyrebase, or hopefully something better ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Fri Jul 18 08:09:31 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 18 Jul 2014 14:09:31 +0200 Subject: [Numpy-discussion] problems with mailing list ? In-Reply-To: References: Message-ID: <6B2A9899-AB3E-40E5-AB43-BD7A2DC07D8A@astro.physik.uni-goettingen.de> On 18 Jul 2014, at 01:07 pm, josef.pktd at gmail.com wrote: > Are the problems with sending out the messages with the mailing lists? > > I'm getting some replies without original messages, and in some threads I don't get replies, missing part of the discussions. > There seem to be problems with the Scipy list server; my last mails to astropy at scipy.org have taken 12-18 hours before they made it to the list, and some people here reported messages staying in the void for several days. But I think it?s been reported to Enthought already. Derek From charlesr.harris at gmail.com Fri Jul 18 11:57:33 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 18 Jul 2014 09:57:33 -0600 Subject: [Numpy-discussion] __numpy_ufunc__ and 1.9 release In-Reply-To: <53C6B35D.9020609@iki.fi> References: <53C56DA2.40402@googlemail.com> <53C6B35D.9020609@iki.fi> Message-ID: On Wed, Jul 16, 2014 at 11:16 AM, Pauli Virtanen wrote: > Hi, > > 15.07.2014 21:06, Julian Taylor kirjoitti: > [clip: __numpy_ufunc__] > > So I'm wondering if we should delay the introduction of this > > feature to 1.10 or is it important enough to wait until there is a > > consensus on the remaining issues? > > My 10c: > > The feature is not so much in hurry that it alone should delay 1.9. > Moreover, it's best for everyone that it is bug-free on the first go, > and it gets some real-world testing before the release. Better safe than > sorry. > > I'd pull it out from 1.9.x branch, and iron out the remaining wrinkles > before 1.10. > Thanks Pauli, your opinion on the matter is what I needed to see and I'll take it as dispositive. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aldcroft at head.cfa.harvard.edu Fri Jul 18 12:06:57 2014 From: aldcroft at head.cfa.harvard.edu (Aldcroft, Thomas) Date: Fri, 18 Jul 2014 12:06:57 -0400 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: On Fri, Jul 18, 2014 at 11:10 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On Thu, Jul 17, 2014 at 5:48 PM, Nathaniel Smith wrote: > > On Tue, Jul 15, 2014 at 4:29 PM, Charles R Harris > > wrote: > >> Thinking more about it, the easiest thing to do might be to make the S > dtype > >> a UTF-8 encoding. Most of the machinery to deal with that is already in > >> place. 
That change might affect some users though, and we might need to > do > >> some work to make it backwards compatible with python 2. > > > > I'd be very concerned about backcompat for existing code that uses > > e.g. "S128" as a dtype to mean "128 arbitrary bytes". An example is > > this file format reading code: > > https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L123 > > The file format says there are 128 bytes there, and their > > interpretation depends on other fields in the header -- but in one > > case, for "large montages", there's an encoding where every 3 bytes > > represents 4 characters using an ad hoc 6-bit character set: > > https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L133 > > > > Perhaps this case could be handled better by using a u8 subarray or > > something (that code also goes to some efforts to work around nul > > padding), and that particular project hasn't been ported to py3 yet so > > technically wouldn't be affected if we changed the meaning of "S" on > > py3. But it does seem useful to have a "fixed length bytes" dtype even > > in py3, and if we declare that be "S" then it avoids breaking any > > existing code depending on it... > > > > We break code either way. > Either we break applications using S as string type, but now it > becomes bytes in python3. > Or we break applications treating S as byte type and we change it to > string in python3. > > Unfortunately we missed the opportunity when adding python3 support to > fix the same exact same bytes/text boundary issue which is the main > reason why pythons3 exists in the first place. > We should have made porting to numpy3 a intentionally(!) backward > incompatible change just like python itself did. > > Now we are stuck with deciding, which option breaks less. > On the one hand, that S is bytes in python3 is somewhat established by > now and lots of workarounds are already place. > Removing workarounds is generally a good thing (!), and often not that hard to do by numpy version number for libraries that need to support multiple numpy versions. It's never ideal to break compatibility, but in this case it would be fixing something that is currently not working in a useful way. - Tom > On the other hand, I think code that relies on S being bytes is in the > minority and python3 usage is probably still insignificant in this > area. Unfortunately getting actual numbers and not wild guesses on > this is probably not easy. _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Jul 18 12:07:45 2014 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 18 Jul 2014 19:07:45 +0300 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: <53C94651.4040805@iki.fi> 18.07.2014 18:10, Julian Taylor kirjoitti: [clip] > We break code either way. Either we break applications using S as > string type, but now it becomes bytes in python3. Or we break > applications treating S as byte type and we change it to string in > python3. > > Unfortunately we missed the opportunity when adding python3 support > to fix the same exact same bytes/text boundary issue which is the > main reason why pythons3 exists in the first place. We should have > made porting to numpy3 a intentionally(!) 
backward incompatible > change just like python itself did. > > Now we are stuck with deciding, which option breaks less. On the > one hand, that S is bytes in python3 is somewhat established by now > and lots of workarounds are already place. On the other hand, I > think code that relies on S being bytes is in the minority and > python3 usage is probably still insignificant in this area. > Unfortunately getting actual numbers and not wild guesses on this > is probably not easy. One way to try this out is to change the meaning of 'S' and see how badly e.g. pandas or matplotlib break on py3 as a consequence. Another approach would be to add a new 1-byte unicode as a type code different from 'S'. The automatic ASCII encoding in constructor/assignment on Py3 can be deprecated, which would make 'S' a strict bytes dtype. This also is not perfect, since array(['foo']) on Py2 should for backward compatibility continue returning dtype='S'. Moreover, already existing code does not make use of it. -- Pauli Virtanen From chris.barker at noaa.gov Fri Jul 18 12:10:00 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 09:10:00 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <1405423590.8281.7.camel@sebastian-t440> Message-ID: On Thu, Jul 17, 2014 at 8:48 AM, Nathaniel Smith wrote: > I'd be very concerned about backcompat for existing code that uses > e.g. "S128" as a dtype to mean "128 arbitrary bytes". yup -- 'S' matches teh py2 string well, which is BOTH text and bytes. That should not change -- at least in py2. > An example is > this file format reading code: > https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L123 > The file format says there are 128 bytes there, and their > interpretation depends on other fields in the header -- but in one > case, for "large montages", there's an encoding where every 3 bytes > represents 4 characters using an ad hoc 6-bit character set: > https://github.com/rerpy/rerpy/blob/master/rerpy/io/erpss.py#L133 > > Perhaps this case could be handled better by using a u8 subarray or > something (that code also goes to some efforts to work around nul padding), yes -- that might have been better, though I have not been successful at figuring out how to spell a dtype that works well -- hence my suggestion that we have a bytes type. > and that particular project hasn't been ported to py3 yet so > technically wouldn't be affected if we changed the meaning of "S" on > py3. But it does seem useful to have a "fixed length bytes" dtype even > in py3, and if we declare that be "S" then it avoids breaking any > existing code depending on it... > sure, but having 'S' be bytes does break other code that depends on it being a text type. Unfortunately, py2 mingled text and bytes, numpy mirrored that, so there is no completely backward compatible way to go forward. But for some guidance -- text is the big issue with py2 <-> p3 migration, so folks are presumable going to expect things to change with numpy text handling as well. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.isaac at gmail.com Fri Jul 18 12:11:06 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 18 Jul 2014 12:11:06 -0400 Subject: [Numpy-discussion] Numpy BoF at SciPy 2014 - quick report In-Reply-To: <1405670639.6974.4.camel@sebastian-t440> References: <3A0037EB-BF51-4943-8E27-EDDFC8F09456@astro.princeton.edu> <1405670639.6974.4.camel@sebastian-t440> Message-ID: <53C9471A.3010805@gmail.com> On 7/18/2014 4:03 AM, Sebastian Berg wrote: > Changing `.T` to promote to > 2-d (also maybe to actually only transpose the last two axes for higher > D arrays), could be nice, but getting there might take quite a long > FutureWarning or even Error -> new feature cycle. Considering the extent of implied breakage, I hope this will not be considered. Also, there are already nice ways to add an axis (even optionally, with `atleast_2d`). I think having `.T` as a no-op on a 1d array is correct behavior. I would not change it. However I can understand preferring an error. (Mathematica considers it an error.) Alan Isaac From chris.barker at noaa.gov Fri Jul 18 12:15:35 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 09:15:35 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 3:33 AM, Nathaniel Smith wrote: > > 2) A bytes types -- almost the current 'S' type > > - A bytes type would map to/from py3 bytes objects (and py2 bytes > > objects, which are the same as py2strings) > > - one way is would differ from a py2str is that there would be no > > assumption of null-termination (not sure where that is now) > > AFAICT this is *exactly* the same as the current 'S' type. What > differences do you see? as you mention it, it is the same on py3, except maybe handling of null bytes -- you mentioned that you had to do some work-arounds for that. a proper bytes type would do nothing special with null bytes. > > 3) A one-byte-per-char text type -- more or less Chuck's current > proposal. > > - it would map to/from the py3 string -- it is text after all > > - it would be null-terminated > > Numpy strings types are never null-terminated ATM. They're > null-padded, which is slightly different. When storing data in an S5, > for instance, strings of length 5 have no nulls appending, strings of > length 4 have 1 null appended, strings of length 3 have 2 nulls > appended, etc. When reading data out of an S5, then all trailing nulls > are stripped. > > So, they may not be null terminated (if the length of the string > exactly matches the length of the dtype), and the strings being stored > can contain internal nulls ("foo\x00bar" is fine), but they cannot > contain trailing nulls ("foo\x00" will come back as just "foo"). > > Do you actually care about null-termination specifically? Or did you > just mean "it should work like the other ones, which I vaguely > remember involves nulls"? ;-) > That's pretty much what I meant, yes ;-) But the key is that when pushing one of these things to a python string, any thing after a null byte is ignored. Which is why you can't use it for arbitrary bytes. > - it would have a one-byte per-char encoding: ascii, latin-1 or > settable > > (TBA) > > Settable is technically very difficult until we redo the dtype > machinery to allow parametrized types. indeed -- we have that a bit with Datetime -- but that's a whole other kettle of fish. -CHB -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 18 12:23:40 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 17:23:40 +0100 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: Message-ID: On 18 Jul 2014 15:36, "Julian Taylor" wrote: > > git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^ As a potential refinement, this might be simpler if we define a branch that points to this commit. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Jul 18 12:30:04 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 18 Jul 2014 18:30:04 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 5:38 PM, Charles R Harris wrote: > > > On Fri, Jul 18, 2014 at 8:13 AM, Julian Taylor > wrote: >> >> hi, >> I have been doing a lot of backporting for the last few bugfix >> releases and noticed that our current approach committing to master >> and cherrypicking is not so good for the git history. >> When cherry picking a bugfix from master to a maintenance branch both >> branches contain a commit with the same content and git knows of no >> relation between them. This causes unnecessary merge conflicts when >> cherry picking two changes that modify the same file. The git version >> (1.9.1) I am using is not smart enough too figure out the changesets >> in both leaf commits is the same. >> Additionally the output of `git log maintenance/1.9.x..master` becomes >> very large as all already backported issues appear again in master. >> [0] >> >> To help with this I want to propose new best practices for pull >> requests of bugfixes suitable for backporting. >> Instead of basing the bugfix on the head commit of the master, base >> them on the merge base between master and the latest maintenance >> branch. >> This allows merging the PR into both master and the maintenance branch >> without pulling in any extra changes from either branches. >> Then both branches contain the same commit and gits automerging can >> work better and git log will only show you the commits that are only >> really on one branch or the other. >> >> In practice this is very simple. You can still develop your bugfix on >> master but before you push it you just run: >> >> git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^ >> >> In most bugfix PRs this should work without conflict as they should be >> relatively small. >> If you get a merge conflict during this operation, just do git rebase >> --abort and do a normal pull request, in that case the backporter >> should worry about the conflict. >> >> Does this sound like a reasonable procedure? >> Cheers, >> Julian >> >> [0] git cherry is supposed to help with that, but it never really >> worked properly for me > > > Arrived here promptly. This looks OK to me, but with the understanding that > a number of folks won't know what is going on. 
It should be documented in > doc/source/dev/gitwash/development_workflow.rst and perhaps a command alias > in .git/config would help, something like npyrebase, or hopefully something > better ;) > > Chuck > Yes of course I would document it when its ok for everyone. I do not want that this inconveniences contributors, maybe we can just ask for it if extra changes are required for the PR anyway. I would just like that the people who merge PR's (which are currently just a handful) try to use this method when the PR is applicable for a maintenance branch. We can add a small tool that does what does the rebase, merges to both master and the branch and closes the PR, something like: tools/merge-backport-pr #pr-number From andrew.collette at gmail.com Fri Jul 18 12:32:24 2014 From: andrew.collette at gmail.com (Andrew Collette) Date: Fri, 18 Jul 2014 10:32:24 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: <-4597269384285942771@unknownmsgid> References: <-4597269384285942771@unknownmsgid> Message-ID: Hi Chris, >> A Latin-1 based 'a' type >> would have similar problems. > > Maybe not -- latin1 is fixed width. Yes, Latin-1 is fixed width, but the issue is that when writing to a fixed-width UTF8 string in HDF5, it will expand, possibly losing data. What I would like to avoid is a situation where a user writes a 10-byte string from NumPy into a 10-byte space in an HDF5 dataset, and unexpectedly loses the last few characters because of the encoding mismatch. People are used to truncation when e.g. storing a 20-byte string in a 10-byte dataset, but it's surprising when the source and destination are the same size. :) In any case, I certainly agree NumPy shouldn't be limited by the capabilities of HDF5. There are other valuable use cases, including access to the high-bit characters Latin-1 provides. But from a strict compatibility standpoint, ASCII would be beneficial. Andrew From chris.barker at noaa.gov Fri Jul 18 12:33:32 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 09:33:32 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: <53C94651.4040805@iki.fi> References: <1405423590.8281.7.camel@sebastian-t440> <53C94651.4040805@iki.fi> Message-ID: On Fri, Jul 18, 2014 at 9:07 AM, Pauli Virtanen wrote: > Another approach would be to add a new 1-byte unicode you can't do unicode in 1-byte -- so what does this mean, exactly? > This also is not perfect, since array(['foo']) on Py2 should for > backward compatibility continue returning dtype='S'. yup. but we may be OK -- as "bytes" in py2 is the same as string anyway. But what do we do with null bytes? when going from 'S' to py2 string? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Jul 18 12:35:31 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 18 Jul 2014 18:35:31 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 6:23 PM, Nathaniel Smith wrote: > On 18 Jul 2014 15:36, "Julian Taylor" wrote: >> >> git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^ > > As a potential refinement, this might be simpler if we define a branch that > points to this commit. 
> we could do that, though the merge base changes to the last commit that was merged in that way. The old merge base is still valid but much older. I applied this method to some of my bugfixes so the current merge base of master and 1.9 is a commit from yesterday not anymore the diverging point of master and 1.9. But I don't know if the newer merge base makes any difference to git. From markperrymiller at gmail.com Fri Jul 18 12:45:52 2014 From: markperrymiller at gmail.com (Mark Miller) Date: Fri, 18 Jul 2014 09:45:52 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 3:37 AM, Nathaniel Smith wrote: > On Thu, Jul 17, 2014 at 11:10 PM, Charles G. Waldman > wrote: > > > > -1 on the 'arr' name. I think if we're going to support this function > at all (which I'm not convinced is a good idea), it should be > np.fromsomething like the other from* functions. > > > > Maybe frommatlab? > > > > I think that 'arr' is just too generic and too close to 'array'. > > Well, it's definitely not a good idea if we name it something like that > :-). > > The whole motivation is to provide a quick way to type 2d arrays > interactively, hence the current name "np.mat". (The fact that it > happens to match matlab syntax is a nice bonus, because stealing is > always better than inventing when it works.) > > Some minor confusion on my part. If the true goal is to just allow quick entry of a 2d array, why not just advocate using a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9")) If anyone is really set on having this functionality, they could just write a one-line wrapper function and call it a day. Note that I would personally not use this type of shorthand syntax for teaching or presentations. I'd prefer to use proper python syntax myself from the get go rather than having to start over from square one and teach a completely different syntax for constructing >2d arrays. "There should be one-- and preferably only one --obvious way to do it." -Zen of Python -Mark _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 18 12:53:51 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 17:53:51 +0100 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 12:38 PM, wrote: > > On Thu, Jul 17, 2014 at 4:07 PM, wrote: > >> If you mean by this to add atol=1e-8 as default, then I'm against it. >> >> At least it will change the meaning of many of our tests in statsmodels. >> >> I'm using rtol to check for correct 1e-15 or 1e-30, which would be >> completely swamped if you change the default atol=0. >> Adding atol=0 to all assert_allclose that currently use only rtol is a lot >> of work. >> I think I almost never use a default rtol, but I often leave atol at the >> default = 0. >> >> If we have zeros, then I don't think it's too much work to decide whether >> this should be atol=1e-20, or 1e-8. > > > copied from > http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html > since I didn't get any messages here > > This is a compelling use-case, but there are also lots of compelling > usecases that want some non-zero atol (i.e., comparing stuff to 0). 
> Saying that allclose is for one of those use cases and assert_allclose > is for the other is... not a very felicitious API design, I think. So > we really should do *something*. > > Are there really any cases where you want non-zero atol= that don't > involve comparing something against a 'desired' value of zero? It's a > little wacky, but I'm wondering if we ought to change the rule (for > all versions of allclose) to > > if desired == 0: > tol = atol > else: > tol = rtol * desired > > In particular, means that np.allclose(x, 1e-30) would reject x values > of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30. > > -n > > > That's much too confusing. > I don't know what the usecases for np.allclose are since I don't have any. I wrote allclose because it's shorter, but my point is that assert_allclose and allclose should use the same criterion, and was making a suggestion for what that shared criterion might be. > assert_allclose is one of our (statsmodels) most frequently used numpy > function > > this is not informative: > > `np.allclose(x, 1e-30)` > > > since there are keywords > either np.assert_allclose(x, atol=1e-30) I think we might be talking past each other here -- 1e-30 here is my "gold" p-value that I'm hoping x will match, not a tolerance argument. > if I want to be "close" to zero > or > > np.assert_allclose(x, rtol=1e-11, atol=1e-25) > > if we have a mix of large numbers and "zeros" in an array. > > Making the behavior of assert_allclose depending on whether desired is > exactly zero or 1e-20 looks too difficult to remember, and which desired I > use would depend on what I get out of R or Stata. I thought your whole point here was that 1e-20 and zero are qualitatively different values that you would not want to accidentally confuse? Surely R and Stata aren't returning exact zeros for small non-zero values like probability tails? > atol=1e-8 is not close to zero in most cases in my experience. If I understand correctly (Tony?) the problem here is that another common use case for assert_allclose is in cases like assert_allclose(np.sin(some * complex ** calculation / (that - should - be * zero)), 0) For cases like this, you need *some* non-zero atol or the thing just doesn't work, and one could quibble over the exact value as long as it's larger than "normal" floating point error. These calculations usually involve "normal" sized numbers, so atol should be comparable to eps * these values. eps is 2e-16, so atol=1e-8 works for values up to around 1e8, which is a plausible upper bound for where people might expect assert_allclose to just work. I'm trying to figure out some way to support your use cases while also supporting other use cases. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From chris.barker at noaa.gov Fri Jul 18 12:54:03 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 09:54:03 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 9:32 AM, Andrew Collette wrote: > >> A Latin-1 based 'a' type > >> would have similar problems. > > > > Maybe not -- latin1 is fixed width. > > Yes, Latin-1 is fixed width, but the issue is that when writing to a > fixed-width UTF8 string in HDF5, it will expand, possibly losing data. > you shouldn't do that -- I was in no way suggesting that a latin-1 string get pushed to a utf-8 array by default -- that would be a bad idea. 
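To make the width mismatch concrete, here is a small plain-Python sketch (no h5py or numpy involved, and the sample string is made up):

    text = u"caf\xe9" * 2             # 8 characters, all of them in Latin-1
    len(text.encode("latin-1"))       # 8  -- one byte per character
    len(text.encode("utf-8"))         # 10 -- each e-acute takes 2 bytes in UTF-8

So 8 characters that fit exactly in an 8-byte Latin-1 field need 10 bytes of UTF-8 storage, which is where the silent truncation risk comes from.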
utf-8 is a unicode encoding, it should be used for unicode. As for truncation -- that's inherent in using a fixed-width array to store a non-fixed width encoding. What I would like to avoid is a situation where a user writes a > 10-byte string from NumPy into a 10-byte space in an HDF5 dataset, and > unexpectedly loses the last few characters because of the encoding > mismatch. > Again, they shouldn't do that, they should be pushing a 10-character string into something -- and utf-8 is going to (Possible) truncate that. That's HDF/utf-8 limitation that people are going to have to deal with. I think you're suggesting that numpy follow the HDF model, so that the numpy-HDF transition can be clean and easy. However, I think that utf-8 is an inappropriate model for numpy, and that the mess of bytes to utf-8 is pyHDF's problem, not numpy's. i.e your issue above -- should users put a 10 character string into a numpy 10 byte utf-8 type and see it truncated? That's what I want to avoid. In any case, I certainly agree NumPy shouldn't be limited by the > capabilities of HDF5. There are other valuable use cases, including > access to the high-bit characters Latin-1 provides. But from a strict > compatibility standpoint, ASCII would be beneficial. > This is where I wonder about HDF's "ascii" type -- is it really ascii? Or is it that old standby one-byte-per-character-and-if-it's-ascii-we-all-know-what-it-means-but-if-it's-not-we'll-still-pass-it-around type? i.e the old char* ? In which case, you can just push a latin-1 type into and out of your HDF ascii arrays and everything will work just fine. Unless someone stores something other than latin-1 or ascii in it -- but even then, the bytes would still be preserved. This is why I see no downside to latin-1 -- if you don't use the > 127 code points, it's the same thing -- if you do, you get some extra handy characters. The only difference is that a proper ascii type would not let you store anything above 127 at all -- why restrict ourselves? And if you want utf-8 in HDF, then use a unicode array knowing that some truncation could occur, or use a byte array, and do the encoding yourself, so the user knows exactly what they are doing. [it would be nice if numpy had a pure numpy solution to encoding/decoding, though maybe it wouldn't really be any faster than going through python anyway...] -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 18 12:59:32 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 17:59:32 +0100 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 5:54 PM, Chris Barker wrote: > > This is why I see no downside to latin-1 -- if you don't use the > 127 code > points, it's the same thing -- if you do, you get some extra handy > characters. The only difference is that a proper ascii type would not let > you store anything above 127 at all -- why restrict ourselves? IMO the extra characters aren't the most compelling argument for latin1 over ascii. Latin1 gives the nice assurance that if some jerk *does* give me an "ascii" file that somewhere has some byte with the 8th bit set, then I can still load the data and fix things by hand. 
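A plain-Python illustration of that failure mode (the byte string is made up, not from any particular file):

    raw = b"caf\xe9 au lait"      # nominally "ascii" data with one byte > 127
    raw.decode("latin-1")         # works: u'caf\xe9 au lait' -- every byte round-trips
    raw.decode("ascii")           # raises UnicodeDecodeError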
This is trickier if numpy just refuses to touch the data, blowing up with an exception when I try. In general it's easy to create numpy arrays containing arbitrary bitpatterns, so it's nice to have some strategy for what to do with them. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Fri Jul 18 13:00:24 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 18:00:24 +0100 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 3:02 PM, Charles G. Waldman wrote: > I greatly prefer "np.mat" to "np.arr" for this, FWIW Unfortunately that's already taken... -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From chris.barker at noaa.gov Fri Jul 18 13:00:02 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 10:00:02 -0700 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 9:53 AM, Nathaniel Smith wrote: > > I don't know what the usecases for np.allclose are since I don't have > any. > I use it all the time -- sometimes you want to check something, but not raise an assertion -- and I use it like: assert np.allclose() with pytest, because it does some nice failure reporting that way (though maybe because I just landed on that). Though I have to say I"m very surprised that assert_allclose() doesn't simpily call allclose() to do it's work, and having different default is really really bad. but that cat's out of the bag. If we don't normalize these, we should put nice strong notes in the docs for both that they are NOT the same. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From andy.terrel at gmail.com Fri Jul 18 13:00:28 2014 From: andy.terrel at gmail.com (Andy Ray Terrel) Date: Fri, 18 Jul 2014 13:00:28 -0400 Subject: [Numpy-discussion] Mailing list slowdown (was Re: __numpy_ufunc__) In-Reply-To: References: Message-ID: We think this is fixed now. Let me know if it is otherwise. On Thu, Jul 17, 2014 at 7:04 AM, Nathaniel Smith wrote: > On 17 Jul 2014 11:51, "Sebastian Berg" wrote: >> >> On Mi, 2014-07-16 at 09:07 +0100, Nathaniel Smith wrote: >> > Weirdly, I never received Chuck's original email in this thread. >> > Should some list admin be informed? >> > >> >> I send some mails yesterday and they never arrived... Not sure if it is >> a problem on my side or not. > > I did eventually get Chuck's original message, but not until several days > later. > > CC'ing postmaster at enthought.com in case they have some insight into what's > going on! > > -n > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From andy.terrel at gmail.com Fri Jul 18 13:01:18 2014 From: andy.terrel at gmail.com (Andy Ray Terrel) Date: Fri, 18 Jul 2014 13:01:18 -0400 Subject: [Numpy-discussion] problems with mailing list ? 
In-Reply-To: <6B2A9899-AB3E-40E5-AB43-BD7A2DC07D8A@astro.physik.uni-goettingen.de> References: <6B2A9899-AB3E-40E5-AB43-BD7A2DC07D8A@astro.physik.uni-goettingen.de> Message-ID: The Enthought support tells me this is fixed now. Please let me know if otherwise. On Fri, Jul 18, 2014 at 8:09 AM, Derek Homeier wrote: > On 18 Jul 2014, at 01:07 pm, josef.pktd at gmail.com wrote: > >> Are the problems with sending out the messages with the mailing lists? >> >> I'm getting some replies without original messages, and in some threads I don't get replies, missing part of the discussions. >> > There seem to be problems with the Scipy list server; my last mails to astropy at scipy.org have taken > 12-18 hours before they made it to the list, and some people here reported messages staying in the > void for several days. But I think it?s been reported to Enthought already. > > Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla.molden at gmail.com Fri Jul 18 13:03:10 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 18 Jul 2014 17:03:10 +0000 (UTC) Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes References: Message-ID: <1569159014427395668.306103sturla.molden-gmail.com@news.gmane.org> Julian Taylor wrote: > git rebase --onto $(git merge-base master maintenance/1.9.x) HEAD^ That's the problem with Git, it solves one problem an creates another. Personally I have no idea what that command might do. Sturla From alan.isaac at gmail.com Fri Jul 18 13:05:57 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 18 Jul 2014 13:05:57 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> Message-ID: <53C953F5.90100@gmail.com> On 7/18/2014 12:45 PM, Mark Miller wrote: > If the true goal is to just allow quick entry of a 2d array, why not just advocate using > a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9")) It's even simpler: a = np.mat(' 1 2 3;4 5 6;7 8 9').A I'm not putting a dog in this race. Still I would say that the reason why such proposals miss the point is that there are introductory settings where one would like to explain as few complications as possible. In particular, one might prefer *not* to discuss the existence of a matrix type. As an additional downside, this is only good for 2d, and there have been proposals for the new array builder to handle other dimensions. fwiw, Alan Isaac From pav at iki.fi Fri Jul 18 13:26:59 2014 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 18 Jul 2014 20:26:59 +0300 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <1405423590.8281.7.camel@sebastian-t440> <53C94651.4040805@iki.fi> Message-ID: <53C958E3.9070700@iki.fi> 18.07.2014 19:33, Chris Barker kirjoitti: > On Fri, Jul 18, 2014 at 9:07 AM, Pauli Virtanen > wrote: > >> Another approach would be to add a new 1-byte unicode > > you can't do unicode in 1-byte -- so what does this mean, exactly? The first 256 unicode code points, which happen to coincide with latin1. >> This also is not perfect, since array(['foo']) on Py2 should for >> backward compatibility continue returning dtype='S'. > > yup. but we may be OK -- as "bytes" in py2 is the same as string > anyway. But what do we do with null bytes? when going from 'S' to > py2 string? 
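For reference, a quick sketch of the current 'S' behaviour in question (it matches the null-padding description earlier in the thread):

    import numpy as np
    a = np.array([b"foo\x00bar", b"baz\x00"], dtype="S8")
    a[0]        # b'foo\x00bar' -- interior nulls survive
    a[1]        # b'baz'        -- trailing nulls are stripped on the way out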
Changing the null chopping and preserving backward compat would require yet another new dtype. This would then mean that the 'S' dtype would become pretty much deprecated on Py3. Forcing everyone to re-do their Python 3 ports would be somewhat cleaner. However, this train may have left a couple of years ago. -- Pauli Virtanen From andrew.collette at gmail.com Fri Jul 18 13:29:10 2014 From: andrew.collette at gmail.com (Andrew Collette) Date: Fri, 18 Jul 2014 11:29:10 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: Hi Chris, > Again, they shouldn't do that, they should be pushing a 10-character string > into something -- and utf-8 is going to (Possible) truncate that. That's > HDF/utf-8 limitation that people are going to have to deal with. I think > you're suggesting that numpy follow the HDF model, so that the numpy-HDF > transition can be clean and easy. However, I think that utf-8 is an > inappropriate model for numpy, and that the mess of bytes to utf-8 is > pyHDF's problem, not numpy's. The root of the issue is that HDF5 provides a limited set of fixed-storage-width string types, and a fixed-storage-width NumPy type of the same size using Latin-1 can't map to any of them without losing data. For example, if "a10" is a hypothetical 10-byte-wide NumPy dtype using Latin-1, reading/writing to an "a10" HDF5 dataset backed with 10-byte UTF-8 storage would risk truncation, even if the advertised widths are the same. There is unfortunately nothing we can do in the h5py code base to paper over this... it's a limitation of the format. > This is where I wonder about HDF's "ascii" type -- is it really ascii? Or is > it that old standby > one-byte-per-character-and-if-it's-ascii-we-all-know-what-it-means-but-if-it's-not-we'll-still-pass-it-around > type? i.e the old char* ? > > In which case, you can just push a latin-1 type into and out of your HDF > ascii arrays and everything will work just fine. Unless someone stores > something other than latin-1 or ascii in it -- but even then, the bytes > would still be preserved. The encoding is explicitly ASCII (H5T_ASCII, in HDF5 lingo). Anecdotally, I've heard people store other encodings in it, but (1) I'm not eager to make things worse by mis-labelling data, and (2) the HDF Group has made indications that they may start checking the encoding at conversion time. (1) is particularly important, as a major focus of h5py is compatibility with the rest of the HDF5 ecosystem. Again, I wouldn't argue that these considerations by themselves are enough of a reason for NumPy to use ASCII or UTF-8, certainly. Just that from this particular HDF5 perspective, they provide maximum compatibility and minimize the chances of accidental data loss. Andrew From charlesr.harris at gmail.com Fri Jul 18 13:39:21 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 18 Jul 2014 11:39:21 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 10:59 AM, Nathaniel Smith wrote: > On Fri, Jul 18, 2014 at 5:54 PM, Chris Barker > wrote: > > > > This is why I see no downside to latin-1 -- if you don't use the > 127 > code > > points, it's the same thing -- if you do, you get some extra handy > > characters. The only difference is that a proper ascii type would not let > > you store anything above 127 at all -- why restrict ourselves? 
> > IMO the extra characters aren't the most compelling argument for > latin1 over ascii. Latin1 gives the nice assurance that if some jerk > *does* give me an "ascii" file that somewhere has some byte with the > 8th bit set, then I can still load the data and fix things by hand. > This is trickier if numpy just refuses to touch the data, blowing up > with an exception when I try. In general it's easy to create numpy > arrays containing arbitrary bitpatterns, so it's nice to have some > strategy for what to do with them. > > Just to throw in one more complication, there is no buffer protocol for a fixed encoding type. In Python 3 'c', 's', 'p' are all considered as bytes, in Python 2 as strings. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Jul 18 13:47:15 2014 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 18 Jul 2014 20:47:15 +0300 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: Message-ID: <53C95DA3.7010901@iki.fi> 18.07.2014 19:35, Julian Taylor kirjoitti: > On Fri, Jul 18, 2014 at 6:23 PM, Nathaniel Smith > wrote: >> On 18 Jul 2014 15:36, "Julian Taylor" >> wrote: >>> >>> git rebase --onto $(git merge-base master maintenance/1.9.x) >>> HEAD^ >> >> As a potential refinement, this might be simpler if we define a >> branch that points to this commit. >> > > we could do that, though the merge base changes to the last commit > that was merged in that way. The old merge base is still valid but > much older. I applied this method to some of my bugfixes so the > current merge base of master and 1.9 is a commit from yesterday > not anymore the diverging point of master and 1.9. But I don't know > if the newer merge base makes any difference to git. Will the merge base actually ever change if you don't merge the branches to each other? *** The other well-known alternative to bugfixes is to first commit it in the earliest maintenance branch where you want to have it, and then merge that branch forward to the newer maintenance branches, and finally into master. Pauli From josef.pktd at gmail.com Fri Jul 18 14:03:56 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 14:03:56 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith wrote: > On Fri, Jul 18, 2014 at 12:38 PM, wrote: > > > > On Thu, Jul 17, 2014 at 4:07 PM, wrote: > > > >> If you mean by this to add atol=1e-8 as default, then I'm against it. > >> > >> At least it will change the meaning of many of our tests in statsmodels. > >> > >> I'm using rtol to check for correct 1e-15 or 1e-30, which would be > >> completely swamped if you change the default atol=0. > >> Adding atol=0 to all assert_allclose that currently use only rtol is a > lot > >> of work. > >> I think I almost never use a default rtol, but I often leave atol at the > >> default = 0. > >> > >> If we have zeros, then I don't think it's too much work to decide > whether > >> this should be atol=1e-20, or 1e-8. > > > > > > copied from > > http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html > > since I didn't get any messages here > > > > This is a compelling use-case, but there are also lots of compelling > > usecases that want some non-zero atol (i.e., comparing stuff to 0). > > Saying that allclose is for one of those use cases and assert_allclose > > is for the other is... 
not a very felicitious API design, I think. So > > we really should do *something*. > > > > Are there really any cases where you want non-zero atol= that don't > > involve comparing something against a 'desired' value of zero? It's a > > little wacky, but I'm wondering if we ought to change the rule (for > > all versions of allclose) to > > > > if desired == 0: > > tol = atol > > else: > > tol = rtol * desired > > > > In particular, means that np.allclose(x, 1e-30) would reject x values > > of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30. > > > > -n > > > > > > That's much too confusing. > > I don't know what the usecases for np.allclose are since I don't have > any. > > I wrote allclose because it's shorter, but my point is that > assert_allclose and allclose should use the same criterion, and was > making a suggestion for what that shared criterion might be. > > > assert_allclose is one of our (statsmodels) most frequently used numpy > > function > > > > this is not informative: > > > > `np.allclose(x, 1e-30)` > > > > > > since there are keywords > > either np.assert_allclose(x, atol=1e-30) > > I think we might be talking past each other here -- 1e-30 here is my > "gold" p-value that I'm hoping x will match, not a tolerance argument. > my mistake > > > if I want to be "close" to zero > > or > > > > np.assert_allclose(x, rtol=1e-11, atol=1e-25) > > > > if we have a mix of large numbers and "zeros" in an array. > > > > Making the behavior of assert_allclose depending on whether desired is > > exactly zero or 1e-20 looks too difficult to remember, and which desired > I > > use would depend on what I get out of R or Stata. > > I thought your whole point here was that 1e-20 and zero are > qualitatively different values that you would not want to accidentally > confuse? Surely R and Stata aren't returning exact zeros for small > non-zero values like probability tails? > > > atol=1e-8 is not close to zero in most cases in my experience. > > If I understand correctly (Tony?) the problem here is that another > common use case for assert_allclose is in cases like > > assert_allclose(np.sin(some * complex ** calculation / (that - should > - be * zero)), 0) > > For cases like this, you need *some* non-zero atol or the thing just > doesn't work, and one could quibble over the exact value as long as > it's larger than "normal" floating point error. These calculations > usually involve "normal" sized numbers, so atol should be comparable > to eps * these values. eps is 2e-16, so atol=1e-8 works for values up > to around 1e8, which is a plausible upper bound for where people might > expect assert_allclose to just work. I'm trying to figure out some way > to support your use cases while also supporting other use cases. > my problem is that there is no "normal" floating point error. If I have units in 1000 or units in 0.0001 depends on the example and dataset that we use for testing. 
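For reference, the defaults at issue and a toy case for each side of the argument (these are the actual current signatures, nothing hypothetical):

    import numpy as np
    from numpy.testing import assert_allclose

    # np.allclose(a, b, rtol=1e-05, atol=1e-08)
    # assert_allclose(actual, desired, rtol=1e-07, atol=0)

    np.allclose(np.sin(np.pi), 0)        # True:  atol=1e-8 absorbs the ~1.2e-16 residue
    np.allclose(1e-12, 1e-30)            # True:  atol=1e-8 swamps tiny "desired" values
    assert_allclose(np.sin(np.pi), 0)    # raises AssertionError: atol=0 and rtol * 0 == 0
    assert_allclose(1e-12, 1e-30)        # raises AssertionError: only the (huge) relative error counts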
this test two different functions/methods that calculate the same thing (Pdb) pval array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) (Pdb) res2.pvalues array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25) I don't care about errors that are smaller that 1e-25 for example testing p-values against Stata (Pdb) tt.pvalue array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) (Pdb) res2.pvalues array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) (Pdb) tt.pvalue - res2.pvalues array([ 2.16612016e-40, 2.51187959e-15, 4.30027936e-21]) (Pdb) tt.pvalue / res2.pvalues - 1 array([ 3.79811738e-11, 4.01900735e-14, 7.33806349e-11]) (Pdb) rtol 1e-10 (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol) I could find a lot more and maybe nicer examples, since I spend quite a bit of time fine tuning unit tests. Of course you can change it. But the testing functions are code and very popular code. And if you break backwards compatibility, then I wouldn't mind reviewing a pull request for statsmodels that adds 300 to 400 `atol=0` to the unit tests. :) Josef > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jul 18 14:20:54 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 14:20:54 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 2:03 PM, wrote: > > > > On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith wrote: > >> On Fri, Jul 18, 2014 at 12:38 PM, wrote: >> > >> > On Thu, Jul 17, 2014 at 4:07 PM, wrote: >> > >> >> If you mean by this to add atol=1e-8 as default, then I'm against it. >> >> >> >> At least it will change the meaning of many of our tests in >> statsmodels. >> >> >> >> I'm using rtol to check for correct 1e-15 or 1e-30, which would be >> >> completely swamped if you change the default atol=0. >> >> Adding atol=0 to all assert_allclose that currently use only rtol is a >> lot >> >> of work. >> >> I think I almost never use a default rtol, but I often leave atol at >> the >> >> default = 0. >> >> >> >> If we have zeros, then I don't think it's too much work to decide >> whether >> >> this should be atol=1e-20, or 1e-8. >> > >> > >> > copied from >> > http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html >> > since I didn't get any messages here >> > >> > This is a compelling use-case, but there are also lots of compelling >> > usecases that want some non-zero atol (i.e., comparing stuff to 0). >> > Saying that allclose is for one of those use cases and assert_allclose >> > is for the other is... not a very felicitious API design, I think. So >> > we really should do *something*. >> > >> > Are there really any cases where you want non-zero atol= that don't >> > involve comparing something against a 'desired' value of zero? 
It's a >> > little wacky, but I'm wondering if we ought to change the rule (for >> > all versions of allclose) to >> > >> > if desired == 0: >> > tol = atol >> > else: >> > tol = rtol * desired >> > >> > In particular, means that np.allclose(x, 1e-30) would reject x values >> > of 0 or 2e-30, but np.allclose(x, 0) will accept x == 1e-30 or 2e-30. >> > >> > -n >> > >> > >> > That's much too confusing. >> > I don't know what the usecases for np.allclose are since I don't have >> any. >> >> I wrote allclose because it's shorter, but my point is that >> assert_allclose and allclose should use the same criterion, and was >> making a suggestion for what that shared criterion might be. >> >> > assert_allclose is one of our (statsmodels) most frequently used numpy >> > function >> > >> > this is not informative: >> > >> > `np.allclose(x, 1e-30)` >> > >> > >> > since there are keywords >> > either np.assert_allclose(x, atol=1e-30) >> >> I think we might be talking past each other here -- 1e-30 here is my >> "gold" p-value that I'm hoping x will match, not a tolerance argument. >> > > my mistake > > > >> >> > if I want to be "close" to zero >> > or >> > >> > np.assert_allclose(x, rtol=1e-11, atol=1e-25) >> > >> > if we have a mix of large numbers and "zeros" in an array. >> > >> > Making the behavior of assert_allclose depending on whether desired is >> > exactly zero or 1e-20 looks too difficult to remember, and which >> desired I >> > use would depend on what I get out of R or Stata. >> >> I thought your whole point here was that 1e-20 and zero are >> qualitatively different values that you would not want to accidentally >> confuse? Surely R and Stata aren't returning exact zeros for small >> non-zero values like probability tails? >> >> > atol=1e-8 is not close to zero in most cases in my experience. >> >> If I understand correctly (Tony?) the problem here is that another >> common use case for assert_allclose is in cases like >> >> assert_allclose(np.sin(some * complex ** calculation / (that - should >> - be * zero)), 0) >> >> For cases like this, you need *some* non-zero atol or the thing just >> doesn't work, and one could quibble over the exact value as long as >> it's larger than "normal" floating point error. These calculations >> usually involve "normal" sized numbers, so atol should be comparable >> to eps * these values. eps is 2e-16, so atol=1e-8 works for values up >> to around 1e8, which is a plausible upper bound for where people might >> expect assert_allclose to just work. I'm trying to figure out some way >> to support your use cases while also supporting other use cases. >> > > my problem is that there is no "normal" floating point error. > If I have units in 1000 or units in 0.0001 depends on the example and > dataset that we use for testing. 
> > this test two different functions/methods that calculate the same thing > > (Pdb) pval > array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) > (Pdb) res2.pvalues > array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) > (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25) > > I don't care about errors that are smaller that 1e-25 > > for example testing p-values against Stata > > (Pdb) tt.pvalue > array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) > (Pdb) res2.pvalues > array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) > (Pdb) tt.pvalue - res2.pvalues > array([ 2.16612016e-40, 2.51187959e-15, 4.30027936e-21]) > (Pdb) tt.pvalue / res2.pvalues - 1 > array([ 3.79811738e-11, 4.01900735e-14, 7.33806349e-11]) > (Pdb) rtol > 1e-10 > (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol) > > > I could find a lot more and maybe nicer examples, since I spend quite a > bit of time fine tuning unit tests. > > Of course you can change it. > > But the testing functions are code and very popular code. > > And if you break backwards compatibility, then I wouldn't mind reviewing a > pull request for statsmodels that adds 300 to 400 `atol=0` to the unit > tests. :) > scipy (not current master) doesn't look "so" bad. I find 400 "assert_allclose(" and maybe a third to half use atol. As expected optimize uses only atol because of the convergence criteria. scipy.stats uses mostly rtol or default. Josef > > Josef > > >> >> -n >> >> -- >> Nathaniel J. Smith >> Postdoctoral researcher - Informatics - University of Edinburgh >> http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 18 14:29:25 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 19:29:25 +0100 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 7:03 PM, wrote: > > On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith wrote: >> >> For cases like this, you need *some* non-zero atol or the thing just >> doesn't work, and one could quibble over the exact value as long as >> it's larger than "normal" floating point error. These calculations >> usually involve "normal" sized numbers, so atol should be comparable >> to eps * these values. eps is 2e-16, so atol=1e-8 works for values up >> to around 1e8, which is a plausible upper bound for where people might >> expect assert_allclose to just work. I'm trying to figure out some way >> to support your use cases while also supporting other use cases. > > > my problem is that there is no "normal" floating point error. > If I have units in 1000 or units in 0.0001 depends on the example and > dataset that we use for testing. 
> > this test two different functions/methods that calculate the same thing > > (Pdb) pval > array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) > (Pdb) res2.pvalues > array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) > (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25) > > I don't care about errors that are smaller that 1e-25 > > for example testing p-values against Stata > > (Pdb) tt.pvalue > array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) > (Pdb) res2.pvalues > array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) > (Pdb) tt.pvalue - res2.pvalues > array([ 2.16612016e-40, 2.51187959e-15, 4.30027936e-21]) > (Pdb) tt.pvalue / res2.pvalues - 1 > array([ 3.79811738e-11, 4.01900735e-14, 7.33806349e-11]) > (Pdb) rtol > 1e-10 > (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol) > > > I could find a lot more and maybe nicer examples, since I spend quite a bit > of time fine tuning unit tests. ...these are all cases where there are not exact zeros, so my proposal would not affect them? I can see the argument that we shouldn't provide any default rtol/atol at all because there is no good default, but... I don't think putting that big of a barrier in front of newbies writing their first tests is a good idea. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Fri Jul 18 14:31:01 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 14:31:01 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: > > > > Making the behavior of assert_allclose depending on whether desired is > > exactly zero or 1e-20 looks too difficult to remember, and which desired > I > > use would depend on what I get out of R or Stata. > > I thought your whole point here was that 1e-20 and zero are > qualitatively different values that you would not want to accidentally > confuse? Surely R and Stata aren't returning exact zeros for small > non-zero values like probability tails? > > I was thinking of the case when we only see "pvalue < 1e-16" or something like this, and we replace this by assert close to zero. which would translate to `assert_allclose(pvalue, 0, atol=1e-16)` with maybe an additional rtol=1e-11 if we have an array of pvalues where some are "large" (>0.5). It's not a very frequent case, mainly when we don't have access to the underlying float numbers and only have the print representation. Josef -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jul 18 14:41:25 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 14:41:25 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 2:29 PM, Nathaniel Smith wrote: > On Fri, Jul 18, 2014 at 7:03 PM, wrote: > > > > On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith wrote: > >> > >> For cases like this, you need *some* non-zero atol or the thing just > >> doesn't work, and one could quibble over the exact value as long as > >> it's larger than "normal" floating point error. These calculations > >> usually involve "normal" sized numbers, so atol should be comparable > >> to eps * these values. eps is 2e-16, so atol=1e-8 works for values up > >> to around 1e8, which is a plausible upper bound for where people might > >> expect assert_allclose to just work. 
I'm trying to figure out some way > >> to support your use cases while also supporting other use cases. > > > > > > my problem is that there is no "normal" floating point error. > > If I have units in 1000 or units in 0.0001 depends on the example and > > dataset that we use for testing. > > > > this test two different functions/methods that calculate the same thing > > > > (Pdb) pval > > array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) > > (Pdb) res2.pvalues > > array([ 3.01270184e-42, 5.90847367e-02, 3.00066946e-12]) > > (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25) > > > > I don't care about errors that are smaller that 1e-25 > > > > for example testing p-values against Stata > > > > (Pdb) tt.pvalue > > array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) > > (Pdb) res2.pvalues > > array([ 5.70315140e-30, 6.24662551e-02, 5.86024090e-11]) > > (Pdb) tt.pvalue - res2.pvalues > > array([ 2.16612016e-40, 2.51187959e-15, 4.30027936e-21]) > > (Pdb) tt.pvalue / res2.pvalues - 1 > > array([ 3.79811738e-11, 4.01900735e-14, 7.33806349e-11]) > > (Pdb) rtol > > 1e-10 > > (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol) > > > > > > I could find a lot more and maybe nicer examples, since I spend quite a > bit > > of time fine tuning unit tests. > > ...these are all cases where there are not exact zeros, so my proposal > would not affect them? > > I can see the argument that we shouldn't provide any default rtol/atol > at all because there is no good default, but... I don't think putting > that big of a barrier in front of newbies writing their first tests is > a good idea. > I think atol=0 is **very** good for newbies, and everyone else. If expected is really zero or very small, then it immediately causes a test failure, and it's relatively obvious how to fix it. I worry a lot more about unit tests that don't "bite" written by newbies or not so newbies who just use a default. That's one of the problems we had with assert_almost_equal, and why I was very happy to switch to assert_allclose with it's emphasis on relative tolerance. Josef > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charles at crunch.io Fri Jul 18 14:42:22 2014 From: charles at crunch.io (Charles G. Waldman) Date: Fri, 18 Jul 2014 11:42:22 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <53C953F5.90100@gmail.com> References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> Message-ID: Well, if the goal is "shorthand", typing numpy.array(numpy.mat()) won't please many users. But the more I think about it, the less I think Numpy should support this (non-Pythonic) input mode. Too much molly-coddling of new users! When doing interactive work I usually just type: >>> np.array([[1,2,3], ... [4,5,6], ... [7,8,9]]) which is (IMO) easier to read: e.g. it's not totally obvious that "1,0,0;0,1,0;0,0,1" represents a 3x3 identity matrix, but [[1,0,0], [0,1,0], [0,0,1]] is pretty obvious. The difference in (non-whitespace) chars is 19 vs 25, so the "shorthand" doesn't seem to save that much. 
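For reference, the semicolon form is only a few lines of plain Python to
emulate -- the following is a rough sketch, not a proposal for an actual
numpy function, and the helper name `arr` is made up:

import numpy as np

def arr(text):
    # toy parser: ';' separates rows, whitespace or ',' separates entries;
    # handles only 2-d numeric input, no error checking
    rows = [row.replace(',', ' ').split() for row in text.split(';')]
    return np.array(rows, dtype=float)

arr("1 0 0; 0 1 0; 0 0 1")   # -> the same 3x3 identity as the bracket literal

So the comparison above is really about readability rather than feasibility.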
Just my ?0.02, - C On Fri, Jul 18, 2014 at 10:05 AM, Alan G Isaac wrote: > On 7/18/2014 12:45 PM, Mark Miller wrote: >> If the true goal is to just allow quick entry of a 2d array, why not just advocate using >> a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9")) > > > It's even simpler: > a = np.mat(' 1 2 3;4 5 6;7 8 9').A > > I'm not putting a dog in this race. Still I would say that > the reason why such proposals miss the point is that > there are introductory settings where one would like > to explain as few complications as possible. In > particular, one might prefer *not* to discuss the > existence of a matrix type. As an additional downside, > this is only good for 2d, and there have been proposals > for the new array builder to handle other dimensions. > > fwiw, > Alan Isaac > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Fri Jul 18 14:44:08 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 19:44:08 +0100 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On 18 Jul 2014 19:31, wrote: >> >> >> > Making the behavior of assert_allclose depending on whether desired is >> > exactly zero or 1e-20 looks too difficult to remember, and which desired I >> > use would depend on what I get out of R or Stata. >> >> I thought your whole point here was that 1e-20 and zero are >> qualitatively different values that you would not want to accidentally >> confuse? Surely R and Stata aren't returning exact zeros for small >> non-zero values like probability tails? >> > > I was thinking of the case when we only see "pvalue < 1e-16" or something like this, and we replace this by assert close to zero. > which would translate to `assert_allclose(pvalue, 0, atol=1e-16)` > with maybe an additional rtol=1e-11 if we have an array of pvalues where some are "large" (>0.5). This example is also handled correctly by my proposal :-) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jul 18 14:43:39 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 11:43:39 -0700 Subject: [Numpy-discussion] Numpy BoF at SciPy 2014 - quick report In-Reply-To: References: <3A0037EB-BF51-4943-8E27-EDDFC8F09456@astro.princeton.edu> <1405670639.6974.4.camel@sebastian-t440> Message-ID: On Fri, Jul 18, 2014 at 6:18 AM, Charles R Harris wrote: > I've toyed some with the idea of adding a flag bit for transpose of 1-d > arrays. It would flip with every transpose and be ignored for non 1-d > arrays. A bit of a hack, but would allow for a column/row vector > distinction. > very cool! I've thought for a while that one of the major things lacking from numpy.matrix was row and column vectors. To do linear algebra naturally, you really need those. This may be a really lightweight way to get that - without the distinction between "arrays" and "matrixes", which I think we're trying to get rid of with the @ operator. when would this flag be used? - linear algebra operations (mostly @ -- anything else?) - broadcasting??? neat idea, anyway. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
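For anyone following along, this is the gap such a flag would paper over --
just current, documented ndarray behaviour, not the proposed change:

import numpy as np

v = np.array([1.0, 2.0, 3.0])
A = np.eye(3)

print(v.T.shape)               # (3,)   -- transposing a 1-d array is a no-op
print(A.dot(v).shape)          # (3,)   -- dot treats v as a column but returns 1-d
print(v[:, np.newaxis].shape)  # (3, 1) -- today's explicit way to get a column vector

The flag idea is presumably aimed at letting linear-algebra code make that
row/column distinction without the explicit newaxis.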
URL: From pav at iki.fi Fri Jul 18 14:47:20 2014 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 18 Jul 2014 21:47:20 +0300 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: <53C96BB8.4060104@iki.fi> 18.07.2014 21:03, josef.pktd at gmail.com kirjoitti: [clip] > Of course you can change it. > > But the testing functions are code and very popular code. > > And if you break backwards compatibility, then I wouldn't mind reviewing a > pull request for statsmodels that adds 300 to 400 `atol=0` to the unit > tests. :) 10c: Scipy has 960 of those, and atol ~ 0 is required in some cases (difficult to say in how big percentage without review). The default of atol=1e-8 is pretty large. There's ~60 instances of allclose(), most of which are in tests. About half of those don't have atol=, whereas most have rtol. Using allclose in non-test code without specifying both tolerances explicitly is IMHO a sign of sloppiness, as the default tolerances are both pretty big (and atol != 0 is not scale-free). *** Consistency would be nice, especially in not having traps like assert_allclose(a, b, eps) -> assert_(not np.allclose(a, b, eps)) Bumping the tolerances in assert_allclose() up to match allclose() will probably not break code, but it can render some tests ineffective. If the change is made, it needs to be noted in the release notes. I think the number of project authors who relied on that the default was atol=0 is not so big. (In other news, we should discourage use of assert_almost_equal, by telling people to use assert_allclose instead in the docstring at the least. It has only atol= and it specifies it in a very cumbersome log10 basis...) -- Pauli Virtanen From njs at pobox.com Fri Jul 18 14:49:04 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 19:49:04 +0100 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <53C953F5.90100@gmail.com> References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> Message-ID: On 18 Jul 2014 18:06, "Alan G Isaac" wrote: > > On 7/18/2014 12:45 PM, Mark Miller wrote: > > If the true goal is to just allow quick entry of a 2d array, why not just advocate using > > a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9")) > > > It's even simpler: > a = np.mat(' 1 2 3;4 5 6;7 8 9').A > > I'm not putting a dog in this race. Still I would say that > the reason why such proposals miss the point is that > there are introductory settings where one would like > to explain as few complications as possible. In > particular, one might prefer *not* to discuss the > existence of a matrix type. As an additional downside, > this is only good for 2d, and there have been proposals > for the new array builder to handle other dimensions. Going through np.mat also fails on the meta-goal, which is to remove reasons for people to prefer np.matrix to np.ndarray, so that eventually we can deprecate the former without harm. As far as this goal goes, it's all very well for some of us to say that users should toughen up or whatever, but it's useless: they'll just ignore you and use np.mat because it's easier. And then we have even more of a mess to clean up later. -n -------------- next part -------------- An HTML attachment was scrubbed... 
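The pull towards np.matrix that this is trying to remove is mostly the
operator semantics -- standard behaviour of both types, shown here only for
context:

import numpy as np

M = np.matrix('1 2; 3 4')
A = np.array([[1, 2], [3, 4]])

print(M * M)       # matrix product: [[ 7 10] [15 22]]
print(A * A)       # elementwise:    [[ 1  4] [ 9 16]]
print(A.dot(A))    # matrix product for plain ndarrays, until `@` lands

Once `@` gives ndarrays the same one-character spelling, that particular
reason largely goes away.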
URL: From chris.barker at noaa.gov Fri Jul 18 14:50:07 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 11:50:07 -0700 Subject: [Numpy-discussion] Numpy BoF at SciPy 2014 - quick report In-Reply-To: References: Message-ID: On Wed, Jul 16, 2014 at 8:08 PM, Fernando Perez wrote: > - it would have been more productive if a focused numpy sprint had been > also planned, so that there could be more structured follow-up on the ideas > that came up. > The trick is people to do it -- there are a scary few number of people with skills, time, and inclination to work on the core numpy code. Exactly one of them (thanks Chuck!) was there for the sprints this year. If there were a way to put together a stand-alone numpy sprint at some point, that would be really great! In particular, Chris Barker brought up a number of things regarding > datetime and planned on following up during the sprints, but I'm not sure > what ended up happening. > We did indeed follow op. No code was written, but: Chuck, Mark W. and I come up with a rough proposal. A handful of other folks came by to chat about it, and seemed to think it would be useful. In short: Some minor changes to time zone handling, with a hook in place to potentially plug in fancier support in the future. Possibly a hook in to plug in addition calendars. We're working on a NEP as we speak (or, correctly speaking, I'm distracted from working on the PEP by reading the numpy list....) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jul 18 14:52:31 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 11:52:31 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> Message-ID: On Fri, Jul 18, 2014 at 11:49 AM, Nathaniel Smith wrote: > Going through np.mat also fails on the meta-goal, which is to remove > reasons for people to prefer np.matrix to np.ndarray, so that eventually we > can deprecate the former without harm. > > As far as this goal goes, it's all very well for some of us to say that > users should toughen up or whatever, but it's useless: they'll just ignore > you and use np.mat because it's easier. And then we have even more of a > mess to clean up later. > so maybe don't do anything new, and np.mat can produce an array at some point in the future when np.matrix is deprecated.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
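In the meantime the "np.mat that returns an array" behaviour is easy to fake
in user code -- a throwaway sketch with a made-up name, not anything numpy
currently provides:

import numpy as np

def mat_as_array(text):
    # reuse the existing matlab-style string parser, but hand back a plain ndarray
    return np.asarray(np.mat(text))

mat_as_array("1 2 3; 4 5 6")   # 2x3 ndarray, not an np.matrix instance

which is the same trick as the np.array(np.mat(...)) and .A spellings earlier
in the thread.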
URL: From josef.pktd at gmail.com Fri Jul 18 15:13:05 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 15:13:05 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: Message-ID: On Fri, Jul 18, 2014 at 2:44 PM, Nathaniel Smith wrote: > On 18 Jul 2014 19:31, wrote: > >> > >> > >> > Making the behavior of assert_allclose depending on whether desired is > >> > exactly zero or 1e-20 looks too difficult to remember, and which > desired I > >> > use would depend on what I get out of R or Stata. > >> > >> I thought your whole point here was that 1e-20 and zero are > >> qualitatively different values that you would not want to accidentally > >> confuse? Surely R and Stata aren't returning exact zeros for small > >> non-zero values like probability tails? > >> > > > > I was thinking of the case when we only see "pvalue < 1e-16" or > something like this, and we replace this by assert close to zero. > > which would translate to `assert_allclose(pvalue, 0, atol=1e-16)` > > with maybe an additional rtol=1e-11 if we have an array of pvalues where > some are "large" (>0.5). > > This example is also handled correctly by my proposal :-) > depends on the details of your proposal alternative: desired is exactly zero means assert_equal (Pdb) self.res_reg.params[m:] array([ 0., 0., 0.]) (Pdb) assert_allclose(0, self.res_reg.params[m:]) (Pdb) assert_allclose(0, self.res_reg.params[m:], rtol=0, atol=0) (Pdb) This test uses currently assert_almost_equal with decimal=4 :( regularized estimation with hard thresholding: the first m values are estimate not equal zero, the m to the end elements are "exactly zero". This is discrete models fit_regularized which predates numpy assert_allclose. I haven't checked what the unit test of Kerby's current additions for fit_regularized looks like. unit testing is serious business: I'd rather have good unit test in SciPy related packages than convincing a few more newbies that they can use the defaults for everything. Josef > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jul 18 15:13:21 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 12:13:21 -0700 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: <53C96BB8.4060104@iki.fi> References: <53C96BB8.4060104@iki.fi> Message-ID: On Fri, Jul 18, 2014 at 11:47 AM, Pauli Virtanen wrote: > Using allclose in non-test code without specifying both tolerances > explicitly is IMHO a sign of sloppiness, as the default tolerances are > both pretty big (and atol != 0 is not scale-free). > using it without specifying tolerances is sloppy in ANY use case. Bumping the tolerances in assert_allclose() up to match allclose() will > probably not break code, but it can render some tests ineffective. > being a bit pedantic here, but rendering a test ineffective IS breaking code. And I'd rather a change break my tests than render them ineffective -- if they break, I'll go look at them. If they are rendered ineffective, I'll never notice. Curious here -- is atol necessary for anything OTHER than near zero? 
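Concretely, with an expected value of exactly zero the rtol term contributes
nothing, so atol is the only thing that can make the comparison pass --
standard allclose semantics, shown with made-up numbers:

>>> import numpy as np
>>> np.allclose(1e-12, 0.0, rtol=1e-5, atol=0)    # |a - b| <= atol + rtol*|b| = 0
False
>>> np.allclose(1e-12, 0.0, rtol=1e-5, atol=1e-8)
True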
I can see that in a given case, you may know exactly what range of values to expect (and everything in the array is of the same order of magnitude), but an appropriate rtol would work there too. If only zero testing is needed, then atol=0 makes sense as a default. (or maybe atol=eps) Note: """ The relative difference (`rtol` * abs(`b`)) and the absolute difference `atol` are added together to compare against the absolute difference between `a` and `b`. """ Which points to seting atol=0 for the default as well, or it can totally mess up a test on very small numbers. I'll bet there is a LOT of sloppy use of these out the wild (I know I've been sloppy), and Im starting to think that atol=0 is the ONLY appropriate default for the sloppy among us for instance: In [40]: a1 = np.array([1e-100]) In [41]: a2 = np.array([1.00000001e-100]) In [42]: np.all np.all np.allclose np.alltrue In [42]: np.allclose(a1, a2, rtol=1e-10) Out[42]: True In [43]: np.allclose(a1, a2, rtol=1e-10, atol=0) Out[43]: False That's really not good. By the way: Definition: np.allclose(a, b, rtol=1e-05, atol=1e-08) Really? those are HUGE defaults for double-precision math. I can't believe I haven't looked more closely at this before! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jul 18 15:16:53 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 12:16:53 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 9:59 AM, Nathaniel Smith wrote: > IMO the extra characters aren't the most compelling argument for > latin1 over ascii. Latin1 gives the nice assurance that if some jerk > *does* give me an "ascii" file that somewhere has some byte with the > 8th bit set, then I can still load the data and fix things by hand. > Absolutely! py2's frequent barfing on the ascii encoding is really a pain. And if you aren't going tin enforce ascii, then better to be clear about what those extra bits mean. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jul 18 15:27:43 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 12:27:43 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 10:29 AM, Andrew Collette wrote: > The root of the issue is that HDF5 provides a limited set of > fixed-storage-width string types, and a fixed-storage-width NumPy type > of the same size using Latin-1 can't map to any of them without losing > data. For example, if "a10" is a hypothetical 10-byte-wide NumPy > dtype using Latin-1, reading/writing to an "a10" HDF5 dataset backed > with 10-byte UTF-8 storage would risk truncation, even if the > advertised widths are the same. > I do get this, yes. > There is unfortunately nothing we can do in the h5py code base to > paper over this... it's a limitation of the format. yup. Similar limitations in numpy. 
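For instance, assignment into a fixed-width numpy string array is silently
truncated in just the same way:

>>> import numpy as np
>>> a = np.zeros(2, dtype='S5')          # fixed 5-byte fields
>>> a[0] = b'hello world'                # no error: silently truncated to the field width
>>> a[0] == b'hello'
True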
> This is where I wonder about HDF's "ascii" type -- is it really ascii? > Or is > > it that old standby > > > one-byte-per-character-and-if-it's-ascii-we-all-know-what-it-means-but-if-it's-not-we'll-still-pass-it-around > > type? i.e the old char* ? > > > > In which case, you can just push a latin-1 type into and out of your HDF > > ascii arrays and everything will work just fine. Unless someone stores > > something other than latin-1 or ascii in it -- but even then, the bytes > > would still be preserved. > > The encoding is explicitly ASCII (H5T_ASCII, in HDF5 lingo). > Anecdotally, I've heard people store other encodings in it, but (1) > I'm not eager to make things worse by mis-labelling data, and (2) the > HDF Group has made indications that they may start checking the > encoding at conversion time. (1) is particularly important, as a > major focus of h5py is compatibility with the rest of the HDF5 > ecosystem. > If it were me, I'd encourage the HDF group to NOT enforce ascii. just like with the numpy 'S' type, I'm guessing there is a fair bit of code in the wild that [ab]uses the ascii type by throwing other bytes in there. In fact, this one reason that utf-8 is so popular -- you still use all that code that simply takes a char* and passes it around (or maybe compares it), without making any assumptions about what it means. that from this particular HDF5 perspective, they provide maximum > compatibility and minimize the chances of accidental data loss. What it would do is push the problem from the HDF5<->numpy interface to the python<->numpy interface. I'm not sure that's a good trade off. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jul 18 15:32:55 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 12:32:55 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 9:59 AM, Nathaniel Smith wrote: > IMO the extra characters aren't the most compelling argument for > latin1 over ascii. Latin1 gives the nice assurance that if some jerk > *does* give me an "ascii" file that somewhere has some byte with the > 8th bit set, then I can still load the data and fix things by hand. > On Fri, Jul 18, 2014 at 10:39 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Just to throw in one more complication, there is no buffer protocol for a > fixed encoding type. In Python 3 'c', 's', 'p' are all considered as bytes, > in Python 2 as strings. I suppose another option is to formally cal it what has been a defacto non-standard for years: ascii-with-who-knows-what-for-the-higher-codes. i.e ASCII, but not barf on decoding, (replace?). but you can use latin-1 the same way, so why not? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
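That property is worth spelling out: latin-1 assigns a code point to every
byte value 0-255, so arbitrary bytes always decode and round-trip, while a
strict ascii decode refuses anything above 127. Plain Python 3, nothing
numpy-specific:

>>> raw = bytes(range(256))                        # every possible byte value
>>> raw.decode('latin-1').encode('latin-1') == raw
True
>>> raw.decode('ascii')
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 128: ordinal not in range(128)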
URL: From pav at iki.fi Fri Jul 18 15:43:05 2014 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 18 Jul 2014 22:43:05 +0300 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: <53C96BB8.4060104@iki.fi> Message-ID: 18.07.2014 22:13, Chris Barker kirjoitti: [clip] > but an appropriate rtol would work there too. If only zero testing is > needed, then atol=0 makes sense as a default. (or maybe atol=eps) There's plenty of room below eps, but finfo(float).tiny ~ 3e-308 (or some big multiple) is also reasonable in the scale-freeness sense. From andrew.collette at gmail.com Fri Jul 18 15:52:10 2014 From: andrew.collette at gmail.com (Andrew Collette) Date: Fri, 18 Jul 2014 13:52:10 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: Hi Chris, > What it would do is push the problem from the HDF5<->numpy interface to the > python<->numpy interface. > > I'm not sure that's a good trade off. Maybe I'm being too paranoid about the truncation issue. We already perform truncation when going from e.g. vlen to fixed-width strings in h5py... it's just the truncation behavior for same-width data that throws me. Here's a strawman for how a Latin-1 "a" type might be handled in h5py: 1. Creation from existing "a" data: Use vlen strings. Doesn't preserve the dtype, but maybe that's not so important. 2. Writing from "a" data to fixed-width ASCII: Copy, and replace bytes>127 with "?" (or don't) 3. Writing from "a" data to fixed-width UTF-8: Transcode and truncate (being careful not to end in the middle of a multibyte character) 4. Reading from fixed-width ASCII to "a": Straight copy, no inspection 5. Reading from fixed-width UTF-8 to "a": Copy, and replace non-Latin-1 chars with "?" (The above example uses replacement rather than raising an exception, because an exception in the HDF5 conversion callback will leave the write/read half-completed). In any case, I can say that the lack of an text 'S' type in NumPy has been a significant pain point for h5py users on Python 3 over the years. Whatever specific encoding ends up being used, such a type can only improve the situation, and I'm firmly in favor of it. Andrew From joseph.martinot-lagarde at m4x.org Fri Jul 18 16:15:42 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Fri, 18 Jul 2014 22:15:42 +0200 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> Message-ID: <53C9806E.3090008@m4x.org> Le 18/07/2014 20:42, Charles G. Waldman a ?crit : > Well, if the goal is "shorthand", typing numpy.array(numpy.mat()) > won't please many users. > > But the more I think about it, the less I think Numpy should support > this (non-Pythonic) input mode. Too much molly-coddling of new users! > When doing interactive work I usually just type: > >>>> np.array([[1,2,3], > ... [4,5,6], > ... [7,8,9]]) > > which is (IMO) easier to read: e.g. it's not totally obvious that > "1,0,0;0,1,0;0,0,1" represents a 3x3 identity matrix, but > > [[1,0,0], > [0,1,0], > [0,0,1]] > > is pretty obvious. > Compare what's comparable: [[1,0,0], [0,1,0], [0,0,1]] vs "1 0 0;" "0 1 0;" "0 0 1" or """ 1 0 0; 0 1 0; 0 0 1 """ [[1,0,0], [0,1,0], [0,0,1]] vs "1 0 0; 0 1 0; 0 0 1" > The difference in (non-whitespace) chars is 19 vs 25, so the > "shorthand" doesn't seem to save that much. 
Well, it's easier to type "" (twice the same character) than [], and you have no risk in swapping en opening and a closing bracket. In addition, you have to use AltGr on some keyboards to get the brackets. It doesn't boils down to a number of characters. > > Just my ?0.02, > > - C > > > > > On Fri, Jul 18, 2014 at 10:05 AM, Alan G Isaac wrote: >> On 7/18/2014 12:45 PM, Mark Miller wrote: >>> If the true goal is to just allow quick entry of a 2d array, why not just advocate using >>> a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9")) >> >> >> It's even simpler: >> a = np.mat(' 1 2 3;4 5 6;7 8 9').A >> >> I'm not putting a dog in this race. Still I would say that >> the reason why such proposals miss the point is that >> there are introductory settings where one would like >> to explain as few complications as possible. In >> particular, one might prefer *not* to discuss the >> existence of a matrix type. As an additional downside, >> this is only good for 2d, and there have been proposals >> for the new array builder to handle other dimensions. >> >> fwiw, >> Alan Isaac >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charles at crunch.io Fri Jul 18 16:21:04 2014 From: charles at crunch.io (Charles G. Waldman) Date: Fri, 18 Jul 2014 13:21:04 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <53C9806E.3090008@m4x.org> References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> <53C9806E.3090008@m4x.org> Message-ID: Joseph Martinot-Lagarde writes: > Compare what's comparable: That's fair. > In addition, you have to use AltGr on some keyboards to get the brackets Wow, it must be rather painful to do any real programming on such a keyboard! - C On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde wrote: > Le 18/07/2014 20:42, Charles G. Waldman a ?crit : >> Well, if the goal is "shorthand", typing numpy.array(numpy.mat()) >> won't please many users. >> >> But the more I think about it, the less I think Numpy should support >> this (non-Pythonic) input mode. Too much molly-coddling of new users! >> When doing interactive work I usually just type: >> >>>>> np.array([[1,2,3], >> ... [4,5,6], >> ... [7,8,9]]) >> >> which is (IMO) easier to read: e.g. it's not totally obvious that >> "1,0,0;0,1,0;0,0,1" represents a 3x3 identity matrix, but >> >> [[1,0,0], >> [0,1,0], >> [0,0,1]] >> >> is pretty obvious. >> > Compare what's comparable: > > [[1,0,0], > [0,1,0], > [0,0,1]] > > vs > > "1 0 0;" > "0 1 0;" > "0 0 1" > > or > > """ > 1 0 0; > 0 1 0; > 0 0 1 > """ > > [[1,0,0], [0,1,0], [0,0,1]] > vs > "1 0 0; 0 1 0; 0 0 1" > >> The difference in (non-whitespace) chars is 19 vs 25, so the >> "shorthand" doesn't seem to save that much. > > Well, it's easier to type "" (twice the same character) than [], and you > have no risk in swapping en opening and a closing bracket. In addition, > you have to use AltGr on some keyboards to get the brackets. It doesn't > boils down to a number of characters. 
> >> >> Just my ?0.02, >> >> - C >> >> >> >> >> On Fri, Jul 18, 2014 at 10:05 AM, Alan G Isaac wrote: >>> On 7/18/2014 12:45 PM, Mark Miller wrote: >>>> If the true goal is to just allow quick entry of a 2d array, why not just advocate using >>>> a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9")) >>> >>> >>> It's even simpler: >>> a = np.mat(' 1 2 3;4 5 6;7 8 9').A >>> >>> I'm not putting a dog in this race. Still I would say that >>> the reason why such proposals miss the point is that >>> there are introductory settings where one would like >>> to explain as few complications as possible. In >>> particular, one might prefer *not* to discuss the >>> existence of a matrix type. As an additional downside, >>> this is only good for 2d, and there have been proposals >>> for the new array builder to handle other dimensions. >>> >>> fwiw, >>> Alan Isaac >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Fri Jul 18 16:32:50 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 13:32:50 -0700 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: <53C96BB8.4060104@iki.fi> Message-ID: On Fri, Jul 18, 2014 at 12:43 PM, Pauli Virtanen wrote: > 18.07.2014 22:13, Chris Barker kirjoitti: > [clip] > > but an appropriate rtol would work there too. If only zero testing is > > needed, then atol=0 makes sense as a default. (or maybe atol=eps) > > There's plenty of room below eps, but finfo(float).tiny ~ 3e-308 (or > some big multiple) is also reasonable in the scale-freeness sense. right! brain blip -- eps is the difference between 1 and then next larger representable number, yes? So a long way away from smallest representable number. So yes, zero or [something]e-308 -- making zero seem like a good idea again.... is it totally ridiculous to have the default be dependent on dtype? float32 vs float64? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jul 18 16:44:34 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 16:44:34 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> <53C9806E.3090008@m4x.org> Message-ID: On Fri, Jul 18, 2014 at 4:21 PM, Charles G. Waldman wrote: > Joseph Martinot-Lagarde writes: > > > Compare what's comparable: > > That's fair. > > > In addition, you have to use AltGr on some keyboards to get the brackets > > Wow, it must be rather painful to do any real programming on such a > keyboard! > > - C > > > On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde > wrote: > > Le 18/07/2014 20:42, Charles G. 
Waldman a ?crit : > >> Well, if the goal is "shorthand", typing numpy.array(numpy.mat()) > >> won't please many users. > >> > >> But the more I think about it, the less I think Numpy should support > >> this (non-Pythonic) input mode. Too much molly-coddling of new users! > >> When doing interactive work I usually just type: > >> > >>>>> np.array([[1,2,3], > >> ... [4,5,6], > >> ... [7,8,9]]) > >> > >> which is (IMO) easier to read: e.g. it's not totally obvious that > >> "1,0,0;0,1,0;0,0,1" represents a 3x3 identity matrix, but > >> > >> [[1,0,0], > >> [0,1,0], > >> [0,0,1]] > >> > >> is pretty obvious. > >> > > Compare what's comparable: > > > > [[1,0,0], > > [0,1,0], > > [0,0,1]] > > > > vs > > > > "1 0 0;" > > "0 1 0;" > > "0 0 1" > > > > or > > > > """ > > 1 0 0; > > 0 1 0; > > 0 0 1 > > """ > > > > [[1,0,0], [0,1,0], [0,0,1]] > > vs > > "1 0 0; 0 1 0; 0 0 1" > > > >> The difference in (non-whitespace) chars is 19 vs 25, so the > >> "shorthand" doesn't seem to save that much. > > > > Well, it's easier to type "" (twice the same character) than [], and you > > have no risk in swapping en opening and a closing bracket. In addition, > > you have to use AltGr on some keyboards to get the brackets. It doesn't > > boils down to a number of characters. > > > >> > >> Just my ?0.02, > It's the year of the notebook. notebooks are reusable. notebooks correctly align the brackets in the second and third line and it looks pretty, just like a matrix (But, I don't have to teach newbies, and often I even correct whitespace on the commandline, because it looks ugly and I will eventually copy it to a script file.) Josef no broken windows! well, except for the ones I don't feel like fixing right now. > >> > >> - C > >> > >> > >> > >> > >> On Fri, Jul 18, 2014 at 10:05 AM, Alan G Isaac > wrote: > >>> On 7/18/2014 12:45 PM, Mark Miller wrote: > >>>> If the true goal is to just allow quick entry of a 2d array, why not > just advocate using > >>>> a = numpy.array(numpy.mat("1 2 3; 4 5 6; 7 8 9")) > >>> > >>> > >>> It's even simpler: > >>> a = np.mat(' 1 2 3;4 5 6;7 8 9').A > >>> > >>> I'm not putting a dog in this race. Still I would say that > >>> the reason why such proposals miss the point is that > >>> there are introductory settings where one would like > >>> to explain as few complications as possible. In > >>> particular, one might prefer *not* to discuss the > >>> existence of a matrix type. As an additional downside, > >>> this is only good for 2d, and there have been proposals > >>> for the new array builder to handle other dimensions. > >>> > >>> fwiw, > >>> Alan Isaac > >>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Fri Jul 18 16:44:39 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 13:44:39 -0700 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 12:52 PM, Andrew Collette wrote: > > What it would do is push the problem from the HDF5<->numpy interface to > the > > python<->numpy interface. > > > > I'm not sure that's a good trade off. > > Maybe I'm being too paranoid about the truncation issue. Actually, I agree about the truncation issue, but it's a question of where to put it -- I'm suggesting that I don't want it at the python<->numpy interface. > Here's a strawman for how a Latin-1 "a" type might be handled in h5py: > > 1. Creation from existing "a" data: Use vlen strings. Doesn't > preserve the dtype, but maybe that's not so important. > do vlen strings support full unicode? -- then, yes, that's good. > 2. Writing from "a" data to fixed-width ASCII: Copy, and replace > bytes>127 with "?" (or don't) > I'd vote for don't, unless HDF starts enforcing pure ascii. But if it does, then yes, replacement makes more sense than exceptions. 3. Writing from "a" data to fixed-width UTF-8: Transcode and truncate > (being careful not to end in the middle of a multibyte character) > yup -- buyer beware. > 4. Reading from fixed-width ASCII to "a": Straight copy, no inspection > yup. > 5. Reading from fixed-width UTF-8 to "a": Copy, and replace > non-Latin-1 chars with "?" > sure what about reading from fixed-width UTF-8 to 'U' -- that seems like the natural way to go for unicode. Tough a bit hard to know how long U needs to be -- but <= the length of the utf-8 array (in characters). > (The above example uses replacement rather than raising an exception, > because an exception in the HDF5 conversion callback will leave the > write/read half-completed). > and really -- what would you do with an exception on read? give up and throw the file away? note that I'm also proposing a "bytes" dtype, which might make sense for grabbing utf-8 data from HDF-5. Then either h5py or the user could decode to a unicode type. In any case, I can say that the lack of an text 'S' type in NumPy has > been a significant pain point for h5py users on Python 3 over the > years. isn't the current 'S' a pretty good map to hdf ascii? Whatever specific encoding ends up being used, such a type can > only improve the situation, and I'm firmly in favor of it. agreed. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Jul 18 16:46:23 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 18 Jul 2014 13:46:23 -0700 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: <53C9806E.3090008@m4x.org> References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> <53C9806E.3090008@m4x.org> Message-ID: On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde < joseph.martinot-lagarde at m4x.org> wrote: > In addition, > you have to use AltGr on some keyboards to get the brackets. If it's hard to type square brackets -- you're kind of dead in the water with Python anyway -- this is not going to help. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Jul 18 16:53:26 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 18 Jul 2014 22:53:26 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: <53C95DA3.7010901@iki.fi> References: <53C95DA3.7010901@iki.fi> Message-ID: <53C98946.3020405@googlemail.com> On 18.07.2014 19:47, Pauli Virtanen wrote: > 18.07.2014 19:35, Julian Taylor kirjoitti: >> On Fri, Jul 18, 2014 at 6:23 PM, Nathaniel Smith >> wrote: >>> On 18 Jul 2014 15:36, "Julian Taylor" >>> wrote: >>>> >>>> git rebase --onto $(git merge-base master maintenance/1.9.x) >>>> HEAD^ >>> >>> As a potential refinement, this might be simpler if we define a >>> branch that points to this commit. >>> >> >> we could do that, though the merge base changes to the last commit >> that was merged in that way. The old merge base is still valid but >> much older. I applied this method to some of my bugfixes so the >> current merge base of master and 1.9 is a commit from yesterday >> not anymore the diverging point of master and 1.9. But I don't know >> if the newer merge base makes any difference to git. > > Will the merge base actually ever change if you don't merge the > branches to each other we want to merge them into each other so a change of merge base is unavoidable. > > The other well-known alternative to bugfixes is to first commit it in > the earliest maintenance branch where you want to have it, and then > merge that branch forward to the newer maintenance branches, and > finally into master. > wouldn't that still require basing bugfixes onto the point before the master and maintenance branch diverged? otherwise a merge from maintenance to master would include the commits that are only part of the maintenance branch (release commits, regression fixes etc.) basing bugfixes on maintenance does allow cherry picking into master as you don't care too much about backward mergeability here, but you still lose a good git log and git branch --contains to check which bugfix is in which branch. From joseph.martinot-lagarde at m4x.org Fri Jul 18 17:04:11 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Fri, 18 Jul 2014 23:04:11 +0200 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> <53C9806E.3090008@m4x.org> Message-ID: Le 18/07/2014 22:46, Chris Barker a ?crit : > On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde > > wrote: > > In addition, > you have to use AltGr on some keyboards to get the brackets. > > > If it's hard to type square brackets -- you're kind of dead in the water > with Python anyway -- this is not going to help. > > -Chris > Welcome to the azerty world ! ;) It's not that hard to type, just a bit more involved. My biggest problem is that you have to type the opening and closing bracket for each line, with a comma in between. It will always be harder and more error prone than a single semicolon, whatever the keyboard. My use case is not teaching but doing quick'n'dirty computations with a few values. 
Sometimes these values are copy-pasted from a space separated file, or from a printed array in another console. Having to add comas and bracket makes simple computations less easy. That's why I often use Octave for these. From andrew.collette at gmail.com Fri Jul 18 17:30:59 2014 From: andrew.collette at gmail.com (Andrew Collette) Date: Fri, 18 Jul 2014 15:30:59 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: Hi Chris, > Actually, I agree about the truncation issue, but it's a question of where > to put it -- I'm suggesting that I don't want it at the python<->numpy > interface. Yes, that's a good point. Of course, by using Latin-1 rather than UTF-8 we can't support all Unicode code points (hence the "?" replacement possible on read from HDF5). > do vlen strings support full unicode? -- then, yes, that's good. Yes, they do. It's somewhat unfortunate to immediately cast to vlen though, since people usually have fixed-width datasets to start with for efficiency reasons... > what about reading from fixed-width UTF-8 to 'U' -- that seems like the > natural way to go for unicode. Tough a bit hard to know how long U needs to > be -- but <= the length of the utf-8 array (in characters). Space concerns ("U" has a 4x space penalty for ASCII-ish data). Plus, for similar reasons to this discussion, creating "U" datasets is unsupported at the moment. > note that I'm also proposing a "bytes" dtype, which might make sense for > grabbing utf-8 data from HDF-5. Then either h5py or the user could decode to > a unicode type. Sound quite like the existing 'S' type. >> In any case, I can say that the lack of an text 'S' type in NumPy has >> been a significant pain point for h5py users on Python 3 over the >> years. > > isn't the current 'S' a pretty good map to hdf ascii? Yes; in fact, right now all fixed-width strings in h5py (ASCII and UTF-8) are read/written as 'S'. The problem is that on Py3, 'S' is treated as bytes, not text, so you can't freely mix it with str. I am about to leave for the weekend... thanks for a great discussion! To conclude, it strikes me that in choosing an encoding we get to pick at most two of the following: 1. Support for all Unicode characters 2. Fixed number of characters 3. Fixed number of storage bytes At this point, I would vote for UTF-8 in a fixed width buffer (1/3), but I imagine as this progresses towards a NEP others will weigh in. Andrew From josef.pktd at gmail.com Fri Jul 18 17:31:28 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 18 Jul 2014 17:31:28 -0400 Subject: [Numpy-discussion] Short-hand array creation in `numpy.mat` style In-Reply-To: References: <53B9C861.3090809@hawaii.edu> <-2968451659458027190@unknownmsgid> <53C953F5.90100@gmail.com> <53C9806E.3090008@m4x.org> Message-ID: On Fri, Jul 18, 2014 at 5:04 PM, Joseph Martinot-Lagarde < joseph.martinot-lagarde at m4x.org> wrote: > Le 18/07/2014 22:46, Chris Barker a ?crit : > > On Fri, Jul 18, 2014 at 1:15 PM, Joseph Martinot-Lagarde > > > > wrote: > > > > In addition, > > you have to use AltGr on some keyboards to get the brackets. > > > > > > If it's hard to type square brackets -- you're kind of dead in the water > > with Python anyway -- this is not going to help. > > > > -Chris > > > Welcome to the azerty world ! ;) > > It's not that hard to type, just a bit more involved. My biggest problem > is that you have to type the opening and closing bracket for each line, > with a comma in between. 
It will always be harder and more error prone > than a single semicolon, whatever the keyboard. > > My use case is not teaching but doing quick'n'dirty computations with a > few values. Sometimes these values are copy-pasted from a space > separated file, or from a printed array in another console. Having to > add comas and bracket makes simple computations less easy. That's why I > often use Octave for these. > my copy paste approaches for almost quick'n'dirty (no semicolons): given: a b c 1 2 3 4 5 6 7 8 9 (select & Ctrl-C) >>> pandas.read_clipboard(sep=' ') a b c 0 1 2 3 1 4 5 6 2 7 8 9 >>> np.asarray(pandas.read_clipboard()) array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=int64) >>> pandas.read_clipboard().values array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=int64) arr = np.array('''\ 1 2 3 4 5 6 7 8 9'''.split(), float).reshape(-1, 3) the last is not so quick and dirty but reusable and reused. Josef > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 18 17:49:20 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 18 Jul 2014 15:49:20 -0600 Subject: [Numpy-discussion] String type again. In-Reply-To: References: <-4597269384285942771@unknownmsgid> Message-ID: On Fri, Jul 18, 2014 at 3:30 PM, Andrew Collette wrote: > Hi Chris, > > > Actually, I agree about the truncation issue, but it's a question of > where > > to put it -- I'm suggesting that I don't want it at the python<->numpy > > interface. > > Yes, that's a good point. Of course, by using Latin-1 rather than > UTF-8 we can't support all Unicode code points (hence the "?" > replacement possible on read from HDF5). > > > do vlen strings support full unicode? -- then, yes, that's good. > > Yes, they do. It's somewhat unfortunate to immediately cast to vlen > though, since people usually have fixed-width datasets to start with > for efficiency reasons... > > > what about reading from fixed-width UTF-8 to 'U' -- that seems like the > > natural way to go for unicode. Tough a bit hard to know how long U needs > to > > be -- but <= the length of the utf-8 array (in characters). > > Space concerns ("U" has a 4x space penalty for ASCII-ish data). Plus, > for similar reasons to this discussion, creating "U" datasets is > unsupported at the moment. > > > note that I'm also proposing a "bytes" dtype, which might make sense for > > grabbing utf-8 data from HDF-5. Then either h5py or the user could > decode to > > a unicode type. > > Sound quite like the existing 'S' type. > > >> In any case, I can say that the lack of an text 'S' type in NumPy has > >> been a significant pain point for h5py users on Python 3 over the > >> years. > > > > isn't the current 'S' a pretty good map to hdf ascii? > > Yes; in fact, right now all fixed-width strings in h5py (ASCII and > UTF-8) are read/written as 'S'. The problem is that on Py3, 'S' is > treated as bytes, not text, so you can't freely mix it with str. > > I am about to leave for the weekend... thanks for a great discussion! > To conclude, it strikes me that in choosing an encoding we get to pick > at most two of the following: > > 1. Support for all Unicode characters > 2. Fixed number of characters > 3. 
Fixed number of storage bytes > > At this point, I would vote for UTF-8 in a fixed width buffer (1/3), > but I imagine as this progresses towards a NEP others will weigh in. > At some point I'm pretty sure we will want to support utf-8 as it looks well on its way to a universal standard. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Jul 18 18:44:47 2014 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 19 Jul 2014 01:44:47 +0300 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: <53C98946.3020405@googlemail.com> References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> Message-ID: 18.07.2014 23:53, Julian Taylor kirjoitti: > On 18.07.2014 19:47, Pauli Virtanen wrote: [clip] > > The other well-known alternative to bugfixes is to first commit it in > > the earliest maintenance branch where you want to have it, and then > > merge that branch forward to the newer maintenance branches, and > > finally into master. > > wouldn't that still require basing bugfixes onto the point before the > master and maintenance branch diverged? > otherwise a merge from maintenance to master would include the commits > that are only part of the maintenance branch (release commits, > regression fixes etc.) If I understand correctly, the idea is to manually revert the changes that don't belong in, which needs to be only done once for each, as the merge logic should deal with it in all subsequent merges. I think there are in practice not so many commits that you want to have only in the release branch. Version number bumping is one (and easily addressed by a follow-up commit in master that bumps it again) --- what else? The bugfix-in-release-and-forward-port-to-master seems to be the recommended practice for Mercurial: http://mercurial.selenic.com/wiki/StandardBranching https://docs.python.org/devguide/committing.html I think there are also git guides that recommend using it. The option of basing commits on last merge base is probably not really feasible with Mercurial (I haven't seen git guides that propose it either). > basing bugfixes on maintenance does allow cherry picking into master as > you don't care too much about backward mergeability here, but you still > lose a good git log and git branch --contains to check which bugfix is > in which branch. I don't disagree with this. Cherry picking is OK, but only as long as the number of commits is not too large and you use a tool (e.g. my git-cherry-tree) that tries to check which patches are in and which not. Pauli From njs at pobox.com Fri Jul 18 18:49:08 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 18 Jul 2014 23:49:08 +0100 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> Message-ID: On Fri, Jul 18, 2014 at 11:44 PM, Pauli Virtanen wrote: > 18.07.2014 23:53, Julian Taylor kirjoitti: >> On 18.07.2014 19:47, Pauli Virtanen wrote: > [clip] >> > The other well-known alternative to bugfixes is to first commit it in >> > the earliest maintenance branch where you want to have it, and then >> > merge that branch forward to the newer maintenance branches, and >> > finally into master. >> >> wouldn't that still require basing bugfixes onto the point before the >> master and maintenance branch diverged? 
>> otherwise a merge from maintenance to master would include the commits >> that are only part of the maintenance branch (release commits, >> regression fixes etc.) > > If I understand correctly, the idea is to manually revert the changes > that don't belong in, which needs to be only done once for each, as the > merge logic should deal with it in all subsequent merges. > > I think there are in practice not so many commits that you want to have > only in the release branch. Version number bumping is one (and easily > addressed by a follow-up commit in master that bumps it again) --- what > else? Presumably all the commits that we miss on the first pass and end up backporting the hard way later :-) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From pav at iki.fi Fri Jul 18 19:10:09 2014 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 19 Jul 2014 02:10:09 +0300 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> Message-ID: 19.07.2014 01:49, Nathaniel Smith kirjoitti: > On Fri, Jul 18, 2014 at 11:44 PM, Pauli Virtanen wrote: >> 18.07.2014 23:53, Julian Taylor kirjoitti: >>> On 18.07.2014 19:47, Pauli Virtanen wrote: >> [clip] >>>> The other well-known alternative to bugfixes is to first commit it in >>>> the earliest maintenance branch where you want to have it, and then >>>> merge that branch forward to the newer maintenance branches, and >>>> finally into master. >>> >>> wouldn't that still require basing bugfixes onto the point before the >>> master and maintenance branch diverged? >>> otherwise a merge from maintenance to master would include the commits >>> that are only part of the maintenance branch (release commits, >>> regression fixes etc.) >> >> If I understand correctly, the idea is to manually revert the changes >> that don't belong in, which needs to be only done once for each, as the >> merge logic should deal with it in all subsequent merges. >> >> I think there are in practice not so many commits that you want to have >> only in the release branch. Version number bumping is one (and easily >> addressed by a follow-up commit in master that bumps it again) --- what >> else? > > Presumably all the commits that we miss on the first pass and end up > backporting the hard way later :-) If those are just cherry-picked, they will generate merge conflicts the next time things are merged back (or, the merge will be smart enough to note the patch was already applied some time ago). This is then probably not really a big problem. Pauli From pav at iki.fi Fri Jul 18 19:13:34 2014 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 19 Jul 2014 02:13:34 +0300 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> Message-ID: 19.07.2014 02:10, Pauli Virtanen kirjoitti: > 19.07.2014 01:49, Nathaniel Smith kirjoitti: >> On Fri, Jul 18, 2014 at 11:44 PM, Pauli Virtanen wrote: [clip] >> Presumably all the commits that we miss on the first pass and end up >> backporting the hard way later :-) > > If those are just cherry-picked, they will generate merge conflicts the > next time things are merged back (or, the merge will be smart enough to > note the patch was already applied some time ago). This is then probably > not really a big problem. NB. 
this is a bit playing devil's advocate --- I'm not advocating porting bugfixes from merge branches, as using the merge base should also work fine. From bramwillemsen at gmail.com Fri Jul 18 20:03:11 2014 From: bramwillemsen at gmail.com (Bram Willemsen) Date: Fri, 18 Jul 2014 19:03:11 -0500 Subject: [Numpy-discussion] BLAS / LAPACK / MKL cannot be found? Message-ID: Hi everyone, I am trying to install a package called PySparse. I have modified the paths in the template site.cfg file. It appears that during the built process it compiles numpy as well, because the error messages I get are all over the numpy mailing list (I did not find one addressing my problem exactly). My MKL libs are installed at "/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64" . There are BLAS / LAPACK libraries here (I provided the library mkl_rt.so to SuiteSparse for instance, and it resolved all LAPACK and BLAS dependencies that way). But for some reason "/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64" does not work in the built process below, in what I think is a numpy installer. The output shows that the directory is searched, but that the result is "NOT AVAILABLE". Could someone give me a pointer for why my the MKL/BLAS/LAPACK dependencies cannot be resolved? It would be very appreciated sincerely, Bram --------------------------------------------------------------------------- PART OF THE OUTPUT OF PySparse SETUP.PY --------------------------------------------------------------------------- blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in ['/usr/local/lib', '/wgdisk/hy3300/re15/lwillemsen/local_install/lib', '/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64'] NOT AVAILABLE openblas_info: libraries not found in ['/usr/local/lib', '/wgdisk/hy3300/re15/lwillemsen/local_install/lib', '/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64'] NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in ['/usr/local/lib', '/wgdisk/hy3300/re15/lwillemsen/local_install/lib', '/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64'] NOT AVAILABLE atlas_blas_info: libraries f77blas,cblas,atlas not found in ['/usr/local/lib', '/wgdisk/hy3300/re15/lwillemsen/local_install/lib', '/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64'] NOT AVAILABLE blas_info: libraries blas not found in ['/usr/local/lib', '/wgdisk/hy3300/re15/lwillemsen/local_install/lib', '/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64'] NOT AVAILABLE blas_src_info: NOT AVAILABLE NOT AVAILABLE No blas info found Sparse:: Using BLAS info: {} Using dflt_lib_dirs = /usr/local/lib:/wgdisk/hy3300/re15/lwillemsen/local_install/lib:/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64 Using dflt_libs = [] No blas info found Eigen:: Using BLAS info: {} lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in ['/usr/local/lib', '/wgdisk/hy3300/re15/lwillemsen/local_install/lib', '/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64'] NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /wgdisk/hy3300/re15/lwillemsen/local_install/lib libraries lapack_atlas not found in /wgdisk/hy3300/re15/lwillemsen/local_install/lib 
libraries ptf77blas,ptcblas,atlas not found in /wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64 libraries lapack_atlas not found in /wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64 numpy.distutils.system_info.atlas_threads_info NOT AVAILABLE atlas_info: libraries f77blas,cblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries f77blas,cblas,atlas not found in /wgdisk/hy3300/re15/lwillemsen/local_install/lib libraries lapack_atlas not found in /wgdisk/hy3300/re15/lwillemsen/local_install/lib libraries f77blas,cblas,atlas not found in /wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64 libraries lapack_atlas not found in /wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64 numpy.distutils.system_info.atlas_info NOT AVAILABLE lapack_info: libraries lapack not found in ['/usr/local/lib', '/wgdisk/hy3300/re15/lwillemsen/local_install/lib', '/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64'] NOT AVAILABLE lapack_src_info: NOT AVAILABLE NOT AVAILABLE No lapack info found Eigen:: Using LAPACK info: {} non-existing path in 'pysparse/eigen': '/usr/local/lib:/wgdisk/hy3300/re15/lwillemsen/local_install/lib:/wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64' No blas info found Direct:: Using BLAS info: {} -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 18 21:47:25 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 18 Jul 2014 19:47:25 -0600 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` In-Reply-To: References: <53C96BB8.4060104@iki.fi> Message-ID: On Fri, Jul 18, 2014 at 2:32 PM, Chris Barker wrote: > On Fri, Jul 18, 2014 at 12:43 PM, Pauli Virtanen wrote: > >> 18.07.2014 22:13, Chris Barker kirjoitti: >> [clip] >> > but an appropriate rtol would work there too. If only zero testing is >> > needed, then atol=0 makes sense as a default. (or maybe atol=eps) >> >> There's plenty of room below eps, but finfo(float).tiny ~ 3e-308 (or >> some big multiple) is also reasonable in the scale-freeness sense. > > > right! brain blip -- eps is the difference between 1 and then next larger > representable number, yes? So a long way away from smallest representable > number. So yes, zero or [something]e-308 -- making zero seem like a good > idea again.... > > is it totally ridiculous to have the default be dependent on dtype? > float32 vs float64? > > Whatever the final decision is, if the defaults change we should start with a FutureWarning. How we can make that work is uncertain, because I don't know of any reliable way to detect if we are using the default value or if a value was passed in. Maybe just warn if `atol == 0` ? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Fri Jul 18 21:56:41 2014 From: argriffi at ncsu.edu (alex) Date: Fri, 18 Jul 2014 21:56:41 -0400 Subject: [Numpy-discussion] `allclose` vs `assert_allclose` Message-ID: On Fri, Jul 18, 2014 at 9:47 PM, Charles R Harris wrote: > > > > On Fri, Jul 18, 2014 at 2:32 PM, Chris Barker wrote: >> >> On Fri, Jul 18, 2014 at 12:43 PM, Pauli Virtanen wrote: >>> >>> 18.07.2014 22:13, Chris Barker kirjoitti: >>> [clip] >>> > but an appropriate rtol would work there too. If only zero testing is >>> > needed, then atol=0 makes sense as a default. 
(or maybe atol=eps) >>> >>> There's plenty of room below eps, but finfo(float).tiny ~ 3e-308 (or >>> some big multiple) is also reasonable in the scale-freeness sense. >> >> >> right! brain blip -- eps is the difference between 1 and then next larger >> representable number, yes? So a long way away from smallest representable >> number. So yes, zero or [something]e-308 -- making zero seem like a good >> idea again.... >> >> is it totally ridiculous to have the default be dependent on dtype? >> float32 vs float64? >> > > Whatever the final decision is, if the defaults change we should start with > a FutureWarning. How we can make that work is uncertain, because I don't > know of any reliable way to detect if we are using the default value or if a > value was passed in. There are tricks like http://stackoverflow.com/questions/12265695, not that I'm suggesting to do that. From ralf.gommers at gmail.com Sat Jul 19 04:04:10 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 19 Jul 2014 10:04:10 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> Message-ID: On Sat, Jul 19, 2014 at 12:44 AM, Pauli Virtanen wrote: > 18.07.2014 23:53, Julian Taylor kirjoitti: > > On 18.07.2014 19:47, Pauli Virtanen wrote: > [clip] > > > The other well-known alternative to bugfixes is to first commit it in > > > the earliest maintenance branch where you want to have it, and then > > > merge that branch forward to the newer maintenance branches, and > > > finally into master. > > > > wouldn't that still require basing bugfixes onto the point before the > > master and maintenance branch diverged? > > otherwise a merge from maintenance to master would include the commits > > that are only part of the maintenance branch (release commits, > > regression fixes etc.) > > If I understand correctly, the idea is to manually revert the changes > that don't belong in, which needs to be only done once for each, as the > merge logic should deal with it in all subsequent merges. > > I think there are in practice not so many commits that you want to have > only in the release branch. Version number bumping is one (and easily > addressed by a follow-up commit in master that bumps it again) --- what > else? > > The bugfix-in-release-and-forward-port-to-master seems to be the > recommended practice for Mercurial: > > http://mercurial.selenic.com/wiki/StandardBranching > > https://docs.python.org/devguide/committing.html > > I think there are also git guides that recommend using it. > > The option of basing commits on last merge base is probably not really > feasible with Mercurial (I haven't seen git guides that propose it either). > > > basing bugfixes on maintenance does allow cherry picking into master as > > you don't care too much about backward mergeability here, but you still > > lose a good git log and git branch --contains to check which bugfix is > > in which branch. > > I don't disagree with this. Cherry picking is OK, but only as long as > the number of commits is not too large This should be the case most of the time I think. It looks like we've started backporting more and more though, even things like minor doc fixes. The maintenance overhead would be much lower if we would stick to only backporting important bug fixes. Any strategy chosen is fine with me, but I would like to see considered how this affects the number of PRs and the complexity for occasional contributors. 
Those contributors can't really judge what's backportable and don't want to deal with rebasing. So the new strategy would be something like: 1. bugfix PR sent to master by contributor 2. maintainer decides it's backportable, so after review he doesn't merge PR but rebases it and sends a second PR. First one, with review content, is closed not merged. 3. merge PR into maintenance branch. 4. send third PR to merge back or forward port the fix to master, and merge that. (or some variation with merge bases which is even more involved) Compare to what we did a while ago for numpy and still do for scipy: 1. all PRs are sent to master 2. hit green button after review 3. bugfix is cherry-picked and pushed directly to the maintenance branch The downside of the second strategy is indeed the occasional extra merge conflict, but having 3x less PRs, 2x less merge commits and a less confusing process for occasional contributors could well be worth dealing with that merge conflict. Cheers, Ralf and you use a tool (e.g. my > git-cherry-tree) that tries to check which patches are in and which not. > > Pauli > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Jul 19 06:29:14 2014 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 19 Jul 2014 13:29:14 +0300 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> Message-ID: 19.07.2014 11:04, Ralf Gommers kirjoitti: [clip] > 1. bugfix PR sent to master by contributor > 2. maintainer decides it's backportable, so after review he doesn't merge > PR but rebases it and sends a second PR. First one, with review content, is > closed not merged. > 3. merge PR into maintenance branch. > 4. send third PR to merge back or forward port the fix to master, and > merge that. > (or some variation with merge bases which is even more involved) The maintainer can just rebase on merge base, and then merge and push it via git as usual, without having to deal with Github. If the pull request happens to be already based on an OK merge base, it can be merged via Github directly to master. The only thing maintainer gains from sending additional pull request via Github is that the code gets run by Travis-CI. However, the tests will also run automatically after pushing the merge commits, so test failures can be caught (although after the fact). This is also the case for directly pushed cherry-picked commits. -- Pauli Virtanen From ralf.gommers at gmail.com Sat Jul 19 07:04:17 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 19 Jul 2014 13:04:17 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> Message-ID: On Sat, Jul 19, 2014 at 12:29 PM, Pauli Virtanen wrote: > 19.07.2014 11:04, Ralf Gommers kirjoitti: > [clip] > > 1. bugfix PR sent to master by contributor > > 2. maintainer decides it's backportable, so after review he doesn't > merge > > PR but rebases it and sends a second PR. First one, with review content, > is > > closed not merged. > > 3. merge PR into maintenance branch. > > 4. send third PR to merge back or forward port the fix to master, and > > merge that. 
> > (or some variation with merge bases which is even more involved) > > The maintainer can just rebase on merge base, and then merge and push it > via git as usual, without having to deal with Github. I agree, but note that that's not what's happening in the numpy repo at the moment and that Julian (and maybe Chuck as well?) is explicitly against any direct pushes. So the 3x more PRs between what the process used to be and what Julian proposes is not unrealistic. Maybe still worth it, but it's a trade-off (example: I used to use "gitk --all", but it's a spaghetti now). Ralf > If the pull > request happens to be already based on an OK merge base, it can be > merged via Github directly to master. > > The only thing maintainer gains from sending additional pull request via > Github is that the code gets run by Travis-CI. However, the tests will > also run automatically after pushing the merge commits, so test failures > can be caught (although after the fact). This is also the case for > directly pushed cherry-picked commits. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Sat Jul 19 07:26:10 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 19 Jul 2014 13:26:10 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> Message-ID: <53CA55D2.5020605@googlemail.com> On 19.07.2014 13:04, Ralf Gommers wrote: > > > > On Sat, Jul 19, 2014 at 12:29 PM, Pauli Virtanen > wrote: > > 19.07.2014 11:04, Ralf Gommers kirjoitti: > [clip] > > 1. bugfix PR sent to master by contributor > > 2. maintainer decides it's backportable, so after review he > doesn't merge > > PR but rebases it and sends a second PR. First one, with review > content, is > > closed not merged. > > 3. merge PR into maintenance branch. > > 4. send third PR to merge back or forward port the fix to > master, and > > merge that. > > (or some variation with merge bases which is even more involved) > > The maintainer can just rebase on merge base, and then merge and push it > via git as usual, without having to deal with Github. > > > I agree, but note that that's not what's happening in the numpy repo at > the moment and that Julian (and maybe Chuck as well?) is explicitly > against any direct pushes. So the 3x more PRs between what the process > used to be and what Julian proposes is not unrealistic. > It is what is happening at the numpy repo. We are never directly pushing unreviewed changes, we always have at least one PR. We only directly push changes that have been approved to be applied two more than one branch. With the method I propose there are not any more PRs. You have the main PR targeted to master and the bugfix PR targeted to the maintenance branch, it was the same before except the bugfix PR was a cherry pick instead of a merge. When directly pushing the second merge we even cut one PR from the process. E.g. I pushed Pauls PR #4882 directly to 1.9 without asking him to create a new PR but as far as git is concerned there is no difference, it as still two merges. We could always ask for a new PR for the branch merge to see travis results before the merge. E.g. #4877 and #4891 same branch two PRs two merges. 
I don't think that should be currently required as master and 1.9 are almost identical and there is little value in seeing travis results for the second merge before doing the merge. But when the branches diverge more the two PRs should probably be preferred to avoid having broken commits on the branches that make bisecting harder. From ralf.gommers at gmail.com Sat Jul 19 08:09:26 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 19 Jul 2014 14:09:26 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: <53CA55D2.5020605@googlemail.com> References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> <53CA55D2.5020605@googlemail.com> Message-ID: On Sat, Jul 19, 2014 at 1:26 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 19.07.2014 13:04, Ralf Gommers wrote: > > > > > > > > On Sat, Jul 19, 2014 at 12:29 PM, Pauli Virtanen > > wrote: > > > > 19.07.2014 11:04, Ralf Gommers kirjoitti: > > [clip] > > > 1. bugfix PR sent to master by contributor > > > 2. maintainer decides it's backportable, so after review he > > doesn't merge > > > PR but rebases it and sends a second PR. First one, with review > > content, is > > > closed not merged. > > > 3. merge PR into maintenance branch. > > > 4. send third PR to merge back or forward port the fix to > > master, and > > > merge that. > > > (or some variation with merge bases which is even more involved) > > > > The maintainer can just rebase on merge base, and then merge and > push it > > via git as usual, without having to deal with Github. > > > > > > I agree, but note that that's not what's happening in the numpy repo at > > the moment and that Julian (and maybe Chuck as well?) is explicitly > > against any direct pushes. So the 3x more PRs between what the process > > used to be and what Julian proposes is not unrealistic. > > > > It is what is happening at the numpy repo. > We are never directly pushing unreviewed changes, we always have at > least one PR. We only directly push changes that have been approved to > be applied two more than one branch. > OK never mind then. I was pretty sure you said you were against this, and I see a lot of PRs for simple backports in 1.8.x and 1.9.x. If you now say it's fine (or even preferred) to push directly, my worry about multiple PRs isn't relevant anymore. Ralf > With the method I propose there are not any more PRs. You have the main > PR targeted to master > and the bugfix PR targeted to the maintenance > branch, it was the same before except the bugfix PR was a cherry pick > instead of a merge. > When directly pushing the second merge we even cut one PR from the process. > E.g. I pushed Pauls PR #4882 directly to 1.9 without asking him to > create a new PR but as far as git is concerned there is no difference, > it as still two merges. > > We could always ask for a new PR for the branch merge to see travis > results before the merge. E.g. #4877 and #4891 same branch two PRs two > merges. > I don't think that should be currently required as master and 1.9 are > almost identical and there is little value in seeing travis results for > the second merge before doing the merge. > But when the branches diverge more the two PRs should probably be > preferred to avoid having broken commits on the branches that make > bisecting harder. 
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Sat Jul 19 08:12:57 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 19 Jul 2014 14:12:57 +0200 Subject: [Numpy-discussion] proposal: new commit guidelines for backportable bugfixes In-Reply-To: References: <53C95DA3.7010901@iki.fi> <53C98946.3020405@googlemail.com> <53CA55D2.5020605@googlemail.com> Message-ID: <53CA60C9.9090902@googlemail.com> On 19.07.2014 14:09, Ralf Gommers wrote: > > > > On Sat, Jul 19, 2014 at 1:26 PM, Julian Taylor > > > wrote: > > On 19.07.2014 13:04, Ralf Gommers wrote: > > > > > > > > On Sat, Jul 19, 2014 at 12:29 PM, Pauli Virtanen > > >> wrote: > > > > 19.07.2014 11:04, Ralf Gommers kirjoitti: > > [clip] > > > 1. bugfix PR sent to master by contributor > > > 2. maintainer decides it's backportable, so after review he > > doesn't merge > > > PR but rebases it and sends a second PR. First one, with review > > content, is > > > closed not merged. > > > 3. merge PR into maintenance branch. > > > 4. send third PR to merge back or forward port the fix to > > master, and > > > merge that. > > > (or some variation with merge bases which is even more involved) > > > > The maintainer can just rebase on merge base, and then merge > and push it > > via git as usual, without having to deal with Github. > > > > > > I agree, but note that that's not what's happening in the numpy > repo at > > the moment and that Julian (and maybe Chuck as well?) is explicitly > > against any direct pushes. So the 3x more PRs between what the process > > used to be and what Julian proposes is not unrealistic. > > > > It is what is happening at the numpy repo. > We are never directly pushing unreviewed changes, we always have at > least one PR. We only directly push changes that have been approved to > be applied two more than one branch. > > > OK never mind then. I was pretty sure you said you were against this, > and I see a lot of PRs for simple backports in 1.8.x and 1.9.x. If you > now say it's fine (or even preferred) to push directly, my worry about > multiple PRs isn't relevant anymore. > thats not what I'm saying. I'm strongly against pushing unreviewed changes. There must *always* be at least one PR. Pushing this PR to multiple branches without another PR is fine with me if it makes sense in the situation (== the merge is trivial enough to not need *another* review) From bramwillemsen at gmail.com Sat Jul 19 13:31:53 2014 From: bramwillemsen at gmail.com (Bram Willemsen) Date: Sat, 19 Jul 2014 17:31:53 +0000 (UTC) Subject: [Numpy-discussion] BLAS / LAPACK / MKL cannot be found? References: Message-ID: Okay I figured out how to do it, in case someone finds this message later. You need to enter this specific section for the MKL implementation of BLAS and LAPACK, and it will find it! #https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl [mkl] library_dirs = /wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/lib/intel64 include_dirs = /wgdisk/omega2dev2/env/EL5/intel/composer_xe_2013.0.079/mkl/include mkl_libs = mkl_rt lapack_libs = Note that no libs are given for lapack_libs. This is not a failed copy-paste :) Hopefully this will help someone! 
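For anyone following the same route, a quick sanity check (not part of Bram's message, but plain numpy.distutils API) is to ask system_info directly whether the [mkl] section is being picked up; run it from the directory containing that site.cfg:

from numpy.distutils import system_info

# Prints a dict with 'libraries', 'library_dirs', 'include_dirs', ... when the
# [mkl] section is resolved, and an empty dict when it is not.
print(system_info.get_info('mkl'))

An empty dict corresponds to the NOT AVAILABLE lines in the build output quoted earlier in the thread.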
From joseluismietta at yahoo.com.ar Tue Jul 22 07:19:09 2014 From: joseluismietta at yahoo.com.ar (=?iso-8859-1?Q?Jos=E8_Luis_Mietta?=) Date: Tue, 22 Jul 2014 04:19:09 -0700 Subject: [Numpy-discussion] length - sticks algorithm Message-ID: <1406027949.48361.YahooMailNeo@web142302.mail.bf1.yahoo.com> Hi experts! I'm working with the conductivity of stick film systems. In my algorithm (N sticks) I have the intersection graph matrix M (M is an NxN matrix, M_ij=1 if sticks 'i' and 'j' intersect, and M_ij=0 if they do not). I also have 2 lists with the end-points of each stick. In addition, I can calculate the intersection point (if it exists) between sticks. I want to calculate all the distances between the points of intersection (1,2,3,...N) in the next figure, without losing the connectivity information (which intersection is connected to which). In the figure, (A) is the system with sticks. I don't know how to do this. I'm a python + numpy user. Waiting for your answers! Thanks a lot -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jul 22 08:02:03 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 22 Jul 2014 13:02:03 +0100 Subject: [Numpy-discussion] length - sticks algorithm In-Reply-To: <1406027949.48361.YahooMailNeo@web142302.mail.bf1.yahoo.com> References: <1406027949.48361.YahooMailNeo@web142302.mail.bf1.yahoo.com> Message-ID: What have you tried? What exactly are you having problems with? Loosely, I would suggest the following approach: For each stick, iterate over each stick that intersects with it (as recorded in M). Find the coordinates of all of the intersection points. Label the intersection points by the IDs of the two sticks that form the intersection (normalize these IDs by keeping them in order so you don't duplicate intersections already found; e.g. (2, 5), not (5, 2)). Arbitrarily, but consistently, pick one end of the stick and find the distances from that end to each of the intersection points. This induces an order on the intersections with that stick by sorting the intersections by their distance from the arbitrary end of the stick. You will need this to determine which intersections on the same stick are neighbors and which aren't. I.e., if you have 3 intersections with a given stick, (i,j), (i,k), and (i,l), you want (i,j)-(i,k), and (i,k)-(i,l), but not (i,j)-(i,l). You can find the distances between each of the intersections easily from that. Use a networkx Graph to record the distances (you are making a so-called "weighted graph"). On Tue, Jul 22, 2014 at 12:19 PM, Josè Luis Mietta < joseluismietta at yahoo.com.ar> wrote: > > Hi experts! > > I'm working with the conductivity of stick film systems. > > In my algorithm (N sticks) I have the intersection graph matrix M (M is an > NxN matrix, M_ij=1 if sticks 'i' and 'j' intersect, and M_ij=0 if they do not). > I also have 2 lists with the end-points of each stick. In addition, I can > calculate the intersection point (if it exists) between sticks. > > I want to calculate all the distances between the points of intersection > (1,2,3,...N) in the next figure: > [image: figure (A), the stick system] > without losing the connectivity information (which intersection is connected > to which). In the figure, (A) is the system with sticks. > > I don't know how to do this. I'm a python + numpy user. > > Waiting for your answers!
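A minimal sketch of the bookkeeping Robert describes, assuming a hypothetical `segments` list holding the two end-points of each stick and a hypothetical `intersection(s, t)` helper that returns the crossing point of two sticks or None; both stand in for data the original poster says he already has, and the code is an illustration of the approach rather than a tested solution:

import numpy as np
import networkx as nx

def build_intersection_graph(segments, intersection):
    # segments[i] = ((x1, y1), (x2, y2)); intersection(s, t) -> (x, y) or None
    G = nx.Graph()
    pos = {}                                   # (i, j) -> coordinates of that intersection
    on_stick = {i: [] for i in range(len(segments))}
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            pt = intersection(segments[i], segments[j])
            if pt is None:
                continue
            node = (i, j)                      # normalized label: smaller stick id first
            pos[node] = np.asarray(pt, dtype=float)
            G.add_node(node)
            on_stick[i].append(node)
            on_stick[j].append(node)
    for i, nodes in on_stick.items():
        # order the intersections along stick i by distance from one fixed, arbitrary end
        anchor = np.asarray(segments[i][0], dtype=float)
        nodes.sort(key=lambda nd: np.linalg.norm(pos[nd] - anchor))
        # connect only neighbouring intersections, weighted by their separation
        for a, b in zip(nodes, nodes[1:]):
            G.add_edge(a, b, weight=np.linalg.norm(pos[a] - pos[b]))
    return G, pos

Connectivity and path-length queries then reduce to standard graph calls, e.g. nx.shortest_path_length(G, a, b, weight='weight') for the distance along the stick network between two intersections.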
> > Thans a lot > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Tue Jul 22 10:53:54 2014 From: faltet at gmail.com (Francesc Alted) Date: Tue, 22 Jul 2014 16:53:54 +0200 Subject: [Numpy-discussion] ANN: bcolz 0.7.0 released Message-ID: <53CE7B02.5070000@gmail.com> ====================== Announcing bcolz 0.7.0 ====================== What's new ========== In this release, support for Python 3 has been added, Pandas and HDF5/PyTables conversion, support for different compressors via latest release of Blosc, and a new `iterblocks()` iterator. Also, intensive benchmarking has lead to an important tuning of buffer sizes parameters so that compression and evaluation goes faster than ever. Together, bcolz and the Blosc compressor, are finally fullfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots ``bcolz`` is a renaming of the ``carray`` project. The new goals for the project are to create simple, yet flexible compressed containers, that can live either on-disk or in-memory, and with some high-performance iterators (like `iter()`, `where()`) for querying them. For more detailed info, see the release notes in: https://github.com/Blosc/bcolz/wiki/Release-Notes What it is ========== bcolz provides columnar and compressed data containers. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Installing ========== bcolz is in the PyPI repository, so installing it is easy: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt ---- **Enjoy data!** -- Francesc Alted From totonixsame at gmail.com Tue Jul 22 14:34:04 2014 From: totonixsame at gmail.com (Thiago Franco Moraes) Date: Tue, 22 Jul 2014 15:34:04 -0300 Subject: [Numpy-discussion] =?utf-8?q?Research_position_in_the_Brazilian_R?= =?utf-8?q?esearch_Institute_for_Science_and_Neurotechnology_?= =?utf-8?q?=E2=80=93_BRAINN?= Message-ID: *Research position in the Brazilian Research Institute for Science and Neurotechnology ? 
BRAINN Postdoc researcher to work with software development for medical imaging* The Brazilian Research Institute for Neuroscience and Neurotechnology (BRAINN) (www.brainn.org.br) focuses on the investigation of basic mechanisms leading to epilepsy and stroke, and the injury mechanisms that follow disease onset and progression. This research has important applications related to prevention, diagnosis, treatment and rehabilitation and will serve as a model for better understanding normal and abnormal brain function. The BRAINN Institute is composed of 10 institutions from Brazil and abroad and hosted by State University of Campinas (UNICAMP). Among the associated institutions is Renato Archer Information Technology Center (CTI) that has a specialized team in open-source software development for medical imaging (www.cti.gov.br/invesalius) and 3D printing applications for healthcare. CTI is located close the UNICAMP in the city of Campinas, State of S?o Paulo in a very technological region of Brazil and is looking for a postdoc researcher to work with software development for medical imaging related to the imaging analysis, diagnosis and treatment of brain diseases. The postdoc position is for two years with the possibility of being renovated for more two years. *Education* - PhD in computer science, computer engineering, mathematics, physics or related. *Requirements* - Digital image processing (Medical imaging) - Computer graphics (basic) * Benefits* 6.143,40 Reais per month free of taxes (about US$ 2.800,00); 15% technical reserve for conferences participation and specific materials acquisition; *Interested* Send curriculum to: jorge.silva at cti.gov.br with subject ?Postdoc position? Applications reviews will begin August 1, 2014 and continue until the position is filled. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Tue Jul 22 19:04:00 2014 From: scopatz at gmail.com (Anthony Scopatz) Date: Tue, 22 Jul 2014 18:04:00 -0500 Subject: [Numpy-discussion] ANN: bcolz 0.7.0 released In-Reply-To: <53CE7B02.5070000@gmail.com> References: <53CE7B02.5070000@gmail.com> Message-ID: Congrats Francesc! On Tue, Jul 22, 2014 at 9:53 AM, Francesc Alted wrote: > ====================== > Announcing bcolz 0.7.0 > ====================== > > What's new > ========== > > In this release, support for Python 3 has been added, Pandas and > HDF5/PyTables conversion, support for different compressors via latest > release of Blosc, and a new `iterblocks()` iterator. > > Also, intensive benchmarking has lead to an important tuning of buffer > sizes parameters so that compression and evaluation goes faster than > ever. Together, bcolz and the Blosc compressor, are finally fullfilling > the promise of accelerating memory I/O, at least for some real > scenarios: > > > http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots > > ``bcolz`` is a renaming of the ``carray`` project. The new goals for > the project are to create simple, yet flexible compressed containers, > that can live either on-disk or in-memory, and with some > high-performance iterators (like `iter()`, `where()`) for querying them. > > For more detailed info, see the release notes in: > https://github.com/Blosc/bcolz/wiki/Release-Notes > > > What it is > ========== > > bcolz provides columnar and compressed data containers. Column storage > allows for efficiently querying tables with a large number of columns. 
> It also allows for cheap addition and removal of column. In addition, > bcolz objects are compressed by default for reducing memory/disk I/O > needs. The compression process is carried out internally by Blosc, a > high-performance compressor that is optimized for binary data. > > bcolz can use numexpr internally so as to accelerate many vector and > query operations (although it can use pure NumPy for doing so too). > numexpr optimizes the memory usage and use several cores for doing the > computations, so it is blazing fast. Moreover, the carray/ctable > containers can be disk-based, and it is possible to use them for > seamlessly performing out-of-memory computations. > > bcolz has minimal dependencies (NumPy), comes with an exhaustive test > suite and fully supports both 32-bit and 64-bit platforms. Also, it is > typically tested on both UNIX and Windows operating systems. > > > Installing > ========== > > bcolz is in the PyPI repository, so installing it is easy: > > $ pip install -U bcolz > > > Resources > ========= > > Visit the main bcolz site repository at: > http://github.com/Blosc/bcolz > > Manual: > http://bcolz.blosc.org > > Home of Blosc compressor: > http://blosc.org > > User's mail list: > bcolz at googlegroups.com > http://groups.google.com/group/bcolz > > License is the new BSD: > https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt > > > ---- > > **Enjoy data!** > > -- Francesc Alted > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Jul 23 13:19:58 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 23 Jul 2014 19:19:58 +0200 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? Message-ID: <53CFEEBE.5000207@googlemail.com> hi, it recently came to my attention that the default integer type in numpy on windows 64 bit is a 32 bit integers [0]. This seems like a quite serious problem as it means you can't use any integers created from python integers < 32 bit to index arrays larger than 2GB. For example np.product(array.shape) which will never overflow on linux and mac, can overflow on win64. I think this is a very dangerous platform difference and a quite large inconvenience for win64 users so I think it would be good to fix this. This would be a very large change of API and probably also ABI. But as we also never officially released win64 binaries we could change it for from source compilations and give win64 binary distributors the option to keep the old ABI/API at their discretion. Any thoughts on this from win64 users? Cheers, Julian Taylor [0] https://github.com/astropy/astropy/pull/2697 From jtaylor.debian at googlemail.com Wed Jul 23 13:37:39 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 23 Jul 2014 19:37:39 +0200 Subject: [Numpy-discussion] __numpy_ufunc__ and 1.9 release In-Reply-To: <53C56DA2.40402@googlemail.com> References: <53C56DA2.40402@googlemail.com> Message-ID: <53CFF2E3.1020708@googlemail.com> On 15.07.2014 20:06, Julian Taylor wrote: > hi, > as you may know we want to release numpy 1.9 soon. We should have solved > most indexing regressions the first beta showed. > > The remaining blockers are finishing the new __numpy_ufunc__ feature. 
> This feature should allow for an alternative method of overriding the > behavior of ufuncs from subclasses. > It is described here: > https://github.com/numpy/numpy/blob/master/doc/neps/ufunc-overrides.rst > > The current blocker issues are: > https://github.com/numpy/numpy/issues/4753 > https://github.com/numpy/numpy/pull/4815 > > I'm not too familiar with all the complications of subclassing so I can't > really say how hard this is to solve. > My issue is that there still seems to be debate on how to handle > operator overriding correctly and I am opposed to releasing a numpy with > yet another experimental feature that may or may not be finished > sometime later. Having datetime in an infinite experimental state is bad > enough. > I think nobody is served well if we release 1.9 with the feature > prematurely based on an unrepresentative set of users and then later, > after more users show up, see that we have to change its behavior. > > So I'm wondering if we should delay the introduction of this feature to > 1.10 or is it important enough to wait until there is a consensus on the > remaining issues? > So it's been a week and we got a few answers and new issues. To summarize: - to my knowledge no progress was made on the issues - scipy already has a released version using the current implementation - no very loud objections to delaying the feature to 1.10 - I am still unfamiliar with the intricacies of subclassing, but don't want to release something new which has unsolved issues. That scipy already uses it in a released version (0.14) is very problematic. Can someone maybe give some insight into whether the potential changes to resolve the remaining issues would break scipy? If so, we have the following choices: - declare what we have as final and close the remaining issues as 'won't fix'. Any changes would then have to use a new name, __numpy_ufunc2__, or somehow version the interface - delay the introduction, potentially breaking scipy 0.14 when numpy 1.10 is released. I would like to get the next (and last) numpy 1.9 beta out soon, so I would propose to make a decision by this Saturday, 26.07.2014, however misinformed it may be. Please note that the numpy 1.10 release cycle is likely going to be a very long one as we are currently planning to change a bunch of default behaviours that currently raise deprecation warnings and possibly will try to fix string types, text IO and datetime. Please see the future changes notes in the current 1.9.x release notes. If we delay numpy_ufunc it is not unlikely that it will take a year until we release 1.10. Though we could still put it into an earlier 1.9.1. Cheers, Julian From robert.kern at gmail.com Wed Jul 23 14:54:28 2014 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 23 Jul 2014 19:54:28 +0100 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: <53CFEEBE.5000207@googlemail.com> References: <53CFEEBE.5000207@googlemail.com> Message-ID: On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor wrote: > hi, > it recently came to my attention that the default integer type in numpy > on windows 64 bit is a 32 bit integers [0]. > This seems like a quite serious problem as it means you can't use any > integers created from python integers < 32 bit to index arrays larger > than 2GB. > For example np.product(array.shape) which will never overflow on linux > and mac, can overflow on win64. Currently, on win64, we use Python long integer objects for `.shape` and related attributes.
I wonder if we could return numpy int64 scalars instead. Then np.product() (or anything else that consumes these via np.asarray()) would infer the correct dtype for the result. > I think this is a very dangerous platform difference and a quite large > inconvenience for win64 users so I think it would be good to fix this. > This would be a very large change of API and probably also ABI. Yes. Not only would it be a very large change from the status quo, I think it introduces *much greater* platform difference than what we have currently. The assumption that the default integer object corresponds to the platform C long, whatever that is, is pretty heavily ingrained. > But as we also never officially released win64 binaries we could change > it for from source compilations and give win64 binary distributors the > option to keep the old ABI/API at their discretion. That option would make the problem worse, not better. -- Robert Kern From jtaylor.debian at googlemail.com Wed Jul 23 15:50:31 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 23 Jul 2014 21:50:31 +0200 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: References: <53CFEEBE.5000207@googlemail.com> Message-ID: <53D01207.2090807@googlemail.com> On 23.07.2014 20:54, Robert Kern wrote: > On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor > wrote: >> hi, >> it recently came to my attention that the default integer type in numpy >> on windows 64 bit is a 32 bit integers [0]. >> This seems like a quite serious problem as it means you can't use any >> integers created from python integers < 32 bit to index arrays larger >> than 2GB. >> For example np.product(array.shape) which will never overflow on linux >> and mac, can overflow on win64. > > Currently, on win64, we use Python long integer objects for `.shape` > and related attributes. I wonder if we could return numpy int64 > scalars instead. Then np.product() (or anything else that consumes > these via np.asarray()) would infer the correct dtype for the result. this might be a less invasive alternative that might solve a lot of the incompatibilities, but it would probably also change np.arange(5) and similar functions to int64 which might change the dtype of a lot of arrays. The difference to just changing it everywhere might not be so large anymore. > >> I think this is a very dangerous platform difference and a quite large >> inconvenience for win64 users so I think it would be good to fix this. >> This would be a very large change of API and probably also ABI. > > Yes. Not only would it be a very large change from the status quo, I > think it introduces *much greater* platform difference than what we > have currently. The assumption that the default integer object > corresponds to the platform C long, whatever that is, is pretty > heavily ingrained. This should be only a concern for the ABI which can be solved by simply recompiling. In comparison that the API is different on win64 compared to all other platforms is something that needs source level changes. > >> But as we also never officially released win64 binaries we could change >> it for from source compilations and give win64 binary distributors the >> option to keep the old ABI/API at their discretion. > > That option would make the problem worse, not better. > maybe, I'm not familiar with the numpy win64 distribution landscape. Is it not like linux where you have one distributor per workstation setup that can update all its packages to a new ABI on one go? 
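To make the overflow concern concrete, a small illustration (the shape below is hypothetical, not taken from the thread): a plain reduction ends up in the default integer type and may wrap around where that default is 32 bit, while forcing the pointer-sized np.intp gives the exact count on any 64-bit build.

import numpy as np

shape = (100000, 200000, 400000)        # product is 8e15, far beyond 2**31 - 1
print(np.prod(shape))                   # result has the default integer dtype; wraps if that is 32 bit
print(np.prod(shape, dtype=np.intp))    # 8000000000000000 on any 64-bit platform

np.intp is the C intptr_t-sized integer, so it can always index the full address space regardless of what the default integer happens to be.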
From robert.kern at gmail.com Wed Jul 23 16:04:41 2014 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 23 Jul 2014 21:04:41 +0100 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: <53D01207.2090807@googlemail.com> References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> Message-ID: On Wed, Jul 23, 2014 at 8:50 PM, Julian Taylor wrote: > On 23.07.2014 20:54, Robert Kern wrote: >> On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor >> wrote: >>> hi, >>> it recently came to my attention that the default integer type in numpy >>> on windows 64 bit is a 32 bit integers [0]. >>> This seems like a quite serious problem as it means you can't use any >>> integers created from python integers < 32 bit to index arrays larger >>> than 2GB. >>> For example np.product(array.shape) which will never overflow on linux >>> and mac, can overflow on win64. >> >> Currently, on win64, we use Python long integer objects for `.shape` >> and related attributes. I wonder if we could return numpy int64 >> scalars instead. Then np.product() (or anything else that consumes >> these via np.asarray()) would infer the correct dtype for the result. > > this might be a less invasive alternative that might solve a lot of the > incompatibilities, but it would probably also change np.arange(5) and > similar functions to int64 which might change the dtype of a lot of > arrays. The difference to just changing it everywhere might not be so > large anymore. No, np.arange(5) would not change behavior given my suggestion, only the type of the integer objects in ndarray.shape and related tuples. >>> I think this is a very dangerous platform difference and a quite large >>> inconvenience for win64 users so I think it would be good to fix this. >>> This would be a very large change of API and probably also ABI. >> >> Yes. Not only would it be a very large change from the status quo, I >> think it introduces *much greater* platform difference than what we >> have currently. The assumption that the default integer object >> corresponds to the platform C long, whatever that is, is pretty >> heavily ingrained. > > This should be only a concern for the ABI which can be solved by simply > recompiling. > In comparison that the API is different on win64 compared to all other > platforms is something that needs source level changes. No, the API is no different on win64 than other platforms. Why do you think it is? The win64 platform is a weird platform in this respect, having made a choice that other 64-bit platforms didn't, but numpy's API treats it consistently. When we say that something is a C long, it's a C long on all platforms. >>> But as we also never officially released win64 binaries we could change >>> it for from source compilations and give win64 binary distributors the >>> option to keep the old ABI/API at their discretion. >> >> That option would make the problem worse, not better. > > maybe, I'm not familiar with the numpy win64 distribution landscape. > Is it not like linux where you have one distributor per workstation > setup that can update all its packages to a new ABI on one go? No. There tend to be multiple providers. -- Robert Kern From sebastian at sipsolutions.net Wed Jul 23 16:06:11 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 23 Jul 2014 22:06:11 +0200 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? 
In-Reply-To: <53D01207.2090807@googlemail.com> References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> Message-ID: <1406145971.2895.5.camel@sebastian-laptop> On Wed, 2014-07-23 at 21:50 +0200, Julian Taylor wrote: > On 23.07.2014 20:54, Robert Kern wrote: > > On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor > > wrote: > >> hi, > >> it recently came to my attention that the default integer type in numpy > >> on windows 64 bit is a 32 bit integers [0]. > >> This seems like a quite serious problem as it means you can't use any > >> integers created from python integers < 32 bit to index arrays larger > >> than 2GB. > >> For example np.product(array.shape) which will never overflow on linux > >> and mac, can overflow on win64. > > > > Currently, on win64, we use Python long integer objects for `.shape` > > and related attributes. I wonder if we could return numpy int64 > > scalars instead. Then np.product() (or anything else that consumes > > these via np.asarray()) would infer the correct dtype for the result. > > this might be a less invasive alternative that might solve a lot of the > incompatibilities, but it would probably also change np.arange(5) and > similar functions to int64 which might change the dtype of a lot of > arrays. The difference to just changing it everywhere might not be so > large anymore. > Aren't most such functions already using intp? Just guessing, but: In [16]: np.arange(30, dtype=np.long).dtype.num Out[16]: 9 In [17]: np.arange(30, dtype=np.intp).dtype.num Out[17]: 7 In [18]: np.arange(30).dtype.num Out[18]: 7 frankly, I am not sure what needs to change at all, except the normal array creation and the sum promotion rule. I am probably naive here, but what is the ABI change that is necessary for that? I guess the problem you see is breaking code doing np.array([1,2,3]) and then assuming in C that it is a long array? - Sebastian > > > >> I think this is a very dangerous platform difference and a quite large > >> inconvenience for win64 users so I think it would be good to fix this. > >> This would be a very large change of API and probably also ABI. > > > > Yes. Not only would it be a very large change from the status quo, I > > think it introduces *much greater* platform difference than what we > > have currently. The assumption that the default integer object > > corresponds to the platform C long, whatever that is, is pretty > > heavily ingrained. > > This should be only a concern for the ABI which can be solved by simply > recompiling. > In comparison that the API is different on win64 compared to all other > platforms is something that needs source level changes. > > > > >> But as we also never officially released win64 binaries we could change > >> it for from source compilations and give win64 binary distributors the > >> option to keep the old ABI/API at their discretion. > > > > That option would make the problem worse, not better. > > > > maybe, I'm not familiar with the numpy win64 distribution landscape. > Is it not like linux where you have one distributor per workstation > setup that can update all its packages to a new ABI on one go? 
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Wed Jul 23 16:17:01 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 23 Jul 2014 22:17:01 +0200 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: <1406145971.2895.5.camel@sebastian-laptop> References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> <1406145971.2895.5.camel@sebastian-laptop> Message-ID: <1406146621.2895.6.camel@sebastian-laptop> On Wed, 2014-07-23 at 22:06 +0200, Sebastian Berg wrote: > On Wed, 2014-07-23 at 21:50 +0200, Julian Taylor wrote: > > On 23.07.2014 20:54, Robert Kern wrote: > > > On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor > > > wrote: > > >> hi, > > >> it recently came to my attention that the default integer type in numpy > > >> on windows 64 bit is a 32 bit integers [0]. > > >> This seems like a quite serious problem as it means you can't use any > > >> integers created from python integers < 32 bit to index arrays larger > > >> than 2GB. > > >> For example np.product(array.shape) which will never overflow on linux > > >> and mac, can overflow on win64. > > > > > > Currently, on win64, we use Python long integer objects for `.shape` > > > and related attributes. I wonder if we could return numpy int64 > > > scalars instead. Then np.product() (or anything else that consumes > > > these via np.asarray()) would infer the correct dtype for the result. > > > > this might be a less invasive alternative that might solve a lot of the > > incompatibilities, but it would probably also change np.arange(5) and > > similar functions to int64 which might change the dtype of a lot of > > arrays. The difference to just changing it everywhere might not be so > > large anymore. > > > > Aren't most such functions already using intp? Just guessing, but: > > In [16]: np.arange(30, dtype=np.long).dtype.num > Out[16]: 9 > > In [17]: np.arange(30, dtype=np.intp).dtype.num > Out[17]: 7 > > In [18]: np.arange(30).dtype.num > Out[18]: 7 > Ops, never mind that stuff, probably not... np.int_ is 7 too, this is just the way how intp is chosen. > frankly, I am not sure what needs to change at all, except the normal > array creation and the sum promotion rule. I am probably naive here, but > what is the ABI change that is necessary for that? > > I guess the problem you see is breaking code doing np.array([1,2,3]) and > then assuming in C that it is a long array? > > - Sebastian > > > > > > >> I think this is a very dangerous platform difference and a quite large > > >> inconvenience for win64 users so I think it would be good to fix this. > > >> This would be a very large change of API and probably also ABI. > > > > > > Yes. Not only would it be a very large change from the status quo, I > > > think it introduces *much greater* platform difference than what we > > > have currently. The assumption that the default integer object > > > corresponds to the platform C long, whatever that is, is pretty > > > heavily ingrained. > > > > This should be only a concern for the ABI which can be solved by simply > > recompiling. > > In comparison that the API is different on win64 compared to all other > > platforms is something that needs source level changes. 
> > > > > > > >> But as we also never officially released win64 binaries we could change > > >> it for from source compilations and give win64 binary distributors the > > >> option to keep the old ABI/API at their discretion. > > > > > > That option would make the problem worse, not better. > > > > > > > maybe, I'm not familiar with the numpy win64 distribution landscape. > > Is it not like linux where you have one distributor per workstation > > setup that can update all its packages to a new ABI on one go? > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jtaylor.debian at googlemail.com Wed Jul 23 16:34:40 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 23 Jul 2014 22:34:40 +0200 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> Message-ID: <53D01C60.1090307@googlemail.com> On 23.07.2014 22:04, Robert Kern wrote: > On Wed, Jul 23, 2014 at 8:50 PM, Julian Taylor > wrote: >> On 23.07.2014 20:54, Robert Kern wrote: >>> On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor >>> wrote: >>>> hi, >>>> it recently came to my attention that the default integer type in numpy >>>> on windows 64 bit is a 32 bit integers [0]. >>>> This seems like a quite serious problem as it means you can't use any >>>> integers created from python integers < 32 bit to index arrays larger >>>> than 2GB. >>>> For example np.product(array.shape) which will never overflow on linux >>>> and mac, can overflow on win64. >>> >>> Currently, on win64, we use Python long integer objects for `.shape` >>> and related attributes. I wonder if we could return numpy int64 >>> scalars instead. Then np.product() (or anything else that consumes >>> these via np.asarray()) would infer the correct dtype for the result. >> >> this might be a less invasive alternative that might solve a lot of the >> incompatibilities, but it would probably also change np.arange(5) and >> similar functions to int64 which might change the dtype of a lot of >> arrays. The difference to just changing it everywhere might not be so >> large anymore. > > No, np.arange(5) would not change behavior given my suggestion, only > the type of the integer objects in ndarray.shape and related tuples. ndarray.shape are not numpy scalars but python objects, so they would always be converted back to 32 bit integers when given back to numpy. > >>>> I think this is a very dangerous platform difference and a quite large >>>> inconvenience for win64 users so I think it would be good to fix this. >>>> This would be a very large change of API and probably also ABI. >>> >>> Yes. Not only would it be a very large change from the status quo, I >>> think it introduces *much greater* platform difference than what we >>> have currently. The assumption that the default integer object >>> corresponds to the platform C long, whatever that is, is pretty >>> heavily ingrained. >> >> This should be only a concern for the ABI which can be solved by simply >> recompiling. >> In comparison that the API is different on win64 compared to all other >> platforms is something that needs source level changes. 
> > No, the API is no different on win64 than other platforms. Why do you > think it is? The win64 platform is a weird platform in this respect, > having made a choice that other 64-bit platforms didn't, but numpy's > API treats it consistently. When we say that something is a C long, > it's a C long on all platforms. The API is different if you consider it from a python perspective. The default integer dtype should be sufficiently large to index into any numpy array, thats what I call an API here. win64 behaves different, you have to explicitly upcast your index to be able to index all memory. But API or ABI is just semantics here, what I actually mean is the difference of source changes vs recompiling to deal with the issue. Of course there might be C code that needs more than recompiling, but it should not be that much, it would have to be already somewhat broken/restrictive code that uses numpy buffers without first checking which type it has. There can also be python code that might need source changes e.g. np.int_ memory mapping a binary from win32 assuming np.int_ is also 32 bit on win64, but this would be broken on linux and mac already now. >>>> But as we also never officially released win64 binaries we could change >>>> it for from source compilations and give win64 binary distributors the >>>> option to keep the old ABI/API at their discretion. >>> >>> That option would make the problem worse, not better. >> >> maybe, I'm not familiar with the numpy win64 distribution landscape. >> Is it not like linux where you have one distributor per workstation >> setup that can update all its packages to a new ABI on one go? > > No. There tend to be multiple providers. > From robert.kern at gmail.com Wed Jul 23 16:57:50 2014 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 23 Jul 2014 21:57:50 +0100 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: <53D01C60.1090307@googlemail.com> References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> <53D01C60.1090307@googlemail.com> Message-ID: On Wed, Jul 23, 2014 at 9:34 PM, Julian Taylor wrote: > On 23.07.2014 22:04, Robert Kern wrote: >> On Wed, Jul 23, 2014 at 8:50 PM, Julian Taylor >> wrote: >>> On 23.07.2014 20:54, Robert Kern wrote: >>>> On Wed, Jul 23, 2014 at 6:19 PM, Julian Taylor >>>> wrote: >>>>> hi, >>>>> it recently came to my attention that the default integer type in numpy >>>>> on windows 64 bit is a 32 bit integers [0]. >>>>> This seems like a quite serious problem as it means you can't use any >>>>> integers created from python integers < 32 bit to index arrays larger >>>>> than 2GB. >>>>> For example np.product(array.shape) which will never overflow on linux >>>>> and mac, can overflow on win64. >>>> >>>> Currently, on win64, we use Python long integer objects for `.shape` >>>> and related attributes. I wonder if we could return numpy int64 >>>> scalars instead. Then np.product() (or anything else that consumes >>>> these via np.asarray()) would infer the correct dtype for the result. >>> >>> this might be a less invasive alternative that might solve a lot of the >>> incompatibilities, but it would probably also change np.arange(5) and >>> similar functions to int64 which might change the dtype of a lot of >>> arrays. The difference to just changing it everywhere might not be so >>> large anymore. >> >> No, np.arange(5) would not change behavior given my suggestion, only >> the type of the integer objects in ndarray.shape and related tuples. 
> > ndarray.shape are not numpy scalars but python objects, so they would > always be converted back to 32 bit integers when given back to numpy. That's what I'm suggesting that we change: make `type(ndarray.shape[i])` be `np.intp` instead of `long`. However, I'm not sure that this is an issue with numpy 1.8.0 at least. I can't reproduce the reported problem on Win64: In [12]: import numpy as np In [13]: from numpy.lib import stride_tricks In [14]: import sys In [15]: b = stride_tricks.as_strided(np.zeros(1), shape=(100000, 200000, 400000), strides=(0, 0, 0)) In [16]: b.shape Out[16]: (100000L, 200000L, 400000L) In [17]: np.product(b.shape) Out[17]: 8000000000000000 In [18]: np.product(b.shape).dtype Out[18]: dtype('int64') In [19]: sys.maxint Out[19]: 2147483647 In [20]: np.__version__ Out[20]: '1.8.0' In [21]: np.array(b.shape) Out[21]: array([100000, 200000, 400000], dtype=int64) This is on Python 2.7, so maybe something got weird in the Python 3 version that Chris Gohlke tested? >>>>> I think this is a very dangerous platform difference and a quite large >>>>> inconvenience for win64 users so I think it would be good to fix this. >>>>> This would be a very large change of API and probably also ABI. >>>> >>>> Yes. Not only would it be a very large change from the status quo, I >>>> think it introduces *much greater* platform difference than what we >>>> have currently. The assumption that the default integer object >>>> corresponds to the platform C long, whatever that is, is pretty >>>> heavily ingrained. >>> >>> This should be only a concern for the ABI which can be solved by simply >>> recompiling. >>> In comparison that the API is different on win64 compared to all other >>> platforms is something that needs source level changes. >> >> No, the API is no different on win64 than other platforms. Why do you >> think it is? The win64 platform is a weird platform in this respect, >> having made a choice that other 64-bit platforms didn't, but numpy's >> API treats it consistently. When we say that something is a C long, >> it's a C long on all platforms. > > The API is different if you consider it from a python perspective. > The default integer dtype should be sufficiently large to index into any > numpy array, thats what I call an API here. That's perhaps what you want, but numpy has never claimed to do this. The numpy project deliberately chose (and is so documented) to make its default integer type a C long, not a C size_t, to match Python's default. > win64 behaves different, you > have to explicitly upcast your index to be able to index all memory. > But API or ABI is just semantics here, what I actually mean is the > difference of source changes vs recompiling to deal with the issue. > Of course there might be C code that needs more than recompiling, but it > should not be that much, it would have to be already somewhat > broken/restrictive code that uses numpy buffers without first checking > which type it has. > > There can also be python code that might need source changes e.g. > np.int_ memory mapping a binary from win32 assuming np.int_ is also 32 > bit on win64, but this would be broken on linux and mac already now. Anything that assumes that np.int_ is any particular fixed size is always broken, naturally. -- Robert Kern From robert.kern at gmail.com Wed Jul 23 17:07:10 2014 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 23 Jul 2014 22:07:10 +0100 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? 
In-Reply-To: References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> <53D01C60.1090307@googlemail.com> Message-ID: On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern wrote: > That's what I'm suggesting that we change: make > `type(ndarray.shape[i])` be `np.intp` instead of `long`. > > However, I'm not sure that this is an issue with numpy 1.8.0 at least. > I can't reproduce the reported problem on Win64: > > In [12]: import numpy as np > > In [13]: from numpy.lib import stride_tricks > > In [14]: import sys > > In [15]: b = stride_tricks.as_strided(np.zeros(1), shape=(100000, > 200000, 400000), strides=(0, 0, 0)) > > In [16]: b.shape > Out[16]: (100000L, 200000L, 400000L) > > In [17]: np.product(b.shape) > Out[17]: 8000000000000000 > > In [18]: np.product(b.shape).dtype > Out[18]: dtype('int64') > > In [19]: sys.maxint > Out[19]: 2147483647 > > In [20]: np.__version__ > Out[20]: '1.8.0' > > In [21]: np.array(b.shape) > Out[21]: array([100000, 200000, 400000], dtype=int64) > > > This is on Python 2.7, so maybe something got weird in the Python 3 > version that Chris Gohlke tested? Ah yes, naturally. Because there is no separate `long` type in Python 3, np.asarray() can't use the type to distinguish what type to build the array. Returning np.intp objects in the tuple would resolve the problem in much the same way the problem is currently resolved in Python 2. This would also have the effect of unifying API on all platforms: currently, win64 is the only platform where the `.shape` tuple and related attribute returns Python longs instead of Python ints. -- Robert Kern From njs at pobox.com Wed Jul 23 17:13:33 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 23 Jul 2014 22:13:33 +0100 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> <53D01C60.1090307@googlemail.com> Message-ID: On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern wrote: > That's perhaps what you want, but numpy has never claimed to do this. > The numpy project deliberately chose (and is so documented) to make > its default integer type a C long, not a C size_t, to match Python's > default. This is true, but it's not very compelling on its own -- "big as a pointer" is a much much more useful property than "big as a long". The only real reason this made sense in the first place is the equivalence between Python int and C long, but even that is gone now with Python 3. IMO at this point backcompat is really the only serious reason for keeping int32 as the default integer type in win64. But of course this is a pretty serious concern... Julian: making the change experimentally and checking how badly scipy and some similar libraries break might be a way to focus the backcompat discussion more. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From pav at iki.fi Wed Jul 23 18:35:57 2014 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 24 Jul 2014 01:35:57 +0300 Subject: [Numpy-discussion] __numpy_ufunc__ and 1.9 release In-Reply-To: <53CFF2E3.1020708@googlemail.com> References: <53C56DA2.40402@googlemail.com> <53CFF2E3.1020708@googlemail.com> Message-ID: <53D038CD.3000306@iki.fi> 23.07.2014, 20:37, Julian Taylor kirjoitti: [clip: __numpy_ufunc__] > So its been a week and we got a few answers and new issues. 
To > summarize: - to my knowledge no progress was made on the issues - > scipy already has a released version using the current > implementation - no very loud objections to delaying the feature to > 1.10 - I am still unfamiliar with the problematics of subclassing, > but don't want to release something new which has unsolved issues. > > That scipy already uses it in a released version (0.14) is very > problematic. Can maybe someone give some insight if the potential > changes to resolve the remaining issues would break scipy? > > If so we have following choices: > > - declare what we have as final and close the remaining issues as > 'won't fix'. Any changes would have to have a new name > __numpy_ufunc2__ or a somehow versioned the interface - delay the > introduction, potentially breaking scipy 0.14 when numpy 1.10 is > released. > > I would like to get the next (and last) numpy 1.9 beta out soon, so > I would propose to make a decision until this Saturday the > 26.02.2014 however misinformed it may be. It seems fairly unlikely to me that the `__numpy_ufunc__` interface itself requires any changes. I believe the definition of the interface is quite safe to consider as fixed --- it is a fairly straighforward hook for Numpy ufuncs. (There are also no essential changes in it since last year.) For the binary operator overriding, Scipy sets the constraint that ndarray * spmatrix MUST call spmatrix.__rmul__ even if spmatrix.__numpy_ufunc__ is defined. spmatrixes are not ndarray subclasses, and various subclassing problems do not enter here. Note that this binop discussion is somewhat separate from the __numpy_ufunc__ interface itself. The only information available about it at the binop stage is `hasattr(other, '__numpy_ufunc__')`. *** Regarding the blockers: (1) https://github.com/numpy/numpy/issues/4753 This is a bug in the argument normalization --- output arguments are not checked for the presence of "__numpy_ufunc__" if they are passed as keyword arguments (as a positional argument it works). It's a bug in the implementation, but I don't think it is really a blocker. Scipy sparse matrices will in practice seldom be used as output args for ufuncs. *** (2) https://github.com/numpy/numpy/pull/4815 The is open question concerns semantics of `__numpy_ufunc__` versus Python operator overrides. When should ndarray.__mul__(other) return NotImplemented? Scipy sparse matrices are not subclasses of ndarray, so the code in question in Numpy gets to run only for ndarray * spmatrix This provides a constraint to what solution we can choose in Numpy to deal with the issue: ndarray.__mul__(spmatrix) MUST continue to return NotImplemented This is the current behavior, and cannot be changed: it is not possible to defer this to __numpy_ufunc__(ufunc=np.multiply), because sparse matrices define `*` as the matrix multiply, and not the elementwise multiply. (This settles one line of discussion in the issues --- ndarray should defer.) How Numpy currently determines whether to return NotImplemented in this case or to call np.multiply(self, other) is by comparing `__array_priority__` attributes of `self` and `other`. Scipy sparse matrices define an `__array_priority__` larger than ndarrays, which then makes a NotImplemented be returned. The idea in the __numpy_ufunc__ NEP was to replace this with `hasattr(other, '__numpy_ufunc__') and hasattr(other, '__rmul__')`. However, when both self and other are ndarray subclasses in a certain configuration, both end up returning NotImplemented, and Python raises TypeError. 
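For concreteness, a minimal sketch of the current `__array_priority__`-based deferral mentioned above (the NEP would replace the priority check with the hasattr test just described). The SparseLike class is a hypothetical stand-in for scipy's spmatrix, used only to show the mechanism, not scipy code:

import numpy as np

class SparseLike(object):
    # Hypothetical stand-in for a scipy.sparse matrix: a priority higher than
    # ndarray's default __array_priority__ of 0.0, plus a reflected operator.
    __array_priority__ = 10.1

    def __mul__(self, other):
        return "SparseLike.__mul__ (matrix product)"

    def __rmul__(self, other):
        return "SparseLike.__rmul__ (matrix product)"

a = np.arange(3)
# ndarray.__mul__ sees the higher __array_priority__ together with __rmul__,
# returns NotImplemented, and Python falls back to SparseLike.__rmul__:
print(a * SparseLike())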
The `__array_priority__` mechanism is also broken in some of the subclassing cases: https://github.com/numpy/numpy/issues/4766 As far as I see, the backward compatibility requirement from Scipy only rules out the option that ndarray.__mul__(other) should unconditionally call `np.add(self, other)`. We have some freedom how to solve the binop vs. subclass issues. It's possible to e.g. retain the __array_priority__ stuff as a backward compatibility measure as we do currently. -- Pauli Virtanen From sturla.molden at gmail.com Wed Jul 23 22:47:05 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 24 Jul 2014 02:47:05 +0000 (UTC) Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> <53D01C60.1090307@googlemail.com> Message-ID: <877454677427862634.570293sturla.molden-gmail.com@news.gmane.org> Julian Taylor wrote: > The default integer dtype should be sufficiently large to index into any > numpy array, thats what I call an API here. win64 behaves different, you > have to explicitly upcast your index to be able to index all memory. No, you don't have to manually upcast Python int to Python long. Python 2 will automatically create a Python long if you overflow a Python int. On Python 3 the Python int does not have a size limit. Sturla From robert.kern at gmail.com Thu Jul 24 04:36:18 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 24 Jul 2014 09:36:18 +0100 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: <877454677427862634.570293sturla.molden-gmail.com@news.gmane.org> References: <53CFEEBE.5000207@googlemail.com> <53D01207.2090807@googlemail.com> <53D01C60.1090307@googlemail.com> <877454677427862634.570293sturla.molden-gmail.com@news.gmane.org> Message-ID: On Thu, Jul 24, 2014 at 3:47 AM, Sturla Molden wrote: > Julian Taylor wrote: > >> The default integer dtype should be sufficiently large to index into any >> numpy array, thats what I call an API here. win64 behaves different, you >> have to explicitly upcast your index to be able to index all memory. > > No, you don't have to manually upcast Python int to Python long. > > Python 2 will automatically create a Python long if you overflow a Python > int. > > On Python 3 the Python int does not have a size limit. Please reread the thread more carefully. That's not what this discussion is about. -- Robert Kern From thomas_unterthiner at web.de Thu Jul 24 05:32:24 2014 From: thomas_unterthiner at web.de (Thomas Unterthiner) Date: Thu, 24 Jul 2014 11:32:24 +0200 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays Message-ID: <53D0D2A8.2060308@web.de> Hi! The following is a known "bug" since at least 2010 [1]: import numpy as np X = np.ones((50000, 1024), np.float32) print X.mean() >>> 0.32768 I ran into this for the first time today as part of a larger program. I was very surprised by this, and spent over an hour looking for bugs in my code before noticing that the culprit was `mean` being broken for large float32 arrays. I realize that this behavior is actually documented, but it is absolutely non-intuitive. I assume most users expect `mean` to just work. This has been discussed once two years ago [2], but nothing came of that. 
This could be easily fixed by making `np.float64` the default dtype (as it already is for integer types), or by at least checking inside mean if the passed array was a large np.float32 array and switch the dtype to np.float64 in that case. Is there a reason why this has not been done? Cheers Thomas [1] http://mail.scipy.org/pipermail/numpy-discussion/2010-November/053697.html [2] http://numpy-discussion.10968.n7.nabble.com/Bug-in-numpy-mean-revisited-td1293.html From larsmans at gmail.com Thu Jul 24 05:39:30 2014 From: larsmans at gmail.com (Lars Buitinck) Date: Thu, 24 Jul 2014 11:39:30 +0200 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? Message-ID: Wed, 23 Jul 2014 22:13:33 +0100 Nathaniel Smith : > On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern wrote: >> That's perhaps what you want, but numpy has never claimed to do this. ... except in np.where, which promises to return indices but actually returns arrays of longs and thus doesn't work with large arrays on Windows. I know this is a bug that can be fixed without changing the size of np.int, but it goes to show that even core functionality in NumPy gets it wrong. > This is true, but it's not very compelling on its own -- "big as a > pointer" is a much much more useful property than "big as a long". The > only real reason this made sense in the first place is the equivalence > between Python int and C long, but even that is gone now with Python > 3. IMO at this point backcompat is really the only serious reason for > keeping int32 as the default integer type in win64. But of course this > is a pretty serious concern... Hear, hear. The C type long is only useful as an "at least 32-bit" integer, but on the platforms that NumPy targets, int is also at least that large. The only real benefit of long is that it makes porting more interesting . If you have intp and a bunch of explicitly-sized integer types, you don't need an additional type that behaves like a long *except* for backward compat. The Go people got this right; they only have explicitly-sized integer types and an int type the size of a pointer [1]. [1] http://golang.org/doc/go1.1#int From robert.kern at gmail.com Thu Jul 24 05:46:00 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 24 Jul 2014 10:46:00 +0100 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: References: Message-ID: On Thu, Jul 24, 2014 at 10:39 AM, Lars Buitinck wrote: > Wed, 23 Jul 2014 22:13:33 +0100 Nathaniel Smith : >> On Wed, Jul 23, 2014 at 9:57 PM, Robert Kern wrote: >>> That's perhaps what you want, but numpy has never claimed to do this. > > ... except in np.where, which promises to return indices but actually > returns arrays of longs and thus doesn't work with large arrays on > Windows. > > I know this is a bug that can be fixed without changing the size of > np.int, but it goes to show that even core functionality in NumPy gets > it wrong. Does it? I don't have my Windows VM available at the moment, but it looks like PyArray_Nonzero() is correctly returning an intp array: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L2478 If it is incorrect somewhere else, please submit a bug report. 
-- Robert Kern From hoogendoorn.eelco at gmail.com Thu Jul 24 05:59:16 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 24 Jul 2014 11:59:16 +0200 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: <53D0D2A8.2060308@web.de> References: <53D0D2A8.2060308@web.de> Message-ID: Arguably, this isn't a problem of numpy, but of programmers being trained to think of floating point numbers as 'real' numbers, rather than just a finite number of states with a funny distribution over the number line. np.mean isn't broken; your understanding of floating point number is. What you appear to wish for is a silent upcasting of the accumulated result. This is often performed in reducing operations, but I can imagine it runs into trouble for nd-arrays. After all, if I have a huge array that I want to reduce over a very short axis, upcasting might be very undesirable; it wouldn't buy me any extra precision, but it would increase memory use from 'huge' to 'even more huge'. np.mean has a kwarg that allows you to explicitly choose the dtype of the accumulant. X.mean(dtype=np.float64)==1.0. Personally, I have a distaste for implicit behavior, unless the rule is simple and there really can be no negative downsides; which doesn't apply here I would argue. Perhaps when reducing an array completely to a single value, there is no harm in upcasting to the maximum machine precision; but that becomes a rather complex rule which would work out differently for different machines. Its better to be confronted with the limitations of floating point numbers earlier, rather than later when you want to distribute your work and run into subtle bugs on other peoples computers.? -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas_unterthiner at web.de Thu Jul 24 06:55:07 2014 From: thomas_unterthiner at web.de (Thomas Unterthiner) Date: Thu, 24 Jul 2014 12:55:07 +0200 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: References: <53D0D2A8.2060308@web.de> Message-ID: <53D0E60B.9060500@web.de> I don't agree. The problem is that I expect `mean` to do something reasonable. The documentation mentions that the results can be "inaccurate", which is a huge understatement: the results can be utterly wrong. That is not reasonable. At the very least, a warning should be issued in cases where the dtype might not be appropriate. One cannot predict what input sizes a program will be run with once it's in use (especially if it's in use for several years). I'd argue this is true for pretty much every code except quick one-off scripts. Thus one would have to use `dtype=np.float64` everywhere. By which point it seems obvious that it should have been the default in the first place. The other alternative would be to extend np.mean with some logic that internally figures out the right thing to do (which I don't think is too hard, since ). Your example with the short axis is something that can be checked for. I agree that the logic could become a bit hairy, but not too much: If we are going to sum up more than N values (where N could be determined at compile time, or simply be some constant), we upcast unless the user explicitly specified a dtype. Of course, this would incur an increase in memory. However I'd argue that it's not even a large increase: If you can fit the matrix in memory, then allocating a row/column of float64 instead of float32 should be doable, as well. 
And I'd much rather get an OutOfMemory execption than silently continue my calculations with useless/wrong results. Cheers Thomas On 2014-07-24 11:59, Eelco Hoogendoorn wrote: > Arguably, this isn't a problem of numpy, but of programmers being > trained to think of floating point numbers as 'real' numbers, rather > than just a finite number of states with a funny distribution over the > number line. np.mean isn't broken; your understanding of floating > point number is. > > What you appear to wish for is a silent upcasting of the accumulated > result. This is often performed in reducing operations, but I can > imagine it runs into trouble for nd-arrays. After all, if I have a > huge array that I want to reduce over a very short axis, upcasting > might be very undesirable; it wouldn't buy me any extra precision, but > it would increase memory use from 'huge' to 'even more huge'. > > np.mean has a kwarg that allows you to explicitly choose the dtype of > the accumulant. X.mean(dtype=np.float64)==1.0. Personally, I have a > distaste for implicit behavior, unless the rule is simple and there > really can be no negative downsides; which doesn't apply here I would > argue. Perhaps when reducing an array completely to a single value, > there is no harm in upcasting to the maximum machine precision; but > that becomes a rather complex rule which would work out differently > for different machines. Its better to be confronted with the > limitations of floating point numbers earlier, rather than later when > you want to distribute your work and run into subtle bugs on other > peoples computers.? > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From fabien.maussion at gmail.com Thu Jul 24 07:33:06 2014 From: fabien.maussion at gmail.com (Fabien) Date: Thu, 24 Jul 2014 13:33:06 +0200 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: References: <53D0D2A8.2060308@web.de> Message-ID: Hi all, On 24.07.2014 11:59, Eelco Hoogendoorn wrote: > np.mean isn't broken; your understanding of floating point number is. I am quite new to python, and this problem is discussed over and over for other languages too. However, numpy's summation problem appears with relatively small arrays already: py>import numpy as np py>np.ones((4000,4000), np.float32).mean() 1.0 py>np.ones((5000,5000), np.float32).mean() 0.67108864000000001 A 5000*5000 image is not unusual anymore today. In IDL: IDL> mean(fltarr(5000L, 5000L)+1) 1.0000000 IDL> mean(fltarr(7000L, 7000L)+1) 1.0000000 IDL> mean(fltarr(10000L, 10000L)+1) 0.67108864 I can't really explain why there are differences between the two languages (IDL uses 32-bit, single-precision, floating-point numbers) Fabien From jtaylor.debian at googlemail.com Thu Jul 24 07:56:17 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 24 Jul 2014 13:56:17 +0200 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: References: <53D0D2A8.2060308@web.de> Message-ID: On Thu, Jul 24, 2014 at 1:33 PM, Fabien wrote: > Hi all, > > On 24.07.2014 11:59, Eelco Hoogendoorn wrote: >> np.mean isn't broken; your understanding of floating point number is. > > I am quite new to python, and this problem is discussed over and over > for other languages too. 
However, numpy's summation problem appears with > relatively small arrays already: > > py>import numpy as np > py>np.ones((4000,4000), np.float32).mean() > 1.0 > py>np.ones((5000,5000), np.float32).mean() > 0.67108864000000001 > > A 5000*5000 image is not unusual anymore today. > > In IDL: > IDL> mean(fltarr(5000L, 5000L)+1) > 1.0000000 > IDL> mean(fltarr(7000L, 7000L)+1) > 1.0000000 > IDL> mean(fltarr(10000L, 10000L)+1) > 0.67108864 > > I can't really explain why there are differences between the two > languages (IDL uses 32-bit, single-precision, floating-point numbers) > > Fabien > something as simple as summation is already an interesting algorithmic problem there are several ways do to with different speeds and accuracies. E.g. pythons math.fsum is always exact to one ulp but is very slow as it requires partial sorting. Then there is kahan summation that has an accuracy of O(1) ulp but its about 4 times slower than the naive sum. In practice one of the better methods is pairwise summation that is pretty much as fast as a naive summation but has an accuracy of O(logN) ulp. This is the method numpy 1.9 will use this method by default (+ its even a bit faster than our old implementation of the naive sum): https://github.com/numpy/numpy/pull/3685 but it has some limitations, it is limited to blocks fo the buffer size (8192 elements by default) and does not work along the slow axes due to limitations in the numpy iterator. From jaime.frio at gmail.com Thu Jul 24 10:27:28 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 24 Jul 2014 07:27:28 -0700 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: References: <53D0D2A8.2060308@web.de> Message-ID: On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > In practice one of the better methods is pairwise summation that is > pretty much as fast as a naive summation but has an accuracy of > O(logN) ulp. > This is the method numpy 1.9 will use this method by default (+ its > even a bit faster than our old implementation of the naive sum): > https://github.com/numpy/numpy/pull/3685 > > but it has some limitations, it is limited to blocks fo the buffer > size (8192 elements by default) and does not work along the slow axes > due to limitations in the numpy iterator. > For what it's worth, I see the issue on a 64-bit Windows numpy 1.8, but cannot on a 32-bit Windows numpy master: >>> np.__version__ '1.8.0' >>> np.ones(100000000, dtype=np.float32).mean() 0.16777216 >>> np.__version__ '1.10.0.dev-Unknown' >>> np.ones(100000000, dtype=np.float32).mean() 1.0 -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Jul 24 11:09:12 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 24 Jul 2014 11:09:12 -0400 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: References: <53D0D2A8.2060308@web.de> Message-ID: <53D12198.8040308@gmail.com> On 7/24/2014 5:59 AM, Eelco Hoogendoorn wrote to Thomas: > np.mean isn't broken; your understanding of floating point number is. This comment seems to conflate separate issues: the desirable return type, and the computational algorithm. It is certainly possible to compute a mean of float32 doing reduction in float64 and still return a float32. 
There is nothing implicit in the name `mean` that says we have to just add everything up and divide by the count. My own view is that `mean` would behave enough better if computed as a running mean to justify the speed loss. Naturally similar issues arise for `var` and `std`, etc. See http://www.johndcook.com/standard_deviation.html for some discussion and references. Alan Isaac From hoogendoorn.eelco at gmail.com Thu Jul 24 11:31:11 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 24 Jul 2014 18:31:11 +0300 Subject: [Numpy-discussion] numpy.mean still broken for large float32arrays Message-ID: <53d126f2.46b3c20a.4c2b.ffff8ede@mx.google.com> Thanks Julian, those seem like Nice improvements. The fact that it either does or doesnt work depending on the axis makes me a Little queesy; but yeah, checking that fp's do what You think they should, is unfortunately best left as the responsibility of the programmer. -----Original Message----- From: "Julian Taylor" Sent: ?24-?7-?2014 14:56 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32arrays On Thu, Jul 24, 2014 at 1:33 PM, Fabien wrote: > Hi all, > > On 24.07.2014 11:59, Eelco Hoogendoorn wrote: >> np.mean isn't broken; your understanding of floating point number is. > > I am quite new to python, and this problem is discussed over and over > for other languages too. However, numpy's summation problem appears with > relatively small arrays already: > > py>import numpy as np > py>np.ones((4000,4000), np.float32).mean() > 1.0 > py>np.ones((5000,5000), np.float32).mean() > 0.67108864000000001 > > A 5000*5000 image is not unusual anymore today. > > In IDL: > IDL> mean(fltarr(5000L, 5000L)+1) > 1.0000000 > IDL> mean(fltarr(7000L, 7000L)+1) > 1.0000000 > IDL> mean(fltarr(10000L, 10000L)+1) > 0.67108864 > > I can't really explain why there are differences between the two > languages (IDL uses 32-bit, single-precision, floating-point numbers) > > Fabien > something as simple as summation is already an interesting algorithmic problem there are several ways do to with different speeds and accuracies. E.g. pythons math.fsum is always exact to one ulp but is very slow as it requires partial sorting. Then there is kahan summation that has an accuracy of O(1) ulp but its about 4 times slower than the naive sum. In practice one of the better methods is pairwise summation that is pretty much as fast as a naive summation but has an accuracy of O(logN) ulp. This is the method numpy 1.9 will use this method by default (+ its even a bit faster than our old implementation of the naive sum): https://github.com/numpy/numpy/pull/3685 but it has some limitations, it is limited to blocks fo the buffer size (8192 elements by default) and does not work along the slow axes due to limitations in the numpy iterator. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hoogendoorn.eelco at gmail.com Thu Jul 24 11:34:28 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 24 Jul 2014 18:34:28 +0300 Subject: [Numpy-discussion] numpy.mean still broken for large float32arrays Message-ID: <53d127b7.a5cbc20a.62be.ffffa347@mx.google.com> True, i suppose there is no harm in accumulating with max precision, and storing the result in the Original dtype, unless otherwise specified, although i wonder if the current nditer supports such behavior. -----Original Message----- From: "Alan G Isaac" Sent: ?24-?7-?2014 18:09 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32arrays On 7/24/2014 5:59 AM, Eelco Hoogendoorn wrote to Thomas: > np.mean isn't broken; your understanding of floating point number is. This comment seems to conflate separate issues: the desirable return type, and the computational algorithm. It is certainly possible to compute a mean of float32 doing reduction in float64 and still return a float32. There is nothing implicit in the name `mean` that says we have to just add everything up and divide by the count. My own view is that `mean` would behave enough better if computed as a running mean to justify the speed loss. Naturally similar issues arise for `var` and `std`, etc. See http://www.johndcook.com/standard_deviation.html for some discussion and references. Alan Isaac _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jul 24 12:59:38 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 24 Jul 2014 10:59:38 -0600 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: References: <53D0D2A8.2060308@web.de> Message-ID: On Thu, Jul 24, 2014 at 8:27 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor < > jtaylor.debian at googlemail.com> wrote: > >> In practice one of the better methods is pairwise summation that is >> pretty much as fast as a naive summation but has an accuracy of >> O(logN) ulp. >> This is the method numpy 1.9 will use this method by default (+ its >> even a bit faster than our old implementation of the naive sum): >> https://github.com/numpy/numpy/pull/3685 >> >> but it has some limitations, it is limited to blocks fo the buffer >> size (8192 elements by default) and does not work along the slow axes >> due to limitations in the numpy iterator. >> > > For what it's worth, I see the issue on a 64-bit Windows numpy 1.8, but > cannot on a 32-bit Windows numpy master: > > >>> np.__version__ > '1.8.0' > >>> np.ones(100000000, dtype=np.float32).mean() > 0.16777216 > > >>> np.__version__ > '1.10.0.dev-Unknown' > >>> np.ones(100000000, dtype=np.float32).mean() > 1.0 > > Interesting. Might be compiler related as there are many choices for floating point instructions/registers in i386. The i386 version may effectively be working in double precision. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joseph.martinot-lagarde at m4x.org Thu Jul 24 13:03:50 2014 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Thu, 24 Jul 2014 19:03:50 +0200 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: <53D0E60B.9060500@web.de> References: <53D0D2A8.2060308@web.de> <53D0E60B.9060500@web.de> Message-ID: Le 24/07/2014 12:55, Thomas Unterthiner a ?crit : > I don't agree. The problem is that I expect `mean` to do something > reasonable. The documentation mentions that the results can be > "inaccurate", which is a huge understatement: the results can be utterly > wrong. That is not reasonable. At the very least, a warning should be > issued in cases where the dtype might not be appropriate. > Maybe the problem is the documentation, then. If this is a common error, it could be explicitly documented in the function documentation. From nouiz at nouiz.org Thu Jul 24 13:04:43 2014 From: nouiz at nouiz.org (=?UTF-8?B?RnLDqWTDqXJpYyBCYXN0aWVu?=) Date: Thu, 24 Jul 2014 13:04:43 -0400 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: References: <53D0D2A8.2060308@web.de> Message-ID: On Thu, Jul 24, 2014 at 12:59 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > > On Thu, Jul 24, 2014 at 8:27 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Thu, Jul 24, 2014 at 4:56 AM, Julian Taylor < >> jtaylor.debian at googlemail.com> wrote: >> >>> In practice one of the better methods is pairwise summation that is >>> pretty much as fast as a naive summation but has an accuracy of >>> O(logN) ulp. >>> This is the method numpy 1.9 will use this method by default (+ its >>> even a bit faster than our old implementation of the naive sum): >>> https://github.com/numpy/numpy/pull/3685 >>> >>> but it has some limitations, it is limited to blocks fo the buffer >>> size (8192 elements by default) and does not work along the slow axes >>> due to limitations in the numpy iterator. >>> >> >> For what it's worth, I see the issue on a 64-bit Windows numpy 1.8, but >> cannot on a 32-bit Windows numpy master: >> >> >>> np.__version__ >> '1.8.0' >> >>> np.ones(100000000, dtype=np.float32).mean() >> 0.16777216 >> >> >>> np.__version__ >> '1.10.0.dev-Unknown' >> >>> np.ones(100000000, dtype=np.float32).mean() >> 1.0 >> >> > Interesting. Might be compiler related as there are many choices for > floating point instructions/registers in i386. The i386 version may > effectively be working in double precision. > Also note the different numpy version. Julian told that numpy 1.9 will use a more precise version in that case. That could explain that. Fred -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rays at blue-cove.com Thu Jul 24 13:36:12 2014 From: rays at blue-cove.com (RayS) Date: Thu, 24 Jul 2014 10:36:12 -0700 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: References: <53D0D2A8.2060308@web.de> Message-ID: <201407241736.s6OHaEcK032578@blue-cove.com> import numpy print numpy.__version__ for s in range(1864100, 1864200): if numpy.ones((s, 9), numpy.float32).sum()!= s*9: print '\nbroke', s break else: print '\r',s, C:\temp>python np_sum.py 1.8.0b2 1864135 broke 1864136 import numpy print numpy.__version__ for s in range(1864130*9, 1864135*9): if numpy.ones((s, 1), numpy.float32).sum()!= s: print '\nbroke', s break else: print '\r',s, C:\temp>python np_sum.py 1.8.0b2 16777214 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Jul 24 13:53:08 2014 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 24 Jul 2014 13:53:08 -0400 Subject: [Numpy-discussion] masked_where broadcasting? Message-ID: I ran into this this morning while writing up a new test for matplotlib. Shouldn't these two arrays be broadcasted automatically or maybe np.ma is being overly cautious? u = np.ma.masked_where((-0.4 < x) & (x < 0.1), u, copy=False) File "/home/ben/.local/lib/python2.7/site-packages/numpy/ma/core.py", line 1806, in masked_where " (got %s and %s)" % (cshape, ashape)) IndexError: Inconsistant shape between the condition and the input (got (10, 1, 1) and (10, 10, 3)) x has shape (10, 1, 1) and u has shape (10, 10, 3). This is on a recent-ish numpy master. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Thu Jul 24 15:05:56 2014 From: rays at blue-cove.com (RayS) Date: Thu, 24 Jul 2014 12:05:56 -0700 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays Message-ID: <201407241906.s6OJ61gG006795@blue-cove.com> Probably a number of scipy places as well import numpy import scipy.stats print numpy.__version__ print scipy.__version__ for s in range(16777214, 16777944): if scipy.stats.nanmean(numpy.ones((s, 1), numpy.float32))[0]!=1: print '\nbroke', s, scipy.stats.nanmean(numpy.ones((s, 1), numpy.float32)) break else: print '\r',s, c:\temp>python np_sum.py 1.8.0b2 0.11.0 16777216 broke 16777217 [ 0.99999994] From jeffreback at gmail.com Thu Jul 24 15:25:06 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Thu, 24 Jul 2014 15:25:06 -0400 Subject: [Numpy-discussion] numpy.mean still broken for large float32 arrays In-Reply-To: <201407241906.s6OJ61gG006795@blue-cove.com> References: <201407241906.s6OJ61gG006795@blue-cove.com> Message-ID: related recent issue: https://github.com/numpy/numpy/issues/4638 and pandas is now explicitly specifying the accumulator to avoid this problem: https://github.com/pydata/pandas/pull/6954/files pandas also implemented the Welfords method for rolling_var in 0.14.0, see here: https://github.com/pydata/pandas/pull/6817 On Thu, Jul 24, 2014 at 3:05 PM, RayS wrote: > Probably a number of scipy places as well > > > > import numpy > import scipy.stats > print numpy.__version__ > print scipy.__version__ > for s in range(16777214, 16777944): > if scipy.stats.nanmean(numpy.ones((s, 1), numpy.float32))[0]!=1: > print '\nbroke', s, scipy.stats.nanmean(numpy.ones((s, 1), > numpy.float32)) > break > else: > print '\r',s, > > c:\temp>python np_sum.py > 1.8.0b2 > 0.11.0 > 16777216 > broke 16777217 [ 0.99999994] > > _______________________________________________ > NumPy-Discussion 
mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Thu Jul 24 16:42:53 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 24 Jul 2014 23:42:53 +0300 Subject: [Numpy-discussion] numpy.mean still broken for large float32arrays Message-ID: <53d17011.234dc20a.45b5.ffffeacf@mx.google.com> Inaccurate and utterly wrong are subjective. If You want To Be sufficiently strict, floating point calculations are almost always 'utterly wrong'. Granted, It would Be Nice if the docs specified the algorithm used. But numpy does not produce anything different than what a standard c loop or c++ std lib func would. This isn't a bug report, but rather a feature request. That said, support for fancy reduction algorithms would certainly be nice, if implementing it in numpy in a coherent manner is feasible. -----Original Message----- From: "Joseph Martinot-Lagarde" Sent: ?24-?7-?2014 20:04 To: "numpy-discussion at scipy.org" Subject: Re: [Numpy-discussion] numpy.mean still broken for large float32arrays Le 24/07/2014 12:55, Thomas Unterthiner a ?crit : > I don't agree. The problem is that I expect `mean` to do something > reasonable. The documentation mentions that the results can be > "inaccurate", which is a huge understatement: the results can be utterly > wrong. That is not reasonable. At the very least, a warning should be > issued in cases where the dtype might not be appropriate. > Maybe the problem is the documentation, then. If this is a common error, it could be explicitly documented in the function documentation. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Jul 24 17:10:15 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 24 Jul 2014 17:10:15 -0400 Subject: [Numpy-discussion] numpy.mean still broken for large float32arrays In-Reply-To: <53d17011.234dc20a.45b5.ffffeacf@mx.google.com> References: <53d17011.234dc20a.45b5.ffffeacf@mx.google.com> Message-ID: <53D17637.4000008@gmail.com> On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: > This isn't a bug report, but rather a feature request. I'm not sure statement this is correct. The mean of a float32 array can certainly be computed as a float32. Currently this is not necessarily what happens, not even approximately. That feels a lot like a bug, even if we can readily understand how the algorithm currently used produces it. To say whether it is a bug or not, don't we have to ask about the intent of `mean`? If the intent is to sum and divide, then it is not a bug. If the intent is to produce the mean, then it is a bug. Alan Isaac From hoogendoorn.eelco at gmail.com Thu Jul 24 23:37:49 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 25 Jul 2014 06:37:49 +0300 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays Message-ID: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> Perhaps it is a slightly semantical discussion; but all fp calculations have errors, and there are always strategies for making them smaller. We just don't happen to like the error for this case; but rest assured it won't be hard to find new cases of 'blatantly wrong' results, no matter what accumulator is implemented. 
That's no reason to not try and be clever about it, but there isn't going to be an algorithm that is best for all possible inputs, and in the end the most important thing is that the algorithm used is specified in the docs. -----Original Message----- From: "Alan G Isaac" Sent: ?25-?7-?2014 00:10 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: > This isn't a bug report, but rather a feature request. I'm not sure statement this is correct. The mean of a float32 array can certainly be computed as a float32. Currently this is not necessarily what happens, not even approximately. That feels a lot like a bug, even if we can readily understand how the algorithm currently used produces it. To say whether it is a bug or not, don't we have to ask about the intent of `mean`? If the intent is to sum and divide, then it is not a bug. If the intent is to produce the mean, then it is a bug. Alan Isaac _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Fri Jul 25 04:22:56 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 25 Jul 2014 10:22:56 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> References: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> Message-ID: To elaborate on that point; knowing that numpy accumulates in a simple first-to-last sweep, and does not implicitly upcast, the original problem can be solved in several ways; specifying a higher precision to sum with, or by a nested summation, like X.mean(0).mean(0)==1.0. I personally like this explicitness, and am wary of numpy doing overly clever things behind the scenes, as I can think of other code that might become broken if things change too radically. For instance, I often sort large arrays with a large spread in magnitudes before summation, relying on the fact that summing the smallest values first gives best precision. Any changes made to reduction behavior should try and be backwards compatible with such properties of straightforward reductions, or else a lot of code is going to be broken without warning. I suppose using maximum precision internally, and nesting all reductions over multiple axes of an ndarray, are both easy to implement improvements that do not come with any drawbacks that I can think of. Actually the maximum precision I am not so sure of, as I personally prefer to make an informed decision about precision used, and get an error on a platform that does not support the specified precision, rather than obtain subtly or horribly broken results without warning when moving your code to a different platform/compiler whatever. On Fri, Jul 25, 2014 at 5:37 AM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Perhaps it is a slightly semantical discussion; but all fp calculations > have errors, and there are always strategies for making them smaller. We > just don't happen to like the error for this case; but rest assured it > won't be hard to find new cases of 'blatantly wrong' results, no matter > what accumulator is implemented. 
That's no reason to not try and be clever > about it, but there isn't going to be an algorithm that is best for all > possible inputs, and in the end the most important thing is that the > algorithm used is specified in the docs. > ------------------------------ > From: Alan G Isaac > Sent: ?25-?7-?2014 00:10 > > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] numpy.mean still broken for > largefloat32arrays > > On 7/24/2014 4:42 PM, Eelco Hoogendoorn wrote: > > This isn't a bug report, but rather a feature request. > > I'm not sure statement this is correct. The mean of a float32 array > can certainly be computed as a float32. Currently this is > not necessarily what happens, not even approximately. > That feels a lot like a bug, even if we can readily understand > how the algorithm currently used produces it. To say whether > it is a bug or not, don't we have to ask about the intent of `mean`? > If the intent is to sum and divide, then it is not a bug. > If the intent is to produce the mean, then it is a bug. > > Alan Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Fri Jul 25 09:06:40 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 25 Jul 2014 15:06:40 +0200 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: References: Message-ID: The dtype returned by np.where looks right (int64): >>> import platform >>> platform.architecture() ('64bit', 'WindowsPE') >>> import numpy as np >>> np.__version__ '1.9.0b1' >>> a = np.zeros(10) >>> np.where(a == 0) (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64),) -- Olivier From jeffreback at gmail.com Fri Jul 25 09:52:37 2014 From: jeffreback at gmail.com (Jeff) Date: Fri, 25 Jul 2014 06:52:37 -0700 (PDT) Subject: [Numpy-discussion] ANN: Pandas 0.14.0 Release Candidate 1 In-Reply-To: References: Message-ID: <733e0eed-fbef-46d4-b797-c5368b280fee@googlegroups.com> How does the build trigger? If its just a matter of clicking on something when released. I think we can handle that :) On Saturday, May 17, 2014 7:22:00 AM UTC-4, Jeff wrote: > > Hi, > > I'm pleased to announce the availability of the first release candidate of > Pandas 0.14.0. > Please try this RC and report any issues here: Pandas Issues > > We will be releasing officially in about 2 weeks or so. > > This is a major release from 0.13.1 and includes a small number of API > changes, several new features, enhancements, and > performance improvements along with a large number of bug fixes. > > Highlights include: > > - Officially support Python 3.4 > - SQL interfaces updated to use sqlalchemy, > - Display interface changes > - MultiIndexing Using Slicers > - Ability to join a singly-indexed DataFrame with a multi-indexed > DataFrame > - More consistency in groupby results and more flexible groupby > specifications > - Holiday calendars are now supported in CustomBusinessDay > - Several improvements in plotting functions, including: hexbin, area > and pie plots. > - Performance doc section on I/O operations > > Since there are some significant changes in the default way DataFrames are > displayed. 
I have put > up a comment issue looking for some feedback here > > > Here are the full whatsnew and documentation links: > > v0.14.0 Whatsnew > > > v0.14.0 Documentation Page > > > Source tarballs, and windows builds are available here: > > Pandas v0.14rc1 Release > > A big thank you to everyone who contributed to this release! > > Jeff > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Fri Jul 25 10:11:52 2014 From: rays at blue-cove.com (RayS) Date: Fri, 25 Jul 2014 07:11:52 -0700 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> Message-ID: <201407251411.s6PEBrpw018675@blue-cove.com> At 01:22 AM 7/25/2014, you wrote: > Actually the maximum precision I am not so > sure of, as I personally prefer to make an > informed decision about precision used, and get > an error on a platform that does not support > the specified precision, rather than obtain > subtly or horribly broken results without > warning? when moving your code to a different platform/compiler whatever. We were talking on this in the office, as we realized it does affect a couple of lines dealing with large arrays, including complex64. As I expect Python modules to work uniformly cross platform unless documented otherwise, to me that includes 32 vs 64 bit platforms, implying that the modules should automatically use large enough accumulators for the data type input. http://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html does mention inaccuracy. http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.stats.mstats.gmean.html http://docs.scipy.org/doc/numpy/reference/generated/numpy.sum.html etc do not, exactly - Ray From robert.kern at gmail.com Fri Jul 25 10:22:36 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 25 Jul 2014 15:22:36 +0100 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <201407251411.s6PEBrpw018675@blue-cove.com> References: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> <201407251411.s6PEBrpw018675@blue-cove.com> Message-ID: On Fri, Jul 25, 2014 at 3:11 PM, RayS wrote: > At 01:22 AM 7/25/2014, you wrote: >> Actually the maximum precision I am not so >> sure of, as I personally prefer to make an >> informed decision about precision used, and get >> an error on a platform that does not support >> the specified precision, rather than obtain >> subtly or horribly broken results without >> warning? when moving your code to a different platform/compiler whatever. > > We were talking on this in the office, as we > realized it does affect a couple of lines dealing > with large arrays, including complex64. > As I expect Python modules to work uniformly > cross platform unless documented otherwise, to me > that includes 32 vs 64 bit platforms, implying > that the modules should automatically use large > enough accumulators for the data type input. The 32/64-bitness of your platform has nothing to do with floating point. Nothing discussed in this thread is platform-specific (modulo some minor details about the hardware FPU, but that should be taken as read). 
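(One way to see that from the Python prompt -- the float32 property at the heart of this thread is the same on every platform and compiler:)

import numpy as np

x = np.float32(2**24)
print(x + np.float32(1.0) == x)        # True: the float32 spacing above 2**24 is 2.0
print(np.spacing(np.float32(2**24)))   # 2.0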
-- Robert Kern From rays at blue-cove.com Fri Jul 25 12:56:39 2014 From: rays at blue-cove.com (RayS) Date: Fri, 25 Jul 2014 09:56:39 -0700 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> <201407251411.s6PEBrpw018675@blue-cove.com> Message-ID: <201407251656.s6PGui7J027752@blue-cove.com> At 07:22 AM 7/25/2014, you wrote: > > We were talking on this in the office, as we > > realized it does affect a couple of lines dealing > > with large arrays, including complex64. > > As I expect Python modules to work uniformly > > cross platform unless documented otherwise, to me > > that includes 32 vs 64 bit platforms, implying > > that the modules should automatically use large > > enough accumulators for the data type input. > >The 32/64-bitness of your platform has nothing to do with floating >point. As a naive end user, I can, and do, download different binaries for different CPUs/Windows versions and will get different results http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070747.html > Nothing discussed in this thread is platform-specific (modulo >some minor details about the hardware FPU, but that should be taken as >read). And compilers, apparently. The important point was that it would be best if all of the methods affected by summing 32 bit floats with 32 bit accumulators had the same Notes as numpy.mean(). We went through a lot of code yesterday, assuming that any numpy or Scipy.stats functions that use accumulators suffer the same issue, whether noted or not, and found it true. "Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue." seems rather un-Pythonic. - Ray -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Fri Jul 25 13:40:17 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 25 Jul 2014 20:40:17 +0300 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays Message-ID: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> Arguably, the whole of floating point numbers and their related shenanigans is not very pythonic in the first place. The accuracy of the output WILL depend on the input, to some degree or another. At the risk of repeating myself: explicit is better than implicit -----Original Message----- From: "RayS" Sent: ?25-?7-?2014 19:56 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays At 07:22 AM 7/25/2014, you wrote: > We were talking on this in the office, as we > realized it does affect a couple of lines dealing > with large arrays, including complex64. > As I expect Python modules to work uniformly > cross platform unless documented otherwise, to me > that includes 32 vs 64 bit platforms, implying > that the modules should automatically use large > enough accumulators for the data type input. The 32/64-bitness of your platform has nothing to do with floating point. As a naive end user, I can, and do, download different binaries for different CPUs/Windows versions and will get different results http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070747.html Nothing discussed in this thread is platform-specific (modulo some minor details about the hardware FPU, but that should be taken as read). And compilers, apparently. 
The important point was that it would be best if all of the methods affected by summing 32 bit floats with 32 bit accumulators had the same Notes as numpy.mean(). We went through a lot of code yesterday, assuming that any numpy or Scipy.stats functions that use accumulators suffer the same issue, whether noted or not, and found it true. "Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue." seems rather un-Pythonic. - Ray -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Fri Jul 25 14:00:15 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 25 Jul 2014 14:00:15 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> Message-ID: <53D29B2F.9060005@gmail.com> On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote: > At the risk of repeating myself: explicit is better than implicit This sounds like an argument for renaming the `mean` function `naivemean` rather than `mean`. Whatever numpy names `mean`, shouldn't it implement an algorithm that produces the mean? And obviously, for any float data type, the mean value of the values in the array is representable as a value of the same type. Alan Isaac From matthew.brett at gmail.com Fri Jul 25 14:06:30 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 25 Jul 2014 14:06:30 -0400 Subject: [Numpy-discussion] [pydata] Re: ANN: Pandas 0.14.0 Release Candidate 1 In-Reply-To: <733e0eed-fbef-46d4-b797-c5368b280fee@googlegroups.com> References: <733e0eed-fbef-46d4-b797-c5368b280fee@googlegroups.com> Message-ID: Hi, On Fri, Jul 25, 2014 at 9:52 AM, Jeff wrote: > How does the build trigger? If its just a matter of clicking on something > when released. I think we can handle that :) > The two options are: * I add you and whoever else does releases to my repo, and you can trigger builds by pressing a button on the travis page for my repo, or pushing commits to the repo * You take over the repo, I submit a pull request to make sure you have auth to upload to rackspace, and proceed as above. But yes - single click -> build.... Cheers, Matthew From njs at pobox.com Fri Jul 25 14:29:16 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 25 Jul 2014 19:29:16 +0100 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <201407251656.s6PGui7J027752@blue-cove.com> References: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> <201407251411.s6PEBrpw018675@blue-cove.com> <201407251656.s6PGui7J027752@blue-cove.com> Message-ID: On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: > The important point was that it would be best if all of the methods affected > by summing 32 bit floats with 32 bit accumulators had the same Notes as > numpy.mean(). We went through a lot of code yesterday, assuming that any > numpy or Scipy.stats functions that use accumulators suffer the same issue, > whether noted or not, and found it true. Do you have a list of the functions that are affected? > "Depending on the input data, this can cause the results to be inaccurate, > especially for float32 (see example below). Specifying a higher-precision > accumulator using the dtype keyword can alleviate this issue." seems rather > un-Pythonic. 
It's true that in its full generality, this problem just isn't something numpy can solve. Using float32 is extremely dangerous and should not be attempted unless you're prepared to seriously analyze all your code for numeric stability; IME it often runs into problems in practice, in any number of ways. Remember that it only has as much precision as a 24 bit integer. There are good reasons why float64 is the default! That said, it does seem that np.mean could be implemented better than it is, even given float32's inherent limitations. If anyone wants to implement better algorithms for computing the mean, variance, sums, etc., then we would love to add them to numpy. I'd suggest implementing them as gufuncs -- there are examples of defining gufuncs in numpy/linalg/umath_linalg.c.src. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From hoogendoorn.eelco at gmail.com Fri Jul 25 15:23:43 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 25 Jul 2014 21:23:43 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <53D29B2F.9060005@gmail.com> References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> Message-ID: It need not be exactly representable as such; take the mean of [1, 1+eps] for instance. Granted, there are at most two number in the range of the original dtype which are closest to the true mean; but im not sure that computing them exactly is a tractable problem for arbitrary input. Im not sure what is considered best practice for these problems; or if there is one, considering the hetrogenity of the problem. As noted earlier, summing a list of floating point values is a remarkably multifaceted problem, once you get down into the details. I think it should be understood that all floating point algorithms are subject to floating point errors. As long as the algorithm used is specified, one can make an informed decision if the given algorithm will do what you expect of it. That's the best we can hope for. If we are going to advocate doing 'clever' things behind the scenes, we have to take backwards compatibility (not creating a possibility of producing worse results on the same input) and platform independence in mind. Funny summation orders could violate the former depending on the implementation details, and 'using the highest machine precision available' violates the latter (and is horrible practice in general, imo. Either you don't need the extra accuracy, or you do, and the absence on a given platform should be an error) Perhaps pairwise summation in the original order of the data is the best option: q = np.ones((2,)*26, np.float32) print q.mean() while q.ndim > 0: q = q.mean(axis=-1, dtype=np.float32) print q This only requires log(N) space on the stack if properly implemented, and is not platform dependent, nor should have any backward compatibility issues that I can think of. But im not sure how easy it would be to implement, given the current framework. The ability to specify different algorithms per kwarg wouldn't be a bad idea either, imo; or the ability to explicitly specify a separate output and accumulator dtype. On Fri, Jul 25, 2014 at 8:00 PM, Alan G Isaac wrote: > On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote: > > At the risk of repeating myself: explicit is better than implicit > > > This sounds like an argument for renaming the `mean` function `naivemean` > rather than `mean`. 
Whatever numpy names `mean`, shouldn't it > implement an algorithm that produces the mean? And obviously, for any > float data type, the mean value of the values in the array is representable > as a value of the same type. > > Alan Isaac > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Fri Jul 25 16:25:57 2014 From: rays at blue-cove.com (RayS) Date: Fri, 25 Jul 2014 13:25:57 -0700 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> <201407251411.s6PEBrpw018675@blue-cove.com> <201407251656.s6PGui7J027752@blue-cove.com> Message-ID: <201407252026.s6PKQ2kZ016912@blue-cove.com> At 11:29 AM 7/25/2014, you wrote: >On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: > > The important point was that it would be best if all of the > methods affected > > by summing 32 bit floats with 32 bit accumulators had the same Notes as > > numpy.mean(). We went through a lot of code yesterday, assuming that any > > numpy or Scipy.stats functions that use accumulators suffer the same issue, > > whether noted or not, and found it true. > >Do you have a list of the functions that are affected? We only tested a few we used, but scipy.stats.nanmean, or any .*mean() numpy.sum, mean, average, std, var,... via something like: import numpy import scipy.stats print numpy.__version__ print scipy.__version__ onez = numpy.ones((2**25, 1), numpy.float32) step = 2**10 func = scipy.stats.nanmean for s in range(2**24-step, 2**25, step): if func(onez[:s+step])!=1.: print '\nbroke', s, func(onez[:s+step]) break else: print '\r',s, > That said, it does seem that np.mean could be implemented better than >it is, even given float32's inherent limitations. If anyone wants to >implement better algorithms for computing the mean, variance, sums, >etc., then we would love to add them to numpy. Others have pointed out the possible tradeoffs in summation algos, perhaps a method arg would be appropriate, "better" depending on your desire for speed vs. accuracy. It just occurred to me that if the STSI folks (who count photons) took the mean() or other such func of an image array from Hubble sensors to find background value, they'd better always be using float64. - Ray From josef.pktd at gmail.com Fri Jul 25 17:36:27 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 25 Jul 2014 17:36:27 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <201407252026.s6PKQ2kZ016912@blue-cove.com> References: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> <201407251411.s6PEBrpw018675@blue-cove.com> <201407251656.s6PGui7J027752@blue-cove.com> <201407252026.s6PKQ2kZ016912@blue-cove.com> Message-ID: On Fri, Jul 25, 2014 at 4:25 PM, RayS wrote: > At 11:29 AM 7/25/2014, you wrote: > >On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: > > > The important point was that it would be best if all of the > > methods affected > > > by summing 32 bit floats with 32 bit accumulators had the same Notes as > > > numpy.mean(). We went through a lot of code yesterday, assuming that > any > > > numpy or Scipy.stats functions that use accumulators suffer the same > issue, > > > whether noted or not, and found it true. > > > >Do you have a list of the functions that are affected? 
> > We only tested a few we used, but > scipy.stats.nanmean, or any .*mean() > numpy.sum, mean, average, std, var,... > > via something like: > > import numpy > import scipy.stats > print numpy.__version__ > print scipy.__version__ > onez = numpy.ones((2**25, 1), numpy.float32) > step = 2**10 > func = scipy.stats.nanmean > for s in range(2**24-step, 2**25, step): > if func(onez[:s+step])!=1.: > print '\nbroke', s, func(onez[:s+step]) > break > else: > print '\r',s, > > > That said, it does seem that np.mean could be implemented better than > >it is, even given float32's inherent limitations. If anyone wants to > >implement better algorithms for computing the mean, variance, sums, > >etc., then we would love to add them to numpy. > > Others have pointed out the possible tradeoffs in summation algos, > perhaps a method arg would be appropriate, "better" depending on your > desire for speed vs. accuracy. > I think this would be a good improvement. But it doesn't compensate for users to be aware of the problems. I think the docstring and the description of the dtype argument is pretty clear. I'm largely with Eelco, whatever precision or algorithm we use, with floating point calculations we run into problems in some cases. And I don't think this is a "broken" function but a design decision that takes the different tradeoffs into account. Whether it's the right decision is an open question, if there are better algorithm with essentially not disadvantages. Two examples: I had problems to verify some results against Stata at more than a few significant digits, until I realized that Stata had used float32 for the calculations by default in this case, while I was working with float64. Using single precision linear algebra causes the same numerical problems as numpy.mean runs into. A few years ago I tried to match some tougher NIST examples that were intentionally very badly scaled. numpy.mean at float64 had quite large errors, but a simple trick with two passes through the data managed to get very close to the certified NIST examples. my conclusion: don't use float32 unless you know you don't need any higher precision. even with float64 it is possible to run into extreme cases where you get numerical garbage or large precision losses. However, in the large majority of cases a boring fast "naive" implementation is enough. Also, whether we use mean, sum or dot in a calculation is an implementation detail, which in the case of dot doesn't have a dtype argument and always depends on the dtype of the arrays, AFAIK. Separating the accumulation dtype from the array dtype would require a lot of work except in the simplest cases, like those that only use sum and mean with specified dtype argument. trying out the original example: >>> X = np.ones((50000, 1024), np.float32) >>> X.mean() 1.0 >>> X.mean(dtype=np.float32) 1.0 >>> np.dot(X.ravel(), np.ones(X.ravel().shape) *1. / X.ravel().shape) 1.0000000002299174 >>> np.__version__ '1.5.1' Win32 Josef > > It just occurred to me that if the STSI folks (who count photons) > took the mean() or other such func of an image array from Hubble > sensors to find background value, they'd better always be using float64. > > - Ray > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
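For illustration, one common form of the "two passes through the data" trick mentioned above (the exact method used against the NIST examples is not stated in the thread, so this is only a sketch): compute a provisional mean, then correct it with the mean of the residuals, which are close to zero and therefore lose far less precision when summed.

    import numpy as np

    def corrected_mean(x, dtype=np.float32):
        # first pass: provisional mean with the requested accumulator dtype
        m0 = x.mean(dtype=dtype)
        # second pass: the residuals are small, so their sum is far less
        # affected by rounding; their mean corrects the provisional estimate
        return m0 + (x - m0).mean(dtype=dtype)

For badly scaled data (a large offset with small variation), the corrected value is generally at least as close to a float64 reference as a single float32 pass.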
URL: From hoogendoorn.eelco at gmail.com Fri Jul 25 17:51:40 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sat, 26 Jul 2014 00:51:40 +0300 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays Message-ID: <53d2d191.48c9c20a.026d.4a6a@mx.google.com> Ray: I'm not working with Hubble data, but yeah these are all issues I've run into with my terrabytes of microscopy data as well. Given that such raw data comes as uint16, its best to do your calculations as much as possible in good old ints. What you compute is what you get, no obscure shenanigans. It just occurred to me that pairwise summation will lead to highly branchy code, and you can forget about any vector extensions. Tradeoffs indeed. Any such hierarchical summation is probably best done by aggregating naively summed blocks. -----Original Message----- From: "RayS" Sent: ?25-?7-?2014 23:26 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays At 11:29 AM 7/25/2014, you wrote: >On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: > > The important point was that it would be best if all of the > methods affected > > by summing 32 bit floats with 32 bit accumulators had the same Notes as > > numpy.mean(). We went through a lot of code yesterday, assuming that any > > numpy or Scipy.stats functions that use accumulators suffer the same issue, > > whether noted or not, and found it true. > >Do you have a list of the functions that are affected? We only tested a few we used, but scipy.stats.nanmean, or any .*mean() numpy.sum, mean, average, std, var,... via something like: import numpy import scipy.stats print numpy.__version__ print scipy.__version__ onez = numpy.ones((2**25, 1), numpy.float32) step = 2**10 func = scipy.stats.nanmean for s in range(2**24-step, 2**25, step): if func(onez[:s+step])!=1.: print '\nbroke', s, func(onez[:s+step]) break else: print '\r',s, > That said, it does seem that np.mean could be implemented better than >it is, even given float32's inherent limitations. If anyone wants to >implement better algorithms for computing the mean, variance, sums, >etc., then we would love to add them to numpy. Others have pointed out the possible tradeoffs in summation algos, perhaps a method arg would be appropriate, "better" depending on your desire for speed vs. accuracy. It just occurred to me that if the STSI folks (who count photons) took the mean() or other such func of an image array from Hubble sensors to find background value, they'd better always be using float64. - Ray _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Jul 25 17:57:51 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 25 Jul 2014 23:57:51 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <53d2d191.48c9c20a.026d.4a6a@mx.google.com> References: <53d2d191.48c9c20a.026d.4a6a@mx.google.com> Message-ID: <53D2D2DF.9000907@googlemail.com> On 25.07.2014 23:51, Eelco Hoogendoorn wrote: > Ray: I'm not working with Hubble data, but yeah these are all issues > I've run into with my terrabytes of microscopy data as well. Given that > such raw data comes as uint16, its best to do your calculations as much > as possible in good old ints. 
What you compute is what you get, no > obscure shenanigans. integers are dangerous too, they overflow quickly and signed overflow is even undefined in C the standard. > > It just occurred to me that pairwise summation will lead to highly > branchy code, and you can forget about any vector extensions. Tradeoffs > indeed. Any such hierarchical summation is probably best done by > aggregating naively summed blocks. pairwise summation is usually implemented with a naive sum cutoff large enough so the recursion does not matter much. In numpy 1.9 this cutoff is 128 elements, but the inner loop is unrolled 8 times which makes it effectively 16 elements. the unrolling factor of 8 was intentionally chosen to allow using AVX in the inner loop without changing the summation ordering, but last I tested actually using AVX here only gave mediocre speedups (10%-20% on an i5). > ------------------------------------------------------------------------ > From: RayS > Sent: ?25-?7-?2014 23:26 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] numpy.mean still broken for > largefloat32arrays > > At 11:29 AM 7/25/2014, you wrote: >>On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: >> > The important point was that it would be best if all of the >> methods affected >> > by summing 32 bit floats with 32 bit accumulators had the same Notes as >> > numpy.mean(). We went through a lot of code yesterday, assuming that any >> > numpy or Scipy.stats functions that use accumulators suffer the same > issue, >> > whether noted or not, and found it true. >> >>Do you have a list of the functions that are affected? > > We only tested a few we used, but > scipy.stats.nanmean, or any .*mean() > numpy.sum, mean, average, std, var,... > > via something like: > > import numpy > import scipy.stats > print numpy.__version__ > print scipy.__version__ > onez = numpy.ones((2**25, 1), numpy.float32) > step = 2**10 > func = scipy.stats.nanmean > for s in range(2**24-step, 2**25, step): > if func(onez[:s+step])!=1.: > print '\nbroke', s, func(onez[:s+step]) > break > else: > print '\r',s, > >> That said, it does seem that np.mean could be implemented better than >>it is, even given float32's inherent limitations. If anyone wants to >>implement better algorithms for computing the mean, variance, sums, >>etc., then we would love to add them to numpy. > > Others have pointed out the possible tradeoffs in summation algos, > perhaps a method arg would be appropriate, "better" depending on your > desire for speed vs. accuracy. > > It just occurred to me that if the STSI folks (who count photons) > took the mean() or other such func of an image array from Hubble > sensors to find background value, they'd better always be using float64. 
> > - Ray > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From rays at blue-cove.com Fri Jul 25 19:51:03 2014 From: rays at blue-cove.com (RayS) Date: Fri, 25 Jul 2014 16:51:03 -0700 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d1d132.93d3b40a.2d6b.0714@mx.google.com> <201407251411.s6PEBrpw018675@blue-cove.com> <201407251656.s6PGui7J027752@blue-cove.com> <201407252026.s6PKQ2kZ016912@blue-cove.com> Message-ID: <201407252351.s6PNpBGH022808@blue-cove.com> At 02:36 PM 7/25/2014, you wrote: >But it doesn't compensate for users to be aware of the problems. I >think the docstring and the description of the dtype argument is pretty clear. Most of the docs for the affected functions do not have a Note with the same warning as mean() - Ray From larsmans at gmail.com Sat Jul 26 04:19:01 2014 From: larsmans at gmail.com (Lars Buitinck) Date: Sat, 26 Jul 2014 10:19:01 +0200 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? Message-ID: > Date: Fri, 25 Jul 2014 15:06:40 +0200 > From: Olivier Grisel > Subject: Re: [Numpy-discussion] change default integer from int32 to > int64 on win64? > To: Discussion of Numerical Python > Content-Type: text/plain; charset=UTF-8 > > The dtype returned by np.where looks right (int64): > >>>> import platform >>>> platform.architecture() > ('64bit', 'WindowsPE') >>>> import numpy as np >>>> np.__version__ > '1.9.0b1' >>>> a = np.zeros(10) >>>> np.where(a == 0) > (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64),) Strange. In [1] we had to cast the result of np.where because it was an array of long. I ran through the NumPy code, and I couldn't find the flaw, but neither could I find a point in the history where it was fixed. [1] https://github.com/scikit-learn/scikit-learn/commit/ebdeddbab1620c2473d04dc242d1e30684af9511 From robert.kern at gmail.com Sat Jul 26 04:49:54 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 26 Jul 2014 09:49:54 +0100 Subject: [Numpy-discussion] change default integer from int32 to int64 on win64? In-Reply-To: References: Message-ID: On Sat, Jul 26, 2014 at 9:19 AM, Lars Buitinck wrote: >> Date: Fri, 25 Jul 2014 15:06:40 +0200 >> From: Olivier Grisel >> Subject: Re: [Numpy-discussion] change default integer from int32 to >> int64 on win64? >> To: Discussion of Numerical Python >> Content-Type: text/plain; charset=UTF-8 >> >> The dtype returned by np.where looks right (int64): >> >>>>> import platform >>>>> platform.architecture() >> ('64bit', 'WindowsPE') >>>>> import numpy as np >>>>> np.__version__ >> '1.9.0b1' >>>>> a = np.zeros(10) >>>>> np.where(a == 0) >> (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64),) > > Strange. In [1] we had to cast the result of np.where because it was > an array of long. I ran through the NumPy code, and I couldn't find > the flaw, but neither could I find a point in the history where it was > fixed. 
> > [1] https://github.com/scikit-learn/scikit-learn/commit/ebdeddbab1620c2473d04dc242d1e30684af9511 As far as I can tell, it's been that way essentially forever, before numpy was numpy: https://github.com/numpy/numpy/commit/8cb36a62#diff-88aedadb94e0ead6b434d55f81668471R645 -- Robert Kern From hoogendoorn.eelco at gmail.com Sat Jul 26 05:05:08 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sat, 26 Jul 2014 12:05:08 +0300 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays Message-ID: <53d36f6e.a959b40a.7f23.738c@mx.google.com> Cool, sounds like great improvements. I can imagine that after some loop unrolling one becomes memory bound pretty soon. Is the summation guaranteed to traverse the data in its natural order? And do you happen to know what the rules for choosing accumulator dtypes are? -----Original Message----- From: "Julian Taylor" Sent: ?26-?7-?2014 00:58 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] numpy.mean still broken for largefloat32arrays On 25.07.2014 23:51, Eelco Hoogendoorn wrote: > Ray: I'm not working with Hubble data, but yeah these are all issues > I've run into with my terrabytes of microscopy data as well. Given that > such raw data comes as uint16, its best to do your calculations as much > as possible in good old ints. What you compute is what you get, no > obscure shenanigans. integers are dangerous too, they overflow quickly and signed overflow is even undefined in C the standard. > > It just occurred to me that pairwise summation will lead to highly > branchy code, and you can forget about any vector extensions. Tradeoffs > indeed. Any such hierarchical summation is probably best done by > aggregating naively summed blocks. pairwise summation is usually implemented with a naive sum cutoff large enough so the recursion does not matter much. In numpy 1.9 this cutoff is 128 elements, but the inner loop is unrolled 8 times which makes it effectively 16 elements. the unrolling factor of 8 was intentionally chosen to allow using AVX in the inner loop without changing the summation ordering, but last I tested actually using AVX here only gave mediocre speedups (10%-20% on an i5). > ------------------------------------------------------------------------ > From: RayS > Sent: ?25-?7-?2014 23:26 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] numpy.mean still broken for > largefloat32arrays > > At 11:29 AM 7/25/2014, you wrote: >>On Fri, Jul 25, 2014 at 5:56 PM, RayS wrote: >> > The important point was that it would be best if all of the >> methods affected >> > by summing 32 bit floats with 32 bit accumulators had the same Notes as >> > numpy.mean(). We went through a lot of code yesterday, assuming that any >> > numpy or Scipy.stats functions that use accumulators suffer the same > issue, >> > whether noted or not, and found it true. >> >>Do you have a list of the functions that are affected? > > We only tested a few we used, but > scipy.stats.nanmean, or any .*mean() > numpy.sum, mean, average, std, var,... > > via something like: > > import numpy > import scipy.stats > print numpy.__version__ > print scipy.__version__ > onez = numpy.ones((2**25, 1), numpy.float32) > step = 2**10 > func = scipy.stats.nanmean > for s in range(2**24-step, 2**25, step): > if func(onez[:s+step])!=1.: > print '\nbroke', s, func(onez[:s+step]) > break > else: > print '\r',s, > >> That said, it does seem that np.mean could be implemented better than >>it is, even given float32's inherent limitations. 
If anyone wants to >>implement better algorithms for computing the mean, variance, sums, >>etc., then we would love to add them to numpy. > > Others have pointed out the possible tradeoffs in summation algos, > perhaps a method arg would be appropriate, "better" depending on your > desire for speed vs. accuracy. > > It just occurred to me that if the STSI folks (who count photons) > took the mean() or other such func of an image array from Hubble > sensors to find background value, they'd better always be using float64. > > - Ray > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sat Jul 26 05:15:00 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 26 Jul 2014 11:15:00 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> Message-ID: <1406366100.30315.7.camel@sebastian-t440> On Fr, 2014-07-25 at 21:23 +0200, Eelco Hoogendoorn wrote: > It need not be exactly representable as such; take the mean of [1, 1 > +eps] for instance. Granted, there are at most two number in the range > of the original dtype which are closest to the true mean; but im not > sure that computing them exactly is a tractable problem for arbitrary > input. > > > This only requires log(N) space on the stack if properly implemented, > and is not platform dependent, nor should have any backward > compatibility issues that I can think of. But im not sure how easy it > would be to implement, given the current framework. The ability to > specify different algorithms per kwarg wouldn't be a bad idea either, > imo; or the ability to explicitly specify a separate output and > accumulator dtype. > > Well, you already can use dtype to cause an upcast of both arrays. However this currently will cause a buffered upcast to float64 for the float32 data. You could also add a d,f->d loop to avoid the cast, but then you would have to use the out argument currently. In any case, the real solution here is IMO what I think most of us already thought before would be good, and that is a keyword argument or maybe context (though I am unsure about details with threading, etc.) to chose more stable algorithms for such statistical functions. The pairwise summation that is in master now is very awesome, but it is not secure enough in the sense that a new user will have difficulty understanding when he can be sure it is used. 
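To make the pairwise scheme discussed here concrete, a rough pure-Python sketch (illustrative only; NumPy's C implementation differs in detail, e.g. the naive leaf loop is unrolled): blocks below a cutoff are summed naively, larger ranges are split in half and combined, so partial sums stay small and the rounding error grows roughly like O(log n) rather than O(n).

    import numpy as np

    def pairwise_sum(x, block=128):
        n = x.shape[0]
        if n <= block:
            # leaf: a plain small-block sum (an unrolled naive loop in the C code)
            return x.sum(dtype=x.dtype)
        half = n // 2
        return pairwise_sum(x[:half], block) + pairwise_sum(x[half:], block)

    a = np.ones(2**25, dtype=np.float32)
    print(pairwise_sum(a))                      # 33554432.0, exact for this input
    print(np.cumsum(a, dtype=np.float32)[-1])   # 16777216.0: a strictly sequential
                                                # float32 running sum stalls at 2**24

Every partial sum in the recursive construction stays exactly representable for the all-ones input, so the result is exact, while the sequential float32 running sum cannot get past 2**24.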
- Sebastian > > On Fri, Jul 25, 2014 at 8:00 PM, Alan G Isaac > wrote: > On 7/25/2014 1:40 PM, Eelco Hoogendoorn wrote: > > At the risk of repeating myself: explicit is better than > implicit From sturla.molden at gmail.com Sat Jul 26 06:39:18 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 26 Jul 2014 10:39:18 +0000 (UTC) Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> Message-ID: <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> Sebastian Berg wrote: > chose more stable algorithms for such statistical functions. The > pairwise summation that is in master now is very awesome, but it is not > secure enough in the sense that a new user will have difficulty > understanding when he can be sure it is used. Why is it not always used? From hoogendoorn.eelco at gmail.com Sat Jul 26 09:38:46 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sat, 26 Jul 2014 15:38:46 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> Message-ID: I was wondering the same thing. Are there any known tradeoffs to this method of reduction? On Sat, Jul 26, 2014 at 12:39 PM, Sturla Molden wrote: > Sebastian Berg wrote: > > > chose more stable algorithms for such statistical functions. The > > pairwise summation that is in master now is very awesome, but it is not > > secure enough in the sense that a new user will have difficulty > > understanding when he can be sure it is used. > > Why is it not always used? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Sat Jul 26 09:53:06 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 26 Jul 2014 15:53:06 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> Message-ID: <53D3B2C2.7090309@googlemail.com> On 26.07.2014 15:38, Eelco Hoogendoorn wrote: > > Why is it not always used? for 1d reduction the iterator blocks by 8192 elements even when no buffering is required. There is a TODO in the source to fix that by adding additional checks. Unfortunately nobody knows hat these additional tests would need to be and Mark Wiebe who wrote it did not reply to a ping yet. Also along the non-fast axes the iterator optimizes the reduction to remove the strided access, see: https://github.com/numpy/numpy/pull/4697#issuecomment-42752599 Instead of having a keyword argument to mean I would prefer a context manager that changes algorithms for different requirements. This would easily allow changing the accuracy and performance of third party functions using numpy without changing the third party library as long as they are using numpy as the base. E.g. 
with np.precisionstate(sum="kahan"): scipy.stats.nanmean(d) We also have case where numpy uses algorithms that are far more precise than most people needs them. E.g. np.hypot and the related complex absolute value and division. These are very slow with glibc as it provides 1ulp accuracy, this is hardly ever needed. Another case that could use dynamic changing is flushing subnormals to zero. But this api is like Nathaniels parameterizable dtypes just an idea floating in my head which needs proper design and implementation written down. The issue is as usual ENOTIME. From ben.root at ou.edu Sat Jul 26 09:57:05 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 26 Jul 2014 09:57:05 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <53D3B2C2.7090309@googlemail.com> References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> Message-ID: I could get behind the context manager approach. It would help keep backwards compatibility, while providing a very easy (and clean) way of consistently using the same reduction operation. Adding kwargs is just a road to hell. Cheers! Ben Root On Sat, Jul 26, 2014 at 9:53 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 26.07.2014 15:38, Eelco Hoogendoorn wrote: > > > > Why is it not always used? > > for 1d reduction the iterator blocks by 8192 elements even when no > buffering is required. There is a TODO in the source to fix that by > adding additional checks. Unfortunately nobody knows hat these > additional tests would need to be and Mark Wiebe who wrote it did not > reply to a ping yet. > > Also along the non-fast axes the iterator optimizes the reduction to > remove the strided access, see: > https://github.com/numpy/numpy/pull/4697#issuecomment-42752599 > > > Instead of having a keyword argument to mean I would prefer a context > manager that changes algorithms for different requirements. > This would easily allow changing the accuracy and performance of third > party functions using numpy without changing the third party library as > long as they are using numpy as the base. > E.g. > with np.precisionstate(sum="kahan"): > scipy.stats.nanmean(d) > > We also have case where numpy uses algorithms that are far more precise > than most people needs them. E.g. np.hypot and the related complex > absolute value and division. > These are very slow with glibc as it provides 1ulp accuracy, this is > hardly ever needed. > Another case that could use dynamic changing is flushing subnormals to > zero. > > But this api is like Nathaniels parameterizable dtypes just an idea > floating in my head which needs proper design and implementation written > down. The issue is as usual ENOTIME. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
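For reference, a small sketch of the compensated (Kahan) summation that the sum="kahan" setting above alludes to; np.precisionstate is only a hypothetical name in the quoted message, and this pure-Python version is illustrative, not how such an option would actually be implemented.

    import numpy as np

    def kahan_sum(x):
        s = np.float32(0.0)
        c = np.float32(0.0)      # running compensation for lost low-order bits
        for v in x:
            y = v - c
            t = s + y            # the low-order bits of y are lost in this addition...
            c = (t - s) - y      # ...and recovered here for the next step
            s = t
        return s

    x = np.array([2.0**24, 1, 1, 1, 1], dtype=np.float32)
    naive = np.float32(0.0)
    for v in x:
        naive = naive + v
    print(naive)                 # 16777216.0: the four 1.0s are rounded away
    print(kahan_sum(x))          # 16777220.0: the compensation recovers them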
URL: From sebastian at sipsolutions.net Sat Jul 26 09:58:51 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 26 Jul 2014 15:58:51 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> Message-ID: <1406383131.30315.9.camel@sebastian-t440> On Sa, 2014-07-26 at 15:38 +0200, Eelco Hoogendoorn wrote: > I was wondering the same thing. Are there any known tradeoffs to this > method of reduction? > Yes, it is much more complicated and incompatible with naive ufuncs if you want your memory access to be optimized. And optimizing that is very much worth it speed wise... - Sebastian > > On Sat, Jul 26, 2014 at 12:39 PM, Sturla Molden > wrote: > Sebastian Berg wrote: > > > chose more stable algorithms for such statistical functions. > The > > pairwise summation that is in master now is very awesome, > but it is not > > secure enough in the sense that a new user will have > difficulty > > understanding when he can be sure it is used. > > > Why is it not always used? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From hoogendoorn.eelco at gmail.com Sat Jul 26 10:10:50 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sat, 26 Jul 2014 16:10:50 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <53D3B2C2.7090309@googlemail.com> References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> Message-ID: A context manager makes sense. I very much appreciate the time constraints and the effort put in this far, but if we can not make something work uniformly, I wonder if we should include it in the master at all. I don't have a problem with customizing algorithms where fp accuracy demands it; I have more of a problem with hard to predict behavior. If np.ones(bigN).sum() gives different results than np.ones((bigN,2)).sum(0), aside from the obvious differences, that would be one hard to catch source of bugs. Wouldn't per-axis reduction, as a limited form of nested reduction, provide most of the benefits, without any of the drawbacks? On Sat, Jul 26, 2014 at 3:53 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 26.07.2014 15:38, Eelco Hoogendoorn wrote: > > > > Why is it not always used? > > for 1d reduction the iterator blocks by 8192 elements even when no > buffering is required. There is a TODO in the source to fix that by > adding additional checks. Unfortunately nobody knows hat these > additional tests would need to be and Mark Wiebe who wrote it did not > reply to a ping yet. > > Also along the non-fast axes the iterator optimizes the reduction to > remove the strided access, see: > https://github.com/numpy/numpy/pull/4697#issuecomment-42752599 > > > Instead of having a keyword argument to mean I would prefer a context > manager that changes algorithms for different requirements. 
> This would easily allow changing the accuracy and performance of third > party functions using numpy without changing the third party library as > long as they are using numpy as the base. > E.g. > with np.precisionstate(sum="kahan"): > scipy.stats.nanmean(d) > > We also have case where numpy uses algorithms that are far more precise > than most people needs them. E.g. np.hypot and the related complex > absolute value and division. > These are very slow with glibc as it provides 1ulp accuracy, this is > hardly ever needed. > Another case that could use dynamic changing is flushing subnormals to > zero. > > But this api is like Nathaniels parameterizable dtypes just an idea > floating in my head which needs proper design and implementation written > down. The issue is as usual ENOTIME. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jul 26 11:11:09 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 26 Jul 2014 15:11:09 +0000 (UTC) Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <1406383131.30315.9.camel@sebastian-t440> Message-ID: <555575025428079997.909854sturla.molden-gmail.com@news.gmane.org> Sebastian Berg wrote: > Yes, it is much more complicated and incompatible with naive ufuncs if > you want your memory access to be optimized. And optimizing that is very > much worth it speed wise... Why? Couldn't we just copy the data chunk-wise to a temporary buffer of say 2**13 numbers and then reduce that? I don't see why we need another iterator for that. Sturla From sturla.molden at gmail.com Sat Jul 26 12:34:10 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 26 Jul 2014 16:34:10 +0000 (UTC) Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <1406383131.30315.9.camel@sebastian-t440> <555575025428079997.909854sturla.molden-gmail.com@news.gmane.org> Message-ID: <2097579651428085154.727016sturla.molden-gmail.com@news.gmane.org> Sturla Molden wrote: > Sebastian Berg wrote: > >> Yes, it is much more complicated and incompatible with naive ufuncs if >> you want your memory access to be optimized. And optimizing that is very >> much worth it speed wise... > > Why? Couldn't we just copy the data chunk-wise to a temporary buffer of say > 2**13 numbers and then reduce that? I don't see why we need another > iterator for that. I am sorry if this is a stupid suggestion. My knowledge of how NumPy ufuncs works could have been better. 
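One way to read the chunk-wise suggestion, as a rough illustration only (2**13 is just the buffer size mentioned; the NumPy iterator does not actually work like this): sum each block in the array's own dtype, then reduce the much smaller array of partial sums, so no single accumulation runs over millions of terms.

    import numpy as np

    def chunked_sum(x, chunk=2**13):
        # per-chunk partial sums; each chunk is short enough to sum accurately
        partials = np.array([x[i:i + chunk].sum(dtype=x.dtype)
                             for i in range(0, x.size, chunk)], dtype=x.dtype)
        # second, much shorter reduction over the partial sums
        return partials.sum(dtype=x.dtype)

    a = np.ones(2**25, dtype=np.float32)
    print(chunked_sum(a))        # 33554432.0 for this input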
Sturla From josef.pktd at gmail.com Sat Jul 26 14:29:11 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 26 Jul 2014 14:29:11 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> Message-ID: On Sat, Jul 26, 2014 at 9:57 AM, Benjamin Root wrote: > I could get behind the context manager approach. It would help keep > backwards compatibility, while providing a very easy (and clean) way of > consistently using the same reduction operation. Adding kwargs is just a > road to hell. > Wouldn't a context manager require a global state that changes how everything is calculated ? Josef > > Cheers! > Ben Root > > > On Sat, Jul 26, 2014 at 9:53 AM, Julian Taylor < > jtaylor.debian at googlemail.com> wrote: > >> On 26.07.2014 15:38, Eelco Hoogendoorn wrote: >> > >> > Why is it not always used? >> >> for 1d reduction the iterator blocks by 8192 elements even when no >> buffering is required. There is a TODO in the source to fix that by >> adding additional checks. Unfortunately nobody knows hat these >> additional tests would need to be and Mark Wiebe who wrote it did not >> reply to a ping yet. >> >> Also along the non-fast axes the iterator optimizes the reduction to >> remove the strided access, see: >> https://github.com/numpy/numpy/pull/4697#issuecomment-42752599 >> >> >> Instead of having a keyword argument to mean I would prefer a context >> manager that changes algorithms for different requirements. >> This would easily allow changing the accuracy and performance of third >> party functions using numpy without changing the third party library as >> long as they are using numpy as the base. >> E.g. >> with np.precisionstate(sum="kahan"): >> scipy.stats.nanmean(d) >> >> We also have case where numpy uses algorithms that are far more precise >> than most people needs them. E.g. np.hypot and the related complex >> absolute value and division. >> These are very slow with glibc as it provides 1ulp accuracy, this is >> hardly ever needed. >> Another case that could use dynamic changing is flushing subnormals to >> zero. >> >> But this api is like Nathaniels parameterizable dtypes just an idea >> floating in my head which needs proper design and implementation written >> down. The issue is as usual ENOTIME. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Jul 26 14:44:08 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 26 Jul 2014 14:44:08 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> Message-ID: That is one way of doing it, and probably the cleanest way. Or else you have to pass in the context object everywhere anyway. 
But I am not so concerned about that (we do that for other things as well). Bigger concerns would be nested contexts. For example, what if one of the scikit functions use such a context to explicitly state that they need a particular reduction algorithm, but the call to that scikit function is buried under a few layers of user functions, at the top of which has a context manager that states a different reduction op. Whose context wins? Naively, the scikit's context wins (because that's how contexts work). But, does that break with the very broad design goal here? To let the user specify the reduction kernel? Practically speaking, we could see users naively puting in context managers all over the place in their libraries, possibly choosing incorrect algorithms (I am serious here, how often have we seen stackoverflow instructions just blindly parrot certain arguments "just because")? This gives the user no real mechanism to override the library, largely defeating the purpose. My other concern would be with multi-threaded code (which is where a global state would be bad). Ben On Sat, Jul 26, 2014 at 2:29 PM, wrote: > > > > On Sat, Jul 26, 2014 at 9:57 AM, Benjamin Root wrote: > >> I could get behind the context manager approach. It would help keep >> backwards compatibility, while providing a very easy (and clean) way of >> consistently using the same reduction operation. Adding kwargs is just a >> road to hell. >> > > Wouldn't a context manager require a global state that changes how > everything is calculated ? > > Josef > > > >> >> Cheers! >> Ben Root >> >> >> On Sat, Jul 26, 2014 at 9:53 AM, Julian Taylor < >> jtaylor.debian at googlemail.com> wrote: >> >>> On 26.07.2014 15:38, Eelco Hoogendoorn wrote: >>> > >>> > Why is it not always used? >>> >>> for 1d reduction the iterator blocks by 8192 elements even when no >>> buffering is required. There is a TODO in the source to fix that by >>> adding additional checks. Unfortunately nobody knows hat these >>> additional tests would need to be and Mark Wiebe who wrote it did not >>> reply to a ping yet. >>> >>> Also along the non-fast axes the iterator optimizes the reduction to >>> remove the strided access, see: >>> https://github.com/numpy/numpy/pull/4697#issuecomment-42752599 >>> >>> >>> Instead of having a keyword argument to mean I would prefer a context >>> manager that changes algorithms for different requirements. >>> This would easily allow changing the accuracy and performance of third >>> party functions using numpy without changing the third party library as >>> long as they are using numpy as the base. >>> E.g. >>> with np.precisionstate(sum="kahan"): >>> scipy.stats.nanmean(d) >>> >>> We also have case where numpy uses algorithms that are far more precise >>> than most people needs them. E.g. np.hypot and the related complex >>> absolute value and division. >>> These are very slow with glibc as it provides 1ulp accuracy, this is >>> hardly ever needed. >>> Another case that could use dynamic changing is flushing subnormals to >>> zero. >>> >>> But this api is like Nathaniels parameterizable dtypes just an idea >>> floating in my head which needs proper design and implementation written >>> down. The issue is as usual ENOTIME. 
>>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jul 26 15:00:12 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 26 Jul 2014 15:00:12 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> Message-ID: On Sat, Jul 26, 2014 at 2:44 PM, Benjamin Root wrote: > That is one way of doing it, and probably the cleanest way. Or else you > have to pass in the context object everywhere anyway. But I am not so > concerned about that (we do that for other things as well). Bigger concerns > would be nested contexts. For example, what if one of the scikit functions > use such a context to explicitly state that they need a particular > reduction algorithm, but the call to that scikit function is buried under a > few layers of user functions, at the top of which has a context manager > that states a different reduction op. > > Whose context wins? Naively, the scikit's context wins (because that's how > contexts work). But, does that break with the very broad design goal here? > To let the user specify the reduction kernel? Practically speaking, we > could see users naively puting in context managers all over the place in > their libraries, possibly choosing incorrect algorithms (I am serious here, > how often have we seen stackoverflow instructions just blindly parrot > certain arguments "just because")? This gives the user no real mechanism to > override the library, largely defeating the purpose. > > My other concern would be with multi-threaded code (which is where a > global state would be bad). > statsmodels still has avoided anything that smells like a global state that changes calculation. (We never even implemented different global warning levels.) https://groups.google.com/d/msg/pystatsmodels/-J9WXKLjyH4/5xvKu9_mbbEJ Josef There be Dragons. > > Ben > > > > On Sat, Jul 26, 2014 at 2:29 PM, wrote: > >> >> >> >> On Sat, Jul 26, 2014 at 9:57 AM, Benjamin Root wrote: >> >>> I could get behind the context manager approach. It would help keep >>> backwards compatibility, while providing a very easy (and clean) way of >>> consistently using the same reduction operation. Adding kwargs is just a >>> road to hell. >>> >> >> Wouldn't a context manager require a global state that changes how >> everything is calculated ? >> >> Josef >> >> >> >>> >>> Cheers! >>> Ben Root >>> >>> >>> On Sat, Jul 26, 2014 at 9:53 AM, Julian Taylor < >>> jtaylor.debian at googlemail.com> wrote: >>> >>>> On 26.07.2014 15:38, Eelco Hoogendoorn wrote: >>>> > >>>> > Why is it not always used? >>>> >>>> for 1d reduction the iterator blocks by 8192 elements even when no >>>> buffering is required. 
There is a TODO in the source to fix that by >>>> adding additional checks. Unfortunately nobody knows hat these >>>> additional tests would need to be and Mark Wiebe who wrote it did not >>>> reply to a ping yet. >>>> >>>> Also along the non-fast axes the iterator optimizes the reduction to >>>> remove the strided access, see: >>>> https://github.com/numpy/numpy/pull/4697#issuecomment-42752599 >>>> >>>> >>>> Instead of having a keyword argument to mean I would prefer a context >>>> manager that changes algorithms for different requirements. >>>> This would easily allow changing the accuracy and performance of third >>>> party functions using numpy without changing the third party library as >>>> long as they are using numpy as the base. >>>> E.g. >>>> with np.precisionstate(sum="kahan"): >>>> scipy.stats.nanmean(d) >>>> >>>> We also have case where numpy uses algorithms that are far more precise >>>> than most people needs them. E.g. np.hypot and the related complex >>>> absolute value and division. >>>> These are very slow with glibc as it provides 1ulp accuracy, this is >>>> hardly ever needed. >>>> Another case that could use dynamic changing is flushing subnormals to >>>> zero. >>>> >>>> But this api is like Nathaniels parameterizable dtypes just an idea >>>> floating in my head which needs proper design and implementation written >>>> down. The issue is as usual ENOTIME. >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jul 26 15:04:10 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 26 Jul 2014 19:04:10 +0000 (UTC) Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays References: <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> Message-ID: <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> Benjamin Root wrote: > My other concern would be with multi-threaded code (which is where a global > state would be bad). It would presumably require a global threading.RLock for protecting the global state. 
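A rough sketch of what such an RLock-protected global state could look like (illustrative only; the class name and the state contents are made up):

import threading

_lock = threading.RLock()
_state = {"sum": "naive"}            # one state shared by all threads

class precisionstate(object):        # hypothetical, for illustration only
    def __init__(self, **kwargs):
        self._new = kwargs
    def __enter__(self):
        _lock.acquire()              # held for the whole with-block
        self._saved = dict(_state)
        _state.update(self._new)
        return self
    def __exit__(self, *exc_info):
        _state.clear()
        _state.update(self._saved)
        _lock.release()

Because the lock is held for the duration of the with-block, any other thread that wants to enter its own precisionstate block has to wait, which is the scalability concern raised below.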
Sturla From hoogendoorn.eelco at gmail.com Sat Jul 26 15:19:59 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sat, 26 Jul 2014 21:19:59 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <2097579651428085154.727016sturla.molden-gmail.com@news.gmane.org> References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <1406383131.30315.9.camel@sebastian-t440> <555575025428079997.909854sturla.molden-gmail.com@news.gmane.org> <2097579651428085154.727016sturla.molden-gmail.com@news.gmane.org> Message-ID: Perhaps I in turn am missing something; but I would suppose that any algorithm that requires multiple passes over the data is off the table? Perhaps I am being a little old fashioned and performance oriented here, but to make the ultra-majority of use cases suffer a factor two performance penalty for an odd use case which already has a plethora of fine and dandy solutions? Id vote against, fwiw... On Sat, Jul 26, 2014 at 6:34 PM, Sturla Molden wrote: > Sturla Molden wrote: > > Sebastian Berg wrote: > > > >> Yes, it is much more complicated and incompatible with naive ufuncs if > >> you want your memory access to be optimized. And optimizing that is very > >> much worth it speed wise... > > > > Why? Couldn't we just copy the data chunk-wise to a temporary buffer of > say > > 2**13 numbers and then reduce that? I don't see why we need another > > iterator for that. > > I am sorry if this is a stupid suggestion. My knowledge of how NumPy ufuncs > works could have been better. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Jul 26 15:19:52 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 26 Jul 2014 19:19:52 +0000 (UTC) Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> Message-ID: <1227118823428094389.828592sturla.molden-gmail.com@news.gmane.org> wrote: > statsmodels still has avoided anything that smells like a global state that > changes calculation. If global states are stored in a stack, as in OpenGL, it is not so bad. A context manager could push a state in __enter__ and pop the state in __exit__. This is actually how I write OpenGL code in Python and Cython: pairs of glBegin/glEnd, glPushMatrix/glPopMatrix, and glPushAttrib/glPopAttrib nicely fits with Python context managers. However, the bigger issue is multithreading scalability. You need to protect the global state with a recursive lock, and it might not scale like you want. A thread might call a lengthy computation that releases the GIL, but still hold the rlock that protects the state. All your hopes for seing more then one core saturated will go down the drain. It is even bad for i/o bound code, e.g. on-line signal processing: Data might be ready for processing in one thread, but the global state is locked by an idle thread waiting for data. 
Sturla From sylvain.corlay at gmail.com Sat Jul 26 15:30:06 2014 From: sylvain.corlay at gmail.com (Sylvain Corlay) Date: Sat, 26 Jul 2014 15:30:06 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <53d296aa.d1c6b40a.6e8d.ffffe5bf@mx.google.com> <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <1406383131.30315.9.camel@sebastian-t440> <555575025428079997.909854sturla.molden-gmail.com@news.gmane.org> <2097579651428085154.727016sturla.molden-gmail.com@news.gmane.org> Message-ID: I completely agree with Eelco. I expect numpy.mean to do something simple and straightforward. If the naive method is not well suited for my data, I can deal with it and have my own ad hoc method. On Sat, Jul 26, 2014 at 3:19 PM, Eelco Hoogendoorn wrote: > Perhaps I in turn am missing something; but I would suppose that any > algorithm that requires multiple passes over the data is off the table? > Perhaps I am being a little old fashioned and performance oriented here, but > to make the ultra-majority of use cases suffer a factor two performance > penalty for an odd use case which already has a plethora of fine and dandy > solutions? Id vote against, fwiw... > > > On Sat, Jul 26, 2014 at 6:34 PM, Sturla Molden > wrote: >> >> Sturla Molden wrote: >> > Sebastian Berg wrote: >> > >> >> Yes, it is much more complicated and incompatible with naive ufuncs if >> >> you want your memory access to be optimized. And optimizing that is >> >> very >> >> much worth it speed wise... >> > >> > Why? Couldn't we just copy the data chunk-wise to a temporary buffer of >> > say >> > 2**13 numbers and then reduce that? I don't see why we need another >> > iterator for that. >> >> I am sorry if this is a stupid suggestion. My knowledge of how NumPy >> ufuncs >> works could have been better. >> >> Sturla >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Sat Jul 26 16:06:21 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 26 Jul 2014 21:06:21 +0100 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> References: <53D29B2F.9060005@gmail.com> <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sat, Jul 26, 2014 at 8:04 PM, Sturla Molden wrote: > Benjamin Root wrote: > >> My other concern would be with multi-threaded code (which is where a global >> state would be bad). > > It would presumably require a global threading.RLock for protecting the > global state. We would use thread-local storage like we currently do with the np.errstate() context manager. Each thread will have its own "global" state. 
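A sketch of the thread-local variant Robert describes (again hypothetical; only np.errstate itself is an existing API):

import threading

_local = threading.local()                  # each thread gets its own state

def _get_state():
    if not hasattr(_local, "state"):
        _local.state = {"sum": "naive"}     # per-thread default
    return _local.state

class precisionstate(object):               # hypothetical, for illustration only
    def __init__(self, **kwargs):
        self._new = kwargs
    def __enter__(self):
        state = _get_state()
        self._saved = dict(state)
        state.update(self._new)
        return self
    def __exit__(self, *exc_info):
        state = _get_state()
        state.clear()
        state.update(self._saved)

No lock is needed, and nothing one thread changes inside its with-block is visible to any other thread.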
-- Robert Kern From sturla.molden at gmail.com Sat Jul 26 17:19:06 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 26 Jul 2014 21:19:06 +0000 (UTC) Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> Message-ID: <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> Robert Kern wrote: >> It would presumably require a global threading.RLock for protecting the >> global state. > > We would use thread-local storage like we currently do with the > np.errstate() context manager. Each thread will have its own "global" > state. That sounds like a better plan, yes :) Sturla From gabriel.altay at gmail.com Sat Jul 26 17:32:11 2014 From: gabriel.altay at gmail.com (Gabriel Altay) Date: Sat, 26 Jul 2014 17:32:11 -0400 Subject: [Numpy-discussion] ImportError while building Numpy on Ubuntu 14.04 Message-ID: I'm attempting to build Numpy from source in order to do some development. I've cloned the github repo and installed the pre-reqs for Ubuntu http://www.scipy.org/scipylib/building/linux.html#debian-ubuntu However, when I do >>> python setup.py build I get Running from numpy source directory. Traceback (most recent call last): File "setup.py", line 251, in setup_package() File "setup.py", line 235, in setup_package from numpy.distutils.core import setup File "/home/galtay/github/numpy-env/numpy/numpy/distutils/__init__.py", line 37, in from numpy.testing import Tester File "/home/galtay/github/numpy-env/numpy/numpy/testing/__init__.py", line 13, in from .utils import * File "/home/galtay/github/numpy-env/numpy/numpy/testing/utils.py", line 17, in from numpy.core import float32, empty, arange, array_repr, ndarray File "/home/galtay/github/numpy-env/numpy/numpy/core/__init__.py", line 6, in from . import multiarray ImportError: cannot import name multiarray Any hints? I'm running Continuum Analytics Anaconda Python distribution thanks, -Gabriel -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Jul 27 02:04:50 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 27 Jul 2014 02:04:50 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden wrote: > Robert Kern wrote: > > >> It would presumably require a global threading.RLock for protecting the > >> global state. > > > > We would use thread-local storage like we currently do with the > > np.errstate() context manager. Each thread will have its own "global" > > state. > > That sounds like a better plan, yes :) > Any "global" state that changes how things are calculated will have unpredictable results. And I don't trust python users to be disciplined enough. issue: Why do I get different results after `import this_funy_package`? 
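The worry above, stated in terms of an API that exists today: a module-level setter can be left changed by an import, while the context-manager form cannot leak past its block.

import numpy as np

# "Fire and forget": a line like this buried in some package's __init__
# silently changes behaviour for everything that runs after the import.
old = np.seterr(all='ignore')
np.seterr(**old)                 # has to be undone by hand

# The context-manager form restores the previous settings automatically,
# even if an exception is raised inside the block:
with np.errstate(all='ignore'):
    pass                         # changed only in here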
Josef > > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jul 27 04:24:58 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 27 Jul 2014 09:24:58 +0100 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sun, Jul 27, 2014 at 7:04 AM, wrote: > > On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden > wrote: >> >> Robert Kern wrote: >> >> >> It would presumably require a global threading.RLock for protecting the >> >> global state. >> > >> > We would use thread-local storage like we currently do with the >> > np.errstate() context manager. Each thread will have its own "global" >> > state. >> >> That sounds like a better plan, yes :) > > Any "global" state that changes how things are calculated will have > unpredictable results. > > And I don't trust python users to be disciplined enough. > > issue: Why do I get different results after `import this_funy_package`? That's why the suggestion is that it be controlled by a context manager. The state change will only be limited to the `with:` statement. You would not be able to "fire-and-forget" the state change. -- Robert Kern From josef.pktd at gmail.com Sun Jul 27 04:56:32 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 27 Jul 2014 04:56:32 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sun, Jul 27, 2014 at 4:24 AM, Robert Kern wrote: > On Sun, Jul 27, 2014 at 7:04 AM, wrote: > > > > On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden > > wrote: > >> > >> Robert Kern wrote: > >> > >> >> It would presumably require a global threading.RLock for protecting > the > >> >> global state. > >> > > >> > We would use thread-local storage like we currently do with the > >> > np.errstate() context manager. Each thread will have its own "global" > >> > state. > >> > >> That sounds like a better plan, yes :) > > > > Any "global" state that changes how things are calculated will have > > unpredictable results. > > > > And I don't trust python users to be disciplined enough. > > > > issue: Why do I get different results after `import this_funy_package`? > > That's why the suggestion is that it be controlled by a context > manager. The state change will only be limited to the `with:` > statement. You would not be able to "fire-and-forget" the state > change. > Can you implement a context manager without introducing a global variable that everyone could set, and forget? 
Josef > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Jul 27 05:04:41 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 27 Jul 2014 10:04:41 +0100 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sun, Jul 27, 2014 at 9:56 AM, wrote: > > On Sun, Jul 27, 2014 at 4:24 AM, Robert Kern wrote: >> >> On Sun, Jul 27, 2014 at 7:04 AM, wrote: >> > >> > On Sat, Jul 26, 2014 at 5:19 PM, Sturla Molden >> > wrote: >> >> >> >> Robert Kern wrote: >> >> >> >> >> It would presumably require a global threading.RLock for protecting >> >> >> the >> >> >> global state. >> >> > >> >> > We would use thread-local storage like we currently do with the >> >> > np.errstate() context manager. Each thread will have its own "global" >> >> > state. >> >> >> >> That sounds like a better plan, yes :) >> > >> > Any "global" state that changes how things are calculated will have >> > unpredictable results. >> > >> > And I don't trust python users to be disciplined enough. >> > >> > issue: Why do I get different results after `import this_funy_package`? >> >> That's why the suggestion is that it be controlled by a context >> manager. The state change will only be limited to the `with:` >> statement. You would not be able to "fire-and-forget" the state >> change. > > Can you implement a context manager without introducing a global variable > that everyone could set, and forget? Oh sure, with enough effort and digging, someone could search through the C source, find the hidden, private API that does this, and deliberately mess with it. But they can already do that with all of the other necessarily-global state; every module object is a glorified global variable that can be mutated. You won't be able to do it by accident or omission or a lack of discipline. It's not a tempting public target like, say, np.seterr(). -- Robert Kern From rays at blue-cove.com Sun Jul 27 10:16:47 2014 From: rays at blue-cove.com (RayS) Date: Sun, 27 Jul 2014 07:16:47 -0700 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> Message-ID: <201407271416.s6REGn0J031512@blue-cove.com> At 02:04 AM 7/27/2014, you wrote: >You won't be able to do it by accident or omission or a lack of >discipline. It's not a tempting public target like, say, np.seterr(). BTW, why not throw an overflow error in the large float32 sum() case? Is it too expensive to check while accumulating? 
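The accumulation behaviour behind that question can be seen directly at the float32 rounding boundary of 2**24:

>>> import numpy as np
>>> x = np.float32(2**24)        # 16777216
>>> x + np.float32(1) == x
True

Once a running float32 total reaches 2**24, adding further ones no longer changes it; the total never overflows to inf, it simply stops growing.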
- Ray From njs at pobox.com Sun Jul 27 10:44:46 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 27 Jul 2014 15:44:46 +0100 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <201407271416.s6REGn0J031512@blue-cove.com> References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> <201407271416.s6REGn0J031512@blue-cove.com> Message-ID: On Sun, Jul 27, 2014 at 3:16 PM, RayS wrote: > At 02:04 AM 7/27/2014, you wrote: > >>You won't be able to do it by accident or omission or a lack of >>discipline. It's not a tempting public target like, say, np.seterr(). > > BTW, why not throw an overflow error in the large float32 sum() case? > Is it too expensive to check while accumulating? In the example that started this thread, there's no overflow (in the technical sense) occurring. Overflow for ints means wrapping around, and for floats it means exceeding the maximum possible value and overflowing to infinity. The problem here is that when summing up the values, the sum gets large enough that after rounding, x + 1 = x and the sum stops increasing. (For float32's all this requires is x > 16777216.) So while the final error is massive, the mechanism is just ordinary floating-point round-off error. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From rays at blue-cove.com Sun Jul 27 13:02:16 2014 From: rays at blue-cove.com (RayS) Date: Sun, 27 Jul 2014 10:02:16 -0700 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <1406366100.30315.7.camel@sebastian-t440> <956345389428063831.570461sturla.molden-gmail.com@news.gmane.org> <53D3B2C2.7090309@googlemail.com> <1676055168428094052.869608sturla.molden-gmail.com@news.gmane.org> <686489827428101123.449928sturla.molden-gmail.com@news.gmane.org> <201407271416.s6REGn0J031512@blue-cove.com> Message-ID: <201407271702.s6RH2I7K000353@blue-cove.com> Thanks for the clarification, but how is the numpy rounding directed? Round to nearest, ties to even? http://en.wikipedia.org/wiki/IEEE_floating_point#Rounding_rules Just curious, as I couldn't find a reference. - Ray At 07:44 AM 7/27/2014, you wrote: >On Sun, Jul 27, 2014 at 3:16 PM, RayS wrote: > > At 02:04 AM 7/27/2014, you wrote: > > > >>You won't be able to do it by accident or omission or a lack of > >>discipline. It's not a tempting public target like, say, np.seterr(). > > > > BTW, why not throw an overflow error in the large float32 sum() case? > > Is it too expensive to check while accumulating? > >In the example that started this thread, there's no overflow (in the >technical sense) occurring. Overflow for ints means wrapping around, >and for floats it means exceeding the maximum possible value and >overflowing to infinity. > >The problem here is that when summing up the values, the sum gets >large enough that after rounding, x + 1 = x and the sum stops >increasing. (For float32's all this requires is x > 16777216.) So >while the final error is massive, the mechanism is just ordinary >floating-point round-off error. > >-n > >-- >Nathaniel J. 
Smith >Postdoctoral researcher - Informatics - University of Edinburgh >http://vorpus.org >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla.molden at gmail.com Sun Jul 27 14:26:37 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 27 Jul 2014 18:26:37 +0000 (UTC) Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays References: <201407271416.s6REGn0J031512@blue-cove.com> Message-ID: <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > The problem here is that when summing up the values, the sum gets > large enough that after rounding, x + 1 = x and the sum stops > increasing. Interesting. That explains why the divide-and-conquer reduction is much more robust. Thanks :) Sturla From rmcgibbo at gmail.com Sun Jul 27 22:32:54 2014 From: rmcgibbo at gmail.com (Robert McGibbon) Date: Sun, 27 Jul 2014 19:32:54 -0700 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: I forked Olivier's example project to use the same infrastructure for building conda binaries and deploying them to binstar, which might also be useful for some projects. https://github.com/rmcgibbo/python-appveyor-conda-example -Robert On Wed, Jul 9, 2014 at 3:53 PM, Robert McGibbon wrote: > This is an awesome resource for tons of projects. > > Thanks Olivier! > > -Robert > > > On Wed, Jul 9, 2014 at 7:00 AM, Olivier Grisel > wrote: > >> Feodor updated the AppVeyor nodes to have the Windows SDK matching >> MSVC 2008 Express for Python 2. I have updated my sample scripts and >> we now have a working example of a free CI system for: >> >> Python 2 and 3 both for 32 and 64 bit architectures. >> >> https://github.com/ogrisel/python-appveyor-demo >> >> Best, >> >> -- >> Olivier >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Mon Jul 28 08:37:13 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 28 Jul 2014 14:37:13 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> Message-ID: To rephrase my most pressing question: may np.ones((N,2)).mean(0) and np.ones((2,N)).mean(1) produce different results with the implementation in the current master? If so, I think that would be very much regrettable; and if this is a minority opinion, I do hope that at least this gets documented in a most explicit manner. On Sun, Jul 27, 2014 at 8:26 PM, Sturla Molden wrote: > Nathaniel Smith wrote: > > > The problem here is that when summing up the values, the sum gets > > large enough that after rounding, x + 1 = x and the sum stops > > increasing. > > Interesting. That explains why the divide-and-conquer reduction is much > more robust. 
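A small pure-Python illustration of why the divide-and-conquer (pairwise) idea is more robust. float16 is used only so the effect shows up with few elements, and this is a sketch of the idea, not NumPy's actual implementation:

import numpy as np

def pairwise_sum(x):
    # divide and conquer: rounding error grows roughly like log(n)
    # instead of n for a naive left-to-right accumulation
    if len(x) <= 8:
        s = np.float16(0)
        for v in x:
            s += v
        return s
    mid = len(x) // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])

data = np.ones(20000, dtype=np.float16)

naive = np.float16(0)
for v in data:
    naive += v

print(naive)               # stalls at 2048.0, the float16 version of the problem
print(pairwise_sum(data))  # 20000.0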
> > Thanks :) > > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jul 28 08:46:35 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 28 Jul 2014 14:46:35 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> Message-ID: <1406551595.11957.4.camel@sebastian-t440> On Mo, 2014-07-28 at 14:37 +0200, Eelco Hoogendoorn wrote: > To rephrase my most pressing question: may np.ones((N,2)).mean(0) and > np.ones((2,N)).mean(1) produce different results with the > implementation in the current master? If so, I think that would be > very much regrettable; and if this is a minority opinion, I do hope > that at least this gets documented in a most explicit manner. > This will always give you different results. Though in master. the difference is more likely to be large, since (often the second one) maybe be less likely to run into bigger numerical issues. > > On Sun, Jul 27, 2014 at 8:26 PM, Sturla Molden > wrote: > Nathaniel Smith wrote: > > > The problem here is that when summing up the values, the sum > gets > > large enough that after rounding, x + 1 = x and the sum > stops > > increasing. > > > Interesting. That explains why the divide-and-conquer > reduction is much > more robust. > > Thanks :) > > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From argriffi at ncsu.edu Mon Jul 28 09:21:15 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 28 Jul 2014 09:21:15 -0400 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <1406551595.11957.4.camel@sebastian-t440> References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> Message-ID: On Mon, Jul 28, 2014 at 8:46 AM, Sebastian Berg wrote: > On Mo, 2014-07-28 at 14:37 +0200, Eelco Hoogendoorn wrote: >> To rephrase my most pressing question: may np.ones((N,2)).mean(0) and >> np.ones((2,N)).mean(1) produce different results with the >> implementation in the current master? If so, I think that would be >> very much regrettable; and if this is a minority opinion, I do hope >> that at least this gets documented in a most explicit manner. >> > > This will always give you different results. Though in master. the > difference is more likely to be large, since (often the second one) > maybe be less likely to run into bigger numerical issues. Are you sure they always give different results? Notice that np.ones((N,2)).mean(0) np.ones((2,N)).mean(1) compute means of different axes on transposed arrays so these differences 'cancel out'. My understanding of the question is to clarify how numpy reduction algorithms are special-cased for the fast axis vs. other axes. 
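One way to check that empirically; whether the two printed results agree depends on the NumPy version and on which reduction path each axis ends up using:

import numpy as np

N = 10**7
a = np.ones((N, 2), dtype=np.float32).mean(0)   # reduce over the slow axis
b = np.ones((2, N), dtype=np.float32).mean(1)   # reduce over the fast axis
print(a)
print(b)
print(np.array_equal(a, b))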
From cmkleffner at gmail.com Mon Jul 28 09:25:33 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Mon, 28 Jul 2014 15:25:33 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: Hi, on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads I uploaded 7z-archives for mingw-w64 and for OpenBLAS-0.2.10 for 32 bit and for 64 bit. To use mingw-w64 for Python >= 3.3 you have to manually tweak the so called specs file - see readme.txt in the archive. Regards Carl 2014-07-28 4:32 GMT+02:00 Robert McGibbon : > I forked Olivier's example project to use the same infrastructure for > building conda binaries and deploying them to binstar, which might also be > useful for some projects. > > https://github.com/rmcgibbo/python-appveyor-conda-example > > -Robert > > > On Wed, Jul 9, 2014 at 3:53 PM, Robert McGibbon > wrote: > >> This is an awesome resource for tons of projects. >> >> Thanks Olivier! >> >> -Robert >> >> >> On Wed, Jul 9, 2014 at 7:00 AM, Olivier Grisel >> wrote: >> >>> Feodor updated the AppVeyor nodes to have the Windows SDK matching >>> MSVC 2008 Express for Python 2. I have updated my sample scripts and >>> we now have a working example of a free CI system for: >>> >>> Python 2 and 3 both for 32 and 64 bit architectures. >>> >>> https://github.com/ogrisel/python-appveyor-demo >>> >>> Best, >>> >>> -- >>> Olivier >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Mon Jul 28 09:30:24 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Mon, 28 Jul 2014 15:30:24 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <1406551595.11957.4.camel@sebastian-t440> References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> Message-ID: On 28 July 2014 14:46, Sebastian Berg wrote: > > To rephrase my most pressing question: may np.ones((N,2)).mean(0) and > > np.ones((2,N)).mean(1) produce different results with the > > implementation in the current master? If so, I think that would be > > very much regrettable; and if this is a minority opinion, I do hope > > that at least this gets documented in a most explicit manner. > > > > This will always give you different results. Though in master. the > difference is more likely to be large, since (often the second one) > maybe be less likely to run into bigger numerical issues. > An example using float16 on Numpy 1.8.1 (I haven't seen diferences with float32): for N in np.logspace(2, 6): print N, (np.ones((N,2), dtype=np.float16).mean(0), np.ones((2,N), dtype=np.float16).mean(1)) The first one gives correct results up to 2049, from where the values start to fall. The second one, on the other hand, gives correct results up to 65519, where it blows to infinity. Interestingly, in the second case there are fluctuations. For example, for N = 65424, the mean is 0.99951172, but 1 for the next and previous numbers. 
But I think they are just an effect of the rounding, as: In [33]: np.ones(N+1, dtype=np.float16).sum() - N Out[33]: 16.0 In [35]: np.ones(N+1, dtype=np.float16).sum() - (N +1) Out[35]: 15.0 In [36]: np.ones(N-1, dtype=np.float16).sum() - (N -1) Out[36]: -15.0 In [37]: N = 65519 - 20 In [38]: np.ones(N, dtype=np.float16).sum() - N Out[38]: 5.0 In [39]: np.ones(N+1, dtype=np.float16).sum() - (N +1) Out[39]: 4.0 In [40]: np.ones(N-1, dtype=np.float16).sum() - (N -1) Out[40]: 6.0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Jul 28 09:35:23 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 28 Jul 2014 15:35:23 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> Message-ID: On 28/07/14 15:21, alex wrote: > Are you sure they always give different results? Notice that > np.ones((N,2)).mean(0) > np.ones((2,N)).mean(1) > compute means of different axes on transposed arrays so these > differences 'cancel out'. They will be if different algorithms are used. np.ones((N,2)).mean(0) will have larger accumulated rounding error than np.ones((2,N)).mean(1), if only the latter uses the divide-and-conquer summation. I would suggest that in the first case we try to copy the array to a temporary contiguous buffer and use the same divide-and-conquer algorithm, unless some heuristics on memory usage fails. Sturla From fabien.maussion at gmail.com Mon Jul 28 09:50:50 2014 From: fabien.maussion at gmail.com (Fabien) Date: Mon, 28 Jul 2014 15:50:50 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> Message-ID: On 28.07.2014 15:30, Da?id wrote: > An example using float16 on Numpy 1.8.1 (I haven't seen diferences with > float32): Why aren't there differences between float16 and float32 ? Could this be related to my earlier post in this thread where I mentioned summation problems occurring much earlier in numpy than in IDL? Fabien From sebastian at sipsolutions.net Mon Jul 28 10:06:12 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 28 Jul 2014 16:06:12 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> Message-ID: <1406556372.11957.20.camel@sebastian-t440> On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote: > On 28/07/14 15:21, alex wrote: > > > Are you sure they always give different results? Notice that > > np.ones((N,2)).mean(0) > > np.ones((2,N)).mean(1) > > compute means of different axes on transposed arrays so these > > differences 'cancel out'. > > They will be if different algorithms are used. np.ones((N,2)).mean(0) > will have larger accumulated rounding error than np.ones((2,N)).mean(1), > if only the latter uses the divide-and-conquer summation. > What I wanted to point out is that to some extend the algorithm does not matter. You will not necessarily get identical results already if you use a different iteration order, and we have been doing that for years for speed reasons. 
All libs like BLAS do the same. Yes, the new changes make this much more dramatic, but they only make some paths much better, never worse. It might be dangerous, but only in the sense that you test it with the good path and it works good enough, but later (also) use the other one in some lib. I am not even sure if I > I would suggest that in the first case we try to copy the array to a > temporary contiguous buffer and use the same divide-and-conquer > algorithm, unless some heuristics on memory usage fails. > Sure, but you have to make major changes to the buffered iterator to do that without larger speed implications. It might be a good idea, but it requires someone who knows this stuff to spend a lot of time and care in the depths of numpy. > Sturla > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Mon Jul 28 10:08:39 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 28 Jul 2014 16:08:39 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> Message-ID: <1406556519.11957.22.camel@sebastian-t440> On Mo, 2014-07-28 at 15:50 +0200, Fabien wrote: > On 28.07.2014 15:30, Da?id wrote: > > An example using float16 on Numpy 1.8.1 (I haven't seen diferences with > > float32): > > Why aren't there differences between float16 and float32 ? > float16 calculations are actually float32 calculations. If done along the fast axis they will not get rounded in between (within those 8192 elements chunks). Basically something like the difference we are talking about for float32 and float64 has for years existed in float16. > Could this be related to my earlier post in this thread where I > mentioned summation problems occurring much earlier in numpy than in IDL? > > Fabien > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From hoogendoorn.eelco at gmail.com Mon Jul 28 10:31:41 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 28 Jul 2014 16:31:41 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <1406556372.11957.20.camel@sebastian-t440> References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> <1406556372.11957.20.camel@sebastian-t440> Message-ID: Sebastian: Those are good points. Indeed iteration order may already produce different results, even though the semantics of numpy suggest identical operations. Still, I feel this different behavior without any semantical clues is something to be minimized. Indeed copying might have large speed implications. But on second thought, does it? Either the data is already aligned and no copy is required, or it isn't aligned, and we need one pass of cache inefficient access to the data anyway. Infact, if we had one low level function which does cache-intelligent transposition of numpy arrays (using some block strategy), this might be faster even than the current simple reduction operations when forced to work on awkwardly aligned data. 
Ideally, this intelligent access and intelligent reduction would be part of a single pass of course; but that wouldn't really fit within the numpy design, and merely such an intelligent transpose would provide most of the benefit I think. Or is the mechanism behind ascontiguousarray already intelligent in this sense? On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg wrote: > On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote: > > On 28/07/14 15:21, alex wrote: > > > > > Are you sure they always give different results? Notice that > > > np.ones((N,2)).mean(0) > > > np.ones((2,N)).mean(1) > > > compute means of different axes on transposed arrays so these > > > differences 'cancel out'. > > > > They will be if different algorithms are used. np.ones((N,2)).mean(0) > > will have larger accumulated rounding error than np.ones((2,N)).mean(1), > > if only the latter uses the divide-and-conquer summation. > > > > What I wanted to point out is that to some extend the algorithm does not > matter. You will not necessarily get identical results already if you > use a different iteration order, and we have been doing that for years > for speed reasons. All libs like BLAS do the same. > Yes, the new changes make this much more dramatic, but they only make > some paths much better, never worse. It might be dangerous, but only in > the sense that you test it with the good path and it works good enough, > but later (also) use the other one in some lib. I am not even sure if I > > > I would suggest that in the first case we try to copy the array to a > > temporary contiguous buffer and use the same divide-and-conquer > > algorithm, unless some heuristics on memory usage fails. > > > > Sure, but you have to make major changes to the buffered iterator to do > that without larger speed implications. It might be a good idea, but it > requires someone who knows this stuff to spend a lot of time and care in > the depths of numpy. > > > Sturla > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Mon Jul 28 10:46:26 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Mon, 28 Jul 2014 16:46:26 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: 2014-07-28 15:25 GMT+02:00 Carl Kleffner : > Hi, > > on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads I uploaded > 7z-archives for mingw-w64 and for OpenBLAS-0.2.10 for 32 bit and for 64 bit. > To use mingw-w64 for Python >= 3.3 you have to manually tweak the so called > specs file - see readme.txt in the archive. Have the patches to build numpy and scipy with mingw-w64 been merged in the master branches of those projects? 
-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From cmkleffner at gmail.com Mon Jul 28 11:16:47 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Mon, 28 Jul 2014 17:16:47 +0200 Subject: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing In-Reply-To: References: <536CB2C6.1030305@googlemail.com> Message-ID: I had to move my development enviroment on different windows box recently (stilll in progress). On this box I don't have full access unfortunately. The patch for scipy build was merged into scipy master some time ago, see https://github.com/scipy/scipy/pull/3484 . I have some additional patches for scipy.test. The pull request for numpy build has not yet been made for the reasons I mentioned. Cheers, Carl 2014-07-28 16:46 GMT+02:00 Olivier Grisel : > 2014-07-28 15:25 GMT+02:00 Carl Kleffner : > > Hi, > > > > on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads I > uploaded > > 7z-archives for mingw-w64 and for OpenBLAS-0.2.10 for 32 bit and for 64 > bit. > > To use mingw-w64 for Python >= 3.3 you have to manually tweak the so > called > > specs file - see readme.txt in the archive. > > Have the patches to build numpy and scipy with mingw-w64 been merged > in the master branches of those projects? > > > -- > Olivier > http://twitter.com/ogrisel - http://github.com/ogrisel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Jul 28 11:22:33 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 28 Jul 2014 17:22:33 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> <1406556372.11957.20.camel@sebastian-t440> Message-ID: <1406560953.11957.28.camel@sebastian-t440> On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote: > Sebastian: > > > Those are good points. Indeed iteration order may already produce > different results, even though the semantics of numpy suggest > identical operations. Still, I feel this different behavior without > any semantical clues is something to be minimized. > > Indeed copying might have large speed implications. But on second > thought, does it? Either the data is already aligned and no copy is > required, or it isn't aligned, and we need one pass of cache > inefficient access to the data anyway. Infact, if we had one low level > function which does cache-intelligent transposition of numpy arrays > (using some block strategy), this might be faster even than the > current simple reduction operations when forced to work on awkwardly > aligned data. Ideally, this intelligent access and intelligent > reduction would be part of a single pass of course; but that wouldn't > really fit within the numpy design, and merely such an intelligent > transpose would provide most of the benefit I think. Or is the > mechanism behind ascontiguousarray already intelligent in this sense? > The iterator is currently smart in the sense that it will (obviously low level), do something like [1]. Most things in numpy use that iterator, ascontiguousarray does so as well. 
Such a blocked cache aware iterator is what I mean by, someone who knows it would have to spend a lot of time on it :). [1] Appendix: arr = np.ones((100, 100)) arr.sum(1) # being equivalent (iteration order wise) to: res = np.zeros(100) for i in range(100): res += arr[i, :] # while arr.sum(0) would be: for i in range(100): res[i] = arr[i, :].sum() > > On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg > wrote: > On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote: > > On 28/07/14 15:21, alex wrote: > > > > > Are you sure they always give different results? Notice > that > > > np.ones((N,2)).mean(0) > > > np.ones((2,N)).mean(1) > > > compute means of different axes on transposed arrays so > these > > > differences 'cancel out'. > > > > They will be if different algorithms are used. > np.ones((N,2)).mean(0) > > will have larger accumulated rounding error than > np.ones((2,N)).mean(1), > > if only the latter uses the divide-and-conquer summation. > > > > > What I wanted to point out is that to some extend the > algorithm does not > matter. You will not necessarily get identical results already > if you > use a different iteration order, and we have been doing that > for years > for speed reasons. All libs like BLAS do the same. > Yes, the new changes make this much more dramatic, but they > only make > some paths much better, never worse. It might be dangerous, > but only in > the sense that you test it with the good path and it works > good enough, > but later (also) use the other one in some lib. I am not even > sure if I > > > I would suggest that in the first case we try to copy the > array to a > > temporary contiguous buffer and use the same > divide-and-conquer > > algorithm, unless some heuristics on memory usage fails. > > > > > Sure, but you have to make major changes to the buffered > iterator to do > that without larger speed implications. It might be a good > idea, but it > requires someone who knows this stuff to spend a lot of time > and care in > the depths of numpy. > > > Sturla > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From hoogendoorn.eelco at gmail.com Mon Jul 28 17:32:15 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 28 Jul 2014 23:32:15 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: <1406560953.11957.28.camel@sebastian-t440> References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> <1406556372.11957.20.camel@sebastian-t440> <1406560953.11957.28.camel@sebastian-t440> Message-ID: I see, thanks for the clarification. 
Just for the sake of argument, since unfortunately I don't have the time to go dig in the guts of numpy myself: a design which always produces results of the same (high) accuracy, but only optimizes the common access patterns in a hacky way, and may be inefficient in case it needs to fall back on dumb iteration or array copying, is the best compromise between features and the ever limiting amount of time available, I would argue, no? Its preferable if your code works, but may be hacked to work more efficiently, than that it works efficiently, but may need hacking to work correctly under all circumstances. But fun as it is to think about what ought to be, i suppose the people who do actually pour in the effort have thought about these things already. A numpy 2.0 could probably borrow/integrate a lot from numexpr, I suppose. By the way, the hierarchical summation would make it fairly easy to erase (and in any case would minimize) summation differences due to differences between logical and actual ordering in memory of the data, no? On Mon, Jul 28, 2014 at 5:22 PM, Sebastian Berg wrote: > On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote: > > Sebastian: > > > > > > Those are good points. Indeed iteration order may already produce > > different results, even though the semantics of numpy suggest > > identical operations. Still, I feel this different behavior without > > any semantical clues is something to be minimized. > > > > Indeed copying might have large speed implications. But on second > > thought, does it? Either the data is already aligned and no copy is > > required, or it isn't aligned, and we need one pass of cache > > inefficient access to the data anyway. Infact, if we had one low level > > function which does cache-intelligent transposition of numpy arrays > > (using some block strategy), this might be faster even than the > > current simple reduction operations when forced to work on awkwardly > > aligned data. Ideally, this intelligent access and intelligent > > reduction would be part of a single pass of course; but that wouldn't > > really fit within the numpy design, and merely such an intelligent > > transpose would provide most of the benefit I think. Or is the > > mechanism behind ascontiguousarray already intelligent in this sense? > > > > The iterator is currently smart in the sense that it will (obviously low > level), do something like [1]. Most things in numpy use that iterator, > ascontiguousarray does so as well. Such a blocked cache aware iterator > is what I mean by, someone who knows it would have to spend a lot of > time on it :). > > [1] Appendix: > > arr = np.ones((100, 100)) > arr.sum(1) > # being equivalent (iteration order wise) to: > res = np.zeros(100) > for i in range(100): > res += arr[i, :] > # while arr.sum(0) would be: > for i in range(100): > res[i] = arr[i, :].sum() > > > > > On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg > > wrote: > > On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote: > > > On 28/07/14 15:21, alex wrote: > > > > > > > Are you sure they always give different results? Notice > > that > > > > np.ones((N,2)).mean(0) > > > > np.ones((2,N)).mean(1) > > > > compute means of different axes on transposed arrays so > > these > > > > differences 'cancel out'. > > > > > > They will be if different algorithms are used. > > np.ones((N,2)).mean(0) > > > will have larger accumulated rounding error than > > np.ones((2,N)).mean(1), > > > if only the latter uses the divide-and-conquer summation. 
> > > > > > > > > What I wanted to point out is that to some extend the > > algorithm does not > > matter. You will not necessarily get identical results already > > if you > > use a different iteration order, and we have been doing that > > for years > > for speed reasons. All libs like BLAS do the same. > > Yes, the new changes make this much more dramatic, but they > > only make > > some paths much better, never worse. It might be dangerous, > > but only in > > the sense that you test it with the good path and it works > > good enough, > > but later (also) use the other one in some lib. I am not even > > sure if I > > > > > I would suggest that in the first case we try to copy the > > array to a > > > temporary contiguous buffer and use the same > > divide-and-conquer > > > algorithm, unless some heuristics on memory usage fails. > > > > > > > > > Sure, but you have to make major changes to the buffered > > iterator to do > > that without larger speed implications. It might be a good > > idea, but it > > requires someone who knows this stuff to spend a lot of time > > and care in > > the depths of numpy. > > > > > Sturla > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Mon Jul 28 18:03:35 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 29 Jul 2014 00:03:35 +0200 Subject: [Numpy-discussion] numpy.mean still broken for largefloat32arrays In-Reply-To: References: <201407271416.s6REGn0J031512@blue-cove.com> <1486437082428178269.968493sturla.molden-gmail.com@news.gmane.org> <1406551595.11957.4.camel@sebastian-t440> <1406556372.11957.20.camel@sebastian-t440> <1406560953.11957.28.camel@sebastian-t440> Message-ID: <53D6C8B7.7040606@googlemail.com> On 28.07.2014 23:32, Eelco Hoogendoorn wrote: > I see, thanks for the clarification. Just for the sake of argument, > since unfortunately I don't have the time to go dig in the guts of numpy > myself: a design which always produces results of the same (high) > accuracy, but only optimizes the common access patterns in a hacky way, > and may be inefficient in case it needs to fall back on dumb iteration > or array copying, is the best compromise between features and the ever > limiting amount of time available, I would argue, no? Its preferable if > your code works, but may be hacked to work more efficiently, than that > it works efficiently, but may need hacking to work correctly under all > circumstances. I don't see the inconsistency as such a big problem. 
If applications are so sensitive to accurate summations over large uniform datasets they will most likely implement their own algorithm instead of relying on the black box in numpy (which never documented any accuracy bounds or used algorithms on summation so far I know). If they do they should add testsuites that will detect accidental use the less accurate path in numpy and fix it before they even release. General purpose libraries that may not be able to test every input third party users may give them usually don't have the luxury of only supporting the latest version of numpy to have pairwise summation guaranteed in the first place, so they would just have to implement their own algorithms anyway. > > But fun as it is to think about what ought to be, i suppose the people > who do actually pour in the effort have thought about these things > already. A numpy 2.0 could probably borrow/integrate a lot from numexpr, > I suppose. > > By the way, the hierarchical summation would make it fairly easy to > erase (and in any case would minimize) summation differences due to > differences between logical and actual ordering in memory of the data, no? > > > On Mon, Jul 28, 2014 at 5:22 PM, Sebastian Berg > > wrote: > > On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote: > > Sebastian: > > > > > > Those are good points. Indeed iteration order may already produce > > different results, even though the semantics of numpy suggest > > identical operations. Still, I feel this different behavior without > > any semantical clues is something to be minimized. > > > > Indeed copying might have large speed implications. But on second > > thought, does it? Either the data is already aligned and no copy is > > required, or it isn't aligned, and we need one pass of cache > > inefficient access to the data anyway. Infact, if we had one low level > > function which does cache-intelligent transposition of numpy arrays > > (using some block strategy), this might be faster even than the > > current simple reduction operations when forced to work on awkwardly > > aligned data. Ideally, this intelligent access and intelligent > > reduction would be part of a single pass of course; but that wouldn't > > really fit within the numpy design, and merely such an intelligent > > transpose would provide most of the benefit I think. Or is the > > mechanism behind ascontiguousarray already intelligent in this sense? > > > > The iterator is currently smart in the sense that it will (obviously low > level), do something like [1]. Most things in numpy use that iterator, > ascontiguousarray does so as well. Such a blocked cache aware iterator > is what I mean by, someone who knows it would have to spend a lot of > time on it :). > > [1] Appendix: > > arr = np.ones((100, 100)) > arr.sum(1) > # being equivalent (iteration order wise) to: > res = np.zeros(100) > for i in range(100): > res += arr[i, :] > # while arr.sum(0) would be: > for i in range(100): > res[i] = arr[i, :].sum() > > > > > On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg > > > > wrote: > > On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote: > > > On 28/07/14 15:21, alex wrote: > > > > > > > Are you sure they always give different results? Notice > > that > > > > np.ones((N,2)).mean(0) > > > > np.ones((2,N)).mean(1) > > > > compute means of different axes on transposed arrays so > > these > > > > differences 'cancel out'. > > > > > > They will be if different algorithms are used. 
> > np.ones((N,2)).mean(0) > > > will have larger accumulated rounding error than > > np.ones((2,N)).mean(1), > > > if only the latter uses the divide-and-conquer summation. > > > > > > > > > What I wanted to point out is that to some extent the > > algorithm does not > > matter. You will not necessarily get identical results already > > if you > > use a different iteration order, and we have been doing that > > for years > > for speed reasons. All libs like BLAS do the same. > > Yes, the new changes make this much more dramatic, but they > > only make > > some paths much better, never worse. It might be dangerous, > > but only in > > the sense that you test it with the good path and it works > > good enough, > > but later (also) use the other one in some lib. I am not even > > sure if I > > > > > I would suggest that in the first case we try to copy the > > array to a > > > temporary contiguous buffer and use the same > > divide-and-conquer > > > algorithm, unless some heuristics on memory usage fails. > > > > > > > > > Sure, but you have to make major changes to the buffered > > iterator to do > > that without larger speed implications. It might be a good > > idea, but it > > requires someone who knows this stuff to spend a lot of time > > and care in > > the depths of numpy. > > > > > Sturla > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
From joseluismietta at yahoo.com.ar Tue Jul 29 07:47:20 2014 From: joseluismietta at yahoo.com.ar (=?iso-8859-1?Q?Jos=E8_Luis_Mietta?=) Date: Tue, 29 Jul 2014 04:47:20 -0700 Subject: [Numpy-discussion] length - sticks algorithm In-Reply-To: References: <1406027949.48361.YahooMailNeo@web142302.mail.bf1.yahoo.com> Message-ID: <1406634440.12428.YahooMailNeo@web142306.mail.bf1.yahoo.com> Robert, thanks for your help! Now I have: * Q nodes (Q stick-stick intersections) * a list 'NODES'=[(x,y,i,j)_1,........, (x,y,i,j)_Q], where each element (x,y,i,j) represents the intersection point (x,y) of the sticks i and j. * a matrix 'H' with Q elements {H_k,l}. H_k,l=0 if nodes 'k' and 'l' aren't joined by an edge, and H_k,l = R_k,l = the electrical resistance associated with the union of the nodes 'k' and 'l' (directly proportional to the length of the edge that connects these nodes). * a list 'nodes_resistances'=[R_1, ....., R_Q]. All nodes with 'j' (or 'i') = N+1 have an electric potential 'V' with respect to all nodes with 'j' or 'i' = N. Now I must apply nodal analysis to determine the electrical current through each of the edges, and the net current (see attached files). I have no idea how to do that. Can you help me? Thanks a lot! Best regards, José Luis
On Tuesday, 22 July 2014 at 9:02, Robert Kern wrote: What have you tried? What exactly are you having problems with? Loosely, I would suggest the following approach: For each stick, iterate over each stick that intersects with it (as recorded in M). Find the coordinates of all of the intersection points. Label the intersection points by the IDs of the two sticks that form the intersection (normalize these IDs by keeping them in order so you don't duplicate intersections already found; e.g. (2, 5), not (5, 2)). Arbitrarily, but consistently, pick one end of the stick and find the distances from that end to each of the intersection points. This induces an order on the intersections with that stick by sorting the intersections by their distance from the arbitrary end of the stick. You will need this to determine which intersections on the same stick are neighbors and which aren't. I.e., if you have 3 intersections with a given stick, (i,j), (i,k), and (i,l), you want (i,j)-(i,k), and (i,k)-(i,l), but not (i,j)-(i,l). You can find the distances between each of the intersections easily from that. Use a networkx Graph to record the distances (you are making a so-called "weighted graph"). On Tue, Jul 22, 2014 at 12:19 PM, José Luis Mietta wrote: > Hi experts! > > >I'm working with the conductivity of stick-film systems. > > > >In my algorithm (N sticks) I have the intersection graph matrix M (M is a NxN matrix, M_ij=1 if sticks 'i' and 'j' do intersect, and M_ij=0 if sticks 'i' and 'j' do not). >Also I have 2 lists with the end-points of each stick. In addition, I can calculate the intersection point (if it exists) between sticks. > > >I want to calculate all the distances between the points of intersection (1,2,3,...N) in the next figure: >without losing the connectivity information (which intersection is connected to which). In the figure, (A) is the system with sticks. > > >I don't know how to do this. I'm a python + numpy user. > > >Waiting for your answers! > > >Thanks a lot >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Robert Kern _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: EE201_matrix_analysis.pdf Type: image/ipeg Size: 184815 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 10.1103 at PhysRevB.86.134202.pdf Type: image/ipeg Size: 1146335 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rahman2012.pdf Type: image/ipeg Size: 482265 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: Dibujo.png Type: image/png Size: 446152 bytes Desc: not available URL:
From cjw at ncf.ca Tue Jul 29 08:24:41 2014 From: cjw at ncf.ca (Colin J. Williams) Date: Tue, 29 Jul 2014 08:24:41 -0400 (EDT) Subject: [Numpy-discussion] Compiling Numpy-1.8.1 In-Reply-To: <53D6C8B7.7040606@googlemail.com> Message-ID: <1583752360.20803.1406636681520.JavaMail.root@ncf.ca> This version of Numpy does not appear to be available as an installable binary.
In any event, the LAPACK and other packages do not seem to be available with the installable versions. I understand that Windows Studio 2008 is normally used for Windows compiling. Unfortunately, this is no longer available from Microsoft. The link is replaced by a Power Point presentation. Can anyone suggest an alternative compiler/linker? Colin W. From robert.kern at gmail.com Tue Jul 29 08:43:03 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 29 Jul 2014 13:43:03 +0100 Subject: [Numpy-discussion] length - sticks algorithm In-Reply-To: <1406634440.12428.YahooMailNeo@web142306.mail.bf1.yahoo.com> References: <1406027949.48361.YahooMailNeo@web142302.mail.bf1.yahoo.com> <1406634440.12428.YahooMailNeo@web142306.mail.bf1.yahoo.com> Message-ID: On Tue, Jul 29, 2014 at 12:47 PM, Jos? Luis Mietta < joseluismietta at yahoo.com.ar> wrote: > Robert, thanks for your help! > > Now I have: > > * Q nodes (Q stick-stick intersections) > * a list 'NODES'=[(x,y,i,j)_1,........, (x,y,i,j)_Q], where each element > (x,y,i,j) represent the intersection point (x,y) of the sticks i and j. > * a matrix 'H' with Q elements {H_k,l}. > H_k,l=0 if nodes 'k' and 'l' aren't joined by a edge, and H_k,l = R_k,l = > the electrical resistance associated with the union of the nodes 'k' and > 'l' (directly proportional to the length of the edge that connects these > nodes). > * a list 'nodes_resistances'=[R_1, ....., R_Q]. > > All nodes with 'j' (or 'i') = N+1 have a electric potential 'V' respect > all nodes with 'j' or 'i' = N. > > Now i must apply NODAL ANALYSIS for determinate the electrical current > through each of the edges, and the net current (see attached files). I > have no ideas about how to do that. Can you help me? > Please do not send largish binary attachments to this list. I do not know off-hand how to do this, but it looks like the EE201 document you attached tells you how. It is somewhat beyond the scope of this mailing list to help you understand that document, sorry. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Tue Jul 29 08:50:12 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Tue, 29 Jul 2014 14:50:12 +0200 Subject: [Numpy-discussion] Compiling Numpy-1.8.1 In-Reply-To: <1583752360.20803.1406636681520.JavaMail.root@ncf.ca> References: <53D6C8B7.7040606@googlemail.com> <1583752360.20803.1406636681520.JavaMail.root@ncf.ca> Message-ID: 2014-07-29 14:24 GMT+02:00 Colin J. Williams : > > This version of Numpy does not appear to be available as an installable binary. In any event, the LAPACK and other packages do not seem to be available with the installable versions. > > I understand that Windows Studio 2008 is normally used for Windows compiling. Unfortunately, this is no longer available from Microsoft. The link is replaced by a Power Point presentation. > > Can anyone suggest an alternative compiler/linker? The web installers for MSVC Express 2008 is still online at: http://go.microsoft.com/?linkid=7729279 FYI I recently update the scikit-learn documentation for building under windows, both for Python 2 and Python 3 as well as 32 bit and 64 bit architectures: http://scikit-learn.org/stable/install.html#building-on-windows The same build environment should work for numpy (I think). -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From cjwilliams43 at gmail.com Tue Jul 29 09:48:44 2014 From: cjwilliams43 at gmail.com (Colin J. 
Williams) Date: Tue, 29 Jul 2014 09:48:44 -0400 Subject: [Numpy-discussion] Compiling Numpy-1.8.1 In-Reply-To: References: <53D6C8B7.7040606@googlemail.com> <1583752360.20803.1406636681520.JavaMail.root@ncf.ca> Message-ID: Olivier, Thanks. I've installed Windows Studio 2008 Express. I'll read your Building on Windows document. Colin W. On 29 July 2014 08:50, Olivier Grisel wrote: > 2014-07-29 14:24 GMT+02:00 Colin J. Williams : > > > > This version of Numpy does not appear to be available as an installable > binary. In any event, the LAPACK and other packages do not seem to be > available with the installable versions. > > > > I understand that Windows Studio 2008 is normally used for Windows > compiling. Unfortunately, this is no longer available from Microsoft. The > link is replaced by a Power Point presentation. > > > > Can anyone suggest an alternative compiler/linker? > > The web installers for MSVC Express 2008 is still online at: > http://go.microsoft.com/?linkid=7729279 > > FYI I recently update the scikit-learn documentation for building > under windows, both for Python 2 and Python 3 as well as 32 bit and 64 > bit architectures: > > http://scikit-learn.org/stable/install.html#building-on-windows > > The same build environment should work for numpy (I think). > > -- > Olivier > http://twitter.com/ogrisel - http://github.com/ogrisel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From derek at astro.physik.uni-goettingen.de Tue Jul 29 13:52:51 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 29 Jul 2014 19:52:51 +0200 Subject: [Numpy-discussion] length - sticks algorithm In-Reply-To: References: <1406027949.48361.YahooMailNeo@web142302.mail.bf1.yahoo.com> <1406634440.12428.YahooMailNeo@web142306.mail.bf1.yahoo.com> Message-ID: <5BAE10D9-E948-49FE-BD25-1D97A118D2A7@astro.physik.uni-goettingen.de> On 29 Jul 2014, at 02:43 pm, Robert Kern wrote: > On Tue, Jul 29, 2014 at 12:47 PM, José Luis Mietta wrote: > Robert, thanks for your help! > > Now I have: > > * Q nodes (Q stick-stick intersections) > * a list 'NODES'=[(x,y,i,j)_1,........, (x,y,i,j)_Q], where each element (x,y,i,j) represents the intersection point (x,y) of the sticks i and j. > * a matrix 'H' with Q elements {H_k,l}. > H_k,l=0 if nodes 'k' and 'l' aren't joined by an edge, and H_k,l = R_k,l = the electrical resistance associated with the union of the nodes 'k' and 'l' (directly proportional to the length of the edge that connects these nodes). > * a list 'nodes_resistances'=[R_1, ....., R_Q]. > > All nodes with 'j' (or 'i') = N+1 have an electric potential 'V' with respect to all nodes with 'j' or 'i' = N. > > Now I must apply nodal analysis to determine the electrical current through each of the edges, and the net current (see attached files). I have no idea how to do that. Can you help me? > > Please do not send largish binary attachments to this list. I do not know off-hand how to do this, but it looks like the EE201 document you attached tells you how. It is somewhat beyond the scope of this mailing list to help you understand that document, sorry. > And it is not a good idea to post copyrighted journal articles to a list where they will end up in a public list archive (even if not immediately recognisable as such).
Derek From faltet at gmail.com Wed Jul 30 06:34:02 2014 From: faltet at gmail.com (Francesc Alted) Date: Wed, 30 Jul 2014 12:34:02 +0200 Subject: [Numpy-discussion] ANN: bcolz 0.7.1 released Message-ID: <53D8CA1A.6090400@gmail.com> ====================== Announcing bcolz 0.7.1 ====================== What's new ========== This is maintenance release, where bcolz got rid of the nose dependency for Python 2.6 (only unittest2 should be required). Also, some small fixes for the test suite, specially in 32-bit has been done. Thanks to Ilan Schnell for pointing out the problems and for suggesting fixes. ``bcolz`` is a renaming of the ``carray`` project. The new goals for the project are to create simple, yet flexible compressed containers, that can live either on-disk or in-memory, and with some high-performance iterators (like `iter()`, `where()`) for querying them. Together, bcolz and the Blosc compressor, are finally fullfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots For more detailed info, see the release notes in: https://github.com/Blosc/bcolz/wiki/Release-Notes What it is ========== bcolz provides columnar and compressed data containers. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Installing ========== bcolz is in the PyPI repository, so installing it is easy: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt ---- **Enjoy data!** -- Francesc Alted From jtaylor.debian at googlemail.com Wed Jul 30 16:20:05 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 30 Jul 2014 22:20:05 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 beta 2 release Message-ID: <53D95375.5080707@googlemail.com> Hello, The source packages and binaries got numpy 1.9.0 beta 2 have just been uploaded to sourceforge. https://sourceforge.net/projects/numpy/files/NumPy/1.9.0b2 1.9.0 will be a new feature release supporting Python 2.6 - 2.7 and 3.2 - 3.4. Unfortunately we have disabled the new __numpy_ufunc__ feature for overriding ufuncs in subclasses for now. There are still some unresolved issues with its behavior regarding python operator precedence and subclasses. 
If you have a stake in the issue, please read Pauli's summary of the remaining issues: http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070737.html When the issues are resolved to everyone's satisfaction we hope to enable the feature for 1.10 in its final form. We have restored the indexing edge case that broke matplotlib with numpy 1.9.0 beta 1, but some of the other test failures in other packages are deemed bugs in their code and not reasonable to support in numpy anymore. Most projects have fixed the issues in their latest stable or development versions. Depending on how bad the broken functionality is, you may need to update your third party packages when updating numpy to 1.9.0b2. An attempt was made to update the windows binary toolchain to the latest mingw/mingw64 version and an up-to-date ATLAS version, but this turned up a few ugly test failures. Help in resolving these issues is appreciated, as no core developer has Windows debugging experience. Please see this issue for details: https://github.com/numpy/numpy/issues/4909 The changelog is mostly the same as in beta1. Please read it carefully; there have been many small changes that could affect your code. https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst Please also take special note of the future changes section, which will apply to the following release, 1.10.0, and make sure to check if your applications would be affected by them. Source tarballs, windows installers and release notes can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.9.0b2 Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL:
From jks257 at cornell.edu Wed Jul 30 16:36:24 2014 From: jks257 at cornell.edu (Jeffrey Ken Smith) Date: Wed, 30 Jul 2014 20:36:24 +0000 Subject: [Numpy-discussion] Can't build numpy on my Windows 7 desktop computer Message-ID: I have been unable to install numpy on my Windows 7 desktop computer, which is a Dell - I had no problems installing it on my new laptop, which is also a Dell. When I try to run the superpack .exe file, I get a message claiming that Python2.7 is not in the registry even though it is and even though I was able to install pyodbc. If I download the zip file and try to use setup.py, I get messages like "No module named msvccompiler in numpy.distutils: trying from distutils error: unable to find vcvarsall.bat" I have no idea what this means or what to do about it. -------------- next part -------------- An HTML attachment was scrubbed... URL:
From chris.barker at noaa.gov Wed Jul 30 18:01:03 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 30 Jul 2014 15:01:03 -0700 Subject: [Numpy-discussion] Can't build numpy on my Windows 7 desktop computer In-Reply-To: References: Message-ID: On Wed, Jul 30, 2014 at 1:36 PM, Jeffrey Ken Smith wrote: > I have been unable to install numpy on my Windows 7 desktop computer, which is > a Dell - I had no problems installing it on my new laptop, which is also a > Dell. When I try to run the superpack .exe file, I get a message claiming > that Python2.7 is not in the registry even though it is and even though I > was able to install pyodbc. > Really bad error message - you are probably trying to install a 64 bit numpy into a 32 bit python, or vice versa -- make sure you are doing both the same.
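A quick way to check which flavor of Python you actually have (a generic standard-library check, nothing numpy-specific):

import struct
import sys

print(sys.version)               # the build banner names the compiler and platform
print(struct.calcsize("P") * 8)  # pointer size in bits: prints 32 or 64

Then pick the numpy installer (32 bit or 64 bit) that matches that number.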
And I recommend the binaries from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/ (or Anaconda or Canopy) -Chris > If I download the zip file and try to use setup.py, I get messages like > > > > ?No module named msvccompiler in numpy.distutils: trying from distutils > > error: unable to find vcvarsall.bat? > > > > I have no idea what this means or what to do about it. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Jul 30 18:02:14 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 30 Jul 2014 15:02:14 -0700 Subject: [Numpy-discussion] Can't build numpy on my Windows 7 desktop computer In-Reply-To: References: Message-ID: one more note: > >> If I download the zip file and try to use setup.py, I get messages like >> >> >> >> ?No module named msvccompiler in numpy.distutils: trying from distutils >> >> error: unable to find vcvarsall.bat? >> >> >> >> I have no idea what this means or what to do about it. >> > It means it is trying to compile numpy, and you don't have the compiler set up to do that. But I suspect you don't want to anyway. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jul 30 18:34:32 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 30 Jul 2014 16:34:32 -0600 Subject: [Numpy-discussion] Remove user_array.py Message-ID: Hi All, numpy/lib/user_array.py is an old module (2006) that documents itself as unfinished. The only recent changes are my work for supporting both python2 and python3 from the same code base. It was apparently intended as an alternative to inheriting from ndarray. It has no tests to speak of except a few odds and ends included in the module. I suspect this is one of those features that few have heard of. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 30 18:52:50 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 30 Jul 2014 15:52:50 -0700 Subject: [Numpy-discussion] OSX wheels for older numpy versions on pypi Message-ID: Hi, I took the liberty of uploading OSX wheels for some older numpy versions to pypi. These can be useful for testing, or when building your own wheels to be compatible with earlier numpy versions - see: http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling/18369312#18369312 There are currently wheels for numpy 1.5.1 py27 numpy 1.6.0 py27 numpy 1.6.1 py27 numpy 1.7.1 py27, 32, 33, 34 These are all compiled against ATLAS: https://github.com/matthew-brett/numpy-atlas-binaries install with e.g. 
pip install numpy==1.6.1 If anyone needs other wheels compiled, let me know, I'll try and upload them, Cheers, Matthew From cmkleffner at gmail.com Wed Jul 30 20:12:49 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 31 Jul 2014 02:12:49 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 beta 2 release In-Reply-To: <53D95375.5080707@googlemail.com> References: <53D95375.5080707@googlemail.com> Message-ID: Hi, I created mingw-w64 builds for testing based on OpenBLAS, see: https://bitbucket.org/carlkl/mingw-w64-for-python/downloads . gists for numpy.test run: win32: https://gist.github.com/carlkl/43182c7c5e0049db7b4e amd64: https://gist.github.com/carlkl/c528505af31ac32720b0 Regards, Carl 2014-07-30 22:20 GMT+02:00 Julian Taylor : > Hello, > > The source packages and binaries got numpy 1.9.0 beta 2 have just been > uploaded to sourceforge. > https://sourceforge.net/projects/numpy/files/NumPy/1.9.0b2 > > 1.9.0 will be a new feature release supporting Python 2.6 - 2.7 and 3.2 > - 3.4. > > Unfortunately we have disabled the new __numpy_ufunc__ feature for > overriding ufuncs in subclasses for now. There are still some unresolved > issues with its behavior regarding python operator precedence and > subclasses. > If you have a stake in the issue please read Pauli's summary of the > remaining issues: > http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070737.html > > When the issues are resolved to everyones satisfaction we hope to enable > the feature for 1.10 in its final form. > > We have restored the indexing edge case that broke matplotlib with numpy > 1.9.0 beta 1 but some of the other test failures in other packages are > deemed bugs in their code and not reasonable to support in numpy > anymore. Most projects have fixed the issues in their latest stable or > development versions. Depending on how bad the broken functionality is > you may need to update your third party packages when updating numpy to > 1.9.0b2. > > An attempt was made to update the windows binary toolchain to the latest > mingw/mingw64 version and an up to date ATLAS version but this turned up > a few ugly test failures. > Help in resolving these issues is appreciated, no core developer has > Windows debugging experience. > Please see this issue for details: > https://github.com/numpy/numpy/issues/4909 > > > The changelog is mostly the same as in beta1. Please read it carefully > there have been many small changes that could affect your code. > > https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst > Please also take special note of the future changes section which will > apply to the following release 1.10.0 and make sure to check if your > applications would be affected by them. > > Source tarballs, windows installers and release notes can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.9.0b2 > > Cheers, > Julian Taylor > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Wed Jul 30 22:06:43 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 30 Jul 2014 19:06:43 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 beta 2 release In-Reply-To: References: <53D95375.5080707@googlemail.com> Message-ID: Hi, On Wed, Jul 30, 2014 at 5:12 PM, Carl Kleffner wrote: > Hi, > > I created mingw-w64 builds for testing based on OpenBLAS, see: > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads . > > gists for numpy.test run: > > win32: https://gist.github.com/carlkl/43182c7c5e0049db7b4e > amd64: https://gist.github.com/carlkl/c528505af31ac32720b0 Thanks all for all the hard work. Here's OSX wheels for testing: http://wheels.scikit-image.org Try with: pip install --pre -f http://wheels.scikit-image.org numpy This should work with Python.org Python on OSX 10.6+, homebrew / macports / system Python for 10.9 [1] Please do send feedback. Cheers, Matthew [1] System Python (/usr/bin/python) will only see your new copy of numpy if you adjust the default path, or test in a virtualenv, because of the system Python sys.path setup From matthew.brett at gmail.com Wed Jul 30 22:42:50 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 30 Jul 2014 19:42:50 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 beta 2 release In-Reply-To: References: <53D95375.5080707@googlemail.com> Message-ID: Hi, On Wed, Jul 30, 2014 at 5:12 PM, Carl Kleffner wrote: > Hi, > > I created mingw-w64 builds for testing based on OpenBLAS, see: > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads . > > gists for numpy.test run: > > win32: https://gist.github.com/carlkl/43182c7c5e0049db7b4e > amd64: https://gist.github.com/carlkl/c528505af31ac32720b0 I believe the amd64 failure is because Windows doesn't like you trying to open a file that is already open - maybe this will fix it: https://github.com/numpy/numpy/pull/4927 Cheers, Matthew From charlesr.harris at gmail.com Wed Jul 30 23:20:15 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 30 Jul 2014 21:20:15 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 beta 2 release In-Reply-To: References: <53D95375.5080707@googlemail.com> Message-ID: On Wed, Jul 30, 2014 at 8:42 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 30, 2014 at 5:12 PM, Carl Kleffner > wrote: > > Hi, > > > > I created mingw-w64 builds for testing based on OpenBLAS, see: > > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads . > > > > gists for numpy.test run: > > > > win32: https://gist.github.com/carlkl/43182c7c5e0049db7b4e > > amd64: https://gist.github.com/carlkl/c528505af31ac32720b0 > > I believe the amd64 failure is because Windows doesn't like you trying > to open a file that is already open - maybe this will fix it: > > https://github.com/numpy/numpy/pull/4927 > > Cheers, > Thanks for getting this out. I just noticed that we are getting a couple of warnings on some platforms. *Python 3.2 debug*; /usr/lib/python3.2/platform.py:381: ResourceWarning: unclosed file <_io.TextIOWrapper name='/etc/lsb-release' mode='rU' encoding='UTF-8'> full_distribution_name=0) *USE_CHROOT=1 ARCH=i386 DIST=trusty PYTHON=3.4* /usr/local/lib/python3.4/dist-packages/numpy/distutils/cpuinfo.py:120: UserWarning: [Errno 2] No such file or directory: '/proc/cpuinfo' warnings.warn(str(e), UserWarning) Not sure about the second one. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Thu Jul 31 02:55:11 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 31 Jul 2014 07:55:11 +0100 Subject: [Numpy-discussion] Remove user_array.py In-Reply-To: References: Message-ID: On Wed, Jul 30, 2014 at 11:34 PM, Charles R Harris wrote: > Hi All, > > numpy/lib/user_array.py is an old module (2006) that documents itself as > unfinished. The only recent changes are my work for supporting both python2 > and python3 from the same code base. It was apparently intended as an > alternative to inheriting from ndarray. It has no tests to speak of except a > few odds and ends included in the module. I suspect this is one of those > features that few have heard of. Agreed. -- Robert Kern From olivier.grisel at ensta.org Thu Jul 31 07:56:25 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 31 Jul 2014 13:56:25 +0200 Subject: [Numpy-discussion] OSX wheels for older numpy versions on pypi In-Reply-To: References: Message-ID: 2014-07-31 0:52 GMT+02:00 Matthew Brett : > Hi, > > I took the liberty of uploading OSX wheels for some older numpy > versions to pypi. These can be useful for testing, or when building > your own wheels to be compatible with earlier numpy versions - see: > > http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling/18369312#18369312 > > There are currently wheels for > > numpy 1.5.1 py27 > numpy 1.6.0 py27 > numpy 1.6.1 py27 > numpy 1.7.1 py27, 32, 33, 34 > > These are all compiled against ATLAS: > > https://github.com/matthew-brett/numpy-atlas-binaries > > install with e.g. > > pip install numpy==1.6.1 Thanks, this is very helpful for project maintainers who have to switch between versions to reproduce bugs reported by users. Do you plan do do the same for scipy? As scipy is even slower to build that would be even more helpful. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From jtaylor.debian at googlemail.com Thu Jul 31 13:45:57 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 31 Jul 2014 19:45:57 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 beta 2 release In-Reply-To: References: <53D95375.5080707@googlemail.com> Message-ID: <53DA80D5.20909@googlemail.com> On 31.07.2014 05:20, Charles R Harris wrote: > > > I just noticed that we are getting a couple of warnings on some platforms. > ... > > *USE_CHROOT=1 ARCH=i386 DIST=trusty PYTHON=3.4* > > /usr/local/lib/python3.4/dist-packages/numpy/distutils/cpuinfo.py:120: > UserWarning: [Errno 2] No such file or directory: '/proc/cpuinfo' > > warnings.warn(str(e), UserWarning) > > Not sure about the second one. > this should harmless, the chroot we use to test 32 bit here does not have the proc filesystem mounted, we could mount it but this distutils feature should not be relevant for travis. From matthew.brett at gmail.com Thu Jul 31 16:40:21 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 31 Jul 2014 13:40:21 -0700 Subject: [Numpy-discussion] OSX wheels for older numpy versions on pypi In-Reply-To: References: Message-ID: On Thu, Jul 31, 2014 at 4:56 AM, Olivier Grisel wrote: > 2014-07-31 0:52 GMT+02:00 Matthew Brett : >> Hi, >> >> I took the liberty of uploading OSX wheels for some older numpy >> versions to pypi. 
These can be useful for testing, or when building >> your own wheels to be compatible with earlier numpy versions - see: >> >> http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling/18369312#18369312 >> >> There are currently wheels for >> >> numpy 1.5.1 py27 >> numpy 1.6.0 py27 >> numpy 1.6.1 py27 >> numpy 1.7.1 py27, 32, 33, 34 >> >> These are all compiled against ATLAS: >> >> https://github.com/matthew-brett/numpy-atlas-binaries >> >> install with e.g. >> >> pip install numpy==1.6.1 > > Thanks, this is very helpful for project maintainers who have to > switch between versions to reproduce bugs reported by users. > > Do you plan do do the same for scipy? As scipy is even slower to build > that would be even more helpful. Sure, I built and uploaded: scipy-0.12.0 py27 scipy-0.13.0 py27, 33, 34 Are there any others you need? Cheers, Matthew From Catherine.M.Moroney at jpl.nasa.gov Thu Jul 31 18:31:02 2014 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (398D)) Date: Thu, 31 Jul 2014 22:31:02 +0000 Subject: [Numpy-discussion] working with numpy object arrays Message-ID: <5AAFD452-2882-4D7F-883E-C7C3148D882A@jpl.nasa.gov> In the example code below, is it possible to return an array of all the ".a" values of the MyClass objects as stored in the object array "a"? I am successfully able to retrieve the "a" attributes if I loop through the array elements one by one, but I cannot do a whole-array operation to retrieve the "a" attributes. Is there any way to retrieve all the "a" attributes of the MyClass objects all at once, or do I have to loop through all elements of "array" one-by-one? Thanks for any help, Catherine import numpy class MyClass(object): def __init__(self, a): self.a = a def add(self, b, c): self.a += b+c def return_a(self): return self.a array = numpy.empty((2,2), dtype=object) for i in xrange(0, 2): for j in xrange(0, 2): array[i,j] = MyClass(i+j) for i in xrange(0, 2): for j in xrange(0, 2): array[i,j].add(i, j) print "(%i,%i) = %i" % (i, j, array[i,j].a) try: array_a = array[:,:].a print "a values =",array_a except AttributeError: print "Unable to access a attributes of array as a whole." array_a = numpy.empty((2,2)) for i in xrange(0, 2): for j in xrange(0, 2): array_a[i,j] = array[i,j].a print "a values =",array_a From matthew.brett at gmail.com Thu Jul 31 18:55:43 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 31 Jul 2014 15:55:43 -0700 Subject: [Numpy-discussion] OSX wheels for older numpy versions on pypi In-Reply-To: References: Message-ID: On Thu, Jul 31, 2014 at 1:40 PM, Matthew Brett wrote: > On Thu, Jul 31, 2014 at 4:56 AM, Olivier Grisel > wrote: >> 2014-07-31 0:52 GMT+02:00 Matthew Brett : >>> Hi, >>> >>> I took the liberty of uploading OSX wheels for some older numpy >>> versions to pypi. These can be useful for testing, or when building >>> your own wheels to be compatible with earlier numpy versions - see: >>> >>> http://stackoverflow.com/questions/17709641/valueerror-numpy-dtype-has-the-wrong-size-try-recompiling/18369312#18369312 >>> >>> There are currently wheels for >>> >>> numpy 1.5.1 py27 >>> numpy 1.6.0 py27 >>> numpy 1.6.1 py27 >>> numpy 1.7.1 py27, 32, 33, 34 >>> >>> These are all compiled against ATLAS: >>> >>> https://github.com/matthew-brett/numpy-atlas-binaries >>> >>> install with e.g. >>> >>> pip install numpy==1.6.1 >> >> Thanks, this is very helpful for project maintainers who have to >> switch between versions to reproduce bugs reported by users. 
>> >> Do you plan do do the same for scipy? As scipy is even slower to build >> that would be even more helpful. > > Sure, I built and uploaded: > > scipy-0.12.0 py27 > scipy-0.13.0 py27, 33, 34 I uploaded 0.11.0 and 0.10.0 for py27 in the meantime, Cheers, Matthew From charlesr.harris at gmail.com Thu Jul 31 21:27:55 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 31 Jul 2014 19:27:55 -0600 Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ? Message-ID: Hi All, The _inspect.py function looks like a numpy version of the python inspect function. ISTR that is was a work around for problems with the early python versions, but that would have been back in 2009. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL:
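For reference, a small sketch of the standard-library call that _inspect.py appears to mirror (assuming it really is just a trimmed-down copy of the stdlib inspect helpers, as described above), which behaves the same on the Python versions numpy currently supports:

import inspect

def example(a, b=1, *args, **kwargs):
    return a + b

print(inspect.getargspec(example))
# -> ArgSpec(args=['a', 'b'], varargs='args', keywords='kwargs', defaults=(1,))

If nothing in numpy needs more than that, the compat module looks like a reasonable removal candidate.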