From markbak at gmail.com Wed Feb 1 04:25:04 2012 From: markbak at gmail.com (Mark Bakker) Date: Wed, 1 Feb 2012 10:25:04 +0100 Subject: [Numpy-discussion] combination of list of indices and newaxis not allowed? Message-ID: Hello list, I am trying to specify the indices of an array with a list and add a newaxis, but that combination doesn't seem to be allowed. Any reason why? Here's an example: a = arange(3) This works: a[[0,2]][:,newaxis] Out[445]: array([[0], [2]]) This is more elegant syntax (and, I thought, correct), but it doesn't work: a[[0,2],newaxis] --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/mark/ttim/svn/trunk/ in () ----> 1 a[[0,2],newaxis] TypeError: long() argument must be a string or a number, not 'NoneType' -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Wed Feb 1 06:47:56 2012 From: shish at keba.be (Olivier Delalleau) Date: Wed, 1 Feb 2012 06:47:56 -0500 Subject: [Numpy-discussion] combination of list of indices and newaxis not allowed? In-Reply-To: References: Message-ID: I think you just can't use newaxis in advanced indexing (doc says "The newaxisobject can be used in the basic slicing syntax", and does not mention newaxis in the advanced indexing part). -=- Olivier Le 1 f?vrier 2012 04:25, Mark Bakker a ?crit : > Hello list, > > I am trying to specify the indices of an array with a list and add a > newaxis, but that combination doesn't seem to be allowed. Any reason why? > Here's an example: > > a = arange(3) > > This works: > > a[[0,2]][:,newaxis] > Out[445]: > array([[0], > [2]]) > > This is more elegant syntax (and, I thought, correct), but it doesn't work: > > a[[0,2],newaxis] > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > /Users/mark/ttim/svn/trunk/ in () > ----> 1 a[[0,2],newaxis] > > TypeError: long() argument must be a string or a number, not 'NoneType' > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre.haessig at crans.org Wed Feb 1 11:47:42 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Wed, 01 Feb 2012 17:47:42 +0100 Subject: [Numpy-discussion] Cross-covariance function In-Reply-To: <4F22D0C5.2050807@gmail.com> References: <4F2152D6.10303@crans.org> <4F217E82.2050002@crans.org> <4F218CC8.9070602@molden.no> <4F2277F1.4050407@crans.org> <4F22D0C5.2050807@gmail.com> Message-ID: <4F296CAE.8080702@crans.org> Hi Bruce, Sorry for the delay in the answer. Le 27/01/2012 17:28, Bruce Southey a ?crit : > The output is still a covariance so do we really need yet another set > of very similar functions to maintain? > Or can we get away with a new keyword? > The idea of an additional keyword seems appealing. Just to make sure I understood it well, you woud be proposing a new signature like : def cov(.... get_full_cov_matrix=True) and when `get_full_cov_matrix` is set to False, only the cross covariance part would be returned. Am I right ? > If speed really matters to you guys then surely moving np.cov into C > would have more impact on 'saving the world' than this proposal. That > also ignores algorithm used ( > http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance). 
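To make the keyword proposal concrete: the cross-covariance being discussed is just an off-diagonal block of what np.cov already returns when it is given two argument arrays. A rough sketch (the keyword name get_full_cov_matrix quoted above is only the proposal under discussion, not an existing np.cov argument, and the shapes here are invented for illustration):

import numpy as np

x = np.random.randn(3, 100)   # 3 variables, 100 observations each
y = np.random.randn(2, 100)   # 2 further variables, same observations

c = np.cov(x, y)              # full (3+2) x (3+2) covariance matrix
cross_xy = c[:3, 3:]          # only the 3 x 2 cross-covariance block

A dedicated keyword would essentially compute and return only that block, skipping the two unneeded diagonal blocks.
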
> > I didn't get your point about the algorithm here. From this nomenclature, I would say that numpy.cov is based on a vectorized "two-pass algorithm" which computes the means first and then substracts it before computing the matrix product. Would you make it different ? > Actually np.cov also is deficient in that it does not have the dtype > argument so it is prone to numerical precision errors (especially > getting the mean of the array). Probably should be a ticket... I'm not a specialist of numerical precisions, but I got very impressed by the recent example raised on Jan 24th by Michael Aye which was one of the first "real life" example I've seen. The way I see the cov algorithm, I see first a possibility to propagate an optional dtype argument to the mean computation. However, I'm unsure about what to do after, for the matrix product since "dot(X.T, X.conj()) / fact" is also a sort of mean computation. Therefore it can also be affected by numerical precision issue. What would you suggest ? (the only solution I see would be to use the running variance algorithm. Since the code wouldn't be vectorized anymore, this indeed would benefits from going to C) Best, Pierre From mgroszhauser at gmail.com Wed Feb 1 11:53:52 2012 From: mgroszhauser at gmail.com (=?ISO-8859-1?Q?martin_gro=DFhauser?=) Date: Wed, 1 Feb 2012 17:53:52 +0100 Subject: [Numpy-discussion] Broadcasting doesn't work with divide after tile Message-ID: Hello, when I try in my script to divide a masked array by a scalar I get an error. The instruction is: >> sppa = sp / 100. sp is a masked array with ndim = 3. error is: Traceback (most recent call last): File "/media/nethome/Work/workspace/interimEnso/src/mlBudget.py", line 95, in sppa = sp / 100. File "/usr/lib/pymodules/python2.7/numpy/ma/core.py", line 3673, in __div__ return divide(self, other) File "/usr/lib/pymodules/python2.7/numpy/ma/core.py", line 1072, in __call__ m |= ma ValueError: invalid return array shape The interesting thing is that this error only occurs after a tiling instruction: >> sp4d = N.tile(sp, (ninterf, 1, 1, 1)) If I do the division before the tiling I don't get an error. There's also no error if I do the division with N.divide(sp, 100.). Also printing the array sp after tiling doesn't work, while it works before. If I debug the script with eclipse/PyDev, in the variables window I get the message "Unable to get repr for " for the sp array after tiling. The tiling operation shouldn't change the array, should it? Is this a bug, or is it expected behaviour? Regards, Martin Groszhauser If I try to print the From dstn at astro.princeton.edu Wed Feb 1 12:03:58 2012 From: dstn at astro.princeton.edu (Dustin Lang) Date: Wed, 1 Feb 2012 12:03:58 -0500 (EST) Subject: [Numpy-discussion] I must be wrong? -- endian detection failure on Mac OSX 10.5 Message-ID: Hi, I don't really believe this is a numpy bug that hasn't been detected, so it must be something weird about my setup, but I can't figure it out. Here goes. 
The symptom is that while numpy-1.4.1 builds fine, numpy-1.5.0 and later releases fail with: In file included from numpy/core/src/npymath/npy_math.c.src:56: numpy/core/src/npymath/npy_math_private.h:78: error: conflicting types for ieee_double_shape_type numpy/core/src/npymath/npy_math_private.h:64: note: previous declaration of ieee_double_shape_type was here error: Command "gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Inumpy/core/include -Ibuild/src.macosx-10.5-i386-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/local/python-2.7.2/include/python2.7 -Ibuild/src.macosx-10.5-i386-2.7/numpy/core/src/multiarray -Ibuild/src.macosx-10.5-i386-2.7/numpy/core/src/umath -c build/src.macosx-10.5-i386-2.7/numpy/core/src/npymath/npy_math.c -o build/temp.macosx-10.5-i386-2.7/build/src.macosx-10.5-i386-2.7/numpy/core/src/npymath/npy_math.o" failed with exit status 1 The relevant code looks like, #define IEEE_WORD_ORDER NPY_BYTE_ORDER #if IEEE_WORD_ORDER == NPY_BIG_ENDIAN // declare ieee_double_shape_type; #endif #if IEEE_WORD_ORDER == NPY_LITTLE_ENDIAN // declare ieee_double_shape_type; #endif so it looks like both word-order blocks are getting compiled. For the record, including the same header files as the failing code and compiling with the same command-line args I get: LITTLE_ENDIAN is defined: 1234 __LITTLE_ENDIAN is not defined __LITTLE_ENDIAN__ is defined: 1 (by gcc) BIG_ENDIAN is defined: 4321 __BIG_ENDIAN is not defined __BIG_ENDIAN__ is not defined BYTE_ORDER is defined: 1234 __BYTE_ORDER is not defined __BYTE_ORDER__ is not defined NPY_BYTE_ORDER is defined => __BYTE_ORDER NPY_BIG_ENDIAN is defined => __BIG_ENDIAN NPY_LITTLE_ENDIAN is defined => __LITTLE_ENDIAN and NPY_BYTE_ORDER etc are set in npy_endian.h, in this block of code: #ifdef NPY_HAVE_ENDIAN_H /* Use endian.h if available */ #include #define NPY_BYTE_ORDER __BYTE_ORDER #define NPY_LITTLE_ENDIAN __LITTLE_ENDIAN #define NPY_BIG_ENDIAN __BIG_ENDIAN #else (setup.py detected that I do have endian.h: build/src.macosx-10.5-i386-2.7/numpy/core/include/numpy/_numpyconfig.h:#define NPY_HAVE_ENDIAN_H 1 ) So my guess is that npy_endian.h is expecting glibc-style endian.h with __BYTE_ORDER but getting Apple's endian.h with BYTE_ORDER. Then NPY_BYTE_ORDER gets defined to __BYTE_ORDER which is itself not defined. Same with NPY_{BIG,LITTLE}_ENDIAN, and then apparently the two undefined things compare equal in wacky preprocessor land? For what it's worth, in my own codebase I see that I do this: #if \ (defined(__BYTE_ORDER) && (__BYTE_ORDER == __BIG_ENDIAN)) || \ (defined( _BYTE_ORDER) && ( _BYTE_ORDER == _BIG_ENDIAN)) || \ (defined( BYTE_ORDER) && ( BYTE_ORDER == BIG_ENDIAN)) // yup, big-endian #endif This is a Mac OSX 10.5.8 machine, MacBook5,1, Intel Core2 Duo CPU P8600 @ 2.40GHz, gcc 4.4.6 and python 2.7.2 The weirdness on this system is that I installed a gcc with only x86_64 support, while the kernel and uname insist that it's i386, but I don't *think* that's implicated here. cheers, dustin From scipy at samueljohn.de Wed Feb 1 12:13:43 2012 From: scipy at samueljohn.de (Samuel John) Date: Wed, 1 Feb 2012 18:13:43 +0100 Subject: [Numpy-discussion] I must be wrong? -- endian detection failure on Mac OSX 10.5 In-Reply-To: References: Message-ID: <5BF4306B-B56F-4F5B-9D36-09D5E9FF3333@samueljohn.de> Hi! Your Machine should be able to handle at least Mac OS X10.6 and even 10.7. 
If there is not a strong reason to remain on 10.5... 10.5 is so long ago, I can barely remember. cheers, Samuel On 01.02.2012, at 18:03, Dustin Lang wrote: > > Hi, > > I don't really believe this is a numpy bug that hasn't been detected, so > it must be something weird about my setup, but I can't figure it out. > Here goes. > > The symptom is that while numpy-1.4.1 builds fine, numpy-1.5.0 and later > releases fail with: > > In file included from numpy/core/src/npymath/npy_math.c.src:56: > numpy/core/src/npymath/npy_math_private.h:78: error: conflicting types for ieee_double_shape_type > numpy/core/src/npymath/npy_math_private.h:64: note: previous declaration of ieee_double_shape_type was here > error: Command "gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes > -Inumpy/core/include > -Ibuild/src.macosx-10.5-i386-2.7/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > -Inumpy/core/src/umath -Inumpy/core/include > -I/usr/local/python-2.7.2/include/python2.7 > -Ibuild/src.macosx-10.5-i386-2.7/numpy/core/src/multiarray > -Ibuild/src.macosx-10.5-i386-2.7/numpy/core/src/umath -c > build/src.macosx-10.5-i386-2.7/numpy/core/src/npymath/npy_math.c -o > build/temp.macosx-10.5-i386-2.7/build/src.macosx-10.5-i386-2.7/numpy/core/src/npymath/npy_math.o" > failed with exit status 1 > > > The relevant code looks like, > > #define IEEE_WORD_ORDER NPY_BYTE_ORDER > > #if IEEE_WORD_ORDER == NPY_BIG_ENDIAN > // declare ieee_double_shape_type; > #endif > > #if IEEE_WORD_ORDER == NPY_LITTLE_ENDIAN > // declare ieee_double_shape_type; > #endif > > > so it looks like both word-order blocks are getting compiled. > > For the record, including the same header files as the failing code and > compiling with the same command-line args I get: > > LITTLE_ENDIAN is defined: 1234 > __LITTLE_ENDIAN is not defined > __LITTLE_ENDIAN__ is defined: 1 (by gcc) > BIG_ENDIAN is defined: 4321 > __BIG_ENDIAN is not defined > __BIG_ENDIAN__ is not defined > BYTE_ORDER is defined: 1234 > __BYTE_ORDER is not defined > __BYTE_ORDER__ is not defined > NPY_BYTE_ORDER is defined > => __BYTE_ORDER > NPY_BIG_ENDIAN is defined > => __BIG_ENDIAN > NPY_LITTLE_ENDIAN is defined > => __LITTLE_ENDIAN > > and NPY_BYTE_ORDER etc are set in npy_endian.h, in this block of code: > > #ifdef NPY_HAVE_ENDIAN_H > /* Use endian.h if available */ > #include > > #define NPY_BYTE_ORDER __BYTE_ORDER > #define NPY_LITTLE_ENDIAN __LITTLE_ENDIAN > #define NPY_BIG_ENDIAN __BIG_ENDIAN > #else > > (setup.py detected that I do have endian.h: > build/src.macosx-10.5-i386-2.7/numpy/core/include/numpy/_numpyconfig.h:#define NPY_HAVE_ENDIAN_H 1 > ) > > So my guess is that npy_endian.h is expecting glibc-style endian.h with > __BYTE_ORDER but getting Apple's endian.h with BYTE_ORDER. Then > NPY_BYTE_ORDER gets defined to __BYTE_ORDER which is itself not defined. > Same with NPY_{BIG,LITTLE}_ENDIAN, and then apparently the two undefined > things compare equal in wacky preprocessor land? 
> > > For what it's worth, in my own codebase I see that I do this: > > #if \ > (defined(__BYTE_ORDER) && (__BYTE_ORDER == __BIG_ENDIAN)) || \ > (defined( _BYTE_ORDER) && ( _BYTE_ORDER == _BIG_ENDIAN)) || \ > (defined( BYTE_ORDER) && ( BYTE_ORDER == BIG_ENDIAN)) > // yup, big-endian > #endif > > > This is a Mac OSX 10.5.8 machine, MacBook5,1, Intel Core2 Duo CPU P8600 @ > 2.40GHz, gcc 4.4.6 and python 2.7.2 > > The weirdness on this system is that I installed a gcc with only x86_64 > support, while the kernel and uname insist that it's i386, but I don't > *think* that's implicated here. > > > cheers, > dustin > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From aronne.merrelli at gmail.com Wed Feb 1 12:33:21 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 1 Feb 2012 11:33:21 -0600 Subject: [Numpy-discussion] Help with f2py in MacPorts environment Message-ID: Hello, I'm trying to do a simple test with f2py, using the Hermite polynomial example here: http://www.scipy.org/Cookbook/F2Py I cannot figure out how to configure the compile/build commands to work with my system. I'm a novice at this stuff, so please bear with me... I'm running on Mac OS X, and since the Xcode compiler does not include fortran, I have the version 4.4 of gcc installed from MacPorts. So my compilers should be /opt/local/bin/gcc-mp-4.4 and gfortran-mp-4.4, and the associated lib/include directories are also under /opt. I've used this to compile other fortran programs so I am confident it is installed correctly. Now, I downloaded that hermite polynomial code, and the first command (f2py -m hermite -h hermite.pyf hermite.f) runs fine. I'm stuck on the second command that is supposed to compile the code and create the .so file. The keyword arguments to f2py (-f90exec, -compiler) don't seem to help but it appears to read shell variables, so the farthest I could get was to do the following: CC=/opt/local/bin/gcc-mp-4.4 F90=/opt/local/bin/gfortran-mp-4.4 f2py --verbose -c hermite.pyf hermite.f In verbose mode it prints out the contents of the distutils objects, and it looks like it is getting the correct compiler paths, but things are still garbled because certain things appear to still reference the Xcode compiler - here is a piece of the distutils content printed by f2py: ******************************************************************************** distutils.unixccompiler.UnixCCompiler linker_exe = ['/opt/local/bin/gcc-mp-4.4'] compiler_so = ['/opt/local/bin/gcc-mp-4.4', '-fno-strict-aliasing', '-fno-common', '-dynamic', '-arch', 'i386', '-isysroot', '/Developer/SDKs/MacOSX10.5.sdk', '-DNDEBUG', '-g', '-O3', '-arch', 'i386', '-isysroot', '/Developer/SDKs/MacOSX10.5.sdk'] So it has the correct path to the MacPorts gcc but then the flags are not correct since those refer to Xcode; it causes a stop error because gcc-mp-4.4 has no -arch flag. My guess is that I have to figure out how to change those flags - which probably means modifying the distutils calls somehow. Has anyone dealt this issue before, and maybe has some pointers on where to look? It is not clear to me how any of the f2py command line flags will modify these things, and I don't see how to setup my own 'UnixCCompiler' object that has the correct flags. Thanks, Aronne -------------- next part -------------- An HTML attachment was scrubbed... 
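As a quick runtime cross-check (it does not touch the compile-time detection that is failing here, but it confirms what the interpreter and an already-installed NumPy believe about the machine's byte order), something along these lines can be run from Python:

import sys
import numpy as np

print(sys.byteorder)                   # 'little' is expected on this Intel machine
print(np.little_endian)                # True if the installed NumPy was built little-endian
print(np.dtype(np.float64).byteorder)  # '=' means native byte order
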
URL: From pierre.haessig at crans.org Wed Feb 1 13:31:02 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Wed, 01 Feb 2012 19:31:02 +0100 Subject: [Numpy-discussion] autocorrelation computation performance : use of np.correlate Message-ID: <4F2984E6.4070005@crans.org> Hi, [I'm not sure whether this discussion belongs to numpy-discussion or scipy-dev] In day to day time series analysis I regularly need to look at the data autocorrelation ("acorr" or "acf" depending on the software package). The straighforward available function I have is matplotlib.pyplot.acorr. However, for a moderately long time series (say of length 10**5) it taking a huge time just to just dislays the autocorrelation values within a small range of time lags. The main reason being it is relying on np.correlate(x,x, mode=2) while only a few lags are needed. (I guess mode=2 is an (old fashioned?) way to set mode='full') I know that np.correlate performance issue has been discussed already, and there is a *somehow* related ticket (http://projects.scipy.org/numpy/ticket/1260). I noticed in the ticket's change number 2 the following comment by Josef : "Maybe a truncated convolution/correlation would be good". I'll come back to this soon. I made an example script "acf_timing.py" to start my point with some timing data : In Ipython: >>> run acf_timing.py # it imports statsmodel's acf + define 2 other acf implementations + an example data 10**5 samples long %time l,c = mpl_acf(a, 10) CPU times: user 8.69 s, sys: 0.00 s, total: 8.69 s Wall time: 11.18 s # pretty long... %time c = sm_acf(a, 10) CPU times: user 8.76 s, sys: 0.01 s, total: 8.78 s Wall time: 10.79 s # long as well. statsmodel has a similar underlying implementation # http://statsmodels.sourceforge.net/generated/scikits.statsmodels.tsa.stattools.acf.html#scikits.statsmodels.tsa.stattools.acf #Now, better option : use the fft convolution %time c=sm_acf(a, 10,fft=True) CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s Wall time: 0.07 s # Fast, but I'm not sure about the memory implication of using fft though. #The naive option : just compute the acf lags that are needed %time l,c = naive_acf(a, 10) CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s Wall time: 0.01 s # Iterative computation. Pretty silly but very fast # (Now of course, this naive implementation won't scale nicely for a lot of lags) Now comes (at last) the question : what should be done about this performance issue ? - should there be a truncated np.convolve/np.correlate function, as Josef suggested ? - or should people in need of autocorrelation find some workarounds because this usecase is not big enough to call for a change in np.convolve ? I really feel this question is about *where* a change should be implemented (numpy, scipy.signal, maplotlib ?) so that it makes sense while not breaking 10^10 lines of numpy related code... Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: acf_timing.py Type: text/x-python Size: 1780 bytes Desc: not available URL: From mgroszhauser at gmail.com Wed Feb 1 13:55:33 2012 From: mgroszhauser at gmail.com (=?ISO-8859-1?Q?martin_gro=DFhauser?=) Date: Wed, 1 Feb 2012 19:55:33 +0100 Subject: [Numpy-discussion] Broadcasting doesn't work with divide after tile In-Reply-To: References: Message-ID: 2012/2/1 martin gro?hauser : > Hello, > > when I try in my script to divide a masked array by a scalar I get an > error. The instruction is: > >>> sppa = sp / 100. > > sp is a masked array with ndim = 3. 
> > error is: > Traceback (most recent call last): > ?File "/media/nethome/Work/workspace/interimEnso/src/mlBudget.py", > line 95, in > ? ?sppa = sp / 100. > ?File "/usr/lib/pymodules/python2.7/numpy/ma/core.py", line 3673, in __div__ > ? ?return divide(self, other) > ?File "/usr/lib/pymodules/python2.7/numpy/ma/core.py", line 1072, in __call__ > ? ?m |= ma > ValueError: invalid return array shape > > The interesting thing is that this error only occurs after a tiling instruction: >>> ? ? ? ? sp4d = N.tile(sp, (ninterf, 1, 1, 1)) > > If I do the division before the tiling I don't get an error. There's > also no error if I do the division with N.divide(sp, 100.). > > Also printing the array sp after tiling doesn't work, while it works > before. If I debug the script with eclipse/PyDev, in the variables > window I get the message "Unable to get repr for 'numpy.ma.core.MaskedArray'>" for the sp array after tiling. > > The tiling operation shouldn't change the array, should it? Is this a > bug, or is it expected behaviour? > > Regards, > Martin Groszhauser I forgot to mention that I'm using Numpy 1.5.1 from Ubuntu 11.04 (1.5.1-1ubuntu2). From lists at informa.tiker.net Wed Feb 1 14:26:43 2012 From: lists at informa.tiker.net (Andreas Kloeckner) Date: Wed, 01 Feb 2012 14:26:43 -0500 Subject: [Numpy-discussion] Curious behavior of __radd__ Message-ID: <87y5smiccs.fsf@ding.tiker.net> Hi all, here's something I don't understand. Consider the following code snippet: --------------------------------------------------- class A(object): def __radd__(self, other): print(type(other)) import numpy as np np.complex64(1j) + A() --------------------------------------------------- In my world, this should print . It does print . Who is casting my sized complex to a built-in complex, and why? It can be Python's type coercion, because the behavior is the same in Python 3.2. (And the docs say Python 3 doesn't support coercion.) (Please cc me.) Thanks, Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From charlesr.harris at gmail.com Wed Feb 1 14:53:15 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 1 Feb 2012 12:53:15 -0700 Subject: [Numpy-discussion] Curious behavior of __radd__ In-Reply-To: <87y5smiccs.fsf@ding.tiker.net> References: <87y5smiccs.fsf@ding.tiker.net> Message-ID: On Wed, Feb 1, 2012 at 12:26 PM, Andreas Kloeckner wrote: > Hi all, > > here's something I don't understand. Consider the following code snippet: > > --------------------------------------------------- > class A(object): > def __radd__(self, other): > print(type(other)) > > import numpy as np > np.complex64(1j) + A() > --------------------------------------------------- > > In my world, this should print . > It does print . > > Who is casting my sized complex to a built-in complex, and why? > > It can be Python's type coercion, because the behavior is the same in > Python 3.2. (And the docs say Python 3 doesn't support coercion.) > > It gets called once for every scalar in the array, or in your case, just the scalar, and the scalars are converted to python types for the call. At least, that is what it looks like. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
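For anyone who wants to check this behaviour on their own installation, a self-contained rerun of the snippet from the report (the return value is added only so the addition yields something; the original snippet simply returned None):

import numpy as np

class A(object):
    def __radd__(self, other):
        print(type(other))    # the thread reports the built-in complex here, not numpy.complex64
        return "radd called"  # any value will do; only the printed type matters

np.complex64(1j) + A()
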
URL: From ben.root at ou.edu Wed Feb 1 15:09:54 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 1 Feb 2012 14:09:54 -0600 Subject: [Numpy-discussion] Broadcasting doesn't work with divide after tile In-Reply-To: References: Message-ID: 2012/2/1 martin gro?hauser > 2012/2/1 martin gro?hauser : > > Hello, > > > > when I try in my script to divide a masked array by a scalar I get an > > error. The instruction is: > > > >>> sppa = sp / 100. > > > > sp is a masked array with ndim = 3. > > > > error is: > > Traceback (most recent call last): > > File "/media/nethome/Work/workspace/interimEnso/src/mlBudget.py", > > line 95, in > > sppa = sp / 100. > > File "/usr/lib/pymodules/python2.7/numpy/ma/core.py", line 3673, in > __div__ > > return divide(self, other) > > File "/usr/lib/pymodules/python2.7/numpy/ma/core.py", line 1072, in > __call__ > > m |= ma > > ValueError: invalid return array shape > > > > The interesting thing is that this error only occurs after a tiling > instruction: > >>> sp4d = N.tile(sp, (ninterf, 1, 1, 1)) > > > > If I do the division before the tiling I don't get an error. There's > > also no error if I do the division with N.divide(sp, 100.). > > > > Also printing the array sp after tiling doesn't work, while it works > > before. If I debug the script with eclipse/PyDev, in the variables > > window I get the message "Unable to get repr for > 'numpy.ma.core.MaskedArray'>" for the sp array after tiling. > > > > The tiling operation shouldn't change the array, should it? Is this a > > bug, or is it expected behaviour? > > > > Regards, > > Martin Groszhauser > > I forgot to mention that I'm using Numpy 1.5.1 from Ubuntu 11.04 > (1.5.1-1ubuntu2). > I can't reproduce this bug with the latest numpy from github master. Perhaps it has been fixed by now? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Feb 1 15:53:25 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 1 Feb 2012 13:53:25 -0700 Subject: [Numpy-discussion] Heads up and macro deprecation. Message-ID: Hi All, Two things here. 1) Some macros for threading and the iterator now require a trailing semicolon. This change will be reverted before the 1.7 release so that scipy 0.10 will compile, but because it is desirable in the long term it would be helpful if folks maintaining c extensions using numpy would try compiling them against current development and adding the semicolon where needed. The added semicolon will be backward compatible with earlier versions of numpy. 2) It is proposed to deprecate all of the macros in the old_defines.h file and require the use of their replacements. Numpy itself will have made this change after pull-189 is merged and getting rid of the surplus macros will help clean up the historical detritus that has built up over the years, easing maintenance, clarifying code, and making the eventual transition to 2.0 a bit easier. There is a sed script in the tools directory as part of the pull request that can be used to make the needed substitutions. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
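For readers trying to reproduce the divide-after-tile report above, here is a hedged sketch of the sequence martin describes (a 3-d masked array, N.tile, then scalar division). The variable names and shape are invented for illustration, and on current NumPy the division is expected to succeed:

import numpy as N

# 3-d masked array standing in for 'sp' in the report
sp = N.ma.masked_less(N.arange(24, dtype=float).reshape(2, 3, 4), 5.0)

sp4d = N.tile(sp, (2, 1, 1, 1))   # the tiling step from the report

sppa = sp / 100.                  # on numpy 1.5.1 this reportedly raised
                                  # ValueError: invalid return array shape
print(sppa.shape)
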
URL: From pierre.haessig at crans.org Wed Feb 1 16:57:42 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Wed, 01 Feb 2012 22:57:42 +0100 Subject: [Numpy-discussion] Broadcasting doesn't work with divide after tile In-Reply-To: References: Message-ID: <4F29B556.90305@crans.org> Le 01/02/2012 21:09, Benjamin Root a ?crit : > I can't reproduce this bug with the latest numpy from github master. > Perhaps it has been fixed by now? Hi, I've no idea what's going on, but here is my $0.02 contribution. I reproduced the bug (numpy 1.5.1) with a rather minimal script. See attached. Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: ma_tiling_issue.py Type: text/x-python Size: 478 bytes Desc: not available URL: From charlesr.harris at gmail.com Wed Feb 1 18:29:04 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 1 Feb 2012 16:29:04 -0700 Subject: [Numpy-discussion] Documentation question. Message-ID: The macro PyArray_RemoveLargest has been replaced by PyArray_RemoveSmallest (which seems strange), but I wonder if this documentation still makes sense. diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/ c-info.beyond-basics.rs index 9ed2ab3..3437985 100644 --- a/doc/source/user/c-info.beyond-basics.rst +++ b/doc/source/user/c-info.beyond-basics.rst @@ -189,7 +189,7 @@ feature follows. PyArray_MultiIter_NEXT(mobj); } -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to take a multi-iterator object and adjust all the iterators so that iteration does not take place over the largest dimension (it makes that dimension of size 1). The code being looped over that makes use Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Feb 1 18:48:51 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 1 Feb 2012 17:48:51 -0600 Subject: [Numpy-discussion] autocorrelation computation performance : use of np.correlate In-Reply-To: <4F2984E6.4070005@crans.org> References: <4F2984E6.4070005@crans.org> Message-ID: On Wednesday, February 1, 2012, Pierre Haessig wrote: > Hi, > > [I'm not sure whether this discussion belongs to numpy-discussion or scipy-dev] > > In day to day time series analysis I regularly need to look at the data autocorrelation ("acorr" or "acf" depending on the software package). > The straighforward available function I have is matplotlib.pyplot.acorr. However, for a moderately long time series (say of length 10**5) it taking a huge time just to just dislays the autocorrelation values within a small range of time lags. > The main reason being it is relying on np.correlate(x,x, mode=2) while only a few lags are needed. > (I guess mode=2 is an (old fashioned?) way to set mode='full') > > I know that np.correlate performance issue has been discussed already, and there is a *somehow* related ticket ( http://projects.scipy.org/numpy/ticket/1260). I noticed in the ticket's change number 2 the following comment by Josef : "Maybe a truncated convolution/correlation would be good". I'll come back to this soon. > > I made an example script "acf_timing.py" to start my point with some timing data : > > In Ipython: >>>> run acf_timing.py # it imports statsmodel's acf + define 2 other acf implementations + an example data 10**5 samples long > > %time l,c = mpl_acf(a, 10) > CPU times: user 8.69 s, sys: 0.00 s, total: 8.69 s > Wall time: 11.18 s # pretty long... 
> > %time c = sm_acf(a, 10) > CPU times: user 8.76 s, sys: 0.01 s, total: 8.78 s > Wall time: 10.79 s # long as well. statsmodel has a similar underlying implementation > # http://statsmodels.sourceforge.net/generated/scikits.statsmodels.tsa.stattools.acf.html#scikits.statsmodels.tsa.stattools.acf > > #Now, better option : use the fft convolution > %time c=sm_acf(a, 10,fft=True) > CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s > Wall time: 0.07 s > # Fast, but I'm not sure about the memory implication of using fft though. > > #The naive option : just compute the acf lags that are needed > %time l,c = naive_acf(a, 10) > CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s > Wall time: 0.01 s > # Iterative computation. Pretty silly but very fast > # (Now of course, this naive implementation won't scale nicely for a lot of lags) > > Now comes (at last) the question : what should be done about this performance issue ? > - should there be a truncated np.convolve/np.correlate function, as Josef suggested ? > - or should people in need of autocorrelation find some workarounds because this usecase is not big enough to call for a change in np.convolve ? > > I really feel this question is about *where* a change should be implemented (numpy, scipy.signal, maplotlib ?) so that it makes sense while not breaking 10^10 lines of numpy related code... > > Best, > Pierre > > Speaking for matplotlib, the acorr() (and xcorr()) functions in mpl are merely a convenience. The proper place for any change would not be mpl (although, we would certainly take advantage of any improved acorr() and xcorr() that are made available in numpy. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 1 19:58:28 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 1 Feb 2012 19:58:28 -0500 Subject: [Numpy-discussion] autocorrelation computation performance : use of np.correlate In-Reply-To: References: <4F2984E6.4070005@crans.org> Message-ID: On Wed, Feb 1, 2012 at 6:48 PM, Benjamin Root wrote: > > > On Wednesday, February 1, 2012, Pierre Haessig > wrote: >> Hi, >> >> [I'm not sure whether this discussion belongs to numpy-discussion or >> scipy-dev] >> >> In day to day time series analysis I regularly need to look at the data >> autocorrelation ("acorr" or "acf" depending on the software package). >> The straighforward available function I have is matplotlib.pyplot.acorr. >> However, for a moderately long time series (say of length 10**5) it taking a >> huge time just to just dislays the autocorrelation values within a small >> range of time lags. >> The main reason being it is relying on np.correlate(x,x, mode=2) while >> only a few lags are needed. >> (I guess mode=2 is an (old fashioned?) way to set mode='full') >> >> I know that np.correlate performance issue has been discussed already, and >> there is a *somehow* related ticket >> (http://projects.scipy.org/numpy/ticket/1260). I noticed in the ticket's >> change number 2 the following comment by Josef : "Maybe a truncated >> convolution/correlation would be good". I'll come back to this soon. >> >> I made an example script "acf_timing.py" to start my point with some >> timing data : >> >> In Ipython: >>>>> run acf_timing.py # it imports statsmodel's acf + define 2 other acf >>>>> implementations + an example data 10**5 samples long >> >> %time l,c = mpl_acf(a, 10) >> CPU times: user 8.69 s, sys: 0.00 s, total: 8.69 s >> Wall time: 11.18 s # pretty long... 
>> >> ?%time c = sm_acf(a, 10) >> CPU times: user 8.76 s, sys: 0.01 s, total: 8.78 s >> Wall time: 10.79 s # long as well. statsmodel has a similar underlying >> implementation >> # >> http://statsmodels.sourceforge.net/generated/scikits.statsmodels.tsa.stattools.acf.html#scikits.statsmodels.tsa.stattools.acf >> >> #Now, better option : use the fft convolution >> %time c=sm_acf(a, 10,fft=True) >> CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s >> Wall time: 0.07 s >> # Fast, but I'm not sure about the memory implication of using fft though. >> >> #The naive option : just compute the acf lags that are needed >> %time l,c = naive_acf(a, 10) >> CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s >> Wall time: 0.01 s >> # Iterative computation. Pretty silly but very fast >> # (Now of course, this naive implementation won't scale nicely for a lot >> of lags) I don't think it's silly to have a short python loop, statsmodels actually uses the loop in the models, for example in yule_walker (and GLSAR), because in most statistical application I wouldn't expect a large number of lags. The time series models don't use the acov directly, but I think most of the time we just loop over the lags. >> >> Now comes (at last) the question : what should be done about this >> performance issue ? >> ?- should there be a truncated np.convolve/np.correlate function, as Josef >> suggested ? >> ?- or should people in need of autocorrelation find some workarounds >> because this usecase is not big enough to call for a change in np.convolve ? >> >> I really feel this question is about *where* a change should be >> implemented ?(numpy, scipy.signal, maplotlib ?) so that it makes sense while >> not breaking 10^10 lines of numpy related code... >> >> Best, >> Pierre >> >> > > Speaking for matplotlib, the acorr() (and xcorr()) functions in mpl are > merely a convenience. ?The proper place for any change would not be mpl > (although, we would certainly take advantage of any improved acorr() and > xcorr() that are made available in numpy. I also think that numpy or scipy would be the natural candidates for a correlate that works fast for an intermediate number of desired lags (but still short compared to length of data). Josef > > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mwwiebe at gmail.com Wed Feb 1 20:04:13 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 1 Feb 2012 17:04:13 -0800 Subject: [Numpy-discussion] Documentation question. In-Reply-To: References: Message-ID: On Wed, Feb 1, 2012 at 3:29 PM, Charles R Harris wrote: > The macro PyArray_RemoveLargest has been replaced by > PyArray_RemoveSmallest (which seems strange), but I wonder if this > documentation still makes sense. > My impression about this code is that it went through a number of rounds trying to choose an iteration order heuristic that has improved performance over C-order. The change of Largest to Smallest probably reflects one of these heuristic changes. I think it's safe to say that the nditer introduced in 1.6 completely removes the need for this functionality. I did a grep for this function in the master branch, and it is no longer used by NumPy internally. 
-Mark > diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/ > c-info.beyond-basics.rs > index 9ed2ab3..3437985 100644 > --- a/doc/source/user/c-info.beyond-basics.rst > +++ b/doc/source/user/c-info.beyond-basics.rst > @@ -189,7 +189,7 @@ feature follows. > PyArray_MultiIter_NEXT(mobj); > } > > -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to > +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to > take a multi-iterator object and adjust all the iterators so that > iteration does not take place over the largest dimension (it makes > that dimension of size 1). The code being looped over that makes use > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Feb 1 20:49:26 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 1 Feb 2012 17:49:26 -0800 Subject: [Numpy-discussion] combination of list of indices and newaxis not allowed? In-Reply-To: References: Message-ID: On Wed, Feb 1, 2012 at 3:47 AM, Olivier Delalleau wrote: > I think you just can't use newaxis in advanced indexing (doc says "The > newaxis object can be used in the basic slicing syntax", and does not > mention newaxis in the advanced indexing part). Yes, with fancy indexing the two arguments need to be broadcast together, and this fails for [0, 2] and None (newaxis is simply the None object). St?fan From travis at continuum.io Wed Feb 1 21:14:48 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 1 Feb 2012 20:14:48 -0600 Subject: [Numpy-discussion] Documentation question. In-Reply-To: References: Message-ID: <8C03579A-2F62-43FE-99EE-D536066B02CA@continuum.io> On Feb 1, 2012, at 7:04 PM, Mark Wiebe wrote: > On Wed, Feb 1, 2012 at 3:29 PM, Charles R Harris wrote: > The macro PyArray_RemoveLargest has been replaced by PyArray_RemoveSmallest (which seems strange), but I wonder if this documentation still makes sense. > > My impression about this code is that it went through a number of rounds trying to choose an iteration order heuristic that has improved performance over C-order. The change of Largest to Smallest probably reflects one of these heuristic changes. I think it's safe to say that the nditer introduced in 1.6 completely removes the need for this functionality. I did a grep for this function in the master branch, and it is no longer used by NumPy internally. There is a common need to iterate over all but one dimension of a NumPy array. The final dimension is iterated over in an "internal" loop. This is the essence of how ufuncs work and avoid the possibly expensive overhead of a C-call during each iteration. Initially, it seemed prudent to remove the dimension that had the largest size (so that the final inner-iteration was the largest number). Later, timings revealed that that the 'inner' dimension should be the one with the smallest *striding*. I have not looked at nditer in detail, but would appreciate seeing an explanation of how the nditer approach removes the need for this macro. When that is clear, then this macro can and should be deprecated. 
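For readers following along from Python, the nditer exposes the same inner-loop idea through a flag; a minimal sketch:

import numpy as np

a = np.arange(6).reshape(2, 3)

# With 'external_loop' the iterator hands back the largest contiguous 1-d
# chunks it can instead of single elements -- the Python-level counterpart
# of the C flag NPY_ITER_EXTERNAL_LOOP named in the reply below.
for chunk in np.nditer(a, flags=['external_loop'], order='C'):
    print(chunk)   # for this contiguous array: a single chunk, [0 1 2 3 4 5]
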
-Travis > > -Mark > > diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/c-info.beyond-basics.rs > index 9ed2ab3..3437985 100644 > --- a/doc/source/user/c-info.beyond-basics.rst > +++ b/doc/source/user/c-info.beyond-basics.rst > @@ -189,7 +189,7 @@ feature follows. > PyArray_MultiIter_NEXT(mobj); > } > > -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to > +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to > take a multi-iterator object and adjust all the iterators so that > iteration does not take place over the largest dimension (it makes > that dimension of size 1). The code being looped over that makes use > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Feb 1 21:31:13 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 1 Feb 2012 18:31:13 -0800 Subject: [Numpy-discussion] Documentation question. In-Reply-To: <8C03579A-2F62-43FE-99EE-D536066B02CA@continuum.io> References: <8C03579A-2F62-43FE-99EE-D536066B02CA@continuum.io> Message-ID: On Wed, Feb 1, 2012 at 6:14 PM, Travis Oliphant wrote: > > On Feb 1, 2012, at 7:04 PM, Mark Wiebe wrote: > > On Wed, Feb 1, 2012 at 3:29 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> The macro PyArray_RemoveLargest has been replaced by >> PyArray_RemoveSmallest (which seems strange), but I wonder if this >> documentation still makes sense. >> > > My impression about this code is that it went through a number of rounds > trying to choose an iteration order heuristic that has improved performance > over C-order. The change of Largest to Smallest probably reflects one of > these heuristic changes. I think it's safe to say that the nditer > introduced in 1.6 completely removes the need for this functionality. I did > a grep for this function in the master branch, and it is no longer used by > NumPy internally. > > > There is a common need to iterate over all but one dimension of a NumPy > array. The final dimension is iterated over in an "internal" loop. This > is the essence of how ufuncs work and avoid the possibly expensive overhead > of a C-call during each iteration. > This use-case is handled by the flag NPY_ITER_EXTERNAL_LOOP ( http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NPY_ITER_EXTERNAL_LOOP) in the nditer. > Initially, it seemed prudent to remove the dimension that had the largest > size (so that the final inner-iteration was the largest number). Later, > timings revealed that that the 'inner' dimension should be the one with the > smallest *striding*. I have not looked at nditer in detail, but would > appreciate seeing an explanation of how the nditer approach removes the > need for this macro. When that is clear, then this macro can and should > be deprecated. 
> To see the full list of what to use in the nditer versus the older iterators, I created a table: http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#converting-from-previous-numpy-iterators Only PyArray_BroadcastToShape and PyArray_MultiIter_NEXTi don't have a nice correspondence, because they refer to implementation details in the previous iterators which are done differently in the nditer. -Mark > > -Travis > > > > > > -Mark > > >> diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/ >> c-info.beyond-basics.rs >> index 9ed2ab3..3437985 100644 >> --- a/doc/source/user/c-info.beyond-basics.rst >> +++ b/doc/source/user/c-info.beyond-basics.rst >> @@ -189,7 +189,7 @@ feature follows. >> PyArray_MultiIter_NEXT(mobj); >> } >> >> -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to >> +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to >> take a multi-iterator object and adjust all the iterators so that >> iteration does not take place over the largest dimension (it makes >> that dimension of size 1). The code being looped over that makes use >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Feb 1 21:42:41 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 1 Feb 2012 20:42:41 -0600 Subject: [Numpy-discussion] Documentation question. In-Reply-To: References: <8C03579A-2F62-43FE-99EE-D536066B02CA@continuum.io> Message-ID: <66E561C8-A77B-4D4F-8676-310BB1B0E0D2@continuum.io> Thanks! What a great doc page. -Travis On Feb 1, 2012, at 8:31 PM, Mark Wiebe wrote: > On Wed, Feb 1, 2012 at 6:14 PM, Travis Oliphant wrote: > > On Feb 1, 2012, at 7:04 PM, Mark Wiebe wrote: > >> On Wed, Feb 1, 2012 at 3:29 PM, Charles R Harris wrote: >> The macro PyArray_RemoveLargest has been replaced by PyArray_RemoveSmallest (which seems strange), but I wonder if this documentation still makes sense. >> >> My impression about this code is that it went through a number of rounds trying to choose an iteration order heuristic that has improved performance over C-order. The change of Largest to Smallest probably reflects one of these heuristic changes. I think it's safe to say that the nditer introduced in 1.6 completely removes the need for this functionality. I did a grep for this function in the master branch, and it is no longer used by NumPy internally. > > There is a common need to iterate over all but one dimension of a NumPy array. The final dimension is iterated over in an "internal" loop. This is the essence of how ufuncs work and avoid the possibly expensive overhead of a C-call during each iteration. > > This use-case is handled by the flag NPY_ITER_EXTERNAL_LOOP (http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NPY_ITER_EXTERNAL_LOOP) in the nditer. > > Initially, it seemed prudent to remove the dimension that had the largest size (so that the final inner-iteration was the largest number). 
Later, timings revealed that that the 'inner' dimension should be the one with the smallest *striding*. I have not looked at nditer in detail, but would appreciate seeing an explanation of how the nditer approach removes the need for this macro. When that is clear, then this macro can and should be deprecated. > > To see the full list of what to use in the nditer versus the older iterators, I created a table: > > http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#converting-from-previous-numpy-iterators > > Only PyArray_BroadcastToShape and PyArray_MultiIter_NEXTi don't have a nice correspondence, because they refer to implementation details in the previous iterators which are done differently in the nditer. > > -Mark > > > -Travis > > > > >> >> -Mark >> >> diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/c-info.beyond-basics.rs >> index 9ed2ab3..3437985 100644 >> --- a/doc/source/user/c-info.beyond-basics.rst >> +++ b/doc/source/user/c-info.beyond-basics.rst >> @@ -189,7 +189,7 @@ feature follows. >> PyArray_MultiIter_NEXT(mobj); >> } >> >> -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to >> +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to >> take a multi-iterator object and adjust all the iterators so that >> iteration does not take place over the largest dimension (it makes >> that dimension of size 1). The code being looped over that makes use >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Feb 1 22:13:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 1 Feb 2012 21:13:30 -0600 Subject: [Numpy-discussion] Documentation question. In-Reply-To: References: <8C03579A-2F62-43FE-99EE-D536066B02CA@continuum.io> Message-ID: <5CEBD114-7FE5-4ED1-83FD-19A2D24D6B4D@continuum.io> Hey Mark, I spent some quality time with your iterator docs tonight and look forward to getting into the code a bit more soon. I wanted to get your general impressions about what it would take to extend the iterator API to handle iterating over "regions" of the inputs --- i.e. to support generalized ufuncs. Also, on my todo list is to compare generalized ufuncs with "threading" in PDL (Perl Data Language) and see if we can support that in NumPy. Threading is the word that PDL uses to describe "broadcasting" --- but it does more than ufuncs. Here is some information on it. 
http://marketingstartups.com/2009/05/28/the-first-six-steps-of-getting-your-startup-noticed/ -Travis On Feb 1, 2012, at 8:31 PM, Mark Wiebe wrote: > On Wed, Feb 1, 2012 at 6:14 PM, Travis Oliphant wrote: > > On Feb 1, 2012, at 7:04 PM, Mark Wiebe wrote: > >> On Wed, Feb 1, 2012 at 3:29 PM, Charles R Harris wrote: >> The macro PyArray_RemoveLargest has been replaced by PyArray_RemoveSmallest (which seems strange), but I wonder if this documentation still makes sense. >> >> My impression about this code is that it went through a number of rounds trying to choose an iteration order heuristic that has improved performance over C-order. The change of Largest to Smallest probably reflects one of these heuristic changes. I think it's safe to say that the nditer introduced in 1.6 completely removes the need for this functionality. I did a grep for this function in the master branch, and it is no longer used by NumPy internally. > > There is a common need to iterate over all but one dimension of a NumPy array. The final dimension is iterated over in an "internal" loop. This is the essence of how ufuncs work and avoid the possibly expensive overhead of a C-call during each iteration. > > This use-case is handled by the flag NPY_ITER_EXTERNAL_LOOP (http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NPY_ITER_EXTERNAL_LOOP) in the nditer. > > Initially, it seemed prudent to remove the dimension that had the largest size (so that the final inner-iteration was the largest number). Later, timings revealed that that the 'inner' dimension should be the one with the smallest *striding*. I have not looked at nditer in detail, but would appreciate seeing an explanation of how the nditer approach removes the need for this macro. When that is clear, then this macro can and should be deprecated. > > To see the full list of what to use in the nditer versus the older iterators, I created a table: > > http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#converting-from-previous-numpy-iterators > > Only PyArray_BroadcastToShape and PyArray_MultiIter_NEXTi don't have a nice correspondence, because they refer to implementation details in the previous iterators which are done differently in the nditer. > > -Mark > > > -Travis > > > > >> >> -Mark >> >> diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/c-info.beyond-basics.rs >> index 9ed2ab3..3437985 100644 >> --- a/doc/source/user/c-info.beyond-basics.rst >> +++ b/doc/source/user/c-info.beyond-basics.rst >> @@ -189,7 +189,7 @@ feature follows. >> PyArray_MultiIter_NEXT(mobj); >> } >> >> -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to >> +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to >> take a multi-iterator object and adjust all the iterators so that >> iteration does not take place over the largest dimension (it makes >> that dimension of size 1). 
The code being looped over that makes use >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Feb 1 22:45:18 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 1 Feb 2012 21:45:18 -0600 Subject: [Numpy-discussion] Documentation question. In-Reply-To: References: <8C03579A-2F62-43FE-99EE-D536066B02CA@continuum.io> Message-ID: <13D57887-3C39-4271-A69E-991D156AFE88@continuum.io> I believe I sent the wrong link to PDL, threading, earlier.... http://www.johnlapeyre.com/pdl/pdldoc/newbook/node5.html#SECTION005100000000000000000 Is the explanation of threading in Perl that I was reading about. I have read about Threading earlier and really liked it's generality. I would be interested to see what we can support in NumPy. I think there is overlap with generalized functions but they are not the same thing. I had sent the other link awhile ago to a different set of people :-) but I guess my Ctrl-V didn't quite work.... Sorry about that noise. -Travis On Feb 1, 2012, at 8:31 PM, Mark Wiebe wrote: > On Wed, Feb 1, 2012 at 6:14 PM, Travis Oliphant wrote: > > On Feb 1, 2012, at 7:04 PM, Mark Wiebe wrote: > >> On Wed, Feb 1, 2012 at 3:29 PM, Charles R Harris wrote: >> The macro PyArray_RemoveLargest has been replaced by PyArray_RemoveSmallest (which seems strange), but I wonder if this documentation still makes sense. >> >> My impression about this code is that it went through a number of rounds trying to choose an iteration order heuristic that has improved performance over C-order. The change of Largest to Smallest probably reflects one of these heuristic changes. I think it's safe to say that the nditer introduced in 1.6 completely removes the need for this functionality. I did a grep for this function in the master branch, and it is no longer used by NumPy internally. > > There is a common need to iterate over all but one dimension of a NumPy array. The final dimension is iterated over in an "internal" loop. This is the essence of how ufuncs work and avoid the possibly expensive overhead of a C-call during each iteration. > > This use-case is handled by the flag NPY_ITER_EXTERNAL_LOOP (http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NPY_ITER_EXTERNAL_LOOP) in the nditer. > > Initially, it seemed prudent to remove the dimension that had the largest size (so that the final inner-iteration was the largest number). Later, timings revealed that that the 'inner' dimension should be the one with the smallest *striding*. I have not looked at nditer in detail, but would appreciate seeing an explanation of how the nditer approach removes the need for this macro. When that is clear, then this macro can and should be deprecated. 
> > To see the full list of what to use in the nditer versus the older iterators, I created a table: > > http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#converting-from-previous-numpy-iterators > > Only PyArray_BroadcastToShape and PyArray_MultiIter_NEXTi don't have a nice correspondence, because they refer to implementation details in the previous iterators which are done differently in the nditer. > > -Mark > > > -Travis > > > > >> >> -Mark >> >> diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/c-info.beyond-basics.rs >> index 9ed2ab3..3437985 100644 >> --- a/doc/source/user/c-info.beyond-basics.rst >> +++ b/doc/source/user/c-info.beyond-basics.rst >> @@ -189,7 +189,7 @@ feature follows. >> PyArray_MultiIter_NEXT(mobj); >> } >> >> -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to >> +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to >> take a multi-iterator object and adjust all the iterators so that >> iteration does not take place over the largest dimension (it makes >> that dimension of size 1). The code being looped over that makes use >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Feb 2 00:58:58 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 1 Feb 2012 23:58:58 -0600 Subject: [Numpy-discussion] Curious behavior of __radd__ In-Reply-To: <87y5smiccs.fsf@ding.tiker.net> References: <87y5smiccs.fsf@ding.tiker.net> Message-ID: This seems odd to me. Unraveling what is going on (so far): Let a = np.complex64(1j) and b = A() * np.complex64.__add__ is calling np.add * np.add(a, b) needs to search for an "add" loop that matches the input types and it finds one with signature ('O', 'O') -> 'O' * a is converted to an array via the equivalent of a1 = array(a,'O') * b is converted to an array in the same way b1 = array(b, 'O') Somehow a1 as an array of objects is an array of "complex" types instead of "complex64" types Then the equivalent of a1[()] + b1[()] is called. So, the conversion is being done by a1 = array(a, 'O') I don't know why this is. This seems like a regression, but I don't have an old version of NumPy to check. compare: type(a) type(np.array(a,'O')[()]) These should be the same type. But they are not... -Travis On Feb 1, 2012, at 1:26 PM, Andreas Kloeckner wrote: > Hi all, > > here's something I don't understand. Consider the following code snippet: > > --------------------------------------------------- > class A(object): > def __radd__(self, other): > print(type(other)) > > import numpy as np > np.complex64(1j) + A() > --------------------------------------------------- > > In my world, this should print . > It does print . > > Who is casting my sized complex to a built-in complex, and why? 
> > It can be Python's type coercion, because the behavior is the same in > Python 3.2. (And the docs say Python 3 doesn't support coercion.) > > (Please cc me.) > > Thanks, > Andreas > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From warren.weckesser at enthought.com Thu Feb 2 01:04:11 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 2 Feb 2012 00:04:11 -0600 Subject: [Numpy-discussion] ufunc delegation to object method In-Reply-To: References: Message-ID: Bump... On Mon, Jan 30, 2012 at 1:17 AM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > In the following code, numpy.sin() calls the object's sin() function: > > In [2]: class Foo(object): > ...: def sin(self): > ...: return "spam" > ...: > > In [3]: f = Foo() > > In [4]: np.sin(f) > Out[4]: 'spam' > > Is this, in fact, guaranteed behavior for a ufunc? It does not appear to > be documented. > > This question came up in the discussion of SciPy pull request 138 ( > https://github.com/scipy/scipy/pull/138), where the idea is to add numpy > unary ufunc support to SciPy's sparse arrays. > > (Sorry if this email shows up twice. I sent it the first time while the > Enthought servers were down, and eventually got an email back saying it had > not been sent.) > > Warren > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Feb 2 01:21:10 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 2 Feb 2012 00:21:10 -0600 Subject: [Numpy-discussion] ufunc delegation to object method In-Reply-To: References: Message-ID: Yes. This is the behavior. It was part of the original Numeric implementation. In the code generator file: numpy/core/code_generators/generate_umath.py ufuncs with a registered type of 'P' have this behavior. There is a long list of them. -Travis On Feb 2, 2012, at 12:04 AM, Warren Weckesser wrote: > Bump... > > On Mon, Jan 30, 2012 at 1:17 AM, Warren Weckesser wrote: > In the following code, numpy.sin() calls the object's sin() function: > > In [2]: class Foo(object): > ...: def sin(self): > ...: return "spam" > ...: > > In [3]: f = Foo() > > In [4]: np.sin(f) > Out[4]: 'spam' > > Is this, in fact, guaranteed behavior for a ufunc? It does not appear to be documented. > > This question came up in the discussion of SciPy pull request 138 (https://github.com/scipy/scipy/pull/138), where the idea is to add numpy unary ufunc support to SciPy's sparse arrays. > > (Sorry if this email shows up twice. I sent it the first time while the Enthought servers were down, and eventually got an email back saying it had not been sent.) > > Warren > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Thu Feb 2 01:46:29 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 2 Feb 2012 00:46:29 -0600 Subject: [Numpy-discussion] ufunc delegation to object method In-Reply-To: References: Message-ID: On Thu, Feb 2, 2012 at 12:21 AM, Travis Oliphant wrote: > Yes. This is the behavior. It was part of the original Numeric > implementation. 
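A small sketch of the same delegation happening element-wise (reusing the Foo class from the snippet above; which ufuncs do this is exactly what the generator file mentioned next spells out):

import numpy as np

class Foo(object):
    def sin(self):
        return "spam"

# With an object-dtype array, np.sin falls back to calling each
# element's own sin() method instead of a numeric loop.
objs = np.array([Foo(), Foo()], dtype=object)
print(np.sin(objs))   # an object array of 'spam' strings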
In the code generator file: > > numpy/core/code_generators/generate_umath.py > > ufuncs with a registered type of 'P' have this behavior. There is a long > list of them. > > -Travis > > Great, thanks. Warren > > On Feb 2, 2012, at 12:04 AM, Warren Weckesser wrote: > Bump... > > On Mon, Jan 30, 2012 at 1:17 AM, Warren Weckesser < > warren.weckesser at enthought.com> wrote: > >> In the following code, numpy.sin() calls the object's sin() function: >> >> In [2]: class Foo(object): >> ...: def sin(self): >> ...: return "spam" >> ...: >> >> In [3]: f = Foo() >> >> In [4]: np.sin(f) >> Out[4]: 'spam' >> >> Is this, in fact, guaranteed behavior for a ufunc? It does not appear to >> be documented. >> >> This question came up in the discussion of SciPy pull request 138 ( >> https://github.com/scipy/scipy/pull/138), where the idea is to add numpy >> unary ufunc support to SciPy's sparse arrays. >> >> (Sorry if this email shows up twice. I sent it the first time while the >> Enthought servers were down, and eventually got an email back saying it had >> not been sent.) >> >> Warren >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Feb 2 02:50:43 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 2 Feb 2012 01:50:43 -0600 Subject: [Numpy-discussion] Curious behavior of __radd__ In-Reply-To: <87y5smiccs.fsf@ding.tiker.net> References: <87y5smiccs.fsf@ding.tiker.net> Message-ID: <5D47EB77-9009-494A-89D0-77140503C760@continuum.io> Hey Andreas, As previously described: what changes the type of np.complex64(1j) during the A() call is that when a is an array scalar it is converted to an object array because that is the only signature that matches. During this conversion, what is extracted from the object array is piped through the equivalent of .item() which tries to map to a standard Python object. It must be different to break the infinite recursion that would otherwise get set up, as the ufunc registered for "Object" loops just calls PyNumber_Add with the extracted objects. If the extracted object were again the original array scalar, its __add__ method would be called, which would just call back into the ufunc machinery setting up the cycle again... I've confirmed this has been the behavior since at least 1.4.x. So, it's not a regression. It is actually intentional, to avoid the recursion. The proper fix is to add actual scalar math methods instead of re-using the ufunc machinery for the array scalars (this would be much faster as well...) -Travis On Feb 1, 2012, at 1:26 PM, Andreas Kloeckner wrote: > Hi all, > > here's something I don't understand. Consider the following code snippet: > > --------------------------------------------------- > class A(object): > def __radd__(self, other): > print(type(other)) > > import numpy as np > np.complex64(1j) + A() > --------------------------------------------------- > > In my world, this should print . > It does print . > > Who is casting my sized complex to a built-in complex, and why? > > It can't be Python's type coercion, because the behavior is the same in > Python 3.2. (And the docs say Python 3 doesn't support coercion.)
> > (Please cc me.) > > Thanks, > Andreas > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mgroszhauser at gmail.com Thu Feb 2 03:52:38 2012 From: mgroszhauser at gmail.com (=?ISO-8859-1?Q?martin_gro=DFhauser?=) Date: Thu, 2 Feb 2012 09:52:38 +0100 Subject: [Numpy-discussion] Broadcasting doesn't work with divide after tile In-Reply-To: <4F29B556.90305@crans.org> References: <4F29B556.90305@crans.org> Message-ID: On Wed, Feb 1, 2012 at 10:57 PM, Pierre Haessig wrote: > I've no idea what's going on, but here is my $0.02 contribution. I > reproduced the bug (numpy 1.5.1) with a rather minimal script. See attached. I reproduced the issue with Pierre's script also in numpy 1.6.1 and latest github (2.0.0.dev-b8bfcd0). In newer versions the error message is: Traceback (most recent call last): File "ma_tiling_issue.py", line 18, in a/100. #raises ValueError: invalid return array shape File "/usr/local/lib/python2.7/dist-packages/numpy/ma/core.py", line 3654, in __div__ return divide(self, other) File "/usr/local/lib/python2.7/dist-packages/numpy/ma/core.py", line 1078, in __call__ m |= ma ValueError: non-broadcastable output operand with shape (3,3,3) doesn't match the broadcast shape (1,3,3,3) I still don't know what's going on. Is the internal representation (shape) of the array changed by the tile instruction? I created a ticket: http://projects.scipy.org/numpy/ticket/2035 From madsipsen at gmail.com Thu Feb 2 04:40:14 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Thu, 02 Feb 2012 10:40:14 +0100 Subject: [Numpy-discussion] Unexpected reorganization of internal data In-Reply-To: References: <4F27A663.4000403@gmail.com> Message-ID: <4F2A59FE.3030908@gmail.com> On 31/01/2012 18:23, Chris Barker wrote: > On Tue, Jan 31, 2012 at 6:14 AM, Malcolm Reynolds > wrote: >> Not exactly an answer to your question, but I can highly recommend >> using Boost.python, PyUblas and Ublas for your C++ vectors and >> matrices. It gives you a really good interface on the C++ side to >> numpy arrays and matrices, which can be passed in both directions over >> the language threshold with no copying. > or use Cython... > >> If I had to guess I'd say sometimes when transposing numpy simply sets >> a flag internally to avoid copying the data, but in some cases (such >> as perhaps when multiplication needs to take place) the data has to be >> placed in a new object. > good guess: > >> V = numpy.dot(R, U.transpose()).transpose() >>>> a > array([[1, 2], > [3, 4], > [5, 6]]) >>>> a.flags > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > >>>> b = a.transpose() >>>> b.flags > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > so the transpose() simple re-arranges the strides to Fortran order, > rather than changing anything in memory. > > np.dot() produces a new array, so it is C-contiguous, then you > transpose it, so you get a fortran-ordered array. > >> Now when I call my C++ function from the Python side, all the data in V is printed, but it has been transposed. > as mentioned, if you are working with arrays in C++ (or fortran, orC, > or...) and need to count on the ordering of the data, you need to > check it in your extension code. There are utilities for this. 
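From the Python side, a small guard along these lines (just a sketch; np.ascontiguousarray only copies when the layout actually requires it) is one way to avoid surprises before handing the buffer to C or C++:

import numpy as np

def as_c_contiguous(arr):
    # No copy if the array is already C-ordered; otherwise make one.
    if not arr.flags['C_CONTIGUOUS']:
        arr = np.ascontiguousarray(arr)
    return arr

U = np.arange(12.0).reshape(3, 4)
V = U.transpose()                                  # Fortran-ordered view, no copy
print(V.flags['C_CONTIGUOUS'])                     # False
print(as_c_contiguous(V).flags['C_CONTIGUOUS'])    # True, data was copied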
> >> However, if I do: >> V = numpy.array(U.transpose()).transpose() > right: > > In [7]: a.flags > Out[7]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [8]: a.transpose().flags > Out[8]: > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : False > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > In [9]: np.array( a.transpose() ).flags > Out[9]: > C_CONTIGUOUS : False > F_CONTIGUOUS : True > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > > so the np.array call doesn't re-arrange the order if it doesn't need > to. If you want to force it, you can specify the order: > > In [10]: np.array( a.transpose(), order='C' ).flags > Out[10]: > C_CONTIGUOUS : True > F_CONTIGUOUS : False > OWNDATA : True > WRITEABLE : True > ALIGNED : True > UPDATEIFCOPY : False > > > (note: this does surprise me a bit, as it is making a copy, but there > you go -- if order matters, specify it) > > In general, numpy does a lot of things for the sake of efficiency -- > avoiding copies when it can, for instance -- this give efficiency and > flexibility, but you do need to be careful, particularly when > interfacing with the binary data directly. > > -Chris > > > > > > Thanks for all the answers to my question. Helped a lot. Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ruby185 at gmail.com Thu Feb 2 07:14:05 2012 From: ruby185 at gmail.com (Ruby Stevenson) Date: Thu, 2 Feb 2012 07:14:05 -0500 Subject: [Numpy-discussion] histogram help In-Reply-To: <26FC23E7C398A64083C980D16001012D261F0D938C@VA3DIAXVS361.RED001.local> References: <26FC23E7C398A64083C980D16001012D261F0D938C@VA3DIAXVS361.RED001.local> Message-ID: Exactly, histogram of Z, which itself is an array, for each (x, y). sorry for getting everyone including myself confused :-) I think I am now using histogram call correctly ... but I now have a slightly different question. It maybe better to ask in a different subject, but here it is any way: support I will have a sequence of (x, y), and Z, with x, y are coordinates, and Z being an array for third dimension, Is there a better or more efficient way to build up this 3-D array? thanks Ruby On Tue, Jan 31, 2012 at 4:42 AM, Nadav Horesh wrote: > Do you want a histogramm of z for each (x,y) ? > > ? Nadav > > ________________________________________ > From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Ruby Stevenson [ruby185 at gmail.com] > Sent: 30 January 2012 21:27 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] histogram help > > Sorry, I realize I didn't describe the problem completely clear or correct. > > the (x,y) in this case is just many co-ordinates, and ?each coordinate > has a list of values (Z value) associated with it. ?The bins are > allocated for the Z. > > I hope this clarify things a little. Thanks again. 
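One way to get that histogram for every (x, y) in one shot, assuming the Z values are stacked along the last axis of a 3-d array and the bin edges are shared (sizes here are made up for the sketch), is np.apply_along_axis:

import numpy as np

nx, ny, nz, nbins = 4, 5, 100, 10
A = np.random.rand(nx, ny, nz)             # Z values for each (x, y)
bins = np.linspace(0.0, 1.0, nbins + 1)    # common bin edges

# Apply np.histogram along the Z axis; H has shape (nx, ny, nbins).
H = np.apply_along_axis(lambda z: np.histogram(z, bins=bins)[0], -1, A)
print(H.shape)   # (4, 5, 10)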
> > Ruby > > > > > On Mon, Jan 30, 2012 at 2:21 PM, Ruby Stevenson wrote: >> hi, all >> >> I am trying to figure out how to do histogram with numpy >> >> I have a three-dimension array A[x,y,z], ?another array (bins) has >> been allocated along Z dimension, z' >> >> how can I get the histogram of H[ x, y, z' ]? >> >> thanks for your help. >> >> Ruby > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From shish at keba.be Thu Feb 2 08:46:00 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 2 Feb 2012 08:46:00 -0500 Subject: [Numpy-discussion] histogram help In-Reply-To: References: <26FC23E7C398A64083C980D16001012D261F0D938C@VA3DIAXVS361.RED001.local> Message-ID: Sorry but I don't understand your last question. Better / more efficient than what? -=- Olivier Le 2 f?vrier 2012 07:14, Ruby Stevenson a ?crit : > Exactly, histogram of Z, which itself is an array, for each (x, y). > > sorry for getting everyone including myself confused :-) > > I think I am now using histogram call correctly ... but I now have a > slightly different question. It maybe better to ask in a different > subject, but here it is any way: > > support I will have a sequence of (x, y), and Z, with x, y are > coordinates, and Z being an array for third dimension, Is there a > better or more efficient way to build up this 3-D array? > > thanks > > Ruby > > > > > > > On Tue, Jan 31, 2012 at 4:42 AM, Nadav Horesh > wrote: > > Do you want a histogramm of z for each (x,y) ? > > > > Nadav > > > > ________________________________________ > > From: numpy-discussion-bounces at scipy.org [ > numpy-discussion-bounces at scipy.org] On Behalf Of Ruby Stevenson [ > ruby185 at gmail.com] > > Sent: 30 January 2012 21:27 > > To: Discussion of Numerical Python > > Subject: Re: [Numpy-discussion] histogram help > > > > Sorry, I realize I didn't describe the problem completely clear or > correct. > > > > the (x,y) in this case is just many co-ordinates, and each coordinate > > has a list of values (Z value) associated with it. The bins are > > allocated for the Z. > > > > I hope this clarify things a little. Thanks again. > > > > Ruby > > > > > > > > > > On Mon, Jan 30, 2012 at 2:21 PM, Ruby Stevenson > wrote: > >> hi, all > >> > >> I am trying to figure out how to do histogram with numpy > >> > >> I have a three-dimension array A[x,y,z], another array (bins) has > >> been allocated along Z dimension, z' > >> > >> how can I get the histogram of H[ x, y, z' ]? > >> > >> thanks for your help. > >> > >> Ruby > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsouthey at gmail.com Thu Feb 2 10:07:05 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 02 Feb 2012 09:07:05 -0600 Subject: [Numpy-discussion] Broadcasting doesn't work with divide after tile In-Reply-To: References: <4F29B556.90305@crans.org> Message-ID: <4F2AA699.2060803@gmail.com> On 02/02/2012 02:52 AM, martin gro?hauser wrote: > On Wed, Feb 1, 2012 at 10:57 PM, Pierre Haessig > wrote: >> I've no idea what's going on, but here is my $0.02 contribution. I >> reproduced the bug (numpy 1.5.1) with a rather minimal script. See attached. > I reproduced the issue with Pierre's script also in numpy 1.6.1 and > latest github (2.0.0.dev-b8bfcd0). In newer versions the error message > is: > > Traceback (most recent call last): > File "ma_tiling_issue.py", line 18, in > a/100. #raises ValueError: invalid return array shape > File "/usr/local/lib/python2.7/dist-packages/numpy/ma/core.py", line > 3654, in __div__ > return divide(self, other) > File "/usr/local/lib/python2.7/dist-packages/numpy/ma/core.py", line > 1078, in __call__ > m |= ma > ValueError: non-broadcastable output operand with shape (3,3,3) > doesn't match the broadcast shape (1,3,3,3) > > I still don't know what's going on. Is the internal representation > (shape) of the array changed by the tile instruction? > > I created a ticket: http://projects.scipy.org/numpy/ticket/2035 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion This is triggered when the array's mask is not 'False' and appears to be due to the call on line 827 of numpy/lib/shape_base.py: c = _nx.array(A,copy=False,subok=True,ndmin=d) where _nx is 'import numpy.core.numeric as _nx'. Also, setting 'copy=True' in the call does not change anything. You probably can get around it by passing a copy to np.tile(): sp4d = np.tile(a.copy(), (4, 1, 1, 1)) Bruce From bsouthey at gmail.com Thu Feb 2 11:58:57 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 02 Feb 2012 10:58:57 -0600 Subject: [Numpy-discussion] Heads up and macro deprecation. In-Reply-To: References: Message-ID: <4F2AC0D1.5020409@gmail.com> On 02/01/2012 02:53 PM, Charles R Harris wrote: > Hi All, > > Two things here. > > 1) Some macros for threading and the iterator now require a trailing > semicolon. This change will be reverted before the 1.7 release so that > scipy 0.10 will compile, but because it is desirable in the long term > it would be helpful if folks maintaining c extensions using numpy > would try compiling them against current development and adding the > semicolon where needed. The added semicolon will be backward > compatible with earlier versions of numpy. Why do the changes need to "be reverted before the 1.7 release'? Scipy 0.10 was released nearly three months ago so we should be moving forward. I think this is not the first time a released scipy would not build with the 'future' numpy. But most of the scipy 0.10 downloads are binaries so I presume that this change should not affect those users. But if this is such a major downstream problem, just have a very, very minor bug-fix very much restricted to this issue. If the changes do not affect binary users then perhaps just a re-release of the source archives would be needed rather than a full bug release. > > 2) It is proposed to deprecate all of the macros in the old_defines.h > file and require the use of their replacements. 
Numpy itself will have > made this change after pull-189 > is merged and getting rid of > the surplus macros will help clean up the historical detritus that has > built up over the years, easing maintenance, clarifying code, and > making the eventual transition to 2.0 a bit easier. There is a sed > script in the tools directory as part of the pull request that can be > used to make the needed substitutions. Isn't this just formalizing the name changes that has been happening for some time in 'core/include/numpy/old_defines.h'? That is people have really been using the 'new macros' for ages, just that these have been 'called' with the old names. If so, I would be support an aggressive stance for those changes are just renaming and the slow depreciation cycle for other cases. Bruce > > Thoughts? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Feb 2 12:36:51 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 2 Feb 2012 10:36:51 -0700 Subject: [Numpy-discussion] Heads up and macro deprecation. In-Reply-To: <4F2AC0D1.5020409@gmail.com> References: <4F2AC0D1.5020409@gmail.com> Message-ID: On Thu, Feb 2, 2012 at 9:58 AM, Bruce Southey wrote: > On 02/01/2012 02:53 PM, Charles R Harris wrote: > > Hi All, > > Two things here. > > 1) Some macros for threading and the iterator now require a trailing > semicolon. This change will be reverted before the 1.7 release so that > scipy 0.10 will compile, but because it is desirable in the long term it > would be helpful if folks maintaining c extensions using numpy would try > compiling them against current development and adding the semicolon where > needed. The added semicolon will be backward compatible with earlier > versions of numpy. > > > Why do the changes need to "be reverted before the 1.7 release'? > Scipy 0.10 was released nearly three months ago so we should be moving > forward. I think this is not the first time a released scipy would not > build with the 'future' numpy. But most of the scipy 0.10 downloads are > binaries so I presume that this change should not affect those users. But > if this is such a major downstream problem, just have a very, very minor > bug-fix very much restricted to this issue. If the changes do not affect > binary users then perhaps just a re-release of the source archives would be > needed rather than a full bug release. > > That was Ralph's preference. > > 2) It is proposed to deprecate all of the macros in the old_defines.h file > and require the use of their replacements. Numpy itself will have made this > change after pull-189 is merged > and getting rid of the surplus macros will help clean up the historical > detritus that has built up over the years, easing maintenance, clarifying > code, and making the eventual transition to 2.0 a bit easier. There is a > sed script in the tools directory as part of the pull request that can be > used to make the needed substitutions. > > > Isn't this just formalizing the name changes that has been happening for > some time in 'core/include/numpy/old_defines.h'? > That is people have really been using the 'new macros' for ages, just that > these have been 'called' with the old names. 
If so, I would be support an > aggressive stance for those changes are just renaming and the slow > depreciation cycle for other cases. > > Yes, the macro functionality is the same, just the names have changed. I'm also going to update all the noprefix macro uses in numpy in a separate pull request. I expect that will be more disruptive. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Feb 2 13:58:35 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 2 Feb 2012 19:58:35 +0100 Subject: [Numpy-discussion] Heads up and macro deprecation. In-Reply-To: References: <4F2AC0D1.5020409@gmail.com> Message-ID: On Thu, Feb 2, 2012 at 6:36 PM, Charles R Harris wrote: > > > On Thu, Feb 2, 2012 at 9:58 AM, Bruce Southey wrote: > >> On 02/01/2012 02:53 PM, Charles R Harris wrote: >> >> Hi All, >> >> Two things here. >> >> 1) Some macros for threading and the iterator now require a trailing >> semicolon. This change will be reverted before the 1.7 release so that >> scipy 0.10 will compile, but because it is desirable in the long term it >> would be helpful if folks maintaining c extensions using numpy would try >> compiling them against current development and adding the semicolon where >> needed. The added semicolon will be backward compatible with earlier >> versions of numpy. >> >> >> Why do the changes need to "be reverted before the 1.7 release'? >> Scipy 0.10 was released nearly three months ago so we should be moving >> forward. I think this is not the first time a released scipy would not >> build with the 'future' numpy. But most of the scipy 0.10 downloads are >> binaries so I presume that this change should not affect those users. But >> if this is such a major downstream problem, just have a very, very minor >> bug-fix very much restricted to this issue. If the changes do not affect >> binary users then perhaps just a re-release of the source archives would be >> needed rather than a full bug release. >> >> I don't think that it has happened in the recent past that the last released version of scipy wouldn't build with the last released numpy. And it would be a problem IMHO. The alternative would be to do a 0.10.1 release for this. Ralf > That was Ralph's preference. > >> >> 2) It is proposed to deprecate all of the macros in the old_defines.h >> file and require the use of their replacements. Numpy itself will have made >> this change after pull-189 is >> merged and getting rid of the surplus macros will help clean up the >> historical detritus that has built up over the years, easing maintenance, >> clarifying code, and making the eventual transition to 2.0 a bit easier. >> There is a sed script in the tools directory as part of the pull request >> that can be used to make the needed substitutions. >> >> >> Isn't this just formalizing the name changes that has been happening for >> some time in 'core/include/numpy/old_defines.h'? >> That is people have really been using the 'new macros' for ages, just >> that these have been 'called' with the old names. If so, I would be support >> an aggressive stance for those changes are just renaming and the slow >> depreciation cycle for other cases. >> >> > Yes, the macro functionality is the same, just the names have changed. > > I'm also going to update all the noprefix macro uses in numpy in a > separate pull request. I expect that will be more disruptive. 
> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Feb 2 14:13:35 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 2 Feb 2012 13:13:35 -0600 Subject: [Numpy-discussion] Heads up and macro deprecation. In-Reply-To: References: <4F2AC0D1.5020409@gmail.com> Message-ID: On Feb 2, 2012, at 12:58 PM, Ralf Gommers wrote: > > > On Thu, Feb 2, 2012 at 6:36 PM, Charles R Harris wrote: > > > On Thu, Feb 2, 2012 at 9:58 AM, Bruce Southey wrote: > On 02/01/2012 02:53 PM, Charles R Harris wrote: >> >> Hi All, >> >> Two things here. >> >> 1) Some macros for threading and the iterator now require a trailing semicolon. This change will be reverted before the 1.7 release so that scipy 0.10 will compile, but because it is desirable in the long term it would be helpful if folks maintaining c extensions using numpy would try compiling them against current development and adding the semicolon where needed. The added semicolon will be backward compatible with earlier versions of numpy. > > Why do the changes need to "be reverted before the 1.7 release'? > Scipy 0.10 was released nearly three months ago so we should be moving forward. I think this is not the first time a released scipy would not build with the 'future' numpy. But most of the scipy 0.10 downloads are binaries so I presume that this change should not affect those users. But if this is such a major downstream problem, just have a very, very minor bug-fix very much restricted to this issue. If the changes do not affect binary users then perhaps just a re-release of the source archives would be needed rather than a full bug release. > > > I don't think that it has happened in the recent past that the last released version of scipy wouldn't build with the last released numpy. And it would be a problem IMHO. The alternative would be to do a 0.10.1 release for this. The latest released version of SciPy should definitely build with the latest released version of NumPy. I like the idea of a 0.10.1 release to get these changes out there and supported. -Travis > > Ralf > > > That was Ralph's preference. >> >> 2) It is proposed to deprecate all of the macros in the old_defines.h file and require the use of their replacements. Numpy itself will have made this change after pull-189 is merged and getting rid of the surplus macros will help clean up the historical detritus that has built up over the years, easing maintenance, clarifying code, and making the eventual transition to 2.0 a bit easier. There is a sed script in the tools directory as part of the pull request that can be used to make the needed substitutions. > > Isn't this just formalizing the name changes that has been happening for some time in 'core/include/numpy/old_defines.h'? > That is people have really been using the 'new macros' for ages, just that these have been 'called' with the old names. If so, I would be support an aggressive stance for those changes are just renaming and the slow depreciation cycle for other cases. > > > Yes, the macro functionality is the same, just the names have changed. > > I'm also going to update all the noprefix macro uses in numpy in a separate pull request. I expect that will be more disruptive. 
> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu Feb 2 17:18:59 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 2 Feb 2012 14:18:59 -0800 Subject: [Numpy-discussion] Documentation question. In-Reply-To: <5CEBD114-7FE5-4ED1-83FD-19A2D24D6B4D@continuum.io> References: <8C03579A-2F62-43FE-99EE-D536066B02CA@continuum.io> <5CEBD114-7FE5-4ED1-83FD-19A2D24D6B4D@continuum.io> Message-ID: On Wed, Feb 1, 2012 at 7:13 PM, Travis Oliphant wrote: > Hey Mark, > > I spent some quality time with your iterator docs tonight and look forward > to getting into the code a bit more soon. I wanted to get your general > impressions about what it would take to extend the iterator API to handle > iterating over "regions" of the inputs --- i.e. to support generalized > ufuncs. > Supposing this feature occurred to me, and I were to name it "subarray iteration," I would create a branch to start developing it here: https://github.com/m-paradox/numpy/tree/subarray_iter Brief documentation of how the interface could look is here: https://github.com/m-paradox/numpy/commit/610e7ae0bad66b95988bd2a933c08537b26898c5 > Also, on my todo list is to compare generalized ufuncs with "threading" in > PDL (Perl Data Language) and see if we can support that in NumPy. > Threading is the word that PDL uses to describe "broadcasting" --- but it > does more than ufuncs. > > Here is some information on it. > I've read this document before, and recall not seeing anything that was more than broadcasting. The only notable difference from broadcasting was that threading adds axis padding to the right instead of to the left. -Mark > > -Travis > > > > On Feb 1, 2012, at 8:31 PM, Mark Wiebe wrote: > > On Wed, Feb 1, 2012 at 6:14 PM, Travis Oliphant wrote: > >> >> On Feb 1, 2012, at 7:04 PM, Mark Wiebe wrote: >> >> On Wed, Feb 1, 2012 at 3:29 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> The macro PyArray_RemoveLargest has been replaced by >>> PyArray_RemoveSmallest (which seems strange), but I wonder if this >>> documentation still makes sense. >>> >> >> My impression about this code is that it went through a number of rounds >> trying to choose an iteration order heuristic that has improved performance >> over C-order. The change of Largest to Smallest probably reflects one of >> these heuristic changes. I think it's safe to say that the nditer >> introduced in 1.6 completely removes the need for this functionality. I did >> a grep for this function in the master branch, and it is no longer used by >> NumPy internally. >> >> >> There is a common need to iterate over all but one dimension of a NumPy >> array. The final dimension is iterated over in an "internal" loop. This >> is the essence of how ufuncs work and avoid the possibly expensive overhead >> of a C-call during each iteration. >> > > This use-case is handled by the flag NPY_ITER_EXTERNAL_LOOP ( > http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NPY_ITER_EXTERNAL_LOOP) > in the nditer. 
> > >> Initially, it seemed prudent to remove the dimension that had the largest >> size (so that the final inner-iteration was the largest number). Later, >> timings revealed that that the 'inner' dimension should be the one with the >> smallest *striding*. I have not looked at nditer in detail, but would >> appreciate seeing an explanation of how the nditer approach removes the >> need for this macro. When that is clear, then this macro can and should >> be deprecated. >> > > To see the full list of what to use in the nditer versus the older > iterators, I created a table: > > > http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#converting-from-previous-numpy-iterators > > Only PyArray_BroadcastToShape and PyArray_MultiIter_NEXTi don't have a > nice correspondence, because they refer to implementation details in the > previous iterators which are done differently in the nditer. > > -Mark > > >> >> -Travis >> >> >> >> >> >> -Mark >> >> >>> diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/ >>> c-info.beyond-basics.rs >>> index 9ed2ab3..3437985 100644 >>> --- a/doc/source/user/c-info.beyond-basics.rst >>> +++ b/doc/source/user/c-info.beyond-basics.rst >>> @@ -189,7 +189,7 @@ feature follows. >>> PyArray_MultiIter_NEXT(mobj); >>> } >>> >>> -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to >>> +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used >>> to >>> take a multi-iterator object and adjust all the iterators so that >>> iteration does not take place over the largest dimension (it makes >>> that dimension of size 1). The code being looped over that makes use >>> >>> Chuck >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu Feb 2 17:24:23 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 2 Feb 2012 14:24:23 -0800 Subject: [Numpy-discussion] Heads up and macro deprecation. In-Reply-To: References: Message-ID: On Wed, Feb 1, 2012 at 12:53 PM, Charles R Harris wrote: > Hi All, > > Two things here. > > 1) Some macros for threading and the iterator now require a trailing > semicolon. This change will be reverted before the 1.7 release so that > scipy 0.10 will compile, but because it is desirable in the long term it > would be helpful if folks maintaining c extensions using numpy would try > compiling them against current development and adding the semicolon where > needed. The added semicolon will be backward compatible with earlier > versions of numpy. 
> Perhaps we could just deprecate the semicolon thing, so that it changes along with the other C changes? > 2) It is proposed to deprecate all of the macros in the old_defines.h file > and require the use of their replacements. Numpy itself will have made this > change after pull-189 is merged > and getting rid of the surplus macros will help clean up the historical > detritus that has built up over the years, easing maintenance, clarifying > code, and making the eventual transition to 2.0 a bit easier. There is a > sed script in the tools directory as part of the pull request that can be > used to make the needed substitutions. > I'm in favour. In general, it would be nice if NumPy code could serve as examples for people wanting tips on writing NumPy C extensions, and these cleanups are very helpful for that. -Mark > > Thoughts? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Feb 2 21:11:37 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 2 Feb 2012 20:11:37 -0600 Subject: [Numpy-discussion] Documentation question. In-Reply-To: References: <8C03579A-2F62-43FE-99EE-D536066B02CA@continuum.io> <5CEBD114-7FE5-4ED1-83FD-19A2D24D6B4D@continuum.io> Message-ID: <312E4285-D424-4BD6-923C-2D9DBF91CE84@continuum.io> I see your time machine is in full working order :-) -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 2, 2012, at 4:18 PM, Mark Wiebe wrote: > On Wed, Feb 1, 2012 at 7:13 PM, Travis Oliphant wrote: > Hey Mark, > > I spent some quality time with your iterator docs tonight and look forward to getting into the code a bit more soon. I wanted to get your general impressions about what it would take to extend the iterator API to handle iterating over "regions" of the inputs --- i.e. to support generalized ufuncs. > > Supposing this feature occurred to me, and I were to name it "subarray iteration," I would create a branch to start developing it here: > > https://github.com/m-paradox/numpy/tree/subarray_iter > > Brief documentation of how the interface could look is here: > > https://github.com/m-paradox/numpy/commit/610e7ae0bad66b95988bd2a933c08537b26898c5 > > Also, on my todo list is to compare generalized ufuncs with "threading" in PDL (Perl Data Language) and see if we can support that in NumPy. Threading is the word that PDL uses to describe "broadcasting" --- but it does more than ufuncs. > > Here is some information on it. > > I've read this document before, and recall not seeing anything that was more than broadcasting. The only notable difference from broadcasting was that threading adds axis padding to the right instead of to the left. > > -Mark > > > -Travis > > > > On Feb 1, 2012, at 8:31 PM, Mark Wiebe wrote: > >> On Wed, Feb 1, 2012 at 6:14 PM, Travis Oliphant wrote: >> >> On Feb 1, 2012, at 7:04 PM, Mark Wiebe wrote: >> >>> On Wed, Feb 1, 2012 at 3:29 PM, Charles R Harris wrote: >>> The macro PyArray_RemoveLargest has been replaced by PyArray_RemoveSmallest (which seems strange), but I wonder if this documentation still makes sense. >>> >>> My impression about this code is that it went through a number of rounds trying to choose an iteration order heuristic that has improved performance over C-order. The change of Largest to Smallest probably reflects one of these heuristic changes. 
I think it's safe to say that the nditer introduced in 1.6 completely removes the need for this functionality. I did a grep for this function in the master branch, and it is no longer used by NumPy internally. >> >> There is a common need to iterate over all but one dimension of a NumPy array. The final dimension is iterated over in an "internal" loop. This is the essence of how ufuncs work and avoid the possibly expensive overhead of a C-call during each iteration. >> >> This use-case is handled by the flag NPY_ITER_EXTERNAL_LOOP (http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NPY_ITER_EXTERNAL_LOOP) in the nditer. >> >> Initially, it seemed prudent to remove the dimension that had the largest size (so that the final inner-iteration was the largest number). Later, timings revealed that that the 'inner' dimension should be the one with the smallest *striding*. I have not looked at nditer in detail, but would appreciate seeing an explanation of how the nditer approach removes the need for this macro. When that is clear, then this macro can and should be deprecated. >> >> To see the full list of what to use in the nditer versus the older iterators, I created a table: >> >> http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#converting-from-previous-numpy-iterators >> >> Only PyArray_BroadcastToShape and PyArray_MultiIter_NEXTi don't have a nice correspondence, because they refer to implementation details in the previous iterators which are done differently in the nditer. >> >> -Mark >> >> >> -Travis >> >> >> >> >>> >>> -Mark >>> >>> diff --git a/doc/source/user/c-info.beyond-basics.rst b/doc/source/user/c-info.beyond-basics.rs >>> index 9ed2ab3..3437985 100644 >>> --- a/doc/source/user/c-info.beyond-basics.rst >>> +++ b/doc/source/user/c-info.beyond-basics.rst >>> @@ -189,7 +189,7 @@ feature follows. >>> PyArray_MultiIter_NEXT(mobj); >>> } >>> >>> -The function :cfunc:`PyArray_RemoveLargest` ( ``multi`` ) can be used to >>> +The function :cfunc:`PyArray_RemoveSmallest` ( ``multi`` ) can be used to >>> take a multi-iterator object and adjust all the iterators so that >>> iteration does not take place over the largest dimension (it makes >>> that dimension of size 1). The code being looped over that makes use >>> >>> Chuck >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mesanthu at gmail.com Fri Feb 3 05:16:52 2012 From: mesanthu at gmail.com (santhu kumar) Date: Fri, 3 Feb 2012 04:16:52 -0600 Subject: [Numpy-discussion] Trick for fast Message-ID: Hello all, I have tried to optimize most of my code but this ones seems to be the major bottleneck as it gets called many times. I have run out of ideas to make it more faster, can somebody please help me here. x = nX3 vector. mass = nX1 vector inert = zeros((3,3)) for i in range(n): ri = x[i,:].reshape(1,3) inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri)) Any pythonic or numpy technique to make it faster. What I can think of is a sloppy nX9 matrix creation and then performing row wise summation and reshaping it. But is there any better way. Thanks santhosh -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Fri Feb 3 08:44:54 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 03 Feb 2012 08:44:54 -0500 Subject: [Numpy-discussion] Trick for fast In-Reply-To: References: Message-ID: <4F2BE4D6.1020703@gmail.com> On 2/3/2012 5:16 AM, santhu kumar wrote: > x = nX3 vector. > mass = nX1 vector > inert = zeros((3,3)) > for i in range(n): > ri = x[i,:].reshape(1,3) > inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri)) > This should buy you a bit. xdot = (x*x).sum(axis=1) for (massi,xi,xdoti) in zip(mass.flat,x,xdot): temp = -np.outer(xi,xi) temp.flat[slice(0,None,4)] += xdoti inert += massi*temp Alan Isaac From josef.pktd at gmail.com Fri Feb 3 09:10:28 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Feb 2012 09:10:28 -0500 Subject: [Numpy-discussion] Trick for fast In-Reply-To: <4F2BE4D6.1020703@gmail.com> References: <4F2BE4D6.1020703@gmail.com> Message-ID: On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac wrote: > On 2/3/2012 5:16 AM, santhu kumar wrote: >> x = nX3 vector. >> mass = nX1 vector >> inert = zeros((3,3)) >> for i in range(n): >> ? ? ? ?ri = x[i,:].reshape(1,3) >> ? ? ? ?inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri)) >> > > > This should buy you a bit. > > xdot = (x*x).sum(axis=1) > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): > ? ? ? temp = -np.outer(xi,xi) > ? ? ? temp.flat[slice(0,None,4)] += xdoti > ? ? ? inert += massi*temp > > Alan Isaac maybe something like this, (self contained example and name spaces to make running it easier) import numpy as np n = 15 x = np.arange(n*3.).reshape(-1,3) #nX3 vector. mass = np.linspace(1,2,n)[:,None] #nX1 vector inert = np.zeros((3,3)) for i in range(n): ri = x[i,:].reshape(1,3) inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri)) print inert print np.diag((mass * x**2).sum(0)) - np.dot(x.T, mass*x) [[ 0. -16755. -17287.5] [-16755. 0. -17865. ] [-17287.5 -17865. 0. ]] [[ 0. -16755. -17287.5] [-16755. 0. -17865. ] [-17287.5 -17865. 0. ]] Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From gammelmark at gmail.com Fri Feb 3 09:43:25 2012 From: gammelmark at gmail.com (=?ISO-8859-1?Q?S=F8ren_Gammelmark?=) Date: Fri, 3 Feb 2012 15:43:25 +0100 Subject: [Numpy-discussion] Trick for fast In-Reply-To: References: <4F2BE4D6.1020703@gmail.com> Message-ID: What about this? 
A = einsum("i,ij->", mass, x ** 2) B = einsum("i,ij,ik->jk", mass, x, x) I = A * eye(3) - B /S?ren On 3 February 2012 15:10, wrote: > On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac wrote: > > On 2/3/2012 5:16 AM, santhu kumar wrote: > >> x = nX3 vector. > >> mass = nX1 vector > >> inert = zeros((3,3)) > >> for i in range(n): > >> ri = x[i,:].reshape(1,3) > >> inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri)) > >> > > > > > > This should buy you a bit. > > > > xdot = (x*x).sum(axis=1) > > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): > > temp = -np.outer(xi,xi) > > temp.flat[slice(0,None,4)] += xdoti > > inert += massi*temp > > > > Alan Isaac > > maybe something like this, (self contained example and name spaces to > make running it easier) > > import numpy as np > n = 15 > x = np.arange(n*3.).reshape(-1,3) #nX3 vector. > mass = np.linspace(1,2,n)[:,None] #nX1 vector > inert = np.zeros((3,3)) > for i in range(n): > ri = x[i,:].reshape(1,3) > inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri)) > print inert > > print np.diag((mass * x**2).sum(0)) - np.dot(x.T, mass*x) > > [[ 0. -16755. -17287.5] > [-16755. 0. -17865. ] > [-17287.5 -17865. 0. ]] > [[ 0. -16755. -17287.5] > [-16755. 0. -17865. ] > [-17287.5 -17865. 0. ]] > > Josef > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Feb 3 10:14:04 2012 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 03 Feb 2012 16:14:04 +0100 Subject: [Numpy-discussion] Trick for fast In-Reply-To: References: <4F2BE4D6.1020703@gmail.com> Message-ID: <1328282044.2830.9.camel@sebastian-laptop> I guess Einsum is much cleaner, but I already had started with this and maybe someone likes it, this is fully vectorized and uses a bit of funny stuff too: # The dot product(s), written using broadcasting rules: a = -(x.reshape(-1,1,3) * x[...,None]) # Magic, to avoid the eye thing, takes all diagonal elements as view, maybe there is a cooler way for it: diagonals = np.lib.stride_tricks.as_strided(a, (a.shape[0], 3), (a.dtype.itemsize*9, a.dtype.itemsize*4)) # Add the x**2 (s is a view on the diagonals), the sum is broadcasted. diagonals += (sum(x**2, 1))[:,None] # And multiply by mass using broadcasting: a *= mass[...,None] # And sum up all the intermediat results: inert = a.sum(0) print inert Regards, Sebastian On Fri, 2012-02-03 at 15:43 +0100, S?ren Gammelmark wrote: > What about this? > > > A = einsum("i,ij->", mass, x ** 2) > B = einsum("i,ij,ik->jk", mass, x, x) > I = A * eye(3) - B > > > /S?ren > > On 3 February 2012 15:10, wrote: > On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac > wrote: > > On 2/3/2012 5:16 AM, santhu kumar wrote: > >> x = nX3 vector. > >> mass = nX1 vector > >> inert = zeros((3,3)) > >> for i in range(n): > >> ri = x[i,:].reshape(1,3) > >> inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - > dot(ri.T,ri)) > >> > > > > > > This should buy you a bit. 
> > > > xdot = (x*x).sum(axis=1) > > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): > > temp = -np.outer(xi,xi) > > temp.flat[slice(0,None,4)] += xdoti > > inert += massi*temp > > > > Alan Isaac > > > maybe something like this, (self contained example and name > spaces to > make running it easier) > > import numpy as np > n = 15 > x = np.arange(n*3.).reshape(-1,3) #nX3 vector. > mass = np.linspace(1,2,n)[:,None] #nX1 vector > inert = np.zeros((3,3)) > for i in range(n): > ri = x[i,:].reshape(1,3) > > inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - > np.dot(ri.T,ri)) > print inert > > print np.diag((mass * x**2).sum(0)) - np.dot(x.T, mass*x) > > [[ 0. -16755. -17287.5] > [-16755. 0. -17865. ] > [-17287.5 -17865. 0. ]] > [[ 0. -16755. -17287.5] > [-16755. 0. -17865. ] > [-17287.5 -17865. 0. ]] > > Josef > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mesanthu at gmail.com Fri Feb 3 13:29:26 2012 From: mesanthu at gmail.com (santhu kumar) Date: Fri, 3 Feb 2012 12:29:26 -0600 Subject: [Numpy-discussion] Trick for fast Message-ID: Hello all, Thanks for lovely solutions. I have sat on it for some time and wrote it myself : n =x.shape[0] ea = np.array([1,0,0,0,1,0,0,0,1]) inert = ((np.tile(ea,(n,1))*((x*x).sum(axis=1)[:,np.newaxis]) - np.hstack([x*x[:,0][:,np.newaxis],x*x[:,1][:,np.newaxis],x*x[:,2][:,np.newaxis]]))*mass[:,np.newaxis]).sum(axis=0) inert.shape = 3,3 Does the trick and reduces the time from over 45 secs to 3 secs. I do want to try einsum but my numpy is little old and it does not have it. Thanks Sebastian (it was tricky to understand your code for me) and Josef (clean). On Fri, Feb 3, 2012 at 12:00 PM, wrote: > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Trick for fast (josef.pktd at gmail.com) > 2. Re: Trick for fast (S?ren Gammelmark) > 3. Re: Trick for fast (Sebastian Berg) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 3 Feb 2012 09:10:28 -0500 > From: josef.pktd at gmail.com > Subject: Re: [Numpy-discussion] Trick for fast > To: Discussion of Numerical Python > Message-ID: > > > Content-Type: text/plain; charset=ISO-8859-1 > > On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac wrote: > > On 2/3/2012 5:16 AM, santhu kumar wrote: > >> x = nX3 vector. > >> mass = nX1 vector > >> inert = zeros((3,3)) > >> for i in range(n): > >> ? ? ? ?ri = x[i,:].reshape(1,3) > >> ? ? ? ?inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri)) > >> > > > > > > This should buy you a bit. 
> > > > xdot = (x*x).sum(axis=1) > > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): > > ? ? ? temp = -np.outer(xi,xi) > > ? ? ? temp.flat[slice(0,None,4)] += xdoti > > ? ? ? inert += massi*temp > > > > Alan Isaac > > maybe something like this, (self contained example and name spaces to > make running it easier) > > import numpy as np > n = 15 > x = np.arange(n*3.).reshape(-1,3) #nX3 vector. > mass = np.linspace(1,2,n)[:,None] #nX1 vector > inert = np.zeros((3,3)) > for i in range(n): > ri = x[i,:].reshape(1,3) > inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri)) > print inert > > print np.diag((mass * x**2).sum(0)) - np.dot(x.T, mass*x) > > [[ 0. -16755. -17287.5] > [-16755. 0. -17865. ] > [-17287.5 -17865. 0. ]] > [[ 0. -16755. -17287.5] > [-16755. 0. -17865. ] > [-17287.5 -17865. 0. ]] > > Josef > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------ > > Message: 2 > Date: Fri, 3 Feb 2012 15:43:25 +0100 > From: S?ren Gammelmark > Subject: Re: [Numpy-discussion] Trick for fast > To: Discussion of Numerical Python > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > What about this? > > A = einsum("i,ij->", mass, x ** 2) > B = einsum("i,ij,ik->jk", mass, x, x) > I = A * eye(3) - B > > /S?ren > > On 3 February 2012 15:10, wrote: > > > On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac > wrote: > > > On 2/3/2012 5:16 AM, santhu kumar wrote: > > >> x = nX3 vector. > > >> mass = nX1 vector > > >> inert = zeros((3,3)) > > >> for i in range(n): > > >> ri = x[i,:].reshape(1,3) > > >> inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri)) > > >> > > > > > > > > > This should buy you a bit. > > > > > > xdot = (x*x).sum(axis=1) > > > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): > > > temp = -np.outer(xi,xi) > > > temp.flat[slice(0,None,4)] += xdoti > > > inert += massi*temp > > > > > > Alan Isaac > > > > maybe something like this, (self contained example and name spaces to > > make running it easier) > > > > import numpy as np > > n = 15 > > x = np.arange(n*3.).reshape(-1,3) #nX3 vector. > > mass = np.linspace(1,2,n)[:,None] #nX1 vector > > inert = np.zeros((3,3)) > > for i in range(n): > > ri = x[i,:].reshape(1,3) > > inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri)) > > print inert > > > > print np.diag((mass * x**2).sum(0)) - np.dot(x.T, mass*x) > > > > [[ 0. -16755. -17287.5] > > [-16755. 0. -17865. ] > > [-17287.5 -17865. 0. ]] > > [[ 0. -16755. -17287.5] > > [-16755. 0. -17865. ] > > [-17287.5 -17865. 0. ]] > > > > Josef > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120203/d1faa546/attachment-0001.html > > ------------------------------ > > Message: 3 > Date: Fri, 03 Feb 2012 16:14:04 +0100 > From: Sebastian Berg > Subject: Re: [Numpy-discussion] Trick for fast > To: Discussion of Numerical Python > Message-ID: <1328282044.2830.9.camel at sebastian-laptop> > Content-Type: text/plain; charset="UTF-8" > > I guess Einsum is much cleaner, but I already had started with this and > maybe someone likes it, this is fully vectorized and uses a bit of funny > stuff too: > > # The dot product(s), written using broadcasting rules: > a = -(x.reshape(-1,1,3) * x[...,None]) > > # Magic, to avoid the eye thing, takes all diagonal elements as view, > maybe there is a cooler way for it: > diagonals = np.lib.stride_tricks.as_strided(a, (a.shape[0], 3), > (a.dtype.itemsize*9, a.dtype.itemsize*4)) > > # Add the x**2 (s is a view on the diagonals), the sum is broadcasted. > diagonals += (sum(x**2, 1))[:,None] > > # And multiply by mass using broadcasting: > a *= mass[...,None] > > # And sum up all the intermediat results: > inert = a.sum(0) > > print inert > > Regards, > > Sebastian > > On Fri, 2012-02-03 at 15:43 +0100, S?ren Gammelmark wrote: > > What about this? > > > > > > A = einsum("i,ij->", mass, x ** 2) > > B = einsum("i,ij,ik->jk", mass, x, x) > > I = A * eye(3) - B > > > > > > /S?ren > > > > On 3 February 2012 15:10, wrote: > > On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac > > wrote: > > > On 2/3/2012 5:16 AM, santhu kumar wrote: > > >> x = nX3 vector. > > >> mass = nX1 vector > > >> inert = zeros((3,3)) > > >> for i in range(n): > > >> ri = x[i,:].reshape(1,3) > > >> inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - > > dot(ri.T,ri)) > > >> > > > > > > > > > This should buy you a bit. > > > > > > xdot = (x*x).sum(axis=1) > > > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): > > > temp = -np.outer(xi,xi) > > > temp.flat[slice(0,None,4)] += xdoti > > > inert += massi*temp > > > > > > Alan Isaac > > > > > > maybe something like this, (self contained example and name > > spaces to > > make running it easier) > > > > import numpy as np > > n = 15 > > x = np.arange(n*3.).reshape(-1,3) #nX3 vector. > > mass = np.linspace(1,2,n)[:,None] #nX1 vector > > inert = np.zeros((3,3)) > > for i in range(n): > > ri = x[i,:].reshape(1,3) > > > > inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - > > np.dot(ri.T,ri)) > > print inert > > > > print np.diag((mass * x**2).sum(0)) - np.dot(x.T, mass*x) > > > > [[ 0. -16755. -17287.5] > > [-16755. 0. -17865. ] > > [-17287.5 -17865. 0. ]] > > [[ 0. -16755. -17287.5] > > [-16755. 0. -17865. ] > > [-17287.5 -17865. 0. 
]] > > > > Josef > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 65, Issue 11 > ************************************************ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Feb 3 13:51:27 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Feb 2012 13:51:27 -0500 Subject: [Numpy-discussion] Trick for fast In-Reply-To: References: Message-ID: On Fri, Feb 3, 2012 at 1:29 PM, santhu kumar wrote: > Hello all, > > Thanks for lovely solutions. I have sat on it for some time and wrote it > myself : > > n =x.shape[0] > ea = np.array([1,0,0,0,1,0,0,0,1]) > inert = ((np.tile(ea,(n,1))*((x*x).sum(axis=1)[:,np.newaxis]) - > np.hstack([x*x[:,0][:,np.newaxis],x*x[:,1][:,np.newaxis],x*x[:,2][:,np.newaxis]]))*mass[:,np.newaxis]).sum(axis=0) > inert.shape = 3,3 > > Does the trick and reduces the time from over 45 secs to 3 secs. > I do want to try einsum but my numpy is little old and it does not have it. > > Thanks Sebastian (it was tricky to understand your code for me) and Josef > (clean). Isn't the entire substraction of the first term just to set the diagonal of the result to zero. It looks to me now just like the weighted dot product and setting the diagonal to zero. That shouldn't take 3 secs unless you actual dimensions are huge. Josef > > > On Fri, Feb 3, 2012 at 12:00 PM, wrote: >> >> Send NumPy-Discussion mailing list submissions to >> ? ? ? ?numpy-discussion at scipy.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> ? ? ? ?http://mail.scipy.org/mailman/listinfo/numpy-discussion >> or, via email, send a message with subject or body 'help' to >> ? ? ? ?numpy-discussion-request at scipy.org >> >> You can reach the person managing the list at >> ? ? ? ?numpy-discussion-owner at scipy.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of NumPy-Discussion digest..." >> >> >> Today's Topics: >> >> ? 1. Re: Trick for fast (josef.pktd at gmail.com) >> ? 2. Re: Trick for fast (S?ren Gammelmark) >> ? 3. Re: Trick for fast (Sebastian Berg) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Fri, 3 Feb 2012 09:10:28 -0500 >> From: josef.pktd at gmail.com >> Subject: Re: [Numpy-discussion] Trick for fast >> To: Discussion of Numerical Python >> Message-ID: >> >> ? >> Content-Type: text/plain; charset=ISO-8859-1 >> >> >> On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac wrote: >> > On 2/3/2012 5:16 AM, santhu kumar wrote: >> >> x = nX3 vector. >> >> mass = nX1 vector >> >> inert = zeros((3,3)) >> >> for i in range(n): >> >> ? ? ? ?ri = x[i,:].reshape(1,3) >> >> ? ? ? 
?inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri)) >> >> >> >> > >> > >> > This should buy you a bit. >> > >> > xdot = (x*x).sum(axis=1) >> > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): >> > ? ? ? temp = -np.outer(xi,xi) >> > ? ? ? temp.flat[slice(0,None,4)] += xdoti >> > ? ? ? inert += massi*temp >> >> > >> > Alan Isaac >> >> maybe something like this, (self contained example and name spaces to >> make running it easier) >> >> import numpy as np >> n = 15 >> x = np.arange(n*3.).reshape(-1,3) #nX3 vector. >> mass = np.linspace(1,2,n)[:,None] #nX1 vector >> inert = np.zeros((3,3)) >> for i in range(n): >> ? ? ?ri = x[i,:].reshape(1,3) >> ? ? ?inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri)) >> print inert >> >> print ?np.diag((mass * x**2).sum(0)) ?- np.dot(x.T, mass*x) >> >> [[ ? ? 0. ?-16755. ?-17287.5] >> ?[-16755. ? ? ? 0. ?-17865. ] >> ?[-17287.5 -17865. ? ? ? 0. ]] >> [[ ? ? 0. ?-16755. ?-17287.5] >> ?[-16755. ? ? ? 0. ?-17865. ] >> ?[-17287.5 -17865. ? ? ? 0. ]] >> >> Josef >> >> >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> ------------------------------ >> >> Message: 2 >> Date: Fri, 3 Feb 2012 15:43:25 +0100 >> From: S?ren Gammelmark >> Subject: Re: [Numpy-discussion] Trick for fast >> To: Discussion of Numerical Python >> Message-ID: >> >> ? >> Content-Type: text/plain; charset="iso-8859-1" >> >> >> What about this? >> >> A = einsum("i,ij->", mass, x ** 2) >> B = einsum("i,ij,ik->jk", mass, x, x) >> I = A * eye(3) - B >> >> /S?ren >> >> >> On 3 February 2012 15:10, wrote: >> >> > On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac >> > wrote: >> > > On 2/3/2012 5:16 AM, santhu kumar wrote: >> > >> x = nX3 vector. >> > >> mass = nX1 vector >> > >> inert = zeros((3,3)) >> > >> for i in range(n): >> > >> ? ? ? ?ri = x[i,:].reshape(1,3) >> > >> ? ? ? ?inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - dot(ri.T,ri)) >> > >> >> > > >> > > >> > > This should buy you a bit. >> > > >> > > xdot = (x*x).sum(axis=1) >> > > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): >> > > ? ? ? temp = -np.outer(xi,xi) >> > > ? ? ? temp.flat[slice(0,None,4)] += xdoti >> > > ? ? ? inert += massi*temp >> > > >> > > Alan Isaac >> > >> > maybe something like this, (self contained example and name spaces to >> > make running it easier) >> > >> > import numpy as np >> > n = 15 >> > x = np.arange(n*3.).reshape(-1,3) #nX3 vector. >> > mass = np.linspace(1,2,n)[:,None] #nX1 vector >> > inert = np.zeros((3,3)) >> > for i in range(n): >> > ? ? ?ri = x[i,:].reshape(1,3) >> > ? ? ? inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - np.dot(ri.T,ri)) >> > print inert >> > >> > print ?np.diag((mass * x**2).sum(0)) ?- np.dot(x.T, mass*x) >> > >> > [[ ? ? 0. ?-16755. ?-17287.5] >> > ?[-16755. ? ? ? 0. ?-17865. ] >> > ?[-17287.5 -17865. ? ? ? 0. ]] >> > [[ ? ? 0. ?-16755. ?-17287.5] >> > ?[-16755. ? ? ? 0. ?-17865. ] >> > ?[-17287.5 -17865. ? ? ? 0. ]] >> > >> > Josef >> > >> > >> > > >> > > _______________________________________________ >> > > NumPy-Discussion mailing list >> > > NumPy-Discussion at scipy.org >> > > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> -------------- next part -------------- >> An HTML attachment was scrubbed... 
>> URL: >> http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120203/d1faa546/attachment-0001.html >> >> ------------------------------ >> >> Message: 3 >> Date: Fri, 03 Feb 2012 16:14:04 +0100 >> From: Sebastian Berg >> Subject: Re: [Numpy-discussion] Trick for fast >> To: Discussion of Numerical Python >> Message-ID: <1328282044.2830.9.camel at sebastian-laptop> >> Content-Type: text/plain; charset="UTF-8" >> >> >> I guess Einsum is much cleaner, but I already had started with this and >> maybe someone likes it, this is fully vectorized and uses a bit of funny >> stuff too: >> >> # The dot product(s), written using broadcasting rules: >> a = -(x.reshape(-1,1,3) * x[...,None]) >> >> # Magic, to avoid the eye thing, takes all diagonal elements as view, >> maybe there is a cooler way for it: >> diagonals = np.lib.stride_tricks.as_strided(a, (a.shape[0], 3), >> (a.dtype.itemsize*9, a.dtype.itemsize*4)) >> >> # Add the x**2 (s is a view on the diagonals), the sum is broadcasted. >> diagonals += (sum(x**2, 1))[:,None] >> >> # And multiply by mass using broadcasting: >> a *= mass[...,None] >> >> # And sum up all the intermediat results: >> inert = a.sum(0) >> >> print inert >> >> Regards, >> >> Sebastian >> >> On Fri, 2012-02-03 at 15:43 +0100, S?ren Gammelmark wrote: >> > What about this? >> > >> > >> > A = einsum("i,ij->", mass, x ** 2) >> > B = einsum("i,ij,ik->jk", mass, x, x) >> > I = A * eye(3) - B >> > >> > >> > /S?ren >> >> > >> > On 3 February 2012 15:10, wrote: >> > ? ? ? ? On Fri, Feb 3, 2012 at 8:44 AM, Alan G Isaac >> > ? ? ? ? wrote: >> > ? ? ? ? > On 2/3/2012 5:16 AM, santhu kumar wrote: >> > ? ? ? ? >> x = nX3 vector. >> > ? ? ? ? >> mass = nX1 vector >> > ? ? ? ? >> inert = zeros((3,3)) >> > ? ? ? ? >> for i in range(n): >> > ? ? ? ? >> ? ? ? ?ri = x[i,:].reshape(1,3) >> > ? ? ? ? >> ? ? ? ?inert = inert + mass[i,]*(sum(ri*ri)*eye(3) - >> > ? ? ? ? dot(ri.T,ri)) >> > ? ? ? ? >> >> > ? ? ? ? > >> > ? ? ? ? > >> > ? ? ? ? > This should buy you a bit. >> > ? ? ? ? > >> > ? ? ? ? > xdot = (x*x).sum(axis=1) >> > ? ? ? ? > for (massi,xi,xdoti) in zip(mass.flat,x,xdot): >> > ? ? ? ? > ? ? ? temp = -np.outer(xi,xi) >> > ? ? ? ? > ? ? ? temp.flat[slice(0,None,4)] += xdoti >> > ? ? ? ? > ? ? ? inert += massi*temp >> > ? ? ? ? > >> > ? ? ? ? > Alan Isaac >> > >> > >> > ? ? ? ? maybe something like this, (self contained example and name >> > ? ? ? ? spaces to >> > ? ? ? ? make running it easier) >> > >> > ? ? ? ? import numpy as np >> > ? ? ? ? n = 15 >> > ? ? ? ? x = np.arange(n*3.).reshape(-1,3) #nX3 vector. >> > ? ? ? ? mass = np.linspace(1,2,n)[:,None] #nX1 vector >> > ? ? ? ? inert = np.zeros((3,3)) >> > ? ? ? ? for i in range(n): >> > ? ? ? ? ? ? ?ri = x[i,:].reshape(1,3) >> > >> > ? ? ? ? ? ? ?inert = inert + mass[i,]*(sum(ri*ri)*np.eye(3) - >> > ? ? ? ? np.dot(ri.T,ri)) >> > ? ? ? ? print inert >> > >> > ? ? ? ? print ?np.diag((mass * x**2).sum(0)) ?- np.dot(x.T, mass*x) >> > >> > ? ? ? ? [[ ? ? 0. ?-16755. ?-17287.5] >> > ? ? ? ? ?[-16755. ? ? ? 0. ?-17865. ] >> > ? ? ? ? ?[-17287.5 -17865. ? ? ? 0. ]] >> > ? ? ? ? [[ ? ? 0. ?-16755. ?-17287.5] >> > ? ? ? ? ?[-16755. ? ? ? 0. ?-17865. ] >> > ? ? ? ? ?[-17287.5 -17865. ? ? ? 0. ]] >> > >> > ? ? ? ? Josef >> > >> > >> > ? ? ? ? > >> > ? ? ? ? > _______________________________________________ >> > ? ? ? ? > NumPy-Discussion mailing list >> > ? ? ? ? > NumPy-Discussion at scipy.org >> > ? ? ? ? > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > ? ? ? ? _______________________________________________ >> > ? ? ? 
? NumPy-Discussion mailing list >> > ? ? ? ? NumPy-Discussion at scipy.org >> > ? ? ? ? http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> ------------------------------ >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> End of NumPy-Discussion Digest, Vol 65, Issue 11 >> ************************************************ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mesanthu at gmail.com Fri Feb 3 13:58:43 2012 From: mesanthu at gmail.com (santhu kumar) Date: Fri, 3 Feb 2012 12:58:43 -0600 Subject: [Numpy-discussion] Trick for fast Message-ID: Hi Josef, I am unclear on what you want to say, but all I am doing in the code is getting inertia tensor for a bunch of particle masses. (http://en.wikipedia.org/wiki/Moment_of_inertia#Moment_of_inertia_tensor) So the diagonals are not actually zeros but would have z^2 + y^2 .. The reason which I said 3secs could be misunderstood .. This code is called many times over a loop and the bulk time is taken in computing this inertial tensor. After code change, the entire loop finishes off in 3 ses. Thanks for alertness, Santhosh On Fri, Feb 3, 2012 at 12:47 PM, wrote: > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Trick for fast (santhu kumar) > 2. Re: Trick for fast (josef.pktd at gmail.com) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Fri, 3 Feb 2012 12:29:26 -0600 > From: santhu kumar > Subject: Re: [Numpy-discussion] Trick for fast > To: numpy-discussion at scipy.org > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > On Fri, Feb 3, 2012 at 1:29 PM, santhu kumar wrote: > > Hello all, > > > > Thanks for lovely solutions. I have sat on it for some time and wrote it > > myself : > > > > n =x.shape[0] > > ea = np.array([1,0,0,0,1,0,0,0,1]) > > inert = ((np.tile(ea,(n,1))*((x*x).sum(axis=1)[:,np.newaxis]) - > > > np.hstack([x*x[:,0][:,np.newaxis],x*x[:,1][:,np.newaxis],x*x[:,2][:,np.newaxis]]))*mass[:,np.newaxis]).sum(axis=0) > > inert.shape = 3,3 > > > > Does the trick and reduces the time from over 45 secs to 3 secs. > > I do want to try einsum but my numpy is little old and it does not have > it. > > > > Thanks Sebastian (it was tricky to understand your code for me) and Josef > > (clean). > > Isn't the entire substraction of the first term just to set the > diagonal of the result to zero. > > It looks to me now just like the weighted dot product and setting the > diagonal to zero. That shouldn't take 3 secs unless you actual > dimensions are huge. 
> > Josef > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From howard at renci.org Fri Feb 3 14:02:49 2012 From: howard at renci.org (Howard) Date: Fri, 3 Feb 2012 14:02:49 -0500 Subject: [Numpy-discussion] Masked array elements where mask = True? Message-ID: <4F2C2F59.30204@renci.org> Is there a method that gives an array of all the array indices of a masked array where the mask is True? I've been looking through the docs and don't see it yet... Thanks Howard -- Howard Lander Senior Research Software Developer Renaissance Computing Institute (RENCI) The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Fri Feb 3 14:17:16 2012 From: shish at keba.be (Olivier Delalleau) Date: Fri, 3 Feb 2012 14:17:16 -0500 Subject: [Numpy-discussion] Masked array elements where mask = True? In-Reply-To: <4F2C2F59.30204@renci.org> References: <4F2C2F59.30204@renci.org> Message-ID: numpy.where(x.mask) should do it. -=- Olivier Le 3 f?vrier 2012 14:02, Howard a ?crit : > Is there a method that gives an array of all the array indices of a > masked array where the mask is True? I've been looking through the docs and > don't see it yet... > > Thanks > Howard > -- > Howard Lander > Senior Research Software Developer > Renaissance Computing Institute (RENCI) > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From howard at renci.org Fri Feb 3 14:22:32 2012 From: howard at renci.org (Howard) Date: Fri, 3 Feb 2012 14:22:32 -0500 Subject: [Numpy-discussion] Masked array elements where mask = True? In-Reply-To: References: <4F2C2F59.30204@renci.org> Message-ID: <4F2C33F8.7090604@renci.org> Indeed it does! Thanks very much. I was not aware of the numpy where command. Howard On 2/3/12 2:17 PM, Olivier Delalleau wrote: > numpy.where(x.mask) should do it. > > -=- Olivier > > Le 3 f?vrier 2012 14:02, Howard > a ?crit : > > Is there a method that gives an array of all the array indices of > a masked array where the mask is True? I've been looking through > the docs and don't see it yet... > > Thanks > Howard > -- > Howard Lander > Senior Research Software Developer > Renaissance Computing Institute (RENCI) > The University of North Carolina at Chapel Hill > Duke University > North Carolina State University > 100 Europa Drive > Suite 540 > Chapel Hill, NC 27517 > 919-445-9651 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Howard Lander Senior Research Software Developer Renaissance Computing Institute (RENCI) The University of North Carolina at Chapel Hill Duke University North Carolina State University 100 Europa Drive Suite 540 Chapel Hill, NC 27517 919-445-9651 -------------- next part -------------- An HTML attachment was scrubbed... 
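A short illustration of the np.where(x.mask) suggestion, with one hedge: when nothing is masked, .mask can be the scalar numpy.ma.nomask rather than a boolean array, so np.ma.getmaskarray() is the safer accessor. The array below is made up purely for the example:

import numpy as np

x = np.ma.array([[1.0, 2.0], [3.0, 4.0]],
                mask=[[False, True], [True, False]])

# tuple of per-axis index arrays where the mask is True
print(np.where(x.mask))                     # (array([0, 1]), array([1, 0]))

# equivalent, and safe even when x.mask is ma.nomask
print(np.nonzero(np.ma.getmaskarray(x)))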
URL: From josef.pktd at gmail.com Fri Feb 3 14:33:25 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Feb 2012 14:33:25 -0500 Subject: [Numpy-discussion] Trick for fast In-Reply-To: References: Message-ID: On Fri, Feb 3, 2012 at 1:58 PM, santhu kumar wrote: > Hi Josef, > > I am unclear on what you want to say, but all I am doing in the code is > getting inertia tensor for a bunch of particle masses. > (http://en.wikipedia.org/wiki/Moment_of_inertia#Moment_of_inertia_tensor) > > So the diagonals are not actually zeros but would have z^2 + y^2 .. > The reason which I said 3secs could be misunderstood .. This code is called > many times over a loop and the bulk time is taken in computing this inertial > tensor. > After code change, the entire loop finishes off in 3 ses. ok, I still had a python sum instead of the numpy sum in there (I really really like namespaces :) >>> np.sum(ri*ri) 5549.0 >>> sum(ri*ri) array([ 1764., 1849., 1936.]) this might match better. print (mass * x**2).sum() * np.eye(3) - np.dot(x.T, mass*x) or res = - np.dot(x.T, mass*x) res[np.arange(3), np.arange(3)] += (mass * x**2).sum() print res Josef > > Thanks for alertness, > Santhosh > > On Fri, Feb 3, 2012 at 12:47 PM, wrote: >> >> Send NumPy-Discussion mailing list submissions to >> ? ? ? ?numpy-discussion at scipy.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> ? ? ? ?http://mail.scipy.org/mailman/listinfo/numpy-discussion >> or, via email, send a message with subject or body 'help' to >> ? ? ? ?numpy-discussion-request at scipy.org >> >> You can reach the person managing the list at >> ? ? ? ?numpy-discussion-owner at scipy.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of NumPy-Discussion digest..." >> >> >> Today's Topics: >> >> ? 1. Re: Trick for fast (santhu kumar) >> ? 2. Re: Trick for fast (josef.pktd at gmail.com) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Fri, 3 Feb 2012 12:29:26 -0600 >> From: santhu kumar >> >> Subject: Re: [Numpy-discussion] Trick for fast >> To: numpy-discussion at scipy.org >> Message-ID: >> >> ? >> Content-Type: text/plain; charset="iso-8859-1" >> >> >> On Fri, Feb 3, 2012 at 1:29 PM, santhu kumar wrote: >> > Hello all, >> > >> > Thanks for lovely solutions. I have sat on it for some time and wrote it >> > myself : >> > >> > n =x.shape[0] >> > ea = np.array([1,0,0,0,1,0,0,0,1]) >> > inert = ((np.tile(ea,(n,1))*((x*x).sum(axis=1)[:,np.newaxis]) - >> > >> > np.hstack([x*x[:,0][:,np.newaxis],x*x[:,1][:,np.newaxis],x*x[:,2][:,np.newaxis]]))*mass[:,np.newaxis]).sum(axis=0) >> > inert.shape = 3,3 >> > >> > Does the trick and reduces the time from over 45 secs to 3 secs. >> > I do want to try einsum but my numpy is little old and it does not have >> > it. >> > >> > Thanks Sebastian (it was tricky to understand your code for me) and >> > Josef >> > (clean). >> >> Isn't the entire substraction of the first term just to set the >> diagonal of the result to zero. >> >> It looks to me now just like the weighted dot product and setting the >> diagonal to zero. That shouldn't take 3 secs unless you actual >> dimensions are huge. 
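To make the dot-product formulation above easy to try, here is a self-contained sketch (same toy data as earlier in the thread; mass kept as an (n, 1) column so the broadcasting matches, and the result checked against the plain loop):

import numpy as np

n = 15
x = np.arange(n * 3.).reshape(-1, 3)      # n x 3 positions
mass = np.linspace(1, 2, n)[:, None]      # n x 1 masses

# loop version (reference)
inert = np.zeros((3, 3))
for ri, mi in zip(x, mass[:, 0]):
    inert += mi * (np.dot(ri, ri) * np.eye(3) - np.outer(ri, ri))

# vectorized: one weighted dot product, then fix up the diagonal
res = -np.dot(x.T, mass * x)                               # -sum_i m_i r_i r_i^T
res[np.arange(3), np.arange(3)] += (mass * x ** 2).sum()   # add sum_i m_i |r_i|^2 on the diagonal

print(np.allclose(inert, res))            # True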
>> >> Josef >> >> >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Feb 3 15:37:24 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Feb 2012 15:37:24 -0500 Subject: [Numpy-discussion] Trick for fast In-Reply-To: References: Message-ID: On Fri, Feb 3, 2012 at 2:33 PM, wrote: > On Fri, Feb 3, 2012 at 1:58 PM, santhu kumar wrote: >> Hi Josef, >> >> I am unclear on what you want to say, but all I am doing in the code is >> getting inertia tensor for a bunch of particle masses. >> (http://en.wikipedia.org/wiki/Moment_of_inertia#Moment_of_inertia_tensor) >> >> So the diagonals are not actually zeros but would have z^2 + y^2 .. >> The reason which I said 3secs could be misunderstood .. This code is called >> many times over a loop and the bulk time is taken in computing this inertial >> tensor. >> After code change, the entire loop finishes off in 3 ses. > > ok, I still had a python sum instead of the numpy sum in there ?(I > really really like namespaces :) > >>>> np.sum(ri*ri) > 5549.0 >>>> sum(ri*ri) > array([ 1764., ?1849., ?1936.]) > > this might match better. > > print ?(mass * x**2).sum() * np.eye(3) ?- np.dot(x.T, mass*x) > > or > res = - np.dot(x.T, mass*x) > res[np.arange(3), np.arange(3)] += (mass * x**2).sum() > print res if my interpretation is now correct, and because I have seen many traces lately >>> res = - np.dot(x.T, mass*x) >>> res[np.arange(3), np.arange(3)] -= np.trace(res) >>> res array([[ 35752.5, -16755. , -17287.5], [-16755. , 34665. , -17865. ], [-17287.5, -17865. , 33532.5]]) >>> res - inert array([[ 7.27595761e-12, 0.00000000e+00, 3.63797881e-12], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]]) Josef > > Josef >> >> Thanks for alertness, >> Santhosh >> >> On Fri, Feb 3, 2012 at 12:47 PM, wrote: >>> >>> Send NumPy-Discussion mailing list submissions to >>> ? ? ? ?numpy-discussion at scipy.org >>> >>> To subscribe or unsubscribe via the World Wide Web, visit >>> ? ? ? ?http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> or, via email, send a message with subject or body 'help' to >>> ? ? ? ?numpy-discussion-request at scipy.org >>> >>> You can reach the person managing the list at >>> ? ? ? ?numpy-discussion-owner at scipy.org >>> >>> When replying, please edit your Subject line so it is more specific >>> than "Re: Contents of NumPy-Discussion digest..." >>> >>> >>> Today's Topics: >>> >>> ? 1. Re: Trick for fast (santhu kumar) >>> ? 2. Re: Trick for fast (josef.pktd at gmail.com) >>> >>> >>> ---------------------------------------------------------------------- >>> >>> Message: 1 >>> Date: Fri, 3 Feb 2012 12:29:26 -0600 >>> From: santhu kumar >>> >>> Subject: Re: [Numpy-discussion] Trick for fast >>> To: numpy-discussion at scipy.org >>> Message-ID: >>> >>> ? >>> Content-Type: text/plain; charset="iso-8859-1" >>> >>> >>> On Fri, Feb 3, 2012 at 1:29 PM, santhu kumar wrote: >>> > Hello all, >>> > >>> > Thanks for lovely solutions. 
I have sat on it for some time and wrote it >>> > myself : >>> > >>> > n =x.shape[0] >>> > ea = np.array([1,0,0,0,1,0,0,0,1]) >>> > inert = ((np.tile(ea,(n,1))*((x*x).sum(axis=1)[:,np.newaxis]) - >>> > >>> > np.hstack([x*x[:,0][:,np.newaxis],x*x[:,1][:,np.newaxis],x*x[:,2][:,np.newaxis]]))*mass[:,np.newaxis]).sum(axis=0) >>> > inert.shape = 3,3 >>> > >>> > Does the trick and reduces the time from over 45 secs to 3 secs. >>> > I do want to try einsum but my numpy is little old and it does not have >>> > it. >>> > >>> > Thanks Sebastian (it was tricky to understand your code for me) and >>> > Josef >>> > (clean). >>> >>> Isn't the entire substraction of the first term just to set the >>> diagonal of the result to zero. >>> >>> It looks to me now just like the weighted dot product and setting the >>> diagonal to zero. That shouldn't take 3 secs unless you actual >>> dimensions are huge. >>> >>> Josef >>> >>> >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From alan.isaac at gmail.com Fri Feb 3 16:49:54 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 03 Feb 2012 16:49:54 -0500 Subject: [Numpy-discussion] Trick for fast In-Reply-To: References: Message-ID: <4F2C5682.9000504@gmail.com> On 2/3/2012 3:37 PM, josef.pktd at gmail.com wrote: > res = - np.dot(x.T, mass*x) > res[np.arange(3), np.arange(3)] -= np.trace(res) Nice! Get some speed gain with slicing: res = - np.dot(x.T, mass*x) res.flat[slice(0,None,4)] -= np.trace(res) Alan From josef.pktd at gmail.com Fri Feb 3 16:56:42 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Feb 2012 16:56:42 -0500 Subject: [Numpy-discussion] Trick for fast In-Reply-To: <4F2C5682.9000504@gmail.com> References: <4F2C5682.9000504@gmail.com> Message-ID: On Fri, Feb 3, 2012 at 4:49 PM, Alan G Isaac wrote: > On 2/3/2012 3:37 PM, josef.pktd at gmail.com wrote: >> res = - np.dot(x.T, mass*x) >> res[np.arange(3), np.arange(3)] -= np.trace(res) > > > Nice! > Get some speed gain with slicing: > > res = - np.dot(x.T, mass*x) > res.flat[slice(0,None,4)] -= np.trace(res) Actually, I thought about the most readable version using diag_indices >>> di = np.diag_indices(3) >>> res = - np.dot(x.T, mass*x) >>> res[di] -= np.trace(res) (but I thought I said already enough) Josef > > Alan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at googlemail.com Sat Feb 4 10:55:42 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 4 Feb 2012 16:55:42 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: References: Message-ID: On Wed, Dec 14, 2011 at 6:50 PM, Ralf Gommers wrote: > > > On Wed, Dec 14, 2011 at 3:04 PM, David Cournapeau wrote: > >> On Tue, Dec 13, 2011 at 3:43 PM, Ralf Gommers >> wrote: >> > On Sun, Oct 30, 2011 at 12:18 PM, David Cournapeau >> > wrote: >> >> >> >> On Thu, Oct 27, 2011 at 5:19 PM, Ralf Gommers >> >> wrote: >> >> > Hi David, >> >> > >> >> > On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau < >> cournape at gmail.com> >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> I was wondering if we could finally move to a more recent version of >> >> >> compilers for official win32 installers. 
This would of course >> concern >> >> >> the next release cycle, not the ones where beta/rc are already in >> >> >> progress. >> >> >> >> >> >> Basically, the pros: >> >> >> - we will have to move at some point >> >> >> - gcc 4.* seem less buggy, especially C++ and fortran. >> >> >> - no need to maintain msvcr90 vodoo >> >> >> The cons: >> >> >> - it will most likely break the ABI >> >> >> - we need to recompile atlas (but I can take care of it) >> >> >> - the biggest: it is difficult to combine gfortran with visual >> >> >> studio (more exactly you cannot link gfortran runtime to a visual >> >> >> studio executable). The only solution I could think of would be to >> >> >> recompile the gfortran runtime with Visual Studio, which for some >> >> >> reason does not sound very appealing :) >> >> > >> >> > To get the datetime changes to work with MinGW, we already concluded >> >> > that >> >> > building with 4.x is more or less required (without recognizing some >> of >> >> > the >> >> > points you list above). Changes to mingw32ccompiler to fix >> compilation >> >> > with >> >> > 4.x went in in https://github.com/numpy/numpy/pull/156. It would be >> good >> >> > if >> >> > you could check those. >> >> >> >> I will look into it more carefully, but overall, it seems that >> >> building atlas 3.8.4, numpy and scipy with gcc 4.x works quite well. >> >> The main issue is that gcc 4.* adds some dependencies on mingw dlls. >> >> There are two options: >> >> - adding the dlls in the installers >> >> - statically linking those, which seems to be a bad idea >> >> (generalizing the dll boundaries problem to exception and things we >> >> would rather not care about: >> >> http://cygwin.com/ml/cygwin/2007-06/msg00332.html). >> >> >> >> > It probably makes sense make this move for numpy 1.7. If this breaks >> the >> >> > ABI >> >> > then it would be easiest to make numpy 1.7 the minimum required >> version >> >> > for >> >> > scipy 0.11. >> >> >> >> My thinking as well. >> >> >> > >> > Hi David, what is the current status of this issue? I kind of forgot >> this is >> > a prerequisite for the next release when starting the 1.7.0 release >> thread. >> >> The only issue at this point is the distribution of mingw dlls. I have >> not found a way to do it nicely (where nicely means something that is >> distributed within numpy package). Given that those dlls are actually >> versioned and seem to have a strong versioning policy, maybe we can >> just install them inside the python installation ? >> >> Although not ideal, I don't have a problem with that in principle. > However, wouldn't it break installing without admin rights if Python is > installed by the admin? > David, do you have any more thoughts on this? Is there a final solution in sight? Anything I or anyone else can do to help? Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Feb 4 11:07:41 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 4 Feb 2012 17:07:41 +0100 Subject: [Numpy-discussion] dtype related deprecations In-Reply-To: References: Message-ID: On Wed, Dec 28, 2011 at 2:58 PM, Ralf Gommers wrote: > Hi, > > I'm having some trouble cleaning up tests to deal with these two > deprecations: > > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype > will become immutable in a future version > DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because > they are platform specific. 
Use 'O' instead > > They seem fairly invasive, judged by the test noise in both numpy and > scipy. Record arrays rely on setting dtype names. There are tests for > picking and the buffer protocol that generate warnings for O4/O8. Can > anyone comment on the necessity of these deprecations and how to deal with > them? > > Anyone? This is important for the 1.7.0 release. Also, I don't recall this being discussed. If I'm wrong please point me to the discussion, otherwise some discussion/explanation is in order I think. Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Feb 4 12:37:33 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Feb 2012 10:37:33 -0700 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. Message-ID: Hi All, In the discussion on deprecating old macros in 1.7 as part of pull request 189 the issue of how to move numpy forward and start clearing the decks of accumulated cruft arose. The proposal here is to make the 1.7 a long term support release that we support with bug fixes for the next 2-3 years, and start removing stuff in 1.8 and bump the Python version up to 2.6 (2.7?). The 1.7 release has NA, datetime, and einsum, and I think that is enough to keep folks happy for a while. I don't know what other major features might be on the horizon (labeled arrays?), but 1.7 looks like a good resting place to me. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From npai at uark.edu Sat Feb 4 15:01:24 2012 From: npai at uark.edu (Naresh Pai) Date: Sat, 4 Feb 2012 14:01:24 -0600 Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix Message-ID: I am somewhat new to Python (been coding with Matlab mostly). I am trying to simplify (and expedite) a piece of code that is currently a bottleneck in a larger code. I have a large array (7000 rows x 4500 columns) titled say, abc, and I am trying to find a fast method to count the number of instances of each unique value within it. All unique values are stored in a variable, say, unique_elem. My current code is as follows: import numpy as np #allocate space for storing element count elem_count = zeros((len(unique_elem),1)) #loop through and count number of unique_elem for i in range(len(unique_elem)): elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x in [unique_elem[i]]))) This loop is bottleneck because I have about 850 unique elements and it takes about 9-10 minutes. Can you suggest a faster way to do this? Thank you, Naresh -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Feb 4 15:35:08 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 4 Feb 2012 14:35:08 -0600 Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix In-Reply-To: References: Message-ID: On Saturday, February 4, 2012, Naresh Pai wrote: > I am somewhat new to Python (been coding with Matlab mostly). I am trying to > simplify (and expedite) a piece of code that is currently a bottleneck in a larger > code. > I have a large array (7000 rows x 4500 columns) titled say, abc, and I am trying > to find a fast method to count the number of instances of each unique value within > it. All unique values are stored in a variable, say, unique_elem. 
My current code > is as follows: > import numpy as np > #allocate space for storing element count > elem_count = zeros((len(unique_elem),1)) > #loop through and count number of unique_elem > for i in range(len(unique_elem)): > elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x in [unique_elem[i]]))) > This loop is bottleneck because I have about 850 unique elements and it takes > about 9-10 minutes. Can you suggest a faster way to do this? > Thank you, > Naresh > no.unique() can return indices and reverse indices. It would be trivial to histogram the reverse indices using np.histogram(). Does that help? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From Jerome.Kieffer at esrf.fr Sat Feb 4 15:42:48 2012 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Sat, 4 Feb 2012 21:42:48 +0100 Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix In-Reply-To: References: Message-ID: <20120204214248.8a4382bf.Jerome.Kieffer@esrf.fr> On Sat, 4 Feb 2012 14:35:08 -0600 Benjamin Root wrote: > > no.unique() can return indices and reverse indices. It would be trivial to > histogram the reverse indices using np.histogram(). Even np.histogram(abc,unique_elem) or something like this. Works if unique_elem is ordered. np.histogram(abc,list(unique_elem)+[unique_elem[-1]+1])[0].reshape(-1,1) is 40x faster and gives the same result. -- J?r?me Kieffer Data analysis unit - ESRF From warren.weckesser at enthought.com Sat Feb 4 16:04:51 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sat, 4 Feb 2012 15:04:51 -0600 Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Root wrote: > > > On Saturday, February 4, 2012, Naresh Pai wrote: > > I am somewhat new to Python (been coding with Matlab mostly). I am > trying to > > simplify (and expedite) a piece of code that is currently a bottleneck > in a larger > > code. > > I have a large array (7000 rows x 4500 columns) titled say, abc, and I > am trying > > to find a fast method to count the number of instances of each unique > value within > > it. All unique values are stored in a variable, say, unique_elem. My > current code > > is as follows: > > import numpy as np > > #allocate space for storing element count > > elem_count = zeros((len(unique_elem),1)) > > #loop through and count number of unique_elem > > for i in range(len(unique_elem)): > > elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x > in [unique_elem[i]]))) > > This loop is bottleneck because I have about 850 unique elements and it > takes > > about 9-10 minutes. Can you suggest a faster way to do this? > > Thank you, > > Naresh > > > > no.unique() can return indices and reverse indices. It would be trivial > to histogram the reverse indices using np.histogram(). > > Instead of histogram(), you can use bincount() on the inverse indices: u, inv = np.unique(abc, return_inverse=True) n = np.bincount(inv) u will be an array of the unique elements, and n will be an array of the corresponding number of occurrences. Warren -------------- next part -------------- An HTML attachment was scrubbed... 
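A runnable sketch of the np.unique/np.bincount approach for the original use case -- the array contents here are made up, only the shape matches the question:

import numpy as np

# stand-in for the 7000 x 4500 array with ~850 distinct (integer-valued) entries
abc = np.random.randint(0, 850, size=(7000, 4500))

u, inv = np.unique(abc, return_inverse=True)   # inv maps abc.ravel() onto u
counts = np.bincount(inv.ravel())              # occurrences of each unique value

# counts[i] is how many times u[i] appears in abc
assert counts.sum() == abc.size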
URL: From npai at uark.edu Sat Feb 4 16:20:59 2012 From: npai at uark.edu (Naresh) Date: Sat, 4 Feb 2012 21:20:59 +0000 (UTC) Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix References: Message-ID: Warren Weckesser enthought.com> writes: > > > On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Root ou.edu> wrote: > > > On Saturday, February 4, 2012, Naresh Pai uark.edu> wrote:> I am somewhat new to Python (been coding with Matlab mostly). I am trying to? > > > simplify (and expedite) a piece of code that is currently a bottleneck in a larger? > > code.?> I have a large array (7000 rows x 4500 columns) titled say, abc, and I am trying?> to find a fast method to count the number of instances of each unique value within?> it. All unique values are stored in a variable, say, unique_elem. My current code? > > > > is as follows:> import numpy as np> #allocate space for storing element count> elem_count = zeros((len(unique_elem),1))> #loop through and count number of unique_elem> for i in range(len(unique_elem)): > > > > ? ?elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x in?[unique_elem[i]])))> This loop is bottleneck because I have about 850 unique elements and it takes?> about 9-10 minutes. Can you suggest a faster way to do this?? > > > > Thank you,> Naresh> > no.unique() can return indices and reverse indices. ?It would be trivial to histogram the reverse indices using np.histogram(). > > > Instead of histogram(), you can use bincount() on the inverse indices:u, inv = np.unique(abc, return_inverse=True)n = np.bincount(inv)u will be an array of the unique elements, and n will be an array of the corresponding number of occurrences.Warren > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > The histogram() solution works perfect since unique_elem is ordered. I appreciate everyone's help. From travis at continuum.io Sat Feb 4 17:03:27 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 4 Feb 2012 16:03:27 -0600 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: Message-ID: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> We are spending a lot of time on NumPy and will be for the next few months. I think that 1.8 will be a better long term release. We need a few more fundamental features yet. Look for a roadmap document for discussion from Mark Wiebe and I within the week about NumPy 1.8 which has a target release of June 2012. Thanks, Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 4, 2012, at 11:37 AM, Charles R Harris wrote: > Hi All, > > In the discussion on deprecating old macros in 1.7 as part of pull request 189 the issue of how to move numpy forward and start clearing the decks of accumulated cruft arose. The proposal here is to make the 1.7 a long term support release that we support with bug fixes for the next 2-3 years, and start removing stuff in 1.8 and bump the Python version up to 2.6 (2.7?). The 1.7 release has NA, datetime, and einsum, and I think that is enough to keep folks happy for a while. I don't know what other major features might be on the horizon (labeled arrays?), but 1.7 looks like a good resting place to me. > > Thoughts? 
> > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Feb 4 17:19:46 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 4 Feb 2012 23:19:46 +0100 Subject: [Numpy-discussion] autocorrelation computation performance : use of np.correlate In-Reply-To: References: <4F2984E6.4070005@crans.org> Message-ID: On Thu, Feb 2, 2012 at 1:58 AM, wrote: > On Wed, Feb 1, 2012 at 6:48 PM, Benjamin Root wrote: > > > > > > On Wednesday, February 1, 2012, Pierre Haessig > > > wrote: > >> Hi, > >> > >> [I'm not sure whether this discussion belongs to numpy-discussion or > >> scipy-dev] > >> > >> In day to day time series analysis I regularly need to look at the data > >> autocorrelation ("acorr" or "acf" depending on the software package). > >> The straighforward available function I have is matplotlib.pyplot.acorr. > >> However, for a moderately long time series (say of length 10**5) it > taking a > >> huge time just to just dislays the autocorrelation values within a small > >> range of time lags. > >> The main reason being it is relying on np.correlate(x,x, mode=2) while > >> only a few lags are needed. > >> (I guess mode=2 is an (old fashioned?) way to set mode='full') > >> > >> I know that np.correlate performance issue has been discussed already, > and > >> there is a *somehow* related ticket > >> (http://projects.scipy.org/numpy/ticket/1260). I noticed in the > ticket's > >> change number 2 the following comment by Josef : "Maybe a truncated > >> convolution/correlation would be good". I'll come back to this soon. > >> > >> I made an example script "acf_timing.py" to start my point with some > >> timing data : > >> > >> In Ipython: > >>>>> run acf_timing.py # it imports statsmodel's acf + define 2 other acf > >>>>> implementations + an example data 10**5 samples long > >> > >> %time l,c = mpl_acf(a, 10) > >> CPU times: user 8.69 s, sys: 0.00 s, total: 8.69 s > >> Wall time: 11.18 s # pretty long... > >> > >> %time c = sm_acf(a, 10) > >> CPU times: user 8.76 s, sys: 0.01 s, total: 8.78 s > >> Wall time: 10.79 s # long as well. statsmodel has a similar underlying > >> implementation > >> # > >> > http://statsmodels.sourceforge.net/generated/scikits.statsmodels.tsa.stattools.acf.html#scikits.statsmodels.tsa.stattools.acf > >> > >> #Now, better option : use the fft convolution > >> %time c=sm_acf(a, 10,fft=True) > >> CPU times: user 0.03 s, sys: 0.01 s, total: 0.04 s > >> Wall time: 0.07 s > >> # Fast, but I'm not sure about the memory implication of using fft > though. > >> > >> #The naive option : just compute the acf lags that are needed > >> %time l,c = naive_acf(a, 10) > >> CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s > >> Wall time: 0.01 s > >> # Iterative computation. Pretty silly but very fast > >> # (Now of course, this naive implementation won't scale nicely for a lot > >> of lags) > > I don't think it's silly to have a short python loop, statsmodels > actually uses the loop in the models, for example in yule_walker (and > GLSAR), because in most statistical application I wouldn't expect a > large number of lags. The time series models don't use the acov > directly, but I think most of the time we just loop over the lags. > > >> > >> Now comes (at last) the question : what should be done about this > >> performance issue ? 
> >> - should there be a truncated np.convolve/np.correlate function, as > Josef > >> suggested ? > >> - or should people in need of autocorrelation find some workarounds > >> because this usecase is not big enough to call for a change in > np.convolve ? > >> > >> I really feel this question is about *where* a change should be > >> implemented (numpy, scipy.signal, maplotlib ?) so that it makes sense > while > >> not breaking 10^10 lines of numpy related code... > >> > >> Best, > >> Pierre > >> > >> > > > > Speaking for matplotlib, the acorr() (and xcorr()) functions in mpl are > > merely a convenience. The proper place for any change would not be mpl > > (although, we would certainly take advantage of any improved acorr() and > > xcorr() that are made available in numpy. > > I also think that numpy or scipy would be the natural candidates for a > correlate that works fast for an intermediate number of desired lags > (but still short compared to length of data). > > scipy.signal is the right place I think. numpy shouldn't grow too many functions like this. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Feb 4 18:24:58 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 4 Feb 2012 15:24:58 -0800 Subject: [Numpy-discussion] dtype related deprecations In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 8:07 AM, Ralf Gommers wrote: > > > On Wed, Dec 28, 2011 at 2:58 PM, Ralf Gommers > wrote: > >> Hi, >> >> I'm having some trouble cleaning up tests to deal with these two >> deprecations: >> >> DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype >> will become immutable in a future version >> DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because >> they are platform specific. Use 'O' instead >> >> They seem fairly invasive, judged by the test noise in both numpy and >> scipy. Record arrays rely on setting dtype names. There are tests for >> picking and the buffer protocol that generate warnings for O4/O8. Can >> anyone comment on the necessity of these deprecations and how to deal with >> them? >> >> Anyone? This is important for the 1.7.0 release. > > Also, I don't recall this being discussed. If I'm wrong please point me to > the discussion, otherwise some discussion/explanation is in order I think. > I've made a pull request for the O4/O8 issue. This one is an obvious bug, and there was no way to fix the bug without breaking backwards compatibility, so I used deprecation as a mechanism to fix it. Read the pull request here: https://github.com/numpy/numpy/pull/193 The names issue is a bit trickier. There has been some back and forth in some tickets, and I recall some discussion on the mailing list, but that may be long ago and without clear resolution. That this should be changed is however very clear to me, because NumPy is violating a definition in the Python documentation of what rules hashable objects should obey. The trouble is that there isn't a convenience API written yet to replace the usage of that mutability. Perhaps the thing to do is comment out the deprecation warning in the source code, and reintroduce it in 1.8 along with a replacement API appropriate for immutable dtypes? -Mark > > Thanks, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
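A small illustration of the two deprecations under discussion (the structured dtype and field names below are hypothetical, just for the example): the portable object-dtype spelling, and building a new dtype instead of assigning to dtype.names in place:

import numpy as np

# platform-independent spelling; 'O8'/'O4' hard-code the pointer size
dt = np.dtype('O')
print(dt.itemsize)        # 8 on 64-bit builds, 4 on 32-bit

# renaming structured-array fields without mutating dtype.names
a = np.zeros(3, dtype=[('a', 'f8'), ('b', 'i4')])
renamed = np.dtype({'names': ['x', 'y'],
                    'formats': [a.dtype.fields['a'][0], a.dtype.fields['b'][0]]})
b = a.view(renamed)       # same memory layout, new field names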
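Coming back to the autocorrelation question above, a minimal sketch of the "compute only the lags you need" idea -- illustrative only, not the acf_timing.py script from that thread:

import numpy as np

def acf_first_lags(x, maxlag):
    # normalized autocorrelation for lags 0..maxlag, O(n * maxlag) work
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    denom = np.dot(x, x)                  # lag-0 term
    return np.array([np.dot(x[:n - k], x[k:]) / denom
                     for k in range(maxlag + 1)])

a = np.random.randn(10 ** 5)
print(acf_first_lags(a, 10))              # ~[1, 0, 0, ...] for white noise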
URL: From charlesr.harris at gmail.com Sat Feb 4 19:07:05 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Feb 2012 17:07:05 -0700 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Sat, Feb 4, 2012 at 3:03 PM, Travis Oliphant wrote: > We are spending a lot of time on NumPy and will be for the next few > months. I think that 1.8 will be a better long term release. We need a > few more fundamental features yet. > > Look for a roadmap document for discussion from Mark Wiebe and I within > the week about NumPy 1.8 which has a target release of June 2012. > > Looking forward to that document. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Sat Feb 4 23:13:53 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Sat, 4 Feb 2012 22:13:53 -0600 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Sat, Feb 4, 2012 at 6:07 PM, Charles R Harris wrote: > > > On Sat, Feb 4, 2012 at 3:03 PM, Travis Oliphant wrote: >> >> We are spending a lot of time on NumPy and will be for the next few >> months. ?I think that 1.8 will be a better long term release. ?We need a few >> more fundamental features yet. >> >> Look for a roadmap document for discussion from Mark Wiebe and I within >> the week about NumPy 1.8 which has a target release of June 2012. >> > > Looking forward to that document. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > A suitable long term release would include deprecating old macros, datetime and einsum. While I would like to include NA, I am rather concerned with the recent bugs that have been uncovered with it. So I am rather wary of having to forced to backport fixes simply because someone said we would "support with bug fixes for the next 2-3 years". Rather at least clearly indicate that not every fix will be backported. I propose that we use this opportunity end support for Python 2.4 especially since Red Hat Enterprise Linux (RHEL) 4 is February 29th, 2012. According to SourceForge, the last available binary release for Python 2.4 was for numpy 1.2.1 (released 2008-10-29). There is still quite a few downloads (3769) of the Python 2.5 numpy 1,6.1 binary. Bruce From travis at continuum.io Sun Feb 5 01:33:49 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 5 Feb 2012 00:33:49 -0600 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: I think supporting Python 2.5 and above is completely fine. I'd even be in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for NumPy 2.8 -Travis On Feb 4, 2012, at 10:13 PM, Bruce Southey wrote: > On Sat, Feb 4, 2012 at 6:07 PM, Charles R Harris > wrote: >> >> >> On Sat, Feb 4, 2012 at 3:03 PM, Travis Oliphant wrote: >>> >>> We are spending a lot of time on NumPy and will be for the next few >>> months. I think that 1.8 will be a better long term release. We need a few >>> more fundamental features yet. 
>>> >>> Look for a roadmap document for discussion from Mark Wiebe and I within >>> the week about NumPy 1.8 which has a target release of June 2012. >>> >> >> Looking forward to that document. >> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > A suitable long term release would include deprecating old macros, > datetime and einsum. While I would like to include NA, I am rather > concerned with the recent bugs that have been uncovered with it. So I > am rather wary of having to forced to backport fixes simply because > someone said we would "support with bug fixes for the next 2-3 years". > Rather at least clearly indicate that not every fix will be > backported. > > I propose that we use this opportunity end support for Python 2.4 > especially since Red Hat Enterprise Linux (RHEL) 4 is February 29th, > 2012. According to SourceForge, the last available binary release for > Python 2.4 was for numpy 1.2.1 (released 2008-10-29). There is still > quite a few downloads (3769) of the Python 2.5 numpy 1,6.1 binary. > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at googlemail.com Sun Feb 5 02:19:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 5 Feb 2012 08:19:36 +0100 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant wrote: > I think supporting Python 2.5 and above is completely fine. I'd even be > in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for NumPy > 2.8 > > +1 for dropping Python 2.5 support also for an LTS release. That will make it a lot easier to use str.format() and the with statement (plus many other things) going forward, without having to think about if your changes can be backported to that LTS release. Ralf > > > On Feb 4, 2012, at 10:13 PM, Bruce Southey wrote: > > > On Sat, Feb 4, 2012 at 6:07 PM, Charles R Harris > > wrote: > >> > >> > >> On Sat, Feb 4, 2012 at 3:03 PM, Travis Oliphant > wrote: > >>> > >>> We are spending a lot of time on NumPy and will be for the next few > >>> months. I think that 1.8 will be a better long term release. We need > a few > >>> more fundamental features yet. > >>> > >>> Look for a roadmap document for discussion from Mark Wiebe and I within > >>> the week about NumPy 1.8 which has a target release of June 2012. > >>> > >> > >> Looking forward to that document. > >> > >> Chuck > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > A suitable long term release would include deprecating old macros, > > datetime and einsum. While I would like to include NA, I am rather > > concerned with the recent bugs that have been uncovered with it. So I > > am rather wary of having to forced to backport fixes simply because > > someone said we would "support with bug fixes for the next 2-3 years". > > Rather at least clearly indicate that not every fix will be > > backported. 
> > > > I propose that we use this opportunity end support for Python 2.4 > > especially since Red Hat Enterprise Linux (RHEL) 4 is February 29th, > > 2012. According to SourceForge, the last available binary release for > > Python 2.4 was for numpy 1.2.1 (released 2008-10-29). There is still > > quite a few downloads (3769) of the Python 2.5 numpy 1,6.1 binary. > > > > Bruce > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Feb 5 02:32:48 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 5 Feb 2012 08:32:48 +0100 Subject: [Numpy-discussion] dtype related deprecations In-Reply-To: References: Message-ID: On Sun, Feb 5, 2012 at 12:24 AM, Mark Wiebe wrote: > > > On Sat, Feb 4, 2012 at 8:07 AM, Ralf Gommers wrote: > >> >> >> On Wed, Dec 28, 2011 at 2:58 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> Hi, >>> >>> I'm having some trouble cleaning up tests to deal with these two >>> deprecations: >>> >>> DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype >>> will become immutable in a future version >>> DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because >>> they are platform specific. Use 'O' instead >>> >>> They seem fairly invasive, judged by the test noise in both numpy and >>> scipy. Record arrays rely on setting dtype names. There are tests for >>> picking and the buffer protocol that generate warnings for O4/O8. Can >>> anyone comment on the necessity of these deprecations and how to deal with >>> them? >>> >>> Anyone? This is important for the 1.7.0 release. >> >> Also, I don't recall this being discussed. If I'm wrong please point me >> to the discussion, otherwise some discussion/explanation is in order I >> think. >> > > I've made a pull request for the O4/O8 issue. This one is an obvious bug, > and there was no way to fix the bug without breaking backwards > compatibility, so I used deprecation as a mechanism to fix it. Read the > pull request here: > > https://github.com/numpy/numpy/pull/193 > > The names issue is a bit trickier. There has been some back and forth in > some tickets, and I recall some discussion on the mailing list, but that > may be long ago and without clear resolution. That this should be changed > is however very clear to me, because NumPy is violating a definition in the > Python documentation of what rules hashable objects should obey. The > trouble is that there isn't a convenience API written yet to replace the > usage of that mutability. Perhaps the thing to do is comment out the > deprecation warning in the source code, and reintroduce it in 1.8 along > with a replacement API appropriate for immutable dtypes? > > That's a good plan. Thanks for the PRs. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sun Feb 5 06:07:44 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 5 Feb 2012 12:07:44 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <4F1955BC.4000207@gmail.com> References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> Message-ID: On Fri, Jan 20, 2012 at 12:53 PM, David Verelst wrote: > I would like to assist on the website. Although I have not made any code > contributions to Numpy/SciPy (yet), I do follow the mailing lists and > try to keep up to date on the scientific python scene. However, I need > to hold my breath until the end of my wind tunnel test campaign mid > February. > > And I do like the sound of the gihub workflow as currently done by the > ipython team. > > Regards, > David > > On 20/01/12 08:49, Scott Sinclair wrote: > > On 19 January 2012 21:48, Fernando Perez wrote: > >> We've moved to the following setup with ipython, which works very well > >> for us so far: > >> > >> 1. ipython.org: Main website with only static content, manged as a > >> repo in github (https://github.com/ipython/ipython-website) and > >> updated with a gh-pages build > >> (https://github.com/ipython/ipython.github.com). > > I like this idea, and to get the ball rolling I've stripped out the > > www directory of the scipy.org-new repo into it's own repository using > > git filter-branch (posted here: > > https://github.com/scottza/scipy_website) and created > > https://github.com/scottza/scottza.github.com. This puts a copy of the > > new scipy website at http://scottza.github.com as a proof of concept. > > Nice! > > Since there seems to be some agreement on rehosting numpy's website on > > github, I'd be happy to do as much of the legwork as I can in getting > > the numpy.scipy.org content hosted at numpy.github.com. I don't have > > permission to create new repos for the Numpy organization, so someone > > would have to create an empty > > https://github.com/numpy/numpy.github.com and give me push permission > > on that repo. > Does it need to be a new repo, or would permissions on https://github.com/numpy/numpy.scipy.org work as well? > > > It would be great to see scipy go the same way and make updating the > > site easier. I know that David Warde-Farley, Pauli and others put in a > > lot of work scraping content off the wiki to produce the new website, > > it would be fantastic to see the fruits of that effort. > Yes it would. I've taken Fernando's suggestion and created a github team "Scipy web team" and given you push-pull permissions for https://github.com/scipy/scipy.org-new Scott. David, once you make a few pull requests please ping us, and we can give you permissions too. Ralf > > Issues with scipy "Trac, the doc editor, and the conference.scipy.org > > and docs.scipy.org" as mentioned by Pauli. There is also the cookbook > > on the wiki to consider (perhaps http://scipy-central.org/ could play > > a role there). > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Feb 5 06:16:40 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 5 Feb 2012 12:16:40 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> Message-ID: On Fri, Jan 20, 2012 at 8:49 AM, Scott Sinclair wrote: > > Issues with scipy "Trac, the doc editor, and the conference.scipy.org > and docs.scipy.org" as mentioned by Pauli. 
There is also the cookbook > on the wiki to consider (perhaps http://scipy-central.org/ could play > a role there). > > Cleaning up the Cookbook and putting it on scipy-central.org would be useful. I'd like to understand though where the content of that site resides. Is there a public repo somewhere? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Feb 5 06:43:41 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 5 Feb 2012 12:43:41 +0100 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Sun, Jan 15, 2012 at 3:02 AM, Charles R Harris wrote: > > > On Thu, Dec 29, 2011 at 2:36 PM, Ralf Gommers > wrote: > >> >> >> On Thu, Dec 29, 2011 at 9:50 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> I thought I'd raise this topic just to get some ideas out there. At the >>> moment I see two areas that I'd like to see addressed. >>> >>> >>> 1. Documentation editor. This would involve looking at the generated >>> documentation and it's organization/coverage as well such things as style >>> and maybe reviewing stuff on the documentation site. This would be more >>> technical writing than coding. >>> 2. Test coverage. There are a lot of areas of numpy that are not >>> well tested as well as some tests that are still doc tests and should >>> probably be updated. This is a substantial amount of work and would require >>> some familiarity with numpy as well as a willingness to ping developers for >>> clarification of some topics. >>> >>> Thoughts? >>> >> First thought: very useful, but probably not GSOC topics by themselves. >> >> For a very good student, I'd think topics like implementing NA bit masks >> or improved user-defined dtypes would be interesting. In SciPy there's also >> a lot to do, and that's probably a better project for students who prefer >> to work in Python. >> >> > Besides NA bit masks, the new iterator isn't used in a lot of places it > could be. Maybe replacing all uses of the old iterator? I'll admit, that > smacks more of maintenance than developing new code and might be a hard > sell. > > That does smell like maintenance. I've cleaned up http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas and added the idea on missing data. It would be useful to describe new ideas there well, not as a single bullet. Maybe also add potential mentors? And once we have some more ideas, link to it from scipy.org? Is anyone keeping an eye on the relevant channels to get involved this year? Previously Jarrod was doing that. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.zaffino at yahoo.it Sun Feb 5 09:32:43 2012 From: p.zaffino at yahoo.it (Paolo) Date: Sun, 05 Feb 2012 15:32:43 +0100 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows Message-ID: <4F2E930B.9070201@yahoo.it> Hello, I wrote a function that works on a numpy matrix and it works fine on Mac OS and GNU/Linux (I didn't test it on python 3). Now I have a problem with numpy: the same python file doesn't work on Windows (Windows xp, python 2.7 and numpy 2.6.1). I get this error: matrix=matrix.reshape(a, b, c) ValueError: total size of new array must be unchanged Why? Do anyone have an idea about this? Thank you very much. 
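For reference, the message simply means that the number of elements in the
array does not match the requested shape. A two-line reproduction with
made-up sizes shows the same error:

import numpy as np
a = np.arange(10)
a.reshape(3, 4)   # 10 elements cannot fill 3*4 = 12 slots ->
                  # ValueError: total size of new array must be unchanged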
From shish at keba.be Sun Feb 5 10:02:44 2012 From: shish at keba.be (Olivier Delalleau) Date: Sun, 5 Feb 2012 10:02:44 -0500 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: <4F2E930B.9070201@yahoo.it> References: <4F2E930B.9070201@yahoo.it> Message-ID: It should mean that matrix.size != a * b * c. -=- Olivier Le 5 f?vrier 2012 09:32, Paolo a ?crit : > Hello, > I wrote a function that works on a numpy matrix and it works fine on Mac > OS and GNU/Linux (I didn't test it on python 3). > Now I have a problem with numpy: the same python file doesn't work on > Windows (Windows xp, python 2.7 and numpy 2.6.1). > I get this error: > > matrix=matrix.reshape(a, b, c) > ValueError: total size of new array must be unchanged > > Why? Do anyone have an idea about this? > Thank you very much. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erin.sheldon at gmail.com Sun Feb 5 10:57:19 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Sun, 05 Feb 2012 10:57:19 -0500 Subject: [Numpy-discussion] dtype related deprecations In-Reply-To: References: Message-ID: <1328457071-sup-3856@rohan> Excerpts from Mark Wiebe's message of Sat Feb 04 18:24:58 -0500 2012: > The names issue is a bit trickier. There has been some back and forth in > some tickets, and I recall some discussion on the mailing list, but that > may be long ago and without clear resolution. That this should be changed > is however very clear to me, because NumPy is violating a definition in the > Python documentation of what rules hashable objects should obey. The > trouble is that there isn't a convenience API written yet to replace the > usage of that mutability. Perhaps the thing to do is comment out the > deprecation warning in the source code, and reintroduce it in 1.8 along > with a replacement API appropriate for immutable dtypes? I think we need *some* way to rename fields without making a copy of the data. I often must read from standardized file formats where I can only change the names after the fact. If we deprecate this I would have to make a copy of the data in memory to alter names, which would be prohibitive for large arrays that use >= half the memory. -e -- Erin Scott Sheldon Brookhaven National Laboratory From travis at continuum.io Sun Feb 5 11:00:47 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 5 Feb 2012 10:00:47 -0600 Subject: [Numpy-discussion] dtype related deprecations In-Reply-To: <1328457071-sup-3856@rohan> References: <1328457071-sup-3856@rohan> Message-ID: <3F1734C2-2148-44A5-9117-551AACDA50C8@continuum.io> Fortunately that's not the case. All that Mark is advocating is not allowing changing the *itself* in place. You are still free to change the dtype of the array in order to change the field names without making a data copy. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 5, 2012, at 9:57 AM, Erin Sheldon wrote: > Excerpts from Mark Wiebe's message of Sat Feb 04 18:24:58 -0500 2012: >> The names issue is a bit trickier. There has been some back and forth in >> some tickets, and I recall some discussion on the mailing list, but that >> may be long ago and without clear resolution. 
That this should be changed >> is however very clear to me, because NumPy is violating a definition in the >> Python documentation of what rules hashable objects should obey. The >> trouble is that there isn't a convenience API written yet to replace the >> usage of that mutability. Perhaps the thing to do is comment out the >> deprecation warning in the source code, and reintroduce it in 1.8 along >> with a replacement API appropriate for immutable dtypes? > > I think we need *some* way to rename fields without making a copy of the > data. I often must read from standardized file formats where I can only > change the names after the fact. If we deprecate this I would have to > make a copy of the data in memory to alter names, which would be > prohibitive for large arrays that use >= half the memory. > > -e > -- > Erin Scott Sheldon > Brookhaven National Laboratory > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From p.zaffino at yahoo.it Sun Feb 5 11:16:58 2012 From: p.zaffino at yahoo.it (Paolo Zaffino) Date: Sun, 5 Feb 2012 16:16:58 +0000 (GMT) Subject: [Numpy-discussion] R: Re: "ValueError: total size of new array must be unchanged" only on Windows Message-ID: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> Yes, I understand this but I don't know because on Linux and Mac it works well. If the matrix size is different it should be different indipendently from os type. Am I wrong? Thanks for your support! -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Sun Feb 5 11:21:18 2012 From: shish at keba.be (Olivier Delalleau) Date: Sun, 5 Feb 2012 11:21:18 -0500 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> References: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> Message-ID: It means there is some of your code that is not entirely platform-independent. It's not possible to tell you which part because you didn't provide your code. The problem may not even be numpy-related. So you should first look at the current shape of 'matrix', and what are the values of a, b and c, then see where the discrepancy is, and work from there. -=- Olivier Le 5 f?vrier 2012 11:16, Paolo Zaffino a ?crit : > Yes, I understand this but I don't know because on Linux and Mac it works > well. > If the matrix size is different it should be different indipendently from > os type. > Am I wrong? > Thanks for your support! > > ------------------------------ > * From: * Olivier Delalleau ; > * To: * Discussion of Numerical Python ; > * Subject: * Re: [Numpy-discussion] "ValueError: total size of new array > must be unchanged" only on Windows > * Sent: * Sun, Feb 5, 2012 3:02:44 PM > > It should mean that matrix.size != a * b * c. > > -=- Olivier > > Le 5 f?vrier 2012 09:32, Paolo a ?crit : > >> Hello, >> I wrote a function that works on a numpy matrix and it works fine on Mac >> OS and GNU/Linux (I didn't test it on python 3). >> Now I have a problem with numpy: the same python file doesn't work on >> Windows (Windows xp, python 2.7 and numpy 2.6.1). >> I get this error: >> >> matrix=matrix.reshape(a, b, c) >> ValueError: total size of new array must be unchanged >> >> Why? Do anyone have an idea about this? >> Thank you very much. 
>> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.zaffino at yahoo.it Sun Feb 5 12:39:08 2012 From: p.zaffino at yahoo.it (Paolo) Date: Sun, 05 Feb 2012 18:39:08 +0100 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: References: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> Message-ID: <4F2EBEBC.8040501@yahoo.it> This is my code: matrix="".join(f.readlines()) matrix=np.fromstring(matrix, dtype=np.int16) matrix=matrix.reshape(siz[2],siz[1],siz[0]).T Il 05/02/2012 17:21, Olivier Delalleau ha scritto: > It means there is some of your code that is not entirely > platform-independent. It's not possible to tell you which part because > you didn't provide your code. The problem may not even be numpy-related. > So you should first look at the current shape of 'matrix', and what > are the values of a, b and c, then see where the discrepancy is, and > work from there. > > -=- Olivier > > Le 5 f?vrier 2012 11:16, Paolo Zaffino > a ?crit : > > Yes, I understand this but I don't know because on Linux and Mac > it works well. > If the matrix size is different it should be different > indipendently from os type. > Am I wrong? > Thanks for your support! > > > ------------------------------------------------------------------------ > *From: * Olivier Delalleau >; > *To: * Discussion of Numerical Python >; > *Subject: * Re: [Numpy-discussion] "ValueError: total size of new > array must be unchanged" only on Windows > *Sent: * Sun, Feb 5, 2012 3:02:44 PM > > It should mean that matrix.size != a * b * c. > > -=- Olivier > > Le 5 f?vrier 2012 09:32, Paolo a ?crit : > > Hello, > I wrote a function that works on a numpy matrix and it works > fine on Mac > OS and GNU/Linux (I didn't test it on python 3). > Now I have a problem with numpy: the same python file doesn't > work on > Windows (Windows xp, python 2.7 and numpy 2.6.1). > I get this error: > > matrix=matrix.reshape(a, b, c) > ValueError: total size of new array must be unchanged > > Why? Do anyone have an idea about this? > Thank you very much. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Feb 5 12:47:03 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 5 Feb 2012 12:47:03 -0500 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: <4F2EBEBC.8040501@yahoo.it> References: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> <4F2EBEBC.8040501@yahoo.it> Message-ID: On Sun, Feb 5, 2012 at 12:39 PM, Paolo wrote: > This is my code: > > matrix="".join(f.readlines()) > my guess would be, that you have to strip the line endings \n versus \r\n Josef > matrix=np.fromstring(matrix, dtype=np.int16) > matrix=matrix.reshape(siz[2],siz[1],siz[0]).T > > > > > Il 05/02/2012 17:21, Olivier Delalleau ha scritto: > > It means there is some of your code that is not entirely > platform-independent. It's not possible to tell you which part because you > didn't provide your code. The problem may not even be numpy-related. 
> So you should first look at the current shape of 'matrix', and what are > the values of a, b and c, then see where the discrepancy is, and work from > there. > > -=- Olivier > > Le 5 f?vrier 2012 11:16, Paolo Zaffino a ?crit : > >> Yes, I understand this but I don't know because on Linux and Mac it >> works well. >> If the matrix size is different it should be different indipendently from >> os type. >> Am I wrong? >> Thanks for your support! >> >> ------------------------------ >> * From: * Olivier Delalleau ; >> * To: * Discussion of Numerical Python ; >> * Subject: * Re: [Numpy-discussion] "ValueError: total size of new array >> must be unchanged" only on Windows >> * Sent: * Sun, Feb 5, 2012 3:02:44 PM >> >> It should mean that matrix.size != a * b * c. >> >> -=- Olivier >> >> Le 5 f?vrier 2012 09:32, Paolo a ?crit : >> >>> Hello, >>> I wrote a function that works on a numpy matrix and it works fine on Mac >>> OS and GNU/Linux (I didn't test it on python 3). >>> Now I have a problem with numpy: the same python file doesn't work on >>> Windows (Windows xp, python 2.7 and numpy 2.6.1). >>> I get this error: >>> >>> matrix=matrix.reshape(a, b, c) >>> ValueError: total size of new array must be unchanged >>> >>> Why? Do anyone have an idea about this? >>> Thank you very much. >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.zaffino at yahoo.it Sun Feb 5 12:52:28 2012 From: p.zaffino at yahoo.it (Paolo) Date: Sun, 05 Feb 2012 18:52:28 +0100 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: References: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> <4F2EBEBC.8040501@yahoo.it> Message-ID: <4F2EC1DC.9090700@yahoo.it> How I can do this? Il 05/02/2012 18:47, josef.pktd at gmail.com ha scritto: > > > On Sun, Feb 5, 2012 at 12:39 PM, Paolo > wrote: > > This is my code: > > matrix="".join(f.readlines()) > > > my guess would be, that you have to strip the line endings \n versus \r\n > > Josef > > matrix=np.fromstring(matrix, dtype=np.int16) > matrix=matrix.reshape(siz[2],siz[1],siz[0]).T > > > > > Il 05/02/2012 17:21, Olivier Delalleau ha scritto: >> It means there is some of your code that is not entirely >> platform-independent. It's not possible to tell you which part >> because you didn't provide your code. The problem may not even be >> numpy-related. >> So you should first look at the current shape of 'matrix', and >> what are the values of a, b and c, then see where the discrepancy >> is, and work from there. >> >> -=- Olivier >> >> Le 5 f?vrier 2012 11:16, Paolo Zaffino > > a ?crit : >> >> Yes, I understand this but I don't know because on Linux and >> Mac it works well. >> If the matrix size is different it should be different >> indipendently from os type. >> Am I wrong? >> Thanks for your support! 
>> >> >> ------------------------------------------------------------------------ >> *From: * Olivier Delalleau > >; >> *To: * Discussion of Numerical Python >> > >; >> *Subject: * Re: [Numpy-discussion] "ValueError: total size of >> new array must be unchanged" only on Windows >> *Sent: * Sun, Feb 5, 2012 3:02:44 PM >> >> It should mean that matrix.size != a * b * c. >> >> -=- Olivier >> >> Le 5 f?vrier 2012 09:32, Paolo a ?crit : >> >> Hello, >> I wrote a function that works on a numpy matrix and it >> works fine on Mac >> OS and GNU/Linux (I didn't test it on python 3). >> Now I have a problem with numpy: the same python file >> doesn't work on >> Windows (Windows xp, python 2.7 and numpy 2.6.1). >> I get this error: >> >> matrix=matrix.reshape(a, b, c) >> ValueError: total size of new array must be unchanged >> >> Why? Do anyone have an idea about this? >> Thank you very much. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Feb 5 13:06:16 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 5 Feb 2012 13:06:16 -0500 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: <4F2EC1DC.9090700@yahoo.it> References: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> <4F2EBEBC.8040501@yahoo.it> <4F2EC1DC.9090700@yahoo.it> Message-ID: On Sun, Feb 5, 2012 at 12:52 PM, Paolo wrote: > How I can do this? > > I'm not sure without trying, numpy.loadtxt might be the easier choice matrix="".join((i.strip() for i in f.readlines())) I think strip() also removes newlines besides other whitespace otherwise more explicitly matrix="".join((i.strip(f.newlines) for i in f.readlines())) or open the file with mode 'rU' and strip('\n') Josef > > > > Il 05/02/2012 18:47, josef.pktd at gmail.com ha scritto: > > > > On Sun, Feb 5, 2012 at 12:39 PM, Paolo wrote: > >> This is my code: >> >> matrix="".join(f.readlines()) >> > > my guess would be, that you have to strip the line endings \n versus \r\n > > Josef > > >> matrix=np.fromstring(matrix, dtype=np.int16) >> matrix=matrix.reshape(siz[2],siz[1],siz[0]).T >> >> >> >> >> Il 05/02/2012 17:21, Olivier Delalleau ha scritto: >> >> It means there is some of your code that is not entirely >> platform-independent. It's not possible to tell you which part because you >> didn't provide your code. The problem may not even be numpy-related. >> So you should first look at the current shape of 'matrix', and what are >> the values of a, b and c, then see where the discrepancy is, and work from >> there. >> >> -=- Olivier >> >> Le 5 f?vrier 2012 11:16, Paolo Zaffino a ?crit : >> >>> Yes, I understand this but I don't know because on Linux and Mac it >>> works well. >>> If the matrix size is different it should be different indipendently >>> from os type. >>> Am I wrong? >>> Thanks for your support! 
>>> >>> ------------------------------ >>> * From: * Olivier Delalleau ; >>> * To: * Discussion of Numerical Python ; >>> * Subject: * Re: [Numpy-discussion] "ValueError: total size of new >>> array must be unchanged" only on Windows >>> * Sent: * Sun, Feb 5, 2012 3:02:44 PM >>> >>> It should mean that matrix.size != a * b * c. >>> >>> -=- Olivier >>> >>> Le 5 f?vrier 2012 09:32, Paolo a ?crit : >>> >>>> Hello, >>>> I wrote a function that works on a numpy matrix and it works fine on Mac >>>> OS and GNU/Linux (I didn't test it on python 3). >>>> Now I have a problem with numpy: the same python file doesn't work on >>>> Windows (Windows xp, python 2.7 and numpy 2.6.1). >>>> I get this error: >>>> >>>> matrix=matrix.reshape(a, b, c) >>>> ValueError: total size of new array must be unchanged >>>> >>>> Why? Do anyone have an idea about this? >>>> Thank you very much. >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Sun Feb 5 13:13:01 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 5 Feb 2012 12:13:01 -0600 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: References: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> <4F2EBEBC.8040501@yahoo.it> <4F2EC1DC.9090700@yahoo.it> Message-ID: On Sun, Feb 5, 2012 at 12:06 PM, wrote: > > > On Sun, Feb 5, 2012 at 12:52 PM, Paolo wrote: > >> How I can do this? >> >> > > I'm not sure without trying, numpy.loadtxt might be the easier choice > > matrix="".join((i.strip() for i in f.readlines())) > > I think strip() also removes newlines besides other whitespace > otherwise more explicitly > matrix="".join((i.strip(f.newlines) for i in f.readlines())) > > or open the file with mode 'rU' and strip('\n') > > Josef > This code: matrix="".join(f.readlines()) matrix=np.fromstring(matrix, dtype=np.int16) matrix=matrix.reshape(siz[2],siz[1],siz[0]).T implies that the data in f is binary, because the 'sep' keyword is not used in the call to np.fromstring. If that is the case, you should not use f.readlines() to read the data. Instead, read it as a single string with f.read(). (Or perhaps read the file with a single call to np.fromfile()). Also be sure that the file was opened in binary mode (i.e. f = open(filename, 'rb')). 
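A minimal sketch of that approach, assuming the file really does contain raw
int16 data, and with 'filename' standing in for however the file is located
in the surrounding code:

import numpy as np

f = open(filename, 'rb')                 # binary mode, so Windows does not translate line endings
matrix = np.fromfile(f, dtype=np.int16)  # read the whole file as int16 values
f.close()
matrix = matrix.reshape(siz[2], siz[1], siz[0]).T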
Warren > > > >> >> >> >> Il 05/02/2012 18:47, josef.pktd at gmail.com ha scritto: >> >> >> >> On Sun, Feb 5, 2012 at 12:39 PM, Paolo wrote: >> >>> This is my code: >>> >>> matrix="".join(f.readlines()) >>> >> >> my guess would be, that you have to strip the line endings \n versus \r\n >> >> Josef >> >> >>> matrix=np.fromstring(matrix, dtype=np.int16) >>> matrix=matrix.reshape(siz[2],siz[1],siz[0]).T >>> >>> >>> >>> >>> Il 05/02/2012 17:21, Olivier Delalleau ha scritto: >>> >>> It means there is some of your code that is not entirely >>> platform-independent. It's not possible to tell you which part because you >>> didn't provide your code. The problem may not even be numpy-related. >>> So you should first look at the current shape of 'matrix', and what are >>> the values of a, b and c, then see where the discrepancy is, and work from >>> there. >>> >>> -=- Olivier >>> >>> Le 5 f?vrier 2012 11:16, Paolo Zaffino a ?crit : >>> >>>> Yes, I understand this but I don't know because on Linux and Mac it >>>> works well. >>>> If the matrix size is different it should be different indipendently >>>> from os type. >>>> Am I wrong? >>>> Thanks for your support! >>>> >>>> ------------------------------ >>>> * From: * Olivier Delalleau ; >>>> * To: * Discussion of Numerical Python ; >>>> * Subject: * Re: [Numpy-discussion] "ValueError: total size of new >>>> array must be unchanged" only on Windows >>>> * Sent: * Sun, Feb 5, 2012 3:02:44 PM >>>> >>>> It should mean that matrix.size != a * b * c. >>>> >>>> -=- Olivier >>>> >>>> Le 5 f?vrier 2012 09:32, Paolo a ?crit : >>>> >>>>> Hello, >>>>> I wrote a function that works on a numpy matrix and it works fine on >>>>> Mac >>>>> OS and GNU/Linux (I didn't test it on python 3). >>>>> Now I have a problem with numpy: the same python file doesn't work on >>>>> Windows (Windows xp, python 2.7 and numpy 2.6.1). >>>>> I get this error: >>>>> >>>>> matrix=matrix.reshape(a, b, c) >>>>> ValueError: total size of new array must be unchanged >>>>> >>>>> Why? Do anyone have an idea about this? >>>>> Thank you very much. >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>> >>>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.zaffino at yahoo.it Sun Feb 5 13:41:28 2012 From: p.zaffino at yahoo.it (Paolo) Date: Sun, 05 Feb 2012 19:41:28 +0100 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: References: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> <4F2EBEBC.8040501@yahoo.it> <4F2EC1DC.9090700@yahoo.it> Message-ID: <4F2ECD58.9000005@yahoo.it> I solved using 'rb' instead of 'r' option in the open file task. Thank you very much. Il 05/02/2012 19:13, Warren Weckesser ha scritto: > > > On Sun, Feb 5, 2012 at 12:06 PM, > wrote: > > > > On Sun, Feb 5, 2012 at 12:52 PM, Paolo > wrote: > > How I can do this? > > > I'm not sure without trying, numpy.loadtxt might be the easier choice > > matrix="".join((i.strip() for i in f.readlines())) > > I think strip() also removes newlines besides other whitespace > otherwise more explicitly > matrix="".join((i.strip(f.newlines) for i in f.readlines())) > > or open the file with mode 'rU' and strip('\n') > > Josef > > > > This code: > > matrix="".join(f.readlines()) > matrix=np.fromstring(matrix, dtype=np.int16) > matrix=matrix.reshape(siz[2],siz[1],siz[0]).T > > implies that the data in f is binary, because the 'sep' keyword is not > used in the call to np.fromstring. If that is the case, you should > not use f.readlines() to read the data. Instead, read it as a single > string with f.read(). (Or perhaps read the file with a single call to > np.fromfile()). Also be sure that the file was opened in binary mode > (i.e. f = open(filename, 'rb')). > > Warren > > > > > > > > Il 05/02/2012 18:47, josef.pktd at gmail.com > ha scritto: >> >> >> On Sun, Feb 5, 2012 at 12:39 PM, Paolo > > wrote: >> >> This is my code: >> >> matrix="".join(f.readlines()) >> >> >> my guess would be, that you have to strip the line endings \n >> versus \r\n >> >> Josef >> >> matrix=np.fromstring(matrix, dtype=np.int16) >> matrix=matrix.reshape(siz[2],siz[1],siz[0]).T >> >> >> >> >> Il 05/02/2012 17:21, Olivier Delalleau ha scritto: >>> It means there is some of your code that is not entirely >>> platform-independent. It's not possible to tell you >>> which part because you didn't provide your code. The >>> problem may not even be numpy-related. >>> So you should first look at the current shape of >>> 'matrix', and what are the values of a, b and c, then >>> see where the discrepancy is, and work from there. >>> >>> -=- Olivier >>> >>> Le 5 f?vrier 2012 11:16, Paolo Zaffino >>> > a ?crit : >>> >>> Yes, I understand this but I don't know because on >>> Linux and Mac it works well. >>> If the matrix size is different it should be >>> different indipendently from os type. >>> Am I wrong? >>> Thanks for your support! >>> >>> >>> ------------------------------------------------------------------------ >>> *From: * Olivier Delalleau >> >; >>> *To: * Discussion of Numerical Python >>> >> >; >>> *Subject: * Re: [Numpy-discussion] "ValueError: >>> total size of new array must be unchanged" only on >>> Windows >>> *Sent: * Sun, Feb 5, 2012 3:02:44 PM >>> >>> It should mean that matrix.size != a * b * c. >>> >>> -=- Olivier >>> >>> Le 5 f?vrier 2012 09:32, Paolo >>> a ?crit : >>> >>> Hello, >>> I wrote a function that works on a numpy matrix >>> and it works fine on Mac >>> OS and GNU/Linux (I didn't test it on python 3). >>> Now I have a problem with numpy: the same python >>> file doesn't work on >>> Windows (Windows xp, python 2.7 and numpy 2.6.1). 
>>> I get this error: >>> >>> matrix=matrix.reshape(a, b, c) >>> ValueError: total size of new array must be >>> unchanged >>> >>> Why? Do anyone have an idea about this? >>> Thank you very much. >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.verelst at gmail.com Sun Feb 5 19:02:37 2012 From: david.verelst at gmail.com (David Verelst) Date: Mon, 06 Feb 2012 01:02:37 +0100 Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix In-Reply-To: References: Message-ID: <4F2F189D.8010809@gmail.com> Just out of curiosity, what speed-up factor did you achieve? Regards, David On 04/02/12 22:20, Naresh wrote: > Warren Weckesser enthought.com> writes: > >> >> On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Root ou.edu> wrote: >> >> >> On Saturday, February 4, 2012, Naresh Pai uark.edu> wrote:> I am > somewhat new to Python (been coding with Matlab mostly). I am trying to >>> simplify (and expedite) a piece of code that is currently a bottleneck in a > larger >>> code.> I have a large array (7000 rows x 4500 columns) titled say, abc, and > I am trying> to find a fast method to count the number of instances of each > unique value within> it. All unique values are stored in a variable, say, > unique_elem. My current code >> >>> is as follows:> import numpy as np> #allocate space for storing element > count> elem_count = zeros((len(unique_elem),1))> #loop through and count number > of unique_elem> for i in range(len(unique_elem)): >> >>> elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x > in [unique_elem[i]])))> This loop is bottleneck because I have about 850 unique > elements and it takes> about 9-10 minutes. Can you suggest a faster way to do > this? >> >>> Thank you,> Naresh> >> no.unique() can return indices and reverse indices. It would be trivial to > histogram the reverse indices using np.histogram(). >> >> Instead of histogram(), you can use bincount() on the inverse indices:u, inv = > np.unique(abc, return_inverse=True)n = np.bincount(inv)u will be an array of the > unique elements, and n will be an array of the corresponding number of > occurrences.Warren >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > The histogram() solution works perfect since unique_elem is ordered. I > appreciate everyone's help. 
> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From wesmckinn at gmail.com Sun Feb 5 20:17:38 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 5 Feb 2012 20:17:38 -0500 Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix In-Reply-To: <4F2F189D.8010809@gmail.com> References: <4F2F189D.8010809@gmail.com> Message-ID: On Sun, Feb 5, 2012 at 7:02 PM, David Verelst wrote: > Just out of curiosity, what speed-up factor did you achieve? > > Regards, > David > > On 04/02/12 22:20, Naresh wrote: >> Warren Weckesser ?enthought.com> ?writes: >> >>> >>> On Sat, Feb 4, 2012 at 2:35 PM, Benjamin Root ?ou.edu> ?wrote: >>> >>> >>> On Saturday, February 4, 2012, Naresh Pai ?uark.edu> ?wrote:> ?I am >> somewhat new to Python (been coding with Matlab mostly). I am trying to >>>> simplify (and expedite) a piece of code that is currently a bottleneck in a >> larger >>>> code.> ?I have a large array (7000 rows x 4500 columns) titled say, abc, and >> I am trying> ?to find a fast method to count the number of instances of each >> unique value within> ?it. All unique values are stored in a variable, say, >> unique_elem. My current code >>> >>>> is as follows:> ?import numpy as np> ?#allocate space for storing element >> count> ?elem_count = zeros((len(unique_elem),1))> ?#loop through and count number >> of unique_elem> ?for i in range(len(unique_elem)): >>> >>>> ? ? elem_count[i]= np.sum(reduce(np.logical_or,(abc== x for x >> in [unique_elem[i]])))> ?This loop is bottleneck because I have about 850 unique >> elements and it takes> ?about 9-10 minutes. Can you suggest a faster way to do >> this? >>> >>>> Thank you,> ?Naresh> >>> no.unique() can return indices and reverse indices. ?It would be trivial to >> histogram the reverse indices using np.histogram(). >>> >>> Instead of histogram(), you can use bincount() on the inverse indices:u, inv = >> np.unique(abc, return_inverse=True)n = np.bincount(inv)u will be an array of the >> unique elements, and n will be an array of the corresponding number of >> occurrences.Warren >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion ?scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> The histogram() solution works perfect since unique_elem is ordered. I >> appreciate everyone's help. >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion np.histogram works pretty well. I'm getting speeds something like 1300 ms on float64 data. A hash table-based solution is faster (no big surprise here), about 800ms so in the ballpark of 40% faster. Whenever I get motivated enough I'm going to make a pull request on NumPy with something like khash.h and start fixing all the O(N log N) algorithms floating around that ought to be O(N). NumPy should really have a "match" function similar to R's and a lot of other things. 
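For concreteness, a rough sketch of what such a match() could look like today
using sorting and searchsorted -- still O(N log N) rather than the O(N)
hash-based version discussed above, and the function name is only illustrative:

import numpy as np

def match(values, table):
    # For each element of values, return the index of its first
    # occurrence in table, or -1 if it is absent (roughly R's match()).
    values = np.asarray(values)
    table = np.asarray(table)
    order = np.argsort(table, kind='mergesort')  # stable sort keeps first occurrences first
    pos = np.searchsorted(table[order], values)
    pos = np.clip(pos, 0, len(table) - 1)
    result = order[pos]
    result[table[result] != values] = -1
    return result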
- Wes From scott.sinclair.za at gmail.com Mon Feb 6 02:17:45 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Mon, 6 Feb 2012 09:17:45 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> Message-ID: On 5 February 2012 13:07, Ralf Gommers wrote: > >> On 20/01/12 08:49, Scott Sinclair wrote: >> > On 19 January 2012 21:48, Fernando Perez ?wrote: >> >> We've moved to the following setup with ipython, which works very well >> >> for us so far: >> >> >> >> 1. ipython.org: Main website with only static content, manged as a >> >> repo in github (https://github.com/ipython/ipython-website) and >> >> updated with a gh-pages build >> >> (https://github.com/ipython/ipython.github.com). >> > I like this idea, and to get the ball rolling I've stripped out the >> > www directory of the scipy.org-new repo into it's own repository using >> > git filter-branch (posted here: >> > https://github.com/scottza/scipy_website) and created >> > https://github.com/scottza/scottza.github.com. This puts a copy of the >> > new scipy website at http://scottza.github.com as a proof of concept. >> > Nice! > >> >> > Since there seems to be some agreement on rehosting numpy's website on >> > github, I'd be happy to do as much of the legwork as I can in getting >> > the numpy.scipy.org content hosted at numpy.github.com. I don't have >> > permission to create new repos for the Numpy organization, so someone >> > would have to create an empty >> > https://github.com/numpy/numpy.github.com and give me push permission >> > on that repo. > > > Does it need to be a new repo, or would permissions on > https://github.com/numpy/numpy.scipy.org work as well? Yes a new repo is required. Github will render html checked into a repo called https://github.com/numpy/numpy.github.com at http://numpy.github.com. Since the html is built from reST sources using Sphinx, we'd need a repo for the website source (https://github.com/numpy/numpy.github.com) and a repo to check the built html into (https://github.com/numpy/numpy.github.com). To update the website will require push permissions to both repos. The IPython team have scripts to automate the update, build and commit process for their website, which we could borrow from. Cheers, Scott From chris.barker at noaa.gov Mon Feb 6 11:41:19 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 6 Feb 2012 08:41:19 -0800 Subject: [Numpy-discussion] "ValueError: total size of new array must be unchanged" only on Windows In-Reply-To: <4F2ECD58.9000005@yahoo.it> References: <1328458618.46757.androidMobile@web27404.mail.ukl.yahoo.com> <4F2EBEBC.8040501@yahoo.it> <4F2EC1DC.9090700@yahoo.it> <4F2ECD58.9000005@yahoo.it> Message-ID: On Sun, Feb 5, 2012 at 10:41 AM, Paolo wrote: > I solved using 'rb' instead of 'r' option in the open file task. > > that would do it, if it's binary data, but you might as well so it "right": > matrix="".join(f.readlines()) > > readlines is giving you a list of the data, as separated by newline charactors ("\n") -- it was broken on Windows, because opening the file in text mode translated Windows newlines ("\r\n") to *nix style ones -- opening it in binary fixed that, but why use readlines at all? That's for text -- use read. 
However, even better is to use fromfile(), which creates an array form binary data in a file without puttin git in a string, first: > matrix = np.fromfile(f, dtype=np.int16) by the way -- be careful of endian issues here -- if you are moving data among different machines. You could specify the endian-ness, for instance: dt = numpy.dtype(' From npai at uark.edu Mon Feb 6 12:44:23 2012 From: npai at uark.edu (Naresh Pai) Date: Mon, 6 Feb 2012 11:44:23 -0600 Subject: [Numpy-discussion] matrix indexing Message-ID: I have two large matrices, say, ABC and DEF, each with a shape of 7000 by 4500. I have another list, say, elem, containing 850 values from ABC. I am interested in finding out the corresponding values in DEF where ABC has elem and store them *separately*. The code that I am using is: for i in range(len(elem)): DEF_distr = DEF[ABC==elem[i]] DEF_distr gets used for further processing before it gets cleared from memory and the next round of the above loop begins. The loop above currently takes about 20 minutes! I think the bottle neck is where elem is getting searched repeatedly in ABC. So I am looking for a solution where all elem can get processed in a single call and the indices of ABC be stored in another variable (separately). I would appreciate if you suggest any faster method for getting DEF_distr. Thanks, Naresh -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Mon Feb 6 13:32:09 2012 From: heng at cantab.net (Henry Gomersall) Date: Mon, 06 Feb 2012 18:32:09 +0000 Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly Message-ID: <1328553129.11049.12.camel@farnsworth> Is the following behaviour expected: >>> import numpy >>> a_shape = (63, 4, 98) >>> a = numpy.complex128(numpy.random.rand(*a_shape)+\ ... 1j*numpy.random.rand(*a_shape)) >>> >>> axes = [0, 2] >>> >>> numpy.fft.irfftn(a, axes=axes) Traceback (most recent call last): File "", line 1, in File "/usr/lib/pymodules/python2.7/numpy/fft/fftpack.py", line 1080, in irfftn s, axes = _cook_nd_args(a, s, axes, invreal=1) File "/usr/lib/pymodules/python2.7/numpy/fft/fftpack.py", line 515, in _cook_nd_args s[axes[-1]] = (s[axes[-1]] - 1) * 2 IndexError: list index out of range The implication from the docs is that axes can be arbitrary. The following *does* work fine: >>> import numpy >>> a = numpy.float64(numpy.random.rand(*a_shape)) >>> axes = [0, 2] >>> numpy.fft.rfftn(a, axes=axes) Thanks, Henry From Catherine.M.Moroney at jpl.nasa.gov Mon Feb 6 14:16:01 2012 From: Catherine.M.Moroney at jpl.nasa.gov (Moroney, Catherine M (388D)) Date: Mon, 6 Feb 2012 11:16:01 -0800 Subject: [Numpy-discussion] avoiding loops when downsampling arrays Message-ID: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> Hello, I have to write a code to downsample an array in a specific way, and I am hoping that somebody can tell me how to do this without the nested do-loops. Here is the problem statement: Segment a (MXN) array into 4x4 squares and set a flag if any of the pixels in that 4x4 square meet a certain condition. Here is the code that I want to rewrite avoiding loops: shape_out = (data_in.shape[0]/4, data_in.shape[1]/4) found = numpy.zeros(shape_out).astype(numpy.bool) for i in xrange(0, shape_out[0]): for j in xrange(0, shape_out[1]): excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4] mask = numpy.where( (excerpt >= t1) & (excerpt <= t2), True, False) if (numpy.any(mask)): found[i,j] = True Thank you for any hints and education! 
Catherine From stefan at sun.ac.za Mon Feb 6 14:19:02 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 6 Feb 2012 11:19:02 -0800 Subject: [Numpy-discussion] Structure of polynomial module Message-ID: Hi all, I noticed the following docstring on ``np.polynomial.polyval``: In [116]: np.polynomial.polyval? File: /home/stefan/src/numpy/numpy/lib/utils.py Definition: np.polynomial.polyval(*args, **kwds) Docstring: `polyval` is deprecated! Please import polyval from numpy.polynomial.polynomial I guess we don't expect users to do "from numpy.polynomial.polynomial import polyval, Polynomial", so what is the suggested API for getting hold of the polynomial functions? Also, why is numpy.polynomial.polynomial.polyfit different from numpy.polyfit? Regards St?fan From silideba at gmail.com Mon Feb 6 14:21:26 2012 From: silideba at gmail.com (Debashish Saha) Date: Tue, 7 Feb 2012 00:51:26 +0530 Subject: [Numpy-discussion] (no subject) Message-ID: basic difference between the commands: import numpy as np from numpy import * From npai at uark.edu Mon Feb 6 14:30:33 2012 From: npai at uark.edu (Naresh) Date: Mon, 6 Feb 2012 19:30:33 +0000 (UTC) Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix References: <4F2F189D.8010809@gmail.com> Message-ID: David: from 9-10 minutes to about 2-3 seconds, it's amazing! Thanks, Naresh From ralf.gommers at googlemail.com Mon Feb 6 14:41:23 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 6 Feb 2012 20:41:23 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> Message-ID: On Mon, Feb 6, 2012 at 8:17 AM, Scott Sinclair wrote: > On 5 February 2012 13:07, Ralf Gommers > wrote: > > > >> On 20/01/12 08:49, Scott Sinclair wrote: > >> > On 19 January 2012 21:48, Fernando Perez > wrote: > >> >> We've moved to the following setup with ipython, which works very > well > >> >> for us so far: > >> >> > >> >> 1. ipython.org: Main website with only static content, manged as a > >> >> repo in github (https://github.com/ipython/ipython-website) and > >> >> updated with a gh-pages build > >> >> (https://github.com/ipython/ipython.github.com). > >> > I like this idea, and to get the ball rolling I've stripped out the > >> > www directory of the scipy.org-new repo into it's own repository using > >> > git filter-branch (posted here: > >> > https://github.com/scottza/scipy_website) and created > >> > https://github.com/scottza/scottza.github.com. This puts a copy of > the > >> > new scipy website at http://scottza.github.com as a proof of concept. > >> > > Nice! > > > >> > >> > Since there seems to be some agreement on rehosting numpy's website on > >> > github, I'd be happy to do as much of the legwork as I can in getting > >> > the numpy.scipy.org content hosted at numpy.github.com. I don't have > >> > permission to create new repos for the Numpy organization, so someone > >> > would have to create an empty > >> > https://github.com/numpy/numpy.github.com and give me push permission > >> > on that repo. > > > > > > Does it need to be a new repo, or would permissions on > > https://github.com/numpy/numpy.scipy.org work as well? > > Yes a new repo is required. Github will render html checked into a > repo called https://github.com/numpy/numpy.github.com at > http://numpy.github.com. 
Since the html is built from reST sources > using Sphinx, we'd need a repo for the website source > (https://github.com/numpy/numpy.github.com) and a repo to check the > built html into (https://github.com/numpy/numpy.github.com). To update > the website will require push permissions to both repos. > > I've created https://github.com/scipy/scipy.github.com and gave you permissions on that. So with that for the built html and https://github.com/scipy/scipy.org-new for the sources, that should do it. On the numpy org I don't have the right permissions to do the same. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett.olsen at gmail.com Mon Feb 6 15:10:29 2012 From: brett.olsen at gmail.com (Brett Olsen) Date: Mon, 6 Feb 2012 14:10:29 -0600 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: Message-ID: The namespace is different. If you want to use numpy.sin(), for example, you would use: import numpy as np np.sin(angle) or from numpy import * sin(angle) I generally prefer the first option because then I don't need to worry about multiple imports writing on top of each other (i.e., having test functions in several modules, and then accidentally using the wrong one). ~Brett On Mon, Feb 6, 2012 at 1:21 PM, Debashish Saha wrote: > basic difference between the commands: > import numpy as np > from numpy import * > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jsalvati at u.washington.edu Mon Feb 6 15:40:51 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 6 Feb 2012 12:40:51 -0800 Subject: [Numpy-discussion] datetime64 format parameter? Message-ID: Hello, Is there a way to specify a format for the datetime64 constructor? The constructor doesn't have a doc. I have dates in a file with the format "MM/dd/YY". datetime64 used to be able to parse these in 1.6.1 but the dev version throws an error. Cheers, John -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Mon Feb 6 15:57:26 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 6 Feb 2012 21:57:26 +0100 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> Message-ID: <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> Short answer: Create 16 view arrays, each with a stride of 4 in both dimensions. Test them against the conditions and combine the tests with an |= operator. Thus you replace the nested loop with one that has only 16 iterations. Or reshape to 3 dimensions, the last with length 4, and you can do the same with only four view arrays. Sturla Sendt fra min iPad Den 6. feb. 2012 kl. 20:16 skrev "Moroney, Catherine M (388D)" : > Hello, > > I have to write a code to downsample an array in a specific way, and I am hoping that > somebody can tell me how to do this without the nested do-loops. Here is the problem > statement: Segment a (MXN) array into 4x4 squares and set a flag if any of the pixels > in that 4x4 square meet a certain condition. 
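Spelled out, the 16-view variant is just this (a minimal sketch, reusing
data_in, t1 and t2 from the original post and assuming both dimensions are
multiples of 4):

import numpy as np

found = np.zeros((data_in.shape[0] // 4, data_in.shape[1] // 4), dtype=bool)
for i in range(4):
    for j in range(4):
        view = data_in[i::4, j::4]            # one of the 16 strided views, no copy is made
        found |= (view >= t1) & (view <= t2)  # flag blocks where any pixel falls in [t1, t2]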
> > Here is the code that I want to rewrite avoiding loops: > > shape_out = (data_in.shape[0]/4, data_in.shape[1]/4) > found = numpy.zeros(shape_out).astype(numpy.bool) > > for i in xrange(0, shape_out[0]): > for j in xrange(0, shape_out[1]): > > excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4] > mask = numpy.where( (excerpt >= t1) & (excerpt <= t2), True, False) > if (numpy.any(mask)): > found[i,j] = True > > Thank you for any hints and education! > > Catherine > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From warren.weckesser at enthought.com Mon Feb 6 16:10:58 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Mon, 6 Feb 2012 15:10:58 -0600 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> Message-ID: On Mon, Feb 6, 2012 at 2:57 PM, Sturla Molden wrote: > Short answer: Create 16 view arrays, each with a stride of 4 in both > dimensions. Test them against the conditions and combine the tests with an > |= operator. Thus you replace the nested loop with one that has only 16 > iterations. Or reshape to 3 dimensions, the last with length 4, and you can > do the same with only four view arrays. > > Sturla > > Or use 'as_strided' from the stride_tricks module: import numpy as np from numpy.lib.stride_tricks import as_strided # Make some demonstration data. a = np.random.random_integers(0, 99, size=(12,16)) # Find 4x4 blocks whose values are between these limits. low = 5 high = 94 # Make a 4D view of this data, such that b[i,j] # is a 2D block with shape (4,4) (e.g. b[0,0] is # the same as a[:4, :4]). b = as_strided(a, shape=(a.shape[0]/4, a.shape[1]/4, 4, 4), strides=(4*a.strides[0], 4*a.strides[1], a.strides[0], a.strides[1])) # Reduce this to a 2D array of boolean values. The value is # True if the corresponding 4x4 block contains only values # between low and high (inclusive). result = np.all(np.all((b >= low) & (b <= high), axis=-1), axis=-1) print a print result Output (from ipython)t: In [111]: run 2Dto4by4blocks.py [[ 5 50 62 43 52 21 67 70 55 12 25 21 1 95 73 31] [44 4 60 27 93 54 25 87 15 22 37 31 46 84 10 46] [35 26 11 76 91 79 58 92 57 62 27 27 17 19 39 86] [94 96 21 36 90 18 80 62 91 68 39 22 68 31 71 48] [89 34 52 80 93 73 54 13 25 28 57 32 55 42 3 13] [65 68 41 82 55 81 64 59 73 4 44 46 91 1 86 52] [99 87 21 70 26 64 2 10 62 82 52 67 85 88 45 53] [33 10 6 3 46 71 17 58 20 56 30 18 19 17 60 76] [18 22 62 53 45 21 83 86 69 35 32 36 33 74 81 70] [24 39 93 12 37 4 4 16 45 59 46 4 90 24 1 13] [26 37 11 11 24 58 6 44 43 44 94 55 22 8 7 85] [26 91 31 75 72 25 23 89 3 30 45 93 62 72 96 39]] [[False True True False] [False False False False] [ True False False False]] Warren > Sendt fra min iPad > > Den 6. feb. 2012 kl. 20:16 skrev "Moroney, Catherine M (388D)" < > Catherine.M.Moroney at jpl.nasa.gov>: > > > Hello, > > > > I have to write a code to downsample an array in a specific way, and I > am hoping that > > somebody can tell me how to do this without the nested do-loops. Here > is the problem > > statement: Segment a (MXN) array into 4x4 squares and set a flag if any > of the pixels > > in that 4x4 square meet a certain condition. 
> > > > Here is the code that I want to rewrite avoiding loops: > > > > shape_out = (data_in.shape[0]/4, data_in.shape[1]/4) > > found = numpy.zeros(shape_out).astype(numpy.bool) > > > > for i in xrange(0, shape_out[0]): > > for j in xrange(0, shape_out[1]): > > > > excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4] > > mask = numpy.where( (excerpt >= t1) & (excerpt <= t2), True, > False) > > if (numpy.any(mask)): > > found[i,j] = True > > > > Thank you for any hints and education! > > > > Catherine > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Mon Feb 6 16:12:58 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 6 Feb 2012 22:12:58 +0100 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> Message-ID: Something like this: m,n = data.shape x = data.reshape((m,n//4,4)) z = (x[0::4,...] >= t1) & (x[0::4,...] <= t1) z |= (x[1::4,...] >= t1) & (x[1::4,...] <= t1) z |= (x[2::4,...] >= t1) & (x[2::4,...] <= t1) z |= (x[3::4,...] >= t1) & (x[3::4,...] <= t1) found = np.any(z, axis=2) Sturla Sendt fra min iPad Den 6. feb. 2012 kl. 21:57 skrev Sturla Molden : > Short answer: Create 16 view arrays, each with a stride of 4 in both dimensions. Test them against the conditions and combine the tests with an |= operator. Thus you replace the nested loop with one that has only 16 iterations. Or reshape to 3 dimensions, the last with length 4, and you can do the same with only four view arrays. > > Sturla > > Sendt fra min iPad > > Den 6. feb. 2012 kl. 20:16 skrev "Moroney, Catherine M (388D)" : > >> Hello, >> >> I have to write a code to downsample an array in a specific way, and I am hoping that >> somebody can tell me how to do this without the nested do-loops. Here is the problem >> statement: Segment a (MXN) array into 4x4 squares and set a flag if any of the pixels >> in that 4x4 square meet a certain condition. >> >> Here is the code that I want to rewrite avoiding loops: >> >> shape_out = (data_in.shape[0]/4, data_in.shape[1]/4) >> found = numpy.zeros(shape_out).astype(numpy.bool) >> >> for i in xrange(0, shape_out[0]): >> for j in xrange(0, shape_out[1]): >> >> excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4] >> mask = numpy.where( (excerpt >= t1) & (excerpt <= t2), True, False) >> if (numpy.any(mask)): >> found[i,j] = True >> >> Thank you for any hints and education! 
>> >> Catherine >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Mon Feb 6 16:17:36 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 6 Feb 2012 22:17:36 +0100 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> Message-ID: <42C21947-68E3-426A-BC59-9F75DF8BA297@molden.no> The last t1 on each lineis of course t2. Sorry for the typo. Hard to code on an ipad ;-) Sturla Sendt fra min iPad Den 6. feb. 2012 kl. 22:12 skrev Sturla Molden : > > Something like this: > > m,n = data.shape > x = data.reshape((m,n//4,4)) > z = (x[0::4,...] >= t1) & (x[0::4,...] <= t1) > z |= (x[1::4,...] >= t1) & (x[1::4,...] <= t1) > z |= (x[2::4,...] >= t1) & (x[2::4,...] <= t1) > z |= (x[3::4,...] >= t1) & (x[3::4,...] <= t1) > found = np.any(z, axis=2) > > Sturla > > Sendt fra min iPad > > Den 6. feb. 2012 kl. 21:57 skrev Sturla Molden : > >> Short answer: Create 16 view arrays, each with a stride of 4 in both dimensions. Test them against the conditions and combine the tests with an |= operator. Thus you replace the nested loop with one that has only 16 iterations. Or reshape to 3 dimensions, the last with length 4, and you can do the same with only four view arrays. >> >> Sturla >> >> Sendt fra min iPad >> >> Den 6. feb. 2012 kl. 20:16 skrev "Moroney, Catherine M (388D)" : >> >>> Hello, >>> >>> I have to write a code to downsample an array in a specific way, and I am hoping that >>> somebody can tell me how to do this without the nested do-loops. Here is the problem >>> statement: Segment a (MXN) array into 4x4 squares and set a flag if any of the pixels >>> in that 4x4 square meet a certain condition. >>> >>> Here is the code that I want to rewrite avoiding loops: >>> >>> shape_out = (data_in.shape[0]/4, data_in.shape[1]/4) >>> found = numpy.zeros(shape_out).astype(numpy.bool) >>> >>> for i in xrange(0, shape_out[0]): >>> for j in xrange(0, shape_out[1]): >>> >>> excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4] >>> mask = numpy.where( (excerpt >= t1) & (excerpt <= t2), True, False) >>> if (numpy.any(mask)): >>> found[i,j] = True >>> >>> Thank you for any hints and education! >>> >>> Catherine >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mwwiebe at gmail.com Mon Feb 6 16:18:57 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 6 Feb 2012 13:18:57 -0800 Subject: [Numpy-discussion] datetime64 format parameter? 
In-Reply-To: References: Message-ID: Hey John, NumPy doesn't provide this, because it's already provided by the datetime.date.strftime function in Python: http://docs.python.org/library/datetime.html#datetime.date.strftime One reason this format isn't supported automatically is that parsing "MM/dd/YY" is inherently ambiguous, and the convention is different in different parts of the world. The date "01/02/03" could be January 2nd 2003 or February 3rd, 2001, for example. The datetime constructor follows the ISO 8601 standard for date and time formatting, which is unambiguous. This was specified in the datetime NEP, but the 1.6 implementation unfortunately hadn't followed that part of the spec. Cheers, Mark On Mon, Feb 6, 2012 at 12:40 PM, John Salvatier wrote: > Hello, > > Is there a way to specify a format for the datetime64 constructor? The > constructor doesn't have a doc. I have dates in a file with the format > "MM/dd/YY". datetime64 used to be able to parse these in 1.6.1 but the dev > version throws an error. > > Cheers, > John > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Mon Feb 6 16:27:36 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 6 Feb 2012 22:27:36 +0100 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> Message-ID: <1CE6D632-DF72-4BF8-919D-D625ACC9E19B@molden.no> > > # Make a 4D view of this data, such that b[i,j] > # is a 2D block with shape (4,4) (e.g. b[0,0] is > # the same as a[:4, :4]). > b = as_strided(a, shape=(a.shape[0]/4, a.shape[1]/4, 4, 4), > strides=(4*a.strides[0], 4*a.strides[1], a.strides[0], a.strides[1])) > Yes :-) Being used to Fortran (and also MATLAB) this is the kind of mapping It never occurs for me to think about. What else but NumPy is flexible enough to do this? :-) Sturla From cournape at gmail.com Mon Feb 6 16:54:11 2012 From: cournape at gmail.com (David Cournapeau) Date: Mon, 6 Feb 2012 21:54:11 +0000 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: References: Message-ID: On Sat, Feb 4, 2012 at 3:55 PM, Ralf Gommers wrote: > > > On Wed, Dec 14, 2011 at 6:50 PM, Ralf Gommers > wrote: >> >> >> >> On Wed, Dec 14, 2011 at 3:04 PM, David Cournapeau >> wrote: >>> >>> On Tue, Dec 13, 2011 at 3:43 PM, Ralf Gommers >>> wrote: >>> > On Sun, Oct 30, 2011 at 12:18 PM, David Cournapeau >>> > wrote: >>> >> >>> >> On Thu, Oct 27, 2011 at 5:19 PM, Ralf Gommers >>> >> wrote: >>> >> > Hi David, >>> >> > >>> >> > On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau >>> >> > >>> >> > wrote: >>> >> >> >>> >> >> Hi, >>> >> >> >>> >> >> I was wondering if we could finally move to a more recent version >>> >> >> of >>> >> >> compilers for official win32 installers. This would of course >>> >> >> concern >>> >> >> the next release cycle, not the ones where beta/rc are already in >>> >> >> progress. >>> >> >> >>> >> >> Basically, the pros: >>> >> >> ?- we will have to move at some point >>> >> >> ?- gcc 4.* seem less buggy, especially C++ and fortran. 
>>> >> >> ?- no need to maintain msvcr90 vodoo >>> >> >> The cons: >>> >> >> ?- it will most likely break the ABI >>> >> >> ?- we need to recompile atlas (but I can take care of it) >>> >> >> ?- the biggest: it is difficult to combine gfortran with visual >>> >> >> studio (more exactly you cannot link gfortran runtime to a visual >>> >> >> studio executable). The only solution I could think of would be to >>> >> >> recompile the gfortran runtime with Visual Studio, which for some >>> >> >> reason does not sound very appealing :) >>> >> > >>> >> > To get the datetime changes to work with MinGW, we already concluded >>> >> > that >>> >> > building with 4.x is more or less required (without recognizing some >>> >> > of >>> >> > the >>> >> > points you list above). Changes to mingw32ccompiler to fix >>> >> > compilation >>> >> > with >>> >> > 4.x went in in https://github.com/numpy/numpy/pull/156. It would be >>> >> > good >>> >> > if >>> >> > you could check those. >>> >> >>> >> I will look into it more carefully, but overall, it seems that >>> >> building atlas 3.8.4, numpy and scipy with gcc 4.x works quite well. >>> >> The main issue is that gcc 4.* adds some dependencies on mingw dlls. >>> >> There are two options: >>> >> ?- adding the dlls in the installers >>> >> ?- statically linking those, which seems to be a bad idea >>> >> (generalizing the dll boundaries problem to exception and things we >>> >> would rather not care about: >>> >> http://cygwin.com/ml/cygwin/2007-06/msg00332.html). >>> >> >>> >> > It probably makes sense make this move for numpy 1.7. If this breaks >>> >> > the >>> >> > ABI >>> >> > then it would be easiest to make numpy 1.7 the minimum required >>> >> > version >>> >> > for >>> >> > scipy 0.11. >>> >> >>> >> My thinking as well. >>> >> >>> > >>> > Hi David, what is the current status of this issue? I kind of forgot >>> > this is >>> > a prerequisite for the next release when starting the 1.7.0 release >>> > thread. >>> >>> The only issue at this point is the distribution of mingw dlls. I have >>> not found a way to do it nicely (where nicely means something that is >>> distributed within numpy package). Given that those dlls are actually >>> versioned and seem to have a strong versioning policy, maybe we can >>> just install them inside the python installation ? >>> >> Although not ideal, I don't have a problem with that in principle. >> However, wouldn't it break installing without admin rights if Python is >> installed by the admin? > > > David, do you have any more thoughts on this? Is there a final solution in > sight? Anything I or anyone else can do to help? I have not found a way to do it without installing the dll alongside python libraries. That brings the problem of how to install libraries there from bdist_wininst/bdist_msi installers, which I had not the time to look at. David From jsalvati at u.washington.edu Mon Feb 6 17:01:39 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 6 Feb 2012 14:01:39 -0800 Subject: [Numpy-discussion] datetime64 format parameter? In-Reply-To: References: Message-ID: That makes sense. I figured that ambiguity was the reason it was removed. Thank you for the explanation. I'm a big fan of your work. 
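For the archive, the workaround presumably looks something like the sketch below (the sample strings and the "%m/%d/%y" format are just stand-ins for my file's layout, not anything NumPy mandates): parse with the standard library's strptime -- the parsing counterpart of strftime -- so the format is stated explicitly, and only then hand the unambiguous result to numpy.

import numpy as np
from datetime import datetime

raw = ["02/06/12", "12/31/99"]                    # hypothetical MM/dd/YY strings
parsed = [datetime.strptime(s, "%m/%d/%y") for s in raw]
dates = np.array(parsed, dtype="datetime64[D]")   # now unambiguous datetime64 values
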
John On Mon, Feb 6, 2012 at 1:18 PM, Mark Wiebe wrote: > Hey John, > > NumPy doesn't provide this, because it's already provided by the > datetime.date.strftime function in Python: > > http://docs.python.org/library/datetime.html#datetime.date.strftime > > One reason this format isn't supported automatically is that parsing > "MM/dd/YY" is inherently ambiguous, and the convention is different in > different parts of the world. The date "01/02/03" could be January 2nd 2003 > or February 3rd, 2001, for example. The datetime constructor follows the > ISO 8601 standard for date and time formatting, which is unambiguous. This > was specified in the datetime NEP, but the 1.6 implementation unfortunately > hadn't followed that part of the spec. > > Cheers, > Mark > > On Mon, Feb 6, 2012 at 12:40 PM, John Salvatier > wrote: > >> Hello, >> >> Is there a way to specify a format for the datetime64 constructor? The >> constructor doesn't have a doc. I have dates in a file with the format >> "MM/dd/YY". datetime64 used to be able to parse these in 1.6.1 but the dev >> version throws an error. >> >> Cheers, >> John >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Feb 6 17:16:29 2012 From: cournape at gmail.com (David Cournapeau) Date: Mon, 6 Feb 2012 22:16:29 +0000 Subject: [Numpy-discussion] fast method to to count a particular value in a large matrix In-Reply-To: References: <4F2F189D.8010809@gmail.com> Message-ID: On Mon, Feb 6, 2012 at 1:17 AM, Wes McKinney wrote: > > Whenever I get motivated enough I'm going to make a pull request on > NumPy with something like khash.h and start fixing all the O(N log N) > algorithms floating around that ought to be O(N). NumPy should really > have a "match" function similar to R's and a lot of other things. khash.h is not the only thing that I'd like to use in numpy if I had more time :) David From e.antero.tammi at gmail.com Mon Feb 6 17:27:16 2012 From: e.antero.tammi at gmail.com (eat) Date: Tue, 7 Feb 2012 00:27:16 +0200 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> Message-ID: Hi, On Mon, Feb 6, 2012 at 9:16 PM, Moroney, Catherine M (388D) < Catherine.M.Moroney at jpl.nasa.gov> wrote: > Hello, > > I have to write a code to downsample an array in a specific way, and I am > hoping that > somebody can tell me how to do this without the nested do-loops. Here is > the problem > statement: Segment a (MXN) array into 4x4 squares and set a flag if any > of the pixels > in that 4x4 square meet a certain condition. > > Here is the code that I want to rewrite avoiding loops: > > shape_out = (data_in.shape[0]/4, data_in.shape[1]/4) > found = numpy.zeros(shape_out).astype(numpy.bool) > > for i in xrange(0, shape_out[0]): > for j in xrange(0, shape_out[1]): > > excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4] > mask = numpy.where( (excerpt >= t1) & (excerpt <= t2), > True, False) > if (numpy.any(mask)): > found[i,j] = True > > Thank you for any hints and education! 
> > Catherine > Following Warrens answer a slight demonstration of code like this: import numpy as np def ds_0(data_in, t1= 1, t2= 4): shape_out= (data_in.shape[0]/ 4, data_in.shape[1]/ 4) found= np.zeros(shape_out).astype(np.bool) for i in xrange(0, shape_out[0]): for j in xrange(0, shape_out[1]): excerpt= data_in[i* 4: (i+ 1)* 4, j* 4: (j+ 1)* 4] mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False) if (np.any(mask)): found[i, j]= True return found # with stride_tricks you may cook up something like this: from numpy.lib.stride_tricks import as_strided as ast def _ss(dt, ds, s): return {'shape': (ds[0]/ s[0], ds[1]/ s[1])+ s, 'strides': (s[0]* dt[0], s[1]* dt[1])+ dt} def _view(D, shape= (4, 4)): return ast(D, **_ss(D.strides, D.shape, shape)) def ds_1(data_in, t1= 1, t2= 4): # return _view(data_in) excerpt= _view(data_in) mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False) return mask.sum(2).sum(2).astype(np.bool) if __name__ == '__main__': from numpy.random import randint r= randint(777, size= (64, 288)); print r print np.allclose(ds_0(r), ds_1(r)) My 2 cents, eat > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Feb 6 17:34:59 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 6 Feb 2012 15:34:59 -0700 Subject: [Numpy-discussion] Structure of polynomial module In-Reply-To: References: Message-ID: 2012/2/6 St?fan van der Walt > Hi all, > > I noticed the following docstring on ``np.polynomial.polyval``: > > In [116]: np.polynomial.polyval? > File: /home/stefan/src/numpy/numpy/lib/utils.py > Definition: np.polynomial.polyval(*args, **kwds) > Docstring: > `polyval` is deprecated! > Please import polyval from numpy.polynomial.polynomial > > > I guess we don't expect users to do "from numpy.polynomial.polynomial > import polyval, Polynomial", so what is the suggested API for getting > hold of the polynomial functions? You can still import Polynomial from numpy.polynomial. The other functions were removed because 1) They are essentially duplicated for 6 polynomial types, 2) they are mostly there to support Polynomial. You shouldn't need them for most things if you use Polynomial. > Also, why is > numpy.polynomial.polynomial.polyfit different from numpy.polyfit? > > Use Polynomial.fit, it tracks the domain for you. Want to use Legendre functions? Use Legendre.fit. Want to plot the result? plot(*p.linspace()), want to plot the derivative? plot(*p.deriv().linspace()). Want to convert a Legendre series to a Polynomial? p.convert(kind=Polynomial). So on and so forth. The fitting is also NA aware in the development branch. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From e.antero.tammi at gmail.com Mon Feb 6 17:38:34 2012 From: e.antero.tammi at gmail.com (eat) Date: Tue, 7 Feb 2012 00:38:34 +0200 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> Message-ID: Hi, Sorry for my latest post, hands way too quick ;( On Mon, Feb 6, 2012 at 9:16 PM, Moroney, Catherine M (388D) < Catherine.M.Moroney at jpl.nasa.gov> wrote: > Hello, > > I have to write a code to downsample an array in a specific way, and I am > hoping that > somebody can tell me how to do this without the nested do-loops. Here is > the problem > statement: Segment a (MXN) array into 4x4 squares and set a flag if any > of the pixels > in that 4x4 square meet a certain condition. > > Here is the code that I want to rewrite avoiding loops: > > shape_out = (data_in.shape[0]/4, data_in.shape[1]/4) > found = numpy.zeros(shape_out).astype(numpy.bool) > > for i in xrange(0, shape_out[0]): > for j in xrange(0, shape_out[1]): > > excerpt = data_in[i*4:(i+1)*4, j*4:(j+1)*4] > mask = numpy.where( (excerpt >= t1) & (excerpt <= t2), > True, False) > if (numpy.any(mask)): > found[i,j] = True > > Thank you for any hints and education! > Following closely with Warrens answer a slight demonstration of code like this: import numpy as np def ds_0(data_in, t1= 1, t2= 4): shape_out= (data_in.shape[0]/ 4, data_in.shape[1]/ 4) found= np.zeros(shape_out).astype(np.bool) for i in xrange(0, shape_out[0]): for j in xrange(0, shape_out[1]): excerpt= data_in[i* 4: (i+ 1)* 4, j* 4: (j+ 1)* 4] mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False) if (np.any(mask)): found[i, j]= True return found # with stride_tricks you may cook up something like this: from numpy.lib.stride_tricks import as_strided as ast def _ss(dt, ds, s): return {'shape': (ds[0]/ s[0], ds[1]/ s[1])+ s, 'strides': (s[0]* dt[0], s[1]* dt[1])+ dt} def _view(D, shape= (4, 4)): return ast(D, **_ss(D.strides, D.shape, shape)) def ds_1(data_in, t1= 1, t2= 4): excerpt= _view(data_in) mask= np.where((excerpt>= t1)& (excerpt<= t2), True, False) return mask.sum(2).sum(2).astype(np.bool) if __name__ == '__main__': from numpy.random import randint r= randint(777, size= (64, 288)); print r print np.allclose(ds_0(r), ds_1(r)) and when run, it will yield like: In []: run dsa [[ 60 470 521 ..., 147 435 295] [246 127 662 ..., 718 525 256] [354 384 205 ..., 225 364 239] ..., [277 428 201 ..., 460 282 433] [ 27 407 130 ..., 245 346 309] [649 157 153 ..., 316 613 570]] True and compared in performance wise: In []: %timeit ds_0(r) 10 loops, best of 3: 56.3 ms per loop In []: %timeit ds_1(r) 100 loops, best of 3: 2.17 ms per loop My 2 cents, eat > Catherine > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From torgil.svensson at gmail.com Mon Feb 6 19:04:47 2012 From: torgil.svensson at gmail.com (Torgil Svensson) Date: Tue, 7 Feb 2012 01:04:47 +0100 Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: <1328553129.11049.12.camel@farnsworth> References: <1328553129.11049.12.camel@farnsworth> Message-ID: irfftn is an optimization for real input and does not take complex input. 
You have to use numpy.fft.ifftn instead: >>>> import numpy >>>> a_shape = (63, 4, 98) >>>> a = numpy.complex128(numpy.random.rand(*a_shape)+\ > ... 1j*numpy.random.rand(*a_shape)) >>>> >>>> axes = [0, 2] >>>> >>>> numpy.fft.ifftn(a, axes=axes) Or do you mean if the error message is expected? Best Regards, //Torgil On Mon, Feb 6, 2012 at 7:32 PM, Henry Gomersall wrote: > Is the following behaviour expected: > >>>> import numpy >>>> a_shape = (63, 4, 98) >>>> a = numpy.complex128(numpy.random.rand(*a_shape)+\ > ... ? ? 1j*numpy.random.rand(*a_shape)) >>>> >>>> axes = [0, 2] >>>> >>>> numpy.fft.irfftn(a, axes=axes) > Traceback (most recent call last): > ?File "", line 1, in > ?File "/usr/lib/pymodules/python2.7/numpy/fft/fftpack.py", line 1080, > in irfftn > ? ?s, axes = _cook_nd_args(a, s, axes, invreal=1) > ?File "/usr/lib/pymodules/python2.7/numpy/fft/fftpack.py", line 515, in > _cook_nd_args > ? ?s[axes[-1]] = (s[axes[-1]] - 1) * 2 > IndexError: list index out of range > > The implication from the docs is that axes can be arbitrary. The > following *does* work fine: > >>>> import numpy >>>> a = numpy.float64(numpy.random.rand(*a_shape)) >>>> axes = [0, 2] >>>> numpy.fft.rfftn(a, axes=axes) > > > Thanks, > > Henry > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scott.sinclair.za at gmail.com Tue Feb 7 01:04:05 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Tue, 7 Feb 2012 08:04:05 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> Message-ID: On 6 February 2012 21:41, Ralf Gommers wrote: > > > On Mon, Feb 6, 2012 at 8:17 AM, Scott Sinclair > wrote: >> >> On 5 February 2012 13:07, Ralf Gommers >> wrote: >> > >> > Does it need to be a new repo, or would permissions on >> > https://github.com/numpy/numpy.scipy.org work as well? >> >> Yes a new repo is required. Github will render html checked into a >> repo called https://github.com/numpy/numpy.github.com at >> http://numpy.github.com. Since the html is built from reST sources >> using Sphinx, we'd need a repo for the website source >> (https://github.com/numpy/numpy.github.com) and a repo to check the >> built html into (https://github.com/numpy/numpy.github.com). To update >> the website will require push permissions to both repos. >> > I've created https://github.com/scipy/scipy.github.com and gave you > permissions on that. So with that for the built html and > https://github.com/scipy/scipy.org-new for the sources, that should do it. > > On the numpy org I don't have the right permissions to do the same. The updated version of the 'old' new.scipy.org is now at http://scipy.github.com/. There are still a few things that I think need to get cleaned up. I'll ping the scipy mailing list in the next week or two to start the discussion on redirecting scipy.org and www.scipy.org, as well as solicit comments on the website content. 
Cheers, Scott From heng at cantab.net Tue Feb 7 04:15:36 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 07 Feb 2012 09:15:36 +0000 Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: References: <1328553129.11049.12.camel@farnsworth> Message-ID: <1328606136.17023.11.camel@farnsworth> On Tue, 2012-02-07 at 01:04 +0100, Torgil Svensson wrote: > irfftn is an optimization for real input and does not take complex > input. You have to use numpy.fft.ifftn instead: > hmmm, that doesn't sound right to me (though there could be some non obvious DFT magic that I'm missing). Indeed, np.irfftn(np.rfftn(a)) ~= a # The interim array is complex Though the documentation is a bit vague as to what inputs are expected! Actually, reading the fftpack docs, it *does* seem that this is the correct behaviour (assuming when it says "Fourier coefficients" it means complex), though I've not read any of the Python code. > >>>> import numpy > >>>> a_shape = (63, 4, 98) > >>>> a = numpy.complex128(numpy.random.rand(*a_shape)+\ > > ... 1j*numpy.random.rand(*a_shape)) > >>>> > >>>> axes = [0, 2] > >>>> > >>>> numpy.fft.ifftn(a, axes=axes) > > Or do you mean if the error message is expected? Yeah, the question was regarding the error message. Specifically, the problem it seems to have with an axes argument like that. Cheers, Henry From heng at cantab.net Tue Feb 7 04:19:18 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 07 Feb 2012 09:19:18 +0000 Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: <1328606136.17023.11.camel@farnsworth> References: <1328553129.11049.12.camel@farnsworth> <1328606136.17023.11.camel@farnsworth> Message-ID: <1328606358.17023.14.camel@farnsworth> On Tue, 2012-02-07 at 09:15 +0000, Henry Gomersall wrote: > > On Tue, 2012-02-07 at 01:04 +0100, Torgil Svensson wrote: > > irfftn is an optimization for real input and does not take complex > > input. You have to use numpy.fft.ifftn instead: > > > hmmm, that doesn't sound right to me (though there could be some non > obvious DFT magic that I'm missing). Indeed, > > np.irfftn(np.rfftn(a)) ~= a # The interim array is complex Actually, further to this, the issue arose when I was testing some fftw wrappers I wrote against the output from numpy.fft. The outputs match very closely (as expected!) except for that case when numpy.fft raises an exception, so numpy.fft is generally behaving as I expected. Cheers, Henry From pav at iki.fi Tue Feb 7 05:02:41 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 07 Feb 2012 11:02:41 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> Message-ID: Hi, 06.02.2012 20:41, Ralf Gommers kirjoitti: [clip] > I've created https://github.com/scipy/scipy.github.com and gave you > permissions on that. So with that for the built html and > https://github.com/scipy/scipy.org-new for the sources, that should do it. > > On the numpy org I don't have the right permissions to do the same. Ditto for numpy.github.com, now. Pauli From stefan at sun.ac.za Tue Feb 7 05:40:53 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 7 Feb 2012 02:40:53 -0800 Subject: [Numpy-discussion] Structure of polynomial module In-Reply-To: References: Message-ID: On Mon, Feb 6, 2012 at 2:34 PM, Charles R Harris wrote: > Use Polynomial.fit, it tracks the domain for you. Want to use Legendre > functions? 
Use Legendre.fit. Want to plot the result? plot(*p.linspace()), > want to plot the derivative? plot(*p.deriv().linspace()). Want to convert a > Legendre series to a Polynomial? p.convert(kind=Polynomial). So on and so > forth. The fitting is also NA aware in the development branch. That's really neat; thanks, Chuck. St?fan From sturla at molden.no Tue Feb 7 07:57:46 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 13:57:46 +0100 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: <1CE6D632-DF72-4BF8-919D-D625ACC9E19B@molden.no> References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> <1CE6D632-DF72-4BF8-919D-D625ACC9E19B@molden.no> Message-ID: <4F311FCA.6000101@molden.no> On 06.02.2012 22:27, Sturla Molden wrote: > > >> >> # Make a 4D view of this data, such that b[i,j] >> # is a 2D block with shape (4,4) (e.g. b[0,0] is >> # the same as a[:4, :4]). >> b = as_strided(a, shape=(a.shape[0]/4, a.shape[1]/4, 4, 4), >> strides=(4*a.strides[0], 4*a.strides[1], a.strides[0], a.strides[1])) >> > > Yes :-) Being used to Fortran (and also MATLAB) this is the kind of mapping It never occurs for me to think about. What else but NumPy is flexible enough to do this? :-) Actually, using as_strided is not needed. We can just reshape like this: (m,n) ---> (m//4, 4, n//4, 4) and then use np.any along the two length-4 dimensions. m,n = data.shape cond = lamda x : (x <= t1) & (x >= t2) x = cond(data).reshape((m//4, 4, n//4, 4)) found = np.any(np.any(x, axis=1), axis=2) Sturla From sturla at molden.no Tue Feb 7 08:23:40 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 14:23:40 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: References: Message-ID: <4F3125DC.5090207@molden.no> On 04.02.2012 16:55, Ralf Gommers wrote: > Although not ideal, I don't have a problem with that in principle. > However, wouldn't it break installing without admin rights if Python > is installed by the admin? Not on Windows. Sturla From sturla at molden.no Tue Feb 7 08:30:31 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 14:30:31 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: References: Message-ID: <4F312777.6030407@molden.no> On 27.10.2011 15:02, David Cournapeau wrote: > - we need to recompile atlas (but I can take care of it) > - the biggest: it is difficult to combine gfortran with visual > studio (more exactly you cannot link gfortran runtime to a visual > studio executable). Why is that? I have used gfortran with Python on Windows a lot, never had a problem. It's not like we are going to share CRT resources between C/Python and Fortran. That would be silly, regardless of compiler. Sturla From sturla at molden.no Tue Feb 7 08:38:41 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 14:38:41 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: References: Message-ID: <4F312961.5040706@molden.no> On 27.10.2011 15:02, David Cournapeau wrote: > - we need to recompile atlas (but I can take care of it) May I suggest GotoBLAS2 instead of ATLAS? Is is faster (comparable to MKL), easier to build, and now released under BSD licence. 
http://www.tacc.utexas.edu/tacc-projects/gotoblas2 Sturla From sturla at molden.no Tue Feb 7 08:55:32 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 14:55:32 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: <4F312961.5040706@molden.no> References: <4F312961.5040706@molden.no> Message-ID: <4F312D54.7080908@molden.no> On 07.02.2012 14:38, Sturla Molden wrote: > May I suggest GotoBLAS2 instead of ATLAS? Or OpenBLAS, which is GotoBLAS2 except it is still maintained. https://github.com/xianyi/OpenBLAS From heng at cantab.net Tue Feb 7 09:12:29 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 07 Feb 2012 14:12:29 +0000 Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: <1328606136.17023.11.camel@farnsworth> References: <1328553129.11049.12.camel@farnsworth> <1328606136.17023.11.camel@farnsworth> Message-ID: <1328623949.17023.21.camel@farnsworth> On Tue, 2012-02-07 at 09:15 +0000, Henry Gomersall wrote: > > >>>> numpy.fft.ifftn(a, axes=axes) > > > > Or do you mean if the error message is expected? > > Yeah, the question was regarding the error message. Specifically, the > problem it seems to have with an axes argument like that. Sorry, the error message got lost in the thread. It's in response to irfftn (as below), not ifftn as above. >>> import numpy >>> a_shape = (63, 4, 98) >>> a = numpy.complex128(numpy.random.rand(*a_shape)+\ ... 1j*numpy.random.rand(*a_shape)) >>> >>> axes = [0, 2] >>> numpy.fft.irfftn(a, axes=axes) From e.antero.tammi at gmail.com Tue Feb 7 09:27:24 2012 From: e.antero.tammi at gmail.com (eat) Date: Tue, 7 Feb 2012 16:27:24 +0200 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: <4F311FCA.6000101@molden.no> References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> <1CE6D632-DF72-4BF8-919D-D625ACC9E19B@molden.no> <4F311FCA.6000101@molden.no> Message-ID: Hi This is elegant and very fast as well! On Tue, Feb 7, 2012 at 2:57 PM, Sturla Molden wrote: > On 06.02.2012 22:27, Sturla Molden wrote: > > > > > >> > >> # Make a 4D view of this data, such that b[i,j] > >> # is a 2D block with shape (4,4) (e.g. b[0,0] is > >> # the same as a[:4, :4]). > >> b = as_strided(a, shape=(a.shape[0]/4, a.shape[1]/4, 4, 4), > >> strides=(4*a.strides[0], 4*a.strides[1], a.strides[0], > a.strides[1])) > >> > > > > Yes :-) Being used to Fortran (and also MATLAB) this is the kind of > mapping It never occurs for me to think about. What else but NumPy is > flexible enough to do this? :-) > > Actually, using as_strided is not needed. We can just reshape like this: > > (m,n) ---> (m//4, 4, n//4, 4) > > and then use np.any along the two length-4 dimensions. > > m,n = data.shape > cond = lamda x : (x <= t1) & (x >= t2) > I guess you meant here cond= lambda x: (x>= t1)& (x<= t2) > x = cond(data).reshape((m//4, 4, n//4, 4)) > found = np.any(np.any(x, axis=1), axis=2) > Regards, eat > > > Sturla > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Feb 7 11:14:46 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 7 Feb 2012 16:14:46 +0000 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? 
In-Reply-To: <4F312777.6030407@molden.no> References: <4F312777.6030407@molden.no> Message-ID: On Tue, Feb 7, 2012 at 1:30 PM, Sturla Molden wrote: > On 27.10.2011 15:02, David Cournapeau wrote: > >> ? ?- we need to recompile atlas (but I can take care of it) >> ? ?- the biggest: it is difficult to combine gfortran with visual >> studio (more exactly you cannot link gfortran runtime to a visual >> studio executable). > > Why is that? > > I have used gfortran with Python on Windows a lot, never had a problem. How did you link a library with mixed C and gfortran ? > It's not like we are going to share CRT resources between C/Python and > Fortran. That would be silly, regardless of compiler. Well, that actually happens quite a bit in the libraries we depend on. One solution could actually be removing any dependency on the fortran runtime. David From cournape at gmail.com Tue Feb 7 11:15:52 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 7 Feb 2012 16:15:52 +0000 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: <4F312D54.7080908@molden.no> References: <4F312961.5040706@molden.no> <4F312D54.7080908@molden.no> Message-ID: On Tue, Feb 7, 2012 at 1:55 PM, Sturla Molden wrote: > On 07.02.2012 14:38, Sturla Molden wrote: > >> May I suggest GotoBLAS2 instead of ATLAS? > > Or OpenBLAS, which is GotoBLAS2 except it is still maintained. I did not know GotoBLAS2 was open source (it wasn't last time I checked). That's very useful information, I will look into it. David From sturla at molden.no Tue Feb 7 12:10:43 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 18:10:43 +0100 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> <1CE6D632-DF72-4BF8-919D-D625ACC9E19B@molden.no> <4F311FCA.6000101@molden.no> Message-ID: <4F315B13.6040907@molden.no> On 07.02.2012 15:27, eat wrote: > This is elegant and very fast as well! Just be aware that it depends on C ordered input. So: m,n = data.shape cond = lamda x : (x >= t1) & (x <= t2) x = cond(np.ascontiguousarray(data)).reshape((m//4, 4, n//4, 4)) found = np.any(np.any(x, axis=1), axis=2) With Fortran ordered data, we must reshape in the opposite direction: (m,n) ---> (4, m//4, 4, n//4) That is: m,n = data.shape cond = lamda x : (x >= t1) & (x <= t2) x = cond(np.asfortranarray(data)).reshape((4, m//4, 4, n//4)) found = np.any(np.any(x, axis=0), axis=1) On the other hand, using as_strided instead of reshape should work with any ordering. Think of as_strided as a generalization of reshape. A problem with both these approaches is that it implicitely loops over discontiguous memory. The only solution to that is to use C, Fortran or Cython. 
So in Cython: import numpy as np cimport numpy as np cimport cython cdef inline np.npy_bool cond(np.float64_t x, np.float64_t t1, np.float64_t t2): return 1 if (x >= t1 and x <= t2) else 0 @cython.wraparound(False) @cython.boundscheck(False) @cython.cdivision(True) def find(object data, np.float64_t t1, np.float64_t t2): cdef np.ndarray[np.float64_t, ndim=2, mode='c'] x cdef np.ndarray[np.npy_bool, ndim=2, mode='c'] found cdef Py_ssize_t m, n, i, j x = np.ascontiguousarray(data) m,n = x.shape[0], x.shape[1] found = np.zeros((m//4,n//4),dtype=bool) for i in range(m): for j in range(n): found[i//4,j//4] = cond(x[i,j]) return found It might be that the C compiler has difficulties optimizing the loop if it think x and found cound be aliased, in which case the next logical step would be C99 or Fortran... Sturla From jordigh at octave.org Tue Feb 7 12:11:34 2012 From: jordigh at octave.org (=?UTF-8?Q?Jordi_Guti=C3=A9rrez_Hermoso?=) Date: Tue, 7 Feb 2012 12:11:34 -0500 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. Message-ID: Consider the following. Is this a bug? Thanks, - Jordi G. H. ----------------------------------------------- #!/usr/bin/python import numpy as np x = np.reshape(np.random.uniform(size=2*3*4), [2,3,4]) idx = np.array([False, True, False, True]) y = x[0,:,:]; ## Why is this transposed? print x[0, :, idx].T == y[:, idx] ## This doesn't happen with non-boolean indexing print x[0, :, 1:3] == y[:, 1:3] From sturla at molden.no Tue Feb 7 12:22:08 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 18:22:08 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: References: <4F312777.6030407@molden.no> Message-ID: <4F315DC0.6070908@molden.no> On 07.02.2012 17:14, David Cournapeau wrote: > How did you link a library with mixed C and gfortran ? Use gfortran instead of gcc when you link. gfortran knows what to do (and don't put -lgfortran in there). Something like this I think: gfortran -o whatever.pyd -shared cobj.o fobj.o -lmsvcr90 -lpython27 (Or just do whatever f2py does on Windows, it's mixed C and Fortran as well, and works with gfortran.) Sturla From sturla at molden.no Tue Feb 7 12:24:49 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 18:24:49 +0100 Subject: [Numpy-discussion] avoiding loops when downsampling arrays In-Reply-To: <4F315B13.6040907@molden.no> References: <6520DA7C-EC08-4742-9E55-35FE16C0A559@jpl.nasa.gov> <193F5EC0-0586-45E0-85A9-3F3384BC752C@molden.no> <1CE6D632-DF72-4BF8-919D-D625ACC9E19B@molden.no> <4F311FCA.6000101@molden.no> <4F315B13.6040907@molden.no> Message-ID: <4F315E61.2030903@molden.no> > for i in range(m): > for j in range(n): > found[i//4,j//4] = cond(x[i,j]) > Blah, that should be found[i//4,j//4] |= cond(x[i,j]) Sturla From sturla at molden.no Tue Feb 7 12:38:32 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 18:38:32 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: References: <4F312961.5040706@molden.no> <4F312D54.7080908@molden.no> Message-ID: <4F316198.1030108@molden.no> On 07.02.2012 17:15, David Cournapeau wrote: > I did not know GotoBLAS2 was open source (it wasn't last time I > checked). That's very useful information, I will look into it. One potential problem I just discovered is dependency on a DLL called libpthreadGC2.dll. First, it's a DLL that must be put somewhere. And second it is LGPL, but mingw uses it for any OpenMP by default. 
Well it is compiled independently so it should not taint the rest of NumPy, but it would be there. It would be better to have a different BSD pthreads library to link with instead, but I don't know of any that could be used. http://sourceware.org/pthreads-win32/ Sturla From ben.root at ou.edu Tue Feb 7 13:17:02 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 7 Feb 2012 12:17:02 -0600 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: References: Message-ID: On Tue, Feb 7, 2012 at 11:11 AM, Jordi Guti?rrez Hermoso wrote: > Consider the following. Is this a bug? > > Thanks, > - Jordi G. H. > > ----------------------------------------------- > #!/usr/bin/python > > import numpy as np > > x = np.reshape(np.random.uniform(size=2*3*4), [2,3,4]) > > idx = np.array([False, True, False, True]) > y = x[0,:,:]; > > ## Why is this transposed? > print x[0, :, idx].T == y[:, idx] > > ## This doesn't happen with non-boolean indexing > print x[0, :, 1:3] == y[:, 1:3] Funny things do happen when you mix boolean indexing with slicing/fancy indexing. This does seem strange to me: >>> print x.shape (2, 3, 4) >>> print x[0, :, :].shape (3, 4) >>> print x[0, :, idx].shape (2, 3) # 1-D slices using regular indexing and slicing >>> print x[0, :, 1] [ 0.06500275 0.46899149 0.50125757] >>> print x[0, :, 3] [ 0.06500275 0.46899149 0.50125757] # 2-D view using regular indexing, slicing and boolean indexing >>> print x[0, :, idx] [[ 0.06500275 0.46899149 0.50125757] [ 0.68811907 0.94795054 0.86839934]] # 2-D view using indexing and slicing >>> print x[0, :, 1:4] [[ 0.06500275 0.95042819 0.68811907] [ 0.46899149 0.49388795 0.94795054] [ 0.50125757 0.04363919 0.86839934]] The 1-D views makes sense for them to be of shape (3,), but the 2-D view is inconsistent with the last result. Could this be a bug related to how boolean indexing tends to flatten the results? Stacking flattened arrays would yield the second result. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Tue Feb 7 13:24:22 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 19:24:22 +0100 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: References: Message-ID: <4F316C56.6070003@molden.no> On 07.02.2012 19:17, Benjamin Root wrote: > >>> print x.shape > (2, 3, 4) > >>> print x[0, :, :].shape > (3, 4) > >>> print x[0, :, idx].shape > (2, 3) That looks like a bug to me. The length of the first dimension should be the same. Sturla From sturla at molden.no Tue Feb 7 13:41:41 2012 From: sturla at molden.no (Sturla Molden) Date: Tue, 07 Feb 2012 19:41:41 +0100 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <4F316C56.6070003@molden.no> References: <4F316C56.6070003@molden.no> Message-ID: <4F317065.2030905@molden.no> On 07.02.2012 19:24, Sturla Molden wrote: > On 07.02.2012 19:17, Benjamin Root wrote: > >> >>> print x.shape >> (2, 3, 4) >> >>> print x[0, :, :].shape >> (3, 4) >> >>> print x[0, :, idx].shape >> (2, 3) > > That looks like a bug to me. The length of the first dimension should be > the same. 
I can reproduce this as well: >>> a = np.zeros((4,4,4)) >>> b = np.array([0,1,1,1], dtype=bool) >>> a[0,:,:].shape (4L, 4L) >>> a[0,:,b].shape (3L, 4L) >>> i, = np.where(b) >>> a[0,:,i].shape (3L, 4L) Take a look at this: >>> a = np.zeros((1,2,3,4,5,6,7,8,9,10)) >>> b = np.zeros(5, dtype=bool) >>> a[:,:,:,:,b,:,:,:,:,:].shape (1L, 2L, 3L, 4L, 0L, 6L, 7L, 8L, 9L, 10L) >>> a[0,:,:,:,b,:,:,:,:,:].shape (0L, 2L, 3L, 4L, 6L, 7L, 8L, 9L, 10L) >>> a[:,0,:,:,b,:,:,:,:,:].shape (0L, 1L, 3L, 4L, 6L, 7L, 8L, 9L, 10L) It's the combination of a single index and fancy indexing that does this, not the slicing. Sturla From focke at slac.stanford.edu Tue Feb 7 14:53:21 2012 From: focke at slac.stanford.edu (Warren Focke) Date: Tue, 7 Feb 2012 11:53:21 -0800 (PST) Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: <1328606136.17023.11.camel@farnsworth> References: <1328553129.11049.12.camel@farnsworth> <1328606136.17023.11.camel@farnsworth> Message-ID: On Tue, 7 Feb 2012, Henry Gomersall wrote: > On Tue, 2012-02-07 at 01:04 +0100, Torgil Svensson wrote: >> irfftn is an optimization for real input and does not take complex >> input. You have to use numpy.fft.ifftn instead: >> > hmmm, that doesn't sound right to me (though there could be some non > obvious DFT magic that I'm missing). Indeed, > > np.irfftn(np.rfftn(a)) ~= a # The interim array is complex > > Though the documentation is a bit vague as to what inputs are expected! > > Actually, reading the fftpack docs, it *does* seem that this is the > correct behaviour (assuming when it says "Fourier coefficients" it means > complex), though I've not read any of the Python code. > You're not doing anything wrong. irfftn takes complex input and returns real output. The exception is a bug which is triggered because max(axes) >= len(axes). Warren Focke From stefan at sun.ac.za Tue Feb 7 15:15:56 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 7 Feb 2012 12:15:56 -0800 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <4F317065.2030905@molden.no> References: <4F316C56.6070003@molden.no> <4F317065.2030905@molden.no> Message-ID: On Tue, Feb 7, 2012 at 10:41 AM, Sturla Molden wrote: > It's the combination of a single index and fancy indexing that does > this, not the slicing. There are some quirks in the broadcasting machinery that makes it almost impossible to guess what the outcome of mixed indexing is going to be. The safest is to stick either to slicing or to fancy indexing, e.g. In [64]: idx1 = np.arange(x.shape[1])[:, None] In [65]: idx2 = np.array([False, True, False, True]) In [66]: idx1.shape, idx2.shape Out[66]: ((3, 1), (4,)) In [67]: np.broadcast_arrays(idx1, idx2)[0].shape Out[67]: (3, 4) The output will be (.., 3, 4), except that idx2 only has two True elements, so (..., 3, 2). In [68]: x[0, idx1, idx2].shape Out[68]: (3, 2) Regards St?fan From heng at cantab.net Tue Feb 7 15:19:17 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 07 Feb 2012 20:19:17 +0000 Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: References: <1328553129.11049.12.camel@farnsworth> <1328606136.17023.11.camel@farnsworth> Message-ID: <1328645957.2371.6.camel@farnsworth> On Tue, 2012-02-07 at 11:53 -0800, Warren Focke wrote: > You're not doing anything wrong. > irfftn takes complex input and returns real output. > The exception is a bug which is triggered because max(axes) >= > len(axes). Is this a bug I should register? 
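Side note in case anyone else hits this before a fix lands: the failing branch in _cook_nd_args only seems to run when no shape argument is supplied, so passing the real-space shape explicitly should sidestep it. Treat this as a sketch rather than a verified workaround (shape taken from my original example):

import numpy as np
a = np.random.rand(63, 4, 98)
b = np.fft.rfftn(a, axes=[0, 2])
c = np.fft.irfftn(b, s=[63, 98], axes=[0, 2])   # s supplied explicitly, so the shapeless branch is skipped
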
Cheers, Henry From focke at slac.stanford.edu Tue Feb 7 15:26:06 2012 From: focke at slac.stanford.edu (Warren Focke) Date: Tue, 7 Feb 2012 12:26:06 -0800 (PST) Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: <1328645957.2371.6.camel@farnsworth> References: <1328553129.11049.12.camel@farnsworth> <1328606136.17023.11.camel@farnsworth> <1328645957.2371.6.camel@farnsworth> Message-ID: On Tue, 7 Feb 2012, Henry Gomersall wrote: > On Tue, 2012-02-07 at 11:53 -0800, Warren Focke wrote: >> You're not doing anything wrong. >> irfftn takes complex input and returns real output. >> The exception is a bug which is triggered because max(axes) >= >> len(axes). > > Is this a bug I should register? Yes. It should work right if you replace s[axes[-1]] = (s[axes[-1]] - 1) * 2 with s[-1] = (a.shape[axes[-1]] - 1) * 2 but I'm not really in a position to test it right now. Warren Focke From ralf.gommers at googlemail.com Tue Feb 7 15:59:00 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 7 Feb 2012 21:59:00 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: <4F3125DC.5090207@molden.no> References: <4F3125DC.5090207@molden.no> Message-ID: On Tue, Feb 7, 2012 at 2:23 PM, Sturla Molden wrote: > On 04.02.2012 16:55, Ralf Gommers wrote: > > > Although not ideal, I don't have a problem with that in principle. > > However, wouldn't it break installing without admin rights if Python > > is installed by the admin? > > Not on Windows. > Not sure why you say that, in corporate environments it's fairly standard to have limited rights as a user. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 7 17:03:12 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 7 Feb 2012 16:03:12 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> Message-ID: <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> On Feb 7, 2012, at 4:02 AM, Pauli Virtanen wrote: > Hi, > > 06.02.2012 20:41, Ralf Gommers kirjoitti: > [clip] >> I've created https://github.com/scipy/scipy.github.com and gave you >> permissions on that. So with that for the built html and >> https://github.com/scipy/scipy.org-new for the sources, that should do it. >> >> On the numpy org I don't have the right permissions to do the same. > > Ditto for numpy.github.com, now. This is really nice. It will really help us make changes to the web-site quickly and synchronously with code changes. John Turner at ORNL has the numpy.org domain and perhaps we could get him to point it to numpy.github.com -Travis > > Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue Feb 7 17:51:37 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 7 Feb 2012 16:51:37 -0600 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: References: Message-ID: This comes up from time to time. This is an example of what is described at the top of page 84 of "Guide to NumPy". Also read Chapter 17 to get the explanation of how fancy indexing is implemented if you really want to understand the issues. 
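To make the behaviour concrete before getting into the rules (shapes lifted from the original post; the second spelling is the unambiguous two-step version that gives what people usually expect):

import numpy as np
x = np.random.uniform(size=(2, 3, 4))
idx = np.array([False, True, False, True])
print x[0, :, idx].shape        # (2, 3), the surprising result
print x[0, :, :][:, idx].shape  # (3, 2), plain index first, then the boolean mask
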
When you mix fancy-indexing with "simple indexing", the rules are not "special-cased" to handle this case of integer and masked indexing. They really should be, since it is non-ambiguous. There are other cases (say using two masked arrays in dimensions 1 and 3) where it's ambiguous what the dimension of the output should be and so the "punt" of moving the selected subspace (the one without the ':') to the front is used. So, I would consider this a bug. We will likely be re-visiting the situation of this kind of fancy indexing for NumPy 2.0. Thanks, -Travis On Feb 7, 2012, at 11:11 AM, Jordi Guti?rrez Hermoso wrote: > Consider the following. Is this a bug? > > Thanks, > - Jordi G. H. > > ----------------------------------------------- > #!/usr/bin/python > > import numpy as np > > x = np.reshape(np.random.uniform(size=2*3*4), [2,3,4]) > > idx = np.array([False, True, False, True]) > y = x[0,:,:]; > > ## Why is this transposed? > print x[0, :, idx].T == y[:, idx] > > ## This doesn't happen with non-boolean indexing > print x[0, :, 1:3] == y[:, 1:3] > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From heng at cantab.net Tue Feb 7 18:09:00 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 07 Feb 2012 23:09:00 +0000 Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: References: <1328553129.11049.12.camel@farnsworth> <1328606136.17023.11.camel@farnsworth> <1328645957.2371.6.camel@farnsworth> Message-ID: <1328656140.2371.19.camel@farnsworth> On Tue, 2012-02-07 at 12:26 -0800, Warren Focke wrote: > > Is this a bug I should register? > > Yes. > > It should work right if you replace > s[axes[-1]] = (s[axes[-1]] - 1) * 2 > with > s[-1] = (a.shape[axes[-1]] - 1) * 2 > but I'm not really in a position to test it right now. I can confirm that seems to solve the problem. Still want a bug report? Cheers, Henry From focke at slac.stanford.edu Tue Feb 7 18:15:05 2012 From: focke at slac.stanford.edu (Warren Focke) Date: Tue, 7 Feb 2012 15:15:05 -0800 (PST) Subject: [Numpy-discussion] numpy.fft.irfftn fails apparently unexpectedly In-Reply-To: <1328656140.2371.19.camel@farnsworth> References: <1328553129.11049.12.camel@farnsworth> <1328606136.17023.11.camel@farnsworth> <1328645957.2371.6.camel@farnsworth> <1328656140.2371.19.camel@farnsworth> Message-ID: On Tue, 7 Feb 2012, Henry Gomersall wrote: > On Tue, 2012-02-07 at 12:26 -0800, Warren Focke wrote: >>> Is this a bug I should register? >> >> Yes. >> >> It should work right if you replace >> s[axes[-1]] = (s[axes[-1]] - 1) * 2 >> with >> s[-1] = (a.shape[axes[-1]] - 1) * 2 >> but I'm not really in a position to test it right now. > > I can confirm that seems to solve the problem. Still want a bug report? No, I can file it, thanks. 
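A round trip over the same axes should make a reasonable regression test for the ticket (shape borrowed from the original report); with the one-line change above it ought to recover the input to within floating-point error:

import numpy as np
a = np.random.rand(63, 4, 98)
axes = [0, 2]
b = np.fft.irfftn(np.fft.rfftn(a, axes=axes), axes=axes)
assert np.allclose(a, b)
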
w From stefan at sun.ac.za Tue Feb 7 20:07:02 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 7 Feb 2012 17:07:02 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: On Tue, Feb 7, 2012 at 2:03 PM, Travis Oliphant wrote: > John Turner at ORNL has the numpy.org domain and perhaps we could get him to point it to numpy.github.com Remember to also put a CNAME file in the root of the repository: http://pages.github.com/ St?fan From aronne.merrelli at gmail.com Tue Feb 7 23:34:26 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Tue, 7 Feb 2012 22:34:26 -0600 Subject: [Numpy-discussion] matrix indexing In-Reply-To: References: Message-ID: On Mon, Feb 6, 2012 at 11:44 AM, Naresh Pai wrote: > I have two large matrices, say, ABC and DEF, each with a shape of 7000 by > 4500. I have another list, say, elem, containing 850 values from ABC. I am > interested in finding out the corresponding values in DEF where ABC has > elem and store them *separately*. The code that I am using is: > > for i in range(len(elem)): > DEF_distr = DEF[ABC==elem[i]] > > DEF_distr gets used for further processing before it gets cleared from > memory and the next round of the above loop begins. The loop above > currently takes about 20 minutes! I think the bottle neck is where elem is > getting searched repeatedly in ABC. So I am looking for a solution where > all elem can get processed in a single call and the indices of ABC be > stored in another variable (separately). I would appreciate if you suggest > any faster method for getting DEF_distr. > > You'll need to mention some details about the contents of ABC/DEF in order to get the best answer (what range of values, do they have a certain structure, etc). I made the assumption that ABC and elem have integers (I'm not sure it makes sense to search for ABC==elem[n] unless they are both integers), and then used a sort followed by searchsorted. This has a side effect of reordering the elements in DEF_distr. I don't know if that matters. You can skip the .copy() calls if you don't care that ABC/DEF are sorted. ABC_1D = ABC.copy().ravel() ABC_1D_sorter = np.argsort(ABC_1D) ABC_1D = ABC_1D[ABC_1D_sorter] DEF_1D = DEF.copy().ravel() DEF_1D = DEF_1D[ABC_1D_sorter] ind1 = np.searchsorted(ABC_1D, elem, side='left') ind2 = np.searchsorted(ABC_1D, elem, side='right') DEF_distr = [] for n in range(len(elem)): DEF_distr.append( DEF_1D[ind1[n]:ind2[n]] ) I tried this on the big memory workstation, and for the 7Kx4K size I get about 100 seconds for the simple method and 10 seconds for this more complicated sort-based method - if you are getting 20 minutes for that, maybe there is a memory problem, or a different part of the code that is the bottleneck? Hope that helps, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Feb 8 00:01:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 7 Feb 2012 23:01:30 -0600 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. 
In-Reply-To: <4F316C56.6070003@molden.no> References: <4F316C56.6070003@molden.no> Message-ID: <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> On Feb 7, 2012, at 12:24 PM, Sturla Molden wrote: > On 07.02.2012 19:17, Benjamin Root wrote: > >>>>> print x.shape >> (2, 3, 4) >>>>> print x[0, :, :].shape >> (3, 4) >>>>> print x[0, :, idx].shape >> (2, 3) > > That looks like a bug to me. The length of the first dimension should be > the same. What you are probably expecting is (3,2) for this selection, but whenever you have ':' dimensions in-between "fancy-indexing", the rules that govern fancy-indexing are ambiguous in general about how to handle this case. In this specific case (with a scalar being broadcast against the idx) it is pretty clear what to do, and I consider it a bug that a special case for this situation is not there. Recall that the shape of the output with fancy indexing is determined by broadcasting together the indexing objects and using that as the shape of the output: x[ind1, ind2] will produce an output with the shape of "broadcast(ind1, ind2)" whose elements are selected by the broadcasted tuple. When this is combined with standard slicing like so: x[ind1, :, ind2], the question is what should the shape of the output me. If ind1 is a scalar there is no ambiguity (and this should be special cased --- but unfortunately isn't). If ind1 is not a scalar, then what should the shape be under the rules of "zip-based" indexing. I don't know. So, in fact, what happens is that the broadcasted shape is determined and used as the "first part" of the shape. The "second part" of the shape is the shape of the slice-based selection. So, in this case the (0 and idx) broadcast to the (2,) part of the shape which is placed at the first of the result. The last part of the shape is the middle dimension (3,) resulting in the final shape (2,3). It could be argued that, in fact, this is a good example of why fancy indexing should follow cross-product semantics, and the current zip-based semantics should be moved to a method --- where the difficult-to-understand behavior with intermediate slices is also harder to spell because you have to explicitly create slice objects with "slice". What do others think? Obviously this couldn't change immediately, but it could be on the road-map for NumPy 2.0 or later. Best regards, -Travis From kalatsky at gmail.com Wed Feb 8 01:25:58 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Wed, 8 Feb 2012 00:25:58 -0600 Subject: [Numpy-discussion] matrix indexing In-Reply-To: References: Message-ID: Aronne made good suggestions. Here is another weapon for your arsenal: 1) I assume that the shape of your array is irrelevant (reshape if needed) 2) Depending on the structure of your data np.unique can be handy: arr_unique, idx = np.unique(arr1d, return_inverse=True) then search arr_unique instead of arr1d. 3) Caveat: np.unique is a major memory hogger, be prepared to waste ~1GB. Val On Tue, Feb 7, 2012 at 10:34 PM, Aronne Merrelli wrote: > > > On Mon, Feb 6, 2012 at 11:44 AM, Naresh Pai wrote: > >> I have two large matrices, say, ABC and DEF, each with a shape of 7000 by >> 4500. I have another list, say, elem, containing 850 values from ABC. I am >> interested in finding out the corresponding values in DEF where ABC has >> elem and store them *separately*. 
The code that I am using is: >> >> for i in range(len(elem)): >> DEF_distr = DEF[ABC==elem[i]] >> >> DEF_distr gets used for further processing before it gets cleared from >> memory and the next round of the above loop begins. The loop above >> currently takes about 20 minutes! I think the bottle neck is where elem is >> getting searched repeatedly in ABC. So I am looking for a solution where >> all elem can get processed in a single call and the indices of ABC be >> stored in another variable (separately). I would appreciate if you suggest >> any faster method for getting DEF_distr. >> >> > You'll need to mention some details about the contents of ABC/DEF in order > to get the best answer (what range of values, do they have a certain > structure, etc). I made the assumption that ABC and elem have integers (I'm > not sure it makes sense to search for ABC==elem[n] unless they are both > integers), and then used a sort followed by searchsorted. This has a side > effect of reordering the elements in DEF_distr. I don't know if that > matters. You can skip the .copy() calls if you don't care that ABC/DEF are > sorted. > > ABC_1D = ABC.copy().ravel() > ABC_1D_sorter = np.argsort(ABC_1D) > ABC_1D = ABC_1D[ABC_1D_sorter] > DEF_1D = DEF.copy().ravel() > DEF_1D = DEF_1D[ABC_1D_sorter] > ind1 = np.searchsorted(ABC_1D, elem, side='left') > ind2 = np.searchsorted(ABC_1D, elem, side='right') > DEF_distr = [] > for n in range(len(elem)): > DEF_distr.append( DEF_1D[ind1[n]:ind2[n]] ) > > > I tried this on the big memory workstation, and for the 7Kx4K size I get > about 100 seconds for the simple method and 10 seconds for this more > complicated sort-based method - if you are getting 20 minutes for that, > maybe there is a memory problem, or a different part of the code that is > the bottleneck? > > Hope that helps, > Aronne > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Feb 8 03:55:31 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 8 Feb 2012 00:55:31 -0800 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> Message-ID: On Tue, Feb 7, 2012 at 9:01 PM, Travis Oliphant wrote: > like so: ?x[ind1, :, ind2], the question is what should the shape of the output me. ? If ind1 is a scalar there is no ambiguity (and this should be special cased --- but unfortunately isn't). If x.shape == (a0, a1, a2) ind1.shape == (b0, 1, 1) ind2.shape == (1, 1, b1) then an expected output shape for x[ind1, :, ind2] could be (b0, a1, b1). At the moment it is (b0, 1, b1, a1). 
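A quick check with concrete numbers (only meant to illustrate the current
behaviour; the values are arbitrary):

import numpy as np

x = np.zeros((5, 6, 7))                    # (a0, a1, a2)
ind1 = np.zeros((2, 1, 1), dtype=int)      # (b0, 1, 1)
ind2 = np.zeros((1, 1, 3), dtype=int)      # (1, 1, b1)

# broadcast(ind1, ind2) has shape (2, 1, 3); the sliced middle axis
# (length 6) is appended after it.
print(x[ind1, :, ind2].shape)              # (2, 1, 3, 6), i.e. (b0, 1, b1, a1)
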
The logic would be fairly straightforward, something akin to: out_shape = list(np.broadcast_arrays(ind1, ind2)[0].shape) out_shape[colon_positions] = x.shape[colon_positions] St?fan From scott.sinclair.za at gmail.com Wed Feb 8 05:22:58 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 8 Feb 2012 12:22:58 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: 2012/2/8 St?fan van der Walt : > On Tue, Feb 7, 2012 at 2:03 PM, Travis Oliphant wrote: >> John Turner at ORNL has the numpy.org domain and perhaps we could get him to point it to numpy.github.com > > Remember to also put a CNAME file in the root of the repository: > > http://pages.github.com/ Hi Pauli, I see that you've added the CNAME file. Now numpy.github.com is being redirected to numpy.scipy.org (the old site). As I understand it, whoever controls the scipy.org DNS settings needs point numpy.scipy.org at numpy.github.com so that people get the updated site when they browse to numpy.scipy.org.. Cheers, Scott From pav at iki.fi Wed Feb 8 05:50:36 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 08 Feb 2012 11:50:36 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: Hi, 08.02.2012 11:22, Scott Sinclair kirjoitti: [clip] > I see that you've added the CNAME file. Now numpy.github.com is being > redirected to numpy.scipy.org (the old site). > > As I understand it, whoever controls the scipy.org DNS settings needs > point numpy.scipy.org at numpy.github.com so that people get the > updated site when they browse to numpy.scipy.org.. Oops, so it seems. I read the Github pages docs and thought that the CNAME file just adds a virtual host, but apparently I misunderstood. I'll revert... Pauli From shish at keba.be Wed Feb 8 09:11:55 2012 From: shish at keba.be (Olivier Delalleau) Date: Wed, 8 Feb 2012 09:11:55 -0500 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> Message-ID: Le 8 f?vrier 2012 00:01, Travis Oliphant a ?crit : > > On Feb 7, 2012, at 12:24 PM, Sturla Molden wrote: > > > On 07.02.2012 19:17, Benjamin Root wrote: > > > >>>>> print x.shape > >> (2, 3, 4) > >>>>> print x[0, :, :].shape > >> (3, 4) > >>>>> print x[0, :, idx].shape > >> (2, 3) > > > > That looks like a bug to me. The length of the first dimension should be > > the same. > > What you are probably expecting is (3,2) for this selection, but whenever > you have ':' dimensions in-between "fancy-indexing", the rules that govern > fancy-indexing are ambiguous in general about how to handle this case. In > this specific case (with a scalar being broadcast against the idx) it is > pretty clear what to do, and I consider it a bug that a special case for > this situation is not there. > > Recall that the shape of the output with fancy indexing is determined by > broadcasting together the indexing objects and using that as the shape of > the output: > > x[ind1, ind2] will produce an output with the shape of "broadcast(ind1, > ind2)" whose elements are selected by the broadcasted tuple. 
When this > is combined with standard slicing like so: x[ind1, :, ind2], the question > is what should the shape of the output me. If ind1 is a scalar there is > no ambiguity (and this should be special cased --- but unfortunately > isn't). If ind1 is not a scalar, then what should the shape be under the > rules of "zip-based" indexing. I don't know. So, in fact, what happens > is that the broadcasted shape is determined and used as the "first part" of > the shape. The "second part" of the shape is the shape of the slice-based > selection. > > So, in this case the (0 and idx) broadcast to the (2,) part of the shape > which is placed at the first of the result. The last part of the shape is > the middle dimension (3,) resulting in the final shape (2,3). > > It could be argued that, in fact, this is a good example of why fancy > indexing should follow cross-product semantics, and the current zip-based > semantics should be moved to a method --- where the difficult-to-understand > behavior with intermediate slices is also harder to spell because you have > to explicitly create slice objects with "slice". What do others think? > Obviously this couldn't change immediately, but it could be on the > road-map for NumPy 2.0 or later. > >From a user perspective, I would definitely prefer cross-product semantics for fancy indexing. In fact, I had never used fancy indexing with more than one array index, so the behavior described in this thread totally baffled me. If for instance x is a matrix, I think it's intuitive to expect x[0:2, 0:2] and x[[0, 1], [0, 1]] to return the same data. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Wed Feb 8 09:29:30 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 08 Feb 2012 15:29:30 +0100 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> Message-ID: <4F3286CA.3060609@molden.no> On 08.02.2012 06:01, Travis Oliphant wrote: > Recall that the shape of the output with fancy indexing is determined by broadcasting together the indexing objects and using that as the shape of the output: > > x[ind1, ind2] will produce an output with the shape of "broadcast(ind1, ind2)" whose elements are selected by the broadcasted tuple. Thank you for the explanation, Travis. I think my main confusion is why we broadcast indices at all. Broadcasting is something I would expect to happen for binary operations between ndarrays, not for indexing. (But it might be that I don't fully understand the problem.) Sturla From travis at continuum.io Wed Feb 8 09:49:50 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 8 Feb 2012 08:49:50 -0600 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <4F3286CA.3060609@molden.no> References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> Message-ID: <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> On Feb 8, 2012, at 8:29 AM, Sturla Molden wrote: > On 08.02.2012 06:01, Travis Oliphant wrote: > >> Recall that the shape of the output with fancy indexing is determined by broadcasting together the indexing objects and using that as the shape of the output: >> >> x[ind1, ind2] will produce an output with the shape of "broadcast(ind1, ind2)" whose elements are selected by the broadcasted tuple. 
> > > > > Thank you for the explanation, Travis. > > I think my main confusion is why we broadcast indices at all. > Broadcasting is something I would expect to happen for binary operations > between ndarrays, not for indexing. We broadcast because of the "zip-based" semantics of current fancy-indexing which is basically an element-by-element operation: x[ind1, ind2] will choose it's elements out of x by taken elements from ind1 as the first index and elements out of ind2 as the second. As an essentially element-by-element operation, if the shapes of the input arrays are not exactly the same, it is natural to broadcast them to the same shape. This is fairly intuitive if you are used to broadcasting in NumPy. The problem is that it does not coincide other intuitions in certain cases. This behavior actually allows easy-to-access "cross-product" behavior by broadcasting a 1-d ind1 shaped as (4,) with a ind2 shaped as (4,1). The numpy.ix_ function does the basic reshaping needed so that 0:2, 0:2 matches [0,1],[0,1] indexing: x[ix_([0,1],[0,1])] is the same as x[0:2,0:2]. There are also some very nice applications where you can select out of a 3-d volume a depth-surface defined by indexes like so: arr[ i[:,newaxis], j, depth] where arr is a 3-d array, i and j are 1-d index arrays: i = arange(arr.shape[0]) and j = arange(arr.shape[1]), and depth is a 2-d array of "depths". The selected result will be 2-d. When you understand what is happening, it is consistent and it does make sense and it has some very nice applications. I recognize that people get confused because their expectations are different, but the current behavior can be understood with a fairly simple mental model. I could support, however, a move to push this style of indexing to a method, and make fancy-indexing use cross-product semantics if: * we do it in a phased-manner with lot's of deprecation warnings and even a tool that helps you change code (the tool would have to "play" your code and make a note of the locations where this style of indexing was used --- because static code analysis wouldn't know which objects were arrays and which weren't). * we also make the ndarray object general enough so that fancy-indexing could be returned as a view on the array (the same as regular indexing) This sort of thing would take time, but is not out of the question in my mind because I suspect the number of users and use-cases of "broadcasted" fancy-indexing is small. -Travis > > (But it might be that I don't fully understand the problem.) > > Sturla > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Wed Feb 8 10:08:29 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 08 Feb 2012 16:08:29 +0100 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> Message-ID: <4F328FED.9070902@molden.no> On 08.02.2012 15:11, Olivier Delalleau wrote: > From a user perspective, I would definitely prefer cross-product > semantics for fancy indexing. In fact, I had never used fancy indexing > with more than one array index, so the behavior described in this thread > totally baffled me. If for instance x is a matrix, I think it's > intuitive to expect x[0:2, 0:2] and x[[0, 1], [0, 1]] to return the same > data. I think most would prefer cross-product semantics. 
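To make the two conventions concrete, here is a small sketch that uses nothing
beyond np.ix_ as Travis described it above:

import numpy as np

x = np.arange(16).reshape(4, 4)

# current "zip"/broadcast semantics: element-by-element pairs (0,0), (1,1)
print(x[[0, 1], [0, 1]])                # [0 5]

# cross-product semantics via np.ix_: the full 2x2 block, same as x[0:2, 0:2]
print(x[np.ix_([0, 1], [0, 1])])        # [[0 1]
                                        #  [4 5]]
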
We might be copying a bad feature of Matlab. Maybe we should just disallow fancy indexing with more than one dimension, e.g. array[X,Y] with X and Y from meshgrid. It might be that the kind of result x[[0, 1],[0, 1]] produces today is better left to a function, e.g. np.meshselect(x, *indices). Then we could just require that all arguments passed to *indices have the same shape. Sturla From sturla at molden.no Wed Feb 8 10:29:49 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 08 Feb 2012 16:29:49 +0100 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> Message-ID: <4F3294ED.6040308@molden.no> On 08.02.2012 15:49, Travis Oliphant wrote: > This sort of thing would take time, but is not out of the question in my mind because I suspect the number of users and use-cases of "broadcasted" fancy-indexing is small. In Matlab this (misfeature?) is generally used to compensate for the lack of broadbasting. So many Matlab programmers manually broadcast by fancy indexing with an array of (boolean) ones (it's less verbose than reshape). But with broadcasting for binary array operators (NumPy and Fortran 90), this is never needed. Sturla From cooke.stephanie at gmail.com Wed Feb 8 11:32:34 2012 From: cooke.stephanie at gmail.com (Stephanie Cooke) Date: Wed, 8 Feb 2012 11:32:34 -0500 Subject: [Numpy-discussion] hstack Message-ID: Hello, When I try to use the command hstack, I am given the error message "TypeError: hstack() takes exactly 1 argument (2 given)". I have a 9X1 array (called array) that I would like to concatenate to a 9X2 matrix (called matrix), and I try to do this by typing the command hstack(array,matrix). I would appreciate any help in getting this to work. Thanks, Stephanie From tsyu80 at gmail.com Wed Feb 8 11:35:05 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Wed, 8 Feb 2012 11:35:05 -0500 Subject: [Numpy-discussion] hstack In-Reply-To: References: Message-ID: On Wed, Feb 8, 2012 at 11:32 AM, Stephanie Cooke wrote: > Hello, > > When I try to use the command hstack, I am given the error message > "TypeError: hstack() takes exactly 1 argument (2 given)". I have a 9X1 > array (called array) that I would like to concatenate to a 9X2 matrix > (called matrix), and I try to do this by typing the command > hstack(array,matrix). I would appreciate any help in getting this to > work. > > Thanks, > > Stephanie Try hstack([array, matrix]). The input needs to be put into a single sequence (here a list, but tuple, etc. would work too). Best, -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From malcolm.reynolds at gmail.com Wed Feb 8 11:34:54 2012 From: malcolm.reynolds at gmail.com (Malcolm Reynolds) Date: Wed, 8 Feb 2012 16:34:54 +0000 Subject: [Numpy-discussion] hstack In-Reply-To: References: Message-ID: You On Wed, Feb 8, 2012 at 4:32 PM, Stephanie Cooke wrote: > Hello, > > When I try to use the command hstack, I am given the error message > "TypeError: hstack() takes exactly 1 argument (2 given)". I have a 9X1 > array (called array) that I would like to concatenate to a 9X2 matrix > (called matrix), and I try to do this by typing the command > hstack(array,matrix). I would appreciate any help in getting this to > work. You need to put the arrays / matrices you want to stack in a tuple. 
So in your case, use "hstack((array,matrix))" - note the extra parentheses. Try "print numpy.hstack.__doc__" for more details. Malcolm From pjabardo at yahoo.com.br Wed Feb 8 11:39:38 2012 From: pjabardo at yahoo.com.br (Paulo Jabardo) Date: Wed, 8 Feb 2012 08:39:38 -0800 (PST) Subject: [Numpy-discussion] hstack In-Reply-To: References: Message-ID: <1328719178.36148.YahooMailNeo@web30004.mail.mud.yahoo.com> The problem is that hstack needs a tuple as argument: x = ones( (9,1) ) y = zeros( (9,2) ) z = hstack( (x,y) ) Notice the parenthesis in the arguments. Paulo ________________________________ De: Stephanie Cooke Para: numpy-discussion at scipy.org Enviadas: Quarta-feira, 8 de Fevereiro de 2012 14:32 Assunto: [Numpy-discussion] hstack Hello, When I try to use the command hstack, I am given the error message "TypeError: hstack() takes exactly 1 argument (2 given)". I have a 9X1 array (called array) that I would like to concatenate to a 9X2 matrix (called matrix), and I try to do this by typing the command hstack(array,matrix). I would appreciate any help in getting this to work. Thanks, Stephanie _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Wed Feb 8 12:10:37 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 8 Feb 2012 09:10:37 -0800 Subject: [Numpy-discussion] just the date part of a datetime64[s]? Message-ID: Hello, is there a good way to get just the date part of a datetime64? Frequently datetime datatypes have month(), date(), hour(), etc functions that pull out part of the datetime, but I didn't see those mentioned in the datetime64 docs. Casting to a 'D' dtype didn't work as I would have hoped: In [30]: x= datetime64('2012-02-02 09:00:00', 's') In [31]: x Out[31]: numpy.datetime64('2012-02-02T09:00:00-0800') In [32]: x.astype('datetime64[D]').astype('datetime64[s]') Out[32]: numpy.datetime64('2012-02-01T16:00:00-0800') What's the simplest way to do this? -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 8 12:17:42 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Feb 2012 12:17:42 -0500 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <4F3294ED.6040308@molden.no> References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> <4F3294ED.6040308@molden.no> Message-ID: On Wed, Feb 8, 2012 at 10:29 AM, Sturla Molden wrote: > On 08.02.2012 15:49, Travis Oliphant wrote: > >> This sort of thing would take time, but is not out of the question in my mind because I suspect the number of users and use-cases of "broadcasted" fancy-indexing is small. I think I use it quite a bit, and I like that the broadcasting in indexing is as flexible as the broadcasting of numpy arrays themselves. x[np.arange(len(x)), np.arange(len(x))] gives the diagonal for example. or picking a different element from each column, (I don't remember where I used that) It is surprising at first and takes some getting used to, but I think it's pretty nice. On the other hand, I always avoid mixing slices and indexes because of "strange" results. Josef > > In Matlab this (misfeature?) 
is generally used to compensate for the > lack of broadbasting. So many Matlab programmers manually broadcast by > fancy indexing with an array of (boolean) ones (it's less verbose than > reshape). But with broadcasting for binary array operators (NumPy and > Fortran 90), this is never needed. > > Sturla > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From silideba at gmail.com Wed Feb 8 13:18:32 2012 From: silideba at gmail.com (Debashish Saha) Date: Wed, 8 Feb 2012 23:48:32 +0530 Subject: [Numpy-discussion] how to insert some specific delay Message-ID: how to insert some specific delay in python programming using numpy command. From sturla at molden.no Wed Feb 8 13:20:25 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 08 Feb 2012 19:20:25 +0100 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> <4F3294ED.6040308@molden.no> Message-ID: <4F32BCE9.5070407@molden.no> On 08.02.2012 18:17, josef.pktd at gmail.com wrote: > I think I use it quite a bit, and I like that the broadcasting in > indexing is as flexible as the broadcasting of numpy arrays > themselves. > > x[np.arange(len(x)), np.arange(len(x))] gives the diagonal for example. Personally I would prefer that a function like np.diag could return a view. (The stride is regular, so why not?) def diag(x): return x.reshape(np.prod(x.shape))[::x.shape[0]+1] >>> a = np.zeros((10,10)) >>> d = diag(a) >>> d[:] = 2 >>> a array([[ 2., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [ 0., 2., 0., 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 2., 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 2., 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 2., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 2., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0., 2., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0., 2., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0., 0., 2., 0.], [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 2.]]) Sturla From chris.barker at noaa.gov Wed Feb 8 13:21:25 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 8 Feb 2012 10:21:25 -0800 Subject: [Numpy-discussion] how to insert some specific delay In-Reply-To: References: Message-ID: On Wed, Feb 8, 2012 at 10:18 AM, Debashish Saha wrote: > how to insert some specific delay in python programming using numpy command. do you mean a time delay? If so -- numpy doesn't (and shouldn't) have such a thing. however, the time module has time.sleep() whether it's a good idea to use that depends on your context -- for a simple script, maybe. If your code is running in a GUI or something, then probably not, look at your GUI toolkit's options for timers, etc. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From aronne.merrelli at gmail.com Wed Feb 8 13:25:09 2012 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 8 Feb 2012 12:25:09 -0600 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. 
In-Reply-To: <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> Message-ID: On Wed, Feb 8, 2012 at 8:49 AM, Travis Oliphant wrote: > > On Feb 8, 2012, at 8:29 AM, Sturla Molden wrote: > > > On 08.02.2012 06:01, Travis Oliphant wrote: > > > >> Recall that the shape of the output with fancy indexing is determined > by broadcasting together the indexing objects and using that as the shape > of the output: > >> > >> x[ind1, ind2] will produce an output with the shape of "broadcast(ind1, > ind2)" whose elements are selected by the broadcasted tuple. > > > > > > > > > > Thank you for the explanation, Travis. > > > > I think my main confusion is why we broadcast indices at all. > > Broadcasting is something I would expect to happen for binary operations > > between ndarrays, not for indexing. > > We broadcast because of the "zip-based" semantics of current > fancy-indexing which is basically an element-by-element operation: > x[ind1, ind2] will choose it's elements out of x by taken elements from > ind1 as the first index and elements out of ind2 as the second. > > As an essentially element-by-element operation, if the shapes of the input > arrays are not exactly the same, it is natural to broadcast them to the > same shape. This is fairly intuitive if you are used to broadcasting in > NumPy. The problem is that it does not coincide other intuitions in > certain cases. > > This behavior actually allows easy-to-access "cross-product" behavior by > broadcasting a 1-d ind1 shaped as (4,) with a ind2 shaped as (4,1). The > numpy.ix_ function does the basic reshaping needed so that 0:2, 0:2 > matches [0,1],[0,1] indexing: x[ix_([0,1],[0,1])] is the same as > x[0:2,0:2]. > > There are also some very nice applications where you can select out of a > 3-d volume a depth-surface defined by indexes like so: > > arr[ i[:,newaxis], j, depth] > > where arr is a 3-d array, i and j are 1-d index arrays: i = > arange(arr.shape[0]) and j = arange(arr.shape[1]), and depth is a 2-d array > of "depths". The selected result will be 2-d. > > When you understand what is happening, it is consistent and it does make > sense and it has some very nice applications. I recognize that people > get confused because their expectations are different, but the current > behavior can be understood with a fairly simple mental model. > > I could support, however, a move to push this style of indexing to a > method, and make fancy-indexing use cross-product semantics if: > > * we do it in a phased-manner with lot's of deprecation warnings > and even a tool that helps you change code (the tool would have to "play" > your code and make a note of the locations where this style of indexing was > used --- because static code analysis wouldn't know which objects were > arrays and which weren't). > > * we also make the ndarray object general enough so that > fancy-indexing could be returned as a view on the array (the same as > regular indexing) > > This sort of thing would take time, but is not out of the question in my > mind because I suspect the number of users and use-cases of "broadcasted" > fancy-indexing is small. > > -Travis > > Speaking for myself, I've used both methods quite often. For the broadcasting method, it is very useful for image processing scenarios where you want to extract a small subregion with an arbitrary shape. 
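Something along these lines, just as a toy sketch (the image and the pixel
coordinates are made up for illustration):

import numpy as np

img = np.arange(25.0).reshape(5, 5)     # toy 5x5 "image"

# row/column coordinates of an arbitrarily shaped set of pixels
rows = np.array([0, 1, 3, 4])
cols = np.array([2, 2, 0, 4])

vals = img[rows, cols]                  # array([ 2.,  7., 15., 24.])
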
It is also extremely useful for this to work both ways, for example if I wanted to do some processing on that subregion and then put it back in the larger image: sub_reg_vals = img[ ind1, ind2 ] do_stuff(sub_reg_vals) img[ ind1, ind2 ] = sub_reg_vals Of course this may require awareness of the coordinates versus flat index, but the ravel/unravel_index functions do this for you. Note that IDL works in this way, unlike MATLAB. The "quick and dirty" way I've been doing it in MATLAB is sub_reg_vals = diag( img[ind1, ind2] ) Which works but is obviously a bit inefficient. I have no convincing ideas about which method should be the "default", but I think both methods get a fair bit of use, so as long as there are fast and well documented methods for doing either, I would be happy. I actually had no idea that np.ix_ did the other method in NumPy. (Thanks for mentioning that! I had been using A[ind1,:][:,ind2]). There is no generic terminology for these operations, so I think that generates a lot of confusion. Thanks, Aronne -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Feb 8 17:11:17 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 8 Feb 2012 16:11:17 -0600 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> <4F3294ED.6040308@molden.no> Message-ID: On Feb 8, 2012, at 11:17 AM, josef.pktd at gmail.com wrote: > On Wed, Feb 8, 2012 at 10:29 AM, Sturla Molden wrote: >> On 08.02.2012 15:49, Travis Oliphant wrote: >> >>> This sort of thing would take time, but is not out of the question in my mind because I suspect the number of users and use-cases of "broadcasted" fancy-indexing is small. > > I think I use it quite a bit, and I like that the broadcasting in > indexing is as flexible as the broadcasting of numpy arrays > themselves. > > x[np.arange(len(x)), np.arange(len(x))] gives the diagonal for example. > > or picking a different element from each column, (I don't remember > where I used that) > > It is surprising at first and takes some getting used to, but I think > it's pretty nice. On the other hand, I always avoid mixing slices and > indexes because of "strange" results. It actually is pretty nice once you understand it. Mixing of fancy indexing and slicing is only nice in special circumstances. I think we would have been better off if rather than move the subspace to the beginning of the array, NumPy raised an error in that case. That would be a useful change. -Travis > > Josef > > >> >> In Matlab this (misfeature?) is generally used to compensate for the >> lack of broadbasting. So many Matlab programmers manually broadcast by >> fancy indexing with an array of (boolean) ones (it's less verbose than >> reshape). But with broadcasting for binary array operators (NumPy and >> Fortran 90), this is never needed. 
>> >> Sturla >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Wed Feb 8 17:19:03 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 8 Feb 2012 22:19:03 +0000 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> <4F3294ED.6040308@molden.no> Message-ID: On Wed, Feb 8, 2012 at 22:11, Travis Oliphant wrote: > > On Feb 8, 2012, at 11:17 AM, josef.pktd at gmail.com wrote: > >> On Wed, Feb 8, 2012 at 10:29 AM, Sturla Molden wrote: >>> On 08.02.2012 15:49, Travis Oliphant wrote: >>> >>>> This sort of thing would take time, but is not out of the question in my mind because I suspect the number of users and use-cases of "broadcasted" fancy-indexing is small. >> >> I think I use it quite a bit, and I like that the broadcasting in >> indexing is as flexible as the broadcasting of numpy arrays >> themselves. >> >> x[np.arange(len(x)), np.arange(len(x))] ?gives the diagonal for example. >> >> or picking a different element from each column, (I don't remember >> where I used that) >> >> It is surprising at first and takes some getting used to, but I think >> it's pretty nice. ?On the other hand, I always avoid mixing slices and >> indexes because of "strange" results. > > It actually is pretty nice once you understand it. ? Mixing of fancy indexing and slicing is only nice in special circumstances. ? I think we would have been better off if rather than move the subspace to the beginning of the array, NumPy raised an error in that case. > > That would be a useful change. We could start with a warning. See how many people kvetch about it. I don't like removing long-standing, documented features based on suspicions that their user base is small. Our suspicions and intuitions about such things aren't worth much. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From travis at continuum.io Wed Feb 8 17:27:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 8 Feb 2012 16:27:53 -0600 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> <4F3294ED.6040308@molden.no> Message-ID: On Feb 8, 2012, at 4:19 PM, Robert Kern wrote: > On Wed, Feb 8, 2012 at 22:11, Travis Oliphant wrote: >> >> On Feb 8, 2012, at 11:17 AM, josef.pktd at gmail.com wrote: >> >>> On Wed, Feb 8, 2012 at 10:29 AM, Sturla Molden wrote: >>>> On 08.02.2012 15:49, Travis Oliphant wrote: >>>> >>>>> This sort of thing would take time, but is not out of the question in my mind because I suspect the number of users and use-cases of "broadcasted" fancy-indexing is small. >>> >>> I think I use it quite a bit, and I like that the broadcasting in >>> indexing is as flexible as the broadcasting of numpy arrays >>> themselves. 
>>> >>> x[np.arange(len(x)), np.arange(len(x))] gives the diagonal for example. >>> >>> or picking a different element from each column, (I don't remember >>> where I used that) >>> >>> It is surprising at first and takes some getting used to, but I think >>> it's pretty nice. On the other hand, I always avoid mixing slices and >>> indexes because of "strange" results. >> >> It actually is pretty nice once you understand it. Mixing of fancy indexing and slicing is only nice in special circumstances. I think we would have been better off if rather than move the subspace to the beginning of the array, NumPy raised an error in that case. >> >> That would be a useful change. > > We could start with a warning. See how many people kvetch about it. I > don't like removing long-standing, documented features based on > suspicions that their user base is small. Our suspicions and > intuitions about such things aren't worth much. Yes, Agreed! Starting with a warning would be a good thing. -Travis From stefan at sun.ac.za Wed Feb 8 18:54:42 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 8 Feb 2012 15:54:42 -0800 Subject: [Numpy-discussion] Logical indexing and higher-dimensional arrays. In-Reply-To: <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> References: <4F316C56.6070003@molden.no> <2DC5C816-DEE7-46DA-897A-25D0F1F64204@continuum.io> <4F3286CA.3060609@molden.no> <386B8495-FB9C-422A-9735-04966EB31322@continuum.io> Message-ID: On Wed, Feb 8, 2012 at 6:49 AM, Travis Oliphant wrote: > There are also some very nice applications where you can select out of a 3-d volume a depth-surface defined by indexes like so: > > ? ? ? ?arr[ i[:,newaxis], j, depth] > > where arr is a 3-d array, ?i and j are 1-d index arrays: i = arange(arr.shape[0]) and j = arange(arr.shape[1]), and depth is a 2-d array of "depths". ? The selected result will be 2-d. For those of you new to fancy indexing and broadcasting, have a look at the Advanced NumPy Tutorial slides from SciPy2008: http://mentat.za.net/numpy/numpy_advanced_slides/ It also includes an example like the one Travis mentioned. Regards St?fan From mwwiebe at gmail.com Wed Feb 8 21:48:02 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 8 Feb 2012 18:48:02 -0800 Subject: [Numpy-discussion] just the date part of a datetime64[s]? In-Reply-To: References: Message-ID: Converting between date and datetime requires caution, because it depends on your time zone. Because all datetime64's are internally stored in UTC, simply casting as in your example treats it in UTC. The 'astype' function does not raise an error to tell you that this is problematic, because NumPy's default casting for that function has no error policy (yet). Here's the trouble you can get into: x = datetime64('2012-02-02 22:00:00', 's') x.astype('M8[D]') Out[19]: numpy.datetime64('2012-02-03') The trouble happens the other way too, because a date is represented as midnight UTC. This would also raise an exception, but for the fact that astype does no checking: x = datetime64('2012-02-02') x.astype('M8[m]') Out[23]: numpy.datetime64('2012-02-01T16:00-0800') The intention is to have functions which handles this casting explicitly, called datetime_as_date and date_as_datetime. They would take a timezone parameter, so the code explicitly specifies how the conversion takes place. 
A crude replacement for now is: x = datetime64('2012-02-02 22:00:00', 's') np.datetime64(np.datetime_as_string(x, timezone='local')[:10]) Out[21]: numpy.datetime64('2012-02-02') This is hackish, but it should do what you want. -Mark On Wed, Feb 8, 2012 at 9:10 AM, John Salvatier wrote: > > Hello, is there a good way to get just the date part of a datetime64? > Frequently datetime datatypes have month(), date(), hour(), etc functions > that pull out part of the datetime, but I didn't see those mentioned in the > datetime64 docs. Casting to a 'D' dtype didn't work as I would have hoped: > > In [30]: x= datetime64('2012-02-02 09:00:00', 's') > > In [31]: x > Out[31]: numpy.datetime64('2012-02-02T09:00:00-0800') > > In [32]: x.astype('datetime64[D]').astype('datetime64[s]') > Out[32]: numpy.datetime64('2012-02-01T16:00:00-0800') > > What's the simplest way to do this? > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.malosio at itia.cnr.it Thu Feb 9 02:31:43 2012 From: matteo.malosio at itia.cnr.it (teomat) Date: Wed, 8 Feb 2012 23:31:43 -0800 (PST) Subject: [Numpy-discussion] numpy.arange() error? Message-ID: <33277269.post@talk.nabble.com> Hi, Am I wrong or the numpy.arange() function is not correct 100%? Try to do this: In [7]: len(np.arange(3.1, 4.9, 0.1)) Out[7]: 18 In [8]: len(np.arange(8.1, 9.9, 0.1)) Out[8]: 19 I would expect the same result for each command. All the best -- View this message in context: http://old.nabble.com/numpy.arange%28%29-error--tp33277269p33277269.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From efiring at hawaii.edu Thu Feb 9 02:43:47 2012 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 08 Feb 2012 21:43:47 -1000 Subject: [Numpy-discussion] numpy.arange() error? In-Reply-To: <33277269.post@talk.nabble.com> References: <33277269.post@talk.nabble.com> Message-ID: <4F337933.6010809@hawaii.edu> On 02/08/2012 09:31 PM, teomat wrote: > > Hi, > > Am I wrong or the numpy.arange() function is not correct 100%? > > Try to do this: > > In [7]: len(np.arange(3.1, 4.9, 0.1)) > Out[7]: 18 > > In [8]: len(np.arange(8.1, 9.9, 0.1)) > Out[8]: 19 > > I would expect the same result for each command. Not after more experience with the wonders of floating point! Nice-looking decimal numbers often have long, drawn-out, inexact floating point (base 2) representations. That leads to exactly this sort of problem. numpy.linspace is provided to help get around some of these surprises; or you can use an integer sequence and then scale and shift it. Eric > > All the best > > From eirik.gjerlow at astro.uio.no Thu Feb 9 06:32:33 2012 From: eirik.gjerlow at astro.uio.no (=?ISO-8859-1?Q?Eirik_Gjerl=F8w?=) Date: Thu, 09 Feb 2012 12:32:33 +0100 Subject: [Numpy-discussion] Numpy array slicing Message-ID: <4F33AED1.60606@uio.no> Hello, (also sent to Scipy-User, sorry for duplicates). This is (I think) a rather basic question about numpy slicing. I have the following code: In [29]: a.shape Out[29]: (3, 4, 12288, 2) In [30]: mask.shape Out[30]: (3, 12288) In [31]: mask.dtype Out[31]: dtype('bool') In [32]: sum(mask[0]) Out[32]: 12285 In [33]: a[[0] + [slice(None)] + [mask[0]] + [slice(None)]].shape Out[33]: (12285, 4, 2) My question is: Why is not the final shape (4, 12285, 2) instead of (12285, 4, 2)? 
Eirik Gjerl?w From shish at keba.be Thu Feb 9 06:55:18 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 9 Feb 2012 06:55:18 -0500 Subject: [Numpy-discussion] Numpy array slicing In-Reply-To: <4F33AED1.60606@uio.no> References: <4F33AED1.60606@uio.no> Message-ID: This was actually discussed very recently (for more details: http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060232.html). It's caused by mixing slicing with advanced indexing. The resulting shape is the concatenation of a first part obtained by broadcasting of the non-slice items (in your case, 0 and mask[0] being broadcasted to shape (12285,), the number of non-zero elements in mask[0]), followed by a second part obtained by the slice items (in your case extracting dimensions #1 and #3 of a, i.e. shape (4, 2)). So the final shape is (12285, 4, 2). -=- Olivier Le 9 f?vrier 2012 06:32, Eirik Gjerl?w a ?crit : > Hello, > > (also sent to Scipy-User, sorry for duplicates). > > This is (I think) a rather basic question about numpy slicing. I have > the following code: > > In [29]: a.shape > Out[29]: (3, 4, 12288, 2) > > In [30]: mask.shape > Out[30]: (3, 12288) > > In [31]: mask.dtype > Out[31]: dtype('bool') > > In [32]: sum(mask[0]) > Out[32]: 12285 > > In [33]: a[[0] + [slice(None)] + [mask[0]] + [slice(None)]].shape > Out[33]: (12285, 4, 2) > > My question is: Why is not the final shape (4, 12285, 2) instead of > (12285, 4, 2)? > > Eirik Gjerl?w > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eirik.gjerlow at astro.uio.no Thu Feb 9 08:28:12 2012 From: eirik.gjerlow at astro.uio.no (=?ISO-8859-1?Q?Eirik_Gjerl=F8w?=) Date: Thu, 09 Feb 2012 14:28:12 +0100 Subject: [Numpy-discussion] Numpy array slicing In-Reply-To: References: <4F33AED1.60606@uio.no> Message-ID: <4F33C9EC.7030205@uio.no> Ok, that was an enlightening discussion, I guess I signed up for this list a couple of days too late! Thanks, Eirik On 09. feb. 2012 12:55, Olivier Delalleau wrote: > This was actually discussed very recently (for more details: > http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060232.html). > > It's caused by mixing slicing with advanced indexing. The resulting > shape is the concatenation of a first part obtained by broadcasting of > the non-slice items (in your case, 0 and mask[0] being broadcasted to > shape (12285,), the number of non-zero elements in mask[0]), followed > by a second part obtained by the slice items (in your case extracting > dimensions #1 and #3 of a, i.e. shape (4, 2)). > So the final shape is (12285, 4, 2). > > -=- Olivier > > > Le 9 f?vrier 2012 06:32, Eirik Gjerl?w > a ?crit : > > Hello, > > (also sent to Scipy-User, sorry for duplicates). > > This is (I think) a rather basic question about numpy slicing. I have > the following code: > > In [29]: a.shape > Out[29]: (3, 4, 12288, 2) > > In [30]: mask.shape > Out[30]: (3, 12288) > > In [31]: mask.dtype > Out[31]: dtype('bool') > > In [32]: sum(mask[0]) > Out[32]: 12285 > > In [33]: a[[0] + [slice(None)] + [mask[0]] + [slice(None)]].shape > Out[33]: (12285, 4, 2) > > My question is: Why is not the final shape (4, 12285, 2) instead of > (12285, 4, 2)? 
> > Eirik Gjerl?w > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Feb 9 10:29:51 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 9 Feb 2012 08:29:51 -0700 Subject: [Numpy-discussion] Cython question Message-ID: Hi All, Does anyone know how to make Cython emit a C macro? I would like to be able to #define NO_DEPRECATED_API and can do so by including a header file or futzing with the generator script, but I was wondering if there was an easy way to do it in Cython. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From markflorisson88 at gmail.com Thu Feb 9 10:35:03 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Thu, 9 Feb 2012 15:35:03 +0000 Subject: [Numpy-discussion] Cython question In-Reply-To: References: Message-ID: On 9 February 2012 15:29, Charles R Harris wrote: > Hi All, > > Does anyone know how to make Cython emit a C macro? I would like to be able > to > > #define NO_DEPRECATED_API > > and can do so by including a header file or futzing with the generator > script, but I was wondering if there was an easy way to do it in Cython. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Best way is probably to include a C header file (cdef extern from "myheader.h": pass), or define the macro with the 'define_macros' argument to distutils' Extension. There are other tricks but they are more hacks than anything else. From charlesr.harris at gmail.com Thu Feb 9 11:05:14 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 9 Feb 2012 09:05:14 -0700 Subject: [Numpy-discussion] Cython question In-Reply-To: References: Message-ID: On Thu, Feb 9, 2012 at 8:35 AM, mark florisson wrote: > On 9 February 2012 15:29, Charles R Harris > wrote: > > Hi All, > > > > Does anyone know how to make Cython emit a C macro? I would like to be > able > > to > > > > #define NO_DEPRECATED_API > > > > and can do so by including a header file or futzing with the generator > > script, but I was wondering if there was an easy way to do it in Cython. > > > > Chuck > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > Best way is probably to include a C header file (cdef extern from > "myheader.h": pass), or define the macro with the 'define_macros' > argument to distutils' Extension. There are other tricks but they are > more hacks than anything else. > Thanks Mark, The header file looks like the cleanest way to do this. I'll put one in the numpy/include/numpy directory. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Thu Feb 9 11:42:55 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Thu, 9 Feb 2012 08:42:55 -0800 Subject: [Numpy-discussion] just the date part of a datetime64[s]? In-Reply-To: References: Message-ID: Thanks Mark! 
John On Wed, Feb 8, 2012 at 6:48 PM, Mark Wiebe wrote: > Converting between date and datetime requires caution, because it depends > on your time zone. Because all datetime64's are internally stored in UTC, > simply casting as in your example treats it in UTC. The 'astype' function > does not raise an error to tell you that this is problematic, because > NumPy's default casting for that function has no error policy (yet). > > Here's the trouble you can get into: > > x = datetime64('2012-02-02 22:00:00', 's') > > x.astype('M8[D]') > Out[19]: numpy.datetime64('2012-02-03') > > > The trouble happens the other way too, because a date is represented as > midnight UTC. This would also raise an exception, but for the fact that > astype does no checking: > > x = datetime64('2012-02-02') > > x.astype('M8[m]') > Out[23]: numpy.datetime64('2012-02-01T16:00-0800') > > > The intention is to have functions which handles this casting explicitly, > called datetime_as_date and date_as_datetime. They would take a timezone > parameter, so the code explicitly specifies how the conversion takes place. > A crude replacement for now is: > > x = datetime64('2012-02-02 22:00:00', 's') > > np.datetime64(np.datetime_as_string(x, timezone='local')[:10]) > Out[21]: numpy.datetime64('2012-02-02') > > > This is hackish, but it should do what you want. > > -Mark > > On Wed, Feb 8, 2012 at 9:10 AM, John Salvatier wrote: > >> >> Hello, is there a good way to get just the date part of a datetime64? >> Frequently datetime datatypes have month(), date(), hour(), etc functions >> that pull out part of the datetime, but I didn't see those mentioned in the >> datetime64 docs. Casting to a 'D' dtype didn't work as I would have hoped: >> >> In [30]: x= datetime64('2012-02-02 09:00:00', 's') >> >> In [31]: x >> Out[31]: numpy.datetime64('2012-02-02T09:00:00-0800') >> >> In [32]: x.astype('datetime64[D]').astype('datetime64[s]') >> Out[32]: numpy.datetime64('2012-02-01T16:00:00-0800') >> >> What's the simplest way to do this? >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Feb 9 13:07:20 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 09 Feb 2012 19:07:20 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: <4F316198.1030108@molden.no> References: <4F312961.5040706@molden.no> <4F312D54.7080908@molden.no> <4F316198.1030108@molden.no> Message-ID: <4F340B58.1070804@molden.no> On 07.02.2012 18:38, Sturla Molden wrote: > One potential problem I just discovered is dependency on a DLL called > libpthreadGC2.dll. This is not correct!!! :-D Two threading APIs can be used for OpenBLAS/GotoBLAS2, Win32 threads or OpenMP. driver/others/blas_server_omp.c driver/others/blas_server_win32.c Simply build without telling OpenBLAS/GotoBLAS2 to use OpenMP (i.e. make without USE_OPENMP=1), and no dependency on libpthreadGC2.dll is ever made. OpenBLAS/GotoBLAS2 is thus a plain BSD licenced BLAS. I tried to compile OpenBLAS on my office computer. 
It did not know about "Sandy Shore" architecture so I had to tell it to use NEHALEM instead: $ make TARGET=NEHALEM This worked just fine :) Setup: - TDM-GCC 4.6.1 for x64 (install before MSYS) with gfortran. - MSYS (mingw-get-inst-20111118.exe). During the MSYS install, deselect "C compiler" and select "MinGw Developer ToolKit" to get Perl. NB! OpenBLAS/GotoBLAS2 will not build without Perl in MSYS, you will get an error that says "couldn't commit memory for cygwin heap". Never mind that OpenBLAS/GotoBLAS2 says you need Cygwin and Visual Studio, those are not needed. The DLL that is produced (OpenBLAS.dll) is linked against msvcrt.dll, not msvcr90.dll. Thus, don't use it with Python27, or at least don't share any CRT resources with it. The static library (libopenblas_nehalemp-r0.1alpha2.4.lib) is not linked with msvcrt.dll as far as I can tell, or any other library such as libgfortran. (This is the one we need for NumPy anyway, I think, David C. hates DLLs.) We will probably have to build one for all the different AMD and Intel architectures. If it is of interest for building NumPy, it seems the OpenBLAS DLL is linked with this sequence: -lgfortran -lmingw32 -lmoldname -lmingwex -lmsvcrt -lquadmath -lm I tried to build plain GotoBLAS2 as well... Using $ make BINARY=64 resulted this error: http://lists.freebsd.org/pipermail/freebsd-ports-bugs/2011-October/220422.html That is because GotoBLAS2 thinks Shandy Shore is Prescott, and then does something stupid... Thus: Building OpenBLAS with MinGW workes just fine (TDM-GCC with gfortran and MSYS DTK) and requires no configuration. Just type make and specity the CPU architecture, see the text file TargetList.txt. Sturla From drewfrank at gmail.com Thu Feb 9 14:20:11 2012 From: drewfrank at gmail.com (Drew Frank) Date: Thu, 9 Feb 2012 19:20:11 +0000 (UTC) Subject: [Numpy-discussion] numpy.arange() error? References: <33277269.post@talk.nabble.com> <4F337933.6010809@hawaii.edu> Message-ID: Eric Firing hawaii.edu> writes: > > On 02/08/2012 09:31 PM, teomat wrote: > > > > Hi, > > > > Am I wrong or the numpy.arange() function is not correct 100%? > > > > Try to do this: > > > > In [7]: len(np.arange(3.1, 4.9, 0.1)) > > Out[7]: 18 > > > > In [8]: len(np.arange(8.1, 9.9, 0.1)) > > Out[8]: 19 > > > > I would expect the same result for each command. > > Not after more experience with the wonders of floating point! > Nice-looking decimal numbers often have long, drawn-out, inexact > floating point (base 2) representations. That leads to exactly this > sort of problem. > > numpy.linspace is provided to help get around some of these surprises; > or you can use an integer sequence and then scale and shift it. > > Eric > > > > > All the best > > > > > I also found this surprising -- not because I lack experience with floating point, but because I do have experience with MATLAB. In MATLAB, the corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit tolerance parameter used in the implmentation (http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96). Of course, NumPy is not MATLAB :). That said, I prefer the MATLAB behavior in this case -- even though it has a bit of a "magic" feel to it, I find it hard to imagine code that operates correctly given the Python semantics and incorrectly under MATLAB's. Thoughts? From charlesr.harris at gmail.com Thu Feb 9 14:40:20 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 9 Feb 2012 12:40:20 -0700 Subject: [Numpy-discussion] numpy.arange() error? 
In-Reply-To: References: <33277269.post@talk.nabble.com> <4F337933.6010809@hawaii.edu> Message-ID: On Thu, Feb 9, 2012 at 12:20 PM, Drew Frank wrote: > Eric Firing hawaii.edu> writes: > > > > > On 02/08/2012 09:31 PM, teomat wrote: > > > > > > Hi, > > > > > > Am I wrong or the numpy.arange() function is not correct 100%? > > > > > > Try to do this: > > > > > > In [7]: len(np.arange(3.1, 4.9, 0.1)) > > > Out[7]: 18 > > > > > > In [8]: len(np.arange(8.1, 9.9, 0.1)) > > > Out[8]: 19 > > > > > > I would expect the same result for each command. > > > > Not after more experience with the wonders of floating point! > > Nice-looking decimal numbers often have long, drawn-out, inexact > > floating point (base 2) representations. That leads to exactly this > > sort of problem. > > > > numpy.linspace is provided to help get around some of these surprises; > > or you can use an integer sequence and then scale and shift it. > > > > Eric > > > > > > > > All the best > > > > > > > > > I also found this surprising -- not because I lack experience with floating > point, but because I do have experience with MATLAB. In MATLAB, the > corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit > tolerance parameter used in the implmentation > ( > http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96 > ). > > Of course, NumPy is not MATLAB :). That said, I prefer the MATLAB > behavior in > this case -- even though it has a bit of a "magic" feel to it, I find it > hard to > imagine code that operates correctly given the Python semantics and > incorrectly > under MATLAB's. Thoughts? > > Matlab didn't have integers, so they did the best they could ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Thu Feb 9 14:47:51 2012 From: efiring at hawaii.edu (Eric Firing) Date: Thu, 09 Feb 2012 09:47:51 -1000 Subject: [Numpy-discussion] numpy.arange() error? In-Reply-To: References: <33277269.post@talk.nabble.com> <4F337933.6010809@hawaii.edu> Message-ID: <4F3422E7.6090407@hawaii.edu> On 02/09/2012 09:20 AM, Drew Frank wrote: > Eric Firing hawaii.edu> writes: > >> >> On 02/08/2012 09:31 PM, teomat wrote: >>> >>> Hi, >>> >>> Am I wrong or the numpy.arange() function is not correct 100%? >>> >>> Try to do this: >>> >>> In [7]: len(np.arange(3.1, 4.9, 0.1)) >>> Out[7]: 18 >>> >>> In [8]: len(np.arange(8.1, 9.9, 0.1)) >>> Out[8]: 19 >>> >>> I would expect the same result for each command. >> >> Not after more experience with the wonders of floating point! >> Nice-looking decimal numbers often have long, drawn-out, inexact >> floating point (base 2) representations. That leads to exactly this >> sort of problem. >> >> numpy.linspace is provided to help get around some of these surprises; >> or you can use an integer sequence and then scale and shift it. >> >> Eric >> >>> >>> All the best >>> >>> >> > I also found this surprising -- not because I lack experience with floating > point, but because I do have experience with MATLAB. In MATLAB, the > corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit > tolerance parameter used in the implmentation > (http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96). > > Of course, NumPy is not MATLAB :). That said, I prefer the MATLAB behavior in > this case -- even though it has a bit of a "magic" feel to it, I find it hard to > imagine code that operates correctly given the Python semantics and incorrectly > under MATLAB's. 
Thoughts? You raise a good point. Neither arange nor linspace provides a close equivalent to the nice behavior of the Matlab colon, even though that is often what one really wants. Adding this, either via an arange kwarg, a linspace kwarg, or a new function, seems like a good idea. Eric > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From tsyu80 at gmail.com Thu Feb 9 15:49:27 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Thu, 9 Feb 2012 15:49:27 -0500 Subject: [Numpy-discussion] numpy.arange() error? In-Reply-To: <4F3422E7.6090407@hawaii.edu> References: <33277269.post@talk.nabble.com> <4F337933.6010809@hawaii.edu> <4F3422E7.6090407@hawaii.edu> Message-ID: On Thu, Feb 9, 2012 at 2:47 PM, Eric Firing wrote: > On 02/09/2012 09:20 AM, Drew Frank wrote: > > Eric Firing hawaii.edu> writes: > > > >> > >> On 02/08/2012 09:31 PM, teomat wrote: > >>> > >>> Hi, > >>> > >>> Am I wrong or the numpy.arange() function is not correct 100%? > >>> > >>> Try to do this: > >>> > >>> In [7]: len(np.arange(3.1, 4.9, 0.1)) > >>> Out[7]: 18 > >>> > >>> In [8]: len(np.arange(8.1, 9.9, 0.1)) > >>> Out[8]: 19 > >>> > >>> I would expect the same result for each command. > >> > >> Not after more experience with the wonders of floating point! > >> Nice-looking decimal numbers often have long, drawn-out, inexact > >> floating point (base 2) representations. That leads to exactly this > >> sort of problem. > >> > >> numpy.linspace is provided to help get around some of these surprises; > >> or you can use an integer sequence and then scale and shift it. > >> > >> Eric > >> > >>> > >>> All the best > >>> > >>> > >> > > I also found this surprising -- not because I lack experience with > floating > > point, but because I do have experience with MATLAB. In MATLAB, the > > corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit > > tolerance parameter used in the implmentation > > ( > http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96 > ). > > > > Of course, NumPy is not MATLAB :). That said, I prefer the MATLAB > behavior in > > this case -- even though it has a bit of a "magic" feel to it, I find it > hard to > > imagine code that operates correctly given the Python semantics and > incorrectly > > under MATLAB's. Thoughts? > > You raise a good point. Neither arange nor linspace provides a close > equivalent to the nice behavior of the Matlab colon, even though that is > often what one really wants. Adding this, either via an arange kwarg, a > linspace kwarg, or a new function, seems like a good idea. > > Eric > > On a related note: would an `endpoint` parameter for `arange` be desirable? For example, `linspace` already has this as a parameter. When using `arange` with int values, I often want the endpoint. Of course, adding 1 is easy to do (and shorter), but in some ways it's less readable (are you adding 1 because the variable itself is off by 1 or because of the fact the limit is exclusive; sometimes I end up adding 2 because both are true). It's a subtle difference, I know. And it's debatable whether `endpoint` would be more readable (though I think it is). This is somewhat tangential to how `arange` behaves with floats, in which the second limit sometimes *appears* inclusive and sometimes *appears* exclusive. I never use `arange` with floats because of this unpredictably, so I like the idea of adding a tolerance. 
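To make the integer case concrete, here is what I write today next to the
spelling I have in mind. Note that the endpoint keyword for arange below is
purely hypothetical -- it does not exist; only linspace has it:

import numpy as np

idx = np.arange(3, 10 + 1)                  # today: the "+ 1" carries the off-by-one bookkeeping
# idx = np.arange(3, 10, endpoint=True)     # hypothetical spelling, not in numpy
pts = np.linspace(3, 10, 8, endpoint=True)  # linspace already spells the intent out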
-Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Thu Feb 9 16:44:01 2012 From: e.antero.tammi at gmail.com (eat) Date: Thu, 9 Feb 2012 23:44:01 +0200 Subject: [Numpy-discussion] numpy.arange() error? In-Reply-To: <4F3422E7.6090407@hawaii.edu> References: <33277269.post@talk.nabble.com> <4F337933.6010809@hawaii.edu> <4F3422E7.6090407@hawaii.edu> Message-ID: Hi, On Thu, Feb 9, 2012 at 9:47 PM, Eric Firing wrote: > On 02/09/2012 09:20 AM, Drew Frank wrote: > > Eric Firing hawaii.edu> writes: > > > >> > >> On 02/08/2012 09:31 PM, teomat wrote: > >>> > >>> Hi, > >>> > >>> Am I wrong or the numpy.arange() function is not correct 100%? > >>> > >>> Try to do this: > >>> > >>> In [7]: len(np.arange(3.1, 4.9, 0.1)) > >>> Out[7]: 18 > >>> > >>> In [8]: len(np.arange(8.1, 9.9, 0.1)) > >>> Out[8]: 19 > >>> > >>> I would expect the same result for each command. > >> > >> Not after more experience with the wonders of floating point! > >> Nice-looking decimal numbers often have long, drawn-out, inexact > >> floating point (base 2) representations. That leads to exactly this > >> sort of problem. > >> > >> numpy.linspace is provided to help get around some of these surprises; > >> or you can use an integer sequence and then scale and shift it. > >> > >> Eric > >> > >>> > >>> All the best > >>> > >>> > >> > > I also found this surprising -- not because I lack experience with > floating > > point, but because I do have experience with MATLAB. In MATLAB, the > > corresponding operation 3.1:0.1:4.9 has length 19 because of an explicit > > tolerance parameter used in the implmentation > > ( > http://www.mathworks.com/support/solutions/en/data/1-4FLI96/index.html?solution=1-4FLI96 > ). > > > > Of course, NumPy is not MATLAB :). That said, I prefer the MATLAB > behavior in > > this case -- even though it has a bit of a "magic" feel to it, I find it > hard to > > imagine code that operates correctly given the Python semantics and > incorrectly > > under MATLAB's. Thoughts? > > You raise a good point. Neither arange nor linspace provides a close > equivalent to the nice behavior of the Matlab colon, even though that is > often what one really wants. Adding this, either via an arange kwarg, a > linspace kwarg, or a new function, seems like a good idea. > Maybe this issue is raised also earlier, but wouldn't it be more consistent to let arange operate only with integers (like Python's range) and let linspace handle the floats as well? My 2 cents, eat > > Eric > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Thu Feb 9 17:34:16 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 9 Feb 2012 23:34:16 +0100 Subject: [Numpy-discussion] numpy.arange() error? In-Reply-To: References: <33277269.post@talk.nabble.com> <4F337933.6010809@hawaii.edu> <4F3422E7.6090407@hawaii.edu> Message-ID: <7C1DA01E-F94D-4366-8084-F9C73444010B@molden.no> Den 9. feb. 2012 kl. 
22:44 skrev eat : > > Maybe this issue is raised also earlier, but wouldn't it be more consistent to let arange operate only with integers (like Python's range) and let linspace handle the floats as well? > > Perhaps. Another possibility would be to let arange take decimal arguments, possibly entered as text strings. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Feb 9 18:40:31 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 9 Feb 2012 17:40:31 -0600 Subject: [Numpy-discussion] numpy.arange() error? In-Reply-To: <7C1DA01E-F94D-4366-8084-F9C73444010B@molden.no> References: <33277269.post@talk.nabble.com> <4F337933.6010809@hawaii.edu> <4F3422E7.6090407@hawaii.edu> <7C1DA01E-F94D-4366-8084-F9C73444010B@molden.no> Message-ID: On Thursday, February 9, 2012, Sturla Molden wrote: > > > Den 9. feb. 2012 kl. 22:44 skrev eat : > >> > Maybe this issue is raised also earlier, but wouldn't it be more consistent to let arange operate only with integers (like Python's range) and let linspace handle the floats as well? > > > Perhaps. Another possibility would be to let arange take decimal arguments, possibly entered as text strings. > Sturla Personally, I treat arange() to mean, "give me a sequence of values from x to y, exclusive, with a specific step size". Nowhere in that statement does it guarantee a particular number of elements. Whereas linspace() means, "give me a sequence of evenly spaced numbers from x to y, optionally inclusive, such that there are exactly N elements". They complement each other well. There are times when I intentionally will specify a range where the step size will not nicely fit. i.e.- np.arange(1, 7, 3.5). I wouldn't want this to change. My vote is that if users want matlab-colon-like behavior, we could make a new function - maybe erange() for "exact range"? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ajfrank at ics.uci.edu Thu Feb 9 20:22:23 2012 From: ajfrank at ics.uci.edu (Drew Frank) Date: Thu, 9 Feb 2012 17:22:23 -0800 Subject: [Numpy-discussion] numpy.arange() error? In-Reply-To: References: <33277269.post@talk.nabble.com> <4F337933.6010809@hawaii.edu> <4F3422E7.6090407@hawaii.edu> <7C1DA01E-F94D-4366-8084-F9C73444010B@molden.no> Message-ID: On Thu, Feb 9, 2012 at 3:40 PM, Benjamin Root wrote: > > > On Thursday, February 9, 2012, Sturla Molden wrote: > > > > > > Den 9. feb. 2012 kl. 22:44 skrev eat : > > > >> > > Maybe this issue is raised also earlier, but wouldn't it be more > consistent to let arange operate only with integers (like Python's range) > and let linspace handle the floats as well? > > > > > > Perhaps. Another possibility would be to let arange take decimal > arguments, possibly entered as text strings. > > Sturla > > > Personally, I treat arange() to mean, "give me a sequence of values from x > to y, exclusive, with a specific step size". Nowhere in that statement > does it guarantee a particular number of elements. Whereas linspace() > means, "give me a sequence of evenly spaced numbers from x to y, optionally > inclusive, such that there are exactly N elements". They complement each > other well. > I agree -- both functions are useful and I think about them the same way. The unfortunate part is that tiny precision errors in y can make arange appear to be "sometimes-exclusive" rather than always exclusive. 
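To make that concrete with the two calls that started this thread: as I
understand it, arange picks the number of elements as (roughly)
ceil((stop - start)/step) evaluated in floating point, so a one-ulp error in
the subtraction or division is enough to flip the endpoint in or out:

import numpy as np

print (4.9 - 3.1) / 0.1   # a hair under 18 -> 18 elements, 4.9 stays out
print (9.9 - 8.1) / 0.1   # a hair over 18  -> 19 elements, 9.9 sneaks in
print len(np.arange(3.1, 4.9, 0.1)), len(np.arange(8.1, 9.9, 0.1))   # 18 19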
I've always imagined there to be a sort of duality between the two functions,
where arange(low, high, step) == linspace(low, high-step, round((high-low)/step))
in cases where (high - low)/step is integral, but it turns out this is not the
case.

> There are times when I intentionally will specify a range where the step
> size will not nicely fit. i.e.- np.arange(1, 7, 3.5). I wouldn't want this
> to change.

Nor would I. What I meant to express earlier is that I like how Matlab
addresses this particular class of floating point precision errors, not that I
think arange output should somehow include both endpoints.

> My vote is that if users want matlab-colon-like behavior, we could make a
> new function - maybe erange() for "exact range"?
>
> Ben Root

That could work; it would completely replace arange for me in every
circumstance I can think of, but I understand we can't just go changing the
behavior of core functions.

Drew

From daverz at gmail.com Thu Feb 9 23:39:57 2012
From: daverz at gmail.com (Dave Cook)
Date: Thu, 9 Feb 2012 20:39:57 -0800
Subject: [Numpy-discussion] cumsum much slower than simple loop?
Message-ID:

Why is numpy.cumsum (along axis=0) so much slower than a simple loop? The same
goes for numpy.add.accumulate

# cumsumtest.py
import numpy as np

def loopcumsum(a):
    csum = np.empty_like(a)
    s = 0.0
    for i in range(len(a)):
        csum[i] = s = s + a[i]
    return csum

npcumsum = lambda a: np.cumsum(a, axis=0)

addaccum = lambda a: np.add.accumulate(a)

shape = (100, 8, 512)
a = np.arange(np.prod(shape), dtype='f').reshape(shape)
# check that we get the same results
print (npcumsum(a)==loopcumsum(a)).all()
print (addaccum(a)==loopcumsum(a)).all()

ipython session:

In [1]: from cumsumtest import *
True
True

In [2]: timeit npcumsum(a)
100 loops, best of 3: 14.7 ms per loop

In [3]: timeit addaccum(a)
100 loops, best of 3: 15.4 ms per loop

In [4]: timeit loopcumsum(a)
100 loops, best of 3: 2.16 ms per loop

Dave Cook

From josef.pktd at gmail.com Fri Feb 10 00:21:46 2012
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 10 Feb 2012 00:21:46 -0500
Subject: [Numpy-discussion] cumsum much slower than simple loop?
In-Reply-To:
References:
Message-ID:

On Thu, Feb 9, 2012 at 11:39 PM, Dave Cook wrote:
> Why is numpy.cumsum (along axis=0) so much slower than a simple loop? The
> same goes for numpy.add.accumulate
>
> # cumsumtest.py
> import numpy as np
>
> def loopcumsum(a):
>     csum = np.empty_like(a)
>     s = 0.0
>     for i in range(len(a)):
>         csum[i] = s = s + a[i]
>     return csum
>
> npcumsum = lambda a: np.cumsum(a, axis=0)
>
> addaccum = lambda a: np.add.accumulate(a)
>
> shape = (100, 8, 512)
> a = np.arange(np.prod(shape), dtype='f').reshape(shape)
> # check that we get the same results
> print (npcumsum(a)==loopcumsum(a)).all()
> print (addaccum(a)==loopcumsum(a)).all()
>
> ipython session:
>
> In [1]: from cumsumtest import *
> True
> True
>
> In [2]: timeit npcumsum(a)
> 100 loops, best of 3: 14.7 ms per loop
>
> In [3]: timeit addaccum(a)
> 100 loops, best of 3: 15.4 ms per loop
>
> In [4]: timeit loopcumsum(a)
> 100 loops, best of 3: 2.16 ms per loop

strange (if I didn't make a mistake)

In [12]: timeit a.cumsum(0)
100 loops, best of 3: 7.17 ms per loop

In [13]: timeit a.T.cumsum(-1).T
1000 loops, best of 3: 1.78 ms per loop

In [14]: (a.T.cumsum(-1).T == a.cumsum(0)).all()
Out[14]: True

Josef

> Dave Cook
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From daverz at gmail.com Fri Feb 10 00:41:19 2012
From: daverz at gmail.com (Dave Cook)
Date: Thu, 9 Feb 2012 21:41:19 -0800
Subject: [Numpy-discussion] cumsum much slower than simple loop?
In-Reply-To:
References:
Message-ID:

On Thu, Feb 9, 2012 at 9:21 PM, wrote:
> strange (if I didn't make a mistake)
>
> In [12]: timeit a.cumsum(0)
> 100 loops, best of 3: 7.17 ms per loop
>
> In [13]: timeit a.T.cumsum(-1).T
> 1000 loops, best of 3: 1.78 ms per loop
>
> In [14]: (a.T.cumsum(-1).T == a.cumsum(0)).all()
> Out[14]: True
>

Interesting. I should have mentioned that I'm using numpy 1.5.1 on 64-bit
Ubuntu 10.10. This transpose/compute/transpose trick did not work for me.

In [27]: timeit a.T.cumsum(-1).T
10 loops, best of 3: 18.3 ms per loop

Dave Cook

From daverz at gmail.com Fri Feb 10 00:51:38 2012
From: daverz at gmail.com (Dave Cook)
Date: Thu, 9 Feb 2012 21:51:38 -0800
Subject: [Numpy-discussion] cumsum much slower than simple loop?
In-Reply-To:
References:
Message-ID:

On Thu, Feb 9, 2012 at 9:41 PM, Dave Cook wrote:
>
> Interesting. I should have mentioned that I'm using numpy 1.5.1 on 64-bit
> Ubuntu 10.10. This transpose/compute/transpose trick did not work for me.
>

Nor does it work under numpy 1.6.1 built with MKL under Windows 7 on a
core i7.

Dave Cook

From staticfloat at gmail.com Fri Feb 10 02:15:07 2012
From: staticfloat at gmail.com (Elliot Saba)
Date: Thu, 9 Feb 2012 23:15:07 -0800
Subject: [Numpy-discussion] cumsum much slower than simple loop?
In-Reply-To:
References:
Message-ID:

numpy 1.6.1, OSX, Core 2 Duo:

In [7]: timeit a.cumsum(0)
100 loops, best of 3: 6.67 ms per loop

In [8]: timeit a.T.cumsum(-1).T
100 loops, best of 3: 6.75 ms per loop

-E

On Thu, Feb 9, 2012 at 9:51 PM, Dave Cook wrote:
> On Thu, Feb 9, 2012 at 9:41 PM, Dave Cook wrote:
>
>> Interesting. I should have mentioned that I'm using numpy 1.5.1 on
>> 64-bit Ubuntu 10.10. This transpose/compute/transpose trick did not work
>> for me.
>
> Nor does it work under numpy 1.6.1 built with MKL under Windows 7 on a
> core i7.
>
> Dave Cook
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From pav at iki.fi Fri Feb 10 03:55:26 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 10 Feb 2012 09:55:26 +0100 Subject: [Numpy-discussion] cumsum much slower than simple loop? In-Reply-To: References: Message-ID: 10.02.2012 05:39, Dave Cook kirjoitti: > Why is numpy.cumsum (along axis=0) so much slower than a simple loop? > The same goes for numpy.add.accumulate The reason is loop ordering. The reduction operator when using `cumsum` or `add.reduce` does the summation in the inmost loop, whereas the `loopcumsum` has the summation in the outmost loop. Although both algorithms do the same number of operations, the latter is more efficient with regards to CPU cache (and maybe memory data dependency) --- the arrays are in C-order so summing along the first axis is wasteful as the elements are far from each other in memory. The effect goes away, if you use a Fortran-ordered array: a = np.array(a, order='F') print a.shape Numpy does not currently have heuristics to determine when swapping the loop order would be beneficial in accumulation and reductions. It does, however, have the heuristics in place for elementwise operations. -- Pauli Virtanen From cournape at gmail.com Fri Feb 10 04:25:10 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 10 Feb 2012 09:25:10 +0000 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers wrote: > > > On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant wrote: >> >> I think supporting Python 2.5 and above is completely fine. ?I'd even be >> in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for NumPy >> 2.8 >> > +1 for dropping Python 2.5 support also for an LTS release. That will make > it a lot easier to use str.format() and the with statement (plus many other > things) going forward, without having to think about if your changes can be > backported to that LTS release. At the risk of sounding like a broken record, I would really like to stay to 2.4, especially for a long term release :) This is still the basis used by a lots of long-term python products. If we can support 2.4 for a LTS, I would then be much more comfortable to allow bumping to 2.5 for 1.8. David From markflorisson88 at gmail.com Fri Feb 10 05:20:29 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Fri, 10 Feb 2012 10:20:29 +0000 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On 5 February 2012 07:19, Ralf Gommers wrote: > > > On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant wrote: >> >> I think supporting Python 2.5 and above is completely fine. ?I'd even be >> in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for NumPy >> 2.8 >> > +1 for dropping Python 2.5 support also for an LTS release. That will make > it a lot easier to use str.format() and the with statement (plus many other > things) going forward, without having to think about if your changes can be > backported to that LTS release. The with statement works just fine in python 2.5, all you have to do is 'from __future__ import with_statement'. As for str.format, well... 
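(To be concrete: the with statement really does run unchanged on 2.5 once you
add the future import,

from __future__ import with_statement

with open('setup.py') as f:
    first_line = f.readline()

whereas str.format has no such escape hatch -- on 2.5 you are stuck with
%-formatting, e.g. "%s %s" % ("numpy", "1.7.0"), which covers most cases but
is clunkier.)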
> Ralf > >> >> >> >> On Feb 4, 2012, at 10:13 PM, Bruce Southey wrote: >> >> > On Sat, Feb 4, 2012 at 6:07 PM, Charles R Harris >> > wrote: >> >> >> >> >> >> On Sat, Feb 4, 2012 at 3:03 PM, Travis Oliphant >> >> wrote: >> >>> >> >>> We are spending a lot of time on NumPy and will be for the next few >> >>> months. ?I think that 1.8 will be a better long term release. ?We need >> >>> a few >> >>> more fundamental features yet. >> >>> >> >>> Look for a roadmap document for discussion from Mark Wiebe and I >> >>> within >> >>> the week about NumPy 1.8 which has a target release of June 2012. >> >>> >> >> >> >> Looking forward to that document. >> >> >> >> Chuck >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > A suitable long term release would include deprecating old macros, >> > datetime and einsum. While I would like to include NA, I am rather >> > concerned with the recent bugs that have been uncovered with it. So I >> > am rather wary of having to forced to backport fixes simply because >> > someone said we would "support with bug fixes for the next 2-3 years". >> > Rather at least clearly indicate that not every fix will be >> > backported. >> > >> > I propose that we use this opportunity end support for Python 2.4 >> > especially since Red Hat Enterprise Linux (RHEL) 4 is February 29th, >> > 2012. According to SourceForge, the last available binary release for >> > Python 2.4 was for numpy 1.2.1 (released 2008-10-29). There is still >> > quite a few downloads (3769) of the Python 2.5 numpy 1,6.1 binary. >> > >> > Bruce >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From brad.reisfeld at gmail.com Fri Feb 10 09:29:12 2012 From: brad.reisfeld at gmail.com (Brad Reisfeld) Date: Fri, 10 Feb 2012 07:29:12 -0700 Subject: [Numpy-discussion] simple manipulations of numpy arrays Message-ID: Hi, I am relatively new to numpy and am seeking some advice on an appropriate way to do the following simple task. The idea is to build a class that will allow a user to easily remove and keep columns and rows in a 2D numpy array. 
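(My working assumption is that each method ends up as a thin wrapper around
ordinary numpy indexing, roughly along these lines, with made-up variable
names:

import numpy as np

data = np.random.rand(7, 9)
no_rows = np.delete(data, [2, 4], axis=0)   # remove rows 2 and 4, keep the rest
kept_cols = data[:, [3, 5, 7]]              # keep columns 3, 5 and 7, drop the rest

but please correct me if there is a more idiomatic way.)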
An outline of the class is as follows: class DataEditor(object): def __init__(self, data_array): self.data_array = data_array self.edited_data_array = None def get_edited_data(self): return self.edited_data_array def remove_cols(self, a_list_of_column_numbers): """remove the specified columns, but keep the rest""" # some functionality to produce self.edited_data_array def remove_rows(self, a_list_of_row_numbers): """remove the specified rows, but keep the rest""" # some functionality to produce self.edited_data_array def keep_cols(self, a_list_of_column_numbers): """keep the specified columns, but remove the rest""" # some functionality to produce self.edited_data_array def keep_rows(self, a_list_of_row_numbers): """keep the specified rows, but remove the rest""" # some functionality to produce self.edited_data_array Usage would be something like the following: >> import numpy as np >> import data_editor >> data = np.random.rand(7,9) >> editor = data_editor.DataEditor(data) >> editor.remove_rows([2,4]) >> editor.keep_cols([3,5,7]) >> edited_data = data_editor.get_edited_array() I don't have much experience using them, but would using a masked array would make sense in this context? If so, how does one convert a masked array to a 'normal' array with only the unmasked values present from the original? Or, is another approach more appropriate. Thank you for your help. -Brad From aldcroft at head.cfa.harvard.edu Fri Feb 10 09:47:58 2012 From: aldcroft at head.cfa.harvard.edu (Tom Aldcroft) Date: Fri, 10 Feb 2012 09:47:58 -0500 Subject: [Numpy-discussion] simple manipulations of numpy arrays In-Reply-To: References: Message-ID: This is not yet released (but will be in the near future): http://readthedocs.org/docs/astropy/en/latest/table/index.html https://github.com/astropy/astropy/blob/master/astropy/table/table.py You can at least use this as an example of how to add rows and columns to a structured array. ?Or be an early adopter and install and use astropy.table. ?:-) - Tom On Fri, Feb 10, 2012 at 9:29 AM, Brad Reisfeld wrote: > Hi, > > I am relatively new to numpy and am seeking some advice on an > appropriate way to do the following simple task. > > The idea is to build a class that will allow a user to easily remove > and keep columns and rows in a 2D numpy array. > > An outline of the class is as follows: > > class DataEditor(object): > > ? ?def __init__(self, data_array): > ? ? ? ?self.data_array = data_array > ? ? ? ?self.edited_data_array = None > > ? ?def get_edited_data(self): > ? ? ? ?return self.edited_data_array > > ? ?def remove_cols(self, a_list_of_column_numbers): > ? ? ? ?"""remove the specified columns, but keep the rest""" > ? ? ? ?# some functionality to produce self.edited_data_array > > ? ?def remove_rows(self, a_list_of_row_numbers): > ? ? ? ?"""remove the specified rows, but keep the rest""" > ? ? ? ?# some functionality to produce self.edited_data_array > > ? ?def keep_cols(self, a_list_of_column_numbers): > ? ? ? ?"""keep the specified columns, but remove the rest""" > ? ? ? ?# some functionality to produce self.edited_data_array > > ? ?def keep_rows(self, a_list_of_row_numbers): > ? ? ? ?"""keep the specified rows, but remove the rest""" > ? ? ? 
?# some functionality to produce self.edited_data_array > > > Usage would be something like the following: > >>> import numpy as np >>> import data_editor >>> data = np.random.rand(7,9) >>> editor = data_editor.DataEditor(data) >>> editor.remove_rows([2,4]) >>> editor.keep_cols([3,5,7]) >>> edited_data = data_editor.get_edited_array() > > > I don't have much experience using them, but would using a masked > array would make sense in this context? > If so, how does one convert a masked array to a 'normal' array with > only the unmasked values present from the original? > Or, is another approach more appropriate. > > Thank you for your help. > > -Brad > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From tmp50 at ukr.net Fri Feb 10 10:10:46 2012 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 10 Feb 2012 17:10:46 +0200 Subject: [Numpy-discussion] [ANN] new solver for multiobjective optimization problems Message-ID: <87834.1328886646.5803578716629106688@ffe16.ukr.net> hi, I'm glad to inform you about new Python solver for multiobjective optimization (MOP). Some changes committed to solver interalg made it capable of handling global nonlinear constrained multiobjective problem (MOP), see the page for more details. > > Using interalg you can be 100% sure your result covers whole Pareto front according to the required tolerances on objective functions. > > Available features include real-time or final graphical output, possibility of involving parallel calculations, handling both continuous and discrete variables, export result to xls files. > > Regards, D. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Fri Feb 10 10:50:54 2012 From: francesc at continuum.io (Francesc Alted) Date: Fri, 10 Feb 2012 16:50:54 +0100 Subject: [Numpy-discussion] simple manipulations of numpy arrays In-Reply-To: References: Message-ID: On Feb 10, 2012, at 3:29 PM, Brad Reisfeld wrote: > Hi, > > I am relatively new to numpy and am seeking some advice on an > appropriate way to do the following simple task. > > The idea is to build a class that will allow a user to easily remove > and keep columns and rows in a 2D numpy array. Apart from the good suggestions already made, you may also find useful the carray package available in: https://github.com/FrancescAlted/carry It implements a ctable object that has different capabilities: * Allows addition and removal of columns very efficiently * Allows to enlarge and shrink the ctable * Supports compression (via the fast blosc compressor) * If numexpr is installed, you can seamlessly operate with columns efficiently * You can efficiently select rows using complex conditions (needs numexpr too) You can have a quick look at how this works in the second part of the tutorial: https://github.com/FrancescAlted/carray/blob/master/doc/tutorial.rst Hope it helps, -- Francesc Alted From francesc at continuum.io Fri Feb 10 10:53:38 2012 From: francesc at continuum.io (Francesc Alted) Date: Fri, 10 Feb 2012 16:53:38 +0100 Subject: [Numpy-discussion] simple manipulations of numpy arrays In-Reply-To: References: Message-ID: On Feb 10, 2012, at 4:50 PM, Francesc Alted wrote: > https://github.com/FrancescAlted/carry Hmm, this should be: https://github.com/FrancescAlted/carray Blame my (too) smart spell corrector. 
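While I am at it, a minimal sketch of the column handling I was alluding to.
I am writing the method names from memory, so please double-check them
against the tutorial linked above before relying on them:

import numpy as np
import carray as ca

a = np.arange(10)
b = np.linspace(0., 1., 10)
t = ca.ctable((a, b), names=('a', 'b'))   # column-wise, compressed container

t.delcol('b')                  # remove a column
t.addcol(b * 2, name='b2')     # add a new one

Selecting rows with complex conditions needs numexpr installed, as mentioned.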
-- Francesc Alted From ndbecker2 at gmail.com Fri Feb 10 10:59:53 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 10 Feb 2012 10:59:53 -0500 Subject: [Numpy-discussion] [ANN] new solver for multiobjective optimization problems References: <87834.1328886646.5803578716629106688@ffe16.ukr.net> Message-ID: And where do we find this gem? From andrea.gavana at gmail.com Fri Feb 10 11:38:00 2012 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Fri, 10 Feb 2012 17:38:00 +0100 Subject: [Numpy-discussion] Creating parallel curves Message-ID: Hi All, my apologies for my deep ignorance about math stuff; I guess I should be able to find this out but I keep getting impossible results. Basically I have a set of x, y data (around 1,000 elements each) and I want to create 2 parallel "curves" (offset curves) to the original one; "parallel" means curves which are displaced from the base curve by a constant offset, either positive or negative, in the direction of the curve's normal. Something like this: http://pyx.sourceforge.net/examples/drawing2/parallel.html The point in the x variable are monotonically increasing. For every point in the curve, I also have the angle this point creates to the vertical (y) direction, so I naively thought of doing this: angles = numpy.deg2rad(angles) x_low = x - DISTANCE*numpy.cos(angles) y_low = y + DISTANCE*numpy.sin(angles) x_high = x + DISTANCE*numpy.cos(angles) y_high = y - DISTANCE*numpy.sin(angles) But by plotting these thing out with matplotlib it seems to me they don't really look very parallel nor very constant-distance. I admit the matplotlib axis scales can play a significant role here, but I was wondering if some of you could spot my dumb mistake (and maybe provide a way to nicely take into account the different scales of the matplotlib axes so that the curves really look parallel...). Thank you in advance for any suggestion. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ From chris.barker at noaa.gov Fri Feb 10 11:53:23 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 10 Feb 2012 08:53:23 -0800 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: Andrea, > Basically I have a set of x, y data (around 1,000 elements each) and I > want to create 2 parallel "curves" (offset curves) to the original > one; "parallel" means curves which are displaced from the base curve > by a constant offset, either positive or negative, in the direction of > the curve's normal. Something like this: > > http://pyx.sourceforge.net/examples/drawing2/parallel.html THis is called "buffering" in GIS parlance -- there are functions available to do it in GIS an computational geometry libraries: you might look in the shapely package: https://github.com/sgillies/shapely or CGAL http://www.cgal.org/ If the overhead of these packages is too much, and you still want to write your own code, try googling: "buffering a line GIS algorithm" or something like that, and you'll find pointers. > But by plotting these thing out with matplotlib it seems to me they > don't really look very parallel nor very constant-distance. as we say on the wxPython list -- post a fully functional example, so we can check it out. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? 
main reception Chris.Barker at noaa.gov From sourceforge.numpy at user.fastmail.fm Fri Feb 10 12:26:02 2012 From: sourceforge.numpy at user.fastmail.fm (Hugo Gagnon) Date: Fri, 10 Feb 2012 12:26:02 -0500 Subject: [Numpy-discussion] "trapezoidal" grids Message-ID: <1328894762.12027.140661034881277@webmail.messagingengine.com> Hello, Say I have four corner points a = (X0, Y0), b = (X1, Y1), c = (X2, Y2) and d = (X3, Y3): a----------b \ / \ / c----d Is there a function like meshgrid that would return me a grid of points linearly interpolating those four corner points? Thanks, From ralf.gommers at googlemail.com Fri Feb 10 14:51:57 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 10 Feb 2012 20:51:57 +0100 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Fri, Feb 10, 2012 at 10:25 AM, David Cournapeau wrote: > On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers > wrote: > > > > > > On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant > wrote: > >> > >> I think supporting Python 2.5 and above is completely fine. I'd even be > >> in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for > NumPy > >> 2.8 > >> > > +1 for dropping Python 2.5 support also for an LTS release. That will > make > > it a lot easier to use str.format() and the with statement (plus many > other > > things) going forward, without having to think about if your changes can > be > > backported to that LTS release. > > At the risk of sounding like a broken record, I would really like to > stay to 2.4, especially for a long term release :) This is still the > basis used by a lots of long-term python products. If we can support > 2.4 for a LTS, I would then be much more comfortable to allow bumping > to 2.5 for 1.8. > At the very least someone should step up to do the testing or maintain a buildbot for Python 2.4 then. Also for scipy, assuming that scipy keeps supporting the same Python versions as numpy. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sat Feb 11 00:45:50 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 10 Feb 2012 21:45:50 -0800 Subject: [Numpy-discussion] "trapezoidal" grids In-Reply-To: <1328894762.12027.140661034881277@webmail.messagingengine.com> References: <1328894762.12027.140661034881277@webmail.messagingengine.com> Message-ID: On Fri, Feb 10, 2012 at 9:26 AM, Hugo Gagnon wrote: > Hello, > > Say I have four corner points a = (X0, Y0), b = (X1, Y1), c = (X2, Y2) > and d = (X3, Y3): > > a----------b > ?\ ? ? ? ?/ > ?\ ? ? ?/ > ? c----d > > Is there a function like meshgrid that would return me a grid of points > linearly interpolating those four corner points? It depends on what you mean by "linearly interpolating". For example, you can construct a regular grid and simply discard points outside the trapesium, or assume that the trapesium is a "warped perspective" on a regular grid. For the latter case, you can construct a grid like this: http://mentat.za.net/refer/np_warped_grid.png (code attached) St?fan -------------- next part -------------- A non-text attachment was scrubbed... 
Name: quad_grid.py Type: text/x-python Size: 899 bytes Desc: not available URL: From scott.sinclair.za at gmail.com Sat Feb 11 01:03:25 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Sat, 11 Feb 2012 08:03:25 +0200 Subject: [Numpy-discussion] [ANN] new solver for multiobjective optimization problems In-Reply-To: References: <87834.1328886646.5803578716629106688@ffe16.ukr.net> Message-ID: On 10 February 2012 17:59, Neal Becker wrote: > And where do we find this gem? Presumably by following the hyper-links in the e-mail (non-obvious if you're using a plain-text mail client..) Cheers, Scott From ralf.gommers at googlemail.com Sat Feb 11 04:08:04 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 11 Feb 2012 10:08:04 +0100 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Fri, Feb 10, 2012 at 8:51 PM, Ralf Gommers wrote: > > > On Fri, Feb 10, 2012 at 10:25 AM, David Cournapeau wrote: > >> On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers >> wrote: >> > >> > >> > On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant >> wrote: >> >> >> >> I think supporting Python 2.5 and above is completely fine. I'd even >> be >> >> in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for >> NumPy >> >> 2.8 >> >> >> > +1 for dropping Python 2.5 support also for an LTS release. That will >> make >> > it a lot easier to use str.format() and the with statement (plus many >> other >> > things) going forward, without having to think about if your changes >> can be >> > backported to that LTS release. >> >> At the risk of sounding like a broken record, I would really like to >> stay to 2.4, especially for a long term release :) This is still the >> basis used by a lots of long-term python products. If we can support >> 2.4 for a LTS, I would then be much more comfortable to allow bumping >> to 2.5 for 1.8. >> > > At the very least someone should step up to do the testing or maintain a > buildbot for Python 2.4 then. Also for scipy, assuming that scipy keeps > supporting the same Python versions as numpy. > > Here's a list of Python requirements for other important scientific python projects: - ipython: >= 2.6 - matplotlib: v1.1 supports 2.4-2.7, v1.2 will support >= 2.6 - scikit-learn: >= 2.6 - scikit-image: >= 2.5 - scikits.statsmodels: >= 2.5 (next release probably >= 2.6) That there are still some projects/products out there that still use Python 2.4 (some examples of such products would be nice by the way) is not enough of a reason by itself to continue to support it in new releases. There has to be a good reason for those products to require the very latest numpy, even though they are fine with a very old Python and older versions of almost any other Python package. It would also make sense to ship binaries (for Windows at least) of all Python versions we say we support. We haven't done so for 2.4 for a very long time. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Feb 11 06:05:41 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 11 Feb 2012 11:05:41 +0000 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. 
In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Sat, Feb 11, 2012 at 9:08 AM, Ralf Gommers wrote: > > > On Fri, Feb 10, 2012 at 8:51 PM, Ralf Gommers > wrote: >> >> >> >> On Fri, Feb 10, 2012 at 10:25 AM, David Cournapeau >> wrote: >>> >>> On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers >>> wrote: >>> > >>> > >>> > On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant >>> > wrote: >>> >> >>> >> I think supporting Python 2.5 and above is completely fine. ?I'd even >>> >> be >>> >> in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for >>> >> NumPy >>> >> 2.8 >>> >> >>> > +1 for dropping Python 2.5 support also for an LTS release. That will >>> > make >>> > it a lot easier to use str.format() and the with statement (plus many >>> > other >>> > things) going forward, without having to think about if your changes >>> > can be >>> > backported to that LTS release. >>> >>> At the risk of sounding like a broken record, I would really like to >>> stay to 2.4, especially for a long term release :) This is still the >>> basis used by a lots of long-term python products. If we can support >>> 2.4 for a LTS, I would then be much more comfortable to allow bumping >>> to 2.5 for 1.8. >> >> >> At the very least someone should step up to do the testing or maintain a >> buildbot for Python 2.4 then. Also for scipy, assuming that scipy keeps >> supporting the same Python versions as numpy. >> > Here's a list of Python requirements for other important scientific python > projects: > - ipython: >= 2.6 > - matplotlib: v1.1 supports 2.4-2.7, v1.2 will support >= 2.6 > - scikit-learn: >= 2.6 > - scikit-image: >= 2.5 > - scikits.statsmodels: >= 2.5 (next release probably >= 2.6) > > That there are still some projects/products out there that still use Python > 2.4 (some examples of such products would be nice by the way) is not enough > of a reason by itself to continue to support it in new releases. There has > to be a good reason for those products to require the very latest numpy, > even though they are fine with a very old Python and older versions of > almost any other Python package. I don't think that last argument is relevant for a LTS. Numpy is used in environments where you cannot easily control what's installed. RHEL still uses python 2.4 and will be supported until 2014 in the production phase. As for projects still using python 2.4, using the most downloaded packages from this list http://taichino.appspot.com/pypi_ranking/modules?page=1, most of them supported python 2.4 or below. lxml, zc.buildout, setuptools, pip, virtualenv and sqlalchemy do. Numpy itself is also used outside the strict scientific realm, which is why I am a bit warry about just comparing with other scientific python packages. Now, if everybody else is against it, I don't want to be a pain about it either :) David From ralf.gommers at googlemail.com Sat Feb 11 08:30:39 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 11 Feb 2012 14:30:39 +0100 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. 
In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Sat, Feb 11, 2012 at 12:05 PM, David Cournapeau wrote: > On Sat, Feb 11, 2012 at 9:08 AM, Ralf Gommers > wrote: > > > > > > On Fri, Feb 10, 2012 at 8:51 PM, Ralf Gommers < > ralf.gommers at googlemail.com> > > wrote: > >> > >> > >> > >> On Fri, Feb 10, 2012 at 10:25 AM, David Cournapeau > >> wrote: > >>> > >>> On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers > >>> wrote: > >>> > > >>> > > >>> > On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant > > >>> > wrote: > >>> >> > >>> >> I think supporting Python 2.5 and above is completely fine. I'd > even > >>> >> be > >>> >> in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for > >>> >> NumPy > >>> >> 2.8 > >>> >> > >>> > +1 for dropping Python 2.5 support also for an LTS release. That will > >>> > make > >>> > it a lot easier to use str.format() and the with statement (plus many > >>> > other > >>> > things) going forward, without having to think about if your changes > >>> > can be > >>> > backported to that LTS release. > >>> > >>> At the risk of sounding like a broken record, I would really like to > >>> stay to 2.4, especially for a long term release :) This is still the > >>> basis used by a lots of long-term python products. If we can support > >>> 2.4 for a LTS, I would then be much more comfortable to allow bumping > >>> to 2.5 for 1.8. > >> > >> > >> At the very least someone should step up to do the testing or maintain a > >> buildbot for Python 2.4 then. Also for scipy, assuming that scipy keeps > >> supporting the same Python versions as numpy. > >> > > Here's a list of Python requirements for other important scientific > python > > projects: > > - ipython: >= 2.6 > > - matplotlib: v1.1 supports 2.4-2.7, v1.2 will support >= 2.6 > > - scikit-learn: >= 2.6 > > - scikit-image: >= 2.5 > > - scikits.statsmodels: >= 2.5 (next release probably >= 2.6) > > > > That there are still some projects/products out there that still use > Python > > 2.4 (some examples of such products would be nice by the way) is not > enough > > of a reason by itself to continue to support it in new releases. There > has > > to be a good reason for those products to require the very latest numpy, > > even though they are fine with a very old Python and older versions of > > almost any other Python package. > > I don't think that last argument is relevant for a LTS. I think it's a relevant argument right now, irrespective of whether or not 1.7 will be an LTS. > Numpy is used in environments where you cannot easily control what's > installed. RHEL > still uses python 2.4 and will be supported until 2014 in the > production phase. > As Bruce said, 29 Feb 2012 and not 2014: https://access.redhat.com/support/policy/updates/errata/ Ralf As for projects still using python 2.4, using the most downloaded > packages from this list > http://taichino.appspot.com/pypi_ranking/modules?page=1, most of them > supported python 2.4 or below. lxml, zc.buildout, setuptools, pip, > virtualenv and sqlalchemy do. Numpy itself is also used outside the > strict scientific realm, which is why I am a bit warry about just > comparing with other scientific python packages. > > Now, if everybody else is against it, I don't want to be a pain about > it either :) > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brad.reisfeld at gmail.com Sat Feb 11 08:49:28 2012 From: brad.reisfeld at gmail.com (Brad Reisfeld) Date: Sat, 11 Feb 2012 06:49:28 -0700 Subject: [Numpy-discussion] simple manipulations of numpy arrays In-Reply-To: References: Message-ID: <4F3671E8.7060701@gmail.com> On 2/10/2012 8:53 AM, Francesc Alted wrote: > On Feb 10, 2012, at 4:50 PM, Francesc Alted wrote: > >> https://github.com/FrancescAlted/carry > > Hmm, this should be: > > https://github.com/FrancescAlted/carray > > Blame my (too) smart spell corrector. > > -- Francesc Alted Thank you, Francesc and Tom, for your suggestions. -Brad From cournape at gmail.com Sat Feb 11 10:08:39 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 11 Feb 2012 15:08:39 +0000 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Sat, Feb 11, 2012 at 1:30 PM, Ralf Gommers wrote: > > > As Bruce said, 29 Feb 2012 and not 2014: > https://access.redhat.com/support/policy/updates/errata/ I think Bruce and me were not talking about the same RHEL version (4 vs 5). Let me see if I can set up a buildbot for 2.4. David From ralf.gommers at googlemail.com Sat Feb 11 10:19:00 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 11 Feb 2012 16:19:00 +0100 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: On Sat, Feb 11, 2012 at 4:08 PM, David Cournapeau wrote: > On Sat, Feb 11, 2012 at 1:30 PM, Ralf Gommers > wrote: > > > > > > As Bruce said, 29 Feb 2012 and not 2014: > > https://access.redhat.com/support/policy/updates/errata/ > > I think Bruce and me were not talking about the same RHEL version (4 vs 5). > > Ah, in that case it makes more sense to keep supporting it for two more years. Let me see if I can set up a buildbot for 2.4. > That would be very helpful, thanks. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sat Feb 11 13:51:50 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 11 Feb 2012 12:51:50 -0600 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> Message-ID: <7EBDEAD7-53C3-4E5A-8A09-47DEE7368A3F@continuum.io> NumPy 1.7 is due out in the next few weeks. This will obviously support 2.4. It can be used for as long as people want. Right now, there is a plan for NumPy 1.8 to be released in the summer which will have much attention paid to it in order to improve the documentation, add bug-fixes, as well as make feature additions. Right now, the plan is to have that release support 2.5, however, major bug-fixes will be back-ported to the 1.7 series as patches are available. I suspect that different organizations will use different versions of NumPy as their own LTS. I plan on encouraging people to use 1.8 as the LTS. Work on NumPy 2.0 is already underway, but it will likely not be ready until January of 2013 at the earliest. Of course, there may be happy circumstances that accelerate that plan and other events that delay it. But, that is my best guess at the moment. 
Best, -Travis On Feb 10, 2012, at 3:25 AM, David Cournapeau wrote: > On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers > wrote: >> >> >> On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant wrote: >>> >>> I think supporting Python 2.5 and above is completely fine. I'd even be >>> in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for NumPy >>> 2.8 >>> >> +1 for dropping Python 2.5 support also for an LTS release. That will make >> it a lot easier to use str.format() and the with statement (plus many other >> things) going forward, without having to think about if your changes can be >> backported to that LTS release. > > At the risk of sounding like a broken record, I would really like to > stay to 2.4, especially for a long term release :) This is still the > basis used by a lots of long-term python products. If we can support > 2.4 for a LTS, I would then be much more comfortable to allow bumping > to 2.5 for 1.8. > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Sat Feb 11 14:11:34 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 11 Feb 2012 13:11:34 -0600 Subject: [Numpy-discussion] Commit rights to NumPy for Francesc Alted Message-ID: I propose to give Francesc Alted commit rights to the NumPy project. Francesc will be working full time on NumPy for several months and it will enable him to participate in pull requests. Francesc Alted has been very active in the larger Python for Science community and has written PyTables, Blosc, carray and made contributions to NumExpr. He was also instrumental in early design efforts around the fully-recursive dtype functionality of NumPy. Thoughts? -Travis From travis at continuum.io Sat Feb 11 14:44:13 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 11 Feb 2012 13:44:13 -0600 Subject: [Numpy-discussion] Migrating issues to GitHub Message-ID: How to people feel about moving the issue tracking for NumPy to Github? It looks like they have improved their issue tracking quite a bit and the workflow and integration with commits looks quite good from what I can see. Here is one tool I saw that might help in the migration: https://github.com/trustmaster/trac2github Are there others? -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Feb 11 15:06:17 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 11 Feb 2012 14:06:17 -0600 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: On Saturday, February 11, 2012, Travis Oliphant wrote: > How to people feel about moving the issue tracking for NumPy to Github? It looks like they have improved their issue tracking quite a bit and the workflow and integration with commits looks quite good from what I can see. > Here is one tool I saw that might help in the migration: https://github.com/trustmaster/trac2github > Are there others? > -Travis > This is probably less of an issue for numpy, but our biggest complaint about the github tracker for matplotlib is the inability for users to add attachments. The second complaint is that it is awkward to assign priorities (has to be done via labels). Particularly, users can not apply labels themselves. Mind you, neither of these complaints were enough to completely preclude mpl from migrating, but it should be taken into consideration. Cheers! 
Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Feb 11 15:13:14 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 11 Feb 2012 13:13:14 -0700 Subject: [Numpy-discussion] Commit rights to NumPy for Francesc Alted In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 12:11 PM, Travis Oliphant wrote: > I propose to give Francesc Alted commit rights to the NumPy project. > Francesc will be working full time on NumPy for several months and it will > enable him to participate in pull requests. > > Francesc Alted has been very active in the larger Python for Science > community and has written PyTables, Blosc, carray and made contributions to > NumExpr. He was also instrumental in early design efforts around the > fully-recursive dtype functionality of NumPy. > > Thoughts? > > Oh, definitely. +1. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Feb 11 15:31:51 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 11 Feb 2012 13:31:51 -0700 Subject: [Numpy-discussion] @Dag re numpy.pxd Message-ID: Hi Dag, This probably needs to be on the cython mailing list at some point, but I thought I'd start the discussion here. Numpy is going to begin deprecating direct access to ndarray/dtype internals, ala arr->data etc. There are currently macros/functions for many of these operations in the numpy development branch and I expect more to go in over the coming year. Also, some of the macros have been renamed. I don't know the best way for Cython to support this, but the current version (0.15 here) generates code that will fail if the deprecated things are excluded. Ideally, numpy.pxd would have numpy version dependent parts but I don't know if that is possible. In any case, I'd like your thoughts on the best way to coordinate this migration with Cython. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Feb 11 15:36:26 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 11 Feb 2012 21:36:26 +0100 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: 11.02.2012 20:44, Travis Oliphant kirjoitti: > How to people feel about moving the issue tracking for NumPy to Github? > It looks like they have improved their issue tracking quite a bit and > the workflow and integration with commits looks quite good from what I > can see. The lack of attachments is the main problem with this transition. It's not so seldom that numerical input data or scripts demonstrating an issue come useful. This is probably less of an issue for Numpy than for Scipy, though. Pauli From travis at continuum.io Sat Feb 11 15:44:20 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 11 Feb 2012 14:44:20 -0600 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: <43E1D8BA-5E1C-45C3-AAF2-608FD2439132@continuum.io> This is good feedback. It looks like there are 2 concerns: 1) no way to add attachments --- it would seem that gists and indeed other github repos solves that problem. 2) You must be an admin to label an issue (i.e. set it as a bug, enhancement, or so forth). This second concern seems more of a problem. Perhaps this is something that can be brought up with the github developers directly. Not separating issue permissions from code permissions seems rather unfortunate, and creates work for all admins. 
On the other hand, it might force having an admin who is paying regular attention to the issues which is not necessarily a bad thing. So, despite the drawback, it seems that having issues on Trac and having code-conversations on those issues happening separately from the pull-request conversations is even less optimal. -Travis On Feb 11, 2012, at 2:06 PM, Benjamin Root wrote: > > > On Saturday, February 11, 2012, Travis Oliphant wrote: > > How to people feel about moving the issue tracking for NumPy to Github? It looks like they have improved their issue tracking quite a bit and the workflow and integration with commits looks quite good from what I can see. > > Here is one tool I saw that might help in the migration: https://github.com/trustmaster/trac2github > > Are there others? > > -Travis > > > > This is probably less of an issue for numpy, but our biggest complaint about the github tracker for matplotlib is the inability for users to add attachments. > > The second complaint is that it is awkward to assign priorities (has to be done via labels). Particularly, users can not apply labels themselves. > > Mind you, neither of these complaints were enough to completely preclude mpl from migrating, but it should be taken into consideration. > > Cheers! > Ben Root _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Feb 11 15:53:17 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 11 Feb 2012 13:53:17 -0700 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: <43E1D8BA-5E1C-45C3-AAF2-608FD2439132@continuum.io> References: <43E1D8BA-5E1C-45C3-AAF2-608FD2439132@continuum.io> Message-ID: On Sat, Feb 11, 2012 at 1:44 PM, Travis Oliphant wrote: > This is good feedback. > > It looks like there are 2 concerns: > > 1) no way to add attachments --- it would seem that gists and indeed other > github repos solves that problem. > 2) You must be an admin to label an issue (i.e. set it as a bug, > enhancement, or so forth). > > This second concern seems more of a problem. Perhaps this is something > that can be brought up with the github developers directly. Not > separating issue permissions from code permissions seems rather > unfortunate, and creates work for all admins. > > On the other hand, it might force having an admin who is paying regular > attention to the issues which is not necessarily a bad thing. > > So, despite the drawback, it seems that having issues on Trac and having > code-conversations on those issues happening separately from the > pull-request conversations is even less optimal. > > It does present a problem for migrating current trac as a number of the tickets have attachments and we don't want to lose them. > On Feb 11, 2012, at 2:06 PM, Benjamin Root wrote: > > > > On Saturday, February 11, 2012, Travis Oliphant > wrote: > > How to people feel about moving the issue tracking for NumPy to Github? > It looks like they have improved their issue tracking quite a bit and > the workflow and integration with commits looks quite good from what I can > see. > > Here is one tool I saw that might help in the migration: > https://github.com/trustmaster/trac2github > > Are there others? 
> > -Travis > > > > This is probably less of an issue for numpy, but our biggest complaint > about the github tracker for matplotlib is the inability for users to add > attachments. > > The second complaint is that it is awkward to assign priorities (has to be > done via labels). Particularly, users can not apply labels themselves. > > Mind you, neither of these complaints were enough to completely preclude > mpl from migrating, but it should be taken into consideration. > > Cheers! > Ben Root _______________________________________________ > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sat Feb 11 15:54:56 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 11 Feb 2012 12:54:56 -0800 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 12:36 PM, Pauli Virtanen wrote: > The lack of attachments is the main problem with this transition. It's > not so seldom that numerical input data or scripts demonstrating an > issue come useful. This is probably less of an issue for Numpy than for > Scipy, though. We've taken to using gist for scripts/data and free image hosting sites for screenshots, using Alternatively, as Dag mentioned in the cython-devel thread, we could > just deprecate the fields in Cython as well and place the burden on > the user (and possibly issue warnings for their use). > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Feb 11 16:49:44 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 11 Feb 2012 15:49:44 -0600 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: <4F36D9A1.6020500@hawaii.edu> References: <43E1D8BA-5E1C-45C3-AAF2-608FD2439132@continuum.io> <4F36D9A1.6020500@hawaii.edu> Message-ID: On Sat, Feb 11, 2012 at 3:12 PM, Eric Firing wrote: > On 02/11/2012 10:44 AM, Travis Oliphant wrote: > > > 2) You must be an admin to label an issue (i.e. set it as a bug, > > enhancement, or so forth). > > A third problem is that the entire style of presentation is poorly > designed from a use standpoint, in comparison to the sourceforge tracker > which mpl used previously. The github tracker appears to have been > designed by a graphics person, not a software maintainer. The > information density in the issue list is very low; it is impossible to > scan a large number of issues at once; there doesn't seem to be any > useful sorting and selection mechanism. > The lack of a tabular way to mass-edit bugs is one of my biggest problems with the current trac. One thing that ideally we could do regularly is to rapidly triage 100s of bugs. Currently trac requires you to go through them one by one, like harvesting wheat with a scythe instead of a combine. Users who are mentioned in a lot of tickets also get spammed by a large number of message, instead of getting a single summary update of all the triaging that was done. Does the github bug tracker have a good story about mass bug-updating? -Mark > > > > This second concern seems more of a problem. Perhaps this is something > > that can be brought up with the github developers directly. Not > > separating issue permissions from code permissions seems rather > > unfortunate, and creates work for all admins. 
> > This doesn't seem so bad to me, at least compared to the *really* bad > aspects. > > > > > On the other hand, it might force having an admin who is paying regular > > attention to the issues which is not necessarily a bad thing. > > > > So, despite the drawback, it seems that having issues on Trac and having > > code-conversations on those issues happening separately from the > > pull-request conversations is even less optimal. > > The one good thing about the github tracker is its integration with the > code. Otherwise it is still just plain bad, and will remain so until it > is given an information-dense tabular interface, with things like > initiation date, last update, category, priority, etc. Down with > whitespace and icons! We need information! > > Eric > > > > -Travis > > > > > > > > On Feb 11, 2012, at 2:06 PM, Benjamin Root wrote: > > > >> > >> > >> On Saturday, February 11, 2012, Travis Oliphant >> > wrote: > >> > How to people feel about moving the issue tracking for NumPy to > >> Github? It looks like they have improved their issue tracking quite a > >> bit and the workflow and integration with commits looks quite good > >> from what I can see. > >> > Here is one tool I saw that might help in the migration: > >> https://github.com/trustmaster/trac2github > >> > Are there others? > >> > -Travis > >> > > >> > >> This is probably less of an issue for numpy, but our biggest complaint > >> about the github tracker for matplotlib is the inability for users to > >> add attachments. > >> > >> The second complaint is that it is awkward to assign priorities (has > >> to be done via labels). Particularly, users can not apply labels > >> themselves. > >> > >> Mind you, neither of these complaints were enough to completely > >> preclude mpl from migrating, but it should be taken into consideration. > >> > >> Cheers! > >> Ben Root _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From markflorisson88 at gmail.com Sat Feb 11 16:57:48 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sat, 11 Feb 2012 21:57:48 +0000 Subject: [Numpy-discussion] @Dag re numpy.pxd In-Reply-To: References: Message-ID: On 11 February 2012 21:45, Mark Wiebe wrote: > On Sat, Feb 11, 2012 at 3:27 PM, mark florisson > wrote: >> >> On 11 February 2012 20:31, Charles R Harris >> wrote: >> > Hi Dag, >> > >> > This probably needs to be on the cython mailing list at some point, but >> > I >> > thought I'd start the discussion here. Numpy is going to begin >> > deprecating >> > direct access to ndarray/dtype internals, ala arr->data etc. There are >> > currently macros/functions for many of these operations in the numpy >> > development branch and I expect more to go in over the coming year. >> > Also, >> > some of the macros have been renamed. I don't know the best way for >> > Cython >> > to support this, but the current version (0.15 here) generates code that >> > will fail if the deprecated things are excluded. 
Ideally, numpy.pxd >> > would >> > have numpy version dependent parts but I don't know if that is possible. >> > In >> > any case, I'd like your thoughts on the best way to coordinate this >> > migration with Cython. >> > >> > Chuck >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> This was discussed not too long ago on the cython-devel mailing list: >> http://mail.python.org/pipermail/cython-devel/2012-January/001848.html >> >> I personally think it'd be nice to not break existing Cython code, by >> e.g. writing nogil cdef properties (something which doesn't currently >> exist). That way the properties could use the non-deprecated way to >> actually access the data from numpy. (In any case the deprecated numpy >> functionality should go through a deprecation process before being >> removed). > > > In the nditer, some functions are explicitly documented with a mechanism to > be called without holding the GIL. > > http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_Reset > > Internally, this produces a generic message that doesn't include the normal > user-friendly context, but is still better than just spitting out "runtime > error." Is this style good for cython, or do you have any other ideas of how > to support nogil while adding the possibility of raising errors? That's a nice way to support it. Cython itself often acquires the GIL in the exception case in nogil contexts, sets the exception, and then takes the error path. The problem with that is of course overhead, but it should usually do it for exceptional conditions only (i.e. things that normally should not occur, so not for normal conditions like raising StopIteration etc). However, we also want to get rid of the 'except' clause and in general need a way to check for error conditions for functions that have non-object return types and no known exceptional values for those types. For externally shared Cython functions this may mean exporting multiple versions with different ABIs, the point being that the user will not have to care, as taking the address or making it public would still give the normal C ABI compatible version of the function. Anyway, I digress. For NumPy that seems like a good thing to do. Perhaps it would be even nicer to pass in a pointer to npy_errorstate or some such, which holds complete error information. Then, with the GIL one could call something like npy_raise_from_errorstate(&my_error_state). Functions could easily set the error type as well in there through a borrowed reference. > -Mark > >> >> Alternatively, as Dag mentioned in the cython-devel thread, we could >> just deprecate the fields in Cython as well and place the burden on >> the user (and possibly issue warnings for their use). 
>> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mwwiebe at gmail.com Sat Feb 11 16:58:06 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 11 Feb 2012 15:58:06 -0600 Subject: [Numpy-discussion] Commit rights to NumPy for Francesc Alted In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 2:13 PM, Charles R Harris wrote: > > On Sat, Feb 11, 2012 at 12:11 PM, Travis Oliphant wrote: > >> I propose to give Francesc Alted commit rights to the NumPy project. >> Francesc will be working full time on NumPy for several months and it will >> enable him to participate in pull requests. >> >> Francesc Alted has been very active in the larger Python for Science >> community and has written PyTables, Blosc, carray and made contributions to >> NumExpr. He was also instrumental in early design efforts around the >> fully-recursive dtype functionality of NumPy. >> >> Thoughts? >> >> > Oh, definitely. +1. > +1 as well -Mark > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Feb 11 16:59:41 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 11 Feb 2012 21:59:41 +0000 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: <43E1D8BA-5E1C-45C3-AAF2-608FD2439132@continuum.io> <4F36D9A1.6020500@hawaii.edu> Message-ID: On Sat, Feb 11, 2012 at 9:49 PM, Mark Wiebe wrote: > On Sat, Feb 11, 2012 at 3:12 PM, Eric Firing wrote: >> >> On 02/11/2012 10:44 AM, Travis Oliphant wrote: > > >> >> > 2) You must be an admin to label an issue (i.e. set it as a bug, >> > enhancement, or so forth). >> >> A third problem is that the entire style of presentation is poorly >> designed from a use standpoint, in comparison to the sourceforge tracker >> which mpl used previously. ?The github tracker appears to have been >> designed by a graphics person, not a software maintainer. ?The >> information density in the issue list is very low; it is impossible to >> scan a large number of issues at once; there doesn't seem to be any >> useful sorting and selection mechanism. > > > The lack of a tabular way to mass-edit bugs is one of my biggest problems > with the current trac. One thing that ideally we could do regularly is to > rapidly triage 100s of bugs. Currently trac requires you to go through them > one by one, like harvesting wheat with a scythe instead of a combine. Users > who are mentioned in a lot of tickets also get spammed by a large number of > message, instead of getting a single summary update of all the triaging that > was done. > > Does the github bug tracker have a good story about mass bug-updating? Github is better than trac on that issue: updating the milestone for many bugs at once is simple. You don't have priorities, etc?, though. The Rest API also enables in principle to write tools to automate the repetitive tasks. 
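For instance, a rough sketch of such a tool (untested; it assumes the third-party requests package, basic-auth credentials, and placeholder repository and milestone values) could look like this:

    import json
    import requests

    API = "https://api.github.com/repos/numpy/numpy"   # placeholder repository
    AUTH = ("some-user", "some-password-or-token")     # placeholder credentials

    def issues_with_label(label, state="open"):
        # walk the paginated issue list and collect everything carrying `label`
        issues, page = [], 1
        while True:
            r = requests.get(API + "/issues", auth=AUTH,
                             params={"labels": label, "state": state,
                                     "per_page": 100, "page": page})
            r.raise_for_status()
            batch = json.loads(r.content)
            if not batch:
                return issues
            issues.extend(batch)
            page += 1

    def retarget_milestone(issues, milestone_number):
        # point every issue at a new milestone in one sweep
        for issue in issues:
            r = requests.patch("%s/issues/%d" % (API, issue["number"]),
                               auth=AUTH,
                               data=json.dumps({"milestone": milestone_number}))
            r.raise_for_status()

    # e.g. move everything labelled "1.7.0" onto a hypothetical milestone number 5
    retarget_milestone(issues_with_label("1.7.0"), 5)

The same PATCH call also accepts "labels" and "assignee" fields, so a loop like this covers most of the bulk triaging chores discussed above.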
David From fperez.net at gmail.com Sat Feb 11 17:06:19 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 11 Feb 2012 14:06:19 -0800 Subject: [Numpy-discussion] Commit rights to NumPy for Francesc Alted In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 11:11 AM, Travis Oliphant wrote: > I propose to give Francesc Alted commit rights to the NumPy project. I'm only surprised he didn't have them already, given how much he has contributed over the years! I remember when numpy was reaching 1.0 stage, the insane amount of careful, detailed feedback he provided, which I think was an important part of numpy 1.0 being a great success out of the gate. Cheers, f From ralf.gommers at googlemail.com Sat Feb 11 18:07:55 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 12 Feb 2012 00:07:55 +0100 Subject: [Numpy-discussion] Commit rights to NumPy for Francesc Alted In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 11:06 PM, Fernando Perez wrote: > On Sat, Feb 11, 2012 at 11:11 AM, Travis Oliphant > wrote: > > I propose to give Francesc Alted commit rights to the NumPy project. > > +1. > I'm only surprised he didn't have them already, given how much he has > contributed over the years! With the github move, anyone who didn't ask for commit rights again doesn't have them anymore. Ralf > I remember when numpy was reaching 1.0 > stage, the insane amount of careful, detailed feedback he provided, > which I think was an important part of numpy 1.0 being a great success > out of the gate. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Feb 11 18:11:05 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 12 Feb 2012 00:11:05 +0100 Subject: [Numpy-discussion] On making Numpy 1.7 a long term support release. In-Reply-To: <7EBDEAD7-53C3-4E5A-8A09-47DEE7368A3F@continuum.io> References: <41D27F43-E5C6-478C-8A11-297AEC50431D@continuum.io> <7EBDEAD7-53C3-4E5A-8A09-47DEE7368A3F@continuum.io> Message-ID: On Sat, Feb 11, 2012 at 7:51 PM, Travis Oliphant wrote: > NumPy 1.7 is due out in the next few weeks. This depends on whether all the issues regarding the move to gcc 4.x on Windows will be solved. Right now numpy is not releasable. Either those issues get solved, or we have to do something about the part of datetime that requires 4.x. Neither seems to be very easy. Ralf > This will obviously support 2.4. It can be used for as long as people > want. > > Right now, there is a plan for NumPy 1.8 to be released in the summer > which will have much attention paid to it in order to improve the > documentation, add bug-fixes, as well as make feature additions. Right > now, the plan is to have that release support 2.5, however, major bug-fixes > will be back-ported to the 1.7 series as patches are available. I suspect > that different organizations will use different versions of NumPy as their > own LTS. I plan on encouraging people to use 1.8 as the LTS. > > Work on NumPy 2.0 is already underway, but it will likely not be ready > until January of 2013 at the earliest. Of course, there may be happy > circumstances that accelerate that plan and other events that delay it. > But, that is my best guess at the moment. 
> > Best, > > -Travis > > > > > > > > On Feb 10, 2012, at 3:25 AM, David Cournapeau wrote: > > > On Sun, Feb 5, 2012 at 7:19 AM, Ralf Gommers > > wrote: > >> > >> > >> On Sun, Feb 5, 2012 at 7:33 AM, Travis Oliphant > wrote: > >>> > >>> I think supporting Python 2.5 and above is completely fine. I'd even > be > >>> in favor of bumping up to Python 2.6 for NumPy 1.7 and certainly for > NumPy > >>> 2.8 > >>> > >> +1 for dropping Python 2.5 support also for an LTS release. That will > make > >> it a lot easier to use str.format() and the with statement (plus many > other > >> things) going forward, without having to think about if your changes > can be > >> backported to that LTS release. > > > > At the risk of sounding like a broken record, I would really like to > > stay to 2.4, especially for a long term release :) This is still the > > basis used by a lots of long-term python products. If we can support > > 2.4 for a LTS, I would then be much more comfortable to allow bumping > > to 2.5 for 1.8. > > > > David > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Sat Feb 11 18:12:14 2012 From: e.antero.tammi at gmail.com (eat) Date: Sun, 12 Feb 2012 01:12:14 +0200 Subject: [Numpy-discussion] Want to eliminate direct for-loop In-Reply-To: References: Message-ID: Hi, On Sat, Feb 11, 2012 at 10:56 PM, Dinesh B Vadhia wrote: > ** > Could the following be written without the direct for-loop? > > import numpy > # numpy vector r of any data type and length, eg. > r = numpy.ones(25, dtype='int') > # s is a list of values (of any data type), eg. > s = [47, 27, 67] > # c is a list of (variable length) lists where the sub-list elements are > index values of r and len(s) = len(c), eg. > c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21 ]] > # for each element in each sub-list c, add corresponding s value to the > index value in r, eg. > for i, j in enumerate(c): > r[j] += s[i] > > So, we get: > r[[3, 6, 9]] += s[0] = 1 + 47 = 48 > r[[6, 11, 19, 24]] += s[1] = 1 + 27 = 28 > r[[4, 9, 11, 21]] += s[2] = 1 + 67 = 68 > > ie. r = array([ 1, 1, 1, 95, 68, 1, 122, 1, 1, 162, 1, > 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) > > Thank-you! > Could you describe more detailed manner about why you want to get rid of that loop? Performance wise? If so, do you have profiled what's the bottleneck? Please provide also a more detailed description of your problem, since now your current spec seems to yield: r= array([ 1, 1, 1, 48, 68, 1, 75, 1, 1, 115, 1, 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) My 2 cents, -eat > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
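If the aim is simply to avoid the Python-level loop, one option (a sketch only; it assumes the flattened index and value arrays fit comfortably in memory) is to expand c and s into flat arrays and let numpy.bincount do the accumulation, which also handles an index that appears in more than one sub-list:

    import numpy as np

    r = np.ones(25, dtype='int')
    s = [47, 27, 67]
    c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21]]

    # one flat array of target indices, and one value per index occurrence
    idx = np.concatenate([np.asarray(ci) for ci in c])
    vals = np.repeat(s, [len(ci) for ci in c])

    # bincount sums the values sharing an index, so duplicates accumulate correctly
    r = r + np.bincount(idx, weights=vals, minlength=len(r)).astype(r.dtype)

For the example above this reproduces the array shown earlier in the thread (r[6] becomes 75 because index 6 appears in two sub-lists). Whether it actually beats the loop depends on the sizes involved, so it is worth profiling both.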
URL: From charlesr.harris at gmail.com Sat Feb 11 18:38:29 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 11 Feb 2012 16:38:29 -0700 Subject: [Numpy-discussion] @Dag re numpy.pxd In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 2:57 PM, mark florisson wrote: > On 11 February 2012 21:45, Mark Wiebe wrote: > > On Sat, Feb 11, 2012 at 3:27 PM, mark florisson < > markflorisson88 at gmail.com> > > wrote: > >> > >> On 11 February 2012 20:31, Charles R Harris > >> wrote: > >> > Hi Dag, > >> > > >> > This probably needs to be on the cython mailing list at some point, > but > >> > I > >> > thought I'd start the discussion here. Numpy is going to begin > >> > deprecating > >> > direct access to ndarray/dtype internals, ala arr->data etc. There are > >> > currently macros/functions for many of these operations in the numpy > >> > development branch and I expect more to go in over the coming year. > >> > Also, > >> > some of the macros have been renamed. I don't know the best way for > >> > Cython > >> > to support this, but the current version (0.15 here) generates code > that > >> > will fail if the deprecated things are excluded. Ideally, numpy.pxd > >> > would > >> > have numpy version dependent parts but I don't know if that is > possible. > >> > In > >> > any case, I'd like your thoughts on the best way to coordinate this > >> > migration with Cython. > >> > > >> > Chuck > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > >> This was discussed not too long ago on the cython-devel mailing list: > >> http://mail.python.org/pipermail/cython-devel/2012-January/001848.html > >> > >> I personally think it'd be nice to not break existing Cython code, by > >> e.g. writing nogil cdef properties (something which doesn't currently > >> exist). That way the properties could use the non-deprecated way to > >> actually access the data from numpy. (In any case the deprecated numpy > >> functionality should go through a deprecation process before being > >> removed). > > > > > > In the nditer, some functions are explicitly documented with a mechanism > to > > be called without holding the GIL. > > > > > http://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#NpyIter_Reset > > > > Internally, this produces a generic message that doesn't include the > normal > > user-friendly context, but is still better than just spitting out > "runtime > > error." Is this style good for cython, or do you have any other ideas of > how > > to support nogil while adding the possibility of raising errors? > > That's a nice way to support it. Cython itself often acquires the GIL > in the exception case in nogil contexts, sets the exception, and then > takes the error path. The problem with that is of course overhead, but > it should usually do it for exceptional conditions only (i.e. things > that normally should not occur, so not for normal conditions like > raising StopIteration etc). > > However, we also want to get rid of the 'except' clause and in general > need a way to check for error conditions for functions that have > non-object return types and no known exceptional values for those > types. 
For externally shared Cython functions this may mean exporting > multiple versions with different ABIs, the point being that the user > will not have to care, as taking the address or making it public would > still give the normal C ABI compatible version of the function. > > Anyway, I digress. For NumPy that seems like a good thing to do. > Perhaps it would be even nicer to pass in a pointer to npy_errorstate > or some such, which holds complete error information. Then, with the > GIL one could call something like > npy_raise_from_errorstate(&my_error_state). Functions could easily set > the error type as well in there through a borrowed reference. > > Another thing to worry about, arr->data can be NULL. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Sat Feb 11 19:04:20 2012 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Sat, 11 Feb 2012 16:04:20 -0800 Subject: [Numpy-discussion] Want to eliminate direct for-loop In-Reply-To: References: Message-ID: Sorry, I copy and pasted the wrong example r result - it should be as you say: r = array([ 1, 1, 1, 48, 68, 1, 75, 1, 1, 115, 1, 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) The reason for looking for an alternative solution is performance as the sizes of r, s and c are very large with the for-loop calculation repeated continuously (with different r, s and c). From: eat Sent: Saturday, February 11, 2012 3:12 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Want to eliminate direct for-loop Hi, On Sat, Feb 11, 2012 at 10:56 PM, Dinesh B Vadhia wrote: Could the following be written without the direct for-loop? import numpy # numpy vector r of any data type and length, eg. r = numpy.ones(25, dtype='int') # s is a list of values (of any data type), eg. s = [47, 27, 67] # c is a list of (variable length) lists where the sub-list elements are index values of r and len(s) = len(c), eg. c = [[3, 6, 9], [6, 11, 19, 24], [4, 9, 11, 21 ]] # for each element in each sub-list c, add corresponding s value to the index value in r, eg. for i, j in enumerate(c): r[j] += s[i] So, we get: r[[3, 6, 9]] += s[0] = 1 + 47 = 48 r[[6, 11, 19, 24]] += s[1] = 1 + 27 = 28 r[[4, 9, 11, 21]] += s[2] = 1 + 67 = 68 ie. r = array([ 1, 1, 1, 95, 68, 1, 122, 1, 1, 162, 1, 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) Thank-you! Could you describe more detailed manner about why you want to get rid of that loop? Performance wise? If so, do you have profiled what's the bottleneck? Please provide also a more detailed description of your problem, since now your current spec seems to yield: r= array([ 1, 1, 1, 48, 68, 1, 75, 1, 1, 115, 1, 95, 1, 1, 1, 1, 1, 1, 1, 28, 1, 68, 1, 1, 28]) My 2 cents, -eat _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From emmanuelle.gouillart at nsup.org Sun Feb 12 04:39:14 2012 From: emmanuelle.gouillart at nsup.org (Emmanuelle Gouillart) Date: Sun, 12 Feb 2012 10:39:14 +0100 Subject: [Numpy-discussion] Euroscipy 2012 - Brussels - August 23-37 - call for abstracts Message-ID: <20120212093914.GA5452@phare.normalesup.org> We apologize for the inconvenience if you received this e-mail through several mailing-lists. 
------------------------------------------------------------- Euroscipy 2012, the 5th European meeting on Python in Science ------------------------------------------------------------- It is our pleasure to announce the conference Euroscipy 2012, that will be held in **Brussels**, **August 23-27**, at the Universit? Libre de Bruxelles (ULB, Solbosch Campus). The EuroSciPy meeting is a cross-disciplinary gathering focused on the use and development of the Python language in scientific research and industry. This event strives to bring together both users and developers of scientific tools, as well as academic research and state of the art industry. Website ======= http://www.euroscipy.org/conference/euroscipy2012 Main topics =========== - Presentations of scientific tools and libraries using the Python language, including but not limited to: - vector and array manipulation - parallel computing - scientific visualization - scientific data flow and persistence - algorithms implemented or exposed in Python - web applications and portals for science and engineering. - Reports on the use of Python in scientific achievements or ongoing projects. - General-purpose Python tools that can be of special interest to the scientific community. Tutorials ========= There will be two tutorial tracks at the conference, an introductory one, to bring up to speed with the Python language as a scientific tool, and an advanced track, during which experts of the field will lecture on specific advanced topics such as advanced use of numpy, paralllel computing, advanced testing... Keynote Speaker: David Beazley ============================== This year, we are very happy to welcome David Beazley (http://www.dabeaz.com) as our keynote speaker. David is the original author of SWIG, a software development tool that connects programs written in C and C++ with a variety of high-level programming languages such as Python. He has also authored the acclaimed Python Essential Reference. Important dates =============== Talk submission deadline: Mon Apr 30, 2012 Program announced: end of May Tutorials tracks: Thursday August 23 - Friday August 24, 2012 Conference track: Saturday August 25 - Sunday August 26, 2012 Satellites: Monday August 27 Satellite meetings are yet to be announced. Call for talks and posters ========================== We are soliciting talks and posters that discuss topics related to scientific computing using Python. These include applications, teaching, future development directions, and research. We welcome contributions from the industry as well as the academic world. Indeed, industrial research and development as well academic research face the challenge of mastering IT tools for exploration, modeling and analysis. We look forward to hearing your recent breakthroughs using Python! Submission guidelines ===================== - We solicit proposals in the form of a **one-page long abstract**. - Submissions whose main purpose is to promote a commercial product or service will be refused. - All accepted proposals must be presented at the EuroSciPy conference by at least one author. Abstracts should be detailed enough for the reviewers to appreciate the interest of the work for a wide audience. Examples of abstracts can be found on last year's webpage www.euroscipy.org/track/3992 (talks tab). The one-page long abstracts are for conference planning and selection purposes only. 
How to submit an abstract ========================= To submit a talk to the EuroScipy conference follow the instructions here: http://www.euroscipy.org/card/euroscipy2012_call_for_contributions Organizers ========== Chairs: - Pierre de Buyl - Didrik Pinte Local organizing committee - Kael Hanson - Nicolas Pettiaux Program committee - Tiziano Zito (Chair) - Pierre de Buyl - Emmanuelle Gouillart - Kael Hanson - Konrad Hinsen - Hans Petter Langtangen - Mike M?ller - Stefan Van Der Walt - Ga?l Varoquaux Tutorials chair: Valentin Haenel General organizing committee - Communication: Emmanuelle Gouillart - Sponsoring: Mike M?ller. - Web site: Nicolas Chauvat. Still have questions? ===================== send an e-mail to org-team at lists.euroscipy.org 94,1 -- Emmanuelle, for the organizing team From andrea.gavana at gmail.com Sun Feb 12 06:47:24 2012 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Sun, 12 Feb 2012 12:47:24 +0100 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: HI Chris and All, On 10 February 2012 17:53, Chris Barker wrote: > Andrea, > >> Basically I have a set of x, y data (around 1,000 elements each) and I >> want to create 2 parallel "curves" (offset curves) to the original >> one; "parallel" means curves which are displaced from the base curve >> by a constant offset, either positive or negative, in the direction of >> the curve's normal. Something like this: >> >> http://pyx.sourceforge.net/examples/drawing2/parallel.html > > THis is called "buffering" in GIS parlance -- there are functions > available to do it in GIS an computational geometry libraries: you > might look in the shapely package: > > https://github.com/sgillies/shapely Thanks, I hoped this one would prove itself useful, but unfortunately to me it looks like one of the most impenetrable Python code I have ever seen. Or maybe my Python is just too weak. The end result is the same. I have surfed quite a lot and found many reference to Bezier curves, with a lot of mathematical acrobatics but little of practical value (i.e., code :-) ). In the end I stumbled upon a nice function in the matplotlib library (obviously) which can give me the normal line to every point in the curve, and I almost got there. I still have 2 problems (attached sample), picture at http://img689.imageshack.us/img689/7246/exampleplot.png 1) If you run the attached sample, you'll see a plot of a cubic polynomial with two "almost" parallel lines to it on the first subplot. I say "almost" because at the inflection points of the cubic something funny happens (see suplot number 2); 2) You can see in subplot 3 that the 3 lines are nicely shown as "almost" parallel (except the problem at point 1), as the X and Y axes have the same scale. Unfortunately I can't keep the same scales in my real plot and I can't use axis=square in matplotlib either. How do I "normalize" the get_normal_points method to get the same visual appearance of parallelism on X and Y having different X and Y scales (i.e., the same X and Y scales as in subplot 1)? >> But by plotting these thing out with matplotlib it seems to me they >> don't really look very parallel nor very constant-distance. > > as we say on the wxPython list -- post a fully functional example, so > we can check it out. Attached as promised. Thank you in advance for any suggestion. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ -------------- next part -------------- A non-text attachment was scrubbed... 
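On point 2, one possibility (only a rough sketch: it does not cure the overlap at the inflection points, and the function and argument names below are made up for illustration) is to compute the normals in axis-normalized coordinates and then scale the displacement back into data coordinates:

    import numpy as np

    def offset_curve(x, y, distance, x_span=1.0, y_span=1.0):
        # estimate the tangent in "screen" coordinates (data divided by the axis span)
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        dx = np.gradient(x / x_span)
        dy = np.gradient(y / y_span)
        norm = np.hypot(dx, dy)

        # unit normal in screen coordinates: rotate the tangent by 90 degrees
        nx = -dy / norm
        ny = dx / norm

        # map the displacement back into data coordinates
        upper = (x + distance * nx * x_span, y + distance * ny * y_span)
        lower = (x - distance * nx * x_span, y - distance * ny * y_span)
        return upper, lower

Here x_span and y_span would be the ranges actually plotted (for example x.max() - x.min()), so that distance=0.05 displaces the curve by roughly 5% of each axis, and the two offsets look equidistant on screen even when the x and y scales differ.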
Name: sample_buffer.py Type: application/octet-stream Size: 2250 bytes Desc: not available URL: From d.s.seljebotn at astro.uio.no Sun Feb 12 10:18:34 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 12 Feb 2012 16:18:34 +0100 Subject: [Numpy-discussion] @Dag re numpy.pxd In-Reply-To: References: Message-ID: <4F37D84A.1020802@astro.uio.no> On 02/11/2012 10:27 PM, mark florisson wrote: > On 11 February 2012 20:31, Charles R Harris wrote: >> Hi Dag, >> >> This probably needs to be on the cython mailing list at some point, but I >> thought I'd start the discussion here. Numpy is going to begin deprecating >> direct access to ndarray/dtype internals, ala arr->data etc. There are >> currently macros/functions for many of these operations in the numpy >> development branch and I expect more to go in over the coming year. Also, >> some of the macros have been renamed. I don't know the best way for Cython >> to support this, but the current version (0.15 here) generates code that >> will fail if the deprecated things are excluded. Ideally, numpy.pxd would >> have numpy version dependent parts but I don't know if that is possible. In >> any case, I'd like your thoughts on the best way to coordinate this >> migration with Cython. >> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > This was discussed not too long ago on the cython-devel mailing list: > http://mail.python.org/pipermail/cython-devel/2012-January/001848.html > > I personally think it'd be nice to not break existing Cython code, by > e.g. writing nogil cdef properties (something which doesn't currently > exist). That way the properties could use the non-deprecated way to > actually access the data from numpy. (In any case the deprecated numpy > functionality should go through a deprecation process before being > removed). The only attribute that really concerns me is shape. For the rest, I think it's OK to require using the C API. E.g., the data is a Python attribute returning something else, and exposing the C field was a mistake in the first place. If we remove the shape field, it would still work (through the Python getattr), but this might in some locations give a speed regresssion (and, in many places, not). If we do anything about this, then I really think we should emulate the Python API in full, so that one could finally do "print a.shape", "len(a.shape)"; even if "a.shape[0]" is fast. This requires something else that just cdef properties though -- perhaps a typed tuple type in addition. > Alternatively, as Dag mentioned in the cython-devel thread, we could > just deprecate the fields in Cython as well and place the burden on > the user (and possibly issue warnings for their use). Something this list may not be aware of is that Cython 0.16 will support a different mechanism for fast array access, implemented by Mark F.: cdef double[:, :] a = np.zeros((3, 3)) In this case, "a" is NOT a NumPy array but is coerced to a "typed memory view", where Cython lays down semantics for the shape attribute etc. So I think the recommended approach for old code will be to move to this syntax. This makes it less important to cook up fast NumPy-specific .shape access; it could just revert to using the Python .shape attribute, and then if there's a speed regression one could port the code to the new memoryviews... 
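For concreteness, a small (untested) sketch of the kind of code this enables; the function name and directives are only illustrative:

    # cython: boundscheck=False, wraparound=False
    import numpy as np

    def row_sums(double[:, :] a):
        # a.shape lives on the memoryview itself, so it is usable without the GIL
        cdef Py_ssize_t i, j
        cdef double[:] out = np.zeros(a.shape[0])
        with nogil:
            for i in range(a.shape[0]):
                for j in range(a.shape[1]):
                    out[i] += a[i, j]
        return np.asarray(out)

Since the shape and strides are carried by the memoryview rather than read from the ndarray struct, deprecating direct field access in numpy.pxd would not affect code written in this style.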
Dag Sverre From markflorisson88 at gmail.com Sun Feb 12 10:48:14 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Sun, 12 Feb 2012 15:48:14 +0000 Subject: [Numpy-discussion] @Dag re numpy.pxd In-Reply-To: <4F37D84A.1020802@astro.uio.no> References: <4F37D84A.1020802@astro.uio.no> Message-ID: On 12 February 2012 15:18, Dag Sverre Seljebotn wrote: > On 02/11/2012 10:27 PM, mark florisson wrote: >> On 11 February 2012 20:31, Charles R Harris ?wrote: >>> Hi Dag, >>> >>> This probably needs to be on the cython mailing list at some point, but I >>> thought I'd start the discussion here. Numpy is going to begin deprecating >>> direct access to ndarray/dtype internals, ala arr->data etc. There are >>> currently macros/functions for many of these operations in the numpy >>> development branch and I expect more to go in over the coming year. Also, >>> some of the macros have been renamed. I don't know the best way for Cython >>> to support this, but the current version (0.15 here) generates code that >>> will fail if the deprecated things are excluded. Ideally, numpy.pxd would >>> have numpy version dependent parts but I don't know if that is possible. In >>> any case, I'd like your thoughts on the best way to coordinate this >>> migration with Cython. >>> >>> Chuck >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> This was discussed not too long ago on the cython-devel mailing list: >> http://mail.python.org/pipermail/cython-devel/2012-January/001848.html >> >> I personally think it'd be nice to not break existing Cython code, by >> e.g. writing nogil cdef properties (something which doesn't currently >> exist). That way the properties could use the non-deprecated way to >> actually access the data from numpy. (In any case the deprecated numpy >> functionality should go through a deprecation process before being >> removed). > > The only attribute that really concerns me is shape. > > For the rest, I think it's OK to require using the C API. E.g., the data > is a Python attribute returning something else, and exposing the C field > was a mistake in the first place. > > If we remove the shape field, it would still work (through the Python > getattr), but this might in some locations give a speed regresssion > (and, in many places, not). My concern with that is that it doesn't work in nogil mode. But maybe not that many people are using it in nogil mode and can readily change the code to use the macro or memoryviews if wanted. > If we do anything about this, then I really think we should emulate the > Python API in full, so that one could finally do "print a.shape", > "len(a.shape)"; even if "a.shape[0]" is fast. > > This requires something else that just cdef properties though -- perhaps > a typed tuple type in addition. It could return a view on the shape :p. >> Alternatively, as Dag mentioned in the cython-devel thread, we could >> just deprecate the fields in Cython as well and place the burden on >> the user (and possibly issue warnings for their use). > > Something this list may not be aware of is that Cython 0.16 will support > a different mechanism for fast array access, implemented by Mark F.: > > cdef double[:, :] a = np.zeros((3, 3)) > > In this case, "a" is NOT a NumPy array but is coerced to a "typed memory > view", where Cython lays down semantics for the shape attribute etc. 
> > So I think the recommended approach for old code will be to move to this > syntax. This makes it less important to cook up fast NumPy-specific > .shape access; it could just revert to using the Python .shape > attribute, and then if there's a speed regression one could port the > code to the new memoryviews... > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jkhilmer at gmail.com Sun Feb 12 14:53:31 2012 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Sun, 12 Feb 2012 12:53:31 -0700 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: Andrea, Here is how to do it with splines. I would be more standard to return an array of normals, rather than two arrays of x and y components, but it actually requires less housekeeping this way. As an aside, I would prefer to work with rotations via matrices, but it looks like there's no support for that built in to Numpy or Scipy? def normal_vectors(x, y, scalar=1.0): tck = scipy.interpolate.splrep(x, y) y_deriv = scipy.interpolate.splev(x, tck, der=1) normals_rad = np.arctan(y_deriv)+np.pi/2. return np.cos(normals_rad)*scalar, np.sin(normals_rad)*scalar # Jonathan On Sun, Feb 12, 2012 at 4:47 AM, Andrea Gavana wrote: > HI Chris and All, > > On 10 February 2012 17:53, Chris Barker wrote: >> Andrea, >> >>> Basically I have a set of x, y data (around 1,000 elements each) and I >>> want to create 2 parallel "curves" (offset curves) to the original >>> one; "parallel" means curves which are displaced from the base curve >>> by a constant offset, either positive or negative, in the direction of >>> the curve's normal. Something like this: >>> >>> http://pyx.sourceforge.net/examples/drawing2/parallel.html >> >> THis is called "buffering" in GIS parlance -- there are functions >> available to do it in GIS an computational geometry libraries: you >> might look in the shapely package: >> >> https://github.com/sgillies/shapely > > Thanks, I hoped this one would prove itself useful, but unfortunately > to me it looks like one of the most impenetrable Python code I have > ever seen. Or maybe my Python is just too weak. The end result is the > same. > > I have surfed quite a lot and found many reference to Bezier curves, > with a lot of mathematical acrobatics but little of practical value > (i.e., code :-) ). In the end I stumbled upon a nice function in the > matplotlib library (obviously) which can give me the normal line to > every point in the curve, and I almost got there. > > I still have 2 problems (attached sample), picture at > http://img689.imageshack.us/img689/7246/exampleplot.png > > 1) If you run the attached sample, you'll see a plot of a cubic > polynomial with two "almost" parallel lines to it on the first > subplot. I say "almost" because at the inflection points of the cubic > something funny happens (see suplot number 2); > > 2) You can see in subplot 3 that the 3 lines are nicely shown as > "almost" parallel (except the problem at point 1), as the X and Y axes > have the same scale. Unfortunately I can't keep the same scales in my > real plot and I can't use axis=square in matplotlib either. How do I > "normalize" the get_normal_points method to get the same visual > appearance of parallelism on X and Y having different X and Y scales > (i.e., the same X and Y scales as in subplot 1)? 
> >>> But by plotting these thing out with matplotlib it seems to me they >>> don't really look very parallel nor very constant-distance. >> >> as we say on the wxPython list -- post a fully functional example, so >> we can check it out. > > Attached as promised. > > Thank you in advance for any suggestion. > > Andrea. > > "Imagination Is The Only Weapon In The War Against Reality." > http://xoomer.alice.it/infinity77/ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sun Feb 12 15:00:58 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 12 Feb 2012 13:00:58 -0700 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: On Fri, Feb 10, 2012 at 9:38 AM, Andrea Gavana wrote: > Hi All, > > my apologies for my deep ignorance about math stuff; I guess I > should be able to find this out but I keep getting impossible results. > > Basically I have a set of x, y data (around 1,000 elements each) and I > want to create 2 parallel "curves" (offset curves) to the original > one; "parallel" means curves which are displaced from the base curve > by a constant offset, either positive or negative, in the direction of > the curve's normal. Something like this: > > Note that curves produced in this way aren't actually 'parallel' and can even cross themselves. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea.gavana at gmail.com Sun Feb 12 15:21:52 2012 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Sun, 12 Feb 2012 21:21:52 +0100 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: Jonathan, On 12 February 2012 20:53, Jonathan Hilmer wrote: > Andrea, > > Here is how to do it with splines. ?I would be more standard to return > an array of normals, rather than two arrays of x and y components, but > it actually requires less housekeeping this way. ?As an aside, I would > prefer to work with rotations via matrices, but it looks like there's > no support for that built in to Numpy or Scipy? > > ? ? ? ?def normal_vectors(x, y, scalar=1.0): > ? ? ? ? ? ? ? ?tck = scipy.interpolate.splrep(x, y) > > ? ? ? ? ? ? ? ?y_deriv = scipy.interpolate.splev(x, tck, der=1) > > ? ? ? ? ? ? ? ?normals_rad = np.arctan(y_deriv)+np.pi/2. > > ? ? ? ? ? ? ? ?return np.cos(normals_rad)*scalar, np.sin(normals_rad)*scalar Thank you for this, I'll give it a go in a few minutes (hopefully I will also be able to correctly understand what you did). One thing though, at first glance, it appears to me that your approach is very similar to mine (meaning it will give "parallel" curves that cross themselves as in the example I posted). But maybe I am wrong, my apologies if I missed something. Thank you so much for your answer. Andrea. From andrea.gavana at gmail.com Sun Feb 12 15:26:03 2012 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Sun, 12 Feb 2012 21:26:03 +0100 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: Charles, On 12 February 2012 21:00, Charles R Harris wrote: > > > On Fri, Feb 10, 2012 at 9:38 AM, Andrea Gavana > wrote: >> >> Hi All, >> >> ? ?my apologies for my deep ignorance about math stuff; I guess I >> should be able to find this out but I keep getting impossible results. 
>> >> Basically I have a set of x, y data (around 1,000 elements each) and I >> want to create 2 parallel "curves" (offset curves) to the original >> one; "parallel" means curves which are displaced from the base curve >> by a constant offset, either positive or negative, in the direction of >> the curve's normal. Something like this: >> > > Note that curves produced in this way aren't actually 'parallel' and can > even cross themselves. I know, my definition of "parallel" was probably not orthodox enough. What I am looking for is to generate 2 curves that look "graphically parallel enough" to the original one, and not "parallel" in the true mathematical sense. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ From charlesr.harris at gmail.com Sun Feb 12 15:27:23 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 12 Feb 2012 13:27:23 -0700 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: On Sun, Feb 12, 2012 at 1:21 PM, Andrea Gavana wrote: > Jonathan, > > On 12 February 2012 20:53, Jonathan Hilmer wrote: > > Andrea, > > > > Here is how to do it with splines. I would be more standard to return > > an array of normals, rather than two arrays of x and y components, but > > it actually requires less housekeeping this way. As an aside, I would > > prefer to work with rotations via matrices, but it looks like there's > > no support for that built in to Numpy or Scipy? > > > > def normal_vectors(x, y, scalar=1.0): > > tck = scipy.interpolate.splrep(x, y) > > > > y_deriv = scipy.interpolate.splev(x, tck, der=1) > > > > normals_rad = np.arctan(y_deriv)+np.pi/2. > > > > return np.cos(normals_rad)*scalar, > np.sin(normals_rad)*scalar > > > Thank you for this, I'll give it a go in a few minutes (hopefully I > will also be able to correctly understand what you did). One thing > though, at first glance, it appears to me that your approach is very > similar to mine (meaning it will give "parallel" curves that cross > themselves as in the example I posted). But maybe I am wrong, my > apologies if I missed something. > > Thank you so much for your answer. > > Crossing curves is the correct behavior for this method, think propagating wavefronts and the appearance of light on the bottom of a swimming pool when there are ripples on the surface. I think you need to define what you really want from 'parallel'. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Feb 12 15:37:03 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 12 Feb 2012 13:37:03 -0700 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: On Sun, Feb 12, 2012 at 1:26 PM, Andrea Gavana wrote: > Charles, > > On 12 February 2012 21:00, Charles R Harris wrote: > > > > > > On Fri, Feb 10, 2012 at 9:38 AM, Andrea Gavana > > wrote: > >> > >> Hi All, > >> > >> my apologies for my deep ignorance about math stuff; I guess I > >> should be able to find this out but I keep getting impossible results. > >> > >> Basically I have a set of x, y data (around 1,000 elements each) and I > >> want to create 2 parallel "curves" (offset curves) to the original > >> one; "parallel" means curves which are displaced from the base curve > >> by a constant offset, either positive or negative, in the direction of > >> the curve's normal. 
Something like this: > >> > > > > Note that curves produced in this way aren't actually 'parallel' and can > > even cross themselves. > > I know, my definition of "parallel" was probably not orthodox enough. > What I am looking for is to generate 2 curves that look "graphically > parallel enough" to the original one, and not "parallel" in the true > mathematical sense. > > You could try setting a point and 'contracting' the curve towards the point. A point a infinity would give the usual parallel curves. There are probably a lot of perspective like transformations that would do something similar. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkhilmer at gmail.com Sun Feb 12 15:59:12 2012 From: jkhilmer at gmail.com (Jonathan Hilmer) Date: Sun, 12 Feb 2012 13:59:12 -0700 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: Andrea, I realized that my answer wouldn't be complete, but as people have pointed out that's a substantially more difficult question, so I wanted to give you a complete answer to just a subset of your problem. I'm currently writing a variant that avoids the overlapping normal vectors by interatively 1.) expanding along normals then 2.) condensing points, for every iteration. However, I'm doing it mostly for my own interest since I'm pretty sure it will not be functional when complete: your problem is that calculation of derivatives/normals is going to become unstable in acute convex regions, and the overlap issue there will become more severe. I would strongly recommend adapting some existing library for this problem. Jonathan On Sun, Feb 12, 2012 at 1:37 PM, Charles R Harris wrote: > > > On Sun, Feb 12, 2012 at 1:26 PM, Andrea Gavana > wrote: >> >> Charles, >> >> On 12 February 2012 21:00, Charles R Harris wrote: >> > >> > >> > On Fri, Feb 10, 2012 at 9:38 AM, Andrea Gavana >> > wrote: >> >> >> >> Hi All, >> >> >> >> ? ?my apologies for my deep ignorance about math stuff; I guess I >> >> should be able to find this out but I keep getting impossible results. >> >> >> >> Basically I have a set of x, y data (around 1,000 elements each) and I >> >> want to create 2 parallel "curves" (offset curves) to the original >> >> one; "parallel" means curves which are displaced from the base curve >> >> by a constant offset, either positive or negative, in the direction of >> >> the curve's normal. Something like this: >> >> >> > >> > Note that curves produced in this way aren't actually 'parallel' and can >> > even cross themselves. >> >> I know, my definition of "parallel" was probably not orthodox enough. >> What I am looking for is to generate 2 curves that look "graphically >> parallel enough" to the original one, and not "parallel" in the true >> mathematical sense. >> > > You could try setting a point and 'contracting' the curve towards the point. > A point a infinity would give the usual parallel curves. There are probably > a lot of perspective like transformations that would do something similar. 
> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From francesc at continuum.io Sun Feb 12 16:12:17 2012 From: francesc at continuum.io (Francesc Alted) Date: Sun, 12 Feb 2012 22:12:17 +0100 Subject: [Numpy-discussion] Commit rights to NumPy for Francesc Alted In-Reply-To: References: Message-ID: <74AB8EC1-F503-4586-84A6-860EC148A664@continuum.io> On Feb 12, 2012, at 12:07 AM, Ralf Gommers wrote: > On Sat, Feb 11, 2012 at 11:06 PM, Fernando Perez wrote: > On Sat, Feb 11, 2012 at 11:11 AM, Travis Oliphant wrote: > > I propose to give Francesc Alted commit rights to the NumPy project. > > +1. Thanks for the kind invitation. While it is true that in the last year I?ve been quite away of NumPy business, right now, and thanks to Travis O., NumPy has become a high priority project for me. I just hope that I can provide some interesting value in the path for forthcoming NumPy 2.0. Cheers! -- Francesc Alted From andrea.gavana at gmail.com Sun Feb 12 16:14:48 2012 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Sun, 12 Feb 2012 22:14:48 +0100 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: Jonathan, On 12 February 2012 21:59, Jonathan Hilmer wrote: > Andrea, > > I realized that my answer wouldn't be complete, but as people have > pointed out that's a substantially more difficult question, so I > wanted to give you a complete answer to just a subset of your problem. > > I'm currently writing a variant that avoids the overlapping normal > vectors by interatively 1.) expanding along normals then 2.) > condensing points, for every iteration. ?However, I'm doing it mostly > for my own interest since I'm pretty sure it will not be functional > when complete: your problem is that calculation of derivatives/normals > is going to become unstable in acute convex regions, and the overlap > issue there will become more severe. ?I would strongly recommend > adapting some existing library for this problem. Thank you for your clear explanation. I feared something like that, but then again I was almost expecting it... I do wonder, however, if the problem I am having could be somehow simplified/modified by looking at another situation, in 3D this time: streamtubes and 3D representation of "tubes"/"pipes" are relatively common (i.e., VTKTubeFilter from the VTK library, streamtubes in Mayavi). They all build some kind of "cylindrical" shape around the main path (curve in 3D) to give the visual effect of some "tube" surrounding the main path. What I am trying to do is basically the same thing but in 2D: now, I have no idea if this can be done, and not even how it could be done. My math is relatively weak, I can't fathom how to build a 3D tube let alone splatter it back on a 2D map to give that effect. I am not sure my original problem and this one are related or not, but again every suggestion is most welcome. Thank you. Andrea. > > > Jonathan > > > On Sun, Feb 12, 2012 at 1:37 PM, Charles R Harris > wrote: >> >> >> On Sun, Feb 12, 2012 at 1:26 PM, Andrea Gavana >> wrote: >>> >>> Charles, >>> >>> On 12 February 2012 21:00, Charles R Harris wrote: >>> > >>> > >>> > On Fri, Feb 10, 2012 at 9:38 AM, Andrea Gavana >>> > wrote: >>> >> >>> >> Hi All, >>> >> >>> >> ? ?my apologies for my deep ignorance about math stuff; I guess I >>> >> should be able to find this out but I keep getting impossible results. 
>>> >> >>> >> Basically I have a set of x, y data (around 1,000 elements each) and I >>> >> want to create 2 parallel "curves" (offset curves) to the original >>> >> one; "parallel" means curves which are displaced from the base curve >>> >> by a constant offset, either positive or negative, in the direction of >>> >> the curve's normal. Something like this: >>> >> >>> > >>> > Note that curves produced in this way aren't actually 'parallel' and can >>> > even cross themselves. >>> >>> I know, my definition of "parallel" was probably not orthodox enough. >>> What I am looking for is to generate 2 curves that look "graphically >>> parallel enough" to the original one, and not "parallel" in the true >>> mathematical sense. >>> >> >> You could try setting a point and 'contracting' the curve towards the point. >> A point a infinity would give the usual parallel curves. There are probably >> a lot of perspective like transformations that would do something similar. >> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ >>> import PyQt4.QtGui Traceback (most recent call last): ? File "", line 1, in ImportError: No module named PyQt4.QtGui >>> >>> import pygtk Traceback (most recent call last): ? File "", line 1, in ImportError: No module named pygtk >>> >>> import wx >>> >>> From robert.kern at gmail.com Sun Feb 12 16:31:33 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 12 Feb 2012 21:31:33 +0000 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: On Sun, Feb 12, 2012 at 20:26, Andrea Gavana wrote: > I know, my definition of "parallel" was probably not orthodox enough. > What I am looking for is to generate 2 curves that look "graphically > parallel enough" to the original one, and not "parallel" in the true > mathematical sense. There is a rigorous way to define the curve that you are looking for, and fortunately it gives some hints for implementation. For each point (x,y) in space, associate with it the nearest distance D from that point to the reference curve. The "parallel" curves are just two sides of the level set where D(x,y) is equal to the specified distance (possibly removing the circular caps that surround the ends of the reference curve). If performance is not a constraint, then you could just evaluate that D(x,y) function on a fine-enough grid and do marching squares to find the level set. matplotlib's contour plotting routines can help here. There is a hint in the PyX page that you linked to that you should consider. Angles in the reference curve become circular arcs in the "parallel" curves. So if your reference curve is just a bunch of line segments, then what you can do is take each line segment, and make parallel copies the same length to either side. Now you just need to connect up these parallel segments with each other. You do this by using circular arcs centered on the vertices of the reference curve. Do this on both sides. On the "outer" side, the arcs will go "forward" while on the "inner" side, the arcs will go "backwards" just like the cusps that you saw in your attempt. Now let's take care of that. 
You will have two self-intersecting curves consisting of alternating line segments and circular arcs. Parts of these curves will be too close to the reference curve. You will have to go through these curves to find the locations of self-intersection and remove the parts of the segments and arcs that are too close to the reference curve. This is tricky to do, but the formulae for segment-segment, segment-arc, and arc-arc intersection can be found online. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From travis at continuum.io Sun Feb 12 18:12:22 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 12 Feb 2012 17:12:22 -0600 Subject: [Numpy-discussion] Issue Tracking Message-ID: I'm wondering about using one of these commercial issue tracking plans for NumPy and would like thoughts and comments. Both of these plans allow Open Source projects to have unlimited plans for free. YouTrack from JetBrains: http://www.jetbrains.com/youtrack/features/issue_tracking.html JIRA: http://www.atlassian.com/software/jira/overview/tour/code-integration What Mark Wiebe said about making it easy to "manage the issues" quickly and what Eric said about making sure there are interfaces with dense information content really struck chords with me. I have seen a lot of time wasted on issue management with Trac --- time that could be better spent on NumPy. I'd like to make issue management efficient --- even if it means a system separate from GitHub. Issue management is a very important part of the open-source process. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From niki.spahiev at gmail.com Mon Feb 13 04:01:43 2012 From: niki.spahiev at gmail.com (Niki Spahiev) Date: Mon, 13 Feb 2012 11:01:43 +0200 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: You can get polygon buffer from http://angusj.com/delphi/clipper.php and make cython interface to it. HTH Niki From pierre.haessig at crans.org Mon Feb 13 08:30:04 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 13 Feb 2012 14:30:04 +0100 Subject: [Numpy-discussion] Initializing an array to a constant value Message-ID: <4F39105C.6060204@crans.org> I have a pretty silly question about initializing an array a to a given scalar value, say A. Most of the time I use a=np.ones(shape)*A which seems the most widespread idiom, but I got recently interested in getting some performance improvement. I tried a=np.zeros(shape)+A, based on broadcasting but it seems to be equivalent in terms of speed. Now, the fastest : a = np.empty(shape) a.fill(A) but it is a two-steps instruction to do one thing, which I feel doesn't look very nice. Did I miss an all-in-one function like numpy.fill(shape, A) ? Best, Pierre From e.antero.tammi at gmail.com Mon Feb 13 13:17:55 2012 From: e.antero.tammi at gmail.com (eat) Date: Mon, 13 Feb 2012 20:17:55 +0200 Subject: [Numpy-discussion] Initializing an array to a constant value In-Reply-To: <4F39105C.6060204@crans.org> References: <4F39105C.6060204@crans.org> Message-ID: Hi, A slightly OT (and not directly answering to your question), but On Mon, Feb 13, 2012 at 3:30 PM, Pierre Haessig wrote: > I have a pretty silly question about initializing an array a to a given > scalar value, say A. 
> > Most of the time I use a=np.ones(shape)*A which seems the most > widespread idiom, but I got recently interested in getting some > performance improvement. > > I tried a=np.zeros(shape)+A, based on broadcasting but it seems to be > equivalent in terms of speed. > > Now, the fastest : > a = np.empty(shape) > a.fill(A) > wouldn't it be nice if you could just write: a= np.empty(shape).fill(A) this would be possible if .fill(.) just returned self. I assume that this topic has been discussed previously, but personally I feel that the code would be more readable when returning self rather None. Thus all ndarray methods should return something meaningful to act on (in the spirit that methods are more like functions than subroutines). Just my 2 cents, -eat > > but it is a two-steps instruction to do one thing, which I feel doesn't > look very nice. > > Did I miss an all-in-one function like numpy.fill(shape, A) ? > > Best, > Pierre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From will at thearete.co.uk Mon Feb 13 14:26:44 2012 From: will at thearete.co.uk (William Furnass) Date: Mon, 13 Feb 2012 19:26:44 +0000 Subject: [Numpy-discussion] Indexing 2d arrays by column using an integer array Message-ID: Hi, Apologies if the following is a trivial question. I wish to index the columns of the following 2D array In [78]: neighbourhoods Out[78]: array([[8, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7], [6, 7, 8], [7, 8, 0]]) using the integer array In [76]: perf[neighbourhoods].argmax(axis=1) Out[76]: array([2, 1, 0, 2, 1, 0, 0, 2, 1]) to produce a 9-element array but can't find a way of applying the indices to the columns rather than the rows. Is this do-able without using loops? The looped version of what I want is np.array( [neighbourhoods[i][perf[neighbourhoods].argmax(axis=1)[i]] for i in xrange(neighbourhoods.shape[0])] ) Regards, -- Will Furnass Doctoral Student Pennine Water Group Department of Civil and Structural Engineering University of Sheffield Phone: +44 (0)114 22 25768 From travis at continuum.io Mon Feb 13 14:39:10 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 13 Feb 2012 13:39:10 -0600 Subject: [Numpy-discussion] Indexing 2d arrays by column using an integer array In-Reply-To: References: Message-ID: <111BE35A-8815-4350-9E7C-2DC258941F56@continuum.io> I think the following is what you want: neighborhoods[range(9),perf[neighbourhoods].argmax(axis=1)] -Travis On Feb 13, 2012, at 1:26 PM, William Furnass wrote: > np.array( [neighbourhoods[i][perf[neighbourhoods].argmax(axis=1)[i]] > for i in xrange(neighbourhoods.shape[0])] ) From chris.barker at noaa.gov Mon Feb 13 14:44:15 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 13 Feb 2012 11:44:15 -0800 Subject: [Numpy-discussion] Creating parallel curves In-Reply-To: References: Message-ID: On Mon, Feb 13, 2012 at 1:01 AM, Niki Spahiev wrote: > You can get polygon buffer from http://angusj.com/delphi/clipper.php and > make cython interface to it. This should be built into GEOS as well, and the shapely package provides a python wrapper already. -Chris > HTH > > Niki > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From will at thearete.co.uk Mon Feb 13 14:52:30 2012 From: will at thearete.co.uk (William Furnass) Date: Mon, 13 Feb 2012 19:52:30 +0000 Subject: [Numpy-discussion] Indexing 2d arrays by column using an integer array In-Reply-To: <111BE35A-8815-4350-9E7C-2DC258941F56@continuum.io> References: <111BE35A-8815-4350-9E7C-2DC258941F56@continuum.io> Message-ID: Thank you, that does the trick. Regards, Will On 13 February 2012 19:39, Travis Oliphant wrote: > I think the following is what you want: > > neighborhoods[range(9),perf[neighbourhoods].argmax(axis=1)] > > -Travis > > > On Feb 13, 2012, at 1:26 PM, William Furnass wrote: > >> np.array( [neighbourhoods[i][perf[neighbourhoods].argmax(axis=1)[i]] >> for i in xrange(neighbourhoods.shape[0])] ) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at googlemail.com Mon Feb 13 15:37:51 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 13 Feb 2012 21:37:51 +0100 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: Message-ID: On Mon, Feb 13, 2012 at 12:12 AM, Travis Oliphant wrote: > I'm wondering about using one of these commercial issue tracking plans for > NumPy and would like thoughts and comments. Both of these plans allow > Open Source projects to have unlimited plans for free. > > Free usage of a tool that's itself not open source is not all that different from using Github, so no objections from me. > YouTrack from JetBrains: > > http://www.jetbrains.com/youtrack/features/issue_tracking.html > > This looks promising. It seems to have good Github integration, and I checked that you can easily export all your issues (so no lock-in). It's a company that isn't going anywhere (I hope), and they do a very nice job with PyCharm. > JIRA: > > http://www.atlassian.com/software/jira/overview/tour/code-integration > > Haven't looked into this one in much detail. I happen to have a dislike for Confluence (their wiki system), so someone else can say some nice things about JIRA. Haven't tried either tracker though. Anyone with actual experience? > What Mark Wiebe said about making it easy to "manage the issues" quickly > and what Eric said about making sure there are interfaces with dense > information content really struck chords with me. I have seen a lot of > time wasted on issue management with Trac --- time that could be better > spent on NumPy. I'd like to make issue management efficient --- even if > it means a system separate from GitHub. > > Issue management is a very important part of the open-source process. > > While we're at it, our buildbot situation is much worse than our issue tracker situation. This also looks good (and free): http://www.jetbrains.com/teamcity/ Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Feb 13 15:44:55 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 13 Feb 2012 14:44:55 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: Message-ID: > > On Mon, Feb 13, 2012 at 12:12 AM, Travis Oliphant wrote: > I'm wondering about using one of these commercial issue tracking plans for NumPy and would like thoughts and comments. 
Both of these plans allow Open Source projects to have unlimited plans for free. > > Free usage of a tool that's itself not open source is not all that different from using Github, so no objections from me. > > YouTrack from JetBrains: > > http://www.jetbrains.com/youtrack/features/issue_tracking.html > > This looks promising. It seems to have good Github integration, and I checked that you can easily export all your issues (so no lock-in). It's a company that isn't going anywhere (I hope), and they do a very nice job with PyCharm. I do like the team behind JetBrains. And I've seen and heard good things about TeamCity. Thanks for reminding me about the build-bot situation. That is one thing I would like to address sooner rather than later as well. Thanks, -Travis > > JIRA: > > http://www.atlassian.com/software/jira/overview/tour/code-integration > > Haven't looked into this one in much detail. I happen to have a dislike for Confluence (their wiki system), so someone else can say some nice things about JIRA. > > Haven't tried either tracker though. Anyone with actual experience? > > What Mark Wiebe said about making it easy to "manage the issues" quickly and what Eric said about making sure there are interfaces with dense information content really struck chords with me. I have seen a lot of time wasted on issue management with Trac --- time that could be better spent on NumPy. I'd like to make issue management efficient --- even if it means a system separate from GitHub. > > Issue management is a very important part of the open-source process. > > While we're at it, our buildbot situation is much worse than our issue tracker situation. This also looks good (and free): http://www.jetbrains.com/teamcity/ > > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Feb 13 15:56:08 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 13 Feb 2012 12:56:08 -0800 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: Message-ID: Hi, On Mon, Feb 13, 2012 at 12:44 PM, Travis Oliphant wrote: > > On Mon, Feb 13, 2012 at 12:12 AM, Travis Oliphant > wrote: >> >> I'm wondering about using one of these commercial issue tracking plans for >> NumPy and would like thoughts and comments. ? ?Both of these plans allow >> Open Source projects to have unlimited plans for free. >> > Free usage of a tool that's itself not open source is not all that different > from using Github, so no objections from me. > >> >> YouTrack from JetBrains: >> >> http://www.jetbrains.com/youtrack/features/issue_tracking.html >> > This looks promising. It seems to have good Github integration, and I > checked that you can easily export all your issues (so no lock-in). It's a > company that isn't going anywhere (I hope), and they do a very nice job with > PyCharm. > > > I do like the team behind JetBrains. ? And I've seen and heard good things > about TeamCity. ? Thanks for reminding me about the build-bot situation. > ?That is one thing I would like to address sooner rather than later as > well. We've (nipy) got a buildbot collection working OK. If you want to go that way you are welcome to use our machines. It's a somewhat flaky setup though. http://nipy.bic.berkeley.edu/builders I have the impression that the Cython / SAGE team are happy with their Jenkins configuration. 
Ondrej did some nice stuff on integrating a build with the github pull requests: https://github.com/sympy/sympy-bot Some discussion of buildbot and Jenkins: http://vperic.blogspot.com/2011/05/continuous-integration-and-sympy.html See you, Matthew From pierre.haessig at crans.org Mon Feb 13 17:19:31 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 13 Feb 2012 23:19:31 +0100 Subject: [Numpy-discussion] Initializing an array to a constant value In-Reply-To: References: <4F39105C.6060204@crans.org> Message-ID: <4F398C73.1010100@crans.org> Le 13/02/2012 19:17, eat a ?crit : > wouldn't it be nice if you could just write: > a= np.empty(shape).fill(A) > this would be possible if .fill(.) just returned self. Thanks for the tip. I noticed several times this was not working (because of course, in the mean time, I forgot it...) but I had totally overlooked the reasons (just imagining there was some garbage collection magic vanishing my arrays !!) I find the syntax "np.empty(shape).fill(A)" being indeed a good alternative to the burden of creating a new numpy.fill (or numpy.filled ?) function. -- Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea.gavana at gmail.com Mon Feb 13 17:32:28 2012 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Mon, 13 Feb 2012 23:32:28 +0100 Subject: [Numpy-discussion] Fwd: Re: Creating parallel curves In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: "Andrea Gavana" Date: Feb 13, 2012 11:31 PM Subject: Re: [Numpy-discussion] Creating parallel curves To: "Jonathan Hilmer" Thank you Jonathan for this, it's exactly what I was looking for. I' ll try it tomorrow on the 768 well trajectories I have and I'll let you know if I stumble upon any issue. If someone could shed some light on my problem number 2 (how to adjust the scaling/distance) so that the curves look parallel on a matplotlib graph even though the axes scales are different, I'd be more than grateful. Thank you in advance. Andrea. On Feb 13, 2012 4:32 AM, "Jonathan Hilmer" wrote: > Andrea, > > This is playing some tricks with 2D array expansion to make a tradeoff > in memory for speed. Given two sets of matching vectors (one > reference, given first, and a newly-expanded one, given second), it > removes all points from the expanded vectors that aren't needed to > describe the new contour. > > def filter_expansion(x, y, x_expan, y_expan, distance_target, tol=1e-6): > > target_xx, expansion_xx = scipy.meshgrid(x, x_expan) > target_yy, expansion_yy = scipy.meshgrid(y, y_expan) > > distance = scipy.sqrt((expansion_yy - target_yy)**2 + (expansion_xx > - > target_xx)**2) > > valid = distance.min(axis=1) > distance_target*(1.-tol) > > return x_expan.compress(valid), y_expan.compress(valid) > # > > > Jonathan > > > On Sun, Feb 12, 2012 at 2:31 PM, Robert Kern > wrote: > > On Sun, Feb 12, 2012 at 20:26, Andrea Gavana > wrote: > > > >> I know, my definition of "parallel" was probably not orthodox enough. > >> What I am looking for is to generate 2 curves that look "graphically > >> parallel enough" to the original one, and not "parallel" in the true > >> mathematical sense. > > > > There is a rigorous way to define the curve that you are looking for, > > and fortunately it gives some hints for implementation. For each point > > (x,y) in space, associate with it the nearest distance D from that > > point to the reference curve. 
The "parallel" curves are just two sides > > of the level set where D(x,y) is equal to the specified distance > > (possibly removing the circular caps that surround the ends of the > > reference curve). > > > > If performance is not a constraint, then you could just evaluate that > > D(x,y) function on a fine-enough grid and do marching squares to find > > the level set. matplotlib's contour plotting routines can help here. > > > > There is a hint in the PyX page that you linked to that you should > > consider. Angles in the reference curve become circular arcs in the > > "parallel" curves. So if your reference curve is just a bunch of line > > segments, then what you can do is take each line segment, and make > > parallel copies the same length to either side. Now you just need to > > connect up these parallel segments with each other. You do this by > > using circular arcs centered on the vertices of the reference curve. > > Do this on both sides. On the "outer" side, the arcs will go "forward" > > while on the "inner" side, the arcs will go "backwards" just like the > > cusps that you saw in your attempt. Now let's take care of that. You > > will have two self-intersecting curves consisting of alternating line > > segments and circular arcs. Parts of these curves will be too close to > > the reference curve. You will have to go through these curves to find > > the locations of self-intersection and remove the parts of the > > segments and arcs that are too close to the reference curve. This is > > tricky to do, but the formulae for segment-segment, segment-arc, and > > arc-arc intersection can be found online. > > > > -- > > Robert Kern > > > > "I have come to believe that the whole world is an enigma, a harmless > > enigma that is made terrible by our own mad attempt to interpret it as > > though it had an underlying truth." > > -- Umberto Eco > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Mon Feb 13 17:33:55 2012 From: jason-sage at creativetrax.com (jason-sage at creativetrax.com) Date: Mon, 13 Feb 2012 16:33:55 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: Message-ID: <4F398FD3.6040904@creativetrax.com> On 2/13/12 2:56 PM, Matthew Brett wrote: > I have the impression that the Cython / SAGE team are happy with their > Jenkins configuration. I'm not aware of a Jenkins buildbot system for Sage, though I think Cython uses such a system: https://sage.math.washington.edu:8091/hudson/ We do have a number of systems we build and test Sage on, though I don't think we have continuous integration yet. I've CCd Jeroen Demeyer, who is the current release manager for Sage. Jeroen, do we have an automatic buildbot system for Sage? Thanks, Jason -- Jason Grout From matthew.brett at gmail.com Mon Feb 13 18:12:24 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 13 Feb 2012 15:12:24 -0800 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <4F398FD3.6040904@creativetrax.com> References: <4F398FD3.6040904@creativetrax.com> Message-ID: Hi, On Mon, Feb 13, 2012 at 2:33 PM, wrote: > On 2/13/12 2:56 PM, Matthew Brett wrote: >> I have the impression that the Cython / SAGE team are happy with their >> Jenkins configuration. 
> > I'm not aware of a Jenkins buildbot system for Sage, though I think > Cython uses such a system: https://sage.math.washington.edu:8091/hudson/ > > We do have a number of systems we build and test Sage on, though I don't > think we have continuous integration yet. ?I've CCd Jeroen Demeyer, who > is the current release manager for Sage. ?Jeroen, do we have an > automatic buildbot system for Sage? Ah - sorry - I was thinking of the Cython system on the SAGE server. Best, Matthew From fperez.net at gmail.com Mon Feb 13 18:18:47 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 13 Feb 2012 15:18:47 -0800 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: Message-ID: On Mon, Feb 13, 2012 at 12:56 PM, Matthew Brett wrote: > I have the impression that the Cython / SAGE team are happy with their > Jenkins configuration. So are we in IPython, thanks to Thomas Kluyver's recent leadership on this front it's now running quite smoothly: https://jenkins.shiningpanda.com/ipython/ I'm pretty sure Thomas is on this list, if you folks have any questions on the details of the setup. Cheers, f From m.oliver at jacobs-university.de Mon Feb 13 18:23:49 2012 From: m.oliver at jacobs-university.de (Marcel Oliver) Date: Tue, 14 Feb 2012 00:23:49 +0100 Subject: [Numpy-discussion] Index Array Performance Message-ID: <20281.39813.720754.45947@localhost.localdomain> Hi, I have a short piece of code where the use of an index array "feels right", but incurs a severe performance penalty: It's about an order of magnitude slower than all other operations with arrays of that size. It comes up in a piece of code which is doing a large number of "on the fly" histograms via hist[i,j] += 1 where i is an array with the bin index to be incremented and j is simply enumerating the histograms. I attach a full short sample code below which shows how it's being used in context, and corresponding timeit output from the critical code section. Questions: - Is this a matter of principle, or due to an inefficient implementation? - Is there an equivalent way of doing it which is fast? Regards, Marcel ================================================================= #! /usr/bin/env python # Plot the bifurcation diagram of the logistic map from pylab import * Nx = 800 Ny = 600 I = 50000 rmin = 2.5 rmax = 4.0 ymin = 0.0 ymax = 1.0 rr = linspace (rmin, rmax, Nx) x = 0.5*ones(rr.shape) hist = zeros((Ny+1,Nx), dtype=int) j = arange(Nx) dy = ymax/Ny def f(x): return rr*x*(1.0-x) for n in xrange(1000): x = f(x) for n in xrange(I): x = f(x) i = array(x/dy, dtype=int) hist[i,j] += 1 figure() imshow(hist, cmap='binary', origin='lower', interpolation='nearest', extent=(rmin,rmax,ymin,ymax), norm=matplotlib.colors.LogNorm()) xlabel ('$r$') ylabel ('$x$') title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') show() ==================================================================== In [4]: timeit y=f(x) 10000 loops, best of 3: 19.4 us per loop In [5]: timeit i = array(x/dy, dtype=int) 10000 loops, best of 3: 22 us per loop In [6]: timeit img[i,j] += 1 10000 loops, best of 3: 119 us per loop From travis at vaught.net Mon Feb 13 18:46:46 2012 From: travis at vaught.net (Travis Vaught) Date: Mon, 13 Feb 2012 17:46:46 -0600 Subject: [Numpy-discussion] [Enthought-Dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: <6BCE895F-7958-4D4A-A16F-1C7889B30070@vaught.net> On Feb 13, 2012, at 3:55 PM, Fernando Perez wrote: > ... 
> - Extra operators/PEP 225. Here's a summary from the last time we > went over this, years ago at Scipy 2008: > http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, > and the current status of the document we wrote about it is here: > file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. > > ... The link to the document isn't quite right. Please update it -- I can't wait for some nostalgic reading ;-) Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Mon Feb 13 18:48:34 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 13 Feb 2012 15:48:34 -0800 Subject: [Numpy-discussion] [Enthought-Dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: <6BCE895F-7958-4D4A-A16F-1C7889B30070@vaught.net> References: <6BCE895F-7958-4D4A-A16F-1C7889B30070@vaught.net> Message-ID: On Mon, Feb 13, 2012 at 3:46 PM, Travis Vaught wrote: > > - Extra operators/PEP 225. ?Here's a summary from the last time we > went over this, years ago at Scipy 2008: > http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, > and the current status of the document we wrote about it is here: > file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. > > ... > > > The link to the document isn't quite right. ?Please update it -- I can't > wait for some nostalgic reading ;-) Oops, sorry; I pasted the local build url by accident: http://fperez.org/py4science/numpy-pep225/numpy-pep225.html And BTW, this discussion will take place on Friday March 2nd, most likely 3-5pm. We'll add that info to the pydata page as soon as it's finalized. Cheers, f From wesmckinn at gmail.com Mon Feb 13 19:18:01 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 13 Feb 2012 19:18:01 -0500 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: <20281.39813.720754.45947@localhost.localdomain> References: <20281.39813.720754.45947@localhost.localdomain> Message-ID: On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver wrote: > Hi, > > I have a short piece of code where the use of an index array "feels > right", but incurs a severe performance penalty: It's about an order > of magnitude slower than all other operations with arrays of that > size. > > It comes up in a piece of code which is doing a large number of "on > the fly" histograms via > > ?hist[i,j] += 1 > > where i is an array with the bin index to be incremented and j is > simply enumerating the histograms. ?I attach a full short sample code > below which shows how it's being used in context, and corresponding > timeit output from the critical code section. > > Questions: > > - Is this a matter of principle, or due to an inefficient > ?implementation? > - Is there an equivalent way of doing it which is fast? > > Regards, > Marcel > > ================================================================= > > #! /usr/bin/env python > # Plot the bifurcation diagram of the logistic map > > from pylab import * > > Nx = 800 > Ny = 600 > I = 50000 > > rmin = 2.5 > rmax = 4.0 > ymin = 0.0 > ymax = 1.0 > > rr = linspace (rmin, rmax, Nx) > x = 0.5*ones(rr.shape) > hist = zeros((Ny+1,Nx), dtype=int) > j = arange(Nx) > > dy = ymax/Ny > > def f(x): > ? ?return rr*x*(1.0-x) > > for n in xrange(1000): > ? ?x = f(x) > > for n in xrange(I): > ? ?x = f(x) > ? ?i = array(x/dy, dtype=int) > ? ?hist[i,j] += 1 > > figure() > > imshow(hist, > ? ? ? cmap='binary', > ? ? ? 
origin='lower', > ? ? ? interpolation='nearest', > ? ? ? extent=(rmin,rmax,ymin,ymax), > ? ? ? norm=matplotlib.colors.LogNorm()) > > xlabel ('$r$') > ylabel ('$x$') > > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') > > show() > > ==================================================================== > > In [4]: timeit y=f(x) > 10000 loops, best of 3: 19.4 us per loop > > In [5]: timeit i = array(x/dy, dtype=int) > 10000 loops, best of 3: 22 us per loop > > In [6]: timeit img[i,j] += 1 > 10000 loops, best of 3: 119 us per loop > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion This suggests to me that fancy indexing could be quite a bit faster in this case: In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) 10000 loops, best of 3: 20.6 us per loop I wrote a simple Cython method def fancy_inc(ndarray[int64_t, ndim=2] values, ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): cdef: Py_ssize_t i, n = len(iarr) for i in range(n): values[iarr[i], jarr[i]] += inc that does even faster In [8]: timeit sbx.fancy_inc(hist, i, j, 1) 100000 loops, best of 3: 4.85 us per loop About 10% faster if bounds checking and wraparound are disabled. Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? - Wes From matthew.brett at gmail.com Mon Feb 13 19:25:50 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 13 Feb 2012 16:25:50 -0800 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? Message-ID: Hi, I recently noticed a change in the upcasting rules in numpy 1.6.0 / 1.6.1 and I just wanted to check it was intentional. For all versions of numpy I've tested, we have: >>> import numpy as np >>> Adata = np.array([127], dtype=np.int8) >>> Bdata = np.int16(127) >>> (Adata + Bdata).dtype dtype('int8') That is - adding an integer scalar of a larger dtype does not result in upcasting of the output dtype, if the data in the scalar type fits in the smaller. For numpy < 1.6.0 we have this: >>> Bdata = np.int16(128) >>> (Adata + Bdata).dtype dtype('int8') That is - even if the data in the scalar does not fit in the dtype of the array to which it is being added, there is no upcasting. For numpy >= 1.6.0 we have this: >>> Bdata = np.int16(128) >>> (Adata + Bdata).dtype dtype('int16') There is upcasting... I can see why the numpy 1.6.0 way might be preferable but it is an API change I suppose. Best, Matthew From njs at pobox.com Mon Feb 13 19:30:29 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Feb 2012 00:30:29 +0000 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: References: <20281.39813.720754.45947@localhost.localdomain> Message-ID: How would you fix it? I shouldn't speculate without profiling, but I'll be naughty. Presumably the problem is that python turns that into something like hist[i,j] = hist[i,j] + 1 which means there's no way for numpy to avoid creating a temporary array. So maybe this could be fixed by adding a fused __inplace_add__ protocol to the language (and similarly for all the other inplace operators), but that seems really unlikely. Fundamentally this is just the sort of optimization opportunity you miss when you don't have a compiler with a global view; Fortran or c++ expression templates will win every time. Maybe pypy will fix it someday. 
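(A minimal sketch of a workaround that already works with the 1.6-era
NumPy under discussion: batch a number of iterations and let a single
np.bincount over flattened indices do the scatter-add, rather than
running hist[i,j] += 1 on every pass. It assumes the Nx, I, f, dy, j,
x and hist from the original script, plus np.ravel_multi_index and
bincount's minlength from NumPy >= 1.6; the batch size is arbitrary and
a final partial batch is not handled.)

    import numpy as np

    batch = 100
    rows = np.empty((batch, Nx), dtype=int)   # bin index per column, per pass
    for n in xrange(I):
        x = f(x)
        rows[n % batch] = x / dy              # truncates like array(x/dy, dtype=int)
        if n % batch == batch - 1:
            # flatten the (row, column) pairs of the whole batch and count
            # them in one shot, then add the counts into the histogram
            flat = np.ravel_multi_index((rows.ravel(), np.tile(j, batch)),
                                        hist.shape)
            hist += np.bincount(flat, minlength=hist.size).reshape(hist.shape)
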
Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? - N On Feb 14, 2012 12:18 AM, "Wes McKinney" wrote: > On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver > wrote: > > Hi, > > > > I have a short piece of code where the use of an index array "feels > > right", but incurs a severe performance penalty: It's about an order > > of magnitude slower than all other operations with arrays of that > > size. > > > > It comes up in a piece of code which is doing a large number of "on > > the fly" histograms via > > > > hist[i,j] += 1 > > > > where i is an array with the bin index to be incremented and j is > > simply enumerating the histograms. I attach a full short sample code > > below which shows how it's being used in context, and corresponding > > timeit output from the critical code section. > > > > Questions: > > > > - Is this a matter of principle, or due to an inefficient > > implementation? > > - Is there an equivalent way of doing it which is fast? > > > > Regards, > > Marcel > > > > ================================================================= > > > > #! /usr/bin/env python > > # Plot the bifurcation diagram of the logistic map > > > > from pylab import * > > > > Nx = 800 > > Ny = 600 > > I = 50000 > > > > rmin = 2.5 > > rmax = 4.0 > > ymin = 0.0 > > ymax = 1.0 > > > > rr = linspace (rmin, rmax, Nx) > > x = 0.5*ones(rr.shape) > > hist = zeros((Ny+1,Nx), dtype=int) > > j = arange(Nx) > > > > dy = ymax/Ny > > > > def f(x): > > return rr*x*(1.0-x) > > > > for n in xrange(1000): > > x = f(x) > > > > for n in xrange(I): > > x = f(x) > > i = array(x/dy, dtype=int) > > hist[i,j] += 1 > > > > figure() > > > > imshow(hist, > > cmap='binary', > > origin='lower', > > interpolation='nearest', > > extent=(rmin,rmax,ymin,ymax), > > norm=matplotlib.colors.LogNorm()) > > > > xlabel ('$r$') > > ylabel ('$x$') > > > > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') > > > > show() > > > > ==================================================================== > > > > In [4]: timeit y=f(x) > > 10000 loops, best of 3: 19.4 us per loop > > > > In [5]: timeit i = array(x/dy, dtype=int) > > 10000 loops, best of 3: 22 us per loop > > > > In [6]: timeit img[i,j] += 1 > > 10000 loops, best of 3: 119 us per loop > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > This suggests to me that fancy indexing could be quite a bit faster in > this case: > > In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop > In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) > 10000 loops, best of 3: 20.6 us per loop > > I wrote a simple Cython method > > def fancy_inc(ndarray[int64_t, ndim=2] values, > ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): > cdef: > Py_ssize_t i, n = len(iarr) > > for i in range(n): > values[iarr[i], jarr[i]] += inc > > that does even faster > > In [8]: timeit sbx.fancy_inc(hist, i, j, 1) > 100000 loops, best of 3: 4.85 us per loop > > About 10% faster if bounds checking and wraparound are disabled. > > Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? > > - Wes > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wesmckinn at gmail.com Mon Feb 13 19:46:21 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 13 Feb 2012 19:46:21 -0500 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: References: <20281.39813.720754.45947@localhost.localdomain> Message-ID: On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith wrote: > How would you fix it? I shouldn't speculate without profiling, but I'll be > naughty. Presumably the problem is that python turns that into something > like > > hist[i,j] = hist[i,j] + 1 > > which means there's no way for numpy to avoid creating a temporary array. So > maybe this could be fixed by adding a fused __inplace_add__ protocol to the > language (and similarly for all the other inplace operators), but that seems > really unlikely. Fundamentally this is just the sort of optimization > opportunity you miss when you don't have a compiler with a global view; > Fortran or c++ expression templates will win every time. Maybe pypy will fix > it someday. > > Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? > > - N Nope, don't buy it: In [33]: timeit arr.__iadd__(1) 1000 loops, best of 3: 1.13 ms per loop In [37]: timeit arr[:] += 1 1000 loops, best of 3: 1.13 ms per loop - Wes > On Feb 14, 2012 12:18 AM, "Wes McKinney" wrote: >> >> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver >> wrote: >> > Hi, >> > >> > I have a short piece of code where the use of an index array "feels >> > right", but incurs a severe performance penalty: It's about an order >> > of magnitude slower than all other operations with arrays of that >> > size. >> > >> > It comes up in a piece of code which is doing a large number of "on >> > the fly" histograms via >> > >> > ?hist[i,j] += 1 >> > >> > where i is an array with the bin index to be incremented and j is >> > simply enumerating the histograms. ?I attach a full short sample code >> > below which shows how it's being used in context, and corresponding >> > timeit output from the critical code section. >> > >> > Questions: >> > >> > - Is this a matter of principle, or due to an inefficient >> > ?implementation? >> > - Is there an equivalent way of doing it which is fast? >> > >> > Regards, >> > Marcel >> > >> > ================================================================= >> > >> > #! /usr/bin/env python >> > # Plot the bifurcation diagram of the logistic map >> > >> > from pylab import * >> > >> > Nx = 800 >> > Ny = 600 >> > I = 50000 >> > >> > rmin = 2.5 >> > rmax = 4.0 >> > ymin = 0.0 >> > ymax = 1.0 >> > >> > rr = linspace (rmin, rmax, Nx) >> > x = 0.5*ones(rr.shape) >> > hist = zeros((Ny+1,Nx), dtype=int) >> > j = arange(Nx) >> > >> > dy = ymax/Ny >> > >> > def f(x): >> > ? ?return rr*x*(1.0-x) >> > >> > for n in xrange(1000): >> > ? ?x = f(x) >> > >> > for n in xrange(I): >> > ? ?x = f(x) >> > ? ?i = array(x/dy, dtype=int) >> > ? ?hist[i,j] += 1 >> > >> > figure() >> > >> > imshow(hist, >> > ? ? ? cmap='binary', >> > ? ? ? origin='lower', >> > ? ? ? interpolation='nearest', >> > ? ? ? extent=(rmin,rmax,ymin,ymax), >> > ? ? ? 
norm=matplotlib.colors.LogNorm()) >> > >> > xlabel ('$r$') >> > ylabel ('$x$') >> > >> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') >> > >> > show() >> > >> > ==================================================================== >> > >> > In [4]: timeit y=f(x) >> > 10000 loops, best of 3: 19.4 us per loop >> > >> > In [5]: timeit i = array(x/dy, dtype=int) >> > 10000 loops, best of 3: 22 us per loop >> > >> > In [6]: timeit img[i,j] += 1 >> > 10000 loops, best of 3: 119 us per loop >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> This suggests to me that fancy indexing could be quite a bit faster in >> this case: >> >> In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop >> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) >> 10000 loops, best of 3: 20.6 us per loop >> >> I wrote a simple Cython method >> >> def fancy_inc(ndarray[int64_t, ndim=2] values, >> ? ? ? ? ? ? ?ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): >> ? ?cdef: >> ? ? ? ?Py_ssize_t i, n = len(iarr) >> >> ? ?for i in range(n): >> ? ? ? ?values[iarr[i], jarr[i]] += inc >> >> that does even faster >> >> In [8]: timeit sbx.fancy_inc(hist, i, j, 1) >> 100000 loops, best of 3: 4.85 us per loop >> >> About 10% faster if bounds checking and wraparound are disabled. >> >> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? >> >> - Wes >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From wesmckinn at gmail.com Mon Feb 13 19:48:12 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 13 Feb 2012 19:48:12 -0500 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: References: <20281.39813.720754.45947@localhost.localdomain> Message-ID: On Mon, Feb 13, 2012 at 7:46 PM, Wes McKinney wrote: > On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith wrote: >> How would you fix it? I shouldn't speculate without profiling, but I'll be >> naughty. Presumably the problem is that python turns that into something >> like >> >> hist[i,j] = hist[i,j] + 1 >> >> which means there's no way for numpy to avoid creating a temporary array. So >> maybe this could be fixed by adding a fused __inplace_add__ protocol to the >> language (and similarly for all the other inplace operators), but that seems >> really unlikely. Fundamentally this is just the sort of optimization >> opportunity you miss when you don't have a compiler with a global view; >> Fortran or c++ expression templates will win every time. Maybe pypy will fix >> it someday. >> >> Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? >> >> - N > > Nope, don't buy it: > > In [33]: timeit arr.__iadd__(1) > 1000 loops, best of 3: 1.13 ms per loop > > In [37]: timeit arr[:] += 1 > 1000 loops, best of 3: 1.13 ms per loop > > - Wes Actually, apologies, I'm being silly (had too much coffee or something). Python may be doing something nefarious with the hist[i,j] += 1. So both a get, add, then set, which is probably the problem. 
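(For reference, a sketch of the get/add/set sequence the interpreter
runs for the augmented assignment -- this is standard Python subscript
semantics, not code from the thread:)

    tmp = hist[i, j]      # __getitem__: fancy-index gather into a new temporary
    tmp += 1              # __iadd__ on that temporary
    hist[i, j] = tmp      # __setitem__: fancy-index scatter back into hist
    # Note: if (i, j) contained repeated pairs, the final store would write
    # each position only once and duplicate hits would be lost; in the script
    # above j covers every column exactly once per call, so the counts stay
    # correct and only speed is at stake.
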
>> On Feb 14, 2012 12:18 AM, "Wes McKinney" wrote: >>> >>> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver >>> wrote: >>> > Hi, >>> > >>> > I have a short piece of code where the use of an index array "feels >>> > right", but incurs a severe performance penalty: It's about an order >>> > of magnitude slower than all other operations with arrays of that >>> > size. >>> > >>> > It comes up in a piece of code which is doing a large number of "on >>> > the fly" histograms via >>> > >>> > ?hist[i,j] += 1 >>> > >>> > where i is an array with the bin index to be incremented and j is >>> > simply enumerating the histograms. ?I attach a full short sample code >>> > below which shows how it's being used in context, and corresponding >>> > timeit output from the critical code section. >>> > >>> > Questions: >>> > >>> > - Is this a matter of principle, or due to an inefficient >>> > ?implementation? >>> > - Is there an equivalent way of doing it which is fast? >>> > >>> > Regards, >>> > Marcel >>> > >>> > ================================================================= >>> > >>> > #! /usr/bin/env python >>> > # Plot the bifurcation diagram of the logistic map >>> > >>> > from pylab import * >>> > >>> > Nx = 800 >>> > Ny = 600 >>> > I = 50000 >>> > >>> > rmin = 2.5 >>> > rmax = 4.0 >>> > ymin = 0.0 >>> > ymax = 1.0 >>> > >>> > rr = linspace (rmin, rmax, Nx) >>> > x = 0.5*ones(rr.shape) >>> > hist = zeros((Ny+1,Nx), dtype=int) >>> > j = arange(Nx) >>> > >>> > dy = ymax/Ny >>> > >>> > def f(x): >>> > ? ?return rr*x*(1.0-x) >>> > >>> > for n in xrange(1000): >>> > ? ?x = f(x) >>> > >>> > for n in xrange(I): >>> > ? ?x = f(x) >>> > ? ?i = array(x/dy, dtype=int) >>> > ? ?hist[i,j] += 1 >>> > >>> > figure() >>> > >>> > imshow(hist, >>> > ? ? ? cmap='binary', >>> > ? ? ? origin='lower', >>> > ? ? ? interpolation='nearest', >>> > ? ? ? extent=(rmin,rmax,ymin,ymax), >>> > ? ? ? norm=matplotlib.colors.LogNorm()) >>> > >>> > xlabel ('$r$') >>> > ylabel ('$x$') >>> > >>> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') >>> > >>> > show() >>> > >>> > ==================================================================== >>> > >>> > In [4]: timeit y=f(x) >>> > 10000 loops, best of 3: 19.4 us per loop >>> > >>> > In [5]: timeit i = array(x/dy, dtype=int) >>> > 10000 loops, best of 3: 22 us per loop >>> > >>> > In [6]: timeit img[i,j] += 1 >>> > 10000 loops, best of 3: 119 us per loop >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> This suggests to me that fancy indexing could be quite a bit faster in >>> this case: >>> >>> In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop >>> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) >>> 10000 loops, best of 3: 20.6 us per loop >>> >>> I wrote a simple Cython method >>> >>> def fancy_inc(ndarray[int64_t, ndim=2] values, >>> ? ? ? ? ? ? ?ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): >>> ? ?cdef: >>> ? ? ? ?Py_ssize_t i, n = len(iarr) >>> >>> ? ?for i in range(n): >>> ? ? ? ?values[iarr[i], jarr[i]] += inc >>> >>> that does even faster >>> >>> In [8]: timeit sbx.fancy_inc(hist, i, j, 1) >>> 100000 loops, best of 3: 4.85 us per loop >>> >>> About 10% faster if bounds checking and wraparound are disabled. >>> >>> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? 
>>> >>> - Wes >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From wesmckinn at gmail.com Mon Feb 13 19:50:39 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 13 Feb 2012 19:50:39 -0500 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: References: <20281.39813.720754.45947@localhost.localdomain> Message-ID: On Mon, Feb 13, 2012 at 7:48 PM, Wes McKinney wrote: > On Mon, Feb 13, 2012 at 7:46 PM, Wes McKinney wrote: >> On Mon, Feb 13, 2012 at 7:30 PM, Nathaniel Smith wrote: >>> How would you fix it? I shouldn't speculate without profiling, but I'll be >>> naughty. Presumably the problem is that python turns that into something >>> like >>> >>> hist[i,j] = hist[i,j] + 1 >>> >>> which means there's no way for numpy to avoid creating a temporary array. So >>> maybe this could be fixed by adding a fused __inplace_add__ protocol to the >>> language (and similarly for all the other inplace operators), but that seems >>> really unlikely. Fundamentally this is just the sort of optimization >>> opportunity you miss when you don't have a compiler with a global view; >>> Fortran or c++ expression templates will win every time. Maybe pypy will fix >>> it someday. >>> >>> Perhaps it would help to make np.add(hist, 1, out=hist, where=(i,j)) work? >>> >>> - N >> >> Nope, don't buy it: >> >> In [33]: timeit arr.__iadd__(1) >> 1000 loops, best of 3: 1.13 ms per loop >> >> In [37]: timeit arr[:] += 1 >> 1000 loops, best of 3: 1.13 ms per loop >> >> - Wes > > Actually, apologies, I'm being silly (had too much coffee or > something). Python may be doing something nefarious with the hist[i,j] > += 1. So both a get, add, then set, which is probably the problem. > >>> On Feb 14, 2012 12:18 AM, "Wes McKinney" wrote: >>>> >>>> On Mon, Feb 13, 2012 at 6:23 PM, Marcel Oliver >>>> wrote: >>>> > Hi, >>>> > >>>> > I have a short piece of code where the use of an index array "feels >>>> > right", but incurs a severe performance penalty: It's about an order >>>> > of magnitude slower than all other operations with arrays of that >>>> > size. >>>> > >>>> > It comes up in a piece of code which is doing a large number of "on >>>> > the fly" histograms via >>>> > >>>> > ?hist[i,j] += 1 >>>> > >>>> > where i is an array with the bin index to be incremented and j is >>>> > simply enumerating the histograms. ?I attach a full short sample code >>>> > below which shows how it's being used in context, and corresponding >>>> > timeit output from the critical code section. >>>> > >>>> > Questions: >>>> > >>>> > - Is this a matter of principle, or due to an inefficient >>>> > ?implementation? >>>> > - Is there an equivalent way of doing it which is fast? >>>> > >>>> > Regards, >>>> > Marcel >>>> > >>>> > ================================================================= >>>> > >>>> > #! 
/usr/bin/env python >>>> > # Plot the bifurcation diagram of the logistic map >>>> > >>>> > from pylab import * >>>> > >>>> > Nx = 800 >>>> > Ny = 600 >>>> > I = 50000 >>>> > >>>> > rmin = 2.5 >>>> > rmax = 4.0 >>>> > ymin = 0.0 >>>> > ymax = 1.0 >>>> > >>>> > rr = linspace (rmin, rmax, Nx) >>>> > x = 0.5*ones(rr.shape) >>>> > hist = zeros((Ny+1,Nx), dtype=int) >>>> > j = arange(Nx) >>>> > >>>> > dy = ymax/Ny >>>> > >>>> > def f(x): >>>> > ? ?return rr*x*(1.0-x) >>>> > >>>> > for n in xrange(1000): >>>> > ? ?x = f(x) >>>> > >>>> > for n in xrange(I): >>>> > ? ?x = f(x) >>>> > ? ?i = array(x/dy, dtype=int) >>>> > ? ?hist[i,j] += 1 >>>> > >>>> > figure() >>>> > >>>> > imshow(hist, >>>> > ? ? ? cmap='binary', >>>> > ? ? ? origin='lower', >>>> > ? ? ? interpolation='nearest', >>>> > ? ? ? extent=(rmin,rmax,ymin,ymax), >>>> > ? ? ? norm=matplotlib.colors.LogNorm()) >>>> > >>>> > xlabel ('$r$') >>>> > ylabel ('$x$') >>>> > >>>> > title('Attractor of the logistic map $x_{n+1} = r \, x_n (1-x_n)$') >>>> > >>>> > show() >>>> > >>>> > ==================================================================== >>>> > >>>> > In [4]: timeit y=f(x) >>>> > 10000 loops, best of 3: 19.4 us per loop >>>> > >>>> > In [5]: timeit i = array(x/dy, dtype=int) >>>> > 10000 loops, best of 3: 22 us per loop >>>> > >>>> > In [6]: timeit img[i,j] += 1 >>>> > 10000 loops, best of 3: 119 us per loop >>>> > _______________________________________________ >>>> > NumPy-Discussion mailing list >>>> > NumPy-Discussion at scipy.org >>>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> This suggests to me that fancy indexing could be quite a bit faster in >>>> this case: >>>> >>>> In [40]: timeit hist[i,j] += 110000 loops, best of 3: 58.2 us per loop >>>> In [39]: timeit hist.put(np.ravel_multi_index((i, j), hist.shape), 1) >>>> 10000 loops, best of 3: 20.6 us per loop >>>> >>>> I wrote a simple Cython method >>>> >>>> def fancy_inc(ndarray[int64_t, ndim=2] values, >>>> ? ? ? ? ? ? ?ndarray[int64_t] iarr, ndarray[int64_t] jarr, int64_t inc): >>>> ? ?cdef: >>>> ? ? ? ?Py_ssize_t i, n = len(iarr) >>>> >>>> ? ?for i in range(n): >>>> ? ? ? ?values[iarr[i], jarr[i]] += inc >>>> >>>> that does even faster >>>> >>>> In [8]: timeit sbx.fancy_inc(hist, i, j, 1) >>>> 100000 loops, best of 3: 4.85 us per loop >>>> >>>> About 10% faster if bounds checking and wraparound are disabled. >>>> >>>> Kind of a bummer-- perhaps this should go high on the NumPy 2.0 TODO list? >>>> >>>> - Wes >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> But: In [40]: timeit hist[i, j] 10000 loops, best of 3: 32 us per loop So that's roughly 7-8x slower than a simple Cython method, so I sincerely hope it could be brought down to the sub 10 microsecond level with a little bit of work. From asmeurer at gmail.com Mon Feb 13 19:53:25 2012 From: asmeurer at gmail.com (Aaron Meurer) Date: Mon, 13 Feb 2012 17:53:25 -0700 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: I'd like the ability to make "in" (i.e., __contains__) return something other than a bool. 
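A small sketch of the current limitation (plain CPython; the class below is
hypothetical): the "in" operator coerces whatever __contains__ returns down
to a bool, so the rich result is only reachable by calling the method
directly.

    class SymbolicSet(object):
        def __contains__(self, item):
            # return a rich object instead of True/False
            return ("contains?", item)

    s = SymbolicSet()
    print(s.__contains__(3))   # ('contains?', 3)
    print(3 in s)              # True -- the tuple gets collapsed to a bool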
Also, the ability to make the x < y < z syntax would be useful. It's been suggested that the ability to override the boolean operators (and, or, not) would be the way to do this (pep 335), though I'm not 100% convinced that's the way to go. Aaron Meurer On Mon, Feb 13, 2012 at 2:55 PM, Fernando Perez wrote: > Hi folks, > > [ I'm broadcasting this widely for maximum reach, but I'd appreciate > it if replies can be kept to the *numpy* list, which is sort of the > 'base' list for scientific/numerical work. ?It will make it much > easier to organize a coherent set of notes later on. ?Apology if > you're subscribed to all and get it 10 times. ] > > As part of the PyData workshop (http://pydataworkshop.eventbrite.com) > to be held March 2 and 3 at the Mountain View Google offices, we have > scheduled a session for an open discussion with Guido van Rossum and > hopefully as many core python-dev members who can make it. ?We wanted > to seize the combined opportunity of the PyData workshop bringing a > number of 'scipy people' to Google with the timeline for Python 3.3, > the first release after the Python language moratorium, being within > sight: http://www.python.org/dev/peps/pep-0398. > > While a number of scientific Python packages are already available for > Python 3 (either in released form or in their master git branches), > it's fair to say that there hasn't been a major transition of the > scientific community to Python3. ?Since there is no more development > being done on the Python2 series, eventually we will all want to find > ways to make this transition, and we think that this is an excellent > time to engage the core python development team and consider ideas > that would make Python3 generally a more appealing language for > scientific work. ?Guido has made it clear that he doesn't speak for > the day-to-day development of Python anymore, so we all should be > aware that any ideas that come out of this panel will still need to be > discussed with python-dev itself via standard mechanisms before > anything is implemented. ?Nonetheless, the opportunity for a solid > face-to-face dialog for brainstorming was too good to pass up. > > The purpose of this email is then to solicit, from all of our > community, ideas for this discussion. ?In a week or so we'll need to > summarize the main points brought up here and make a more concrete > agenda out of it; I will also post a summary of the meeting afterwards > here. > > Anything is a valid topic, some points just to get the conversation started: > > - Extra operators/PEP 225. ?Here's a summary from the last time we > went over this, years ago at Scipy 2008: > http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, > and the current status of the document we wrote about it is here: > file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. > > - Improved syntax/support for rationals or decimal literals? ?While > Python now has both decimals > (http://docs.python.org/library/decimal.html) and rationals > (http://docs.python.org/library/fractions.html), they're quite clunky > to use because they require full constructor calls. ?Guido has > mentioned in previous discussions toying with ideas about support for > different kinds of numeric literals... > > - Using the numpy docstring standard python-wide, and thus having > python improve the pathetic state of the stdlib's docstrings? 
?This is > an area where our community is light years ahead of the standard > library, but we'd all benefit from Python itself improving on this > front. ?I'm toying with the idea of giving a lighting talk at PyConn > about this, comparing the great, robust culture and tools of good > docstrings across the Scipy ecosystem with the sad, sad state of > docstrings in the stdlib. ?It might spur some movement on that front > from the stdlib authors, esp. if the core python-dev team realizes the > value and benefit it can bring (at relatively low cost, given how most > of the information does exist, it's just in the wrong places). ?But > more importantly for us, if there was truly a universal standard for > high-quality docstrings across Python projects, building good > documentation/help machinery would be a lot easier, as we'd know what > to expect and search for (such as rendering them nicely in the ipython > notebook, providing high-quality cross-project help search, etc). > > - Literal syntax for arrays? ?Sage has been floating a discussion > about a literal matrix syntax > (https://groups.google.com/forum/#!topic/sage-devel/mzwepqZBHnA). ?For > something like this to go into python in any meaningful way there > would have to be core multidimensional arrays in the language, but > perhaps it's time to think about a piece of the numpy array itself > into Python? ?This is one of the more 'out there' ideas, but after > all, that's the point of a discussion like this, especially > considering we'll have both Travis and Guido in one room. > > - Other syntactic sugar? Sage has "a..b" <=> range(a, b+1), which I > actually think is ?both nice and useful... There's also the question > of allowing "a:b:c" notation outside of [], which has come up a few > times in conversation over the last few years. Others? > > - The packaging quagmire? ?This continues to be a problem, though > python3 does have new improvements to distutils. ?I'm not really up to > speed on the situation, to be frank. ?If we want to bring this up, > someone will have to provide a solid reference or volunteer to do it > in person. > > - etc... > > I'm putting the above just to *start* the discussion, but the real > point is for the rest of the community to contribute ideas, so don't > be shy. > > Final note: while I am here commiting to organizing and presenting > this at the discussion with Guido (as well as contacting python-dev), > I would greatly appreciate help with the task of summarizing this > prior to the meeting as I'm pretty badly swamped in the run-in to > pydata/pycon. ?So if anyone is willing to help draft the summary as > the date draws closer (we can put it up on a github wiki, gist, > whatever), I will be very grateful. ?I'm sure it will be better than > what I'll otherwise do the last night at 2am :) > > Cheers, > > f > > ps - to the obvious question about webcasting the discussion live for > remote participation: yes, we looked into it already; no, > unfortunately it appears it won't be possible. ?We'll try to at least > have the audio recorded (and possibly video) for posting later on. > > pps- if you are close to Mountain View and are interested in attending > this panel in person, drop me a line at fernando.perez at berkeley.edu. > We have a few spots available *for this discussion only* on top of the > pydata regular attendance (which is long closed, I'm afraid). ?But > we'll need to provide Google with a list of those attendees in > advance. 
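To make the rationals/decimals bullet above concrete, this is the
constructor-call overhead as it stands today (plain stdlib, nothing
hypothetical):

    from fractions import Fraction
    from decimal import Decimal

    # every exact literal needs an explicit constructor call
    x = Fraction(1, 3) + Fraction(2, 5)    # Fraction(11, 15)
    y = Decimal("0.1") + Decimal("0.2")    # Decimal('0.3')

A dedicated literal syntax would make these as cheap to write as
floating-point literals are now.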
?Please indicate if you are a core python committer in your > email, as we'll give priority for this overflow pool to core python > developers (but will otherwise accommodate as many people as Google > lets us). > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev From travis at continuum.io Mon Feb 13 20:00:37 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 13 Feb 2012 19:00:37 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: Message-ID: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> Hmmm. This seems like a regression. The scalar casting API was fairly intentional. What is the reason for the change? -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 6:25 PM, Matthew Brett wrote: > Hi, > > I recently noticed a change in the upcasting rules in numpy 1.6.0 / > 1.6.1 and I just wanted to check it was intentional. > > For all versions of numpy I've tested, we have: > >>>> import numpy as np >>>> Adata = np.array([127], dtype=np.int8) >>>> Bdata = np.int16(127) >>>> (Adata + Bdata).dtype > dtype('int8') > > That is - adding an integer scalar of a larger dtype does not result > in upcasting of the output dtype, if the data in the scalar type fits > in the smaller. > > For numpy < 1.6.0 we have this: > >>>> Bdata = np.int16(128) >>>> (Adata + Bdata).dtype > dtype('int8') > > That is - even if the data in the scalar does not fit in the dtype of > the array to which it is being added, there is no upcasting. > > For numpy >= 1.6.0 we have this: > >>>> Bdata = np.int16(128) >>>> (Adata + Bdata).dtype > dtype('int16') > > There is upcasting... > > I can see why the numpy 1.6.0 way might be preferable but it is an API > change I suppose. > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Mon Feb 13 20:20:52 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 13 Feb 2012 17:20:52 -0800 Subject: [Numpy-discussion] can_cast with structured array output - bug? Message-ID: Hi, I've also just noticed this oddity: In [17]: np.can_cast('c', 'u1') Out[17]: False OK so far, but... In [18]: np.can_cast('c', [('f1', 'u1')]) Out[18]: True In [19]: np.can_cast('c', [('f1', 'u1')], 'safe') Out[19]: True In [20]: np.can_cast(np.ones(10, dtype='c'), [('f1', 'u1')]) Out[20]: True I think this must be a bug. In the other direction, it makes more sense to me: In [24]: np.can_cast([('f1', 'u1')], 'c') Out[24]: False In [25]: np.can_cast([('f1', 'u1')], [('f1', 'u1')]) Out[25]: True Best, Matthew From mwwiebe at gmail.com Mon Feb 13 20:58:45 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 13 Feb 2012 17:58:45 -0800 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: > Hmmm. This seems like a regression. The scalar casting API was fairly > intentional. > > What is the reason for the change? > In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. 
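For anyone who wants to reproduce this on their own install, Matthew's two
cases above boil down to the following (a minimal sketch; the comments
describe the reported behavior, with 1.5.x keeping int8 in both cases):

    import numpy as np

    a = np.array([127], dtype=np.int8)
    print((a + np.int16(127)).dtype)   # int8  -- the scalar's value fits, no upcast
    print((a + np.int16(128)).dtype)   # int16 on 1.6.x; int8 on 1.5.x, where the
                                       # addition overflows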
There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example "a+b" could result in a different type than "b+a". I recall there being some bugs in the tracker related to this as well, but I don't remember those details. This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned -> signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting "The ufunc uses a more consistent algorithm for loop selection.": http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. Cheers, Mark -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Feb 13, 2012, at 6:25 PM, Matthew Brett > wrote: > > > Hi, > > > > I recently noticed a change in the upcasting rules in numpy 1.6.0 / > > 1.6.1 and I just wanted to check it was intentional. > > > > For all versions of numpy I've tested, we have: > > > >>>> import numpy as np > >>>> Adata = np.array([127], dtype=np.int8) > >>>> Bdata = np.int16(127) > >>>> (Adata + Bdata).dtype > > dtype('int8') > > > > That is - adding an integer scalar of a larger dtype does not result > > in upcasting of the output dtype, if the data in the scalar type fits > > in the smaller. > > > > For numpy < 1.6.0 we have this: > > > >>>> Bdata = np.int16(128) > >>>> (Adata + Bdata).dtype > > dtype('int8') > > > > That is - even if the data in the scalar does not fit in the dtype of > > the array to which it is being added, there is no upcasting. > > > > For numpy >= 1.6.0 we have this: > > > >>>> Bdata = np.int16(128) > >>>> (Adata + Bdata).dtype > > dtype('int16') > > > > There is upcasting... > > > > I can see why the numpy 1.6.0 way might be preferable but it is an API > > change I suppose. > > > > Best, > > > > Matthew > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan at ajackson.org Mon Feb 13 21:07:04 2012 From: alan at ajackson.org (alan at ajackson.org) Date: Mon, 13 Feb 2012 20:07:04 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: References: Message-ID: <20120213200704.79873e2b@ajackson.org> >I'm wondering about using one of these commercial issue tracking plans for NumPy and would like thoughts and comments. Both of these plans allow Open Source projects to have unlimited plans for free. > >JIRA: > >http://www.atlassian.com/software/jira/overview/tour/code-integration > > At work we just transitioned off JIRA to TFS. 
Have to say, for bug tracking, JIRA was a lot better than TFS, not too good as a planning tool though. It is quite customizable and flexible. Nice ability to set up automatic e-mails and such as well. -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From mwwiebe at gmail.com Mon Feb 13 21:19:16 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 13 Feb 2012 18:19:16 -0800 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: It might be nice to turn the matrix class into a short class hierarchy, something like this: class MatrixBase class DenseMatrix(MatrixBase) class TriangularMatrix(MatrixBase) # Maybe a few variations of upper/lower triangular and whether the diagonal is stored class SymmetricMatrix(MatrixBase) These other matrix classes could use packed storage, and could call the specific optimized BLAS/LAPACK functions to get higher performance when it is known the matrix is triangular or symmetric. I'm not sure whether this affects the discussion of the matrix * and \ operators, but it's a possibility to consider. -Mark On Mon, Feb 13, 2012 at 4:53 PM, Aaron Meurer wrote: > I'd like the ability to make "in" (i.e., __contains__) return > something other than a bool. > > Also, the ability to make the x < y < z syntax would be useful. It's > been suggested that the ability to override the boolean operators > (and, or, not) would be the way to do this (pep 335), though I'm not > 100% convinced that's the way to go. > > Aaron Meurer > > On Mon, Feb 13, 2012 at 2:55 PM, Fernando Perez > wrote: > > Hi folks, > > > > [ I'm broadcasting this widely for maximum reach, but I'd appreciate > > it if replies can be kept to the *numpy* list, which is sort of the > > 'base' list for scientific/numerical work. It will make it much > > easier to organize a coherent set of notes later on. Apology if > > you're subscribed to all and get it 10 times. ] > > > > As part of the PyData workshop (http://pydataworkshop.eventbrite.com) > > to be held March 2 and 3 at the Mountain View Google offices, we have > > scheduled a session for an open discussion with Guido van Rossum and > > hopefully as many core python-dev members who can make it. We wanted > > to seize the combined opportunity of the PyData workshop bringing a > > number of 'scipy people' to Google with the timeline for Python 3.3, > > the first release after the Python language moratorium, being within > > sight: http://www.python.org/dev/peps/pep-0398. > > > > While a number of scientific Python packages are already available for > > Python 3 (either in released form or in their master git branches), > > it's fair to say that there hasn't been a major transition of the > > scientific community to Python3. Since there is no more development > > being done on the Python2 series, eventually we will all want to find > > ways to make this transition, and we think that this is an excellent > > time to engage the core python development team and consider ideas > > that would make Python3 generally a more appealing language for > > scientific work. 
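Coming back to the matrix-hierarchy idea at the top of this message, here is
a rough sketch of how the specialized solves could dispatch. The class and
method names are illustrative only (not an agreed API), and it assumes SciPy
is available for the triangular LAPACK wrapper:

    import numpy as np
    from scipy.linalg import solve_triangular

    class MatrixBase(object):
        def solve(self, b):
            raise NotImplementedError

    class DenseMatrix(MatrixBase):
        def __init__(self, data):
            self.data = np.asarray(data)
        def solve(self, b):
            # general LU-based solve
            return np.linalg.solve(self.data, b)

    class TriangularMatrix(MatrixBase):
        def __init__(self, data, lower=False):
            self.data = np.asarray(data)
            self.lower = lower
        def solve(self, b):
            # dispatch straight to the LAPACK triangular solver (*trtrs)
            return solve_triangular(self.data, b, lower=self.lower)

A SymmetricMatrix subclass would dispatch to the symmetric/Cholesky routines
in the same way, and packed storage could live behind the same interface.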
Guido has made it clear that he doesn't speak for > > the day-to-day development of Python anymore, so we all should be > > aware that any ideas that come out of this panel will still need to be > > discussed with python-dev itself via standard mechanisms before > > anything is implemented. Nonetheless, the opportunity for a solid > > face-to-face dialog for brainstorming was too good to pass up. > > > > The purpose of this email is then to solicit, from all of our > > community, ideas for this discussion. In a week or so we'll need to > > summarize the main points brought up here and make a more concrete > > agenda out of it; I will also post a summary of the meeting afterwards > > here. > > > > Anything is a valid topic, some points just to get the conversation > started: > > > > - Extra operators/PEP 225. Here's a summary from the last time we > > went over this, years ago at Scipy 2008: > > > http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, > > and the current status of the document we wrote about it is here: > > > file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. > > > > - Improved syntax/support for rationals or decimal literals? While > > Python now has both decimals > > (http://docs.python.org/library/decimal.html) and rationals > > (http://docs.python.org/library/fractions.html), they're quite clunky > > to use because they require full constructor calls. Guido has > > mentioned in previous discussions toying with ideas about support for > > different kinds of numeric literals... > > > > - Using the numpy docstring standard python-wide, and thus having > > python improve the pathetic state of the stdlib's docstrings? This is > > an area where our community is light years ahead of the standard > > library, but we'd all benefit from Python itself improving on this > > front. I'm toying with the idea of giving a lighting talk at PyConn > > about this, comparing the great, robust culture and tools of good > > docstrings across the Scipy ecosystem with the sad, sad state of > > docstrings in the stdlib. It might spur some movement on that front > > from the stdlib authors, esp. if the core python-dev team realizes the > > value and benefit it can bring (at relatively low cost, given how most > > of the information does exist, it's just in the wrong places). But > > more importantly for us, if there was truly a universal standard for > > high-quality docstrings across Python projects, building good > > documentation/help machinery would be a lot easier, as we'd know what > > to expect and search for (such as rendering them nicely in the ipython > > notebook, providing high-quality cross-project help search, etc). > > > > - Literal syntax for arrays? Sage has been floating a discussion > > about a literal matrix syntax > > (https://groups.google.com/forum/#!topic/sage-devel/mzwepqZBHnA). For > > something like this to go into python in any meaningful way there > > would have to be core multidimensional arrays in the language, but > > perhaps it's time to think about a piece of the numpy array itself > > into Python? This is one of the more 'out there' ideas, but after > > all, that's the point of a discussion like this, especially > > considering we'll have both Travis and Guido in one room. > > > > - Other syntactic sugar? Sage has "a..b" <=> range(a, b+1), which I > > actually think is both nice and useful... 
There's also the question > > of allowing "a:b:c" notation outside of [], which has come up a few > > times in conversation over the last few years. Others? > > > > - The packaging quagmire? This continues to be a problem, though > > python3 does have new improvements to distutils. I'm not really up to > > speed on the situation, to be frank. If we want to bring this up, > > someone will have to provide a solid reference or volunteer to do it > > in person. > > > > - etc... > > > > I'm putting the above just to *start* the discussion, but the real > > point is for the rest of the community to contribute ideas, so don't > > be shy. > > > > Final note: while I am here commiting to organizing and presenting > > this at the discussion with Guido (as well as contacting python-dev), > > I would greatly appreciate help with the task of summarizing this > > prior to the meeting as I'm pretty badly swamped in the run-in to > > pydata/pycon. So if anyone is willing to help draft the summary as > > the date draws closer (we can put it up on a github wiki, gist, > > whatever), I will be very grateful. I'm sure it will be better than > > what I'll otherwise do the last night at 2am :) > > > > Cheers, > > > > f > > > > ps - to the obvious question about webcasting the discussion live for > > remote participation: yes, we looked into it already; no, > > unfortunately it appears it won't be possible. We'll try to at least > > have the audio recorded (and possibly video) for posting later on. > > > > pps- if you are close to Mountain View and are interested in attending > > this panel in person, drop me a line at fernando.perez at berkeley.edu. > > We have a few spots available *for this discussion only* on top of the > > pydata regular attendance (which is long closed, I'm afraid). But > > we'll need to provide Google with a list of those attendees in > > advance. Please indicate if you are a core python committer in your > > email, as we'll give priority for this overflow pool to core python > > developers (but will otherwise accommodate as many people as Google > > lets us). > > _______________________________________________ > > IPython-dev mailing list > > IPython-dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/ipython-dev > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Mon Feb 13 22:02:38 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 13 Feb 2012 19:02:38 -0800 Subject: [Numpy-discussion] can_cast with structured array output - bug? In-Reply-To: References: Message-ID: I took a look into the code to see what is causing this, and the reason is that nothing has ever been implemented to deal with the fields. This means it falls back to treating all struct dtypes as if they were a plain "void" dtype, which allows anything to be cast to it. While I was redoing the casting subsystem for 1.6, I did think on this issue, and decided that it wasn't worth tackling it at the time because the 'safe'/'same_kind'/'unsafe' don't seem sufficient to handle what might be desired. I tried to leave this alone as much as possible. Some random thoughts about this are: * Casting a scalar to a struct dtype: should it be safe if the scalar can be safely cast to each member of the struct dtype? 
This is the NumPy broadcasting rule applied to dtypes as if the struct dtype is another dimension. * Casting one struct dtype to another: If the fields of the source are a subset of the target, and the types can safely convert, should that be a safe cast? If the fields of the source are not a subset of the target, should that still be a same_kind cast? Should a second enum which complements the safe/same_kind/unsafe one, but is specific for how adding/removing struct fields be added? This is closely related to adding ufunc support for struct dtypes, and the choices here should probably be decided at the same time as designing how the ufuncs should work. -Mark On Mon, Feb 13, 2012 at 5:20 PM, Matthew Brett wrote: > Hi, > > I've also just noticed this oddity: > > In [17]: np.can_cast('c', 'u1') > Out[17]: False > > OK so far, but... > > In [18]: np.can_cast('c', [('f1', 'u1')]) > Out[18]: True > > In [19]: np.can_cast('c', [('f1', 'u1')], 'safe') > Out[19]: True > > In [20]: np.can_cast(np.ones(10, dtype='c'), [('f1', 'u1')]) > Out[20]: True > > I think this must be a bug. > > In the other direction, it makes more sense to me: > > In [24]: np.can_cast([('f1', 'u1')], 'c') > Out[24]: False > > In [25]: np.can_cast([('f1', 'u1')], [('f1', 'u1')]) > Out[25]: True > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Feb 13 22:11:06 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 13 Feb 2012 21:11:06 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> Message-ID: <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. Another thing I noticed is that I thought that int16 scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. We will need to look in detail at what has changed. I will write a test to do that. Thanks, Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: > On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: > Hmmm. This seems like a regression. The scalar casting API was fairly intentional. > > What is the reason for the change? > > In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. 
There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example "a+b" could result in a different type than "b+a". I recall there being some bugs in the tracker related to this as well, but I don't remember those details. > > This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned -> signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. > > During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting "The ufunc uses a more consistent algorithm for loop selection.": > > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html > > The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html > > Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. > > Cheers, > Mark > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Feb 13, 2012, at 6:25 PM, Matthew Brett wrote: > > > Hi, > > > > I recently noticed a change in the upcasting rules in numpy 1.6.0 / > > 1.6.1 and I just wanted to check it was intentional. > > > > For all versions of numpy I've tested, we have: > > > >>>> import numpy as np > >>>> Adata = np.array([127], dtype=np.int8) > >>>> Bdata = np.int16(127) > >>>> (Adata + Bdata).dtype > > dtype('int8') > > > > That is - adding an integer scalar of a larger dtype does not result > > in upcasting of the output dtype, if the data in the scalar type fits > > in the smaller. > > > > For numpy < 1.6.0 we have this: > > > >>>> Bdata = np.int16(128) > >>>> (Adata + Bdata).dtype > > dtype('int8') > > > > That is - even if the data in the scalar does not fit in the dtype of > > the array to which it is being added, there is no upcasting. > > > > For numpy >= 1.6.0 we have this: > > > >>>> Bdata = np.int16(128) > >>>> (Adata + Bdata).dtype > > dtype('int16') > > > > There is upcasting... > > > > I can see why the numpy 1.6.0 way might be preferable but it is an API > > change I suppose. > > > > Best, > > > > Matthew > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Mon Feb 13 22:40:38 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 13 Feb 2012 19:40:38 -0800 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? 
In-Reply-To: <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> Message-ID: I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. Thanks, -Mark On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: > The problem is that these sorts of things take a while to emerge. The > original system was more consistent than I think you give it credit. What > you are seeing is that most people get NumPy from distributions and are > relying on us to keep things consistent. > > The scalar coercion rules were deterministic and based on the idea that a > scalar does not determine the output dtype unless it is of a different > kind. The new code changes that unfortunately. > > Another thing I noticed is that I thought that int16 scalar float > would produce float32 originally. This seems to have changed, but I need > to check on an older version of NumPy. > > Changing the scalar coercion rules is an unfortunate substantial change in > semantics and should not have happened in the 1.X series. > > I understand you did not get a lot of feedback and spent a lot of time on > the code which we all appreciate. I worked to stay true to the Numeric > casting rules incorporating the changes to prevent scalar upcasting due to > the absence of single precision Numeric literals in Python. > > We will need to look in detail at what has changed. I will write a test > to do that. > > Thanks, > > Travis > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: > > On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: > >> Hmmm. This seems like a regression. The scalar casting API was fairly >> intentional. >> >> What is the reason for the change? >> > > In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite > this subsystem. 
There were virtually no tests in the test suite specifying > what the expected behavior should be, and there were clear inconsistencies > where for example "a+b" could result in a different type than "b+a". I > recall there being some bugs in the tracker related to this as well, but I > don't remember those details. > > This change felt like an obvious extension of an existing behavior for > eliminating overflow, where the promotion changed unsigned -> signed based > on the value of the scalar. This change introduced minimal upcasting only > in a set of cases where an overflow was guaranteed to happen without that > upcasting. > > During the 1.6 beta period, I signaled that this subsystem had changed, as > the bullet point starting "The ufunc uses a more consistent algorithm for > loop selection.": > > http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html > > The behavior Matthew has observed is a direct result of how I designed the > minimization function mentioned in that bullet point, and the algorithm for > it is documented in the 'Notes' section of the result_type page: > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html > > Hopefully that explains it well enough. I made the change intentionally > and carefully, tested its impact on SciPy and other projects, and advocated > for it during the release cycle. > > Cheers, > Mark > > -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 6:25 PM, Matthew Brett >> wrote: >> >> > Hi, >> > >> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >> > 1.6.1 and I just wanted to check it was intentional. >> > >> > For all versions of numpy I've tested, we have: >> > >> >>>> import numpy as np >> >>>> Adata = np.array([127], dtype=np.int8) >> >>>> Bdata = np.int16(127) >> >>>> (Adata + Bdata).dtype >> > dtype('int8') >> > >> > That is - adding an integer scalar of a larger dtype does not result >> > in upcasting of the output dtype, if the data in the scalar type fits >> > in the smaller. >> > >> > For numpy < 1.6.0 we have this: >> > >> >>>> Bdata = np.int16(128) >> >>>> (Adata + Bdata).dtype >> > dtype('int8') >> > >> > That is - even if the data in the scalar does not fit in the dtype of >> > the array to which it is being added, there is no upcasting. >> > >> > For numpy >= 1.6.0 we have this: >> > >> >>>> Bdata = np.int16(128) >> >>>> (Adata + Bdata).dtype >> > dtype('int16') >> > >> > There is upcasting... >> > >> > I can see why the numpy 1.6.0 way might be preferable but it is an API >> > change I suppose. >> > >> > Best, >> > >> > Matthew >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d.s.seljebotn at astro.uio.no Mon Feb 13 23:01:54 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 13 Feb 2012 20:01:54 -0800 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: <4F39DCB2.7060007@astro.uio.no> On 02/13/2012 06:19 PM, Mark Wiebe wrote: > It might be nice to turn the matrix class into a short class hierarchy, > something like this: > > class MatrixBase > class DenseMatrix(MatrixBase) > class TriangularMatrix(MatrixBase) # Maybe a few variations of > upper/lower triangular and whether the diagonal is stored > class SymmetricMatrix(MatrixBase) > > These other matrix classes could use packed storage, and could call the > specific optimized BLAS/LAPACK functions to get higher performance when > it is known the matrix is triangular or symmetric. I'm not sure whether > this affects the discussion of the matrix * and \ operators, but it's a > possibility to consider. I've been working on exactly this (+ some more) in January, and will be continuing to in the months to come. (Can write more tomorrow if anybody's interested -- or email me directly as I don't have a 0.1 release to show yet -- got to go now) Dag From ben.root at ou.edu Mon Feb 13 23:03:07 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 13 Feb 2012 22:03:07 -0600 Subject: [Numpy-discussion] [SciPy-Dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: On Monday, February 13, 2012, Aaron Meurer wrote: > I'd like the ability to make "in" (i.e., __contains__) return > something other than a bool. > > Also, the ability to make the x < y < z syntax would be useful. It's > been suggested that the ability to override the boolean operators > (and, or, not) would be the way to do this (pep 335), though I'm not > 100% convinced that's the way to go. > > Aaron Meurer +1 on these syntax ideas, however I do agree that it might be a bit problematic. Also, I remember once talking about labeled arrays and discussing ways to index them and ways to indicate which axis the indexing was for. That might require some sort of syntax changes. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Feb 13 23:04:27 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 13 Feb 2012 22:04:27 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> Message-ID: <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system. If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. 
It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced. NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests. It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing. Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. -Travis On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: > > https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 > > Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. > > Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. > > Thanks, > -Mark > > On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: > The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. > > The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. 
The new code changes that unfortunately. > > Another thing I noticed is that I thought that int16 scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. > > Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. > > I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. > > We will need to look in detail at what has changed. I will write a test to do that. > > Thanks, > > Travis > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: > >> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: >> Hmmm. This seems like a regression. The scalar casting API was fairly intentional. >> >> What is the reason for the change? >> >> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example "a+b" could result in a different type than "b+a". I recall there being some bugs in the tracker related to this as well, but I don't remember those details. >> >> This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned -> signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. >> >> During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting "The ufunc uses a more consistent algorithm for loop selection.": >> >> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> >> The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: >> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> >> Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. >> >> Cheers, >> Mark >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 6:25 PM, Matthew Brett wrote: >> >> > Hi, >> > >> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >> > 1.6.1 and I just wanted to check it was intentional. >> > >> > For all versions of numpy I've tested, we have: >> > >> >>>> import numpy as np >> >>>> Adata = np.array([127], dtype=np.int8) >> >>>> Bdata = np.int16(127) >> >>>> (Adata + Bdata).dtype >> > dtype('int8') >> > >> > That is - adding an integer scalar of a larger dtype does not result >> > in upcasting of the output dtype, if the data in the scalar type fits >> > in the smaller. >> > >> > For numpy < 1.6.0 we have this: >> > >> >>>> Bdata = np.int16(128) >> >>>> (Adata + Bdata).dtype >> > dtype('int8') >> > >> > That is - even if the data in the scalar does not fit in the dtype of >> > the array to which it is being added, there is no upcasting. 
>> > >> > For numpy >= 1.6.0 we have this: >> > >> >>>> Bdata = np.int16(128) >> >>>> (Adata + Bdata).dtype >> > dtype('int16') >> > >> > There is upcasting... >> > >> > I can see why the numpy 1.6.0 way might be preferable but it is an API >> > change I suppose. >> > >> > Best, >> > >> > Matthew >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Feb 13 23:08:39 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 13 Feb 2012 22:08:39 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> Message-ID: I can also confirm that at least on NumPy 1.5.1: integer array * (literal Python float scalar) --- creates a double result. So, my memory was incorrect on that (unless it changed at an earlier release, but I don't think so). -Travis On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: > > https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 > > Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. > > Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. 
It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. > > Thanks, > -Mark > > On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: > The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. > > The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. > > Another thing I noticed is that I thought that int16 scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. > > Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. > > I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. > > We will need to look in detail at what has changed. I will write a test to do that. > > Thanks, > > Travis > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: > >> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: >> Hmmm. This seems like a regression. The scalar casting API was fairly intentional. >> >> What is the reason for the change? >> >> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example "a+b" could result in a different type than "b+a". I recall there being some bugs in the tracker related to this as well, but I don't remember those details. >> >> This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned -> signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. >> >> During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting "The ufunc uses a more consistent algorithm for loop selection.": >> >> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> >> The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: >> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> >> Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. >> >> Cheers, >> Mark >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 6:25 PM, Matthew Brett wrote: >> >> > Hi, >> > >> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >> > 1.6.1 and I just wanted to check it was intentional. 
>> > >> > For all versions of numpy I've tested, we have: >> > >> >>>> import numpy as np >> >>>> Adata = np.array([127], dtype=np.int8) >> >>>> Bdata = np.int16(127) >> >>>> (Adata + Bdata).dtype >> > dtype('int8') >> > >> > That is - adding an integer scalar of a larger dtype does not result >> > in upcasting of the output dtype, if the data in the scalar type fits >> > in the smaller. >> > >> > For numpy < 1.6.0 we have this: >> > >> >>>> Bdata = np.int16(128) >> >>>> (Adata + Bdata).dtype >> > dtype('int8') >> > >> > That is - even if the data in the scalar does not fit in the dtype of >> > the array to which it is being added, there is no upcasting. >> > >> > For numpy >= 1.6.0 we have this: >> > >> >>>> Bdata = np.int16(128) >> >>>> (Adata + Bdata).dtype >> > dtype('int16') >> > >> > There is upcasting... >> > >> > I can see why the numpy 1.6.0 way might be preferable but it is an API >> > change I suppose. >> > >> > Best, >> > >> > Matthew >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Feb 13 23:14:44 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 13 Feb 2012 21:14:44 -0700 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant wrote: > I disagree with your assessment of the subscript operator, but I'm sure we > will have plenty of time to discuss that. I don't think it's correct to > compare the corner cases of the fancy indexing and regular indexing to the > corner cases of type coercion system. If you recall, I was quite nervous > about all the changes you made to the coercion rules because I didn't > believe you fully understood what had been done before and I knew there was > not complete test coverage. > > It is true that both systems have emerged from a long history and could > definitely use fresh perspectives which we all appreciate you and others > bringing. It is also true that few are aware of the details of how things > are actually implemented and that there are corner cases that are basically > defined by the algorithm used (this is more true of the type-coercion > system than fancy-indexing, however). > > I think it would have been wise to write those extensive tests prior to > writing new code. 
I'm curious if what you were expecting for the output > was derived from what earlier versions of NumPy produced. NumPy has > never been in a state where you could just re-factor at will and assume > that tests will catch all intended use cases. Numeric before it was not > in that state either. This is a good goal, and we always welcome new > tests. It just takes a lot of time and a lot of tedious work that the > volunteer labor to this point have not had the time to do. > > Very few of us have ever been paid to work on NumPy directly and have > often been trying to fit in improvements to the code base between other > jobs we are supposed to be doing. Of course, you and I are hoping to > change that this year and look forward to the code quality improving > commensurately. > > Thanks for all you are doing. I also agree that Rolf and Charles > have-been and are invaluable in the maintenance and progress of NumPy and > SciPy. They deserve as much praise and kudos as anyone can give them. > > Well, the typecasting wasn't perfect and, as Mark points out, it wasn't commutative. The addition of float16 also complicated the picture, and user types is going to do more in that direction. And I don't see how a new developer should be responsible for tests enforcing old traditions, the original developers should be responsible for those. But history is history, it didn't happen that way, and here we are. That said, I think we need to show a little flexibility in the corner cases. And going forward I think that typecasting is going to need a rethink. Chuck On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > > I believe the main lessons to draw from this are just how incredibly > important a complete test suite and staying on top of code reviews are. I'm > of the opinion that any explicit design choice of this nature should be > reflected in the test suite, so that if someone changes it years later, > they get immediate feedback that they're breaking something important. > NumPy has gradually increased its test suite coverage, and when I dealt > with the type promotion subsystem, I added fairly extensive tests: > > > https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 > > Another subsystem which is in a similar state as what the type promotion > subsystem was, is the subscript operator and how regular/fancy indexing > work. What this means is that any attempt to improve it that doesn't > coincide with the original intent years ago can easily break things that > were originally intended without them being caught by a test. I believe > this subsystem needs improvement, and the transition to new/improved code > will probably be trickier to manage than for the dtype promotion case. > > Let's try to learn from the type promotion case as best we can, and use it > to improve NumPy's process. I believe Charles and Ralph have been doing a > great job of enforcing high standards in new NumPy code, and managing the > release process in a way that has resulted in very few bugs and regressions > in the release. Most of these quality standards are still informal, > however, and it's probably a good idea to write them down in a canonical > location. It will be especially helpful for newcomers, who can treat the > standards as a checklist before submitting pull requests. > > Thanks, > -Mark > > On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: > >> The problem is that these sorts of things take a while to emerge. 
The >> original system was more consistent than I think you give it credit. What >> you are seeing is that most people get NumPy from distributions and are >> relying on us to keep things consistent. >> >> The scalar coercion rules were deterministic and based on the idea that a >> scalar does not determine the output dtype unless it is of a different >> kind. The new code changes that unfortunately. >> >> Another thing I noticed is that I thought that int16 scalar float >> would produce float32 originally. This seems to have changed, but I need >> to check on an older version of NumPy. >> >> Changing the scalar coercion rules is an unfortunate substantial change >> in semantics and should not have happened in the 1.X series. >> >> I understand you did not get a lot of feedback and spent a lot of time on >> the code which we all appreciate. I worked to stay true to the Numeric >> casting rules incorporating the changes to prevent scalar upcasting due to >> the absence of single precision Numeric literals in Python. >> >> We will need to look in detail at what has changed. I will write a test >> to do that. >> >> Thanks, >> >> Travis >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: >> >> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: >> >>> Hmmm. This seems like a regression. The scalar casting API was fairly >>> intentional. >>> >>> What is the reason for the change? >>> >> >> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite >> this subsystem. There were virtually no tests in the test suite specifying >> what the expected behavior should be, and there were clear inconsistencies >> where for example "a+b" could result in a different type than "b+a". I >> recall there being some bugs in the tracker related to this as well, but I >> don't remember those details. >> >> This change felt like an obvious extension of an existing behavior for >> eliminating overflow, where the promotion changed unsigned -> signed based >> on the value of the scalar. This change introduced minimal upcasting only >> in a set of cases where an overflow was guaranteed to happen without that >> upcasting. >> >> During the 1.6 beta period, I signaled that this subsystem had changed, >> as the bullet point starting "The ufunc uses a more consistent algorithm >> for loop selection.": >> >> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> >> The behavior Matthew has observed is a direct result of how I designed >> the minimization function mentioned in that bullet point, and the algorithm >> for it is documented in the 'Notes' section of the result_type page: >> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> >> Hopefully that explains it well enough. I made the change intentionally >> and carefully, tested its impact on SciPy and other projects, and advocated >> for it during the release cycle. >> >> Cheers, >> Mark >> >> -- >>> Travis Oliphant >>> (on a mobile) >>> 512-826-7480 >>> >>> >>> On Feb 13, 2012, at 6:25 PM, Matthew Brett >>> wrote: >>> >>> > Hi, >>> > >>> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >>> > 1.6.1 and I just wanted to check it was intentional. 
>>> > >>> > For all versions of numpy I've tested, we have: >>> > >>> >>>> import numpy as np >>> >>>> Adata = np.array([127], dtype=np.int8) >>> >>>> Bdata = np.int16(127) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - adding an integer scalar of a larger dtype does not result >>> > in upcasting of the output dtype, if the data in the scalar type fits >>> > in the smaller. >>> > >>> > For numpy < 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - even if the data in the scalar does not fit in the dtype of >>> > the array to which it is being added, there is no upcasting. >>> > >>> > For numpy >= 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int16') >>> > >>> > There is upcasting... >>> > >>> > I can see why the numpy 1.6.0 way might be preferable but it is an API >>> > change I suppose. >>> > >>> > Best, >>> > >>> > Matthew >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Mon Feb 13 23:19:55 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 13 Feb 2012 23:19:55 -0500 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> Message-ID: It hasn't changed: since float is of "a fundamentally different kind of data", it's expected to upcast the result. However, if I may add a personal comment on numpy's casting rules: until now, I've found them confusing and somewhat inconsistent. Some of the inconsistencies I've found were bugs, while others were unintuitive behavior (or, you may say, me not having the correct intuition ;) In particular the rule about mixed scalar / array operations is currently only described in the doc by a rather vague sentence. Also, the fact that the result's dtype can depend on the actual numerical values can be confusing when you work with variable whose values can span a wide range. So I think if you could come up with a table that says an operation involving two arrays of dtype1 & dtype2 always returns an output of dtype3, and a similar table for mixed scalar / array operations, that would be great! 
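For what it's worth, here is a minimal sketch of how such tables could be generated, assuming a NumPy >= 1.6 install where np.promote_types is available; the dtype list is only an illustrative subset and none of this comes from the messages above:

import numpy as np

# Array-with-array promotion is value-independent, so np.promote_types
# can tabulate it directly.
dtypes = ['int8', 'int16', 'int32', 'int64', 'float32', 'float64']
for d1 in dtypes:
    for d2 in dtypes:
        print("%-8s + %-8s -> %s" % (d1, d2, np.promote_types(d1, d2)))

# Mixed scalar/array promotion is where the scalar's value can matter,
# as in the int16(127) vs int16(128) example earlier in the thread.
a = np.array([127], dtype=np.int8)
for val in (127, 128):
    print("int8 array + int16(%d) -> %s" % (val, (a + np.int16(val)).dtype))

Something along those lines would at least make the current rules visible at a glance while the documentation catches up.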
My 2 cents, -=- Olivier Le 13 f?vrier 2012 23:08, Travis Oliphant a ?crit : > I can also confirm that at least on NumPy 1.5.1: > > integer array * (literal Python float scalar) --- creates a double > result. > > So, my memory was incorrect on that (unless it changed at an earlier > release, but I don't think so). > > -Travis > > > > On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > > I believe the main lessons to draw from this are just how incredibly > important a complete test suite and staying on top of code reviews are. I'm > of the opinion that any explicit design choice of this nature should be > reflected in the test suite, so that if someone changes it years later, > they get immediate feedback that they're breaking something important. > NumPy has gradually increased its test suite coverage, and when I dealt > with the type promotion subsystem, I added fairly extensive tests: > > > https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 > > Another subsystem which is in a similar state as what the type promotion > subsystem was, is the subscript operator and how regular/fancy indexing > work. What this means is that any attempt to improve it that doesn't > coincide with the original intent years ago can easily break things that > were originally intended without them being caught by a test. I believe > this subsystem needs improvement, and the transition to new/improved code > will probably be trickier to manage than for the dtype promotion case. > > Let's try to learn from the type promotion case as best we can, and use it > to improve NumPy's process. I believe Charles and Ralph have been doing a > great job of enforcing high standards in new NumPy code, and managing the > release process in a way that has resulted in very few bugs and regressions > in the release. Most of these quality standards are still informal, > however, and it's probably a good idea to write them down in a canonical > location. It will be especially helpful for newcomers, who can treat the > standards as a checklist before submitting pull requests. > > Thanks, > -Mark > > On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: > >> The problem is that these sorts of things take a while to emerge. The >> original system was more consistent than I think you give it credit. What >> you are seeing is that most people get NumPy from distributions and are >> relying on us to keep things consistent. >> >> The scalar coercion rules were deterministic and based on the idea that a >> scalar does not determine the output dtype unless it is of a different >> kind. The new code changes that unfortunately. >> >> Another thing I noticed is that I thought that int16 scalar float >> would produce float32 originally. This seems to have changed, but I need >> to check on an older version of NumPy. >> >> Changing the scalar coercion rules is an unfortunate substantial change >> in semantics and should not have happened in the 1.X series. >> >> I understand you did not get a lot of feedback and spent a lot of time on >> the code which we all appreciate. I worked to stay true to the Numeric >> casting rules incorporating the changes to prevent scalar upcasting due to >> the absence of single precision Numeric literals in Python. >> >> We will need to look in detail at what has changed. I will write a test >> to do that. 
>> >> Thanks, >> >> Travis >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: >> >> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: >> >>> Hmmm. This seems like a regression. The scalar casting API was fairly >>> intentional. >>> >>> What is the reason for the change? >>> >> >> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite >> this subsystem. There were virtually no tests in the test suite specifying >> what the expected behavior should be, and there were clear inconsistencies >> where for example "a+b" could result in a different type than "b+a". I >> recall there being some bugs in the tracker related to this as well, but I >> don't remember those details. >> >> This change felt like an obvious extension of an existing behavior for >> eliminating overflow, where the promotion changed unsigned -> signed based >> on the value of the scalar. This change introduced minimal upcasting only >> in a set of cases where an overflow was guaranteed to happen without that >> upcasting. >> >> During the 1.6 beta period, I signaled that this subsystem had changed, >> as the bullet point starting "The ufunc uses a more consistent algorithm >> for loop selection.": >> >> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> >> The behavior Matthew has observed is a direct result of how I designed >> the minimization function mentioned in that bullet point, and the algorithm >> for it is documented in the 'Notes' section of the result_type page: >> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> >> Hopefully that explains it well enough. I made the change intentionally >> and carefully, tested its impact on SciPy and other projects, and advocated >> for it during the release cycle. >> >> Cheers, >> Mark >> >> -- >>> Travis Oliphant >>> (on a mobile) >>> 512-826-7480 >>> >>> >>> On Feb 13, 2012, at 6:25 PM, Matthew Brett >>> wrote: >>> >>> > Hi, >>> > >>> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >>> > 1.6.1 and I just wanted to check it was intentional. >>> > >>> > For all versions of numpy I've tested, we have: >>> > >>> >>>> import numpy as np >>> >>>> Adata = np.array([127], dtype=np.int8) >>> >>>> Bdata = np.int16(127) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - adding an integer scalar of a larger dtype does not result >>> > in upcasting of the output dtype, if the data in the scalar type fits >>> > in the smaller. >>> > >>> > For numpy < 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - even if the data in the scalar does not fit in the dtype of >>> > the array to which it is being added, there is no upcasting. >>> > >>> > For numpy >= 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int16') >>> > >>> > There is upcasting... >>> > >>> > I can see why the numpy 1.6.0 way might be preferable but it is an API >>> > change I suppose. 
>>> > >>> > Best, >>> > >>> > Matthew >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Mon Feb 13 23:30:43 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 13 Feb 2012 22:30:43 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> Message-ID: On Monday, February 13, 2012, Charles R Harris wrote: > > > On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant wrote: >> >> I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system. If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. >> It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). >> I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced. NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests. It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. >> Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing. Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. >> Thanks for all you are doing. 
I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. > > Well, the typecasting wasn't perfect and, as Mark points out, it wasn't commutative. The addition of float16 also complicated the picture, and user types is going to do more in that direction. And I don't see how a new developer should be responsible for tests enforcing old traditions, the original developers should be responsible for those. But history is history, it didn't happen that way, and here we are. > > That said, I think we need to show a little flexibility in the corner cases. And going forward I think that typecasting is going to need a rethink. > > Chuck > > On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > > I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: > https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 > Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. > Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. > Thanks, > -Mark > > On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: > > The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. > The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. > Another thing I noticed is that I thought that int16 scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. > Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. > I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. 
I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. > We will need to look in detail at what has changed. I will write a test to do that. > Thanks, > Travis > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: > > On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: > > Hmmm. This seems like a regression. The scalar casting API was fairly intentional. > > What is the reason for the change? > > In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example > I actually remember there being some discussion about the type coercion changes and those of us involved agreed that the old behavior was unintuitive and was likely a bug. Commutibility is important. Numpy already has one situation where users have to watch out for order (list times integer gives a new array, while an integer times list throws an exception), we really shouldn't have to keep worrying about bugs like these, and the one I just mentioned isn't even numpy's fault. I have to agree with Mark's changes, and I think he was very open about the possible impacts. It just so happened that the ones who reviewed it were probably newer users. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Mon Feb 13 23:31:43 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 13 Feb 2012 20:31:43 -0800 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 8:04 PM, Travis Oliphant wrote: > I disagree with your assessment of the subscript operator, but I'm sure we > will have plenty of time to discuss that. I don't think it's correct to > compare the corner cases of the fancy indexing and regular indexing to the > corner cases of type coercion system. If you recall, I was quite nervous > about all the changes you made to the coercion rules because I didn't > believe you fully understood what had been done before and I knew there was > not complete test coverage. > > It is true that both systems have emerged from a long history and could > definitely use fresh perspectives which we all appreciate you and others > bringing. It is also true that few are aware of the details of how things > are actually implemented and that there are corner cases that are basically > defined by the algorithm used (this is more true of the type-coercion > system than fancy-indexing, however). > Likely the only way we will be able to know for certain the extent to which our opinions are accurate is to actually dig into the code. I think we can agree, however, that at the very least it could use some performance improvement. :) > I think it would have been wise to write those extensive tests prior to > writing new code. I'm curious if what you were expecting for the output > was derived from what earlier versions of NumPy produced. NumPy has > never been in a state where you could just re-factor at will and assume > that tests will catch all intended use cases. 
Numeric before it was not > in that state either. This is a good goal, and we always welcome new > tests. It just takes a lot of time and a lot of tedious work that the > volunteer labor to this point have not had the time to do. > I did put quite a bit of effort into maintaining compatibility, and was incredibly careful about the change we're discussing. I used something I suspect you created, the "can cast safely" table here: http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules I extended it to more cases including scalar/array combinations of type promotion, and validated that 1.5 and 1.6 produced the same outputs. The script I used is here: https://github.com/numpy/numpy/blob/master/numpy/testing/print_coercion_tables.py I definitely didn't jump into the change blind, but I did approach it from a clean perspective with the willingness to try and make things better. I understand this is a delicate balance to walk, and I'd like to stress that I didn't take any of the changes I made here lightly. Very few of us have ever been paid to work on NumPy directly and have often > been trying to fit in improvements to the code base between other jobs we > are supposed to be doing. Of course, you and I are hoping to change that > this year and look forward to the code quality improving commensurately. > Well, everything I did for 1.6 that we're discussing here was volunteer work too. :) You and Enthought have all the credit for the later bit where I did get paid a little bit to do the datetime64 and NA stuff! Thanks for all you are doing. I also agree that Rolf and Charles > have-been and are invaluable in the maintenance and progress of NumPy and > SciPy. They deserve as much praise and kudos as anyone can give them. > It's great to have you back and active in the community again too. I'm sure this is improving the moods of many NumPy and SciPy users. -Mark > > -Travis > > > > On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > > I believe the main lessons to draw from this are just how incredibly > important a complete test suite and staying on top of code reviews are. I'm > of the opinion that any explicit design choice of this nature should be > reflected in the test suite, so that if someone changes it years later, > they get immediate feedback that they're breaking something important. > NumPy has gradually increased its test suite coverage, and when I dealt > with the type promotion subsystem, I added fairly extensive tests: > > > https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 > > Another subsystem which is in a similar state as what the type promotion > subsystem was, is the subscript operator and how regular/fancy indexing > work. What this means is that any attempt to improve it that doesn't > coincide with the original intent years ago can easily break things that > were originally intended without them being caught by a test. I believe > this subsystem needs improvement, and the transition to new/improved code > will probably be trickier to manage than for the dtype promotion case. > > Let's try to learn from the type promotion case as best we can, and use it > to improve NumPy's process. I believe Charles and Ralph have been doing a > great job of enforcing high standards in new NumPy code, and managing the > release process in a way that has resulted in very few bugs and regressions > in the release. Most of these quality standards are still informal, > however, and it's probably a good idea to write them down in a canonical > location. 
It will be especially helpful for newcomers, who can treat the > standards as a checklist before submitting pull requests. > > Thanks, > -Mark > > On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: > >> The problem is that these sorts of things take a while to emerge. The >> original system was more consistent than I think you give it credit. What >> you are seeing is that most people get NumPy from distributions and are >> relying on us to keep things consistent. >> >> The scalar coercion rules were deterministic and based on the idea that a >> scalar does not determine the output dtype unless it is of a different >> kind. The new code changes that unfortunately. >> >> Another thing I noticed is that I thought that int16 scalar float >> would produce float32 originally. This seems to have changed, but I need >> to check on an older version of NumPy. >> >> Changing the scalar coercion rules is an unfortunate substantial change >> in semantics and should not have happened in the 1.X series. >> >> I understand you did not get a lot of feedback and spent a lot of time on >> the code which we all appreciate. I worked to stay true to the Numeric >> casting rules incorporating the changes to prevent scalar upcasting due to >> the absence of single precision Numeric literals in Python. >> >> We will need to look in detail at what has changed. I will write a test >> to do that. >> >> Thanks, >> >> Travis >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: >> >> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: >> >>> Hmmm. This seems like a regression. The scalar casting API was fairly >>> intentional. >>> >>> What is the reason for the change? >>> >> >> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite >> this subsystem. There were virtually no tests in the test suite specifying >> what the expected behavior should be, and there were clear inconsistencies >> where for example "a+b" could result in a different type than "b+a". I >> recall there being some bugs in the tracker related to this as well, but I >> don't remember those details. >> >> This change felt like an obvious extension of an existing behavior for >> eliminating overflow, where the promotion changed unsigned -> signed based >> on the value of the scalar. This change introduced minimal upcasting only >> in a set of cases where an overflow was guaranteed to happen without that >> upcasting. >> >> During the 1.6 beta period, I signaled that this subsystem had changed, >> as the bullet point starting "The ufunc uses a more consistent algorithm >> for loop selection.": >> >> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >> >> The behavior Matthew has observed is a direct result of how I designed >> the minimization function mentioned in that bullet point, and the algorithm >> for it is documented in the 'Notes' section of the result_type page: >> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >> >> Hopefully that explains it well enough. I made the change intentionally >> and carefully, tested its impact on SciPy and other projects, and advocated >> for it during the release cycle. >> >> Cheers, >> Mark >> >> -- >>> Travis Oliphant >>> (on a mobile) >>> 512-826-7480 >>> >>> >>> On Feb 13, 2012, at 6:25 PM, Matthew Brett >>> wrote: >>> >>> > Hi, >>> > >>> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >>> > 1.6.1 and I just wanted to check it was intentional. 
>>> > >>> > For all versions of numpy I've tested, we have: >>> > >>> >>>> import numpy as np >>> >>>> Adata = np.array([127], dtype=np.int8) >>> >>>> Bdata = np.int16(127) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - adding an integer scalar of a larger dtype does not result >>> > in upcasting of the output dtype, if the data in the scalar type fits >>> > in the smaller. >>> > >>> > For numpy < 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - even if the data in the scalar does not fit in the dtype of >>> > the array to which it is being added, there is no upcasting. >>> > >>> > For numpy >= 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int16') >>> > >>> > There is upcasting... >>> > >>> > I can see why the numpy 1.6.0 way might be preferable but it is an API >>> > change I suppose. >>> > >>> > Best, >>> > >>> > Matthew >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Feb 13 23:56:40 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 13 Feb 2012 22:56:40 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> Message-ID: These are great suggestions. I am happy to start digging into the code. I'm also happy to re-visit any and all design decisions for NumPy 2.0 (with a strong-eye towards helping people migrate and documenting the results). Mark, I think you have done an excellent job of working with a stodgy group and pushing things forward. That is a rare talent, and the world is a better place because you jumped in. There is a lot of cruft all over the place, I know. I also know a lot more now than I did 6 years ago about software design :-) I'm very excited about what we are going to be able to do with NumPy together --- and with the others in the community. But, I am also aware of *a lot* of users who never voice their opinion on this list, and a lot of features that they want and need and are currently working around the limitations of NumPy to get. These are going to be my primary focus for the rest of the 1.X series. 
I see at least a NumPy 1.8 at this point with maybe even a NumPy 1.9. At the same time, I am looking forward to working with you and others in the community as you lead the push toward NumPy 2.0 (which I hope is not delayed too long with all the possible discussions that can take place :-) ) Best regards, -Travis On Feb 13, 2012, at 10:31 PM, Mark Wiebe wrote: > On Mon, Feb 13, 2012 at 8:04 PM, Travis Oliphant wrote: > I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system. If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. > > It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). > > Likely the only way we will be able to know for certain the extent to which our opinions are accurate is to actually dig into the code. I think we can agree, however, that at the very least it could use some performance improvement. :) > > I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced. NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests. It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. > > I did put quite a bit of effort into maintaining compatibility, and was incredibly careful about the change we're discussing. I used something I suspect you created, the "can cast safely" table here: > > http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules > > I extended it to more cases including scalar/array combinations of type promotion, and validated that 1.5 and 1.6 produced the same outputs. The script I used is here: > > https://github.com/numpy/numpy/blob/master/numpy/testing/print_coercion_tables.py > > I definitely didn't jump into the change blind, but I did approach it from a clean perspective with the willingness to try and make things better. I understand this is a delicate balance to walk, and I'd like to stress that I didn't take any of the changes I made here lightly. > > Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing. Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. > > Well, everything I did for 1.6 that we're discussing here was volunteer work too. :) > > You and Enthought have all the credit for the later bit where I did get paid a little bit to do the datetime64 and NA stuff! > > Thanks for all you are doing. 
I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. > > It's great to have you back and active in the community again too. I'm sure this is improving the moods of many NumPy and SciPy users. > > -Mark > > > -Travis > > > > On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > >> I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: >> >> https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 >> >> Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. >> >> Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. >> >> Thanks, >> -Mark >> >> On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: >> The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. >> >> The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. >> >> Another thing I noticed is that I thought that int16 scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. >> >> Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. >> >> I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. >> >> We will need to look in detail at what has changed. I will write a test to do that. >> >> Thanks, >> >> Travis >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: >> >>> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: >>> Hmmm. 
This seems like a regression. The scalar casting API was fairly intentional. >>> >>> What is the reason for the change? >>> >>> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example "a+b" could result in a different type than "b+a". I recall there being some bugs in the tracker related to this as well, but I don't remember those details. >>> >>> This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned -> signed based on the value of the scalar. This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. >>> >>> During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting "The ufunc uses a more consistent algorithm for loop selection.": >>> >>> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >>> >>> The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: >>> >>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >>> >>> Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. >>> >>> Cheers, >>> Mark >>> >>> -- >>> Travis Oliphant >>> (on a mobile) >>> 512-826-7480 >>> >>> >>> On Feb 13, 2012, at 6:25 PM, Matthew Brett wrote: >>> >>> > Hi, >>> > >>> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >>> > 1.6.1 and I just wanted to check it was intentional. >>> > >>> > For all versions of numpy I've tested, we have: >>> > >>> >>>> import numpy as np >>> >>>> Adata = np.array([127], dtype=np.int8) >>> >>>> Bdata = np.int16(127) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - adding an integer scalar of a larger dtype does not result >>> > in upcasting of the output dtype, if the data in the scalar type fits >>> > in the smaller. >>> > >>> > For numpy < 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - even if the data in the scalar does not fit in the dtype of >>> > the array to which it is being added, there is no upcasting. >>> > >>> > For numpy >= 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int16') >>> > >>> > There is upcasting... >>> > >>> > I can see why the numpy 1.6.0 way might be preferable but it is an API >>> > change I suppose. 
>>> > >>> > Best, >>> > >>> > Matthew >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 00:01:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 13 Feb 2012 23:01:53 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> Message-ID: <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> On Feb 13, 2012, at 10:14 PM, Charles R Harris wrote: > > > On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant wrote: > I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system. If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. > > It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). > > I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced. NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests. It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. 
> > Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing. Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. > > Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. > > > Well, the typecasting wasn't perfect and, as Mark points out, it wasn't commutative. The addition of float16 also complicated the picture, and user types is going to do more in that direction. And I don't see how a new developer should be responsible for tests enforcing old traditions, the original developers should be responsible for those. But history is history, it didn't happen that way, and here we are. > > That said, I think we need to show a little flexibility in the corner cases. And going forward I think that typecasting is going to need a rethink. > No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. New developers are awesome, and the life-blood of a project. But, you have to respect the history of a code-base and if you are re-factoring code that might create a change in corner-cases, then you are absolutely responsible for writing the tests if they aren't there already. That is a pretty simple rule. If you are changing semantics and are not doing a new major version number that you can document the changes in, then any re-factor needs to have tests written *before* the re-factor to ensure behavior does not change. That might be annoying, for sure, and might make you curse the original author for not writing the tests you wish were already written --- but it doesn't change the fact that a released code has many, many tests already written for it in the way of applications and users. All of these are outside of the actual code-base, and may rely on behavior that you can't just change even if you think it needs to change. Bug-fixes are different, of course, but it can sometimes be difficult to discern what is a "bug" and what is just behavior that seems inappropriate. Type-coercion, in particular, can be a difficult nut to crack because NumPy doesn't always control what happens and is trying to work-within Python's stunted type-system. I've often thought that it might be easier if NumPy were more tightly integrated into Python. For example, it would be great if NumPy's Int-scalar was the same thing as Python's int. Same for float and complex. It would also be nice if you could specify scalar literals with different precisions in Python directly. I've often wished that NumPy developers had more access to all the great language people who have spent their time on IronPython, Jython, and PyPy instead. -Travis > Chuck > > On Feb 13, 2012, at 9:40 PM, Mark Wiebe wrote: > >> I believe the main lessons to draw from this are just how incredibly important a complete test suite and staying on top of code reviews are. I'm of the opinion that any explicit design choice of this nature should be reflected in the test suite, so that if someone changes it years later, they get immediate feedback that they're breaking something important. 
NumPy has gradually increased its test suite coverage, and when I dealt with the type promotion subsystem, I added fairly extensive tests: >> >> https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345 >> >> Another subsystem which is in a similar state as what the type promotion subsystem was, is the subscript operator and how regular/fancy indexing work. What this means is that any attempt to improve it that doesn't coincide with the original intent years ago can easily break things that were originally intended without them being caught by a test. I believe this subsystem needs improvement, and the transition to new/improved code will probably be trickier to manage than for the dtype promotion case. >> >> Let's try to learn from the type promotion case as best we can, and use it to improve NumPy's process. I believe Charles and Ralph have been doing a great job of enforcing high standards in new NumPy code, and managing the release process in a way that has resulted in very few bugs and regressions in the release. Most of these quality standards are still informal, however, and it's probably a good idea to write them down in a canonical location. It will be especially helpful for newcomers, who can treat the standards as a checklist before submitting pull requests. >> >> Thanks, >> -Mark >> >> On Mon, Feb 13, 2012 at 7:11 PM, Travis Oliphant wrote: >> The problem is that these sorts of things take a while to emerge. The original system was more consistent than I think you give it credit. What you are seeing is that most people get NumPy from distributions and are relying on us to keep things consistent. >> >> The scalar coercion rules were deterministic and based on the idea that a scalar does not determine the output dtype unless it is of a different kind. The new code changes that unfortunately. >> >> Another thing I noticed is that I thought that int16 scalar float would produce float32 originally. This seems to have changed, but I need to check on an older version of NumPy. >> >> Changing the scalar coercion rules is an unfortunate substantial change in semantics and should not have happened in the 1.X series. >> >> I understand you did not get a lot of feedback and spent a lot of time on the code which we all appreciate. I worked to stay true to the Numeric casting rules incorporating the changes to prevent scalar upcasting due to the absence of single precision Numeric literals in Python. >> >> We will need to look in detail at what has changed. I will write a test to do that. >> >> Thanks, >> >> Travis >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Feb 13, 2012, at 7:58 PM, Mark Wiebe wrote: >> >>> On Mon, Feb 13, 2012 at 5:00 PM, Travis Oliphant wrote: >>> Hmmm. This seems like a regression. The scalar casting API was fairly intentional. >>> >>> What is the reason for the change? >>> >>> In order to make 1.6 ABI-compatible with 1.5, I basically had to rewrite this subsystem. There were virtually no tests in the test suite specifying what the expected behavior should be, and there were clear inconsistencies where for example "a+b" could result in a different type than "b+a". I recall there being some bugs in the tracker related to this as well, but I don't remember those details. >>> >>> This change felt like an obvious extension of an existing behavior for eliminating overflow, where the promotion changed unsigned -> signed based on the value of the scalar. 
This change introduced minimal upcasting only in a set of cases where an overflow was guaranteed to happen without that upcasting. >>> >>> During the 1.6 beta period, I signaled that this subsystem had changed, as the bullet point starting "The ufunc uses a more consistent algorithm for loop selection.": >>> >>> http://mail.scipy.org/pipermail/numpy-discussion/2011-March/055156.html >>> >>> The behavior Matthew has observed is a direct result of how I designed the minimization function mentioned in that bullet point, and the algorithm for it is documented in the 'Notes' section of the result_type page: >>> >>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.result_type.html >>> >>> Hopefully that explains it well enough. I made the change intentionally and carefully, tested its impact on SciPy and other projects, and advocated for it during the release cycle. >>> >>> Cheers, >>> Mark >>> >>> -- >>> Travis Oliphant >>> (on a mobile) >>> 512-826-7480 >>> >>> >>> On Feb 13, 2012, at 6:25 PM, Matthew Brett wrote: >>> >>> > Hi, >>> > >>> > I recently noticed a change in the upcasting rules in numpy 1.6.0 / >>> > 1.6.1 and I just wanted to check it was intentional. >>> > >>> > For all versions of numpy I've tested, we have: >>> > >>> >>>> import numpy as np >>> >>>> Adata = np.array([127], dtype=np.int8) >>> >>>> Bdata = np.int16(127) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - adding an integer scalar of a larger dtype does not result >>> > in upcasting of the output dtype, if the data in the scalar type fits >>> > in the smaller. >>> > >>> > For numpy < 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int8') >>> > >>> > That is - even if the data in the scalar does not fit in the dtype of >>> > the array to which it is being added, there is no upcasting. >>> > >>> > For numpy >= 1.6.0 we have this: >>> > >>> >>>> Bdata = np.int16(128) >>> >>>> (Adata + Bdata).dtype >>> > dtype('int16') >>> > >>> > There is upcasting... >>> > >>> > I can see why the numpy 1.6.0 way might be preferable but it is an API >>> > change I suppose. >>> > >>> > Best, >>> > >>> > Matthew >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben.root at ou.edu Tue Feb 14 00:25:52 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 13 Feb 2012 23:25:52 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> Message-ID: On Monday, February 13, 2012, Travis Oliphant wrote: > > On Feb 13, 2012, at 10:14 PM, Charles R Harris wrote: > > > On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant wrote: >> >> I disagree with your assessment of the subscript operator, but I'm sure we will have plenty of time to discuss that. I don't think it's correct to compare the corner cases of the fancy indexing and regular indexing to the corner cases of type coercion system. If you recall, I was quite nervous about all the changes you made to the coercion rules because I didn't believe you fully understood what had been done before and I knew there was not complete test coverage. >> It is true that both systems have emerged from a long history and could definitely use fresh perspectives which we all appreciate you and others bringing. It is also true that few are aware of the details of how things are actually implemented and that there are corner cases that are basically defined by the algorithm used (this is more true of the type-coercion system than fancy-indexing, however). >> I think it would have been wise to write those extensive tests prior to writing new code. I'm curious if what you were expecting for the output was derived from what earlier versions of NumPy produced. NumPy has never been in a state where you could just re-factor at will and assume that tests will catch all intended use cases. Numeric before it was not in that state either. This is a good goal, and we always welcome new tests. It just takes a lot of time and a lot of tedious work that the volunteer labor to this point have not had the time to do. >> Very few of us have ever been paid to work on NumPy directly and have often been trying to fit in improvements to the code base between other jobs we are supposed to be doing. Of course, you and I are hoping to change that this year and look forward to the code quality improving commensurately. >> Thanks for all you are doing. I also agree that Rolf and Charles have-been and are invaluable in the maintenance and progress of NumPy and SciPy. They deserve as much praise and kudos as anyone can give them. > > Well, the typecasting wasn't perfect and, as Mark points out, it wasn't commutative. The addition of float16 also complicated the picture, and user types is going to do more in that direction. And I don't see how a new developer should be responsible for tests enforcing old traditions, the original developers should be responsible for those. But history is history, it didn't happen that way, and here we are. > > That said, I think we need to show a little flexibility in the corner cases. And going forward I think that typecasting is going to need a rethink. > > > No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. 
I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your "cavalier" charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Feb 14 00:27:13 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 13 Feb 2012 22:27:13 -0700 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 10:01 PM, Travis Oliphant wrote: > > On Feb 13, 2012, at 10:14 PM, Charles R Harris wrote: > > > > On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant wrote: > >> I disagree with your assessment of the subscript operator, but I'm sure >> we will have plenty of time to discuss that. I don't think it's correct to >> compare the corner cases of the fancy indexing and regular indexing to the >> corner cases of type coercion system. If you recall, I was quite nervous >> about all the changes you made to the coercion rules because I didn't >> believe you fully understood what had been done before and I knew there was >> not complete test coverage. >> >> It is true that both systems have emerged from a long history and could >> definitely use fresh perspectives which we all appreciate you and others >> bringing. It is also true that few are aware of the details of how things >> are actually implemented and that there are corner cases that are basically >> defined by the algorithm used (this is more true of the type-coercion >> system than fancy-indexing, however). >> >> I think it would have been wise to write those extensive tests prior to >> writing new code. I'm curious if what you were expecting for the output >> was derived from what earlier versions of NumPy produced. NumPy has >> never been in a state where you could just re-factor at will and assume >> that tests will catch all intended use cases. Numeric before it was not >> in that state either. This is a good goal, and we always welcome new >> tests. It just takes a lot of time and a lot of tedious work that the >> volunteer labor to this point have not had the time to do. >> >> Very few of us have ever been paid to work on NumPy directly and have >> often been trying to fit in improvements to the code base between other >> jobs we are supposed to be doing. Of course, you and I are hoping to >> change that this year and look forward to the code quality improving >> commensurately. >> >> Thanks for all you are doing. I also agree that Rolf and Charles >> have-been and are invaluable in the maintenance and progress of NumPy and >> SciPy. They deserve as much praise and kudos as anyone can give them. 
>> >> > Well, the typecasting wasn't perfect and, as Mark points out, it wasn't > commutative. The addition of float16 also complicated the picture, and user > types is going to do more in that direction. And I don't see how a new > developer should be responsible for tests enforcing old traditions, the > original developers should be responsible for those. But history is > history, it didn't happen that way, and here we are. > > That said, I think we need to show a little flexibility in the corner > cases. And going forward I think that typecasting is going to need a > rethink. > > > No argument on any of this. It's just that this needs to happen at NumPy > 2.0, not in the NumPy 1.X series. I think requiring a re-compile is > far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 > release. That's my major point, and I'm surprised others are more > cavalier about this. > > New developers are awesome, and the life-blood of a project. But, you > have to respect the history of a code-base and if you are re-factoring code > that might create a change in corner-cases, then you are absolutely > responsible for writing the tests if they aren't there already. That is > a pretty simple rule. > > If you are changing semantics and are not doing a new major version number > that you can document the changes in, then any re-factor needs to have > tests written *before* the re-factor to ensure behavior does not change. > That might be annoying, for sure, and might make you curse the original > author for not writing the tests you wish were already written --- but it > doesn't change the fact that a released code has many, many tests already > written for it in the way of applications and users. All of these are > outside of the actual code-base, and may rely on behavior that you can't > just change even if you think it needs to change. Bug-fixes are different, > of course, but it can sometimes be difficult to discern what is a "bug" and > what is just behavior that seems inappropriate. > > Type-coercion, in particular, can be a difficult nut to crack because > NumPy doesn't always control what happens and is trying to work-within > Python's stunted type-system. I've often thought that it might be easier > if NumPy were more tightly integrated into Python. For example, it would > be great if NumPy's Int-scalar was the same thing as Python's int. Same > for float and complex. It would also be nice if you could specify scalar > literals with different precisions in Python directly. I've often wished > that NumPy developers had more access to all the great language people who > have spent their time on IronPython, Jython, and PyPy instead. > > Going on about casting, somewhere I might still have a table I generated back around 1.3. Also, Numpy has a mixed C/Numarray system that leads to some problems, so that should be rationalized at some point. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Feb 14 00:31:17 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 13 Feb 2012 22:31:17 -0700 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? 
In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 10:25 PM, Benjamin Root wrote: > > > On Monday, February 13, 2012, Travis Oliphant wrote: > > > > On Feb 13, 2012, at 10:14 PM, Charles R Harris wrote: > > > > > > On Mon, Feb 13, 2012 at 9:04 PM, Travis Oliphant > wrote: > >> > >> I disagree with your assessment of the subscript operator, but I'm sure > we will have plenty of time to discuss that. I don't think it's correct to > compare the corner cases of the fancy indexing and regular indexing to the > corner cases of type coercion system. If you recall, I was quite nervous > about all the changes you made to the coercion rules because I didn't > believe you fully understood what had been done before and I knew there was > not complete test coverage. > >> It is true that both systems have emerged from a long history and could > definitely use fresh perspectives which we all appreciate you and others > bringing. It is also true that few are aware of the details of how things > are actually implemented and that there are corner cases that are basically > defined by the algorithm used (this is more true of the type-coercion > system than fancy-indexing, however). > >> I think it would have been wise to write those extensive tests prior to > writing new code. I'm curious if what you were expecting for the output > was derived from what earlier versions of NumPy produced. NumPy has > never been in a state where you could just re-factor at will and assume > that tests will catch all intended use cases. Numeric before it was not > in that state either. This is a good goal, and we always welcome new > tests. It just takes a lot of time and a lot of tedious work that the > volunteer labor to this point have not had the time to do. > >> Very few of us have ever been paid to work on NumPy directly and have > often been trying to fit in improvements to the code base between other > jobs we are supposed to be doing. Of course, you and I are hoping to > change that this year and look forward to the code quality improving > commensurately. > >> Thanks for all you are doing. I also agree that Rolf and Charles > have-been and are invaluable in the maintenance and progress of NumPy and > SciPy. They deserve as much praise and kudos as anyone can give them. > > > > Well, the typecasting wasn't perfect and, as Mark points out, it wasn't > commutative. The addition of float16 also complicated the picture, and user > types is going to do more in that direction. And I don't see how a new > developer should be responsible for tests enforcing old traditions, the > original developers should be responsible for those. But history is > history, it didn't happen that way, and here we are. > > > > That said, I think we need to show a little flexibility in the corner > cases. And going forward I think that typecasting is going to need a > rethink. > > > > > > No argument on any of this. It's just that this needs to happen at > NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is > far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 > release. That's my major point, and I'm surprised others are more > cavalier about this. > > I thought the whole datetime debacle was the impetus for binary > compatibility? Also, I disagree with your "cavalier" charge here. 
When we > looked at the rationale for the changes Mark made, the old behavior was not > documented, broke commutibility, and was unexpected. So, if it walks like > a duck... > > Now we are in an odd situation. We have undocumented old behavior, and > documented new behavior. What do we do? I understand the drive to revert, > but I hate the idea of putting back what I see as buggy, especially when > new software may fail with old behavior. > > Maybe a Boolean switch defaulting to new behavior? Anybody having issues > with old software could just flip the switch? > > I think we just leave it as is. If it was a big problem we would have heard screams of complaint long ago. The post that started this off wasn't even a complaint, more of a "see this". Spending time reverting or whatever would be a waste of resources, IMHO. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 01:00:35 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 00:00:35 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> Message-ID: <8FF629A5-38BC-4179-9647-69924A39513D@continuum.io> > > > > No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. > > I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your "cavalier" charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... First of all, I don't recall the "broken commutibility" issue --- nor how long it had actually been in the code-base. So, I'm not sure how much weight to give that "problem" The problem I see with the weighting of these issues that is being implied is that 1) Requiring a re-compile is getting easier and easier as more and more people get their NumPy from distributions and not from downloads of NumPy itself. They just wait until the distribution upgrades and everything is re-compiled. 2) That same trend means that changes to run-time code (like those that can occur when type-coercion is changed) is likely to affect people much later after the discussions have taken place on the list and everyone who was involved in the discussion assumes all is fine. This sort of change should be signaled by a version change. I would like to understand what the "bugginess" was and where it was better because I think we are painting a wide-brush. Some-things I will probably agree with you were "buggy", but others are likely just different preferences. I have a script that "documents" the old-behavior. I will compare it to the new behavior and we can go from there. Certainly, there is precedent for using something like a "__future__" statement to move forward which your boolean switch implies. -Travis > > Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? 
I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. > > Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? > > Ben Root _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue Feb 14 01:07:34 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 00:07:34 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> Message-ID: <0F4F764E-878A-454D-B58B-354E5EE5EB0E@continuum.io> > > > > No argument on any of this. It's just that this needs to happen at NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 release. That's my major point, and I'm surprised others are more cavalier about this. > > I thought the whole datetime debacle was the impetus for binary compatibility? Also, I disagree with your "cavalier" charge here. When we looked at the rationale for the changes Mark made, the old behavior was not documented, broke commutibility, and was unexpected. So, if it walks like a duck... > > Now we are in an odd situation. We have undocumented old behavior, and documented new behavior. What do we do? I understand the drive to revert, but I hate the idea of putting back what I see as buggy, especially when new software may fail with old behavior. > > Maybe a Boolean switch defaulting to new behavior? Anybody having issues with old software could just flip the switch? > > > I think we just leave it as is. If it was a big problem we would have heard screams of complaint long ago. The post that started this off wasn't even a complaint, more of a "see this". Spending time reverting or whatever would be a waste of resources, IMHO. > > Chuck You might be right, Chuck. I would like to investigate more, however. What I fear is that there are *a lot* of users still on NumPy 1.3 and NumPy 1.5. The fact that we haven't heard any complaints, yet, does not mean to me that we aren't creating headache for people later who have just not had time to try things. However, I can believe that the specifics of "minor" casting rules are probably not relied upon by a lot of codes out there. Still, as Robert Kern often reminds us well --- our intuitions about this are usually not worth much. I may be making more of this then it's worth, I realize. I was just sensitive to it at the time things were changing (even though I didn't have time to be vocal), and now hearing this users experience, it confirms my bias... Believe me, I do not want to "revert" if at all possible. There is plenty of more work to do, and I'm very much in favor of the spirit of the work Mark was and is doing. Best regards, -Travis > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Tue Feb 14 01:07:37 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 13 Feb 2012 23:07:37 -0700 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <8FF629A5-38BC-4179-9647-69924A39513D@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <8FF629A5-38BC-4179-9647-69924A39513D@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 11:00 PM, Travis Oliphant wrote: > > > > > > > No argument on any of this. It's just that this needs to happen at > NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is > far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 > release. That's my major point, and I'm surprised others are more > cavalier about this. > > > > I thought the whole datetime debacle was the impetus for binary > compatibility? Also, I disagree with your "cavalier" charge here. When we > looked at the rationale for the changes Mark made, the old behavior was not > documented, broke commutibility, and was unexpected. So, if it walks like > a duck... > > First of all, I don't recall the "broken commutibility" issue --- nor how > long it had actually been in the code-base. So, I'm not sure how much > weight to give that "problem" > > The problem I see with the weighting of these issues that is being implied > is that > > 1) Requiring a re-compile is getting easier and easier as more and > more people get their NumPy from distributions and not from downloads of > NumPy itself. They just wait until the distribution upgrades and > everything is re-compiled. > 2) That same trend means that changes to run-time code (like those > that can occur when type-coercion is changed) is likely to affect people > much later after the discussions have taken place on the list and everyone > who was involved in the discussion assumes all is fine. > > This sort of change should be signaled by a version change. I would > like to understand what the "bugginess" was and where it was better because > I think we are painting a wide-brush. Some-things I will probably agree > with you were "buggy", but others are likely just different preferences. > > I have a script that "documents" the old-behavior. I will compare it to > the new behavior and we can go from there. Certainly, there is precedent > for using something like a "__future__" statement to move forward which > your boolean switch implies. > > Let it go, Travis. It's a waste of time. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Feb 14 01:38:18 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 13 Feb 2012 23:38:18 -0700 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <0F4F764E-878A-454D-B58B-354E5EE5EB0E@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <0F4F764E-878A-454D-B58B-354E5EE5EB0E@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 11:07 PM, Travis Oliphant wrote: > > > >> > No argument on any of this. It's just that this needs to happen at >> NumPy 2.0, not in the NumPy 1.X series. 
I think requiring a re-compile is >> far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 >> release. That's my major point, and I'm surprised others are more >> cavalier about this. >> >> I thought the whole datetime debacle was the impetus for binary >> compatibility? Also, I disagree with your "cavalier" charge here. When we >> looked at the rationale for the changes Mark made, the old behavior was not >> documented, broke commutibility, and was unexpected. So, if it walks like >> a duck... >> >> Now we are in an odd situation. We have undocumented old behavior, and >> documented new behavior. What do we do? I understand the drive to revert, >> but I hate the idea of putting back what I see as buggy, especially when >> new software may fail with old behavior. >> >> Maybe a Boolean switch defaulting to new behavior? Anybody having issues >> with old software could just flip the switch? >> >> > I think we just leave it as is. If it was a big problem we would have > heard screams of complaint long ago. The post that started this off wasn't > even a complaint, more of a "see this". Spending time reverting or whatever > would be a waste of resources, IMHO. > > Chuck > > > You might be right, Chuck. I would like to investigate more, however. > > What I fear is that there are *a lot* of users still on NumPy 1.3 and > NumPy 1.5. The fact that we haven't heard any complaints, yet, does not > mean to me that we aren't creating headache for people later who have just > not had time to try things. > > However, I can believe that the specifics of "minor" casting rules are > probably not relied upon by a lot of codes out there. Still, as Robert > Kern often reminds us well --- our intuitions about this are usually not > worth much. > > I may be making more of this then it's worth, I realize. I was just > sensitive to it at the time things were changing (even though I didn't have > time to be vocal), and now hearing this users experience, it confirms my > bias... Believe me, I do not want to "revert" if at all possible. There > is plenty of more work to do, and I'm very much in favor of the spirit of > the work Mark was and is doing. > > I think writing tests would be more productive. The current coverage is skimpy in that we typically don't cover *all* the combinations. Sometimes we don't cover any of them ;) I know you are sensitive to the typecasting, it was one of your babies. Nevertheless, I don't think it is that big an issue at the moment. If you can think of ways to *improve* it I think everyone will be interested in that. The lack of commutativity wasn't in precision, it was in the typecodes, and was there from the beginning. That caused confusion. A current cause of confusion is the many to one relation of, say, int32 and long, longlong which varies platform to platform. I think that confusion is a more significant problem. Having some types derived from Python types, a correspondence that also varies platform to platform is another source of inconsistent behavior that can be confusing. So there are still plenty of issues to deal with. I'd like to point out that the addition of float16 necessitated a certain amount of rewriting, as well as the addition of datetime. It was only through Mark's work that we were able to include the latter in the 1.* series at all. Before, we always had to remove datetime before a release, a royal PITA, while waiting on the ever receding 2.0. So there were very good reasons to deal with the type system. 
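(A quick way to see the platform-dependent aliasing mentioned a couple of paragraphs up; the comments assume a 64-bit LP64 Linux build and will read differently on, e.g., 64-bit Windows:)

import numpy as np

print(np.dtype('l') == np.dtype(np.int64))          # True on LP64 Linux, False on 64-bit Windows
print(np.dtype(np.longlong) == np.dtype(np.int64))  # True on both -- two C spellings, one precision
print(issubclass(np.float64, float))                # True -- this scalar type derives from Python's float
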
That isn't to say that typecasting can't use some tweaks here and there, I think we are all open to discussion along those lines. But it should about specific cases. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Feb 14 01:48:44 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 13 Feb 2012 22:48:44 -0800 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <0F4F764E-878A-454D-B58B-354E5EE5EB0E@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 10:38 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Mon, Feb 13, 2012 at 11:07 PM, Travis Oliphant wrote: > >> >> > >>> > No argument on any of this. It's just that this needs to happen at >>> NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is >>> far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 >>> release. That's my major point, and I'm surprised others are more >>> cavalier about this. >>> >>> I thought the whole datetime debacle was the impetus for binary >>> compatibility? Also, I disagree with your "cavalier" charge here. When we >>> looked at the rationale for the changes Mark made, the old behavior was not >>> documented, broke commutibility, and was unexpected. So, if it walks like >>> a duck... >>> >>> Now we are in an odd situation. We have undocumented old behavior, and >>> documented new behavior. What do we do? I understand the drive to revert, >>> but I hate the idea of putting back what I see as buggy, especially when >>> new software may fail with old behavior. >>> >>> Maybe a Boolean switch defaulting to new behavior? Anybody having >>> issues with old software could just flip the switch? >>> >>> >> I think we just leave it as is. If it was a big problem we would have >> heard screams of complaint long ago. The post that started this off wasn't >> even a complaint, more of a "see this". Spending time reverting or whatever >> would be a waste of resources, IMHO. >> >> Chuck >> >> >> You might be right, Chuck. I would like to investigate more, however. >> >> What I fear is that there are *a lot* of users still on NumPy 1.3 and >> NumPy 1.5. The fact that we haven't heard any complaints, yet, does not >> mean to me that we aren't creating headache for people later who have just >> not had time to try things. >> >> However, I can believe that the specifics of "minor" casting rules are >> probably not relied upon by a lot of codes out there. Still, as Robert >> Kern often reminds us well --- our intuitions about this are usually not >> worth much. >> >> I may be making more of this then it's worth, I realize. I was just >> sensitive to it at the time things were changing (even though I didn't have >> time to be vocal), and now hearing this users experience, it confirms my >> bias... Believe me, I do not want to "revert" if at all possible. There >> is plenty of more work to do, and I'm very much in favor of the spirit of >> the work Mark was and is doing. >> >> > I think writing tests would be more productive. The current coverage is > skimpy in that we typically don't cover *all* the combinations. Sometimes > we don't cover any of them ;) I know you are sensitive to the typecasting, > it was one of your babies. 
Nevertheless, I don't think it is that big an > issue at the moment. If you can think of ways to *improve* it I think > everyone will be interested in that. > > The lack of commutativity wasn't in precision, it was in the typecodes, > and was there from the beginning. That caused confusion. A current cause of > confusion is the many to one relation of, say, int32 and long, longlong > which varies platform to platform. I think that confusion is a more > significant problem. Having some types derived from Python types, a > correspondence that also varies platform to platform is another source of > inconsistent behavior that can be confusing. So there are still plenty of > issues to deal with. > This reminds me of something that it would be really nice for the bug tracker to have - user votes. This might be a particularly good way to draw in some more of the users who don't want to stick their neck out with emails and comments, put are comfortable adding a vote to a bug. Something like this: http://code.google.com/p/googleappengine/issues/detail?id=190 where it says that 566 people have starred the issue. -Mark > > I'd like to point out that the addition of float16 necessitated a certain > amount of rewriting, as well as the addition of datetime. It was only > through Mark's work that we were able to include the latter in the 1.* > series at all. Before, we always had to remove datetime before a release, a > royal PITA, while waiting on the ever receding 2.0. So there were very good > reasons to deal with the type system. > > That isn't to say that typecasting can't use some tweaks here and there, I > think we are all open to discussion along those lines. But it should about > specific cases. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matteo.malosio at itia.cnr.it Tue Feb 14 01:53:10 2012 From: matteo.malosio at itia.cnr.it (Matteo Malosio) Date: Tue, 14 Feb 2012 07:53:10 +0100 Subject: [Numpy-discussion] numpy.arange() error? In-Reply-To: <4F3A03D4.3070903@itia.cnr.it> References: <4F3A03D4.3070903@itia.cnr.it> Message-ID: <4F3A04D6.3070500@itia.cnr.it> I think the problem is quite easy to solve, without changing the "documentation" behaviour. The doc says: Help on built-in function arange in module numpy.core.multiarray: / arange(...) arange([start,] stop[, step,], dtype=None) Return evenly spaced values within a given interval. Values are generated within the half-open interval ``[start, stop)`` (in other words, the interval including `start` but excluding `stop`). For integer arguments the function is equivalent to the Python built-in `range `_ function, but returns a ndarray rather than a list. / stop is exclusive "by definition". So substracting a very small value to stop when processing "stop" I think is the best way. Matteo Il 10/02/2012 02:22, Drew Frank ha scritto: > On Thu, Feb 9, 2012 at 3:40 PM, Benjamin Root > wrote: > > > > On Thursday, February 9, 2012, Sturla Molden > wrote: > > > > > > Den 9. feb. 2012 kl. 22:44 skrev eat >: > > > >> > > Maybe this issue is raised also earlier, but wouldn't it be more > consistent to let arange operate only with integers (like Python's > range) and let linspace handle the floats as well? > > > > > > Perhaps. Another possibility would be to let arange take decimal > arguments, possibly entered as text strings. 
> > Sturla > > > Personally, I treat arange() to mean, "give me a sequence of > values from x to y, exclusive, with a specific step size". > Nowhere in that statement does it guarantee a particular number > of elements. Whereas linspace() means, "give me a sequence of > evenly spaced numbers from x to y, optionally inclusive, such that > there are exactly N elements". They complement each other well. > > > I agree -- both functions are useful and I think about them the same > way. The unfortunate part is that tiny precision errors in y can make > arange appear to be "sometimes-exclusive" rather than always > exclusive. I've always imagined there to be a sort of duality between > the two functions, where arange(low, high, step) == linspace(low, > high-step, round((high-low)/step)) in cases where (high - low)/step is > integral, but it turns out this is not the case. > > > There are times when I intentionally will specify a range where > the step size will not nicely fit. i.e.- np.arange(1, 7, 3.5). I > wouldn't want this to change. > > > Nor would I. What I meant to express earlier is that I like how > Matlab addresses this particular class of floating point precision > errors, not that I think arange output should somehow include both > endpoints. > > > My vote is that if users want matlab-colon-like behavior, we could > make a new function - maybe erange() for "exact range"? > > > Ben Root > > > That could work; it would completely replace arange for me in every > circumstance I can think of, but I understand we can't just go > changing the behavior of core functions. > > Drew > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- ------------------------------------------------------- Matteo Malosio, Eng. Researcher ITIA-CNR (www.itia.cnr.it) Institute of Industrial Technologies and Automation National Research Council via Bassini 15, 20133 MILANO, ITALY Ph: +39 0223699625 Fax: +39 0223699925 e-mail:matteo.malosio at itia.cnr.it ------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Feb 14 02:02:57 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 13 Feb 2012 23:02:57 -0800 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <0F4F764E-878A-454D-B58B-354E5EE5EB0E@continuum.io> Message-ID: On Mon, Feb 13, 2012 at 10:48 PM, Mark Wiebe wrote: > On Mon, Feb 13, 2012 at 10:38 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Mon, Feb 13, 2012 at 11:07 PM, Travis Oliphant wrote: >> >>> >>> > >>>> > No argument on any of this. It's just that this needs to happen at >>>> NumPy 2.0, not in the NumPy 1.X series. I think requiring a re-compile is >>>> far-less onerous than changing the type-coercion subtly in a 1.5 to 1.6 >>>> release. That's my major point, and I'm surprised others are more >>>> cavalier about this. >>>> >>>> I thought the whole datetime debacle was the impetus for binary >>>> compatibility? Also, I disagree with your "cavalier" charge here. 
When we >>>> looked at the rationale for the changes Mark made, the old behavior was not >>>> documented, broke commutibility, and was unexpected. So, if it walks like >>>> a duck... >>>> >>>> Now we are in an odd situation. We have undocumented old behavior, and >>>> documented new behavior. What do we do? I understand the drive to revert, >>>> but I hate the idea of putting back what I see as buggy, especially when >>>> new software may fail with old behavior. >>>> >>>> Maybe a Boolean switch defaulting to new behavior? Anybody having >>>> issues with old software could just flip the switch? >>>> >>>> >>> I think we just leave it as is. If it was a big problem we would have >>> heard screams of complaint long ago. The post that started this off wasn't >>> even a complaint, more of a "see this". Spending time reverting or whatever >>> would be a waste of resources, IMHO. >>> >>> Chuck >>> >>> >>> You might be right, Chuck. I would like to investigate more, however. >>> >>> What I fear is that there are *a lot* of users still on NumPy 1.3 and >>> NumPy 1.5. The fact that we haven't heard any complaints, yet, does not >>> mean to me that we aren't creating headache for people later who have just >>> not had time to try things. >>> >>> However, I can believe that the specifics of "minor" casting rules are >>> probably not relied upon by a lot of codes out there. Still, as Robert >>> Kern often reminds us well --- our intuitions about this are usually not >>> worth much. >>> >>> I may be making more of this then it's worth, I realize. I was just >>> sensitive to it at the time things were changing (even though I didn't have >>> time to be vocal), and now hearing this users experience, it confirms my >>> bias... Believe me, I do not want to "revert" if at all possible. There >>> is plenty of more work to do, and I'm very much in favor of the spirit of >>> the work Mark was and is doing. >>> >>> >> I think writing tests would be more productive. The current coverage is >> skimpy in that we typically don't cover *all* the combinations. Sometimes >> we don't cover any of them ;) I know you are sensitive to the typecasting, >> it was one of your babies. Nevertheless, I don't think it is that big an >> issue at the moment. If you can think of ways to *improve* it I think >> everyone will be interested in that. >> >> The lack of commutativity wasn't in precision, it was in the typecodes, >> and was there from the beginning. That caused confusion. A current cause of >> confusion is the many to one relation of, say, int32 and long, longlong >> which varies platform to platform. I think that confusion is a more >> significant problem. Having some types derived from Python types, a >> correspondence that also varies platform to platform is another source of >> inconsistent behavior that can be confusing. So there are still plenty of >> issues to deal with. >> > > This reminds me of something that it would be really nice for the bug > tracker to have - user votes. This might be a particularly good way to draw > in some more of the users who don't want to stick their neck out with > emails and comments, put are comfortable adding a vote to a bug. Something > like this: > > http://code.google.com/p/googleappengine/issues/detail?id=190 > > where it says that 566 people have starred the issue. 
> Here's how this feature looks in YouTrack: http://youtrack.jetbrains.net/issues?q=sort+by%3Avotes -Mark > > -Mark > > >> >> I'd like to point out that the addition of float16 necessitated a certain >> amount of rewriting, as well as the addition of datetime. It was only >> through Mark's work that we were able to include the latter in the 1.* >> series at all. Before, we always had to remove datetime before a release, a >> royal PITA, while waiting on the ever receding 2.0. So there were very good >> reasons to deal with the type system. >> >> That isn't to say that typecasting can't use some tweaks here and there, >> I think we are all open to discussion along those lines. But it should >> about specific cases. >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 02:22:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 01:22:30 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <0F4F764E-878A-454D-B58B-354E5EE5EB0E@continuum.io> Message-ID: > > You might be right, Chuck. I would like to investigate more, however. > > What I fear is that there are *a lot* of users still on NumPy 1.3 and NumPy 1.5. The fact that we haven't heard any complaints, yet, does not mean to me that we aren't creating headache for people later who have just not had time to try things. > > However, I can believe that the specifics of "minor" casting rules are probably not relied upon by a lot of codes out there. Still, as Robert Kern often reminds us well --- our intuitions about this are usually not worth much. > > I may be making more of this then it's worth, I realize. I was just sensitive to it at the time things were changing (even though I didn't have time to be vocal), and now hearing this users experience, it confirms my bias... Believe me, I do not want to "revert" if at all possible. There is plenty of more work to do, and I'm very much in favor of the spirit of the work Mark was and is doing. > > > I think writing tests would be more productive. The current coverage is skimpy in that we typically don't cover *all* the combinations. Sometimes we don't cover any of them ;) I know you are sensitive to the typecasting, it was one of your babies. Nevertheless, I don't think it is that big an issue at the moment. If you can think of ways to *improve* it I think everyone will be interested in that. First of all, I would hardly call it one of my babies. I care far more for my actual babies than for this. It was certainly one of my headaches that I had to deal with and write code for (and take into account previous behavior with). I certainly spent a lot of time wrestling with type-coercion and integrating numerous opinions as quickly as I could with it --- even in Numeric with the funny down_casting arrays. At best the resulting system was a compromise (with an implementation that you could reason about with the right perspective despite claims to the contrary). 
This discussion is not about me being sensitive because I wrote some code or had a hand in a design that needed changing. I hope we replace all the code I've written with something better. I expect that eventually. This just has to be done in an appropriate way. I'm sensitive because I understand where the previous code came from and *why it was written* and am concerned about changing things out from under users in ways that are subtle. I continue to affirm that breaking ABI compatibility is much preferable to changing type-casting behavior. I know people disagree with me. But, distributions help solve the "ABI compatibility problem", but nothing solves required code changes due to subtle type-casting issues. I would just expect this sort of change at NumPy 2.0. We could have waited for half-float until then. I will send the result of my analysis shortly on what changed between 1.5.1 and 1.6.1 -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From kikocorreoso at gmail.com Tue Feb 14 02:35:08 2012 From: kikocorreoso at gmail.com (Kiko) Date: Tue, 14 Feb 2012 08:35:08 +0100 Subject: [Numpy-discussion] Fwd: Re: Creating parallel curves In-Reply-To: References: Message-ID: 2012/2/13 Andrea Gavana > ---------- Forwarded message ---------- > From: "Andrea Gavana" > Date: Feb 13, 2012 11:31 PM > Subject: Re: [Numpy-discussion] Creating parallel curves > To: "Jonathan Hilmer" > > Thank you Jonathan for this, it's exactly what I was looking for. I' ll > try it tomorrow on the 768 well trajectories I have and I'll let you know > if I stumble upon any issue. > > If someone could shed some light on my problem number 2 (how to adjust the > scaling/distance) so that the curves look parallel on a matplotlib graph > even though the axes scales are different, I'd be more than grateful. > > Thank you in advance. > Hi. Maybe this could help you as a starting point. *from Shapely.geometry import LineString from matplotlib import pyplot myline = LineString(...) x, y = myline.xy xx, yy = myline.buffer(distancefrommyline).exterior.xy # coordinates around myline pyplot.plot(x, y) pyplot.plot(xx,yy) pyplot.show()* Best. -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 02:46:16 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 01:46:16 -0600 Subject: [Numpy-discussion] Typecasting changes from 1.5.1 to 1.6.1 Message-ID: Here is the code I used to determine the coercion table of types. I first used *all* of the numeric_ops, narrowed it down to those with 2 inputs and 1 output, and then determined the run-time coercion table. Then, I removed ops that had the same tables until I was left with binary ops that had different coercion tables. Some operations were NotImplemented and I used 'X' in the table for those combinations. The table for each op is a dictionary with keys given by (type1, type2) and values given by a length-4 list of the types of the result between: [scalar-scalar, scalar-array, array-scalar, array-array] where the first term is type1 and the second term is type2. This resulting dictionary of tables for each op is then saved to a file. I ran this code for NumPy 1.5.1 64-bit and then again for NumPy 1.6.1 64-bit. I also ran this code for NumPy 1.4.1 64-bit and NumPy 1.3.1.dev 64-bit. The code to compare them is also attached. 
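(The attached scripts themselves were scrubbed from the archive; as a stand-in, here is a rough sketch of the kind of table-building code being described. The operator list, the type list and the output file name are guesses, not the original coercion_test.py:)

import pickle
import numpy as np

types = [np.bool_, np.int8, np.uint8, np.int16, np.uint16, np.int32,
         np.uint32, np.int64, np.uint64, np.float32, np.float64,
         np.complex64, np.complex128]

def result_char(op, a, b):
    # record the typecode of the result, or 'X' when the op is not implemented
    try:
        return op(a, b).dtype.char
    except TypeError:
        return 'X'

tables = {}
for name, op in [('add', np.add), ('multiply', np.multiply),
                 ('true_divide', np.true_divide)]:
    table = {}
    for t1 in types:
        for t2 in types:
            s1, s2 = t1(1), t2(1)                          # scalars
            a1, a2 = np.array([1], t1), np.array([1], t2)  # arrays
            table[(t1.__name__, t2.__name__)] = [
                result_char(op, s1, s2),   # scalar-scalar
                result_char(op, s1, a2),   # scalar-array
                result_char(op, a1, s2),   # array-scalar
                result_char(op, a1, a2),   # array-array
            ]
    tables[name] = table

with open('coercion_table.pkl', 'wb') as f:
    pickle.dump(tables, f)
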
I'm attaching also the changes that have occurred between 1.3.1.dev and 1.4.1, 1.4.1 to 1.5.1, and finally 1.5.1 to 1.6.1 As you can see there were changes in each release. Most of these were minor prior to the change from 1.5.1 to 1.6.1. I am still reviewing the changes from 1.5.1 to 1.6.1. At first blush, it looks like there are a lot of changes to swallow that are not necessarily minor. I really would like to just say all is well, and it's no big deal. I hope that users really don't care and nobody's code is really relying on array-scalar combination conversions. -Travis -------------- next part -------------- A non-text attachment was scrubbed... Name: coercion_test.py Type: text/x-python-script Size: 1864 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: coercion_compare.py Type: text/x-python-script Size: 1026 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.3.1.dev_to_1.4.1 Type: application/octet-stream Size: 19026 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.4.1_to_1.5.1 Type: application/octet-stream Size: 3246 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.5.1_to_1.6.1 Type: application/octet-stream Size: 35266 bytes Desc: not available URL: -------------- next part -------------- From travis at continuum.io Tue Feb 14 02:58:33 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 01:58:33 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <0F4F764E-878A-454D-B58B-354E5EE5EB0E@continuum.io> Message-ID: <934DC5A9-14BD-40BC-A616-83C6FAF9FA33@continuum.io> > > The lack of commutativity wasn't in precision, it was in the typecodes, and was there from the beginning. That caused confusion. A current cause of confusion is the many to one relation of, say, int32 and long, longlong which varies platform to platform. I think that confusion is a more significant problem. Having some types derived from Python types, a correspondence that also varies platform to platform is another source of inconsistent behavior that can be > confusing. So there are still plenty of issues to deal with I didn't think it was in the precision. I knew what you meant. However, I'm still hoping for an example of what you mean by "lack of commutativity in the typecodes". The confusion of long and longlong varying from platform to platform comes from C. The whole point of having long and longlong is to ensure that you can specify the same types in Python that you would in C. They should not be used if you don't care about that. Deriving from Python types for some array-scalars is an issue. I don't like that either. However, Python itself special-cases it's scalars in ways that necessitated it to have some use-cases not fall-over. This shows a limitation of Python. I would prefer that all array-scalars were recognized appropriately by the Python type system. Most of the concerns that you mention here are mis-understandings. Maybe there are solutions that "fix" the problem without just educating people. I am open to them. 
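(Again only a sketch, since coercion_compare.py was scrubbed as well -- it assumes two pickle files produced by the snippet above under the two NumPy versions being compared; the file names are illustrative:)

import pickle

with open('coercion_table_1.5.1.pkl', 'rb') as f:
    old = pickle.load(f)
with open('coercion_table_1.6.1.pkl', 'rb') as f:
    new = pickle.load(f)

# report every (op, type pair) whose 4-entry result list changed between versions
for op in sorted(set(old) & set(new)):
    for pair in sorted(old[op]):
        if old[op][pair] != new[op].get(pair):
            print(op, pair, old[op][pair], '->', new[op].get(pair))
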
I do think that it was a mistake to have the intp and uintp dtypes as *separate* dtypes. They should have just mapped to the right one. I think it was also a mistake to have dtypes for all the C-spellings instead of just a dtype for each different bit-length with an alias for the C-spellings. We should change that in NumPy 2.0. -Travis > > I'd like to point out that the addition of float16 necessitated a certain amount of rewriting, as well as the addition of datetime. It was only through Mark's work that we were able to include the latter in the 1.* series at all. Before, we always had to remove datetime before a release, a royal PITA, while waiting on the ever receding 2.0. So there were very good reasons to deal with the type system. > > That isn't to say that typecasting can't use some tweaks here and there, I think we are all open to discussion along those lines. But it should about specific cases. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From efiring at hawaii.edu Tue Feb 14 03:05:38 2012 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 13 Feb 2012 22:05:38 -1000 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <8FF629A5-38BC-4179-9647-69924A39513D@continuum.io> Message-ID: <4F3A15D2.4060504@hawaii.edu> On 02/13/2012 08:07 PM, Charles R Harris wrote: > > > Let it go, Travis. It's a waste of time. (Off-list) Chuck, I really appreciate your consistent good sense; this is just one of many examples. Thank you for all your numpy work. Eric From martin.raspaud at smhi.se Tue Feb 14 03:44:53 2012 From: martin.raspaud at smhi.se (Martin Raspaud) Date: Tue, 14 Feb 2012 09:44:53 +0100 Subject: [Numpy-discussion] Numpy 1.6.1 installation problem Message-ID: <4F3A1F05.10609@smhi.se> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, I am trying to compile numpy 1.6.1 from source on a Redhat Linux enterprise 6 machine, and I get a problem with Python.h : somehow it can't be located by numpy's install script: SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel. Now, the trick is that python-devel IS installed: bash-4.1$ rpm -qa | grep python-dev python-devel-2.6.6-29.el6.x86_64 and Python.h is logically in /usr/include/python2.6 Anyone got a clue ? The full log is included below. 
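A quick way to cross-check which header directory the interpreter itself expects (run with the same python that runs setup.py; the path in the comment is only an example):

from distutils import sysconfig

# Ask the interpreter where its own headers live and which compiler
# distutils will invoke.
print(sysconfig.get_python_inc())      # e.g. /usr/include/python2.6
print(sysconfig.get_config_var("CC"))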
Best regards, Martin bash-4.1$ python setup.py build --fcompiler=gnu95 Running from numpy source directory.non-existing path in 'numpy/distutils': 'site.cfg' F2PY Version 2 blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in /usr/local/lib64 libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib64 libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/atlas libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib NOT AVAILABLE atlas_blas_info: libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib libraries f77blas,cblas,atlas not found in /usr/lib64/atlas libraries f77blas,cblas,atlas not found in /usr/lib64/sse2 libraries f77blas,cblas,atlas not found in /usr/lib64 libraries f77blas,cblas,atlas not found in /usr/lib/sse2 libraries f77blas,cblas,atlas not found in /usr/lib NOT AVAILABLE /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1414: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) blas_info: libraries blas not found in /usr/local/lib64 libraries blas not found in /usr/local/lib libraries blas not found in /usr/lib64 libraries blas not found in /usr/lib NOT AVAILABLE /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1423: UserWarning: Blas (http://www.netlib.org/blas/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [blas]) or by setting the BLAS environment variable. warnings.warn(BlasNotFoundError.__doc__) blas_src_info: NOT AVAILABLE /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1426: UserWarning: Blas (http://www.netlib.org/blas/) sources not found. Directories to search for the sources can be specified in the numpy/distutils/site.cfg file (section [blas_src]) or by setting the BLAS_SRC environment variable. 
warnings.warn(BlasSrcNotFoundError.__doc__) NOT AVAILABLE lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in /usr/local/lib64 libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib64 libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/atlas libraries lapack_atlas not found in /usr/lib64/atlas libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/sse2 libraries lapack_atlas not found in /usr/lib64/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 libraries lapack_atlas not found in /usr/lib64 libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries lapack_atlas not found in /usr/lib/sse2 libraries ptf77blas,ptcblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib numpy.distutils.system_info.atlas_threads_info NOT AVAILABLE atlas_info: libraries f77blas,cblas,atlas not found in /usr/local/lib64 libraries lapack_atlas not found in /usr/local/lib64 libraries f77blas,cblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries f77blas,cblas,atlas not found in /usr/lib64/atlas libraries lapack_atlas not found in /usr/lib64/atlas libraries f77blas,cblas,atlas not found in /usr/lib64/sse2 libraries lapack_atlas not found in /usr/lib64/sse2 libraries f77blas,cblas,atlas not found in /usr/lib64 libraries lapack_atlas not found in /usr/lib64 libraries f77blas,cblas,atlas not found in /usr/lib/sse2 libraries lapack_atlas not found in /usr/lib/sse2 libraries f77blas,cblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib numpy.distutils.system_info.atlas_info NOT AVAILABLE /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1330: UserWarning: Atlas (http://math-atlas.sourceforge.net/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [atlas]) or by setting the ATLAS environment variable. warnings.warn(AtlasNotFoundError.__doc__) lapack_info: libraries lapack not found in /usr/local/lib64 libraries lapack not found in /usr/local/lib libraries lapack not found in /usr/lib64 libraries lapack not found in /usr/lib NOT AVAILABLE /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1341: UserWarning: Lapack (http://www.netlib.org/lapack/) libraries not found. Directories to search for the libraries can be specified in the numpy/distutils/site.cfg file (section [lapack]) or by setting the LAPACK environment variable. warnings.warn(LapackNotFoundError.__doc__) lapack_src_info: NOT AVAILABLE /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1344: UserWarning: Lapack (http://www.netlib.org/lapack/) sources not found. Directories to search for the sources can be specified in the numpy/distutils/site.cfg file (section [lapack_src]) or by setting the LAPACK_SRC environment variable. 
warnings.warn(LapackSrcNotFoundError.__doc__) NOT AVAILABLE running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands - --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands - --fcompiler options running build_src build_src building py_modules sources building library "npymath" sources customize Gnu95FCompiler Found executable /usr/bin/gfortran customize Gnu95FCompiler using config C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic - -D_GNU_SOURCE -fPIC -fwrapv -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core - -Inumpy/core/src/npymath -Inumpy/core/src/multiarray - -Inumpy/core/src/umath -Inumpy/core/include -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest success! removing: _configtest.c _configtest.o _configtest C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic - -D_GNU_SOURCE -fPIC -fwrapv -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core - -Inumpy/core/src/npymath -Inumpy/core/src/multiarray - -Inumpy/core/src/umath -Inumpy/core/include -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function 'exp' gcc -pthread _configtest.o -o _configtest _configtest.o: In function `main': /data/proj/safutv/src/numpy-1.6.1/_configtest.c:6: undefined reference to `exp' collect2: ld returned 1 exit status _configtest.o: In function `main': /data/proj/safutv/src/numpy-1.6.1/_configtest.c:6: undefined reference to `exp' collect2: ld returned 1 exit status failure. removing: _configtest.c _configtest.o C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic - -D_GNU_SOURCE -fPIC -fwrapv -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core - -Inumpy/core/src/npymath -Inumpy/core/src/multiarray - -Inumpy/core/src/umath -Inumpy/core/include -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function 'exp' gcc -pthread _configtest.o -lm -o _configtest success! 
removing: _configtest.c _configtest.o _configtest building extension "numpy.core._sort" sources Generating build/src.linux-x86_64-2.6/numpy/core/include/numpy/config.h C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic - -D_GNU_SOURCE -fPIC -fwrapv -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core - -Inumpy/core/src/npymath -Inumpy/core/src/multiarray - -Inumpy/core/src/umath -Inumpy/core/include -c' gcc: _configtest.c _configtest.c:1:20: error: Python.h: No such file or directory _configtest.c:1:20: error: Python.h: No such file or directory failure. removing: _configtest.c _configtest.o Traceback (most recent call last): File "setup.py", line 196, in setup_package() File "setup.py", line 189, in setup_package configuration=configuration ) File "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/core.py", line 186, in setup return old_setup(**new_attr) File "/usr/lib64/python2.6/distutils/core.py", line 152, in setup dist.run_commands() File "/usr/lib64/python2.6/distutils/dist.py", line 975, in run_commands self.run_command(cmd) File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command cmd_obj.run() File "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build.py", line 37, in run old_build.run(self) File "/usr/lib64/python2.6/distutils/command/build.py", line 134, in run self.run_command(cmd_name) File "/usr/lib64/python2.6/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command cmd_obj.run() File "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build_src.py", line 152, in run self.build_sources() File "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build_src.py", line 169, in build_sources self.build_extension_sources(ext) File "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build_src.py", line 328, in build_extension_sources sources = self.generate_sources(sources, ext) File "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy/core/setup.py", line 410, in generate_config_h moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir) File "numpy/core/setup.py", line 41, in check_types out = check_types(*a, **kw) File "numpy/core/setup.py", line 271, in check_types "Cannot compile 'Python.h'. Perhaps you need to "\ SystemError: Cannot compile 'Python.h'. Perhaps you need to install python-dev|python-devel. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) Comment: Using GnuPG with Red Hat - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJPOh8EAAoJEBdvyODiyJI4umMIAOQOxNPK5XdQIV7scjtALh9L g71eIIZcqjvZ9LOwOaYaWA2jx+PqQ0yGu+vWsJ3Rk7WumffnM2wZAXww3lgcs8Jm MBcosmML5O5bFHagUG2VrrmB8stV8sTWdbV+8vf/7me8tIuPgLIUb4no/oaAjNPu o7ZgyQI/FIEyCbtYM8vgvbFu7XKal9nkHHPZ4hXviDHsa1adtJjjhsWfjf8Rcins us3Wr43ErUbFEWfDGGl4EMaTBaR0KjiVxLCFLq9g4MXxfmdCL7lrgl7oq8itP4lt MqC2BEHLo8qs4PLmkOImelknu+6wINor61Iwa/1atFTqRLB0WPOLJZvA2sTTT3U= =OXPR -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... 
Name: martin_raspaud.vcf Type: text/x-vcard Size: 303 bytes Desc: not available URL: From matthew.brett at gmail.com Tue Feb 14 03:55:33 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 14 Feb 2012 00:55:33 -0800 Subject: [Numpy-discussion] Typecasting changes from 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: Hi Travis, On Mon, Feb 13, 2012 at 11:46 PM, Travis Oliphant wrote: > Here is the code I used to determine the coercion table of types. ? I first used *all* of the numeric_ops, narrowed it down to those with 2 inputs and 1 output, and then determined the run-time coercion table. ? Then, I removed ops that had the same tables until I was left with binary ops that had different coercion tables. > > Some operations were NotImplemented and I used 'X' in the table for those combinations. > > The table for each op is a dictionary with keys given by (type1, type2) and values given by a length-4 list of the types of the result between: ?[scalar-scalar, scalar-array, array-scalar, array-array] where the first term is type1 and the second term is type2. > > This resulting dictionary of tables for each op is then saved to a file. ? I ran this code for NumPy 1.5.1 64-bit and then again for NumPy 1.6.1 64-bit. ? I also ran this code for NumPy 1.4.1 64-bit and NumPy 1.3.1.dev 64-bit. > > The code to compare them is also attached. ? ?I'm attaching also the changes that have occurred between 1.3.1.dev and 1.4.1, 1.4.1 to 1.5.1, and finally 1.5.1 to 1.6.1 > > As you can see there were changes in each release. ? Most of these were minor prior to the change from 1.5.1 to 1.6.1. I am still reviewing the changes from 1.5.1 to 1.6.1. ? ?At first blush, it looks like there are a lot of changes to swallow that are not necessarily minor. ? ?I really would like to just say all is well, and it's no big deal. ? I hope that users really don't care and nobody's code is really relying on array-scalar combination conversions. Thanks for looking into this. It strikes me that changes in behavior here could be dangerous and easily missed, and it does seem to me that it is worth a pause to consider what the effect of the changes might be. Obviously, now both 1.6 and 1.6.1 are in the wild, there will be costs to reverting as well. Best, Matthew From jason-sage at creativetrax.com Tue Feb 14 04:00:55 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Tue, 14 Feb 2012 03:00:55 -0600 Subject: [Numpy-discussion] Issue Tracking In-Reply-To: <4F398FD3.6040904@creativetrax.com> References: <4F398FD3.6040904@creativetrax.com> Message-ID: <4F3A22C7.5000009@creativetrax.com> Jeroen's reply about the Sage "buildbot" is below: >Jeroen, do we have an > automatic buildbot system for Sage? Depends on what you mean with "automatic". We have the buildbot setup at http://build.sagemath.org/sage/waterfall which builds automatically but I still have to change versions by hand and start the builders by hand (in theory, this could be automated but in practice, this is not so easy). Thanks, Jason From madsipsen at gmail.com Tue Feb 14 04:03:23 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 14 Feb 2012 10:03:23 +0100 Subject: [Numpy-discussion] _import_array() Message-ID: <4F3A235B.1060102@gmail.com> Hi, I have C++ module (OpenGL) that extracts data from numpy arrays. The interface is pure read-only: It never returns any Python objects but only extracts data from numpy arrays. 
Eg: #include "numpy/arrayobject.h" void PrimitiveManager::deleteAtoms(PyObject * numpy_indices) { // Extract number of indices int const n = static_cast(PyArray_DIMS(numpy_indices)[0]); long * const indices = (long *) PyArray_DATA(numpy_indices); // Delete atoms in buffer for (int i = 0; i < n; ++i) { // Do stuff } } Now, when I compile the code with g++, I get the following warning: numpy/core/include/numpy/__multiarray_api.h:1532: warning: 'int _import_array()' defined but not used Do I need to call '_import_array()' somewhere? Am I doing something potentially nasty? Best regards, Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 04:03:25 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 03:03:25 -0600 Subject: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1 Message-ID: For reference, here is the table that shows the actual changes between 1.5.1 and 1.6.1 at least on 64-bit platforms in terms of type-casting. I updated the comparison code to throw out changes that are just "spelling differences" (i.e. where 1.6.1 chooses to create an output dtype with an 'L' character code instead of a 'Q' which on 64-bit system is effectively the same). -------------- next part -------------- A non-text attachment was scrubbed... Name: 1.5.1_to_1.6.1 Type: application/octet-stream Size: 23164 bytes Desc: not available URL: -------------- next part -------------- Mostly I'm happy with the changes (after a cursory review). As I expected, there are some real improvements. Of course, I haven't looked at the changes that occur when the scalar being used does not fit in the range of the array data-type. I don't see this change documented in the link that Mark sent previously. Is it somewhere else? Also, it looks like previously object arrays were returned for some coercions which now simply fail. Is that an expected result? At this point, I'm not going to recommend changes to 1.7 to deal with these type-casting changes --- at least this thread will serve to show some of what changes occurred if it bites anyone in the future. However, I will have other changes to NumPy 1.X that I will be proposing and writing (and directing other people to write as well). After some period of quiet, this might be a refreshing change. But, not all may see it that way. I'm confident that we can resolve any concerns people might have. Any feature additions will preserve backward compatibility in NumPy 1.X. Mark W. will be helping with some of these changes, but mostly he will be working on NumPy 2.0 which we have tentatively targeted for next January. We have a tentative target for NumPy 1.8 in June/July. So far, there are three developers who will be working on NumPy 1.8 (me, Francesc Alted, and Bryan Van de Ven). Mark Wiebe is slated to help us, as well, but I would like to sponsor him as much as possible on the work for NumPy 2.0. If anyone else would like to join us, please let me know off-list. There is room for another talented person on our team. 
In addition to a few select features in NumPy 1.8 (a list of which will follow in a later email), we will also be working on reviewing the list of bugs on Trac and fixing them, writing tests, and improving docstrings. I would also like to improve the state of the bug-tracker and get in place a continuous integration system for NumPy. We will be advertising our NumPy 1.8 roadmap and our NumPy 2.0 roadmap at PyCon, and are working on documents that describe plans which we are hoping will be reviewed and discussed on this list. I know that having more people working on the code-base for several months will be a different scenario than what has transpired in the past. Hopefully, this will be a productive time for everybody and our sometimes different perspectives will be able to coalesce into a better result for more people. Best regards, -Travis From francesc at continuum.io Tue Feb 14 04:03:30 2012 From: francesc at continuum.io (Francesc Alted) Date: Tue, 14 Feb 2012 10:03:30 +0100 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: References: <20281.39813.720754.45947@localhost.localdomain> Message-ID: On Feb 14, 2012, at 1:50 AM, Wes McKinney wrote: [clip] > But: > > In [40]: timeit hist[i, j] > 10000 loops, best of 3: 32 us per loop > > So that's roughly 7-8x slower than a simple Cython method, so I > sincerely hope it could be brought down to the sub 10 microsecond > level with a little bit of work. I vaguely remember this has shown up before. My hunch is that indexing in NumPy is so powerful, that it has to check for a lot of different values for indices (integers, tuples, lists, arrays?), and that it is all these checks what is taking time. Your Cython wrapper just assumed that the indices where integers, so this is probably the reason why it is that much faster. This is not to say that indexing in NumPy could not be accelerated, but it won't be trivial, IMO. -- Francesc Alted From travis at continuum.io Tue Feb 14 04:09:48 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 03:09:48 -0600 Subject: [Numpy-discussion] _import_array() In-Reply-To: <4F3A235B.1060102@gmail.com> References: <4F3A235B.1060102@gmail.com> Message-ID: <63AA12AF-50DB-43BF-A1C2-E46CB768380F@continuum.io> Technically, when you write an extension module you really should use import_array(); in the init method of the extensions module. This ensures that the C-API is loaded so that the API -table is available if your C++ code uses the C-API at all. In this case you are just using some #defines that access the NumPy array structure, so it works without the import_array(). However, this could change in future releases (i.e. PyArray_DIMS and PyArray_DATA could become functions that are looked up in an API-table that must be loaded by import_array() ). Best regards, -Travis On Feb 14, 2012, at 3:03 AM, Mads Ipsen wrote: > Hi, > > I have C++ module (OpenGL) that extracts data from numpy arrays. The interface is pure read-only: It never returns any Python objects but only extracts data from numpy arrays. 
Eg: > > #include "numpy/arrayobject.h" > > void PrimitiveManager::deleteAtoms(PyObject * numpy_indices) > { > // Extract number of indices > int const n = static_cast(PyArray_DIMS(numpy_indices)[0]); > long * const indices = (long *) PyArray_DATA(numpy_indices); > > // Delete atoms in buffer > for (int i = 0; i < n; ++i) > { > // Do stuff > } > } > > Now, when I compile the code with g++, I get the following warning: > > numpy/core/include/numpy/__multiarray_api.h:1532: warning: ?int _import_array()? defined but not used > > Do I need to call '_import_array()' somewhere? Am I doing something potentially nasty? > > Best regards, > > Mads > > > > > > -- > +-----------------------------------------------------+ > | Mads Ipsen | > +----------------------+------------------------------+ > | G?seb?ksvej 7, 4. tv | | > | DK-2500 Valby | phone: +45-29716388 | > | Denmark | email: mads.ipsen at gmail.com | > +----------------------+------------------------------+ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From madsipsen at gmail.com Tue Feb 14 04:20:19 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 14 Feb 2012 10:20:19 +0100 Subject: [Numpy-discussion] _import_array() In-Reply-To: <63AA12AF-50DB-43BF-A1C2-E46CB768380F@continuum.io> References: <4F3A235B.1060102@gmail.com> <63AA12AF-50DB-43BF-A1C2-E46CB768380F@continuum.io> Message-ID: <4F3A2753.8030200@gmail.com> Hi, The C++ module here is a class that's used by an OpenGL window, to extract data from numpy arrays and basically draw molecules whose coordinates are stored in numpy arrays. The C++ module is accessed from Python using wrappers generated by swig. Our application may contain many active OpenGL windows, where each one of them contains an instance of the swig wrapped C++ module. So the question is then: * Should the constructor of each instance of the C++ class call import_array()? * Should import_array() be called every time a method in the C++ class handles a numpy structure? * Should import_array() only be called one time, namely when the main application is started? Best regards, Mads On 14/02/2012 10:09, Travis Oliphant wrote: > Technically, when you write an extension module you really should use > import_array(); in the init method of the extensions module. This > ensures that the C-API is loaded so that the API -table is available > if your C++ code uses the C-API at all. > > In this case you are just using some #defines that access the NumPy > array structure, so it works without the import_array(). However, > this could change in future releases (i.e. PyArray_DIMS and > PyArray_DATA could become functions that are looked up in an API-table > that must be loaded by import_array() ). > > Best regards, > > -Travis > > > > > > > > On Feb 14, 2012, at 3:03 AM, Mads Ipsen wrote: > >> Hi, >> >> I have C++ module (OpenGL) that extracts data from numpy arrays. The >> interface is pure read-only: It never returns any Python objects but >> only extracts data from numpy arrays. 
Eg: >> >> #include "numpy/arrayobject.h" >> >> void PrimitiveManager::deleteAtoms(PyObject * numpy_indices) >> { >> // Extract number of indices >> int const n = static_cast(PyArray_DIMS(numpy_indices)[0]); >> long * const indices = (long *) PyArray_DATA(numpy_indices); >> >> // Delete atoms in buffer >> for (int i = 0; i < n; ++i) >> { >> // Do stuff >> } >> } >> >> Now, when I compile the code with g++, I get the following warning: >> >> numpy/core/include/numpy/__multiarray_api.h:1532: warning: ?int >> _import_array()? defined but not used >> >> Do I need to call '_import_array()' somewhere? Am I doing something >> potentially nasty? >> >> Best regards, >> >> Mads >> >> >> >> >> >> -- >> +-----------------------------------------------------+ >> | Mads Ipsen | >> +----------------------+------------------------------+ >> | G?seb?ksvej 7, 4. tv | | >> | DK-2500 Valby | phone: +45-29716388 | >> | Denmark | email:mads.ipsen at gmail.com | >> +----------------------+------------------------------+ >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Feb 14 04:30:56 2012 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 14 Feb 2012 10:30:56 +0100 Subject: [Numpy-discussion] _import_array() In-Reply-To: <4F3A2753.8030200@gmail.com> References: <4F3A235B.1060102@gmail.com> <63AA12AF-50DB-43BF-A1C2-E46CB768380F@continuum.io> <4F3A2753.8030200@gmail.com> Message-ID: 14.02.2012 10:20, Mads Ipsen kirjoitti: [clip] > * Should import_array() only be called one time, namely when the main > application is started? It should be called once when the application is started, before you do any other Numpy-using operations. http://docs.scipy.org/doc/numpy/reference/c-api.array.html#import_array -- Pauli Virtanen From cournape at gmail.com Tue Feb 14 04:32:34 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 14 Feb 2012 09:32:34 +0000 Subject: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: Hi Travis, It is great that some resources can be spent to have people paid to work on NumPy. Thank you for making that happen. I am slightly confused about roadmaps for numpy 1.8 and 2.0. This needs discussion on the ML, and our release manager currently is Ralf - he is the one who ultimately decides what goes when. I am also not completely comfortable by having a roadmap advertised to Pycon not coming from the community. regards, David On Tue, Feb 14, 2012 at 9:03 AM, Travis Oliphant wrote: > For reference, here is the table that shows the actual changes between 1.5.1 and 1.6.1 at least on 64-bit platforms in terms of type-casting. ?I updated the comparison code to throw out changes that are just "spelling differences" (i.e. where 1.6.1 chooses to create an output dtype with an 'L' character code instead of a 'Q' which on 64-bit system is effectively the same). 
> > > > > ? ? ? ?Mostly I'm happy with the changes (after a cursory review). ?As I expected, there are some real improvements. ? ?Of course, I haven't looked at the changes that occur when the scalar being used does not fit in the range of the array data-type. ? I don't see this change documented in the link that Mark sent previously. ? Is it somewhere else? ? Also, it looks like previously object arrays were returned for some coercions which now simply fail. ?Is that an expected result? > > At this point, I'm not going to recommend changes to 1.7 to deal with these type-casting changes --- at least this thread will serve to show some of what changes occurred if it bites anyone in the future. > > However, I will have other changes to NumPy 1.X that I will be proposing and writing (and directing other people to write as well). ?After some period of quiet, this might be a refreshing change. ?But, not all may see it that way. ? I'm confident that we can resolve any concerns people might have. ? Any feature additions will preserve backward compatibility in NumPy 1.X. ? Mark W. will be helping with some of these changes, but mostly he will be working on NumPy 2.0 which we have tentatively targeted for next January. ? ?We have a tentative target for NumPy 1.8 in June/July. ? ?So far, there are three developers who will be working on NumPy 1.8 (me, Francesc Alted, and Bryan Van de Ven). ?Mark Wiebe is slated to help us, as well, but I would like to sponsor him as much as possible on the work for NumPy 2.0. ? ?If anyone else would like to join us, please let me know off-list. ? ? There is room for another talented person on our team. > > In addition to a few select features in NumPy 1.8 (a list of which will follow in a later email), ?we will also be working on reviewing the list of bugs on Trac and fixing them, writing tests, and improving docstrings. ? ?I would also like to improve the state of the bug-tracker and get in place a continuous integration system for NumPy. ? We will be advertising our NumPy 1.8 roadmap and our NumPy 2.0 roadmap at PyCon, and are working on documents that describe plans which we are hoping will be reviewed and discussed on this list. > > I know that having more people working on the code-base for several months will be a different scenario than what has transpired in the past. ? Hopefully, this will be a productive time for everybody and our sometimes different perspectives will be able to coalesce into a better result for more people. > > Best regards, > > -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Tue Feb 14 04:49:51 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 14 Feb 2012 09:49:51 +0000 Subject: [Numpy-discussion] _import_array() In-Reply-To: References: <4F3A235B.1060102@gmail.com> <63AA12AF-50DB-43BF-A1C2-E46CB768380F@continuum.io> <4F3A2753.8030200@gmail.com> Message-ID: On Tue, Feb 14, 2012 at 09:30, Pauli Virtanen wrote: > 14.02.2012 10:20, Mads Ipsen kirjoitti: > [clip] >> * Should import_array() only be called one time, namely when the main >> application is started? > > It should be called once when the application is started, before you do > any other Numpy-using operations. > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#import_array Rather, it must be called once in the initialization routine of each extension module that uses numpy. 
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From madsipsen at gmail.com Tue Feb 14 04:56:52 2012 From: madsipsen at gmail.com (Mads Ipsen) Date: Tue, 14 Feb 2012 10:56:52 +0100 Subject: [Numpy-discussion] _import_array() In-Reply-To: References: <4F3A235B.1060102@gmail.com> <63AA12AF-50DB-43BF-A1C2-E46CB768380F@continuum.io> <4F3A2753.8030200@gmail.com> Message-ID: <4F3A2FE4.5050204@gmail.com> On 14/02/2012 10:30, Pauli Virtanen wrote: > 14.02.2012 10:20, Mads Ipsen kirjoitti: > [clip] >> * Should import_array() only be called one time, namely when the main >> application is started? > It should be called once when the application is started, before you do > any other Numpy-using operations. > > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#import_array > This is what we have in our swig.i file: %init %{ import_array(); %} so I guess we are doing it the right way. But I still get the warning, when the code is compiled. Mads -- +-----------------------------------------------------+ | Mads Ipsen | +----------------------+------------------------------+ | G?seb?ksvej 7, 4. tv | | | DK-2500 Valby | phone: +45-29716388 | | Denmark | email: mads.ipsen at gmail.com | +----------------------+------------------------------+ From robert.kern at gmail.com Tue Feb 14 04:58:02 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 14 Feb 2012 09:58:02 +0000 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: <20281.39813.720754.45947@localhost.localdomain> References: <20281.39813.720754.45947@localhost.localdomain> Message-ID: On Mon, Feb 13, 2012 at 23:23, Marcel Oliver wrote: > Hi, > > I have a short piece of code where the use of an index array "feels > right", but incurs a severe performance penalty: It's about an order > of magnitude slower than all other operations with arrays of that > size. > > It comes up in a piece of code which is doing a large number of "on > the fly" histograms via > > ?hist[i,j] += 1 > > where i is an array with the bin index to be incremented and j is > simply enumerating the histograms. ?I attach a full short sample code > below which shows how it's being used in context, and corresponding > timeit output from the critical code section. Other people have explained that yes, applying index arrays is slow. I would just like to add the tangential point that this code does not behave the way that you think it does. You cannot make histograms like this. The statement "hist[i,j] += 1" gets broken down into three separate statements by the Python compiler: tmp = hist.__getitem__((i,j)) tmp = tmp.__iadd__(1) hist.__setitem__((i,j), tmp) Note that tmp is a new array with copies of the data in hist at the (i,j) locations, possibly multiple copies if the i index has repetitions. Each one of these copies gets incremented by 1, then the __setitem__() will apply each of those in turn to the appropriate cell in hist, each one simply overwriting the previous one. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From stefan at sun.ac.za Tue Feb 14 07:43:29 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 14 Feb 2012 04:43:29 -0800 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? 
In-Reply-To: <4F3A15D2.4060504@hawaii.edu> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <8FF629A5-38BC-4179-9647-69924A39513D@continuum.io> <4F3A15D2.4060504@hawaii.edu> Message-ID: On Tue, Feb 14, 2012 at 12:05 AM, Eric Firing wrote: > On 02/13/2012 08:07 PM, Charles R Harris wrote: >> > >> >> Let it go, Travis. It's a waste of time. > > (Off-list) Chuck, I really appreciate your consistent good sense; this > is just one of many examples. ?Thank you for all your numpy work. Not off list. From charlesr.harris at gmail.com Tue Feb 14 07:47:47 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Feb 2012 05:47:47 -0700 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <934DC5A9-14BD-40BC-A616-83C6FAF9FA33@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <33970232-2E23-406D-BB01-1EBAE53FFD54@continuum.io> <0F4F764E-878A-454D-B58B-354E5EE5EB0E@continuum.io> <934DC5A9-14BD-40BC-A616-83C6FAF9FA33@continuum.io> Message-ID: On Tue, Feb 14, 2012 at 12:58 AM, Travis Oliphant wrote: > > > > The lack of commutativity wasn't in precision, it was in the typecodes, > and was there from the beginning. That caused confusion. A current cause of > confusion is the many to one relation of, say, int32 and long, longlong > which varies platform to platform. I think that confusion is a more > significant problem. Having some types derived from Python types, a > correspondence that also varies platform to platform is another source of > inconsistent behavior that can be > > confusing. So there are still plenty of issues to deal with > > I didn't think it was in the precision. I knew what you meant. However, > I'm still hoping for an example of what you mean by "lack of commutativity > in the typecodes". > > I made a table back around 1.3 and the lack of symmetry was readily apparent. The confusion of long and longlong varying from platform to platform comes > from C. The whole point of having long and longlong is to ensure that you > can specify the same types in Python that you would in C. They should not > be used if you don't care about that. > > Deriving from Python types for some array-scalars is an issue. I don't > like that either. However, Python itself special-cases it's scalars in > ways that necessitated it to have some use-cases not fall-over. This > shows a limitation of Python. I would prefer that all array-scalars were > recognized appropriately by the Python type system. > > Most of the concerns that you mention here are mis-understandings. Maybe > there are solutions that "fix" the problem without just educating people. > I am open to them. > > I do think that it was a mistake to have the intp and uintp dtypes as > *separate* dtypes. They should have just mapped to the right one. I > think it was also a mistake to have dtypes for all the C-spellings instead > of just a dtype for each different bit-length with an alias for the > C-spellings. We should change that in NumPy 2.0. 
>
About the behavior in question, I would frame this as a specific case with arguments for and against, like so:

*The Current Behavior*

In [1]: array([127], int8) + 127
Out[1]: array([-2], dtype=int8)

In [2]: array([127], int8) + 128
Out[2]: array([255], dtype=int16)

*Arguments for Old Behavior*

Predictable, explicit output type. This is a good thing, in that no one wants their 8GB int8 array turning into a 16GB int16 array.

Backward compatibility.

*Arguments for New Behavior*

Fewer overflow problems. But no cure.

Put that way I think you can make a solid argument for a tweak to restore old behavior. Overflow can be a problem, but partial cures are not going to solve it.
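For concreteness, the two overflow-handling options mentioned just below could be prototyped along these lines (the helper names are invented for illustration; nothing like this exists in NumPy):

import numpy as np

# Hypothetical helpers, not NumPy API: "saturate" vs "raise" overflow handling,
# done by performing the arithmetic in a wide type first.
def add_saturate(a, b, dtype=np.int8):
    info = np.iinfo(dtype)
    wide = np.asarray(a, dtype=np.int64) + np.asarray(b, dtype=np.int64)
    return np.clip(wide, info.min, info.max).astype(dtype)

def add_checked(a, b, dtype=np.int8):
    info = np.iinfo(dtype)
    wide = np.asarray(a, dtype=np.int64) + np.asarray(b, dtype=np.int64)
    if wide.max() > info.max or wide.min() < info.min:
        raise OverflowError("result does not fit in %s" % np.dtype(dtype).name)
    return wide.astype(dtype)

print(add_saturate(np.array([127], np.int8), 128))   # [127], clamped at the top
print(add_checked(np.array([100], np.int8), 27))     # [127], still in range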
I think we do need a way to deal with overflow. Maybe in two ways. 1) saturated operations, i.e., 127 + 128 -> 127. This might be good for images. 2) raise an error. We could make specific ufuncs for these behaviors. Hmm, I'm thinking that it would be nice if NumPy could actually support both behaviors. I just wonder whether that should be implemented as a property of each array or as global attribute for the whole NumPy package. While the latter should be easier to implement (what to do when different behaved arrays are being operated?), the former would give more flexibility. I know, this will introduce more complexity in the code base, but anyway, I think that would be a nice thing to support for NumPy 2.0. Just a thought, -- Francesc Alted From fperez.net at gmail.com Mon Feb 13 16:55:45 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 13 Feb 2012 13:55:45 -0800 Subject: [Numpy-discussion] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 Message-ID: Hi folks, [ I'm broadcasting this widely for maximum reach, but I'd appreciate it if replies can be kept to the *numpy* list, which is sort of the 'base' list for scientific/numerical work. It will make it much easier to organize a coherent set of notes later on. Apology if you're subscribed to all and get it 10 times. ] As part of the PyData workshop (http://pydataworkshop.eventbrite.com) to be held March 2 and 3 at the Mountain View Google offices, we have scheduled a session for an open discussion with Guido van Rossum and hopefully as many core python-dev members who can make it. We wanted to seize the combined opportunity of the PyData workshop bringing a number of 'scipy people' to Google with the timeline for Python 3.3, the first release after the Python language moratorium, being within sight: http://www.python.org/dev/peps/pep-0398. While a number of scientific Python packages are already available for Python 3 (either in released form or in their master git branches), it's fair to say that there hasn't been a major transition of the scientific community to Python3. Since there is no more development being done on the Python2 series, eventually we will all want to find ways to make this transition, and we think that this is an excellent time to engage the core python development team and consider ideas that would make Python3 generally a more appealing language for scientific work. Guido has made it clear that he doesn't speak for the day-to-day development of Python anymore, so we all should be aware that any ideas that come out of this panel will still need to be discussed with python-dev itself via standard mechanisms before anything is implemented. Nonetheless, the opportunity for a solid face-to-face dialog for brainstorming was too good to pass up. The purpose of this email is then to solicit, from all of our community, ideas for this discussion. In a week or so we'll need to summarize the main points brought up here and make a more concrete agenda out of it; I will also post a summary of the meeting afterwards here. Anything is a valid topic, some points just to get the conversation started: - Extra operators/PEP 225. Here's a summary from the last time we went over this, years ago at Scipy 2008: http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, and the current status of the document we wrote about it is here: file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. 
- Improved syntax/support for rationals or decimal literals? While Python now has both decimals (http://docs.python.org/library/decimal.html) and rationals (http://docs.python.org/library/fractions.html), they're quite clunky to use because they require full constructor calls. Guido has mentioned in previous discussions toying with ideas about support for different kinds of numeric literals... - Using the numpy docstring standard python-wide, and thus having python improve the pathetic state of the stdlib's docstrings? This is an area where our community is light years ahead of the standard library, but we'd all benefit from Python itself improving on this front. I'm toying with the idea of giving a lighting talk at PyConn about this, comparing the great, robust culture and tools of good docstrings across the Scipy ecosystem with the sad, sad state of docstrings in the stdlib. It might spur some movement on that front from the stdlib authors, esp. if the core python-dev team realizes the value and benefit it can bring (at relatively low cost, given how most of the information does exist, it's just in the wrong places). But more importantly for us, if there was truly a universal standard for high-quality docstrings across Python projects, building good documentation/help machinery would be a lot easier, as we'd know what to expect and search for (such as rendering them nicely in the ipython notebook, providing high-quality cross-project help search, etc). - Literal syntax for arrays? Sage has been floating a discussion about a literal matrix syntax (https://groups.google.com/forum/#!topic/sage-devel/mzwepqZBHnA). For something like this to go into python in any meaningful way there would have to be core multidimensional arrays in the language, but perhaps it's time to think about a piece of the numpy array itself into Python? This is one of the more 'out there' ideas, but after all, that's the point of a discussion like this, especially considering we'll have both Travis and Guido in one room. - Other syntactic sugar? Sage has "a..b" <=> range(a, b+1), which I actually think is both nice and useful... There's also the question of allowing "a:b:c" notation outside of [], which has come up a few times in conversation over the last few years. Others? - The packaging quagmire? This continues to be a problem, though python3 does have new improvements to distutils. I'm not really up to speed on the situation, to be frank. If we want to bring this up, someone will have to provide a solid reference or volunteer to do it in person. - etc... I'm putting the above just to *start* the discussion, but the real point is for the rest of the community to contribute ideas, so don't be shy. Final note: while I am here commiting to organizing and presenting this at the discussion with Guido (as well as contacting python-dev), I would greatly appreciate help with the task of summarizing this prior to the meeting as I'm pretty badly swamped in the run-in to pydata/pycon. So if anyone is willing to help draft the summary as the date draws closer (we can put it up on a github wiki, gist, whatever), I will be very grateful. I'm sure it will be better than what I'll otherwise do the last night at 2am :) Cheers, f ps - to the obvious question about webcasting the discussion live for remote participation: yes, we looked into it already; no, unfortunately it appears it won't be possible. We'll try to at least have the audio recorded (and possibly video) for posting later on. 
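To make the docstring-standard item above a bit more concrete, this is roughly the numpydoc layout on a toy function (illustrative only, not proposed stdlib text):

import numpy as np

def clip_values(a, a_min, a_max):
    """Limit the values in an array.

    Parameters
    ----------
    a : array_like
        Input data.
    a_min, a_max : scalar
        Lower and upper bounds.

    Returns
    -------
    ndarray
        A copy of `a` with values limited to the range ``[a_min, a_max]``.

    Examples
    --------
    >>> clip_values([1, 5, 10], 2, 8)
    array([2, 5, 8])
    """
    return np.clip(np.asarray(a), a_min, a_max)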
pps- if you are close to Mountain View and are interested in attending this panel in person, drop me a line at fernando.perez at berkeley.edu. We have a few spots available *for this discussion only* on top of the pydata regular attendance (which is long closed, I'm afraid). But we'll need to provide Google with a list of those attendees in advance. Please indicate if you are a core python committer in your email, as we'll give priority for this overflow pool to core python developers (but will otherwise accommodate as many people as Google lets us). From glk at uchicago.edu Tue Feb 14 09:22:08 2012 From: glk at uchicago.edu (Gordon L. Kindlmann) Date: Tue, 14 Feb 2012 08:22:08 -0600 Subject: [Numpy-discussion] @Dag re numpy.pxd In-Reply-To: References: Message-ID: <1CF87713-BCFC-4903-9940-C58B7F263905@uchicago.edu> Hello, This (below) caught my eye and I'm wondering what further information is available? I very much value the ability to wrap underlying array data from numpy for processing in non-python libraries, as well as the ability to wrap numpy arrays around array data allocated by non-python libraries. Is this capability going to be removed? Gordon > Message: 6 > Date: Sat, 11 Feb 2012 13:31:51 -0700 > From: Charles R Harris > Subject: [Numpy-discussion] @Dag re numpy.pxd > To: numpy-discussion > Message-ID: > > Content-Type: text/plain; charset="iso-8859-1" > > Hi Dag, > > This probably needs to be on the cython mailing list at some point, but I > thought I'd start the discussion here. Numpy is going to begin deprecating > direct access to ndarray/dtype internals, ala arr->data etc. There are > currently macros/functions for many of these operations in the numpy > development branch and I expect more to go in over the coming year. Also, > some of the macros have been renamed. I don't know the best way for Cython > to support this, but the current version (0.15 here) generates code that > will fail if the deprecated things are excluded. Ideally, numpy.pxd would > have numpy version dependent parts but I don't know if that is possible. In > any case, I'd like your thoughts on the best way to coordinate this > migration with Cython. > > Chuck From d.s.seljebotn at astro.uio.no Tue Feb 14 09:50:13 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 14 Feb 2012 06:50:13 -0800 Subject: [Numpy-discussion] @Dag re numpy.pxd In-Reply-To: <1CF87713-BCFC-4903-9940-C58B7F263905@uchicago.edu> References: <1CF87713-BCFC-4903-9940-C58B7F263905@uchicago.edu> Message-ID: <4F3A74A5.7070301@astro.uio.no> On 02/14/2012 06:22 AM, Gordon L. Kindlmann wrote: > Hello, > > This (below) caught my eye and I'm wondering what further information is available? > > I very much value the ability to wrap underlying array data from numpy for processing in non-python libraries, as well as the ability to wrap numpy arrays around array data allocated by non-python libraries. > > Is this capability going to be removed? This discussion had nothing to do with what you are saying above. Dag > > Gordon > > >> Message: 6 >> Date: Sat, 11 Feb 2012 13:31:51 -0700 >> From: Charles R Harris >> Subject: [Numpy-discussion] @Dag re numpy.pxd >> To: numpy-discussion >> Message-ID: >> >> Content-Type: text/plain; charset="iso-8859-1" >> >> Hi Dag, >> >> This probably needs to be on the cython mailing list at some point, but I >> thought I'd start the discussion here. Numpy is going to begin deprecating >> direct access to ndarray/dtype internals, ala arr->data etc. 
There are >> currently macros/functions for many of these operations in the numpy >> development branch and I expect more to go in over the coming year. Also, >> some of the macros have been renamed. I don't know the best way for Cython >> to support this, but the current version (0.15 here) generates code that >> will fail if the deprecated things are excluded. Ideally, numpy.pxd would >> have numpy version dependent parts but I don't know if that is possible. In >> any case, I'd like your thoughts on the best way to coordinate this >> migration with Cython. >> >> Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue Feb 14 09:50:19 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Feb 2012 07:50:19 -0700 Subject: [Numpy-discussion] @Dag re numpy.pxd In-Reply-To: <1CF87713-BCFC-4903-9940-C58B7F263905@uchicago.edu> References: <1CF87713-BCFC-4903-9940-C58B7F263905@uchicago.edu> Message-ID: On Tue, Feb 14, 2012 at 7:22 AM, Gordon L. Kindlmann wrote: > Hello, > > This (below) caught my eye and I'm wondering what further information is > available? > > I very much value the ability to wrap underlying array data from numpy for > processing in non-python libraries, as well as the ability to wrap numpy > arrays around array data allocated by non-python libraries. > > Is this capability going to be removed? > > No. But we are going to try making some things go through functions. The problem with direct access is that we can't modify structures and implementations while maintaining backward compatibility and that will be limiting going forward. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Feb 14 10:12:55 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 14 Feb 2012 09:12:55 -0600 Subject: [Numpy-discussion] Typecasting changes from 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: On Tuesday, February 14, 2012, Travis Oliphant wrote: > Here is the code I used to determine the coercion table of types. I first used *all* of the numeric_ops, narrowed it down to those with 2 inputs and 1 output, and then determined the run-time coercion table. Then, I removed ops that had the same tables until I was left with binary ops that had different coercion tables. > > Some operations were NotImplemented and I used 'X' in the table for those combinations. > > The table for each op is a dictionary with keys given by (type1, type2) and values given by a length-4 list of the types of the result between: [scalar-scalar, scalar-array, array-scalar, array-array] where the first term is type1 and the second term is type2. > > This resulting dictionary of tables for each op is then saved to a file. I ran this code for NumPy 1.5.1 64-bit and then again for NumPy 1.6.1 64-bit. I also ran this code for NumPy 1.4.1 64-bit and NumPy 1.3.1.dev 64-bit. > > The code to compare them is also attached. I'm attaching also the changes that have occurred between 1.3.1.dev and 1.4.1, 1.4.1 to 1.5.1, and finally 1.5.1 to 1.6.1 > > As you can see there were changes in each release. Most of these were minor prior to the change from 1.5.1 to 1.6.1. I am still reviewing the changes from 1.5.1 to 1.6.1. At first blush, it looks like there are a lot of changes to swallow that are not necessarily minor. I really would like to just say all is well, and it's no big deal. 
I hope that users really don't care and nobody's code is really relying on array-scalar combination conversions. > > -Travis > > > Would it make sense to adapt this code to go into the test suite? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Tue Feb 14 10:40:59 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 14 Feb 2012 10:40:59 -0500 Subject: [Numpy-discussion] Numpy 1.6.1 installation problem In-Reply-To: <4F3A1F05.10609@smhi.se> References: <4F3A1F05.10609@smhi.se> Message-ID: Really not an expert here, but it looks like it's trying various compilation options, some work and some don't, and for some reason it's really unhappy about the one where it can't find Python.h. Maybe add /usr/include/python2.6 to your CPATH, see if that helps (and make sure permissions are correctly set on this directory)? However, it may very well be something else.... -=- Olivier Le 14 f?vrier 2012 03:44, Martin Raspaud a ?crit : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > I am trying to compile numpy 1.6.1 from source on a Redhat Linux > enterprise 6 machine, and I get a problem with Python.h : somehow it > can't be located by numpy's install script: > SystemError: Cannot compile 'Python.h'. Perhaps you need to install > python-dev|python-devel. > > Now, the trick is that python-devel IS installed: > bash-4.1$ rpm -qa | grep python-dev > python-devel-2.6.6-29.el6.x86_64 > > and Python.h is logically in /usr/include/python2.6 > > Anyone got a clue ? > > The full log is included below. > > Best regards, > Martin > > > bash-4.1$ python setup.py build --fcompiler=gnu95 > Running from numpy source directory.non-existing path in > 'numpy/distutils': 'site.cfg' > F2PY Version 2 > blas_opt_info: > blas_mkl_info: > libraries mkl,vml,guide not found in /usr/local/lib64 > libraries mkl,vml,guide not found in /usr/local/lib > libraries mkl,vml,guide not found in /usr/lib64 > libraries mkl,vml,guide not found in /usr/lib > NOT AVAILABLE > > atlas_blas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/atlas > libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/sse2 > libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 > libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 > libraries ptf77blas,ptcblas,atlas not found in /usr/lib > NOT AVAILABLE > > atlas_blas_info: > libraries f77blas,cblas,atlas not found in /usr/local/lib64 > libraries f77blas,cblas,atlas not found in /usr/local/lib > libraries f77blas,cblas,atlas not found in /usr/lib64/atlas > libraries f77blas,cblas,atlas not found in /usr/lib64/sse2 > libraries f77blas,cblas,atlas not found in /usr/lib64 > libraries f77blas,cblas,atlas not found in /usr/lib/sse2 > libraries f77blas,cblas,atlas not found in /usr/lib > NOT AVAILABLE > > /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1414: > UserWarning: > Atlas (http://math-atlas.sourceforge.net/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [atlas]) or by setting > the ATLAS environment variable. 
> warnings.warn(AtlasNotFoundError.__doc__) > blas_info: > libraries blas not found in /usr/local/lib64 > libraries blas not found in /usr/local/lib > libraries blas not found in /usr/lib64 > libraries blas not found in /usr/lib > NOT AVAILABLE > > /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1423: > UserWarning: > Blas (http://www.netlib.org/blas/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [blas]) or by setting > the BLAS environment variable. > warnings.warn(BlasNotFoundError.__doc__) > blas_src_info: > NOT AVAILABLE > > /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1426: > UserWarning: > Blas (http://www.netlib.org/blas/) sources not found. > Directories to search for the sources can be specified in the > numpy/distutils/site.cfg file (section [blas_src]) or by setting > the BLAS_SRC environment variable. > warnings.warn(BlasSrcNotFoundError.__doc__) > NOT AVAILABLE > > lapack_opt_info: > lapack_mkl_info: > mkl_info: > libraries mkl,vml,guide not found in /usr/local/lib64 > libraries mkl,vml,guide not found in /usr/local/lib > libraries mkl,vml,guide not found in /usr/lib64 > libraries mkl,vml,guide not found in /usr/lib > NOT AVAILABLE > > NOT AVAILABLE > > atlas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 > libraries lapack_atlas not found in /usr/local/lib64 > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/atlas > libraries lapack_atlas not found in /usr/lib64/atlas > libraries ptf77blas,ptcblas,atlas not found in /usr/lib64/sse2 > libraries lapack_atlas not found in /usr/lib64/sse2 > libraries ptf77blas,ptcblas,atlas not found in /usr/lib64 > libraries lapack_atlas not found in /usr/lib64 > libraries ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 > libraries lapack_atlas not found in /usr/lib/sse2 > libraries ptf77blas,ptcblas,atlas not found in /usr/lib > libraries lapack_atlas not found in /usr/lib > numpy.distutils.system_info.atlas_threads_info > NOT AVAILABLE > > atlas_info: > libraries f77blas,cblas,atlas not found in /usr/local/lib64 > libraries lapack_atlas not found in /usr/local/lib64 > libraries f77blas,cblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries f77blas,cblas,atlas not found in /usr/lib64/atlas > libraries lapack_atlas not found in /usr/lib64/atlas > libraries f77blas,cblas,atlas not found in /usr/lib64/sse2 > libraries lapack_atlas not found in /usr/lib64/sse2 > libraries f77blas,cblas,atlas not found in /usr/lib64 > libraries lapack_atlas not found in /usr/lib64 > libraries f77blas,cblas,atlas not found in /usr/lib/sse2 > libraries lapack_atlas not found in /usr/lib/sse2 > libraries f77blas,cblas,atlas not found in /usr/lib > libraries lapack_atlas not found in /usr/lib > numpy.distutils.system_info.atlas_info > NOT AVAILABLE > > /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1330: > UserWarning: > Atlas (http://math-atlas.sourceforge.net/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [atlas]) or by setting > the ATLAS environment variable. 
> warnings.warn(AtlasNotFoundError.__doc__) > lapack_info: > libraries lapack not found in /usr/local/lib64 > libraries lapack not found in /usr/local/lib > libraries lapack not found in /usr/lib64 > libraries lapack not found in /usr/lib > NOT AVAILABLE > > /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1341: > UserWarning: > Lapack (http://www.netlib.org/lapack/) libraries not found. > Directories to search for the libraries can be specified in the > numpy/distutils/site.cfg file (section [lapack]) or by setting > the LAPACK environment variable. > warnings.warn(LapackNotFoundError.__doc__) > lapack_src_info: > NOT AVAILABLE > > /data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/system_info.py:1344: > UserWarning: > Lapack (http://www.netlib.org/lapack/) sources not found. > Directories to search for the sources can be specified in the > numpy/distutils/site.cfg file (section [lapack_src]) or by setting > the LAPACK_SRC environment variable. > warnings.warn(LapackSrcNotFoundError.__doc__) > NOT AVAILABLE > > running build > running config_cc > unifing config_cc, config, build_clib, build_ext, build commands > - --compiler options > running config_fc > unifing config_fc, config, build_clib, build_ext, build commands > - --fcompiler options > running build_src > build_src > building py_modules sources > building library "npymath" sources > customize Gnu95FCompiler > Found executable /usr/bin/gfortran > customize Gnu95FCompiler using config > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall > - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions > - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic > - -D_GNU_SOURCE -fPIC -fwrapv -fPIC > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > - -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > - -Inumpy/core/src/umath -Inumpy/core/include -c' > gcc: _configtest.c > gcc -pthread _configtest.o -o _configtest > success! > removing: _configtest.c _configtest.o _configtest > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall > - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions > - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic > - -D_GNU_SOURCE -fPIC -fwrapv -fPIC > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > - -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > - -Inumpy/core/src/umath -Inumpy/core/include -c' > gcc: _configtest.c > _configtest.c:1: warning: conflicting types for built-in function 'exp' > gcc -pthread _configtest.o -o _configtest > _configtest.o: In function `main': > /data/proj/safutv/src/numpy-1.6.1/_configtest.c:6: undefined reference > to `exp' > collect2: ld returned 1 exit status > _configtest.o: In function `main': > /data/proj/safutv/src/numpy-1.6.1/_configtest.c:6: undefined reference > to `exp' > collect2: ld returned 1 exit status > failure. 
> removing: _configtest.c _configtest.o > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall > - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions > - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic > - -D_GNU_SOURCE -fPIC -fwrapv -fPIC > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > - -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > - -Inumpy/core/src/umath -Inumpy/core/include -c' > gcc: _configtest.c > _configtest.c:1: warning: conflicting types for built-in function 'exp' > gcc -pthread _configtest.o -lm -o _configtest > success! > removing: _configtest.c _configtest.o _configtest > building extension "numpy.core._sort" sources > Generating build/src.linux-x86_64-2.6/numpy/core/include/numpy/config.h > C compiler: gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall > - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions > - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic > - -D_GNU_SOURCE -fPIC -fwrapv -fPIC > > compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > - -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > - -Inumpy/core/src/umath -Inumpy/core/include -c' > gcc: _configtest.c > _configtest.c:1:20: error: Python.h: No such file or directory > _configtest.c:1:20: error: Python.h: No such file or directory > failure. > removing: _configtest.c _configtest.o > Traceback (most recent call last): > File "setup.py", line 196, in > setup_package() > File "setup.py", line 189, in setup_package > configuration=configuration ) > File "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/core.py", > line 186, in setup > return old_setup(**new_attr) > File "/usr/lib64/python2.6/distutils/core.py", line 152, in setup > dist.run_commands() > File "/usr/lib64/python2.6/distutils/dist.py", line 975, in run_commands > self.run_command(cmd) > File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command > cmd_obj.run() > File > "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build.py", > line 37, in run > old_build.run(self) > File "/usr/lib64/python2.6/distutils/command/build.py", line 134, in run > self.run_command(cmd_name) > File "/usr/lib64/python2.6/distutils/cmd.py", line 333, in run_command > self.distribution.run_command(command) > File "/usr/lib64/python2.6/distutils/dist.py", line 995, in run_command > cmd_obj.run() > File > "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build_src.py", > line 152, in run > self.build_sources() > File > "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build_src.py", > line 169, in build_sources > self.build_extension_sources(ext) > File > "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build_src.py", > line 328, in build_extension_sources > sources = self.generate_sources(sources, ext) > File > "/data/proj6/safutv/src/numpy-1.6.1/numpy/distutils/command/build_src.py", > line 385, in generate_sources > source = func(extension, build_dir) > File "numpy/core/setup.py", line 410, in generate_config_h > moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir) > File "numpy/core/setup.py", line 41, in check_types > out = check_types(*a, **kw) > File "numpy/core/setup.py", line 271, in 
check_types > "Cannot compile 'Python.h'. Perhaps you need to "\ > SystemError: Cannot compile 'Python.h'. Perhaps you need to install > python-dev|python-devel. > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.14 (GNU/Linux) > Comment: Using GnuPG with Red Hat - http://enigmail.mozdev.org/ > > iQEcBAEBAgAGBQJPOh8EAAoJEBdvyODiyJI4umMIAOQOxNPK5XdQIV7scjtALh9L > g71eIIZcqjvZ9LOwOaYaWA2jx+PqQ0yGu+vWsJ3Rk7WumffnM2wZAXww3lgcs8Jm > MBcosmML5O5bFHagUG2VrrmB8stV8sTWdbV+8vf/7me8tIuPgLIUb4no/oaAjNPu > o7ZgyQI/FIEyCbtYM8vgvbFu7XKal9nkHHPZ4hXviDHsa1adtJjjhsWfjf8Rcins > us3Wr43ErUbFEWfDGGl4EMaTBaR0KjiVxLCFLq9g4MXxfmdCL7lrgl7oq8itP4lt > MqC2BEHLo8qs4PLmkOImelknu+6wINor61Iwa/1atFTqRLB0WPOLJZvA2sTTT3U= > =OXPR > -----END PGP SIGNATURE----- > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Feb 14 10:48:22 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 14 Feb 2012 09:48:22 -0600 Subject: [Numpy-discussion] Numpy 1.6.1 installation problem In-Reply-To: References: <4F3A1F05.10609@smhi.se> Message-ID: <4F3A8246.40307@gmail.com> On 02/14/2012 09:40 AM, Olivier Delalleau wrote: > Really not an expert here, but it looks like it's trying various > compilation options, some work and some don't, and for some reason > it's really unhappy about the one where it can't find Python.h. > Maybe add /usr/include/python2.6 to your CPATH, see if that helps (and > make sure permissions are correctly set on this directory)? However, > it may very well be something else.... > > -=- Olivier > > Le 14 f?vrier 2012 03:44, Martin Raspaud > a ?crit : > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > I am trying to compile numpy 1.6.1 from source on a Redhat Linux > enterprise 6 machine, and I get a problem with Python.h : somehow it > can't be located by numpy's install script: > SystemError: Cannot compile 'Python.h'. Perhaps you need to install > python-dev|python-devel. > > Now, the trick is that python-devel IS installed: > bash-4.1$ rpm -qa | grep python-dev > python-devel-2.6.6-29.el6.x86_64 > > and Python.h is logically in /usr/include/python2.6 > > Anyone got a clue ? > > The full log is included below. 
> [... quoted configure/build log trimmed here; it duplicates the log already shown in Olivier's reply above ...]
> -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.14 (GNU/Linux) > Comment: Using GnuPG with Red Hat - http://enigmail.mozdev.org/ > > iQEcBAEBAgAGBQJPOh8EAAoJEBdvyODiyJI4umMIAOQOxNPK5XdQIV7scjtALh9L > g71eIIZcqjvZ9LOwOaYaWA2jx+PqQ0yGu+vWsJ3Rk7WumffnM2wZAXww3lgcs8Jm > MBcosmML5O5bFHagUG2VrrmB8stV8sTWdbV+8vf/7me8tIuPgLIUb4no/oaAjNPu > o7ZgyQI/FIEyCbtYM8vgvbFu7XKal9nkHHPZ4hXviDHsa1adtJjjhsWfjf8Rcins > us3Wr43ErUbFEWfDGGl4EMaTBaR0KjiVxLCFLq9g4MXxfmdCL7lrgl7oq8itP4lt > MqC2BEHLo8qs4PLmkOImelknu+6wINor61Iwa/1atFTqRLB0WPOLJZvA2sTTT3U= > =OXPR > -----END PGP SIGNATURE----- > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion This there a reason why you are using the fcompiler option? If not just try the basic approach: $ python setup.py build Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Feb 14 11:59:55 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 14 Feb 2012 08:59:55 -0800 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: On Mon, Feb 13, 2012 at 6:19 PM, Mark Wiebe wrote: > It might be nice to turn the matrix class into a short class hierarchy, am I confused, or did a thread get mixed in? This seems to be a numpy/scipy thing, not a Python3 thing. Or is there some support in Python itself required for this to be practical? > something like this: > > class MatrixBase > class DenseMatrix(MatrixBase) > class TriangularMatrix(MatrixBase) # Maybe a few variations of upper/lower > triangular and whether the diagonal is stored > class SymmetricMatrix(MatrixBase) and while we're at it -- first class support for "row vector" and "column vector" -- it seems that the use of the MAtrix class has never really caught on, and the fact that there is no way to represent these two vectors cleanly is perhaps one important missing feature. See the numpy list from I think a couple years ago for discussion -- we had great idea, no one who knew how cared enough to implement them. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From d.s.seljebotn at astro.uio.no Tue Feb 14 12:16:19 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 14 Feb 2012 09:16:19 -0800 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: <4F3A96E3.20401@astro.uio.no> On 02/14/2012 08:59 AM, Chris Barker wrote: > On Mon, Feb 13, 2012 at 6:19 PM, Mark Wiebe wrote: >> It might be nice to turn the matrix class into a short class hierarchy, > > am I confused, or did a thread get mixed in? This seems to be a > numpy/scipy thing, not a Python3 thing. Or is there some support in > Python itself required for this to be practical? It was about the need for a dedicated matrix multiplication operator. 
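Concretely, the ambiguity at stake -- a rough sketch using nothing beyond stock NumPy (np.dot and np.matrix):

    import numpy as np

    a = np.arange(4.).reshape(2, 2)
    b = np.eye(2)

    # For plain ndarrays, '*' is elementwise; linear-algebra products go
    # through np.dot.
    print(a * b)           # elementwise product
    print(np.dot(a, b))    # matrix product

    # np.matrix overloads '*' to mean np.dot -- similar enough to ndarray
    # to be confusing, which is why many people avoid it.
    m = np.matrix(a)
    print(m * np.matrix(b))  # matrix product again
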
The idea is that if you have more decent, first-class matrices, inherently different from arrays, that people tend to use for real-world production code, then having a matrix-multiplication operator invoking np.dot in Python 3 is less of an issue, because then it's obvious that if you have a matrix then * can mean linear algebra multiplication rather than elementwise multiplication. (The difference from the current situation being that people avoid np.matrix because it's so similar to np.ndarray that you too easily get confused). I myself never missed a matrix multiplication operator, precisely because my matrices are very often diagonal or triangular or sparse or something else, so having syntax candy simply to invoke np.dot wouldn't help me. Dag > >> something like this: >> >> class MatrixBase >> class DenseMatrix(MatrixBase) >> class TriangularMatrix(MatrixBase) # Maybe a few variations of upper/lower >> triangular and whether the diagonal is stored >> class SymmetricMatrix(MatrixBase) > > and while we're at it -- first class support for "row vector" and > "column vector" -- it seems that the use of the MAtrix class has never > really caught on, and the fact that there is no way to represent these > two vectors cleanly is perhaps one important missing feature. See the > numpy list from I think a couple years ago for discussion -- we had > great idea, no one who knew how cared enough to implement them. > > -Chris > > From cournape at gmail.com Tue Feb 14 12:26:44 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 14 Feb 2012 17:26:44 +0000 Subject: [Numpy-discussion] [cython-users] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: On Mon, Feb 13, 2012 at 9:55 PM, Fernando Perez wrote: > Hi folks, > > [ I'm broadcasting this widely for maximum reach, but I'd appreciate > it if replies can be kept to the *numpy* list, which is sort of the > 'base' list for scientific/numerical work. ?It will make it much > easier to organize a coherent set of notes later on. ?Apology if > you're subscribed to all and get it 10 times. ] > > As part of the PyData workshop (http://pydataworkshop.eventbrite.com) > to be held March 2 and 3 at the Mountain View Google offices, we have > scheduled a session for an open discussion with Guido van Rossum and > hopefully as many core python-dev members who can make it. ?We wanted > to seize the combined opportunity of the PyData workshop bringing a > number of 'scipy people' to Google with the timeline for Python 3.3, > the first release after the Python language moratorium, being within > sight: http://www.python.org/dev/peps/pep-0398. > > While a number of scientific Python packages are already available for > Python 3 (either in released form or in their master git branches), > it's fair to say that there hasn't been a major transition of the > scientific community to Python3. ?Since there is no more development > being done on the Python2 series, eventually we will all want to find > ways to make this transition, and we think that this is an excellent > time to engage the core python development team and consider ideas > that would make Python3 generally a more appealing language for > scientific work. 
?Guido has made it clear that he doesn't speak for > the day-to-day development of Python anymore, so we all should be > aware that any ideas that come out of this panel will still need to be > discussed with python-dev itself via standard mechanisms before > anything is implemented. ?Nonetheless, the opportunity for a solid > face-to-face dialog for brainstorming was too good to pass up. > > The purpose of this email is then to solicit, from all of our > community, ideas for this discussion. ?In a week or so we'll need to > summarize the main points brought up here and make a more concrete > agenda out of it; I will also post a summary of the meeting afterwards > here. > > Anything is a valid topic, some points just to get the conversation started: > > - Extra operators/PEP 225. ?Here's a summary from the last time we > went over this, years ago at Scipy 2008: > http://mail.scipy.org/pipermail/numpy-discussion/2008-October/038234.html, > and the current status of the document we wrote about it is here: > file:///home/fperez/www/site/_build/html/py4science/numpy-pep225/numpy-pep225.html. > > - Improved syntax/support for rationals or decimal literals? ?While > Python now has both decimals > (http://docs.python.org/library/decimal.html) and rationals > (http://docs.python.org/library/fractions.html), they're quite clunky > to use because they require full constructor calls. ?Guido has > mentioned in previous discussions toying with ideas about support for > different kinds of numeric literals... > > - Using the numpy docstring standard python-wide, and thus having > python improve the pathetic state of the stdlib's docstrings? ?This is > an area where our community is light years ahead of the standard > library, but we'd all benefit from Python itself improving on this > front. ?I'm toying with the idea of giving a lighting talk at PyConn > about this, comparing the great, robust culture and tools of good > docstrings across the Scipy ecosystem with the sad, sad state of > docstrings in the stdlib. ?It might spur some movement on that front > from the stdlib authors, esp. if the core python-dev team realizes the > value and benefit it can bring (at relatively low cost, given how most > of the information does exist, it's just in the wrong places). ?But > more importantly for us, if there was truly a universal standard for > high-quality docstrings across Python projects, building good > documentation/help machinery would be a lot easier, as we'd know what > to expect and search for (such as rendering them nicely in the ipython > notebook, providing high-quality cross-project help search, etc). > > - Literal syntax for arrays? ?Sage has been floating a discussion > about a literal matrix syntax > (https://groups.google.com/forum/#!topic/sage-devel/mzwepqZBHnA). ?For > something like this to go into python in any meaningful way there > would have to be core multidimensional arrays in the language, but > perhaps it's time to think about a piece of the numpy array itself > into Python? ?This is one of the more 'out there' ideas, but after > all, that's the point of a discussion like this, especially > considering we'll have both Travis and Guido in one room. > > - Other syntactic sugar? Sage has "a..b" <=> range(a, b+1), which I > actually think is ?both nice and useful... There's also the question > of allowing "a:b:c" notation outside of [], which has come up a few > times in conversation over the last few years. Others? > > - The packaging quagmire? 
?This continues to be a problem, though > python3 does have new improvements to distutils. ?I'm not really up to > speed on the situation, to be frank. ?If we want to bring this up, > someone will have to provide a solid reference or volunteer to do it > in person. I will be at pydata, so I can try to get an elevator pitch ready for the packaging situation. I may be biased, but I don't think distutils2 actually improved the situation much for the scientific community (most likely made it worse by having yet one more solution without much improvement). In particular; - commands are still coupled to each other - it is still not possible to use an actual build system with dependency handling - no sensible API (distutils2 cannot be used as a library) cheers, David From fperez.net at gmail.com Tue Feb 14 12:34:10 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 14 Feb 2012 09:34:10 -0800 Subject: [Numpy-discussion] [cython-users] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 9:26 AM, David Cournapeau wrote: > I will be at pydata, so I can try to get an elevator pitch ready for > the packaging situation. Awesome! I didn't realize you were coming, and you're obviously the person I had in mind for this job :) Cheers, f From travis at continuum.io Tue Feb 14 13:25:23 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 12:25:23 -0600 Subject: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: <94A35936-A58E-4C4C-9909-3A4A88A07B1A@continuum.io> On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote: > Hi Travis, > > It is great that some resources can be spent to have people paid to > work on NumPy. Thank you for making that happen. > > I am slightly confused about roadmaps for numpy 1.8 and 2.0. This > needs discussion on the ML, and our release manager currently is Ralf > - he is the one who ultimately decides what goes when. Thank you for reminding me of this. Ralf and I spoke several days ago, and have been working on how to give him more time to spend on SciPy full-time. As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I will be the release manager again. Ralf will continue serving as release manager for SciPy. For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager. I only know that I won't be release manager past NumPy 1.X. > I am also not > completely comfortable by having a roadmap advertised to Pycon not > coming from the community. This is my bad wording which is a function of being up very late. At PyCon we will be discussing the roadmap conversations that are taking place on this list. We won't be presenting anything there related to the NumPy project that has not first been discussed here. The community will have ample opportunity to provide input, suggestions, and criticisms for anything that goes into NumPy --- the same as I've always done before when releasing open source software. In fact, I will also be discussing at PyCon, the creation of NumFOCUS (NumPy Foundation for Open Code for Usable Science) which has been organized precisely for ensuring that NumPy, SciPy, Matplotlib, and IPython stay community-focused and community-led even while receiving input and money from multiple companies and organizations. There is a mailing list for numfocus that you can sign up for if you would like to be part of those discussions. 
Let me know if you would like more information about that. John Hunter, Fernando Perez, me, Perry Greenfield, and Jarrod Millman are the initial board of the Foundation. But, I expect the Foundation directors to evolve over time. Best regards, -Travis > > regards, > > David > > On Tue, Feb 14, 2012 at 9:03 AM, Travis Oliphant wrote: >> For reference, here is the table that shows the actual changes between 1.5.1 and 1.6.1 at least on 64-bit platforms in terms of type-casting. I updated the comparison code to throw out changes that are just "spelling differences" (i.e. where 1.6.1 chooses to create an output dtype with an 'L' character code instead of a 'Q' which on 64-bit system is effectively the same). >> >> >> >> >> Mostly I'm happy with the changes (after a cursory review). As I expected, there are some real improvements. Of course, I haven't looked at the changes that occur when the scalar being used does not fit in the range of the array data-type. I don't see this change documented in the link that Mark sent previously. Is it somewhere else? Also, it looks like previously object arrays were returned for some coercions which now simply fail. Is that an expected result? >> >> At this point, I'm not going to recommend changes to 1.7 to deal with these type-casting changes --- at least this thread will serve to show some of what changes occurred if it bites anyone in the future. >> >> However, I will have other changes to NumPy 1.X that I will be proposing and writing (and directing other people to write as well). After some period of quiet, this might be a refreshing change. But, not all may see it that way. I'm confident that we can resolve any concerns people might have. Any feature additions will preserve backward compatibility in NumPy 1.X. Mark W. will be helping with some of these changes, but mostly he will be working on NumPy 2.0 which we have tentatively targeted for next January. We have a tentative target for NumPy 1.8 in June/July. So far, there are three developers who will be working on NumPy 1.8 (me, Francesc Alted, and Bryan Van de Ven). Mark Wiebe is slated to help us, as well, but I would like to sponsor him as much as possible on the work for NumPy 2.0. If anyone else would like to join us, please let me know off-list. There is room for another talented person on our team. >> >> In addition to a few select features in NumPy 1.8 (a list of which will follow in a later email), we will also be working on reviewing the list of bugs on Trac and fixing them, writing tests, and improving docstrings. I would also like to improve the state of the bug-tracker and get in place a continuous integration system for NumPy. We will be advertising our NumPy 1.8 roadmap and our NumPy 2.0 roadmap at PyCon, and are working on documents that describe plans which we are hoping will be reviewed and discussed on this list. >> >> I know that having more people working on the code-base for several months will be a different scenario than what has transpired in the past. Hopefully, this will be a productive time for everybody and our sometimes different perspectives will be able to coalesce into a better result for more people. 
>> >> Best regards, >> >> -Travis >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mwwiebe at gmail.com Tue Feb 14 13:50:42 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 14 Feb 2012 10:50:42 -0800 Subject: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 1:03 AM, Travis Oliphant wrote: > > Mostly I'm happy with the changes (after a cursory review). As I > expected, there are some real improvements. Of course, I haven't looked > at the changes that occur when the scalar being used does not fit in the > range of the array data-type. I don't see this change documented in the > link that Mark sent previously. Is it somewhere else? That part is handled by the min_scalar_type function, linked in the description. An aspect of the previous mechanism I didn't like is that the existing analogous promotion of unsigned -> signed was tucked away deep in the C code, and that motivated this exposure of the components of the type promotion system as function calls. http://docs.scipy.org/doc/numpy/reference/generated/numpy.min_scalar_type.html > Also, it looks like previously object arrays were returned for some > coercions which now simply fail. Is that an expected result? > I believe what you're seeing here is the result of a performance optimization (note: done carefully, I might point out that there were many performance optimizations I did not do because they would have broken backwards compatibility). This was a pretty large performance issue, I believe in the comparison operators. The operation on non-object dtypes would promote to object dtypes and hence run very slowly. After this, the code which called it would determine that it was the wrong result, throw away the computation, and return something else. Cheers, Mark At this point, I'm not going to recommend changes to 1.7 to deal with these > type-casting changes --- at least this thread will serve to show some of > what changes occurred if it bites anyone in the future. > > However, I will have other changes to NumPy 1.X that I will be proposing > and writing (and directing other people to write as well). After some > period of quiet, this might be a refreshing change. But, not all may see > it that way. I'm confident that we can resolve any concerns people might > have. Any feature additions will preserve backward compatibility in NumPy > 1.X. Mark W. will be helping with some of these changes, but mostly he > will be working on NumPy 2.0 which we have tentatively targeted for next > January. We have a tentative target for NumPy 1.8 in June/July. So > far, there are three developers who will be working on NumPy 1.8 (me, > Francesc Alted, and Bryan Van de Ven). Mark Wiebe is slated to help us, as > well, but I would like to sponsor him as much as possible on the work for > NumPy 2.0. If anyone else would like to join us, please let me know > off-list. There is room for another talented person on our team. > > In addition to a few select features in NumPy 1.8 (a list of which will > follow in a later email), we will also be working on reviewing the list of > bugs on Trac and fixing them, writing tests, and improving docstrings. 
I > would also like to improve the state of the bug-tracker and get in place a > continuous integration system for NumPy. We will be advertising our NumPy > 1.8 roadmap and our NumPy 2.0 roadmap at PyCon, and are working on > documents that describe plans which we are hoping will be reviewed and > discussed on this list. > > I know that having more people working on the code-base for several months > will be a different scenario than what has transpired in the past. > Hopefully, this will be a productive time for everybody and our sometimes > different perspectives will be able to coalesce into a better result for > more people. > > Best regards, > > -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yadavpulkit at gmail.com Tue Feb 14 14:22:42 2012 From: yadavpulkit at gmail.com (pulkit yadav) Date: Wed, 15 Feb 2012 00:52:42 +0530 Subject: [Numpy-discussion] addition to numpy discussion list Message-ID: Hello, I am a Python enthusiast and developer. Please add me to numpy mailing list so that I can contribute to the FLOSS community. -- Pulkit -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Tue Feb 14 14:26:08 2012 From: shish at keba.be (Olivier Delalleau) Date: Tue, 14 Feb 2012 14:26:08 -0500 Subject: [Numpy-discussion] addition to numpy discussion list In-Reply-To: References: Message-ID: Hi, You can subscribe here: http://mail.scipy.org/mailman/listinfo/numpy-discussion -=- Olivier Le 14 f?vrier 2012 14:22, pulkit yadav a ?crit : > Hello, > > I am a Python enthusiast and developer. Please add me to numpy mailing > list so that I can contribute to the FLOSS community. > > -- > Pulkit > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Feb 14 14:28:57 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 14 Feb 2012 11:28:57 -0800 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 Message-ID: Hi, On Tue, Feb 14, 2012 at 10:25 AM, Travis Oliphant wrote: > > On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote: > >> Hi Travis, >> >> It is great that some resources can be spent to have people paid to >> work on NumPy. Thank you for making that happen. >> >> I am slightly confused about roadmaps for numpy 1.8 and 2.0. This >> needs discussion on the ML, and our release manager currently is Ralf >> - he is the one who ultimately decides what goes when. > > Thank you for reminding me of this. ?Ralf and I spoke several days ago, and have been working on how to give him more time to spend on SciPy full-time. ? As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I will be the release manager again. ? Ralf will continue serving as release manager for SciPy. > > For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager. ? I only know that I won't be release manager past NumPy 1.X. > >> I am also not >> completely comfortable by having a roadmap advertised to Pycon not >> coming from the community. > > This is my bad wording which is a function of being up very late. ? 
?At PyCon we will be discussing the roadmap conversations that are taking place on this list. ? We won't be presenting anything there related to the NumPy project that has not first been discussed here. > > The community will have ample opportunity to provide input, suggestions, and criticisms for anything that goes into NumPy --- the same as I've always done before when releasing open source software. ? In fact, I will also be discussing at PyCon, the creation of NumFOCUS (NumPy Foundation for Open Code for Usable Science) which has been organized precisely for ensuring that NumPy, SciPy, Matplotlib, and IPython stay community-focused and community-led even while receiving input and money from multiple companies and organizations. > > There is a mailing list for numfocus that you can sign up for if you would like to be part of those discussions. ? Let me know if you would like more information about that. ? ?John Hunter, Fernando Perez, me, Perry Greenfield, and Jarrod Millman are the initial board of the Foundation. ? But, I expect the Foundation directors to evolve over time. I should say that I have no knowledge of the events above other than from the mailing list (I say that only because some of you may know that I'm a friend and colleague of Jarrod and Fernando). Travis - I hope you don't mind, but here I post some links that I have just found: http://technicaldiscovery.blogspot.com/2012/01/transition-to-continuum.html http://www.continuum.io/ I see that you've founded a new company, Continuum Analytics, and you are working with Peter Wang, Mark Wiebe, Francesc Alted (PyTables), and Bryan Van de Ven. I think you mentioned this earlier in one of the recent threads. In practice this gives your company an overwhelming voice in the direction of numpy. >From the blog post you say: "This may also mean different business models and licensing around some of the NumPy-related code that the company writes." Obviously your company will need to make enough money to cover your salaries and more. There is huge potential here for clashes of interest, and for perceived clashes of interest. The perceived clashes are just as damaging as the actual clashes. I still don't think we've got a "Numpy steering group". The combination of the huge concentration of numpy resources in your company, and a lack of explicit community governance, seems to me to be something that needs to be fixed urgently. Do you agree? Is there any reason why the numfocus group was formed without obvious public discussion about it's composition, remit or governance? I'm not objecting to it's composition, but I think it is a mistake to make large decisions like this without public consultation. I imagine that what happened was that things moved too fast to make it attractive to slow the process by public discussion. I implore you to slow down and commit yourself to have that discussion in full and in public, in the interests of the common ownership of the project. Best, Matthew From warren.weckesser at enthought.com Tue Feb 14 14:38:52 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 14 Feb 2012 13:38:52 -0600 Subject: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1 In-Reply-To: <94A35936-A58E-4C4C-9909-3A4A88A07B1A@continuum.io> References: <94A35936-A58E-4C4C-9909-3A4A88A07B1A@continuum.io> Message-ID: On Tue, Feb 14, 2012 at 12:25 PM, Travis Oliphant wrote: > > There is a mailing list for numfocus that you can sign up for if you would > like to be part of those discussions. 
Let me know if you would like more > information about that. I would like more information about (as would many others here, I suspect). Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 15:14:52 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 14:14:52 -0600 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <1329224675.27469.11.camel@farnsworth> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <1329224675.27469.11.camel@farnsworth> Message-ID: <9BAD0F2D-F36F-4A3E-B4B2-FA2BDC8514AE@continuum.io> On Feb 14, 2012, at 7:04 AM, Henry Gomersall wrote: > On Mon, 2012-02-13 at 22:56 -0600, Travis Oliphant wrote: >> But, I am also aware of *a lot* of users who never voice their opinion >> on this list, and a lot of features that they want and need and are >> currently working around the limitations of NumPy to get. These are >> going to be my primary focus for the rest of the 1.X series. > > Is that a prompt for feedback? :) Absolutely. That's the reason I'm getting more active on this list. But, at the same time, we all need to be aware of the tens of thousands of users of NumPy who don't use the mailing list and who need a better way to communicate their voice. Even while I have not been active on the mailing list, I have had a chance to communicate with, work with, and collaborate with hundreds of those users and hear about their needs, use-cases, and requirements. It has given me a fairly broad perspective on where particular corners cut in delivery of NumPy 1.0 need to be smoothed off, and where easy-to-add, but essential missing features could be proposed. Some of these have been proposed already, and others will be proposed throughout this year. I look forward to the discussion. -Travis > > (btw, whilst the back slapping is going on, I think Python and Numpy in > conjunction with Cython is just how software should be developed > nowadays. It truly is a wonderful and winning combo. So a huge thanks to > all the developers for making the world a better place.) > > Cheers, > > Henry > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue Feb 14 15:17:13 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 14:17:13 -0600 Subject: [Numpy-discussion] Typecasting changes from 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: <5FCAD379-0E60-4993-976C-6A29E7E73B1B@continuum.io> > > As you can see there were changes in each release. Most of these were minor prior to the change from 1.5.1 to 1.6.1. I am still reviewing the changes from 1.5.1 to 1.6.1. At first blush, it looks like there are a lot of changes to swallow that are not necessarily minor. I really would like to just say all is well, and it's no big deal. I hope that users really don't care and nobody's code is really relying on array-scalar combination conversions. > > > > -Travis > > > > > > > > Would it make sense to adapt this code to go into the test suite? Possibly. I'm not sure. I guess it depends on how much we want to hard-code the current behavior now. Mark? How confident do you feel in solidifying the casting rules? 
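A hypothetical sketch of what hard-coding a few cells of the table could look like as a regression test (the test name and the handful of pinned cases below are invented for illustration, not existing NumPy tests):

    import numpy as np
    from numpy.testing import assert_equal

    def test_promotion_table_sample():
        # Pin a few representative cells of the coercion table so that an
        # accidental change in the promotion rules shows up as a failure.
        assert_equal(np.promote_types(np.int8, np.uint8), np.dtype(np.int16))
        assert_equal(np.result_type(np.int64, np.float32), np.dtype(np.float64))
        # Array-scalar case: a small Python int should not upcast an int8 array.
        assert_equal((np.zeros(2, dtype=np.int8) + 3).dtype, np.dtype(np.int8))
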
We would need to add cases for overflow that are currently handled differently than originally specified. -Travis > > Ben Root _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue Feb 14 15:22:02 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 14:22:02 -0600 Subject: [Numpy-discussion] [cython-users] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: Message-ID: <19C1AB69-C3C7-4924-99CB-45E7672D8049@continuum.io> >> >> - The packaging quagmire? This continues to be a problem, though >> python3 does have new improvements to distutils. I'm not really up to >> speed on the situation, to be frank. If we want to bring this up, >> someone will have to provide a solid reference or volunteer to do it >> in person. > > I will be at pydata, so I can try to get an elevator pitch ready for > the packaging situation. I may be biased, but I don't think distutils2 > actually improved the situation much for the scientific community > (most likely made it worse by having yet one more solution without > much improvement). > > In particular; > - commands are still coupled to each other > - it is still not possible to use an actual build system with > dependency handling > - no sensible API (distutils2 cannot be used as a library) David has explained his system to me several times and I will voice my strong approval for pushing for something built on his system. The Python community around distutils2 has no intuition for the needs of extension writers with large libraries supporting them, and has not been open enough to David's tireless suggestions. -Travis From ralf.gommers at googlemail.com Tue Feb 14 15:24:29 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 14 Feb 2012 21:24:29 +0100 Subject: [Numpy-discussion] Release management (was: Updated differences between 1.5.1 to 1.6.1) Message-ID: On Tue, Feb 14, 2012 at 7:25 PM, Travis Oliphant wrote: > > On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote: > > > Hi Travis, > > > > It is great that some resources can be spent to have people paid to > > work on NumPy. Thank you for making that happen. > > > > I am slightly confused about roadmaps for numpy 1.8 and 2.0. This > > needs discussion on the ML, and our release manager currently is Ralf > > - he is the one who ultimately decides what goes when. > > Thank you for reminding me of this. Ralf and I spoke several days ago, > and have been working on how to give him more time to spend on SciPy > full-time. Well, full-time is the job that I get paid for:) As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I > will be the release manager again. Ralf will continue serving as release > manager for SciPy. > I had planned to bring this up only after the 1.7 release but yes, I would like to push the balance of my open-source work a little from release/maintenance work towards writing more new code. I've been doing both NumPy and SciPy releases for about two years now, and it's time for me to hand over the manager hat for one of those two. And my preference is to keep on doing the SciPy releases rather than the NumPy ones. For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager. > I only know that I won't be release manager past NumPy 1.X. 
> Travis, it's very good to see that the release manager role can be filled going forward (it's not the most popular job), but I think the way it should work is that people volunteer for this role and then the community agrees on giving a volunteer that role. I actually started contributing when David asked for someone to take over from him in the above manner. Maybe someone else will step up now, giving you or Mark more time to work on new NumPy features (which I'm pretty sure you'd prefer). Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Feb 14 15:27:18 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 14 Feb 2012 14:27:18 -0600 Subject: [Numpy-discussion] Typecasting changes from 1.5.1 to 1.6.1 In-Reply-To: <5FCAD379-0E60-4993-976C-6A29E7E73B1B@continuum.io> References: <5FCAD379-0E60-4993-976C-6A29E7E73B1B@continuum.io> Message-ID: On Tuesday, February 14, 2012, Travis Oliphant wrote: >> > As you can see there were changes in each release. Most of these were minor prior to the change from 1.5.1 to 1.6.1. I am still reviewing the changes from 1.5.1 to 1.6.1. At first blush, it looks like there are a lot of changes to swallow that are not necessarily minor. I really would like to just say all is well, and it's no big deal. I hope that users really don't care and nobody's code is really relying on array-scalar combination conversions. >> > >> > -Travis >> > >> > >> > >> >> Would it make sense to adapt this code to go into the test suite? > > Possibly. I'm not sure. I guess it depends on how much we want to hard-code the current behavior now. Mark? How confident do you feel in solidifying the casting rules? We would need to add cases for overflow that are currently handled differently than originally specified. > > -Travis > I don't think it is so much a question about whether or not to solidify these casting rules as much it is about being aware of changes in the table. This prevents accidental changes and forces us to make purposeful changes to the agreed test results. This would also make it very easy to diagnose cross-platform differences. Of course, the question is, what do we consider to be the "truth" data? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Tue Feb 14 16:31:41 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 14 Feb 2012 21:31:41 +0000 Subject: [Numpy-discussion] Change in scalar upcasting rules for 1.6.x? In-Reply-To: <9BAD0F2D-F36F-4A3E-B4B2-FA2BDC8514AE@continuum.io> References: <8B9E34A2-8D1F-457F-815C-077D8BA0733E@continuum.io> <4F3CCD79-7DDB-4FB9-9C2E-57759C1FB679@continuum.io> <10FB0A88-DB1F-41BF-9A1F-45BC02C5D9EB@continuum.io> <1329224675.27469.11.camel@farnsworth> <9BAD0F2D-F36F-4A3E-B4B2-FA2BDC8514AE@continuum.io> Message-ID: <1329255101.27469.23.camel@farnsworth> On Tue, 2012-02-14 at 14:14 -0600, Travis Oliphant wrote: > > Is that a prompt for feedback? :) > > Absolutely. That's the reason I'm getting more active on this list. > But, at the same time, we all need to be aware of the tens of > thousands of users of NumPy who don't use the mailing list and who > need a better way to communicate their voice. Great, I'll have a think. I have been pushing quite hard at the boundaries of Numpy in my projects. 
cheers, Henry From mwwiebe at gmail.com Tue Feb 14 16:40:31 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 14 Feb 2012 13:40:31 -0800 Subject: [Numpy-discussion] Typecasting changes from 1.5.1 to 1.6.1 In-Reply-To: References: <5FCAD379-0E60-4993-976C-6A29E7E73B1B@continuum.io> Message-ID: On Tue, Feb 14, 2012 at 12:27 PM, Benjamin Root wrote: > > > On Tuesday, February 14, 2012, Travis Oliphant > wrote: > >> > As you can see there were changes in each release. Most of these > were minor prior to the change from 1.5.1 to 1.6.1. I am still reviewing > the changes from 1.5.1 to 1.6.1. At first blush, it looks like there are > a lot of changes to swallow that are not necessarily minor. I really > would like to just say all is well, and it's no big deal. I hope that > users really don't care and nobody's code is really relying on array-scalar > combination conversions. > >> > > >> > -Travis > >> > > >> > > >> > > >> > >> Would it make sense to adapt this code to go into the test suite? > > > > Possibly. I'm not sure. I guess it depends on how much we want to > hard-code the current behavior now. Mark? How confident do you feel in > solidifying the casting rules? We would need to add cases for overflow > that are currently handled differently than originally specified. > > > > -Travis > > > > I don't think it is so much a question about whether or not to solidify > these casting rules as much it is about being aware of changes in the > table. This prevents accidental changes and forces us to make purposeful > changes to the agreed test results. > This is the best reason for putting it in the tests. It doesn't mean that we never change how the casting rules work, it means we never *accidentally* change how they work. > This would also make it very easy to diagnose cross-platform differences. > > Of course, the question is, what do we consider to be the "truth" data? > I think this would be either the current behavior, or whatever we decide should be the behavior, with associated coding effort to make that change. -Mark > > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 16:54:56 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 15:54:56 -0600 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: >> >> There is a mailing list for numfocus that you can sign up for if you would like to be part of those discussions. Let me know if you would like more information about that. John Hunter, Fernando Perez, me, Perry Greenfield, and Jarrod Millman are the initial board of the Foundation. But, I expect the Foundation directors to evolve over time. > > I should say that I have no knowledge of the events above other than > from the mailing list (I say that only because some of you may know > that I'm a friend and colleague of Jarrod and Fernando). Thanks for speaking up, Matthew. I knew that this was my first announcement of the Foundation to this list. Things are still just starting around that organization, and so there is plenty of time for input. This sort of thing has actually been under-way for a long time --- it just has not received much impetus until now for one reason or another. 
To be clear, there were several email posts about a Foundation to this list last fall and we took the discussion of the Foundation that has really been in the works for a couple of years (thanks to Jarrod), to a Google Group (very poorly) called Fastechula. There were 33 people who signed up for that list and discussions continued sporadically on that list away from this one. When we selected the name NumFOCUS just a few weeks ago, we created the list for numfocus and then I signed everyone up for that list who was on the other one. I apologize if anyone felt left out. That is not my intention. But, I also did not want to consume this mailing list with something that might be considered off-topic. I repeat that there is still plenty time for input. Obviously, the board has been selected. But that must be done by someone. I took the liberty to invite the first board members who graciously accepted the assignment. I consider the Foundation a service opportunity. I'm grateful that representatives from the major projects are willing to serve. I expect that to be a tradition, but it is one that needs to be discussed and developed. Yes, I have started a new company with Peter Wang. However, most of the people on this list will probably be most interested in the NumFOCUS work. The goal of the Foundation is to promote the entire Scientific Computing with Python ecosystem. It will not be taking over any of the public mailing lists where there is already a great deal of opportunity to express opinions and desires. The Foundation will have it's own public mailing list where mostly financial and funding matters that are common to all of the projects can be sent and discussed. Go here and sign up for the public mailing list if you are interested in the Foundation: http://groups.google.com/group/numfocus?hl=en We will be discussing the Foundation at PyCon as well. > > "This may also mean different business models and licensing around > some of the NumPy-related code that the company writes." > > Obviously your company will need to make enough money to cover your > salaries and more. There is huge potential here for clashes of > interest, and for perceived clashes of interest. The perceived > clashes are just as damaging as the actual clashes. Perceptions can be damaging. This is one of the big reasons for the organization of the Foundation -- to be a place separate from any commercial venture which can direct resources to a vision whose goal is more democratically determined. I trust that people will observe results and come to expect good things that will naturally emerge by having more talented people involved in the process who are being directed by the community needs. > > I still don't think we've got a "Numpy steering group". The > combination of the huge concentration of numpy resources in your > company, and a lack of explicit community governance, seems to me to > be something that needs to be fixed urgently. Do you agree? I'm sensitive to the perception that some might have that Continuum might "hi-jack" NumPy. That is the central reason I am very supportive of and pushing the organization of NumFOCUS. I want corporate dollars that flow to NumPy to have some buffering between the money that is being spent and what is promoted. This can be a delicate situation, but I think it can also work well. RedHat, IBM, and Google all cooperate to make Linux better through the Linux Foundation. The same needs to be the case with NumPy. 
This will depend, of course, on everybody on this list and the way they receive new input and the way they communicate with each other. I think we do have a NumPy steering group if you want to call it that. It is currently me, Mark Wiebe, and Charles Harris. Ralf Gommers, Pauli Virtanen, David Cournapeau and Robert Kern also have opinions that carry significant weight. Are there other people that should be on this list? There are other people who also speak up on this list whose opinions will be listened to and heard. In fact, I hope that many more people will come to the list and speak out as development increases. I think many of the people who have been carrying NumPy along for the past few years really have their hearts and minds on SciPy. I would like to unburden Ralf, David, and Robert, for example, to continue their work on making SciPy a better package. Working on both NumPy and SciPy is too big of a job for one person. I am very aware of this. I agree whole-heartedly with what might be an implied concern of your previous statement. I absolutely don't want Continuum to be the only voice driving NumPy. It is a long-term concern for me. In the short-term, Mark and I are both driving NumPy and both at Continuum. This might be an issue for some. I do not believe it will be. Mark and I are quite different in our perspectives and we have and will continue to disagree appropriately for the benefit of the project as a whole. I really don't see a problem over the next few months, but time will tell. If for some reason, it changes, I'm sure there will be people speaking out --- and I would encourage it. > > Is there any reason why the numfocus group was formed without obvious > public discussion about its composition, remit, or governance? I'm > not objecting to its composition, but I think it is a mistake to make > large decisions like this without public consultation. There is probably some misunderstanding here. NumFOCUS is a "funding body". Its goal is to get money for sprints, code-grants, bounties, etc. to other projects that maintain their independence. Whether it becomes more than that over time really depends on who participates in it. I believe the discussions were public --- just not on this list. Some discussions have not been, but that is the nature of any organization -- and also the fact that it is higher bandwidth to talk face-to-face. That's why we have conferences, and meetings, and sprints. > > I imagine that what happened was that things moved too fast to make it > attractive to slow the process by public discussion. I implore you > to slow down and commit yourself to have that discussion in full and > in public, in the interests of the common ownership of the project. There will be plenty of time for public discussion about the NumPy project. Nothing is changing there. I apologize if I have implied I see this otherwise. Anything that happens with NumPy the project will happen in the full light of this list. What Continuum does on top of that will be separate. My goal is to continue to improve the NumPy / SciPy ecosystem. That has been my goal and desire for 14 years. Nothing has changed except my circumstances and ability to contribute in multiple ways that are different now than they have been in the past. What has also changed is that I now have a much clearer picture of where NumPy can and should be.
After watching it grow and get used by multiple large organizations --- and get pushed up against its limitations, and get under-utilized in certain corners, I have seen what NumPy can be. It could help a lot of people --- I mean *a lot* of people. To realize what NumPy could be, there is a lot of work that needs to get done. This is very exciting to think about and to work on. I hope that you will be patient with me as it will take me time to write up everything that NumPy needs. I plan to take that time. Mark Wiebe is also doing a lot of writing and will be presenting his ideas to this list. This is a project that will take a few man-years, not man-months. Not all of this will end up in NumPy 1.8. We are geared up for the long haul. Consistent contributions are what we will be providing for several months --- to the benefit of all involved. NumFOCUS will have its own pace and its own discussion as it serves a different purpose. Its organization and mission will also be open to public comments and discussion on its own list. The discussion there is still at a very early stage. There is plenty of time to jump in and comment. Here's the link again: http://groups.google.com/group/numfocus?hl=en Best regards, -Travis > . > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Feb 14 16:58:33 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 15:58:33 -0600 Subject: [Numpy-discussion] Release management (was: Updated differences between 1.5.1 to 1.6.1) In-Reply-To: References: Message-ID: <27E6F800-48F5-4436-A6E0-F349D11DF216@continuum.io> > > Travis, it's very good to see that the release manager role can be filled going forward (it's not the most popular job), but I think the way it should work is that people volunteer for this role and then the community agrees on giving a volunteer that role. > > I actually started contributing when David asked for someone to take over from him in the above manner. Maybe someone else will step up now, giving you or Mark more time to work on new NumPy features (which I'm pretty sure you'd prefer). > A lovely suggestion. If there is someone who would like to step up? We can train you.... -Travis > Cheers, > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From chris.barker at noaa.gov Tue Feb 14 18:12:55 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 14 Feb 2012 15:12:55 -0800 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: <4F3A96E3.20401@astro.uio.no> References: <4F3A96E3.20401@astro.uio.no> Message-ID: On Tue, Feb 14, 2012 at 9:16 AM, Dag Sverre Seljebotn wrote: > It was about the need for a dedicated matrix multiplication operator. has anyone proposed that? I do think we've had a proposal on the table for generally more operators: i.e. like matlab's ".*" vs "*", and yes, matrix multiplication would be one use of that feature. > (The difference from the current > situation being that people avoid np.matrix because it's so similar to > np.ndarray that you too easily get confused).
I'm not sure that's why -- I think it's because: a) np.matrix is really not quite finished - it's not full-featured enough to be truly useful (maybe that's the same as your "similar to np.ndarray") b) all it provides is syntax candy for matrix operations -- and how much of our code is matrix operations? a couple lines out of hundreds (that was always the case with my MATLAB code, and it is even more so now -- I can count my uses of np.dot on one hand...) > I myself never missed a matrix multiplication operator, precisely > because my matrices are very often diagonal or triangular or sparse or > something else, so having syntax candy simply to invoke np.dot wouldn't > help me. exactly -- so that could address point (a), but I still think in most code it's only going to make a small fraction of the code more readable. -Chris PS: the notable exception is instructional code involving matrix arithmetic -- it would be nice there, and that is where I've seen the strongest requests for it. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From m.oliver at jacobs-university.de Tue Feb 14 18:23:53 2012 From: m.oliver at jacobs-university.de (Marcel Oliver) Date: Wed, 15 Feb 2012 00:23:53 +0100 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: References: Message-ID: <20282.60681.922318.540486@localhost.localdomain> Francesc Alted wrote: > On Feb 14, 2012, at 1:50 AM, Wes McKinney wrote: > [clip] > > But: > > > > In [40]: timeit hist[i, j] > > 10000 loops, best of 3: 32 us per loop > > > > So that's roughly 7-8x slower than a simple Cython method, so I > > sincerely hope it could be brought down to the sub 10 microsecond > > level with a little bit of work. > > I vaguely remember this has shown up before. My hunch is that > indexing in NumPy is so powerful, that it has to check for a lot of > different values for indices (integers, tuples, lists, arrays?), > and that it is all these checks what is taking time. Your Cython > wrapper just assumed that the indices where integers, so this is > probably the reason why it is that much faster. Thanks for all the replies. Playing a bit with timeit, it is clear that it cannot just be the overhead of checking the type of the index array, as the overhead grows very roughly proportional to the size of the index array, but remains independent of the size of the indexed array. In [1]: a=arange(1000000) In [2]: i = arange(10) In [3]: timeit a[i] 100000 loops, best of 3: 1.95 us per loop In [4]: i = arange(100) In [5]: timeit a[i] 100000 loops, best of 3: 4.07 us per loop In [6]: i = arange(1000) In [7]: timeit a[i] 10000 loops, best of 3: 23.3 us per loop In [8]: timeit i+i 100000 loops, best of 3: 5.62 us per loop In [9]: i = arange(10000) In [10]: timeit a[i] 1000 loops, best of 3: 220 us per loop In [11]: timeit i+i 10000 loops, best of 3: 28.1 us per loop It would really be nice if this could be made more efficient as it's a very nice and powerful construct. Robert Kern wrote: > Other people have explained that yes, applying index arrays is slow. I > would just like to add the tangential point that this code does not > behave the way that you think it does. You cannot make histograms like > this.
The statement "hist[i,j] += 1" gets broken down into three > separate statements by the Python compiler: > > tmp = hist.__getitem__((i,j)) > tmp = tmp.__iadd__(1) > hist.__setitem__((i,j), tmp) > > Note that tmp is a new array with copies of the data in hist at the > (i,j) locations, possibly multiple copies if the i index has > repetitions. Each one of these copies gets incremented by 1, then the > __setitem__() will apply each of those in turn to the appropriate cell > in hist, each one simply overwriting the previous one. I know. The posted code is correct because it updates only one bin in each of the many histograms at a time, which is natural for that particular example. It would indeed be nice if the increment construct would result in multiple increments work when indices repeat - I think that would be generally more useful and expected. But current behavior is documented, so it's fine if only it was efficient... Regards, Marcel From matthew.brett at gmail.com Tue Feb 14 18:33:01 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 14 Feb 2012 15:33:01 -0800 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: Hi, On Tue, Feb 14, 2012 at 1:54 PM, Travis Oliphant wrote: > > There is a mailing list for numfocus that you can sign up for if you would > like to be part of those discussions. ? Let me know if you would like more > information about that. ? ?John Hunter, Fernando Perez, me, Perry > Greenfield, and Jarrod Millman are the initial board of the Foundation. > But, I expect the Foundation directors to evolve over time. > > > I should say that I have no knowledge of the events above other than > from the mailing list (I say that only because some of you may know > that I'm a friend and colleague of Jarrod and Fernando). > > > Thanks for speaking up, Matthew. ? I knew that this was my first > announcement of the Foundation to this list. ? Things are still just > starting around that organization, and so there is plenty of time for input. > ? This sort of thing has actually been under-way for a long time --- it just > has not received much impetus until now for one reason or another. > > To be clear, there were several email posts about a Foundation to this list > last fall and we took the discussion of the Foundation that has really been > in the works for a couple of years (thanks to Jarrod), to a Google Group > (very poorly) called Fastechula. ? ?There were 33 people who signed up for > that list and discussions continued sporadically on that list away from this > one. > > When we selected the name NumFOCUS just a few weeks ago, we created the list > for numfocus and then I signed everyone up for that list who was on the > other one. ? ? ?I apologize if anyone felt left out. ? That is not my > intention. My point is that there are two ways go to about this process, one is open and the other is closed. In the open version, someone proposes such a group to the mailing lists. They ask for expressions of interest. The discussion might then move to another mailing list that is publicly known and widely advertised. Members of the board are proposed in public. There might be some sort of formal or informal voting process. The reason to prefer this to the more informal private negotiations is that a) the community feels a greater ownership and control of the process and b) it is much harder to weaken or subvert an organization that explicitly does all its business in public. 
The counter-argument usually goes 'members X, Y and Z are of impeccable integrity and would only do what is best for the public good'. And usually, members X, Y and Z are indeed of impeccable integrity. Nevertheless I'm sure I don't have to unpack the evidence that this approach frequently fails and can fail in a catastrophic way. > Perceptions can be damaging. ? This is one of the big reasons for the > organization of the Foundation -- to be a place separate from any commercial > venture which can direct resources to a vision whose goal is more > democratically determined. Are you proposing that the Foundation oversee Numpy governance and direction? From your chosen members I'm guessing that the idea is for the foundation to think about broad strategy rather than - say - whether missing values should be encoded with masked arrays? > I think we do have a NumPy steering group if you want to call it that. > It is currently me, Mark Wiebe, and Charles Harris. ? ?Rolf Gommers, Pauli > Virtanen, David Cournapeau and Robert Kern also have opinions that carry > significant weight. ? ?Are there other people that should be on this list? > ?There are other people who also speak up on this list whose opinions will > be listened to and heard. ? In fact, I hope that many more people will come > to the list and speak out as development increases. The point I was making was that the concentration of numpy development hours and talent in your company makes it urgent that the numpy governance is set out formally, that the interests of the company are made clear, and that the steering group can be assured of explicit and public independence from the interests of the company, if and when that becomes necessary. In the past, the numpy steering group has seemed a virtual organization, formed ad-hoc when needed, and with no formal governance. I'm saying that I firmly believe that has to change, to avoid the actual or perceived loss of community ownership. Best, Matthew From d.s.seljebotn at astro.uio.no Tue Feb 14 18:33:17 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 14 Feb 2012 15:33:17 -0800 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: <4F3A96E3.20401@astro.uio.no> Message-ID: <4F3AEF3D.4060300@astro.uio.no> On 02/14/2012 03:12 PM, Chris Barker wrote: > On Tue, Feb 14, 2012 at 9:16 AM, Dag Sverre Seljebotn > wrote: >> It was about the need for a dedicated matrix multiplication operator. > > has anyone proposed that? I do think we've had a proposal on the table > for generally more operators: i.e. like matlab's ".*" vs "*", and yes, > matrix multiplication would be one use of that feature. Well, http://fperez.org/py4science/numpy-pep225/numpy-pep225.html is on the table, and it really is primarily about getting a matrix multiplication operator ("because MATLAB has it"). 
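(For concreteness, the status quo the PEP reacts to looks roughly like this in plain NumPy -- an illustrative sketch, not an example taken from the PEP itself:

import numpy as np

A = np.arange(6.).reshape(2, 3)
B = np.arange(12.).reshape(3, 4)
x = np.arange(4.)

# With ndarray, '*' is elementwise, so chained matrix products need nested np.dot calls:
y = np.dot(A, np.dot(B, x))

# With np.matrix, '*' *is* matrix multiplication -- the readability people
# want an operator for -- but at the cost of carrying around a separate class:
Am, Bm = np.matrix(A), np.matrix(B)
ym = Am * Bm * np.matrix(x).T

PEP 225 proposes a second set of operators, e.g. "~*", so that both meanings could coexist on a single array type.)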
There aren't too many other uses; even in matlab, "+" and ".+" does the same thing :-) Dag From wesmckinn at gmail.com Tue Feb 14 18:38:31 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 14 Feb 2012 18:38:31 -0500 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: References: <20281.39813.720754.45947@localhost.localdomain> Message-ID: On Tue, Feb 14, 2012 at 4:03 AM, Francesc Alted wrote: > On Feb 14, 2012, at 1:50 AM, Wes McKinney wrote: > [clip] >> But: >> >> In [40]: timeit hist[i, j] >> 10000 loops, best of 3: 32 us per loop >> >> So that's roughly 7-8x slower than a simple Cython method, so I >> sincerely hope it could be brought down to the sub 10 microsecond >> level with a little bit of work. > > I vaguely remember this has shown up before. ?My hunch is that indexing in NumPy is so powerful, that it has to check for a lot of different values for indices (integers, tuples, lists, arrays?), and that it is all these checks what is taking time. ?Your Cython wrapper just assumed that the indices where integers, so this is probably the reason why it is that much faster. > > This is not to say that indexing in NumPy could not be accelerated, but it won't be trivial, IMO. > Given that __getitem__ and __setitem__ receive a 2-tuple of 1-dimensional integer arrays, should be pretty simple (dare I say trivial? :) ) to optimize for this use case? The abysmal performance of of __getitem__ and __setitem__ with 1d integer arrays is pretty high on my list of annoyances with NumPy (especially when take and put are so much faster), so you guys may see a pull request from me whenever I can spare the time to hack on it (assuming you don't beat me to it)! > -- Francesc Alted > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Tue Feb 14 18:42:53 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 14 Feb 2012 15:42:53 -0800 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: <4F3A96E3.20401@astro.uio.no> Message-ID: On Tue, Feb 14, 2012 at 3:12 PM, Chris Barker wrote: > On Tue, Feb 14, 2012 at 9:16 AM, Dag Sverre Seljebotn > wrote: >> It was about the need for a dedicated matrix multiplication operator. > > has anyone proposed that? I do think we've had a proposal on the table > for generally more operators: i.e. like matlab's ".*" vs "*", and yes, > matrix multiplication would be one use of that feature. yup, that's what pep 225 is about: http://fperez.org/py4science/numpy-pep225/numpy-pep225.html Cheers, f From pav at iki.fi Tue Feb 14 18:46:39 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 15 Feb 2012 00:46:39 +0100 Subject: [Numpy-discussion] Index Array Performance In-Reply-To: <20282.60681.922318.540486@localhost.localdomain> References: <20282.60681.922318.540486@localhost.localdomain> Message-ID: 15.02.2012 00:23, Marcel Oliver kirjoitti: [clip] > Thanks for all the replies. Playing a bit with timeit, it is clear > that it cannot just be the overhead of checking the type of the index > array, as the overhead grows very roughly propertional to the size of > the index array, but remains independent of the size of the indexed > array. 
> > In [1]: a=arange(1000000) > > In [2]: i = arange(10) > > In [3]: timeit a[i] > 100000 loops, best of 3: 1.95 us per loop > > In [4]: i = arange(100) > > In [5]: timeit a[i] > 100000 loops, best of 3: 4.07 us per loop I think the linear growth here is expected, as i = arange(n) a[i].shape == i.shape It's probably possible to cut the overhead, though. -- Pauli Virtanen From travis at continuum.io Tue Feb 14 18:58:50 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 17:58:50 -0600 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: >> >> When we selected the name NumFOCUS just a few weeks ago, we created the list >> for numfocus and then I signed everyone up for that list who was on the >> other one. I apologize if anyone felt left out. That is not my >> intention. > > My point is that there are two ways go to about this process, one is > open and the other is closed. In the open version, someone proposes > such a group to the mailing lists. They ask for expressions of > interest. The discussion might then move to another mailing list that > is publicly known and widely advertised. Members of the board are > proposed in public. There might be some sort of formal or informal > voting process. The reason to prefer this to the more informal > private negotiations is that a) the community feels a greater > ownership and control of the process and b) it is much harder to > weaken or subvert an organization that explicitly does all its > business in public. Your points are well taken. However, my point is that this has been discussed on an open mailing list. Things weren't *as* open as they could have been, perhaps, in terms of board selection. But, there was opportunity for people to provide input. > >> Perceptions can be damaging. This is one of the big reasons for the >> organization of the Foundation -- to be a place separate from any commercial >> venture which can direct resources to a vision whose goal is more >> democratically determined. > > Are you proposing that the Foundation oversee Numpy governance and > direction? From your chosen members I'm guessing that the idea is > for the foundation to think about broad strategy rather than - say - > whether missing values should be encoded with masked arrays? No, I am not proposing that. The Foundation will be focused on higher-level broad strategy sorts of things: mostly around how to raise money and how to direct that money to projects that have their own development cycles. I would think the Foundation would be interested in paying for things like issue trackers and continuous integration servers as well. It will leave NumPy management to this list and the people who have gathered around this watering hole. Obviously, there will be points of connection, but exactly how this will play-out depends on who shows up to both organizations. >> I think we do have a NumPy steering group if you want to call it that. >> It is currently me, Mark Wiebe, and Charles Harris. Rolf Gommers, Pauli >> Virtanen, David Cournapeau and Robert Kern also have opinions that carry >> significant weight. Are there other people that should be on this list? >> There are other people who also speak up on this list whose opinions will >> be listened to and heard. In fact, I hope that many more people will come >> to the list and speak out as development increases. 
> > The point I was making was that the concentration of numpy development > hours and talent in your company makes it urgent that the numpy > governance is set out formally, that the interests of the company are > made clear, and that the steering group can be assured of explicit and > public independence from the interests of the company, if and when > that becomes necessary. In the past, the numpy steering group has > seemed a virtual organization, formed ad-hoc when needed, and with no > formal governance. I'm saying that I firmly believe that has to > change, to avoid the actual or perceived loss of community ownership. I hear your point. Thank you for sharing it. Fortunately, we are having this discussion, and plan to continue to have it as any concerns arise. I think the situation is actually less concentrated than it used to be when the SciPy steering committee was discussed. On that note, I think the SciPy steering committee needs serious revision as well. But, we've all just been getting along pretty well without too much formalism, so far, so perhaps that is enough for now. Thanks, -Travis From heng at cantab.net Tue Feb 14 18:59:42 2012 From: heng at cantab.net (Henry Gomersall) Date: Tue, 14 Feb 2012 23:59:42 +0000 Subject: [Numpy-discussion] [IPython-dev] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 In-Reply-To: References: <4F3A96E3.20401@astro.uio.no> Message-ID: <1329263982.27469.34.camel@farnsworth> On Tue, 2012-02-14 at 15:12 -0800, Chris Barker wrote: > > On Tue, Feb 14, 2012 at 9:16 AM, Dag Sverre Seljebotn > wrote: > > It was about the need for a dedicated matrix multiplication > operator. > > has anyone proposed that? I do think we've had a proposal on the table > for generally more operators: i.e. like matlab's ".*" vs "*", and yes, > matrix multiplication would be one use of that feature. One thing that would be nice would to be able to do something like: a = H * b mapping to something like H.__new_mul__(b, a) so that it's becomes possible to write to a pre-existing object, a. I'm not sure how to syntatically deal with this. Perhaps a[:] = H * b could carry that implicit connotation (with a fallback to create an interim array). It's not exactly clean though. Actually, having some syntax for any operation in which the left hand side of the assignment is presented as an argument would allow some fun things. Though i'm not sure how it makes sense if doesn't yet exist. I'm sure someone can tell me why this is a really bad idea! Cheers, Henry From ben.root at ou.edu Tue Feb 14 19:07:45 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 14 Feb 2012 18:07:45 -0600 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: On Tuesday, February 14, 2012, Matthew Brett wrote: > Hi, > > On Tue, Feb 14, 2012 at 1:54 PM, Travis Oliphant wrote: >> >> There is a mailing list for numfocus that you can sign up for if you would >> like to be part of those discussions. Let me know if you would like more >> information about that. John Hunter, Fernando Perez, me, Perry >> Greenfield, and Jarrod Millman are the initial board of the Foundation. >> But, I expect the Foundation directors to evolve over time. >> >> >> I should say that I have no knowledge of the events above other than >> from the mailing list (I say that only because some of you may know >> that I'm a friend and colleague of Jarrod and Fernando). 
>> >> >> Thanks for speaking up, Matthew. I knew that this was my first >> announcement of the Foundation to this list. Things are still just >> starting around that organization, and so there is plenty of time for input. >> This sort of thing has actually been under-way for a long time --- it just >> has not received much impetus until now for one reason or another. >> >> To be clear, there were several email posts about a Foundation to this list >> last fall and we took the discussion of the Foundation that has really been >> in the works for a couple of years (thanks to Jarrod), to a Google Group >> (very poorly) called Fastechula. There were 33 people who signed up for >> that list and discussions continued sporadically on that list away from this >> one. >> >> When we selected the name NumFOCUS just a few weeks ago, we created the list >> for numfocus and then I signed everyone up for that list who was on the >> other one. I apologize if anyone felt left out. That is not my >> intention. > > My point is that there are two ways go to about this process, one is > open and the other is closed. In the open version, someone proposes > such a group to the mailing lists. They ask for expressions of > interest. The discussion might then move to another mailing list that > is publicly known and widely advertised. Members of the board are > proposed in public. There might be some sort of formal or informal > voting process. The reason to prefer this to the more informal > private negotiations is that a) the community feels a greater > ownership and control of the process and b) it is much harder to > weaken or subvert an organization that explicitly does all its > business in public. > > The counter-argument usually goes 'members X, Y and Z are of > impeccable integrity and would only do what is best for the public > good'. And usually, members X, Y and Z are indeed of impeccable > integrity. Nevertheless I'm sure I don't have to unpack the evidence > that this approach frequently fails and can fail in a catastrophic > way. > >> Perceptions can be damaging. This is one of the big reasons for the >> organization of the Foundation -- to be a place separate from any commercial >> venture which can direct resources to a vision whose goal is more >> democratically determined. > > Are you proposing that the Foundation oversee Numpy governance and > direction? From your chosen members I'm guessing that the idea is > for the foundation to think about broad strategy rather than - say - > whether missing values should be encoded with masked arrays? > >> I think we do have a NumPy steering group if you want to call it that. >> It is currently me, Mark Wiebe, and Charles Harris. Rolf Gommers, Pauli >> Virtanen, David Cournapeau and Robert Kern also have opinions that carry >> significant weight. Are there other people that should be on this list? >> There are other people who also speak up on this list whose opinions will >> be listened to and heard. In fact, I hope that many more people will come >> to the list and speak out as development increases. > > The point I was making was that the concentration of numpy development > hours and talent in your company makes it urgent that the numpy > governance is set out formally, that the interests of the company are > made clear, and that the steering group can be assured of explicit and > public independence from the interests of the company, if and when > that becomes necessary. 
In the past, the numpy steering group has > seemed a virtual organization, formed ad-hoc when needed, and with no > formal governance. I'm saying that I firmly believe that has to > change, to avoid the actual or perceived loss of community ownership. > > Best, > > Matthew > I have to agree with Mathew here, to a point. There has been discussions of these groups before, but I don't recall any announcement of this group. Of course, now that it has been announced, maybe a link to it should be prominent on the numpy/scipy pages(maybe others?). It should also be in the list of mailing lists. A funding org much like the Linux Foundation would be great, and I am all for it. A separate governing committee is also important, and I think we had some very good ideas in previous discussions. I also have to agree with Matthew's concerns about the concentration of developer resources at Continuum. I think that establishing a community-driven governance committee would be crucial in making sure that Continuum's (and Enthought's??) efforts go to serve both the community and the company's customers. Travis, in about a month, I will be starting up work at a company that has been users of the SciPy stack, but has not been active members of the community. I wish to change that. Will this Funding committee serve as a face for numpy for private companies? Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Feb 14 19:43:09 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 14 Feb 2012 16:43:09 -0800 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: Hi, On Tue, Feb 14, 2012 at 3:58 PM, Travis Oliphant wrote: >>> >>> When we selected the name NumFOCUS just a few weeks ago, we created the list >>> for numfocus and then I signed everyone up for that list who was on the >>> other one. ? ? ?I apologize if anyone felt left out. ? That is not my >>> intention. >> >> My point is that there are two ways go to about this process, one is >> open and the other is closed. ?In the open version, someone proposes >> such a group to the mailing lists. ?They ask for expressions of >> interest. ?The discussion might then move to another mailing list that >> is publicly known and widely advertised. ?Members of the board are >> proposed in public. ?There might be some sort of formal or informal >> voting process. ?The reason to prefer this to the more informal >> private negotiations is that a) the community feels a greater >> ownership and control of the process and b) it is much harder to >> weaken or subvert an organization that explicitly does all its >> business in public. > > Your points are well taken. ? However, my point is that this has been discussed on an open mailing list. ? Things weren't *as* open as they could have been, perhaps, in terms of board selection. ?But, there was opportunity for people to provide input. I am on the numpy, scipy, matplotlib, ipython and cython mailing lists. Jarrod and Fernando are friends of mine. I've been obviously concerned about numpy governance for some time. I didn't know about this mailing list, had only a vague idea that some sort of foundation was being proposed and I had no idea at all that you'd selected a board. Would you say that was closer to 'open' or closer to 'closed'? >>> Perceptions can be damaging. ? 
This is one of the big reasons for the >>> organization of the Foundation -- to be a place separate from any commercial >>> venture which can direct resources to a vision whose goal is more >>> democratically determined. >> >> Are you proposing that the Foundation oversee Numpy governance and >> direction? ? From your chosen members I'm guessing that the idea is >> for the foundation to think about broad strategy rather than - say - >> whether missing values should be encoded with masked arrays? > > No, I am not proposing that. ? ?The Foundation will be focused on higher-level broad strategy sorts of things: ?mostly around how to raise money and how to direct that money to projects that have their own development cycles. ? I would think the Foundation would be interested in paying for things like issue trackers and continuous integration servers as well. ? ? It will leave NumPy management to this list and the people who have gathered around this watering hole. ? ?Obviously, there will be points of connection, but exactly how this will play-out depends on who shows up to both organizations. > > >>> I think we do have a NumPy steering group if you want to call it that. >>> It is currently me, Mark Wiebe, and Charles Harris. ? ?Rolf Gommers, Pauli >>> Virtanen, David Cournapeau and Robert Kern also have opinions that carry >>> significant weight. ? ?Are there other people that should be on this list? >>> ?There are other people who also speak up on this list whose opinions will >>> be listened to and heard. ? In fact, I hope that many more people will come >>> to the list and speak out as development increases. >> >> The point I was making was that the concentration of numpy development >> hours and talent in your company makes it urgent that the numpy >> governance is set out formally, that the interests of the company are >> made clear, and that the steering group can be assured of explicit and >> public independence from the interests of the company, if and when >> that becomes necessary. ? In the past, the numpy steering group has >> seemed a virtual organization, formed ad-hoc when needed, and with no >> formal governance. ? I'm saying that I firmly believe that has to >> change, to avoid the actual or perceived loss of community ownership. > > I hear your point. ? ?Thank you for sharing it. ? ?Fortunately, we are having this discussion, and plan to continue to have it as any concerns arise. ? ?I think the situation is actually less concentrated than it used to be when the SciPy steering committee was discussed. ?On that note, ?I think the SciPy steering committee needs serious revision as well. ? ?But, we've all just been getting along pretty well without too much formalism, so far, so perhaps that is enough for now. But a) there have already been serious unresolved disagreements on this list (I note no resolution of the masks / NA debate) and b) the whole point is to set up structures that can deal with the problems before or as they arise. After the problem arises, it is too late. See you, Matthew From travis at continuum.io Tue Feb 14 19:52:17 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 18:52:17 -0600 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: > > I have to agree with Mathew here, to a point. There has been discussions of these groups before, but I don't recall any announcement of this group. 
Of course, now that it has been announced, maybe a link to it should be prominent on the numpy/scipy pages(maybe others?). It should also be in the list of mailing lists. I'm happy for all these discussions to be in the open. > > A funding org much like the Linux Foundation would be great, and I am all for it. A separate governing committee is also important, and I think we had some very good ideas in previous discussions. > > I also have to agree with Matthew's concerns about the concentration of developer resources at Continuum. I think that establishing a community-driven governance committee would be crucial in making sure that Continuum's (and Enthought's??) efforts go to serve both the community and the company's customers. I can try and re-assure you that all will be well, but I know that time is the only thing that will prove that out as each one will decide for themselves whether or not their input is valued and acted upon. To provide some perspective, for the next 5 months at least, Continuum will be providing 3.5 people at least 50% to the NumPy project plus dev ops help to get issue tracking and continuous build integration set up. After that we will have at least 1.5 people devoted full-time to the open-source NumPy project (more if possible). I would like this support to actually go through the Foundation (which already has a community governance and non-profit mission statement), but this takes some leg-work in getting the Foundation setup and organizing those contracts. But, that is my intent and what I am working to get in place eventually. Obviously, the fact that I am deeply involved in NumPy complicates the question of "community governance" for some people, but I hope you will trust that we are just trying to improve NumPy as we understand it. I remain interested in others views of what "improving NumPy" means. But, I do have a long list of ideas that I am anxious to get started on. > > Travis, in about a month, I will be starting up work at a company that has been users of the SciPy stack, but has not been active members of the community. I wish to change that. Will this Funding committee serve as a face for numpy for private companies? Absolutely. The Foundation web-site is getting set up right now. It will be an evolving thing, and your feedback about how the Foundation can help you get your company involved will be very helpful. I would like multiple companies to interact through the Foundation, and that is how I would ultimately like Continuum to interact with the community as well. The fact that Continuum employs people who work on NumPy should be no more concerning than the fact that Google employs people that work on Python, or that Enthought employs people who work on SciPy and NumPy. I recognize that my role in NumPy, the Foundation, and my company may be concerning for some. I firmly believe that NumPy is successful because of everybody who has participated. I am not interested in somehow changing that. I just want to do what I can to accelerate it. I look forward to working with you and your company in the Foundation. 
Best regards, -Travis From matthew.brett at gmail.com Tue Feb 14 20:00:26 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 14 Feb 2012 17:00:26 -0800 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: Hi, On Tue, Feb 14, 2012 at 4:43 PM, Matthew Brett wrote: > Hi, > > On Tue, Feb 14, 2012 at 3:58 PM, Travis Oliphant wrote: >>>> >>>> When we selected the name NumFOCUS just a few weeks ago, we created the list >>>> for numfocus and then I signed everyone up for that list who was on the >>>> other one. ? ? ?I apologize if anyone felt left out. ? That is not my >>>> intention. >>> >>> My point is that there are two ways go to about this process, one is >>> open and the other is closed. ?In the open version, someone proposes >>> such a group to the mailing lists. ?They ask for expressions of >>> interest. ?The discussion might then move to another mailing list that >>> is publicly known and widely advertised. ?Members of the board are >>> proposed in public. ?There might be some sort of formal or informal >>> voting process. ?The reason to prefer this to the more informal >>> private negotiations is that a) the community feels a greater >>> ownership and control of the process and b) it is much harder to >>> weaken or subvert an organization that explicitly does all its >>> business in public. >> >> Your points are well taken. ? However, my point is that this has been discussed on an open mailing list. ? Things weren't *as* open as they could have been, perhaps, in terms of board selection. ?But, there was opportunity for people to provide input. > > I am on the numpy, scipy, matplotlib, ipython and cython mailing > lists. ?Jarrod and Fernando are friends of mine. ?I've been obviously > concerned about numpy governance for some time. ?I didn't know about > this mailing list, had only a vague idea that some sort of foundation > was being proposed and I had no idea at all that you'd selected a > board. ?Would you say that was closer to 'open' or closer to 'closed'? By the way - I want to be clear - I am not suggesting that I should have been one of the people involved in these discussions. If you were choosing a small number of people to discuss this with, one of them should not be me. I am saying that, if I didn't know, it's reasonable to assume that very few people knew, who weren't being explicitly told, and that this means that the process was, effectively, closed. See you, Matthew From travis at continuum.io Tue Feb 14 20:17:23 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 14 Feb 2012 19:17:23 -0600 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: <1F3C1CFF-D744-4741-BBBA-07BC04658DDC@continuum.io> >> >> Your points are well taken. However, my point is that this has been discussed on an open mailing list. Things weren't *as* open as they could have been, perhaps, in terms of board selection. But, there was opportunity for people to provide input. > > I am on the numpy, scipy, matplotlib, ipython and cython mailing > lists. Jarrod and Fernando are friends of mine. I've been obviously > concerned about numpy governance for some time. I didn't know about > this mailing list, had only a vague idea that some sort of foundation > was being proposed and I had no idea at all that you'd selected a > board. Would you say that was closer to 'open' or closer to 'closed'? I see it a different way. 
First, the Foundation is not a NumPy-governance thing. Certainly it could grow in that direction over time, but it isn't there now, nor is that its goal. Second, the Foundation is just getting started. It's only come together over the past couple of weeks. The fact that we are talking about it now, seems to me to indicate that it is quite "open" --- certainly closer to 'open' then you seem to imply. Also, the fact that there was a public mailing list for its discussion certainly sounds "open" to me (poorly advertised I will grant you). I tried to include as many people as I thought were interested by the responses to the initial emails on the list. I reached out to people that contacted me expressing their interest, and included them on the mailing list. I can accept that I made mistakes. I can guarantee that I will make more. Your feedback is appreciated and noted. The fact is that the Foundation is really a service organization that will require a lot of work to run and administer. It's effectiveness at fulfilling its mission will depend on how well it serves the group on this list, as well as the other groups that are working on Python for Science. I'm all for getting as many volunteers as we can get for the Foundation. I've just been trying to get things organized. Sometimes this works best by phone calls and direct contact, rather than mailing lists. For those interested. The Foundation mission is to: * Promote Open Source Software for Science * Fund Open Source Projects in Science (currently NumPy, SciPy, IPython, and Matplotlib are first-tier with a whole host of second-tier projects that could received funding) * through grants * through code bounties * through graduate-student scholarships * Sponsor sprints * Sponsor conferences * Sponsor student travel * etc., etc. Whether or not it can do any of those things depends on whether or not it can raise money from people and organizations that benefit from the Scientific Python Stack. All of this will be advertised more as the year progresses. Best regards, -Travis From jason-sage at creativetrax.com Tue Feb 14 20:22:20 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Tue, 14 Feb 2012 19:22:20 -0600 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: <1F3C1CFF-D744-4741-BBBA-07BC04658DDC@continuum.io> References: <1F3C1CFF-D744-4741-BBBA-07BC04658DDC@continuum.io> Message-ID: <4F3B08CC.8010901@creativetrax.com> On 2/14/12 7:17 PM, Travis Oliphant wrote: > * Fund Open Source Projects in Science (currently NumPy, SciPy, IPython, and Matplotlib are first-tier with a whole host of second-tier projects that could received funding) > * through grants So, for example, would the Foundation apply to mentor Google Summer of Code projects? Jason From ben.root at ou.edu Tue Feb 14 20:26:34 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 14 Feb 2012 19:26:34 -0600 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: On Tuesday, February 14, 2012, Travis Oliphant wrote: >> >> I have to agree with Mathew here, to a point. There has been discussions of these groups before, but I don't recall any announcement of this group. Of course, now that it has been announced, maybe a link to it should be prominent on the numpy/scipy pages(maybe others?). It should also be in the list of mailing lists. > > I'm happy for all these discussions to be in the open. 
> >> >> A funding org much like the Linux Foundation would be great, and I am all for it. A separate governing committee is also important, and I think we had some very good ideas in previous discussions. >> >> I also have to agree with Matthew's concerns about the concentration of developer resources at Continuum. I think that establishing a community-driven governance committee would be crucial in making sure that Continuum's (and Enthought's??) efforts go to serve both the community and the company's customers. > > I can try and re-assure you that all will be well, but I know that time is the only thing that will prove that out as each one will decide for themselves whether or not their input is valued and acted upon. To provide some perspective, for the next 5 months at least, Continuum will be providing 3.5 people at least 50% to the NumPy project plus dev ops help to get issue tracking and continuous build integration set up. After that we will have at least 1.5 people devoted full-time to the open-source NumPy project (more if possible). I would like this support to actually go through the Foundation (which already has a community governance and non-profit mission statement), but this takes some leg-work in getting the Foundation setup and organizing those contracts. But, that is my intent and what I am working to get in place eventually. That's good, and all of this is all in NumPy's benefit. I think that This is now the perfect time to establish some sort of governance, even if it is provincial. This has absolutely nothing against you (as you have done an excellent job so far), but now that the numpy community has grown this much and there are so many stake-holders, a formal governance will be essential for community cohesion. > > Obviously, the fact that I am deeply involved in NumPy complicates the question of "community governance" for some people, but I hope you will trust that we are just trying to improve NumPy as we understand it. I remain interested in others views of what "improving NumPy" means. But, I do have a long list of ideas that I am anxious to get started on. > And I don't want the governance to hinder those ideas. I just see it as inevitable that there will be disagreements and having an agreed-upon structure by which to resolve them would be most beneficial. Also, such a committee could be used to solicit feedback on RFCs and NEPs. >> >> Travis, in about a month, I will be starting up work at a company that has been users of the SciPy stack, but has not been active members of the community. I wish to change that. Will this Funding committee serve as a face for numpy for private companies? > > Absolutely. The Foundation web-site is getting set up right now. It will be an evolving thing, and your feedback about how the Foundation can help you get your company involved will be very helpful. I would like multiple companies to interact through the Foundation, and that is how I would ultimately like Continuum to interact with the community as well. The fact that Continuum employs people who work on NumPy should be no more concerning than the fact that Google employs people that work on Python, or that Enthought employs people who work on SciPy and NumPy. I recognize that my role in NumPy, the Foundation, and my company may be concerning for some. I firmly believe that NumPy is successful because of everybody who has participated. I am not interested in somehow changing that. I just want to do what I can to accelerate it. 
> > I look forward to working with you and your company in the Foundation. > Me too. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Feb 14 22:07:11 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 14 Feb 2012 21:07:11 -0600 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 6:43 PM, Matthew Brett wrote: > Hi, > > On Tue, Feb 14, 2012 at 3:58 PM, Travis Oliphant wrote: >>>> >>>> When we selected the name NumFOCUS just a few weeks ago, we created the list >>>> for numfocus and then I signed everyone up for that list who was on the >>>> other one. ? ? ?I apologize if anyone felt left out. ? That is not my >>>> intention. >>> >>> My point is that there are two ways go to about this process, one is >>> open and the other is closed. ?In the open version, someone proposes >>> such a group to the mailing lists. ?They ask for expressions of >>> interest. ?The discussion might then move to another mailing list that >>> is publicly known and widely advertised. ?Members of the board are >>> proposed in public. ?There might be some sort of formal or informal >>> voting process. ?The reason to prefer this to the more informal >>> private negotiations is that a) the community feels a greater >>> ownership and control of the process and b) it is much harder to >>> weaken or subvert an organization that explicitly does all its >>> business in public. >> >> Your points are well taken. ? However, my point is that this has been discussed on an open mailing list. ? Things weren't *as* open as they could have been, perhaps, in terms of board selection. ?But, there was opportunity for people to provide input. > > I am on the numpy, scipy, matplotlib, ipython and cython mailing > lists. ?Jarrod and Fernando are friends of mine. ?I've been obviously > concerned about numpy governance for some time. ?I didn't know about > this mailing list, had only a vague idea that some sort of foundation > was being proposed and I had no idea at all that you'd selected a > board. ?Would you say that was closer to 'open' or closer to 'closed'? > >>>> Perceptions can be damaging. ? This is one of the big reasons for the >>>> organization of the Foundation -- to be a place separate from any commercial >>>> venture which can direct resources to a vision whose goal is more >>>> democratically determined. >>> >>> Are you proposing that the Foundation oversee Numpy governance and >>> direction? ? From your chosen members I'm guessing that the idea is >>> for the foundation to think about broad strategy rather than - say - >>> whether missing values should be encoded with masked arrays? >> >> No, I am not proposing that. ? ?The Foundation will be focused on higher-level broad strategy sorts of things: ?mostly around how to raise money and how to direct that money to projects that have their own development cycles. ? I would think the Foundation would be interested in paying for things like issue trackers and continuous integration servers as well. ? ? It will leave NumPy management to this list and the people who have gathered around this watering hole. ? ?Obviously, there will be points of connection, but exactly how this will play-out depends on who shows up to both organizations. >> >> >>>> I think we do have a NumPy steering group if you want to call it that. >>>> It is currently me, Mark Wiebe, and Charles Harris. ? 
?Rolf Gommers, Pauli >>>> Virtanen, David Cournapeau and Robert Kern also have opinions that carry >>>> significant weight. ? ?Are there other people that should be on this list? >>>> ?There are other people who also speak up on this list whose opinions will >>>> be listened to and heard. ? In fact, I hope that many more people will come >>>> to the list and speak out as development increases. >>> >>> The point I was making was that the concentration of numpy development >>> hours and talent in your company makes it urgent that the numpy >>> governance is set out formally, that the interests of the company are >>> made clear, and that the steering group can be assured of explicit and >>> public independence from the interests of the company, if and when >>> that becomes necessary. ? In the past, the numpy steering group has >>> seemed a virtual organization, formed ad-hoc when needed, and with no >>> formal governance. ? I'm saying that I firmly believe that has to >>> change, to avoid the actual or perceived loss of community ownership. >> >> I hear your point. ? ?Thank you for sharing it. ? ?Fortunately, we are having this discussion, and plan to continue to have it as any concerns arise. ? ?I think the situation is actually less concentrated than it used to be when the SciPy steering committee was discussed. ?On that note, ?I think the SciPy steering committee needs serious revision as well. ? ?But, we've all just been getting along pretty well without too much formalism, so far, so perhaps that is enough for now. > > But a) there have already been serious unresolved disagreements on > this list (I note no resolution of the masks / NA debate) and b) the > whole point is to set up structures that can deal with the problems > before or as they arise. ?After the problem arises, it is too late. > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Thanks for the links, Matt, as it does put quite a few things into perspective. I do fully agree with you on this. The 'masks / NA debate' is a curious issue here. On one hand we are meant to understand that numpy will be continuing 'as usual'. Yet, thanks to the link, we know that this new company involves probably the only person, Mark Wiebe, that understands the NA object that has been entered into the development branch. It just does not come across very well when major issue fundamental issues with the NA object have occurred on the list have been totally ignored by Mark - thanks Chuck! The one thing that gets over looked here is that there is a huge diversity of users with very different skill levels. But very few people have an understanding of the core code. (In fact the other thread about type-casting suggests that it is extremely few people.) So in all of this, I do not yet see 'community'. But the only way you can change that perception is through actions. 
Bruce From bsouthey at gmail.com Tue Feb 14 22:14:11 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 14 Feb 2012 21:14:11 -0600 Subject: [Numpy-discussion] Release management (was: Updated differences between 1.5.1 to 1.6.1) In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 2:24 PM, Ralf Gommers wrote: > > > On Tue, Feb 14, 2012 at 7:25 PM, Travis Oliphant > wrote: >> >> >> On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote: >> >> > Hi Travis, >> > >> > It is great that some resources can be spent to have people paid to >> > work on NumPy. Thank you for making that happen. >> > >> > I am slightly confused about roadmaps for numpy 1.8 and 2.0. This >> > needs discussion on the ML, and our release manager currently is Ralf >> > - he is the one who ultimately decides what goes when. >> >> Thank you for reminding me of this. ?Ralf and I spoke several days ago, >> and have been working on how to give him more time to spend on SciPy >> full-time. > > > Well, full-time is the job that I get paid for:) > >> As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I >> will be the release manager again. ? Ralf will continue serving as release >> manager for SciPy. > > > I had planned to bring this up only after the 1.7 release but yes, I would > like to push the balance of my open-source work a little from > release/maintenance work towards writing more new code. I've been doing both > NumPy and SciPy releases for about two years now, and it's time for me to > hand over the manager hat for one of those two. And my preference is to keep > on doing the SciPy releases rather than the NumPy ones. > >> For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager. >> I only know that I won't be release manager past NumPy 1.X. > > > Travis, it's very good to see that the release manager role can be filled > going forward (it's not the most popular job), but I think the way it should > work is that people volunteer for this role and then the community agrees on > giving a volunteer that role. > > I actually started contributing when David asked for someone to take over > from him in the above manner. Maybe someone else will step up now, giving > you or Mark more time to work on new NumPy features (which I'm pretty sure > you'd prefer). > > Cheers, > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Ralf, I will miss you as a numpy release manager! You have not only done an incredible job but also taken the role to a higher level. Your attitude and attention to details has been amazing. Thanks, Bruce From matthew.brett at gmail.com Tue Feb 14 22:19:00 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 14 Feb 2012 19:19:00 -0800 Subject: [Numpy-discussion] can_cast with structured array output - bug? In-Reply-To: References: Message-ID: Hi, On Mon, Feb 13, 2012 at 7:02 PM, Mark Wiebe wrote: > I took a look into the code to see what is causing this, and the reason is > that nothing has ever been implemented to deal with the fields. This means > it falls back to treating all struct dtypes as if they were a plain "void" > dtype, which allows anything to be cast to it. > > While I was redoing the casting subsystem for 1.6, I did think on this > issue, and decided that it wasn't worth tackling it at the time because the > 'safe'/'same_kind'/'unsafe' don't seem sufficient to handle what might be > desired. 
I tried to leave this alone as much as possible. > > Some random thoughts about this are: > > * Casting a scalar to a struct dtype: should it be safe if the scalar can be > safely cast to each member of the struct dtype? This is the NumPy > broadcasting rule applied to dtypes as if the struct dtype is another > dimension. > * Casting one struct dtype to another: If the fields of the source are a > subset of the target, and the types can safely convert, should that be a > safe cast? If the fields of the source are not a subset of the target, > should that still be a same_kind cast? Should a second enum which > complements the safe/same_kind/unsafe one, but is specific for how > adding/removing struct fields be added? > > This is closely related to adding ufunc support for struct dtypes, and the > choices here should probably be decided at the same time as designing how > the ufuncs should work. Thanks for the discussion - that's very helpful. How about, at a first pass, returning True for conversion of void types only if input dtype == output dtype, then adding more sophisticated rules later? See you, Matthew From ben.root at ou.edu Tue Feb 14 23:17:53 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 14 Feb 2012 22:17:53 -0600 Subject: [Numpy-discussion] Implicit conversion of python datetime to numpy datetime64? Message-ID: Just a thought I had. Right now, I can pass a list of python ints or floats into np.array() and get a numpy array with a sensible dtype. Is there any reason why we can't do the same for python's datetime? Right now, it is very easy for me to make a list comprehension of datetime objects using strptime(), but it is very awkward to make a numpy array out of it. The only barrier I can think of are those who have already built code around a object dtype array of datetime objects. Thoughts? Ben Root P.S. - what ever happened to arange() and linspace() for datetime64? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Feb 14 23:53:00 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 14 Feb 2012 20:53:00 -0800 Subject: [Numpy-discussion] can_cast with structured array output - bug? In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 7:19 PM, Matthew Brett wrote: > Hi, > > On Mon, Feb 13, 2012 at 7:02 PM, Mark Wiebe wrote: > > I took a look into the code to see what is causing this, and the reason > is > > that nothing has ever been implemented to deal with the fields. This > means > > it falls back to treating all struct dtypes as if they were a plain > "void" > > dtype, which allows anything to be cast to it. > > > > While I was redoing the casting subsystem for 1.6, I did think on this > > issue, and decided that it wasn't worth tackling it at the time because > the > > 'safe'/'same_kind'/'unsafe' don't seem sufficient to handle what might be > > desired. I tried to leave this alone as much as possible. > > > > Some random thoughts about this are: > > > > * Casting a scalar to a struct dtype: should it be safe if the scalar > can be > > safely cast to each member of the struct dtype? This is the NumPy > > broadcasting rule applied to dtypes as if the struct dtype is another > > dimension. > > * Casting one struct dtype to another: If the fields of the source are a > > subset of the target, and the types can safely convert, should that be a > > safe cast? If the fields of the source are not a subset of the target, > > should that still be a same_kind cast? 
Should a second enum which > > complements the safe/same_kind/unsafe one, but is specific for how > > adding/removing struct fields be added? > > > > This is closely related to adding ufunc support for struct dtypes, and > the > > choices here should probably be decided at the same time as designing how > > the ufuncs should work. > > Thanks for the discussion - that's very helpful. > > How about, at a first pass, returning True for conversion of void > types only if input dtype == output dtype, then adding more > sophisticated rules later? > That's a very good approach, thanks. I've created a ticket in the bug tracker so this doesn't get lost, and can be triaged alongside the other issues. http://projects.scipy.org/numpy/ticket/2055 Cheers, Mark > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Feb 15 00:05:19 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Feb 2012 22:05:19 -0700 Subject: [Numpy-discussion] Implicit conversion of python datetime to numpy datetime64? In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 9:17 PM, Benjamin Root wrote: > Just a thought I had. Right now, I can pass a list of python ints or > floats into np.array() and get a numpy array with a sensible dtype. Is > there any reason why we can't do the same for python's datetime? Right > now, it is very easy for me to make a list comprehension of datetime > objects using strptime(), but it is very awkward to make a numpy array out > of it. > > The only barrier I can think of are those who have already built code > around a object dtype array of datetime objects. > > Thoughts? > Ben Root > > P.S. - what ever happened to arange() and linspace() for datetime64? > Arange works in the development branch, In [1]: arange(0,3,1, dtype="datetime64[D]") Out[1]: array(['1970-01-01', '1970-01-02', '1970-01-03'], dtype='datetime64[D]') but linspace is more complicated in that it might not be possible to subdivide an interval into reasonable datetime64 units In [4]: a = datetime64(0, 'D') In [5]: b = datetime64(1, 'D') In [6]: linspace(a, b, 5) Out[6]: array(['1970-01-01', '1970-01-01', '1970-01-01', '1970-01-01', '1970-01-02'], dtype='datetime64[D]') Looks like a project for somebody. There is probably a lot of work along that line to be done. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Feb 15 00:12:20 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 14 Feb 2012 21:12:20 -0800 Subject: [Numpy-discussion] Implicit conversion of python datetime to numpy datetime64? In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 8:17 PM, Benjamin Root wrote: > Just a thought I had. Right now, I can pass a list of python ints or > floats into np.array() and get a numpy array with a sensible dtype. Is > there any reason why we can't do the same for python's datetime? Right > now, it is very easy for me to make a list comprehension of datetime > objects using strptime(), but it is very awkward to make a numpy array out > of it. > I would consider this a bug, it's not behaving sensibly at present. 
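On Chuck's linspace example just above: until something proper is implemented, a user-level stand-in seems possible by doing the even spacing on integer ticks of a fine unit and converting back at the end. A rough sketch only -- the helper name and the microsecond default are mine, and it assumes the development-branch casting behaviour shown in these threads, so treat it as untested:

import numpy as np

def datetime_linspace(start, stop, num, unit='us'):
    # 'datetime_linspace' is a made-up name; start/stop can be anything
    # np.datetime64() accepts (an ISO string, another datetime64, ...).
    start = np.datetime64(start).astype('datetime64[%s]' % unit)
    stop = np.datetime64(stop).astype('datetime64[%s]' % unit)
    # do the spacing on integer ticks since the epoch, then convert back
    ticks = np.linspace(start.astype(np.int64), stop.astype(np.int64), num)
    return np.round(ticks).astype(np.int64).astype('datetime64[%s]' % unit)

# e.g. datetime_linspace('1970-01-01', '1970-01-02', 5) should give points
# six hours apart once viewed in a unit finer than [D].

The float64 precision of linspace limits how wide a range this can subdivide in a very fine unit, which is more or less the difficulty pointed out above.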
Here's what it does for me: In [20]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", ...: "07/22/98", "12/12/12"]], dtype="M8") --------------------------------------------------------------------------- TypeError Traceback (most recent call last) C:\Python27\Scripts\ in () 1 np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", ----> 2 "07/22/98", "12/12/12"]], dtype="M8") TypeError: Cannot cast datetime.datetime object from metadata [us] to [D] according to the rule 'same_kind' In [21]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", ...: "07/22/98", "12/12/12"]], dtype="M8[us]") Out[21]: array(['2012-02-02T16:00:00.000000-0800', '1998-07-21T17:00:00.000000-0700', '2012-12-11T16:00:00.000000-0800'], dtype='datetime64[us]') In [22]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", ...: "07/22/98", "12/12/12"]], dtype="M8[us]").astype("M8[D]") Out[22]: array(['2012-02-03', '1998-07-22', '2012-12-12'], dtype='datetime64[D]') The only barrier I can think of are those who have already built code > around a object dtype array of datetime objects. > > Thoughts? > Ben Root > > P.S. - what ever happened to arange() and linspace() for datetime64? > arange definitely works: In[28] np.arange('2011-03-02', '2011-04-01', dtype='M8') Out[28]: array(['2011-03-02', '2011-03-03', '2011-03-04', '2011-03-05', '2011-03-06', '2011-03-07', '2011-03-08', '2011-03-09', '2011-03-10', '2011-03-11', '2011-03-12', '2011-03-13', '2011-03-14', '2011-03-15', '2011-03-16', '2011-03-17', '2011-03-18', '2011-03-19', '2011-03-20', '2011-03-21', '2011-03-22', '2011-03-23', '2011-03-24', '2011-03-25', '2011-03-26', '2011-03-27', '2011-03-28', '2011-03-29', '2011-03-30', '2011-03-31'], dtype='datetime64[D]') I didn't get to implementing linspace. I did look at it, but the current code didn't make it a trivial thing to put in. -Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Feb 15 00:37:26 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 14 Feb 2012 23:37:26 -0600 Subject: [Numpy-discussion] Implicit conversion of python datetime to numpy datetime64? In-Reply-To: References: Message-ID: On Tuesday, February 14, 2012, Mark Wiebe wrote: > On Tue, Feb 14, 2012 at 8:17 PM, Benjamin Root wrote: >> >> Just a thought I had. Right now, I can pass a list of python ints or floats into np.array() and get a numpy array with a sensible dtype. Is there any reason why we can't do the same for python's datetime? Right now, it is very easy for me to make a list comprehension of datetime objects using strptime(), but it is very awkward to make a numpy array out of it. > > I would consider this a bug, it's not behaving sensibly at present. Here's what it does for me: > > In [20]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", > > ...: "07/22/98", "12/12/12"]], dtype="M8") Well, I guess it would be nice if I didn't even have to provide the dtype (I.e., inferred from the datetime type, since we aren't talking about strings). But I hadn't noticed the above, I was just making object arrays. 
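For reference, the explicit-unit route shown above does give a usable stopgap for a list of datetime objects -- a minimal sketch, assuming the development-branch behaviour in the In [21]/In [22] examples (I have not checked it against 1.6):

import datetime
import numpy as np

dates = [datetime.datetime.strptime(d, "%m/%d/%y")
         for d in ["02/03/12", "07/22/98", "12/12/12"]]

a_obj = np.array(dates, dtype=object)             # the object-array fallback
a_us = np.array(dates, dtype="datetime64[us]")    # explicit unit is accepted
a_day = a_us.astype("datetime64[D]")              # then truncate to days

print(a_day)   # ['2012-02-03' '1998-07-22' '2012-12-12'], per In [22] above

The missing piece is really just the inference step, i.e. getting the same result without having to spell out "datetime64[us]" by hand.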
> > --------------------------------------------------------------------------- > > TypeError Traceback (most recent call last) > > C:\Python27\Scripts\ in () > > 1 np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", > > ----> 2 "07/22/98", "12/12/12"]], dtype="M8") > > TypeError: Cannot cast datetime.datetime object from metadata [us] to [D] according to the rule 'same_kind' > > In [21]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", > > ...: "07/22/98", "12/12/12"]], dtype="M8[us]") > > Out[21]: > > array(['2012-02-02T16:00:00.000000-0800', > > '1998-07-21T17:00:00.000000-0700', '2012-12-11T16:00:00.000000-0800'], dtype='datetime64[us]') > > In [22]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", > > ...: "07/22/98", "12/12/12"]], dtype="M8[us]").astype("M8[D]") > > Out[22]: array(['2012-02-03', '1998-07-22', '2012-12-12'], dtype='datetime64[D]') >> >> The only barrier I can think of are those who have already built code around a object dtype array of datetime objects. >> >> Thoughts? >> Ben Root >> >> P.S. - what ever happened to arange() and linspace() for datetime64? > > arange definitely works: > In[28] np.arange('2011-03-02', '2011-04-01', dtype='M8') > Out[28]: > array(['2011-03-02', '2011-03-03', '2011-03-04', '2011-03-05', > '2011-03-06', '2011-03-07', '2011-03-08', '2011-03-09', > '2011-03-10', '2011-03-11', '2011-03-12', '2011-03-13', > '2011-03-14', '2011-03-15', '2011-03-16', '2011-03-17', > '2011-03-18', '2011-03-19', '2011-03-20', '2011-03-21', > '2011-03-22', '2011-03-23', '2011-03-24', '2011-03-25', > '2011-03-26', '2011-03-27', '2011-03-28', '2011-03-29', > '2011-03-30', '2011-03-31'], dtype='datetime64[D]') > I didn't get to implementing linspace. I did look at it, but the current code didn't make it a trivial thing to put in. > -Mark Sorry, I wasn't clear about arange, I meant that it would be nice if it could take python datetimes as arguments (and timedelat for the step?) because that is much more intuitive than remembering the exact dtype code and string format. I see it as the numpy datetime64 type could take three types for it's constructor: another datetime64, python datetime, and The standard unambiguous datetime string. I should be able to use these interchangeably in numpy. The same would be true for timedelta64. Easy interchange between python datetime and datetime64 would allow numpy to piggy-back on established functionality in the python system libraries, allowing for focus to be given to extended features. -------------- next part -------------- An HTML attachment was scrubbed... URL: From wardefar at iro.umontreal.ca Wed Feb 15 00:43:17 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Wed, 15 Feb 2012 00:43:17 -0500 Subject: [Numpy-discussion] Release management (was: Updated differences between 1.5.1 to 1.6.1) In-Reply-To: References: Message-ID: <1F5C6C52-6E25-4BF1-81FC-10D3ED836361@iro.umontreal.ca> On 2012-02-14, at 10:14 PM, Bruce Southey wrote: > I will miss you as a numpy release manager! > You have not only done an incredible job but also taken the role to a > higher level. > Your attitude and attention to details has been amazing. +1, hear hear! Thank you for all the time you've invested, you are owed a great deal by all of us, myself included. 
David From mwwiebe at gmail.com Wed Feb 15 00:54:29 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 14 Feb 2012 21:54:29 -0800 Subject: [Numpy-discussion] Implicit conversion of python datetime to numpy datetime64? In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 9:37 PM, Benjamin Root wrote: > On Tuesday, February 14, 2012, Mark Wiebe wrote: > > On Tue, Feb 14, 2012 at 8:17 PM, Benjamin Root wrote: > >> > >> Just a thought I had. Right now, I can pass a list of python ints or > floats into np.array() and get a numpy array with a sensible dtype. Is > there any reason why we can't do the same for python's datetime? Right > now, it is very easy for me to make a list comprehension of datetime > objects using strptime(), but it is very awkward to make a numpy array out > of it. > > > > I would consider this a bug, it's not behaving sensibly at present. > Here's what it does for me: > > > > In [20]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date > in ["02/03/12", > > > > ...: "07/22/98", "12/12/12"]], dtype="M8") > > Well, I guess it would be nice if I didn't even have to provide the dtype > (I.e., inferred from the datetime type, since we aren't talking about > strings). But I hadn't noticed the above, I was just making object arrays. > > > > > > > --------------------------------------------------------------------------- > > > > TypeError Traceback (most recent call last) > > > > C:\Python27\Scripts\ in () > > > > 1 np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in > ["02/03/12", > > > > ----> 2 "07/22/98", "12/12/12"]], dtype="M8") > > > > TypeError: Cannot cast datetime.datetime object from metadata [us] to > [D] according to the rule 'same_kind' > > > > In [21]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date > in ["02/03/12", > > > > ...: "07/22/98", "12/12/12"]], dtype="M8[us]") > > > > Out[21]: > > > > array(['2012-02-02T16:00:00.000000-0800', > > > > '1998-07-21T17:00:00.000000-0700', '2012-12-11T16:00:00.000000-0800'], > dtype='datetime64[us]') > > > > In [22]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date > in ["02/03/12", > > > > ...: "07/22/98", "12/12/12"]], dtype="M8[us]").astype("M8[D]") > > > > Out[22]: array(['2012-02-03', '1998-07-22', '2012-12-12'], > dtype='datetime64[D]') > >> > >> The only barrier I can think of are those who have already built code > around a object dtype array of datetime objects. > >> > >> Thoughts? > >> Ben Root > >> > >> P.S. - what ever happened to arange() and linspace() for datetime64? > > > > arange definitely works: > > In[28] np.arange('2011-03-02', '2011-04-01', dtype='M8') > > Out[28]: > > array(['2011-03-02', '2011-03-03', '2011-03-04', '2011-03-05', > > '2011-03-06', '2011-03-07', '2011-03-08', '2011-03-09', > > '2011-03-10', '2011-03-11', '2011-03-12', '2011-03-13', > > '2011-03-14', '2011-03-15', '2011-03-16', '2011-03-17', > > '2011-03-18', '2011-03-19', '2011-03-20', '2011-03-21', > > '2011-03-22', '2011-03-23', '2011-03-24', '2011-03-25', > > '2011-03-26', '2011-03-27', '2011-03-28', '2011-03-29', > > '2011-03-30', '2011-03-31'], dtype='datetime64[D]') > > I didn't get to implementing linspace. I did look at it, but the current > code didn't make it a trivial thing to put in. > > -Mark > > Sorry, I wasn't clear about arange, I meant that it would be nice if it > could take python datetimes as arguments (and timedelat for the step?) > because that is much more intuitive than remembering the exact dtype code > and string format. 
> > I see it as the numpy datetime64 type could take three types for it's > constructor: another datetime64, python datetime, and The standard > unambiguous datetime string. I should be able to use these interchangeably > in numpy. The same would be true for timedelta64. > > Easy interchange between python datetime and datetime64 would allow numpy > to piggy-back on established functionality in the python system libraries, > allowing for focus to be given to extended features. > Ben Walsh actually implemented this and the code is in a pull request here: https://github.com/numpy/numpy/pull/111 This didn't go in, because the datetime properties don't exist on the arrays after you convert them to datetime64, so there could be some unintuitive consequences from that. When Martin implemented the quaternion dtype, we discussed the possibility that dtypes could expose properties that show up on the array object, and if this were implemented I think the conversion and compatibility between python datetime and datetime64 could be made quite natural. -Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From 00ai99 at gmail.com Wed Feb 15 02:45:03 2012 From: 00ai99 at gmail.com (David Gowers (kampu)) Date: Wed, 15 Feb 2012 18:15:03 +1030 Subject: [Numpy-discussion] recarray field access asymmetry Message-ID: Hi all, This email is about the difference, given a recarray 'arr', ?between A) arr.foo.x[0] and B) arr.foo[0].x Specifically, form A returns the 0-th x value, whereas form B raises AttributeError: Some code demonstrating this: >>> arr = np.zeros((4,), dtype = [('foo',[('x','H'),('y','H')])]) >>> a2 = arr.view (np.recarray) >>> a2.foo rec.array([(0, 0), (0, 0), (0, 0), (0, 0)], ????? dtype=[('x', '>> a2.foo.x array([0, 0, 0, 0], dtype=uint16) >>> a2.foo.x[0] 0 >>> a2.foo[0] (0, 0) >>> a2.foo[0].x Traceback (most recent call last): ? File "", line 1, in AttributeError: 'numpy.void' object has no attribute 'x' (similarly, ``a2[0].foo`` raises an identical AttributeError) This is obstructive, particularly since ``a2.foo[0].x`` is the more logical grouping than ``a2.foo.x[0]`` -- we want the x field of item 0 in foo, not the 0th x-value in foo. I see this issue has come up previously... http://mail.scipy.org/pipermail/numpy-discussion/2008-August/036429.html The solution proposed by Travis in that email: ('arr.view(dtype=(np.record, b.dtype), type=np.recarray)') is ineffective with current versions of NumPy; the result is exactly the same as if you had not done it at all. I've tried various other methods including subclassing recarray and overriding __getitem__ and __getattribute__, with no success. My question is, is there a way to resolve this so that ``a2.foo[0].x`` does actually do what you'd expect it to? 
Thanks, David From ben.root at ou.edu Wed Feb 15 03:07:25 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 02:07:25 -0600 Subject: [Numpy-discussion] recarray field access asymmetry In-Reply-To: References: Message-ID: On Wednesday, February 15, 2012, David Gowers (kampu) <00ai99 at gmail.com> wrote: > Hi all, > > This email is about the difference, given a recarray 'arr', > between > > > A) > > arr.foo.x[0] > > and B) > > arr.foo[0].x > > > > Specifically, form A returns the 0-th x value, whereas form B raises > AttributeError: > > > Some code demonstrating this: > >>>> arr = np.zeros((4,), dtype = [('foo',[('x','H'),('y','H')])]) >>>> a2 = arr.view (np.recarray) >>>> a2.foo > rec.array([(0, 0), (0, 0), (0, 0), (0, 0)], > dtype=[('x', ' >>>> a2.foo.x > array([0, 0, 0, 0], dtype=uint16) > >>>> a2.foo.x[0] > 0 > >>>> a2.foo[0] > (0, 0) >>>> a2.foo[0].x > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'numpy.void' object has no attribute 'x' > > (similarly, ``a2[0].foo`` raises an identical AttributeError) > > > This is obstructive, particularly since ``a2.foo[0].x`` is the more > logical grouping than ``a2.foo.x[0]`` -- we want the x field of item 0 > in foo, not the 0th x-value in foo. > > I see this issue has come up previously... > http://mail.scipy.org/pipermail/numpy-discussion/2008-August/036429.html > > The solution proposed by Travis in that email: > > ('arr.view(dtype=(np.record, b.dtype), type=np.recarray)') > > is ineffective with current versions of NumPy; the result is exactly > the same as if you had not done it at all. > I've tried various other methods including subclassing recarray and > overriding __getitem__ and __getattribute__, with no success. > > My question is, is there a way to resolve this so that ``a2.foo[0].x`` > does actually do what you'd expect it to? > > Thanks, > David > Rather than recarrays, I just use structured arrays like so: A = np.array([(0, 0), (0, 0), (0, 0), (0, 0)], dtype=[('x', ' From ben.root at ou.edu Wed Feb 15 03:10:31 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 02:10:31 -0600 Subject: [Numpy-discussion] recarray field access asymmetry In-Reply-To: References: Message-ID: On Wednesday, February 15, 2012, Benjamin Root wrote: > > > On Wednesday, February 15, 2012, David Gowers (kampu) <00ai99 at gmail.com> wrote: >> Hi all, >> >> This email is about the difference, given a recarray 'arr', >> between >> >> >> A) >> >> arr.foo.x[0] >> >> and B) >> >> arr.foo[0].x >> >> >> >> Specifically, form A returns the 0-th x value, whereas form B raises >> AttributeError: >> >> >> Some code demonstrating this: >> >>>>> arr = np.zeros((4,), dtype = [('foo',[('x','H'),('y','H')])]) >>>>> a2 = arr.view (np.recarray) >>>>> a2.foo >> rec.array([(0, 0), (0, 0), (0, 0), (0, 0)], >> dtype=[('x', '> >>>>> a2.foo.x >> array([0, 0, 0, 0], dtype=uint16) >> >>>>> a2.foo.x[0] >> 0 >> >>>>> a2.foo[0] >> (0, 0) >>>>> a2.foo[0].x >> Traceback (most recent call last): >> File "", line 1, in >> AttributeError: 'numpy.void' object has no attribute 'x' >> >> (similarly, ``a2[0].foo`` raises an identical AttributeError) >> >> >> This is obstructive, particularly since ``a2.foo[0].x`` is the more >> logical grouping than ``a2.foo.x[0]`` -- we want the x field of item 0 >> in foo, not the 0th x-value in foo. >> >> I see this issue has come up previously... 
>> http://mail.scipy.org/pipermail/numpy-discussion/2008-August/036429.html >> >> The solution proposed by Travis in that email: >> >> ('arr.view(dtype=(np.record, b.dtype), type=np.recarray)') >> >> is ineffective with current versions of NumPy; the result is exactly >> the same as if you had not done it at all. >> I've tried various other methods including subclassing recarray and >> overriding __getitem__ and __getattribute__, with no success. >> >> My question is, is there a way to resolve this so that ``a2.foo[0].x`` >> does actually do what you'd expect it to? >> >> Thanks, >> David >> > > Rather than recarrays, I just use structured arrays like so: > > A = np.array([(0, 0), (0, 0), (0, 0), (0, 0)], > dtype=[('x', ' > I can then do: > > A['x'][0] > > Or > > A[0]['x'] > > This allows me to slice and access the data any way I want. I have even been able to use this dictionary idiom to format strings and such. > > Does that help? > Ben Root Sorry, didn't see that you have nested dtypes. Is there a particular reason why you need record arrays over structured arrays? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From elcortogm at googlemail.com Wed Feb 15 03:25:16 2012 From: elcortogm at googlemail.com (Steve Schmerler) Date: Wed, 15 Feb 2012 09:25:16 +0100 Subject: [Numpy-discussion] repeat array along new axis without making a copy Message-ID: <20120215082516.GA2467@ramrod.starsherrifs.de> Hi I'd like to repeat an array along a new axis (like broadcast): In [8]: a Out[8]: array([[0, 1, 2], [3, 4, 5]]) In [9]: b=repeat(a[None,...], 3, axis=0) In [10]: b Out[10]: array([[[0, 1, 2], [3, 4, 5]], [[0, 1, 2], [3, 4, 5]], [[0, 1, 2], [3, 4, 5]]]) In [18]: id(a); id(b[0,...]); id(b[1,...]); id(b[2,...]) Out[18]: 40129600 Out[18]: 39752080 Out[18]: 40445232 Out[18]: 40510272 Can I do this such that each sub-array b[i,...] is a view and not a copy? Background: I'm working on a container class to store trajectory-like data. The API requires that each array has a "time axis" (axis=0 here) along which sub-arrays are stored which may be the same in some cases. Then, I don't want to store redundant information if possible. Thanks! best, Steve From 00ai99 at gmail.com Wed Feb 15 03:25:47 2012 From: 00ai99 at gmail.com (David Gowers (kampu)) Date: Wed, 15 Feb 2012 18:55:47 +1030 Subject: [Numpy-discussion] recarray field access asymmetry In-Reply-To: References: Message-ID: Hi Ben, Thanks for your prompt response. On Wed, Feb 15, 2012 at 6:40 PM, Benjamin Root wrote: > >> >> Rather than recarrays, I just use structured arrays like so: >> >> A = np.array([(0, 0), (0, 0), (0, 0), (0, 0)], >> ? ? ? ? ? ? ? ?dtype=[('x', '> >> I can then do: >> >> A['x'][0] >> >> Or >> >> A[0]['x'] >> >> This allows me to slice and access the data any way I want. ?I have even >> been able to use this dictionary idiom to format strings and such. >> >> Does that help? >> Ben Root > > Sorry, didn't see that you have nested dtypes. ?Is there a particular reason > why you need record arrays over structured arrays? It's really a matter of how much dereferencing of substructures occurs, and how much extra typing that turns into. A['x'][0] -> 4 extra characters per field lookup, vs A.x[0] / A[0].x -> 1 extra character per field lookup. There's also an issue of highlighting -- I'd prefer x to be highlighted in the style of an attribute, not a string, when I'm editing source. A['x'] obviously precludes this. 
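For the record, the plain item-style spelling does compose in either order on the nested dtype above, so it is only the attribute form on the void scalar that breaks -- a quick check (this is just the ['name'] idiom, with the extra quoting already mentioned, not a fix for recarray attribute access):

import numpy as np

arr = np.zeros((4,), dtype=[('foo', [('x', 'H'), ('y', 'H')])])

print(arr['foo']['x'][0])   # fields first, then index                  -> 0
print(arr[0]['foo']['x'])   # index first, then fields on the void scalar -> 0
print(arr['foo'][0]['x'])   # mixed order also works                    -> 0

So the asymmetry is specific to attribute lookup on numpy.void, not to nested field access as such.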
I have considered normal structured arrays -- repeatedly, after being frustrated by this recarray behaviour. However, I'd like to achieve some kind of resolution here -- even if it is just this unexpected behaviour being properly documented. David From pierre.haessig at crans.org Wed Feb 15 04:35:38 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Wed, 15 Feb 2012 10:35:38 +0100 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: References: Message-ID: <4F3B7C6A.4020507@crans.org> Le 15/02/2012 04:07, Bruce Southey a ?crit : > The one thing that gets over looked here is that there is a huge > diversity of users with very different skill levels. But very few > people have an understanding of the core code. (In fact the other > thread about type-casting suggests that it is extremely few people.) > So in all of this, I do not yet see 'community'. But the only way you > can change that perception is through actions. Hi Bruce, I agree with the "skill issue" you raised. My own experience being : 1) For some years, I've been a quite heavy user of numpy and various scipy modules. Zero knowledge of the numpy code 2) I recently (November 2011) subscribed to numpy & scipy ML. Going through the various topics coming every day, I feel like I'm learning more & faster. I'm now browsing numpy's GitHub from time to time. 3) Now I see regularly messages about topics like datetime(64) and NAs about which I feel I could share my ($.02 !) views as a user or as a potential user. But in the end I don't write anything because the issue is so complex that I feel both lost and silly. I have no solution to propose, so I try to keep on learning... -- Pierre From pierre.haessig at crans.org Wed Feb 15 05:17:33 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Wed, 15 Feb 2012 11:17:33 +0100 Subject: [Numpy-discussion] autocorrelation computation performance : use of np.correlate In-Reply-To: References: <4F2984E6.4070005@crans.org> Message-ID: <4F3B863D.2050800@crans.org> Le 04/02/2012 23:19, Ralf Gommers a ?crit : > > > scipy.signal is the right place I think. numpy shouldn't grow too many > functions like this. [going back in time on the autocorrelation topic] I see scipy.signal being the good place. However, I have the (possibly wrong) feeling that Matplotlib is not so much depending on scipy. Would Matplotlib's acorr/xcorr functions benefit from a faster function available in scipy as opposed to numpy ? Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: From scott.sinclair.za at gmail.com Wed Feb 15 06:08:30 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 15 Feb 2012 13:08:30 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: Message-ID: On 19 January 2012 00:44, Fernando Perez wrote: > On Wed, Jan 18, 2012 at 2:18 AM, Scott Sinclair > wrote: >> It's rather confusing having two websites. The "official" page at >> http://www.scipy.org/Download points to github. > > The problem is that this page, which looks pretty official to just about anyone: > > http://numpy.scipy.org/ > > takes you to the one at new.scipy... ?So as far as traps for the > unwary go, this one was pretty cleverly laid out ;) The version of the numpy website now at http://numpy.github.com no longer points to the misleading and outdated new.scipy.org (an updated version of that site is at http://scipy.github.com). 
I think that numpy.scipy.org should be redirected to numpy.github.com as outlined at pages.github.com (see section on Custom Domains), and that it should happen sooner rather than later. Unfortunately I have no idea who has access to the DNS records (Ognen Duzlevski @ Enthought?). This change would remove one of the ways that people are currently directed to new.scipy.org. Cheers, Scott From thouis at gmail.com Wed Feb 15 06:11:59 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Wed, 15 Feb 2012 12:11:59 +0100 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 21:54, Fernando Perez wrote: > On Sat, Feb 11, 2012 at 12:36 PM, Pauli Virtanen wrote: >> The lack of attachments is the main problem with this transition. It's >> not so seldom that numerical input data or scripts demonstrating an >> issue come useful. This is probably less of an issue for Numpy than for >> Scipy, though. > > We've taken to using gist for scripts/data and free image hosting > sites for screenshots, using > > References: <4F2984E6.4070005@crans.org> <4F3B863D.2050800@crans.org> Message-ID: On Wednesday, February 15, 2012, Pierre Haessig wrote: > Le 04/02/2012 23:19, Ralf Gommers a ?crit : > > scipy.signal is the right place I think. numpy shouldn't grow too many functions like this. > > [going back in time on the autocorrelation topic] > > I see scipy.signal being the good place. However, I have the (possibly wrong) feeling that Matplotlib is not so much depending on scipy. > Would Matplotlib's acorr/xcorr functions benefit from a faster function available in scipy as opposed to numpy ? > > Pierre > Mpl does not depend on scipy at all and it is intended to stay that way. Mind you, mpl does not "require" acorr/xcorr because we do it using existing numpy tools. And any user needing a faster version can simply call it themselves and simply plot the results themselves. The current functions are merely convenience functions So, yes, it would be nice to have it in numpy, but if it fits better in scipy, then put it there. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Feb 15 06:36:00 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 15 Feb 2012 12:36:00 +0100 Subject: [Numpy-discussion] Release management (was: Updated differences between 1.5.1 to 1.6.1) In-Reply-To: References: Message-ID: <20120215113558.GG7614@phare.normalesup.org> On Tue, Feb 14, 2012 at 09:14:11PM -0600, Bruce Southey wrote: > Ralf, > I will miss you as a numpy release manager! > You have not only done an incredible job but also taken the role to a > higher level. > Your attitude and attention to details has been amazing. I definitely +1 that. I think that you have really spurred the community to new heights in terms of quality and frequency of the releases. 
G From warren.weckesser at enthought.com Wed Feb 15 07:25:20 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 15 Feb 2012 06:25:20 -0600 Subject: [Numpy-discussion] repeat array along new axis without making a copy In-Reply-To: <20120215082516.GA2467@ramrod.starsherrifs.de> References: <20120215082516.GA2467@ramrod.starsherrifs.de> Message-ID: On Wed, Feb 15, 2012 at 2:25 AM, Steve Schmerler wrote: > Hi > > I'd like to repeat an array along a new axis (like broadcast): > > In [8]: a > Out[8]: > array([[0, 1, 2], > [3, 4, 5]]) > In [9]: b=repeat(a[None,...], 3, axis=0) > In [10]: b > Out[10]: > array([[[0, 1, 2], > [3, 4, 5]], > > [[0, 1, 2], > [3, 4, 5]], > > [[0, 1, 2], > [3, 4, 5]]]) > > In [18]: id(a); id(b[0,...]); id(b[1,...]); id(b[2,...]) > Out[18]: 40129600 > Out[18]: 39752080 > Out[18]: 40445232 > Out[18]: 40510272 > > > Can I do this such that each sub-array b[i,...] is a view and not a copy? > Yes, such an array can be created using the as_strided() function from the module numpy.lib.stride_tricks: In [1]: from numpy.lib.stride_tricks import as_strided In [2]: a = array([[1,2,3],[4,5,6]]) In [3]: b = as_strided(a, strides=(0, a.strides[0], a.strides[1]), shape=(3, a.shape[0], a.shape[1])) In [4]: b Out[4]: array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]]) In [5]: a[0,0] = 99 In [6]: b Out[6]: array([[[99, 2, 3], [ 4, 5, 6]], [[99, 2, 3], [ 4, 5, 6]], [[99, 2, 3], [ 4, 5, 6]]]) Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.raspaud at smhi.se Wed Feb 15 07:29:28 2012 From: martin.raspaud at smhi.se (Martin Raspaud) Date: Wed, 15 Feb 2012 13:29:28 +0100 Subject: [Numpy-discussion] Numpy 1.6.1 installation problem In-Reply-To: <4F3A8246.40307@gmail.com> References: <4F3A1F05.10609@smhi.se> <4F3A8246.40307@gmail.com> Message-ID: <4F3BA528.8080909@smhi.se> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 14/02/12 16:48, Bruce Southey wrote: > On 02/14/2012 09:40 AM, Olivier Delalleau wrote: >> Really not an expert here, but it looks like it's trying various >> compilation options, some work and some don't, and for some reason >> it's really unhappy about the one where it can't find Python.h. >> Maybe add /usr/include/python2.6 to your CPATH, see if that helps (and >> make sure permissions are correctly set on this directory)? However, >> it may very well be something else.... > This there a reason why you are using the fcompiler option? > If not just try the basic approach: > $ python setup.py build Hi guys, Thanks for the help. I'm getting past this error thanks to the CPATH environment variable. Unfortunately I get another error later on that the compiler can't find some C files... I attach the error. Best regards, Martin [...] 
creating build/temp.linux-x86_64-2.6/numpy/core/src/multiarray compile options: '-Inumpy/core/include - -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy - -I/usr/lib64/python2.6/site-packages/numpy/core/include - -I/usr/include/python2.6 -Inumpy/core/src/private -Inumpy/core/src - -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray - -Inumpy/core/src/umath -Inumpy/core/include -c' gcc: numpy/core/src/multiarray/multiarraymodule_onefile.c numpy/core/src/multiarray/multiarraymodule_onefile.c:10:25: error: scalartypes.c: No such file or directory numpy/core/src/multiarray/multiarraymodule_onefile.c:14:24: error: arraytypes.c: No such file or directory In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:35: numpy/core/src/multiarray/conversion_utils.c: In function 'PyArray_PyIntAsInt': numpy/core/src/multiarray/conversion_utils.c:378: error: 'INT_Descr' undeclared (first use in this function) numpy/core/src/multiarray/conversion_utils.c:378: error: (Each undeclared identifier is reported only once numpy/core/src/multiarray/conversion_utils.c:378: error: for each function it appears in.) numpy/core/src/multiarray/conversion_utils.c: In function 'PyArray_PyIntAsIntp': numpy/core/src/multiarray/conversion_utils.c:467: error: 'LONG_Descr' undeclared (first use in this function) numpy/core/src/multiarray/multiarraymodule_onefile.c:38:20: error: nditer.c: No such file or directory numpy/core/src/multiarray/multiarraymodule_onefile.c:40:36: error: lowlevel_strided_loops.c: No such file or directory numpy/core/src/multiarray/multiarraymodule_onefile.c:42:20: error: einsum.c: No such file or directory numpy/core/src/multiarray/multiarraymodule_onefile.c:10:25: error: scalartypes.c: No such file or directory numpy/core/src/multiarray/multiarraymodule_onefile.c:14:24: error: arraytypes.c: No such file or directory In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:35: numpy/core/src/multiarray/conversion_utils.c: In function 'PyArray_PyIntAsInt': numpy/core/src/multiarray/conversion_utils.c:378: error: 'INT_Descr' undeclared (first use in this function) numpy/core/src/multiarray/conversion_utils.c:378: error: (Each undeclared identifier is reported only once numpy/core/src/multiarray/conversion_utils.c:378: error: for each function it appears in.) 
numpy/core/src/multiarray/conversion_utils.c: In function 'PyArray_PyIntAsIntp': numpy/core/src/multiarray/conversion_utils.c:467: error: 'LONG_Descr' undeclared (first use in this function) numpy/core/src/multiarray/multiarraymodule_onefile.c:38:20: error: nditer.c: No such file or directory numpy/core/src/multiarray/multiarraymodule_onefile.c:40:36: error: lowlevel_strided_loops.c: No such file or directory numpy/core/src/multiarray/multiarraymodule_onefile.c:42:20: error: einsum.c: No such file or directory error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic - -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Inumpy/core/include - -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy - -I/usr/lib64/python2.6/site-packages/numpy/core/include - -I/usr/include/python2.6 -Inumpy/core/src/private -Inumpy/core/src - -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray - -Inumpy/core/src/umath -Inumpy/core/include -c numpy/core/src/multiarray/multiarraymodule_onefile.c -o build/temp.linux-x86_64-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o" failed with exit status 1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) Comment: Using GnuPG with Red Hat - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJPO6UoAAoJEBdvyODiyJI4fwQH+wb3n69uJnSKN8upAkIIBykZ W1vUuvqz1M8Dzm3/drkn01NtDC5eCW2D2ncO97J/7HW4HB4NRDxfB3F/p+9or6Vk 4qv/HGeCTNXaGf5GKidfGXyo2CzWRj/uK00lCC8roUfrQo5+cgL/7hXYE4Z59QK/ S4NXrv7FU9pBysbpMJJ9O47yPfh2Z3qKKItvhiZM5jr5K91pMeDtIAB5HsJDV/4t 1gEGlZh/DCVZwQ8yWc8iDnk1uWVK5E+cGvwxBw9xA8TNTNxvqtSRaKlQhMhzyaZl vpd9vYiXijbR7iHjOhoriw/suI7RyX/oQl9AjvM+AQ4NoB/22xJqbqlXDSNcWXE= =PI/y -----END PGP SIGNATURE----- -------------- next part -------------- A non-text attachment was scrubbed... Name: martin_raspaud.vcf Type: text/x-vcard Size: 303 bytes Desc: not available URL: From ognen at enthought.com Wed Feb 15 08:18:22 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Wed, 15 Feb 2012 07:18:22 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: On Wed, Feb 15, 2012 at 5:13 AM, Scott Sinclair wrote: > On 8 February 2012 00:03, Travis Oliphant wrote: >> >> On Feb 7, 2012, at 4:02 AM, Pauli Virtanen wrote: >> >>> Hi, >>> >>> 06.02.2012 20:41, Ralf Gommers kirjoitti: >>> [clip] >>>> I've created https://github.com/scipy/scipy.github.com and gave you >>>> permissions on that. So with that for the built html and >>>> https://github.com/scipy/scipy.org-new for the sources, that should do it. >>>> >>>> On the numpy org I don't have the right permissions to do the same. >>> >>> Ditto for numpy.github.com, now. >> >> This is really nice. ? It will really help us make changes to the web-site quickly and synchronously with code changes. >> >> John Turner at ORNL has the numpy.org domain and perhaps we could get him to point it to numpy.github.com > > It looks like numpy.org already redirects to numpy.scipy.org. So I > think redirecting numpy.scipy.org to github should "do the right > thing" I can do this - can I assume there is consensus that majority wants this done? 
Thank you, Ognen From fperez.net at gmail.com Wed Feb 15 08:30:01 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 15 Feb 2012 05:30:01 -0800 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: On Wed, Feb 15, 2012 at 5:18 AM, Ognen Duzlevski wrote: >> It looks like numpy.org already redirects to numpy.scipy.org. So I >> think redirecting numpy.scipy.org to github should "do the right >> thing" > > I can do this - can I assume there is consensus that majority wants this done? +1, and thanks to Scott for pushing on this front! Cheers, f From scott.sinclair.za at gmail.com Wed Feb 15 08:42:45 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Wed, 15 Feb 2012 15:42:45 +0200 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: On 15 February 2012 15:30, Fernando Perez wrote: > On Wed, Feb 15, 2012 at 5:18 AM, Ognen Duzlevski wrote: >>> It looks like numpy.org already redirects to numpy.scipy.org. So I >>> think redirecting numpy.scipy.org to github should "do the right >>> thing" >> >> I can do this - can I assume there is consensus that majority wants this done? > > +1, and thanks to Scott for pushing on this front! Thanks Ognen. I think you can assume that there's consensus after a few +1's from core developers... Cheers, Scott From shish at keba.be Wed Feb 15 08:46:01 2012 From: shish at keba.be (Olivier Delalleau) Date: Wed, 15 Feb 2012 08:46:01 -0500 Subject: [Numpy-discussion] Numpy 1.6.1 installation problem In-Reply-To: <4F3BA528.8080909@smhi.se> References: <4F3A1F05.10609@smhi.se> <4F3A8246.40307@gmail.com> <4F3BA528.8080909@smhi.se> Message-ID: Le 15 f?vrier 2012 07:29, Martin Raspaud a ?crit : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 14/02/12 16:48, Bruce Southey wrote: > > On 02/14/2012 09:40 AM, Olivier Delalleau wrote: > >> Really not an expert here, but it looks like it's trying various > >> compilation options, some work and some don't, and for some reason > >> it's really unhappy about the one where it can't find Python.h. > >> Maybe add /usr/include/python2.6 to your CPATH, see if that helps (and > >> make sure permissions are correctly set on this directory)? However, > >> it may very well be something else.... > > > This there a reason why you are using the fcompiler option? > > If not just try the basic approach: > > > $ python setup.py build > > > Hi guys, > > Thanks for the help. I'm getting past this error thanks to the CPATH > environment variable. > > Unfortunately I get another error later on that the compiler can't find > some C files... > > I attach the error. > > Best regards, > Martin > > [...] 
> creating build/temp.linux-x86_64-2.6/numpy/core/src/multiarray > compile options: '-Inumpy/core/include > - -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy > - -I/usr/lib64/python2.6/site-packages/numpy/core/include > - -I/usr/include/python2.6 -Inumpy/core/src/private -Inumpy/core/src > - -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > - -Inumpy/core/src/umath -Inumpy/core/include -c' > gcc: numpy/core/src/multiarray/multiarraymodule_onefile.c > numpy/core/src/multiarray/multiarraymodule_onefile.c:10:25: error: > scalartypes.c: No such file or directory > numpy/core/src/multiarray/multiarraymodule_onefile.c:14:24: error: > arraytypes.c: No such file or directory > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:35: > numpy/core/src/multiarray/conversion_utils.c: In function > 'PyArray_PyIntAsInt': > numpy/core/src/multiarray/conversion_utils.c:378: error: 'INT_Descr' > undeclared (first use in this function) > numpy/core/src/multiarray/conversion_utils.c:378: error: (Each > undeclared identifier is reported only once > numpy/core/src/multiarray/conversion_utils.c:378: error: for each > function it appears in.) > numpy/core/src/multiarray/conversion_utils.c: In function > 'PyArray_PyIntAsIntp': > numpy/core/src/multiarray/conversion_utils.c:467: error: 'LONG_Descr' > undeclared (first use in this function) > numpy/core/src/multiarray/multiarraymodule_onefile.c:38:20: error: > nditer.c: No such file or directory > numpy/core/src/multiarray/multiarraymodule_onefile.c:40:36: error: > lowlevel_strided_loops.c: No such file or directory > numpy/core/src/multiarray/multiarraymodule_onefile.c:42:20: error: > einsum.c: No such file or directory > numpy/core/src/multiarray/multiarraymodule_onefile.c:10:25: error: > scalartypes.c: No such file or directory > numpy/core/src/multiarray/multiarraymodule_onefile.c:14:24: error: > arraytypes.c: No such file or directory > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:35: > numpy/core/src/multiarray/conversion_utils.c: In function > 'PyArray_PyIntAsInt': > numpy/core/src/multiarray/conversion_utils.c:378: error: 'INT_Descr' > undeclared (first use in this function) > numpy/core/src/multiarray/conversion_utils.c:378: error: (Each > undeclared identifier is reported only once > numpy/core/src/multiarray/conversion_utils.c:378: error: for each > function it appears in.) 
> numpy/core/src/multiarray/conversion_utils.c: In function > 'PyArray_PyIntAsIntp': > numpy/core/src/multiarray/conversion_utils.c:467: error: 'LONG_Descr' > undeclared (first use in this function) > numpy/core/src/multiarray/multiarraymodule_onefile.c:38:20: error: > nditer.c: No such file or directory > numpy/core/src/multiarray/multiarraymodule_onefile.c:40:36: error: > lowlevel_strided_loops.c: No such file or directory > numpy/core/src/multiarray/multiarraymodule_onefile.c:42:20: error: > einsum.c: No such file or directory > error: Command "gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall > - -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > - --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC > - -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions > - -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic > - -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Inumpy/core/include > - -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy > - -I/usr/lib64/python2.6/site-packages/numpy/core/include > - -I/usr/include/python2.6 -Inumpy/core/src/private -Inumpy/core/src > - -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > - -Inumpy/core/src/umath -Inumpy/core/include -c > numpy/core/src/multiarray/multiarraymodule_onefile.c -o > > build/temp.linux-x86_64-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o" > failed with exit status 1 > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v2.0.14 (GNU/Linux) > Comment: Using GnuPG with Red Hat - http://enigmail.mozdev.org/ > > iQEcBAEBAgAGBQJPO6UoAAoJEBdvyODiyJI4fwQH+wb3n69uJnSKN8upAkIIBykZ > W1vUuvqz1M8Dzm3/drkn01NtDC5eCW2D2ncO97J/7HW4HB4NRDxfB3F/p+9or6Vk > 4qv/HGeCTNXaGf5GKidfGXyo2CzWRj/uK00lCC8roUfrQo5+cgL/7hXYE4Z59QK/ > S4NXrv7FU9pBysbpMJJ9O47yPfh2Z3qKKItvhiZM5jr5K91pMeDtIAB5HsJDV/4t > 1gEGlZh/DCVZwQ8yWc8iDnk1uWVK5E+cGvwxBw9xA8TNTNxvqtSRaKlQhMhzyaZl > vpd9vYiXijbR7iHjOhoriw/suI7RyX/oQl9AjvM+AQ4NoB/22xJqbqlXDSNcWXE= > =PI/y > -----END PGP SIGNATURE----- > Hmm... since it tells you the gcc command line that fails, I'd suggest to copy / paste it in a shell prompt, run it, analyze the error, and figure out how to fix the command line. It looks suspicious though that you're having so much trouble... Have you tried without the fcompiler option like Bruce suggested? -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Wed Feb 15 08:51:15 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 15 Feb 2012 08:51:15 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: Message-ID: <4F3BB853.9090100@gmail.com> On 2/14/2012 10:07 PM, Bruce Southey wrote: > The one thing that gets over looked here is that there is a huge > diversity of users with very different skill levels. But very few > people have an understanding of the core code. (In fact the other > thread about type-casting suggests that it is extremely few people.) > So in all of this, I do not yet see 'community'. As an active user and long-time list member who has never even looked at the core code, I perhaps presumptuously urge a moderation of rhetoric. I object to the idea that users like myself do not form part of the "community". This list has 1400 subscribers, and the fact that most of us are quiet most of the time does not mean we are not interested or attentive to the discussions, including discussions of governance. It looks to me like this will be great for NumPy. 
People who would otherwise not be able to spend much time on NumPy will be spending a lot of time improving the code and adding features. In my view, this will help NumPy advance which will enlarge the user community, which will slowly but inevitably enlarge the contributor community. I'm pretty excited about Travis's bold efforts to find ways to allow him and others to spend more time on NumPy. I wish him the best of luck. Cheers, Alan Isaac From ognen at enthought.com Wed Feb 15 08:59:55 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Wed, 15 Feb 2012 07:59:55 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: On Wed, Feb 15, 2012 at 7:42 AM, Scott Sinclair wrote: > On 15 February 2012 15:30, Fernando Perez wrote: >> On Wed, Feb 15, 2012 at 5:18 AM, Ognen Duzlevski wrote: >>>> It looks like numpy.org already redirects to numpy.scipy.org. So I >>>> think redirecting numpy.scipy.org to github should "do the right >>>> thing" >>> >>> I can do this - can I assume there is consensus that majority wants this done? >> >> +1, and thanks to Scott for pushing on this front! > > Thanks Ognen. I think you can assume that there's consensus after a > few +1's from core developers... Alright, it will happen sometime today and I will post a message announcing so. Ognen From pav at iki.fi Wed Feb 15 09:07:30 2012 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 15 Feb 2012 15:07:30 +0100 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: Hi, 15.02.2012 14:59, Ognen Duzlevski kirjoitti: [clip] > Alright, it will happen sometime today and I will post a message announcing so. > Ognen Great! Once you have changed the records, we can adjust [1] the numpy.github.com page [1] to deal with the new virtual host name. Thanks, Pauli [1] http://pages.github.com/#custom_domains From ben.root at ou.edu Wed Feb 15 09:29:28 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 08:29:28 -0600 Subject: [Numpy-discussion] Implicit conversion of python datetime to numpy datetime64? In-Reply-To: References: Message-ID: On Tuesday, February 14, 2012, Mark Wiebe wrote: > On Tue, Feb 14, 2012 at 9:37 PM, Benjamin Root wrote: > > On Tuesday, February 14, 2012, Mark Wiebe wrote: >> On Tue, Feb 14, 2012 at 8:17 PM, Benjamin Root wrote: >>> >>> Just a thought I had. Right now, I can pass a list of python ints or floats into np.array() and get a numpy array with a sensible dtype. Is there any reason why we can't do the same for python's datetime? Right now, it is very easy for me to make a list comprehension of datetime objects using strptime(), but it is very awkward to make a numpy array out of it. >> >> I would consider this a bug, it's not behaving sensibly at present. Here's what it does for me: >> >> In [20]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", >> >> ...: "07/22/98", "12/12/12"]], dtype="M8") > > Well, I guess it would be nice if I didn't even have to provide the dtype (I.e., inferred from the datetime type, since we aren't talking about strings). But I hadn't noticed the above, I was just making object arrays. 
> >> >> --------------------------------------------------------------------------- >> >> TypeError Traceback (most recent call last) >> >> C:\Python27\Scripts\ in () >> >> 1 np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", >> >> ----> 2 "07/22/98", "12/12/12"]], dtype="M8") >> >> TypeError: Cannot cast datetime.datetime object from metadata [us] to [D] according to the rule 'same_kind' >> >> In [21]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", >> >> ...: "07/22/98", "12/12/12"]], dtype="M8[us]") >> >> Out[21]: >> >> array(['2012-02-02T16:00:00.000000-0800', >> >> '1998-07-21T17:00:00.000000-0700', '2012-12-11T16:00:00.000000-0800'], dtype='datetime64[us]') >> >> In [22]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in ["02/03/12", >> >> ...: "07/22/98", "12/12/12"]], dtype="M8[us]").astype("M8[D]") >> >> Out[22]: array(['2012-02-03', '1998-07-22', '2012-12-12'], dtype='datetime64[D]') >>> >>> The only barrier I can think of are those who have already built code around a object dtype array of datetime objects. >>> >>> Thoughts? >>> Ben Root >>> >>> P.S. - what ever happened to arange() and linspace() for datetime64? >> >> arange definitely works: >> In[28] np.arange('2011-03-02', '2011-04-01', dtype='M8') >> Out[28]: >> array(['2011-03-02', '2011-03-03', '2011-03-04', '2011-03-05', >> '2011-03-06', '2011-03-07', '2011-03-08', '2011-03-09', >> '2011-03-10', '2011-03-11', '2011-03-12', '2011-03-13', >> '2011-03-14', '2011-03-15', '2011-03-16', '2011-03-17', >> '2011-03-18', '2011-03-19', '2011-03-20', '2011-03-21', >> '2011-03-22', '2011-03-23', '2011-03-24', '2011-03-25', >> '2011-03-26', '2011-03-27', '2011-03-28', '2011-03-29', >> '2011-03-30', '2011-03-31'], dtype='datetime64[D]') >> I didn't get to implementing linspace. I did look at it, but the current code didn't make it a trivial thing to put in. >> -Mark > > Sorry, I wasn't clear about arange, I meant that it would be nice if it could take python datetimes as arguments (and timedelat for the step?) because that is much more intuitive than remembering the exact dtype code and string format. > > I see it as the numpy datetime64 type could take three types for it's constructor: another datetime64, python datetime, and The standard unambiguous datetime string. I should be able to use these interchangeably in numpy. The same would be true for timedelta64. > > Easy interchange between pyth > > Ben Walsh actually implemented this and the code is in a pull request here: > https://github.com/numpy/numpy/pull/111 > This didn't go in, because the datetime properties don't exist on the arrays after you convert them to datetime64, so there could be some unintuitive consequences from that. When Martin implemented the quaternion dtype, we discussed the possibility that dtypes could expose properties that show up on the array object, and if this were implemented I think the conversion and compatibility between python datetime and datetime64 could be made quite natural. > -Mark > Actually, at first glance, I don't see why this shouldn't go ahead as-is. If I know I am getting datetime64, then I should expect to lose the features of the datetime object, right. Sure, it would be nice if it kept those attributes, but keeping them would provide an inconsistent interface in the case of a numpy array created from datetime objects and one created from datetime64 objects (unless I misunderstood) I will read through the pull request more closely and comment further. 
Ben Root From elcortogm at googlemail.com Wed Feb 15 10:04:48 2012 From: elcortogm at googlemail.com (Steve Schmerler) Date: Wed, 15 Feb 2012 16:04:48 +0100 Subject: [Numpy-discussion] repeat array along new axis without making a copy In-Reply-To: References: <20120215082516.GA2467@ramrod.starsherrifs.de> Message-ID: <20120215150448.GF1193@cartman.physik.tu-freiberg.de> On Feb 15 06:25 -0600, Warren Weckesser wrote: > Yes, such an array can be created using the as_strided() function from the > module numpy.lib.stride_tricks: Thank you, I will look into that. best, Steve From scipy at samueljohn.de Wed Feb 15 10:16:13 2012 From: scipy at samueljohn.de (Samuel John) Date: Wed, 15 Feb 2012 16:16:13 +0100 Subject: [Numpy-discussion] repeat array along new axis without making a copy References: <9C73F0AB-E7AA-4AB4-B441-7ADE6FDDFB19@samueljohn.de> Message-ID: Wow, I wasn't aware of that even if I work with numpy for years now. NumPy is amazing. Samuel From charlesr.harris at gmail.com Wed Feb 15 10:17:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Feb 2012 08:17:49 -0700 Subject: [Numpy-discussion] Release management (was: Updated differences between 1.5.1 to 1.6.1) In-Reply-To: References: Message-ID: On Tue, Feb 14, 2012 at 1:24 PM, Ralf Gommers wrote: > > > On Tue, Feb 14, 2012 at 7:25 PM, Travis Oliphant wrote: > >> >> On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote: >> >> > Hi Travis, >> > >> > It is great that some resources can be spent to have people paid to >> > work on NumPy. Thank you for making that happen. >> > >> > I am slightly confused about roadmaps for numpy 1.8 and 2.0. This >> > needs discussion on the ML, and our release manager currently is Ralf >> > - he is the one who ultimately decides what goes when. >> >> Thank you for reminding me of this. Ralf and I spoke several days ago, >> and have been working on how to give him more time to spend on SciPy >> full-time. > > > Well, full-time is the job that I get paid for:) > > As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I >> will be the release manager again. Ralf will continue serving as release >> manager for SciPy. >> > > I had planned to bring this up only after the 1.7 release but yes, I would > like to push the balance of my open-source work a little from > release/maintenance work towards writing more new code. I've been doing > both NumPy and SciPy releases for about two years now, and it's time for me > to hand over the manager hat for one of those two. And my preference is to > keep on doing the SciPy releases rather than the NumPy ones. > > For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager. >> I only know that I won't be release manager past NumPy 1.X. >> > > Travis, it's very good to see that the release manager role can be filled > going forward (it's not the most popular job), but I think the way it > should work is that people volunteer for this role and then the community > agrees on giving a volunteer that role. > > I actually started contributing when David asked for someone to take over > from him in the above manner. Maybe someone else will step up now, giving > you or Mark more time to work on new NumPy features (which I'm pretty sure > you'd prefer). > > And you saved our ass. Numpy development would have ground to a stop without your great work. Thanks. Chuck 
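[Archive note: the as_strided() trick Warren Weckesser pointed to above, and that Robert Kern recommends wrapping in a utility function below, can be sketched roughly like this. The helper name is invented here; the zero stride means every row aliases the same memory, so treat the view as read-only, and a wrong shape/strides pair can crash the interpreter.]

import numpy as np
from numpy.lib.stride_tricks import as_strided

def repeat_view(a, n):
    # View `a` as if it were stacked n times along a new first axis,
    # without copying any data: the new axis simply gets stride 0.
    a = np.ascontiguousarray(a)
    return as_strided(a, shape=(n,) + a.shape, strides=(0,) + a.strides)

x = np.arange(3.0)
y = repeat_view(x, 4)   # behaves like shape (4, 3) but allocates nothing new
x[1] = 99.0             # the change is visible in every row of y
print(y)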
From robert.kern at gmail.com Wed Feb 15 10:23:53 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 15 Feb 2012 15:23:53 +0000 Subject: [Numpy-discussion] repeat array along new axis without making a copy In-Reply-To: References: <9C73F0AB-E7AA-4AB4-B441-7ADE6FDDFB19@samueljohn.de> Message-ID: On Wed, Feb 15, 2012 at 15:16, Samuel John wrote: > Wow, I wasn't aware of that even if I work with numpy for years now. > NumPy is amazing. It's deliberately unpublicized because you can cause segfaults if you get your math wrong. But once you get your math right and can wrap it up into a utility function, it works great. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ben.root at ou.edu Wed Feb 15 11:36:16 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 10:36:16 -0600 Subject: [Numpy-discussion] Implicit conversion of python datetime to numpy datetime64? In-Reply-To: References: Message-ID: On Wed, Feb 15, 2012 at 8:29 AM, Benjamin Root wrote: > > > On Tuesday, February 14, 2012, Mark Wiebe wrote: > > On Tue, Feb 14, 2012 at 9:37 PM, Benjamin Root wrote: > > > > On Tuesday, February 14, 2012, Mark Wiebe wrote: > >> On Tue, Feb 14, 2012 at 8:17 PM, Benjamin Root wrote: > >>> > >>> Just a thought I had. Right now, I can pass a list of python ints or > floats into np.array() and get a numpy array with a sensible dtype. Is > there any reason why we can't do the same for python's datetime? Right > now, it is very easy for me to make a list comprehension of datetime > objects using strptime(), but it is very awkward to make a numpy array out > of it. > >> > >> I would consider this a bug, it's not behaving sensibly at present. > Here's what it does for me: > >> > >> In [20]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for > date in ["02/03/12", > >> > >> ...: "07/22/98", "12/12/12"]], dtype="M8") > > > > Well, I guess it would be nice if I didn't even have to provide the > dtype (I.e., inferred from the datetime type, since we aren't talking about > strings). But I hadn't noticed the above, I was just making object arrays. > > > >> > >> > --------------------------------------------------------------------------- > >> > >> TypeError Traceback (most recent call last) > >> > >> C:\Python27\Scripts\ in () > >> > >> 1 np.array([datetime.datetime.strptime(date, "%m/%d/%y") for date in > ["02/03/12", > >> > >> ----> 2 "07/22/98", "12/12/12"]], dtype="M8") > >> > >> TypeError: Cannot cast datetime.datetime object from metadata [us] to > [D] according to the rule 'same_kind' > >> > >> In [21]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for > date in ["02/03/12", > >> > >> ...: "07/22/98", "12/12/12"]], dtype="M8[us]") > >> > >> Out[21]: > >> > >> array(['2012-02-02T16:00:00.000000-0800', > >> > >> '1998-07-21T17:00:00.000000-0700', '2012-12-11T16:00:00.000000-0800'], > dtype='datetime64[us]') > >> > >> In [22]: np.array([datetime.datetime.strptime(date, "%m/%d/%y") for > date in ["02/03/12", > >> > >> ...: "07/22/98", "12/12/12"]], dtype="M8[us]").astype("M8[D]") > >> > >> Out[22]: array(['2012-02-03', '1998-07-22', '2012-12-12'], > dtype='datetime64[D]') > >>> > >>> The only barrier I can think of are those who have already built code > around a object dtype array of datetime objects. > >>> > >>> Thoughts? > >>> Ben Root > >>> > >>> P.S. 
- what ever happened to arange() and linspace() for datetime64? > >> > >> arange definitely works: > >> In[28] np.arange('2011-03-02', '2011-04-01', dtype='M8') > >> Out[28]: > >> array(['2011-03-02', '2011-03-03', '2011-03-04', '2011-03-05', > >> '2011-03-06', '2011-03-07', '2011-03-08', '2011-03-09', > >> '2011-03-10', '2011-03-11', '2011-03-12', '2011-03-13', > >> '2011-03-14', '2011-03-15', '2011-03-16', '2011-03-17', > >> '2011-03-18', '2011-03-19', '2011-03-20', '2011-03-21', > >> '2011-03-22', '2011-03-23', '2011-03-24', '2011-03-25', > >> '2011-03-26', '2011-03-27', '2011-03-28', '2011-03-29', > >> '2011-03-30', '2011-03-31'], dtype='datetime64[D]') > >> I didn't get to implementing linspace. I did look at it, but the > current code didn't make it a trivial thing to put in. > >> -Mark > > > > Sorry, I wasn't clear about arange, I meant that it would be nice if it > could take python datetimes as arguments (and timedelat for the step?) > because that is much more intuitive than remembering the exact dtype code > and string format. > > > > I see it as the numpy datetime64 type could take three types for it's > constructor: another datetime64, python datetime, and The standard > unambiguous datetime string. I should be able to use these interchangeably > in numpy. The same would be true for timedelta64. > > > > Easy interchange between pyth > > > > Ben Walsh actually implemented this and the code is in a pull request > here: > > https://github.com/numpy/numpy/pull/111 > > This didn't go in, because the datetime properties don't exist on the > arrays after you convert them to datetime64, so there could be some > unintuitive consequences from that. When Martin implemented the quaternion > dtype, we discussed the possibility that dtypes could expose properties > that show up on the array object, and if this were implemented I think the > conversion and compatibility between python datetime and datetime64 could > be made quite natural. > > -Mark > > > > Actually, at first glance, I don't see why this shouldn't go ahead as-is. > If I know I am getting datetime64, then I should expect to lose the > features of the datetime object, right. Sure, it would be nice if it kept > those attributes, but keeping them would provide an inconsistent interface > in the case of a numpy array created from datetime objects and one created > from datetime64 objects (unless I misunderstood) > > I will read through the pull request more closely and comment further. > > Ben Root > Ok, I did some more testing between the master branch and the pull request. I suspect that something is interfering with the type conversion because walshb's branch pulled on top of the current master yields the same results as for the current master (see next). If passed a datetime, date, time or timedelta object ""without specifying the dtype"", you will get object arrays, which will, of course allow one to access attributes such as .year, .month, etc. >>> np.array([date(2000, 1, 1)]) array([2000-01-01], dtype=object) If passed a date object with dtype='M8', or a timedelta object with dtype='m8', you will get a datetime64 (or timedelta64): >>> np.array([date(2000, 1, 1)], dtype='M8') array(['2000-01-01'], dtype='datetime64[D]') >>> np.array([timedelta(0, 0, 0)], dtype='m8') array([0], dtype='timedelta64[us]') The exception noted before only happens when a datetime object is passed in. 
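[Archive note: pulling the working recipes out of the examples quoted above, datetime.datetime objects go in cleanly once the unit is spelled out, and datetime.date objects already work with a bare 'M8'. This only restates the thread's own examples; exact reprs and timezone handling differ between NumPy versions.]

import numpy as np
from datetime import datetime

stamps = [datetime(2012, 2, 3), datetime(1998, 7, 22), datetime(2012, 12, 12)]

# datetime.datetime carries microsecond precision, so name the unit
# explicitly and cast down to days afterwards if days are all you need.
a = np.array(stamps, dtype='M8[us]').astype('M8[D]')

# datetime.date objects already have day precision, so plain 'M8' is enough.
b = np.array([s.date() for s in stamps], dtype='M8')

print(a)   # ['2012-02-03' '1998-07-22' '2012-12-12'] as datetime64[D]
print(b)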
As an additional note, a time object passed in with dtype 'M8' will throw a ValueError because of the decision not to support times that are without dates. Personally, I wonder if this should instead be treated like a timedelta64 object, but I haven't thought through the consequences of that yet. I should also note a slight difference between the results from master and from v1.6.1. In v1.6.1, creating an array with datetime objects and dtype='M8' works: >>> np.array([datetime(2000, 1, 1)], dtype='M8') array([2000-01-01 00:00:00], dtype=datetime64[us]) and for passing in a date object, the dtype is named something slightly different (and the string repr is different): >>> np.array([date(2000, 1, 1)], dtype='M8') array([2000-01-01 00:00:00], dtype=datetime64[us]) The above has a dtype of 'datetime64[us]' instead of the current 'datetime64[D]', and it displays the time part, which is not currently done (but that is likely due to the '[D]' part of the datetime). So, where does that leave us? Well, I do agree that there is likely a problem with possible existing code that expects to create an object array. Maybe an implicit conversion should be held off until version 2.0? Until then, I would be happy with better documentation of the current abilities. The datetime64 page currently only shows how to make a datetime64 array using strings, implying that that is the only method. Maybe the top of that page should have a section showing how to create a datetime64 (and timedelta64) array using both string and datetime (timedelta) data sources. It should also mention the need for providing the dtype (and possibly noting that future releases may not have that requirement?). Cheers! Ben Root P.S. - the need for linspace has come up for me multiple times. I might try putting something together. -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Feb 15 12:29:23 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 15 Feb 2012 11:29:23 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> Message-ID: <6A57CCDE-A6EC-4D3F-A57C-34BC8D5296DC@continuum.io> It certainly would help people keep the NumPy web-site up to date. Thanks Ognen. -Travis On Feb 15, 2012, at 7:18 AM, Ognen Duzlevski wrote: > On Wed, Feb 15, 2012 at 5:13 AM, Scott Sinclair > wrote: >> On 8 February 2012 00:03, Travis Oliphant wrote: >>> >>> On Feb 7, 2012, at 4:02 AM, Pauli Virtanen wrote: >>> >>>> Hi, >>>> >>>> 06.02.2012 20:41, Ralf Gommers kirjoitti: >>>> [clip] >>>>> I've created https://github.com/scipy/scipy.github.com and gave you >>>>> permissions on that. So with that for the built html and >>>>> https://github.com/scipy/scipy.org-new for the sources, that should do it. >>>>> >>>>> On the numpy org I don't have the right permissions to do the same. >>>> >>>> Ditto for numpy.github.com, now. >>> >>> This is really nice. It will really help us make changes to the web-site quickly and synchronously with code changes. >>> >>> John Turner at ORNL has the numpy.org domain and perhaps we could get him to point it to numpy.github.com >> >> It looks like numpy.org already redirects to numpy.scipy.org. So I >> think redirecting numpy.scipy.org to github should "do the right >> thing" > > I can do this - can I assume there is consensus that majority wants this done? 
> > Thank you, > Ognen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Wed Feb 15 12:32:27 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 15 Feb 2012 11:32:27 -0600 Subject: [Numpy-discussion] Release management (was: Updated differences between 1.5.1 to 1.6.1) In-Reply-To: References: Message-ID: <37C69EE9-FEEE-4152-AF0E-7CD5D39C3831@continuum.io> On Feb 15, 2012, at 9:17 AM, Charles R Harris wrote: > > > On Tue, Feb 14, 2012 at 1:24 PM, Ralf Gommers wrote: > > > On Tue, Feb 14, 2012 at 7:25 PM, Travis Oliphant wrote: > > On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote: > > > Hi Travis, > > > > It is great that some resources can be spent to have people paid to > > work on NumPy. Thank you for making that happen. > > > > I am slightly confused about roadmaps for numpy 1.8 and 2.0. This > > needs discussion on the ML, and our release manager currently is Ralf > > - he is the one who ultimately decides what goes when. > > Thank you for reminding me of this. Ralf and I spoke several days ago, and have been working on how to give him more time to spend on SciPy full-time. > > Well, full-time is the job that I get paid for:) > > As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I will be the release manager again. Ralf will continue serving as release manager for SciPy. > > I had planned to bring this up only after the 1.7 release but yes, I would like to push the balance of my open-source work a little from release/maintenance work towards writing more new code. I've been doing both NumPy and SciPy releases for about two years now, and it's time for me to hand over the manager hat for one of those two. And my preference is to keep on doing the SciPy releases rather than the NumPy ones. > > For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager. I only know that I won't be release manager past NumPy 1.X. > > Travis, it's very good to see that the release manager role can be filled going forward (it's not the most popular job), but I think the way it should work is that people volunteer for this role and then the community agrees on giving a volunteer that role. > > I actually started contributing when David asked for someone to take over from him in the above manner. Maybe someone else will step up now, giving you or Mark more time to work on new NumPy features (which I'm pretty sure you'd prefer). > > > And you saved our ass. Numpy development would have ground to a stop without your great work. Thanks. > +10 -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Wed Feb 15 12:39:39 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 15 Feb 2012 18:39:39 +0100 Subject: [Numpy-discussion] David M. Cooke? Message-ID: Hi, I know this is a bit unusual, but in the last few years I completely lost the track of David M. Cooke. Most of the veterans on this list will remember him as being a great contributor to NumPy during the years 2004 to 2007. He was the creator of numexpr (back in 2006) too. I wonder if somebody knows about him. If so, please tell me. Thanks! 
-- Francesc Alted From ognen at enthought.com Wed Feb 15 13:27:18 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Wed, 15 Feb 2012 12:27:18 -0600 Subject: [Numpy-discussion] Download page still points to SVN In-Reply-To: <6A57CCDE-A6EC-4D3F-A57C-34BC8D5296DC@continuum.io> References: <031B94B2-5A18-4592-BBAC-56FDC169738A@continuum.io> <4F1955BC.4000207@gmail.com> <563BE2FC-8439-4418-AB74-64C83A5B1503@continuum.io> <6A57CCDE-A6EC-4D3F-A57C-34BC8D5296DC@continuum.io> Message-ID: OK, the deed has been done :) Ognen On Wed, Feb 15, 2012 at 11:29 AM, Travis Oliphant wrote: > It certainly would help people keep the NumPy web-site up to date. ? Thanks Ognen. > > -Travis > > On Feb 15, 2012, at 7:18 AM, Ognen Duzlevski wrote: > >> On Wed, Feb 15, 2012 at 5:13 AM, Scott Sinclair >> wrote: >>> On 8 February 2012 00:03, Travis Oliphant wrote: >>>> >>>> On Feb 7, 2012, at 4:02 AM, Pauli Virtanen wrote: >>>> >>>>> Hi, >>>>> >>>>> 06.02.2012 20:41, Ralf Gommers kirjoitti: >>>>> [clip] >>>>>> I've created https://github.com/scipy/scipy.github.com and gave you >>>>>> permissions on that. So with that for the built html and >>>>>> https://github.com/scipy/scipy.org-new for the sources, that should do it. >>>>>> >>>>>> On the numpy org I don't have the right permissions to do the same. >>>>> >>>>> Ditto for numpy.github.com, now. >>>> >>>> This is really nice. ? It will really help us make changes to the web-site quickly and synchronously with code changes. >>>> >>>> John Turner at ORNL has the numpy.org domain and perhaps we could get him to point it to numpy.github.com >>> >>> It looks like numpy.org already redirects to numpy.scipy.org. So I >>> think redirecting numpy.scipy.org to github should "do the right >>> thing" >> >> I can do this - can I assume there is consensus that majority wants this done? >> >> Thank you, >> Ognen >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Wed Feb 15 13:50:44 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 10:50:44 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3BB853.9090100@gmail.com> References: <4F3BB853.9090100@gmail.com> Message-ID: Hi, On Wed, Feb 15, 2012 at 5:51 AM, Alan G Isaac wrote: > On 2/14/2012 10:07 PM, Bruce Southey wrote: >> The one thing that gets over looked here is that there is a huge >> diversity of users with very different skill levels. But very few >> people have an understanding of the core code. (In fact the other >> thread about type-casting suggests that it is extremely few people.) >> So in all of this, I do not yet see 'community'. > > > As an active user and long-time list member > who has never even looked at the core code, > I perhaps presumptuously urge a moderation > of rhetoric. I object to the idea that users > like myself do not form part of the "community". > > This list has 1400 subscribers, and the fact that > most of us are quiet most of the time does not mean we > are not interested or attentive to the discussions, > including discussions of governance. > > It looks to me like this will be great for NumPy. 
> People who would otherwise not be able to spend much > time on NumPy will be spending a lot of time improving > the code and adding features. In my view, this will help > NumPy advance which will enlarge the user community, which will > slowly but inevitably enlarge the contributor community. > I'm pretty excited about Travis's bold efforts to find > ways to allow him and others to spend more time on NumPy. > I wish him the best of luck. I think it is important to stick to the thread topic here, which is 'Governance'. It's not about whether it is good or bad that Travis has re-engaged in Numpy and is funding development in Numpy through his company. I'm personally very glad to see Travis back on the list and engaged again, but that's really not what the thread is about. The thread is about whether we need explicit Numpy governance, especially in the situation where one new company will surely dominate numpy development in the short term at least. I would say - for the benefit of Continuum Analytics and for the Numpy community, there should be explicit governance, that takes this relationship into account. I believe that leaving the governance informal and underspecified at this stage would be a grave mistake, for everyone concerned. Best, Matthew From souheil.inati at nih.gov Wed Feb 15 14:23:40 2012 From: souheil.inati at nih.gov (Inati, Souheil (NIH/NIMH) [E]) Date: Wed, 15 Feb 2012 14:23:40 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com>, Message-ID: <6C69E7675B43074B8AD0C1106661B9B1036CCC836E@NIHMLBX10.nih.gov> Hello, ________________________________________ From: Matthew Brett [matthew.brett at gmail.com] Sent: Wednesday, February 15, 2012 1:50 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Numpy governance update Hi, On Wed, Feb 15, 2012 at 5:51 AM, Alan G Isaac wrote: > On 2/14/2012 10:07 PM, Bruce Southey wrote: >> The one thing that gets over looked here is that there is a huge >> diversity of users with very different skill levels. But very few >> people have an understanding of the core code. (In fact the other >> thread about type-casting suggests that it is extremely few people.) >> So in all of this, I do not yet see 'community'. > > > As an active user and long-time list member > who has never even looked at the core code, > I perhaps presumptuously urge a moderation > of rhetoric. I object to the idea that users > like myself do not form part of the "community". > > This list has 1400 subscribers, and the fact that > most of us are quiet most of the time does not mean we > are not interested or attentive to the discussions, > including discussions of governance. > > It looks to me like this will be great for NumPy. > People who would otherwise not be able to spend much > time on NumPy will be spending a lot of time improving > the code and adding features. In my view, this will help > NumPy advance which will enlarge the user community, which will > slowly but inevitably enlarge the contributor community. > I'm pretty excited about Travis's bold efforts to find > ways to allow him and others to spend more time on NumPy. > I wish him the best of luck. I think it is important to stick to the thread topic here, which is 'Governance'. It's not about whether it is good or bad that Travis has re-engaged in Numpy and is funding development in Numpy through his company. I'm personally very glad to see Travis back on the list and engaged again, but that's really not what the thread is about. 
The thread is about whether we need explicit Numpy governance, especially in the situation where one new company will surely dominate numpy development in the short term at least. I would say - for the benefit of Continuum Analytics and for the Numpy community, there should be explicit governance, that takes this relationship into account. I believe that leaving the governance informal and underspecified at this stage would be a grave mistake, for everyone concerned. Best, Matthew ___________________________ As another of the "silent" users of numpy, I agree with Matthew 100%. As great and trustworthy as Travis is, there is a very real potential for conflict of interest here. He is going to be leading an organization to raise and distribute funding and at the same time leading a commercial for profit enterprise that would apply to this foundation for funds, as well as being a major player in the direction of the open source project that his company is building on. This is not in and of itself a problem, but the boundaries have to be very clear and layed out in advance. Which hat is he wearing when he recommends one course of action over another? I understand the company is just getting off the ground and that the foundation is even less well formed, but numpy is a mature code with lots of users. It's governance structure should reflect this. Continued thanks for all of the hard work. -Souheil Inati From alan.isaac at gmail.com Wed Feb 15 14:32:54 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 15 Feb 2012 14:32:54 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> Message-ID: <4F3C0866.3000703@gmail.com> On 2/15/2012 1:50 PM, Matthew Brett wrote: > I believe that leaving the governance informal and underspecified at > this stage would be a grave mistake, for everyone concerned. To justify that concern, can you point to an analogous case, where things went awry by not formalizing the governance structure? Can you provide an example where a more formal governance structure for NumPy would have meant more or better code development? (Please do not suggest the NA discussion!) Can you provide an example of what you might envision as a "more formal governance structure"? (I assume that any such structure will not put people who are not core contributors to NumPy in a position to tell core contributors what to spend their time on.) Early last December, Chuck Harris estimated that three people were active NumPy developers. I liked the idea of creating a "board" of these 3 and a rule that says any active developer can request to join the board, that additions are determined by majority vote of the existing board, and that having the board both small and odd numbered is a priority. I also suggested inviting to this board a developer or two from important projects that are very NumPy dependent (e.g., Matplotlib). I still like this idea. Would it fully satisfy you? Still, honestly, I have trouble seeing how implementing this idea would currently have much affect on the substance or extent or direction of the conversations about NumPy. 
Thanks, Alan PS Just to jog the group memory, Travis announced more than four months ago *on this list* that he had "been approached about the possibility of creating a foundation to support the development of SciPy and NumPy", and had become interested in creating a "Foundation for the Advancement of Scientific, Technical, and Engineering Computing Using High Level Abstractions (FASTECUHLA)", and had created an open discussion list for this at fastecuhla at googlegroups.com From efiring at hawaii.edu Wed Feb 15 14:33:58 2012 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 15 Feb 2012 09:33:58 -1000 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> Message-ID: <4F3C08A6.5040807@hawaii.edu> On 02/15/2012 08:50 AM, Matthew Brett wrote: > Hi, > > On Wed, Feb 15, 2012 at 5:51 AM, Alan G Isaac wrote: >> On 2/14/2012 10:07 PM, Bruce Southey wrote: >>> The one thing that gets over looked here is that there is a huge >>> diversity of users with very different skill levels. But very few >>> people have an understanding of the core code. (In fact the other >>> thread about type-casting suggests that it is extremely few people.) >>> So in all of this, I do not yet see 'community'. >> >> >> As an active user and long-time list member >> who has never even looked at the core code, >> I perhaps presumptuously urge a moderation >> of rhetoric. I object to the idea that users >> like myself do not form part of the "community". >> >> This list has 1400 subscribers, and the fact that >> most of us are quiet most of the time does not mean we >> are not interested or attentive to the discussions, >> including discussions of governance. >> >> It looks to me like this will be great for NumPy. >> People who would otherwise not be able to spend much >> time on NumPy will be spending a lot of time improving >> the code and adding features. In my view, this will help >> NumPy advance which will enlarge the user community, which will >> slowly but inevitably enlarge the contributor community. >> I'm pretty excited about Travis's bold efforts to find >> ways to allow him and others to spend more time on NumPy. >> I wish him the best of luck. > > I think it is important to stick to the thread topic here, which is > 'Governance'. Do you have in mind a model of how this might work? (I suspect you have already answered a question like that in some earlier thread; sorry.) A comparable project that is doing it right? "Governance" implies enforcement power, doesn't it? Where, how, and by whom would the power be exercised? > > It's not about whether it is good or bad that Travis has re-engaged in > Numpy and is funding development in Numpy through his company. I'm > personally very glad to see Travis back on the list and engaged again, > but that's really not what the thread is about. > > The thread is about whether we need explicit Numpy governance, > especially in the situation where one new company will surely dominate > numpy development in the short term at least. > > I would say - for the benefit of Continuum Analytics and for the Numpy > community, there should be explicit governance, that takes this > relationship into account. Please elaborate; are you saying that Continuum Analytics must develop numpy as decided by some outside body? Eric > > I believe that leaving the governance informal and underspecified at > this stage would be a grave mistake, for everyone concerned. 
> > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Wed Feb 15 14:46:36 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 13:46:36 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C0866.3000703@gmail.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac wrote: > On 2/15/2012 1:50 PM, Matthew Brett wrote: > > I believe that leaving the governance informal and underspecified at > > this stage would be a grave mistake, for everyone concerned. > > To justify that concern, can you point to an > analogous case, where things went awry by not > formalizing the governance structure? > > Can you provide an example where a more formal > governance structure for NumPy would have meant > more or better code development? (Please do not > suggest the NA discussion!) > > Why not the NA discussion? Would we really want to have that happen again? Note that it still isn't fully resolved and progress still needs to be made (I think the last thread did an excellent job of fleshing out the ideas, but it became too much to digest. We may need to have someone go through the information, reduce it down and make one last push to bring it to a conclusion). The NA discussion is the perfect example where a governance structure would help resolve disputes. > Can you provide an example of what you might > envision as a "more formal governance structure"? > (I assume that any such structure will not put people > who are not core contributors to NumPy in a position > to tell core contributors what to spend their time on.) > > Early last December, Chuck Harris estimated that three > people were active NumPy developers. I liked the idea of > creating a "board" of these 3 and a rule that says any > active developer can request to join the board, that > additions are determined by majority vote of the existing > board, and that having the board both small and odd > numbered is a priority. I also suggested inviting to this > board a developer or two from important projects that are > very NumPy dependent (e.g., Matplotlib). > > I still like this idea. Would it fully satisfy you? > > I actually like that idea. Matthew, is this along the lines of what you were thinking? > Still, honestly, I have trouble seeing how implementing this > idea would currently have much affect on the substance or > extent or direction of the conversations about NumPy. > > Personally, I see it more for creating official long-term goal-posts, and for resolving disputes. Not every change needs to go through this board, that would be overkill. But maybe the board can have a process for dealing with NEPs and RFCs. I can envision having the representatives of other projects (such as SciPy and matplotlib) can file official comments on any NEPs regarding possible impacts and usefulness. 
> Thanks, > Alan > > PS Just to jog the group memory, Travis announced more than > four months ago *on this list* that he had "been approached > about the possibility of creating a foundation to support > the development of SciPy and NumPy", and had become > interested in creating a "Foundation for the Advancement of > Scientific, Technical, and Engineering Computing Using High > Level Abstractions (FASTECUHLA)", and had created an open > discussion list for this at fastecuhla at googlegroups.com > > I don't think that is in dispute. Personally, I think it would have been nice to get occasional status updates along the way, "keeping us in the loop". This didn't happen, but I am not going to complain too much on this. The group is still a work in progress and I think it is only fair that the group occasionally pings this mailing-list for important progress reports. My two cents, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Wed Feb 15 15:00:58 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 15 Feb 2012 15:00:58 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: <4F3C0EFA.1000701@gmail.com> On 2/15/2012 2:46 PM, Benjamin Root wrote: > I think it is only fair that the group occasionally pings this mailing-list for important progress reports. No offense intended, but that sounds like an unfunded mandate. More useful would be an offer to liaison between the two. Cheers, Alan From matthew.brett at gmail.com Wed Feb 15 15:01:50 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 12:01:50 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C08A6.5040807@hawaii.edu> References: <4F3BB853.9090100@gmail.com> <4F3C08A6.5040807@hawaii.edu> Message-ID: Hi, Thanks for these interesting and specific questions. On Wed, Feb 15, 2012 at 11:33 AM, Eric Firing wrote: > On 02/15/2012 08:50 AM, Matthew Brett wrote: >> Hi, >> >> On Wed, Feb 15, 2012 at 5:51 AM, Alan G Isaac ?wrote: >>> On 2/14/2012 10:07 PM, Bruce Southey wrote: >>>> The one thing that gets over looked here is that there is a huge >>>> diversity of users with very different skill levels. But very few >>>> people have an understanding of the core code. (In fact the other >>>> thread about type-casting suggests that it is extremely few people.) >>>> So in all of this, I do not yet see 'community'. >>> >>> >>> As an active user and long-time list member >>> who has never even looked at the core code, >>> I perhaps presumptuously urge a moderation >>> of rhetoric. I object to the idea that users >>> like myself do not form part of the "community". >>> >>> This list has 1400 subscribers, and the fact that >>> most of us are quiet most of the time does not mean we >>> are not interested or attentive to the discussions, >>> including discussions of governance. >>> >>> It looks to me like this will be great for NumPy. >>> People who would otherwise not be able to spend much >>> time on NumPy will be spending a lot of time improving >>> the code and adding features. In my view, this will help >>> NumPy advance which will enlarge the user community, which will >>> slowly but inevitably enlarge the contributor community. >>> I'm pretty excited about Travis's bold efforts to find >>> ways to allow him and others to spend more time on NumPy. >>> I wish him the best of luck. 
>> >> I think it is important to stick to the thread topic here, which is >> 'Governance'. > > Do you have in mind a model of how this might work? ?(I suspect you have > already answered a question like that in some earlier thread; sorry.) ?A > comparable project that is doing it right? The example that had come up previously was the book by Karl Fogel: http://producingoss.com/en/social-infrastructure.html http://producingoss.com/en/consensus-democracy.html In particular, the section "When Consensus Cannot Be Reached, Vote" in the second page. Here's an example of a voting policy: http://www.apache.org/foundation/voting.html Debian is a famous example: http://www.debian.org/devel/constitution Obviously some open-source projects do not have much of a formal governance structure, but I think in our case a) we have already run into problems with big decisions and b) we have now reached a situation where there is serious potential for actual or perceived problems with conflicts of interest. > "Governance" implies enforcement power, doesn't it? ?Where, how, and by > whom would the power be exercised? The governance that I had in mind is more to do with review and constraint of power. Thus, I believe we need a set of rules to govern how we deal with serious disputes, such as the masked array NA debate, or, previously the ABI breakage discussion at numpy 1.5.0. To go to a specific use-case. Let us imagine that Continuum think of an excellent feature they want in Numpy but that many others think would make the underlying array object too complicated. How would the desires of Continuum be weighed against the desires of other members of the community? >> It's not about whether it is good or bad that Travis has re-engaged in >> Numpy and is funding development in Numpy through his company. ? I'm >> personally very glad to see Travis back on the list and engaged again, >> but that's really not what the thread is about. >> >> The thread is about whether we need explicit Numpy governance, >> especially in the situation where one new company will surely dominate >> numpy development in the short term at least. >> >> I would say - for the benefit of Continuum Analytics and for the Numpy >> community, there should be explicit governance, that takes this >> relationship into account. > > Please elaborate; are you saying that Continuum Analytics must develop > numpy as decided by some outside body? No - of course not. Here's the discussion from Karl Fogel's book: http://producingoss.com/en/contracting.html I'm proposing Governance not as some council that contracts work, but as a committee set up with formal rules that can resolve disputes and rule changes as they arise. This committee needs to be able to do this to make sure that the interests of the community (developers of numpy outside Continuum) are being represented. Best, Matthew From alan.isaac at gmail.com Wed Feb 15 15:03:02 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 15 Feb 2012 15:03:02 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: <4F3C0F76.8070700@gmail.com> On 2/15/2012 2:46 PM, Benjamin Root wrote: > The NA discussion is the perfect example where a governance structure would help resolve disputes. How? I'm not seeing it. Who would have behaved differently and why? 
Alan From matthew.brett at gmail.com Wed Feb 15 15:09:48 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 12:09:48 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: Hi, On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root wrote: > > > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac wrote: >> Can you provide an example where a more formal >> governance structure for NumPy would have meant >> more or better code development? (Please do not >> suggest the NA discussion!) >> > > Why not the NA discussion?? Would we really want to have that happen again? > Note that it still isn't fully resolved and progress still needs to be made > (I think the last thread did an excellent job of fleshing out the ideas, but > it became too much to digest.? We may need to have someone go through the > information, reduce it down and make one last push to bring it to a > conclusion).? The NA discussion is the perfect example where a governance > structure would help resolve disputes. Yes, that was the most obvious example. I don't know about you, but I can't see any sign of that one being resolved. The other obvious example was the dispute about ABI breakage for numpy 1.5.0 where I believe Travis did invoke some sort of committee to vote, but (Travis can correct me if I'm wrong), the committee was named ad-hoc and contacted off-list. > >> >> Can you provide an example of what you might >> envision as a "more formal governance structure"? >> (I assume that any such structure will not put people >> who are not core contributors to NumPy in a position >> to tell core contributors what to spend their time on.) >> >> Early last December, Chuck Harris estimated that three >> people were active NumPy developers. ?I liked the idea of >> creating a "board" of these 3 and a rule that says any >> active developer can request to join the board, that >> additions are determined by majority vote of the existing >> board, and ?that having the board both small and odd >> numbered is a priority. ?I also suggested inviting to this >> board a developer or two from important projects that are >> very NumPy dependent (e.g., Matplotlib). >> >> I still like this idea. ?Would it fully satisfy you? >> > > I actually like that idea.? Matthew, is this along the lines of what you > were thinking? Honestly it would make me very happy if the discussion moved to what form the governance should take. I would have thought that 3 was too small a number. We should look at what other projects do. I think that this committee needs to be people who know numpy code; projects using numpy could advise, but people developing numpy should vote I think. There should be rules of engagement, a constitution, especially how to deal with disputes with Continuum or other contracting organizations. 
I would personally very much like to see a committment to consensus, where possible on these lines (as noted previously by Nathaniel): http://producingoss.com/en/consensus-democracy.html Best, Matthew From ben.root at ou.edu Wed Feb 15 15:12:01 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 14:12:01 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C0F76.8070700@gmail.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C0F76.8070700@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 2:03 PM, Alan G Isaac wrote: > On 2/15/2012 2:46 PM, Benjamin Root wrote: > > The NA discussion is the perfect example where a governance structure > would help resolve disputes. > > > How? I'm not seeing it. > Who would have behaved differently and why? > > Alan > > I am pretty sure it was Matthew (but I could be wrong here) on numerous occasions would have stated that if there was some sort of process that was agreed upon beforehand, that he would have abided by the decision/outcome of that process. As much as I disagreed with Matthew and others on the design of NA (and the amount of review that went into the branch getting merged), I do agree that a formal process for handling grievances would have helped mitigate much of the problems in the discussions. Further, a more fleshed out review process for major changes would have aired out more of the design decisions (somewhat). Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 15 15:22:02 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 12:22:02 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C0F76.8070700@gmail.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C0F76.8070700@gmail.com> Message-ID: Hi, On Wed, Feb 15, 2012 at 12:03 PM, Alan G Isaac wrote: > On 2/15/2012 2:46 PM, Benjamin Root wrote: >> The NA discussion is the perfect example where a governance structure would help resolve disputes. > > > How? I'm not seeing it. > Who would have behaved differently and why? Let's say that we had a formal commitment to consensus (where possible): http://producingoss.com/en/consensus-democracy.html I believe that would have fundamentally changed the discussion, and would have led to a better result. Next, imagine (sorry, I'm replying partly to Ben here) that there was no such commitment, but there was a group of people who know the code, who's job it was to review both sides of a dispute, summarize their understanding, and then vote. I believe that all of us, whatever the result, would have accepted that that was the procedure, and the discussion would have been able to move on. However, what did happen was it seems to me characteristic of the situation where it is unclear how decisions get made, which is that it becomes possible to force outcomes (there's no-one to stop that). That's why we have governments, to reduce the arbitrary use of power. It believe it makes society more efficient when we do that. But in any case, the situation has changed. Now the economic interests of the main numpy developers can come into conflict with the needs of the community. I'm not saying they will, I'm saying they can. In that situation, it seems to me that it is of obvious and overriding importance to specify how decisions are made. 
Best, Matthew From alan.isaac at gmail.com Wed Feb 15 15:45:38 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 15 Feb 2012 15:45:38 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C0F76.8070700@gmail.com> Message-ID: <4F3C1972.3000801@gmail.com> My analysis is fundamentally different than Matthew and Benjamin's for a few reasons. 1. The problem has been miscast. The "economic interests" of the developers *always* has had an apparent conflict with the economic interests of the users: users want developers to work more on the code, and developers need to make a living, which often involves spending their time on other things. On this score, nothing has really changed. 2. It seems pretty clear that Matthew wants some governance power to be held by individuals who are not actively developing NumPy. As Chuck Harris pointed out long ago, that dog ain't going to hunt. 3. Constitutions can be broken (and are, all the time). Designing a stable institution requires making it in the interests of the members to participate. Any formal governance structure that can be desirable for the NumPy community as a whole has to be desirable for the core developers. The right way to produce a governance structure is to make concrete proposals and show how these proposals are in the interest of the *developers* (as well as of the users). For example, Benjamin obliquely suggested that with an appropriate governance board, the NA discussion could have simply been shut down by having the developers vote (as part of their governance). This might be in the interest of the developers and of the community (I'm not sure), but I doubt it is what Matthew has in mind. In any case, until proposals are put on the table along with a clear effort to illustrate why it is in the interest of the *developers* to adopt the proposals, I really do not see this discussion moving forward. fwiw, Alan Isaac From mwwiebe at gmail.com Wed Feb 15 15:55:16 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 15 Feb 2012 12:55:16 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett wrote: > Hi, > > On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root wrote: > > > > > > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac > wrote: > >> Can you provide an example where a more formal > >> governance structure for NumPy would have meant > >> more or better code development? (Please do not > >> suggest the NA discussion!) > >> > > > > Why not the NA discussion? Would we really want to have that happen > again? > > Note that it still isn't fully resolved and progress still needs to be > made > > (I think the last thread did an excellent job of fleshing out the ideas, > but > > it became too much to digest. We may need to have someone go through the > > information, reduce it down and make one last push to bring it to a > > conclusion). The NA discussion is the perfect example where a governance > > structure would help resolve disputes. > > Yes, that was the most obvious example. I don't know about you, but I > can't see any sign of that one being resolved. > > The other obvious example was the dispute about ABI breakage for numpy > 1.5.0 where I believe Travis did invoke some sort of committee to > vote, but (Travis can correct me if I'm wrong), the committee was > named ad-hoc and contacted off-list. 
> > > > >> > >> Can you provide an example of what you might > >> envision as a "more formal governance structure"? > >> (I assume that any such structure will not put people > >> who are not core contributors to NumPy in a position > >> to tell core contributors what to spend their time on.) > >> > >> Early last December, Chuck Harris estimated that three > >> people were active NumPy developers. I liked the idea of > >> creating a "board" of these 3 and a rule that says any > >> active developer can request to join the board, that > >> additions are determined by majority vote of the existing > >> board, and that having the board both small and odd > >> numbered is a priority. I also suggested inviting to this > >> board a developer or two from important projects that are > >> very NumPy dependent (e.g., Matplotlib). > >> > >> I still like this idea. Would it fully satisfy you? > >> > > > > I actually like that idea. Matthew, is this along the lines of what you > > were thinking? > > Honestly it would make me very happy if the discussion moved to what > form the governance should take. I would have thought that 3 was too > small a number. One thing to note about this point is that during the NA discussion, the only people doing active C-level development were Charles and me. I suspect a discussion about how to recruit more people into that group might be more important than governance at this point in time. If we need a formal structure, maybe a good approach is giving Travis the final say for now, until a trigger point occurs. That could be 6 months after the number of active developers hits 5, or something like that. At that point, we would reopen the discussion with a larger group of people who would directly play in that role, and any decision made then will probably be better than a decision we make now while the development team is so small. -Mark > We should look at what other projects do. I think > that this committee needs to be people who know numpy code; projects > using numpy could advise, but people developing numpy should vote I > think. > > There should be rules of engagement, a constitution, especially how to > deal with disputes with Continuum or other contracting organizations. > > I would personally very much like to see a committment to consensus, > where possible on these lines (as noted previously by Nathaniel): > > http://producingoss.com/en/consensus-democracy.html > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From perry at stsci.edu Wed Feb 15 16:00:55 2012 From: perry at stsci.edu (Perry Greenfield) Date: Wed, 15 Feb 2012 16:00:55 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C08A6.5040807@hawaii.edu> Message-ID: <8775765F-2BEB-49DD-BB79-F407CC303E40@stsci.edu> On Feb 15, 2012, at 3:01 PM, Matthew Brett wrote: [...] My 2 cents. I think you put too much faith in formal systems. There are plenty of examples of formal governance that fail miserably. In the end it depends on the people and their willingness to continue cooperating. Formal governance won't protect people from misbehaving or abusing the formal system if they are so inclined. So my thought on this is why not see how it works out. 
Even if Travis's company has a conflict of interest (and that is certainly a possibility) it isn't always a bad thing. Look at two scenarios: 1) a project requires that all work is done by altruistic people with no conflicts of interest. But it languishes due to a lack of sufficient resources. 2) a big, bad, evil, self-interested company infuses lots of resources and talent, and they bend the project their way to meet their interests. The resulting project has lots of new capability, but isn't quite as pure as the altruistic people would have had it. (Mind you, it's still open source software!) Neither is ideal. But sometimes it's 2) that has led to progress. If the distortion of the self interested companies it too big, then it's a net negative. But even the self-interested company has a large stake in seeing the community not split. And you see this in the open source community all the time, even from the "altruistic". Those that do the work generally get the most say in how it is done. Finally, I think you should cut Travis some slack here. No one has come close to the personal investment in numpy that he has (and you probably aren't aware of all if it). If anyone deserves the benefit of the doubt, it's Travis. Why not base criticism on actual problems rather than anticipated ones? Perry (full disclosure: one of those selected board members) From matthew.brett at gmail.com Wed Feb 15 16:25:06 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 13:25:06 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <8775765F-2BEB-49DD-BB79-F407CC303E40@stsci.edu> References: <4F3BB853.9090100@gmail.com> <4F3C08A6.5040807@hawaii.edu> <8775765F-2BEB-49DD-BB79-F407CC303E40@stsci.edu> Message-ID: Hi, On Wed, Feb 15, 2012 at 1:00 PM, Perry Greenfield wrote: > On Feb 15, 2012, at 3:01 PM, Matthew Brett wrote: > > [...] > > My 2 cents. > > I think you put too much faith in formal systems. There are plenty of > examples of formal governance that fail miserably. In the end it > depends on the people and their willingness to continue cooperating. > Formal governance won't protect people from misbehaving or abusing the > formal system if they are so inclined. I think that's bad argument. I doubt any sensible person would claim formal systems can't be abused. Nor that formal systems are harder to abuse than no system. > So my thought on this is why not see how it works out. Even if > Travis's company has a conflict of interest (and that is certainly a > possibility) it isn't always a bad thing. Look at two scenarios: > > 1) a project requires that all work is done by altruistic people with > no conflicts of interest. But it languishes ?due to a lack of > sufficient resources. > > 2) a big, bad, evil, self-interested company infuses lots of resources > and talent, and they bend the project their way to meet their > interests. The resulting project has lots of new capability, but isn't > quite as pure as the altruistic people would have had it. (Mind you, > it's still open source software!) Again, this is a false dichotomy. No one is suggesting stopping Continuum working on numpy. It's not sensible to paint 'the community' as 'altruistic' and Continuum as not. We are not discussing whether Continuum is good or bad. That's discussion is pointless. 
We are discussing whether Numpy in general would benefit from a governance structure which could protect it from the risks inherent in the situation where a company holds a very large part of the development talent and time for an open-source project. What are the risks? 1) The main development discussions are happening in the offices of a company rather than on-list. The understanding of the code and the code changes moves into those offices. 2) The decisions are being made by these same people who, of course, can wander into each other's offices and discuss the problem. Hence, even if the decisions are good ones, it is can be hard for others outside those offices to review them, understand them or own them. 3) When discussions arise, without formal governance, it is easy for the impression to form that there is a 'Continuum' view and other views. Suspicion can naturally arise even if the reason for the Continuum view is fully in the interests of the community. 4) It is possible for Continuum to want features that are good for Continuum, but bad for the code-base in general. For example, Continuum may have some product that requires a particular arcane feature in numpy. Through these mechanisms, Numpy can lose developers and commitment (please note) *relative to the situation where there is formal governance*. Obviously, at worst, this can lead to a split. We can avoid that *risk* with a sensible governance model that is satisfactory for all parties. I'm sure that's achievable. > Neither is ideal. But sometimes it's 2) that has led to progress. If > the distortion of the self interested companies it too big, then it's > a net negative. But even the self-interested company has a large stake > in seeing the community not split. > > And you see this in the open source community all the time, even from > the "altruistic". Those that do the work generally get the most say in > how it is done. > > Finally, I think you should cut Travis some slack here. ... > Why not base criticism on actual problems > rather than anticipated ones? 1) I don't feel that I am criticizing here, I am arguing for particular political change in numpy governance. We must distinguish between criticism and constructive suggestion, otherwise we're going to get stuck. 2) We have already had problems (NA / masks, 1.5.0). 3) The current situation is new and has obvious risks that may be easy to deal with. Not to consider these problems on the basis they haven't come up yet does not seem wise to me. Best, Matthew From charlesr.harris at gmail.com Wed Feb 15 16:27:11 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 15 Feb 2012 14:27:11 -0700 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 1:55 PM, Mark Wiebe wrote: > On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett wrote: > >> Hi, >> >> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root wrote: >> > >> > >> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac >> wrote: >> >> Can you provide an example where a more formal >> >> governance structure for NumPy would have meant >> >> more or better code development? (Please do not >> >> suggest the NA discussion!) >> >> >> > >> > Why not the NA discussion? Would we really want to have that happen >> again? 
>> > Note that it still isn't fully resolved and progress still needs to be >> made >> > (I think the last thread did an excellent job of fleshing out the >> ideas, but >> > it became too much to digest. We may need to have someone go through >> the >> > information, reduce it down and make one last push to bring it to a >> > conclusion). The NA discussion is the perfect example where a >> governance >> > structure would help resolve disputes. >> >> Yes, that was the most obvious example. I don't know about you, but I >> can't see any sign of that one being resolved. >> >> The other obvious example was the dispute about ABI breakage for numpy >> 1.5.0 where I believe Travis did invoke some sort of committee to >> vote, but (Travis can correct me if I'm wrong), the committee was >> named ad-hoc and contacted off-list. >> >> > >> >> >> >> Can you provide an example of what you might >> >> envision as a "more formal governance structure"? >> >> (I assume that any such structure will not put people >> >> who are not core contributors to NumPy in a position >> >> to tell core contributors what to spend their time on.) >> >> >> >> Early last December, Chuck Harris estimated that three >> >> people were active NumPy developers. I liked the idea of >> >> creating a "board" of these 3 and a rule that says any >> >> active developer can request to join the board, that >> >> additions are determined by majority vote of the existing >> >> board, and that having the board both small and odd >> >> numbered is a priority. I also suggested inviting to this >> >> board a developer or two from important projects that are >> >> very NumPy dependent (e.g., Matplotlib). >> >> >> >> I still like this idea. Would it fully satisfy you? >> >> >> > >> > I actually like that idea. Matthew, is this along the lines of what you >> > were thinking? >> >> Honestly it would make me very happy if the discussion moved to what >> form the governance should take. I would have thought that 3 was too >> small a number. > > > One thing to note about this point is that during the NA discussion, the > only people doing active C-level development were Charles and me. I suspect > a discussion about how to recruit more people into that group might be more > important than governance at this point in time. > > You flatter me, but thanks ;) Over the past 15 months or so, it's been pretty much all Mark. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 15 16:36:58 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 13:36:58 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: Hi, On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe wrote: > On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root wrote: >> > >> > >> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac >> > wrote: >> >> Can you provide an example where a more formal >> >> governance structure for NumPy would have meant >> >> more or better code development? (Please do not >> >> suggest the NA discussion!) >> >> >> > >> > Why not the NA discussion?? Would we really want to have that happen >> > again? >> > Note that it still isn't fully resolved and progress still needs to be >> > made >> > (I think the last thread did an excellent job of fleshing out the ideas, >> > but >> > it became too much to digest.? 
We may need to have someone go through >> > the >> > information, reduce it down and make one last push to bring it to a >> > conclusion).? The NA discussion is the perfect example where a >> > governance >> > structure would help resolve disputes. >> >> Yes, that was the most obvious example. I don't know about you, but I >> can't see any sign of that one being resolved. >> >> The other obvious example was the dispute about ABI breakage for numpy >> 1.5.0 where I believe Travis did invoke some sort of committee to >> vote, but (Travis can correct me if I'm wrong), the committee was >> named ad-hoc and contacted off-list. >> >> > >> >> >> >> Can you provide an example of what you might >> >> envision as a "more formal governance structure"? >> >> (I assume that any such structure will not put people >> >> who are not core contributors to NumPy in a position >> >> to tell core contributors what to spend their time on.) >> >> >> >> Early last December, Chuck Harris estimated that three >> >> people were active NumPy developers. ?I liked the idea of >> >> creating a "board" of these 3 and a rule that says any >> >> active developer can request to join the board, that >> >> additions are determined by majority vote of the existing >> >> board, and ?that having the board both small and odd >> >> numbered is a priority. ?I also suggested inviting to this >> >> board a developer or two from important projects that are >> >> very NumPy dependent (e.g., Matplotlib). >> >> >> >> I still like this idea. ?Would it fully satisfy you? >> >> >> > >> > I actually like that idea.? Matthew, is this along the lines of what you >> > were thinking? >> >> Honestly it would make me very happy if the discussion moved to what >> form the governance should take. ?I would have thought that 3 was too >> small a number. > > > One thing to note about this point is that during the NA discussion, the > only people doing active C-level development were Charles and me. I suspect > a discussion about how to recruit more people into that group might be more > important than governance at this point in time. Mark - a) thanks for replying, it's good to hear your voice and b) I don't think there's any competition between the discussion about governance and the need to recruit more people into the group who understand the C code. Remember we are deciding here between governance - of a form to be decided - and no governance - which I think is the current situation. I know your desire is to see more people contributing to the C code. It would help a lot if you could say what you think the barriers are, how they could be lowered, and the risks that you see as a result of the numpy C expertise moving essentially into one company. Then we can formulate some governance that would help lower those barriers and reduce those risks. > If we need a formal structure, maybe a good approach is giving Travis the > final say for now, until a trigger point occurs. That could be 6 months > after the number of active developers hits 5, or something like that. At > that point, we would reopen the discussion with a larger group of people who > would directly play in that role, and any decision made then will probably > be better than a decision we make now while the development team is so > small. Honestly - as I was saying to Alan and indirectly to Ben - any formal model - at all - is preferable to the current situation. 
Personally, I would say that making the founder of a company, which is working to make money from Numpy, the only decision maker on numpy - is - scary. But maybe it's the best way. But, again, we're all high-functioning sensible people, I'm sure it's possible for us to formulate what the risks are, what the potential solutions are, and come up with the best - maybe short-term - solution, See you, Matthew From matthew.brett at gmail.com Wed Feb 15 16:43:33 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 13:43:33 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C1972.3000801@gmail.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C0F76.8070700@gmail.com> <4F3C1972.3000801@gmail.com> Message-ID: Hi, On Wed, Feb 15, 2012 at 12:45 PM, Alan G Isaac wrote: > My analysis is fundamentally different than Matthew > and Benjamin's for a few reasons. > > 1. The problem has been miscast. > ? ?The "economic interests" of the developers *always* > ? ?has had an apparent conflict with the economic > ? ?interests of the users: users want developers to work more > ? ?on the code, and developers need to make a living, which > ? ?often involves spending their time on other things. > ? ?On this score, nothing has really changed. > 2. It seems pretty clear that Matthew wants some governance > ? ?power to be held by individuals who are not actively > ? ?developing NumPy. ?As Chuck Harris pointed out long ago, > ? ?that dog ain't going to hunt. > 3. Constitutions can be broken (and are, all the time). > ? ?Designing a stable institution requires making it in > ? ?the interests of the members to participate. > > Any formal governance structure that can be desirable > for the NumPy community as a whole has to be desirable > for the core developers. ?The right way to produce a > governance structure is to make concrete proposals and > show how these proposals are in the interest of the > *developers* (as well as of the users). > > For example, Benjamin obliquely suggested that with an > appropriate governance board, the NA discussion could > have simply been shut down by having the developers > vote (as part of their governance). ?This might be in > the interest of the developers and of the community > (I'm not sure), but I doubt it is what Matthew has in mind. > In any case, until proposals are put on the table along > with a clear effort to illustrate why it is in the interest > of the *developers* to adopt the proposals, I really do not > see this discussion moving forward. That's helpful, it would be good to discuss concrete proposals. Would you care to flesh out your proposal in more detail or is it as you quoted it before? Where do you stand on the desirability of consensus? Do you have any suggestions on how to ensure that the non-Continuum community has sufficient weight in decision making? Best, Matthew From josef.pktd at gmail.com Wed Feb 15 16:48:24 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Feb 2012 16:48:24 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C0F76.8070700@gmail.com> <4F3C1972.3000801@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 4:43 PM, Matthew Brett wrote: > Hi, > > On Wed, Feb 15, 2012 at 12:45 PM, Alan G Isaac wrote: >> My analysis is fundamentally different than Matthew >> and Benjamin's for a few reasons. >> >> 1. The problem has been miscast. >> ? 
?The "economic interests" of the developers *always* >> ? ?has had an apparent conflict with the economic >> ? ?interests of the users: users want developers to work more >> ? ?on the code, and developers need to make a living, which >> ? ?often involves spending their time on other things. >> ? ?On this score, nothing has really changed. >> 2. It seems pretty clear that Matthew wants some governance >> ? ?power to be held by individuals who are not actively >> ? ?developing NumPy. ?As Chuck Harris pointed out long ago, >> ? ?that dog ain't going to hunt. >> 3. Constitutions can be broken (and are, all the time). >> ? ?Designing a stable institution requires making it in >> ? ?the interests of the members to participate. >> >> Any formal governance structure that can be desirable >> for the NumPy community as a whole has to be desirable >> for the core developers. ?The right way to produce a >> governance structure is to make concrete proposals and >> show how these proposals are in the interest of the >> *developers* (as well as of the users). >> >> For example, Benjamin obliquely suggested that with an >> appropriate governance board, the NA discussion could >> have simply been shut down by having the developers >> vote (as part of their governance). ?This might be in >> the interest of the developers and of the community >> (I'm not sure), but I doubt it is what Matthew has in mind. >> In any case, until proposals are put on the table along >> with a clear effort to illustrate why it is in the interest >> of the *developers* to adopt the proposals, I really do not >> see this discussion moving forward. > > That's helpful, it would be good to discuss concrete proposals. > Would you care to flesh out your proposal in more detail or is it as > you quoted it before? > > Where do you stand on the desirability of consensus? > > Do you have any suggestions on how to ensure that the non-Continuum > community has sufficient weight in decision making? I'm going to miss Ralf as release manager, since in terms of governance he had the last control over what's actually in the released versions of numpy (and scipy). (I think the ABI breakage in 1.4 not 1.5 was pretty painful for a long time.) Josef > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From tjhnson at gmail.com Wed Feb 15 17:08:42 2012 From: tjhnson at gmail.com (T J) Date: Wed, 15 Feb 2012 14:08:42 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C1972.3000801@gmail.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C0F76.8070700@gmail.com> <4F3C1972.3000801@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 12:45 PM, Alan G Isaac wrote: > for the core developers. The right way to produce a > governance structure is to make concrete proposals and > show how these proposals are in the interest of the > *developers* (as well as of the users). > > At this point, it seems to me that Matthew is simply trying to make the case that ==some== governance structure should be (or should be more clearly) set up. Even this fairly modest goal seems to be receiving blowback from some. Perhaps a specific proposal would be more convincing to those in opposition, but I'd like to think that the merits for a governance structure could be appreciated without having specific the details of what such a structure would look like. 
Matthew's links certainly make a good case, IMO. He has also described a number of scenarios where a governance structure would be helpful (even if we think the likelihood of such scenarios is small). -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Feb 15 17:21:15 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 16:21:15 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C0F76.8070700@gmail.com> <4F3C1972.3000801@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 4:08 PM, T J wrote: > On Wed, Feb 15, 2012 at 12:45 PM, Alan G Isaac wrote: > > >> for the core developers. The right way to produce a >> governance structure is to make concrete proposals and >> show how these proposals are in the interest of the >> *developers* (as well as of the users). >> >> > At this point, it seems to me that Matthew is simply trying to make the > case that ==some== governance structure should be (or should be more > clearly) set up. Even this fairly modest goal seems to be receiving > blowback from some. > > Perhaps a specific proposal would be more convincing to those in > opposition, but I'd like to think that the merits for > a governance structure could be appreciated without having specific the > details of what such a structure would look like. Matthew's links > certainly make a good case, IMO. He has also described a number of > scenarios where a governance structure would be helpful (even if we think > the likelihood of such scenarios is small). > > Agreed. During the NA discussion, I remember at one point there was an attempt to create a governance structure to resolve the dispute. I pushed back on that idea at the time because we would be trying to form a governance while in heated arguments. I would like to see a very basic structure established and agreed upon now, while heads are cool and the pressure isn't "on" (relatively speaking). The point of these structures are for the unanticipated situations. Following the Boy Scouts motto: "Be Prepared!" Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From balarsen at lanl.gov Wed Feb 15 17:23:51 2012 From: balarsen at lanl.gov (Larsen, Brian A) Date: Wed, 15 Feb 2012 22:23:51 +0000 Subject: [Numpy-discussion] PyArray_FromAny() steals a reference to PyArray_Descr* dtype Message-ID: <4D175517-DBCB-4719-B9C0-586798E617FF@lanl.gov> Hello all, the docs are unclear as to the reference counting on the inputs to the numpy C function PyArray_FromAny(). multiarraymodule.c in the PyArray_InnerProduct() code seems to imply that a reference to dtype is stolen in the PyArray_FromAny process. Meaning that I don't need/can't have a Py_DECREF(). Can anyone confirm this? Thanks much, Brian -- Brian A. Larsen ISR-1 Space Science and Applications Los Alamos National Laboratory PO Box 1663, MS-D466 Los Alamos, NM 87545 USA (For overnight add: SM-30, Bikini Atoll Road) Phone: 505-665-7691 Fax: 505-665-7395 email: balarsen at lanl.gov Correspondence / Technical data or Software Publicly Available -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwwiebe at gmail.com Wed Feb 15 17:24:41 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 15 Feb 2012 14:24:41 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett wrote: > Hi, > > On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe wrote: > > On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett > > > wrote: > >> > >> Hi, > >> > >> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root > wrote: > >> > > >> > > >> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac > >> > wrote: > >> >> Can you provide an example where a more formal > >> >> governance structure for NumPy would have meant > >> >> more or better code development? (Please do not > >> >> suggest the NA discussion!) > >> >> > >> > > >> > Why not the NA discussion? Would we really want to have that happen > >> > again? > >> > Note that it still isn't fully resolved and progress still needs to be > >> > made > >> > (I think the last thread did an excellent job of fleshing out the > ideas, > >> > but > >> > it became too much to digest. We may need to have someone go through > >> > the > >> > information, reduce it down and make one last push to bring it to a > >> > conclusion). The NA discussion is the perfect example where a > >> > governance > >> > structure would help resolve disputes. > >> > >> Yes, that was the most obvious example. I don't know about you, but I > >> can't see any sign of that one being resolved. > >> > >> The other obvious example was the dispute about ABI breakage for numpy > >> 1.5.0 where I believe Travis did invoke some sort of committee to > >> vote, but (Travis can correct me if I'm wrong), the committee was > >> named ad-hoc and contacted off-list. > >> > >> > > >> >> > >> >> Can you provide an example of what you might > >> >> envision as a "more formal governance structure"? > >> >> (I assume that any such structure will not put people > >> >> who are not core contributors to NumPy in a position > >> >> to tell core contributors what to spend their time on.) > >> >> > >> >> Early last December, Chuck Harris estimated that three > >> >> people were active NumPy developers. I liked the idea of > >> >> creating a "board" of these 3 and a rule that says any > >> >> active developer can request to join the board, that > >> >> additions are determined by majority vote of the existing > >> >> board, and that having the board both small and odd > >> >> numbered is a priority. I also suggested inviting to this > >> >> board a developer or two from important projects that are > >> >> very NumPy dependent (e.g., Matplotlib). > >> >> > >> >> I still like this idea. Would it fully satisfy you? > >> >> > >> > > >> > I actually like that idea. Matthew, is this along the lines of what > you > >> > were thinking? > >> > >> Honestly it would make me very happy if the discussion moved to what > >> form the governance should take. I would have thought that 3 was too > >> small a number. > > > > > > One thing to note about this point is that during the NA discussion, the > > only people doing active C-level development were Charles and me. I > suspect > > a discussion about how to recruit more people into that group might be > more > > important than governance at this point in time. 
> > Mark - a) thanks for replying, it's good to hear your voice and b) I > don't think there's any competition between the discussion about > governance and the need to recruit more people into the group who > understand the C code. > There hasn't really been any discussion about recruiting developers to compete with the governance topic, now we can let the topics compete. :) Some of the mechanisms which will help are already being set in motion through the discussion about better infrastructure support like bug trackers and continuous integration. The forthcoming roadmap discussion Travis alluded to, where we will propose a roadmap for review by the numpy user community, will include many more such points. > Remember we are deciding here between governance - of a form to be > decided - and no governance - which I think is the current situation. > I know your desire is to see more people contributing to the C code. > It would help a lot if you could say what you think the barriers are, > how they could be lowered, and the risks that you see as a result of > the numpy C expertise moving essentially into one company. Then we > can formulate some governance that would help lower those barriers and > reduce those risks. > There certainly is governance now, it's just informal. It's a combination of how the design discussions are carried out, how pull requests occur, and who has commit rights. The only way to reasonably mitigate the risk of development expertise being in one company is to recruit more developers from elsewhere. I think those new developers will appreciate having a say about how governance works, which is why I suggested to postpone the meat of this discussion with an interim solution, and move on to the recruitment topic. > > If we need a formal structure, maybe a good approach is giving Travis the > > final say for now, until a trigger point occurs. That could be 6 months > > after the number of active developers hits 5, or something like that. At > > that point, we would reopen the discussion with a larger group of people > who > > would directly play in that role, and any decision made then will > probably > > be better than a decision we make now while the development team is so > > small. > > Honestly - as I was saying to Alan and indirectly to Ben - any formal > model - at all - is preferable to the current situation. Personally, I > would say that making the founder of a company, which is working to > make money from Numpy, the only decision maker on numpy - is - scary. > I'm proposing to make Travis the backstop for when a decision can't be decided informally, not to force all decisions through him. There's a closely related project, Python, which follows a similar approach. ;) > But maybe it's the best way. But, again, we're all high-functioning > sensible people, I'm sure it's possible for us to formulate what the > risks are, what the potential solutions are, and come up with the best > - maybe short-term - solution, > The biggest risk I see is stagnant development. If there has to be a formal structure, I think it should be as simple as possible right now. I believe Travis's heart is in the right place to tackle the many problems NumPy faces, and putting him in the role of "decider of last resort" is a simple formal structure that I believe will work well. 
When there is a bigger active development community, there will be more voices, and more importantly, those voices will represent the experience gained from a bigger development team interacting with the numpy user community. That is a better time to design a more complex governance structure. Cheers, Mark > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwang at streamitive.com Wed Feb 15 17:30:28 2012 From: pwang at streamitive.com (Peter Wang) Date: Wed, 15 Feb 2012 16:30:28 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: <0EB48A59-AEAF-44F0-B910-8B1DC9B574B8@streamitive.com> On Feb 15, 2012, at 3:36 PM, Matthew Brett wrote: > Honestly - as I was saying to Alan and indirectly to Ben - any formal > model - at all - is preferable to the current situation. Personally, I > would say that making the founder of a company, which is working to > make money from Numpy, the only decision maker on numpy - is - scary. How is this different from the situation of the last 4 years? Travis was President at Enthought, which makes money from not only Numpy but SciPy as well. In addition to employing Travis, Enthought also employees many other key contributors to Numpy and Scipy, like Robert and David. Furthermore, the Scipy and Numpy mailing lists and repos and web pages were all hosted at Enthought. If they didn't like how a particular discussion was going, they could have memory-holed the entire conversation from the archives, or worse yet, revoked commit access and reverted changes. But such things never transpired, and of course most of us know that such things would never happen. I don't see why the current situation is any different from the previous situation, other than the fact that Travis actually plans on actively developing Numpy again, and that hardly seems scary. -Peter From matthew.brett at gmail.com Wed Feb 15 17:48:18 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 14:48:18 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <0EB48A59-AEAF-44F0-B910-8B1DC9B574B8@streamitive.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <0EB48A59-AEAF-44F0-B910-8B1DC9B574B8@streamitive.com> Message-ID: Hi, On Wed, Feb 15, 2012 at 2:30 PM, Peter Wang wrote: > On Feb 15, 2012, at 3:36 PM, Matthew Brett wrote: > >> Honestly - as I was saying to Alan and indirectly to Ben - any formal >> model - at all - is preferable to the current situation. Personally, I >> would say that making the founder of a company, which is working to >> make money from Numpy, the only decision maker on numpy - is - scary. > > How is this different from the situation of the last 4 years? ?Travis was President at Enthought, which makes money from not only Numpy but SciPy as well. ?In addition to employing Travis, Enthought also employees many other key contributors to Numpy and Scipy, like Robert and David. The difference is fairly obvious to me, but stop me if I'm wrong. First - although Enthought was in a position to influence numpy development, it didn't very much, partly, I suppose because Travis did not have time to contribute to numpy. 
The exception is of course the masked array stuff by Mark that caused a lot of controversy. > ?Furthermore, the Scipy and Numpy mailing lists and repos and web pages were all hosted at Enthought. ?If they didn't like how a particular discussion was going, they could have memory-holed the entire conversation from the archives, or worse yet, revoked commit access and reverted changes. Obviously we should be realistic about the risks. Situations like that are very unlikely. > But such things never transpired, and of course most of us know that such things would never happen. Right. >?I don't see why the current situation is any different from the previous situation, other than the fact that Travis actually plans on actively developing Numpy again, and that hardly seems scary. It would be silly to be worried about Travis contributing to numpy, in general. Best, Matthew From bryanv at continuum.io Wed Feb 15 17:50:07 2012 From: bryanv at continuum.io (Bryan Van de Ven) Date: Wed, 15 Feb 2012 16:50:07 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C08A6.5040807@hawaii.edu> <8775765F-2BEB-49DD-BB79-F407CC303E40@stsci.edu> Message-ID: <4F3C369F.8060505@continuum.io> On 2/15/12 3:25 PM, Matthew Brett wrote: > 4) It is possible for Continuum to want features that are good for > Continuum, but bad for the code-base in general. For example, > Continuum may have some product that requires a particular arcane > feature in numpy. > > Through these mechanisms, Numpy can lose developers and commitment > (please note) *relative to the situation where there is formal > governance*. > > Obviously, at worst, this can lead to a split. We can avoid that > *risk* with a sensible governance model that is satisfactory for all > parties. I'm sure that's achievable. Hi All, I'm one of the Continuum devs tasked with contributing to numpy core. I have experience as a numpy user in the past but the core C code is new to me, and getting familiar with it has been an enlightening experience, to say the least. One of our primary long term goals is to make the core codebase much cleaner and more modular. The outcome we expect and hope for as a result of this effort are: 1) Encourage more core developers to join numpy because the codebase is more approachable (I hope we are all everyone agreed that it is very desirable to attract more core devs) 2) Allow the development of new types and features to have some relative insulation from one another and from numpy core Increased modularity mitigates the risk of any conflicts between Continuum and the numpy community (if any should ever actually arise), and reduces the chance of a split. Having more core devs spreads around the responsibility for making decisions while still vesting that responsibility largely among the folks actually contributing their time and effort. Perhaps then the most important question is how to get to a cleaned up, modular numpy core? The details of that roadmap should definitely be hashed out here on the list. But if we can get to that state, I think everyone can pursue both their shared and individual interests comfortably, regardless of what type of formal or informal governance might be adopted in the future. BTW I'd also just like to take this chance to say "Hello" to the list, I am very excited to help improve numpy. 
Bryan Van de Ven From jh at physics.ucf.edu Wed Feb 15 18:18:52 2012 From: jh at physics.ucf.edu (Joe Harrington) Date: Wed, 15 Feb 2012 18:18:52 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: (numpy-discussion-request@scipy.org) Message-ID: >On Wed, Feb 15, 2012 at 1:00 PM, Perry Greenfield wrote: >> On Feb 15, 2012, at 3:01 PM, Matthew Brett wrote: >> >> [...] >> >> My 2 cents. >> >> [...] I am both elated and concerned. Since it's obvious what there is to be elated about, this post has a "concerned" tone. But overall, I think this is a great move with some obvious problems to solve before it moves forward. I think the effort spent now looking for a solution will be much less than the pain of trying to undo a mess involving contracts and clients. That way lies court and code splits. So, let's do the hard work of figuring out governance now. In principle, I agree with Matt. We're moving from pretty altruistic code development to a model in which the most active development will be paid for and therefore controlled by influences outside the user community. This can have a lot of unintended side effects, including those Matt pointed out. We might also feel some level of discontent among developers who are not paid vs. those who are. This might make it hard to recruit developers who are not Continuum employees. There are tons of examples of financial interests creeping in and mucking up community computer projects. Symbolics vs. Lisp Machines, Inc. Early shenanigans on internet technical committees by engineers working at companies that were behind the curve on product development. Etc. As for being responsive to the community, Continuum is already promising the world whole new directions in numpy (see continuum.io). Were those plans even mentioned on a mailing list? Is that the direction we want to go in? Are there negative consequeces to those plans? What are the plans, exactly? Having those who do the work make the decisions only works when those working consider the needs of all. The community believes that's true largely when it's included in the discussion and not taken by surprise, as seems to have happened here. Of course, balancing all of this (and our security blanket) is the possibility of someone splitting the code if they don't like how Continuum runs things. Perry, you've done that yourself to this code's predecessor, so you know the risks. You did that in response to one constituency's moving the code in a direction you didn't like (or not moving it in one you did, I don't remember exactly), as in your example #2. So, while progress might be made when that happens, last time it hurt astronomers enough that you rolled your own and had to put several FTE on the problem. That split held back adoption of numpy both in the astronomy community and outside it, for like 5 years. Perhaps some governance would have saved you the effort and cost and the community the grief of the numarray split. Of course, lots of good eventually came from the split. I'd like to see at least some serious thought on how to protect the interests of the community under this very different development model. Trying it out and deciding we don't like it later will be a *much* harder thing to sort out. At the same time, the idea of multiplying the number of people actually working, and of having continuous builds and good issue tracking and all the rest, including the enhancements listed on Travis's web site, are very exciting! 
Let's just make sure we retain our community orientation with this new model. --jh-- From travis at continuum.io Wed Feb 15 18:31:53 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 15 Feb 2012 17:31:53 -0600 Subject: [Numpy-discussion] PyArray_FromAny() steals a reference to PyArray_Descr* dtype In-Reply-To: <4D175517-DBCB-4719-B9C0-586798E617FF@lanl.gov> References: <4D175517-DBCB-4719-B9C0-586798E617FF@lanl.gov> Message-ID: <9340EF4C-73AF-4762-B64C-0872C7E43DB1@continuum.io> Yes, the PyArray_FromAny steals a reference to the dtype object. This is done so you can build one on the fly doing something like PyArray_DescrFromType(NPY_DOUBLE) inline with the PyArray_FromAny call. -Travis On Feb 15, 2012, at 4:23 PM, Larsen, Brian A wrote: > Hello all, > > the docs are unclear as to the reference counting on the inputs to the numpy C function PyArray_FromAny(). > > multiarraymodule.c in the PyArray_InnerProduct() code seems to imply that a reference to dtype is stolen in the PyArray_FromAny process. Meaning that I don't need/can't have a Py_DECREF(). > > Can anyone confirm this? > > Thanks much, > > Brian > > > > > > > > -- > > Brian A. Larsen > ISR-1 Space Science and Applications > Los Alamos National Laboratory > PO Box 1663, MS-D466 > Los Alamos, NM 87545 > USA > > (For overnight add: > SM-30, Bikini Atoll Road) > > Phone: 505-665-7691 > Fax: 505-665-7395 > email: balarsen at lanl.gov > > Correspondence / > Technical data or Software Publicly Available > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Wed Feb 15 19:27:04 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 15 Feb 2012 16:27:04 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: <4F3C4D58.6050007@astro.uio.no> On 02/15/2012 02:24 PM, Mark Wiebe wrote: > On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett > wrote: > > Hi, > > On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe > wrote: > > On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett > > > > wrote: > >> > >> Hi, > >> > >> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root > wrote: > >> > > >> > > >> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac > > > >> > wrote: > >> >> Can you provide an example where a more formal > >> >> governance structure for NumPy would have meant > >> >> more or better code development? (Please do not > >> >> suggest the NA discussion!) > >> >> > >> > > >> > Why not the NA discussion? Would we really want to have that > happen > >> > again? > >> > Note that it still isn't fully resolved and progress still > needs to be > >> > made > >> > (I think the last thread did an excellent job of fleshing out > the ideas, > >> > but > >> > it became too much to digest. We may need to have someone go > through > >> > the > >> > information, reduce it down and make one last push to bring it > to a > >> > conclusion). The NA discussion is the perfect example where a > >> > governance > >> > structure would help resolve disputes. > >> > >> Yes, that was the most obvious example. I don't know about you, > but I > >> can't see any sign of that one being resolved. 
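A minimal sketch of the reference-counting pattern Travis describes in his reply above, assuming a C extension module in which import_array() has already been called; the helper name as_double_array is illustrative only and not part of the NumPy API. PyArray_DescrFromType() returns a new reference to the descriptor, and PyArray_FromAny() steals that reference, so the caller does not Py_DECREF the dtype afterwards.

#include <Python.h>
#include <numpy/arrayobject.h>

/* Convert an arbitrary Python object to a NumPy array of doubles. */
static PyObject *
as_double_array(PyObject *obj)
{
    /* New reference to the float64 descriptor. */
    PyArray_Descr *dtype = PyArray_DescrFromType(NPY_DOUBLE);
    if (dtype == NULL) {
        return NULL;
    }
    /* PyArray_FromAny steals the reference to dtype, so there is no
     * matching Py_DECREF(dtype) here.  This is what makes the inline
     * idiom PyArray_FromAny(obj, PyArray_DescrFromType(NPY_DOUBLE),
     * 0, 0, 0, NULL) safe to write. */
    return PyArray_FromAny(obj, dtype, 0, 0, 0, NULL);
}

If the descriptor has to outlive the call, for example because it is reused across several conversions, the usual approach is to Py_INCREF it before each PyArray_FromAny call so that the stolen reference does not leave the caller without one.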
> >> > >> The other obvious example was the dispute about ABI breakage for > numpy > >> 1.5.0 where I believe Travis did invoke some sort of committee to > >> vote, but (Travis can correct me if I'm wrong), the committee was > >> named ad-hoc and contacted off-list. > >> > >> > > >> >> > >> >> Can you provide an example of what you might > >> >> envision as a "more formal governance structure"? > >> >> (I assume that any such structure will not put people > >> >> who are not core contributors to NumPy in a position > >> >> to tell core contributors what to spend their time on.) > >> >> > >> >> Early last December, Chuck Harris estimated that three > >> >> people were active NumPy developers. I liked the idea of > >> >> creating a "board" of these 3 and a rule that says any > >> >> active developer can request to join the board, that > >> >> additions are determined by majority vote of the existing > >> >> board, and that having the board both small and odd > >> >> numbered is a priority. I also suggested inviting to this > >> >> board a developer or two from important projects that are > >> >> very NumPy dependent (e.g., Matplotlib). > >> >> > >> >> I still like this idea. Would it fully satisfy you? > >> >> > >> > > >> > I actually like that idea. Matthew, is this along the lines > of what you > >> > were thinking? > >> > >> Honestly it would make me very happy if the discussion moved to what > >> form the governance should take. I would have thought that 3 > was too > >> small a number. > > > > > > One thing to note about this point is that during the NA > discussion, the > > only people doing active C-level development were Charles and me. > I suspect > > a discussion about how to recruit more people into that group > might be more > > important than governance at this point in time. > > Mark - a) thanks for replying, it's good to hear your voice and b) I > don't think there's any competition between the discussion about > governance and the need to recruit more people into the group who > understand the C code. > > > There hasn't really been any discussion about recruiting developers to > compete with the governance topic, now we can let the topics compete. :) > > Some of the mechanisms which will help are already being set in motion > through the discussion about better infrastructure support like bug > trackers and continuous integration. The forthcoming roadmap discussion > Travis alluded to, where we will propose a roadmap for review by the > numpy user community, will include many more such points. > > Remember we are deciding here between governance - of a form to be > decided - and no governance - which I think is the current situation. > I know your desire is to see more people contributing to the C code. > It would help a lot if you could say what you think the barriers are, > how they could be lowered, and the risks that you see as a result of > the numpy C expertise moving essentially into one company. Then we > can formulate some governance that would help lower those barriers and > reduce those risks. > > > There certainly is governance now, it's just informal. It's a > combination of how the design discussions are carried out, how pull > requests occur, and who has commit rights. 
+1 If non-contributing users came along on the Cython list demanding that we set up a system to select non-developers along on a board that would have discussions in order to veto pull requests, I don't know whether we'd ignore it or ridicule it or try to show some patience, but we certainly wouldn't take it seriously. It's obvious that one should try for consensus as long as possible, including listening to users. But in the very end, when agreement can't be reached by other means, the developers are the one making the calls. (This is simply a consequence that they are the only ones who can credibly threaten to fork the project.) Sure, structures that includes users in the process could be useful... but, if the devs are fine with the current situation (and I don't see Mark or Charles complaining), then I honestly think it is quite rude to not let the matter drop after the first ten posts or so. Making things the way one wants it and scratching *ones own* itch is THE engine of open source development (whether one is putting in spare time or monetary funding). Trying to work against that with artificial structures doesn't sound wise for a project with as few devs as NumPy... Dag From josef.pktd at gmail.com Wed Feb 15 19:57:53 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Feb 2012 19:57:53 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C4D58.6050007@astro.uio.no> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Wed, Feb 15, 2012 at 7:27 PM, Dag Sverre Seljebotn wrote: > On 02/15/2012 02:24 PM, Mark Wiebe wrote: >> On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett > > wrote: >> >> ? ? Hi, >> >> ? ? On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe > ? ? > wrote: >> ? ? ?> On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett >> ? ? > >> ? ? ?> wrote: >> ? ? ?>> >> ? ? ?>> Hi, >> ? ? ?>> >> ? ? ?>> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root > ? ? > wrote: >> ? ? ?>> > >> ? ? ?>> > >> ? ? ?>> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac >> ? ? > >> ? ? ?>> > wrote: >> ? ? ?>> >> Can you provide an example where a more formal >> ? ? ?>> >> governance structure for NumPy would have meant >> ? ? ?>> >> more or better code development? (Please do not >> ? ? ?>> >> suggest the NA discussion!) >> ? ? ?>> >> >> ? ? ?>> > >> ? ? ?>> > Why not the NA discussion? ?Would we really want to have that >> ? ? happen >> ? ? ?>> > again? >> ? ? ?>> > Note that it still isn't fully resolved and progress still >> ? ? needs to be >> ? ? ?>> > made >> ? ? ?>> > (I think the last thread did an excellent job of fleshing out >> ? ? the ideas, >> ? ? ?>> > but >> ? ? ?>> > it became too much to digest. ?We may need to have someone go >> ? ? through >> ? ? ?>> > the >> ? ? ?>> > information, reduce it down and make one last push to bring it >> ? ? to a >> ? ? ?>> > conclusion). ?The NA discussion is the perfect example where a >> ? ? ?>> > governance >> ? ? ?>> > structure would help resolve disputes. >> ? ? ?>> >> ? ? ?>> Yes, that was the most obvious example. I don't know about you, >> ? ? but I >> ? ? ?>> can't see any sign of that one being resolved. >> ? ? ?>> >> ? ? ?>> The other obvious example was the dispute about ABI breakage for >> ? ? numpy >> ? ? ?>> 1.5.0 where I believe Travis did invoke some sort of committee to >> ? ? ?>> vote, but (Travis can correct me if I'm wrong), the committee was >> ? ? ?>> named ad-hoc and contacted off-list. >> ? ? ?>> >> ? ? ?>> > >> ? ? ?>> >> >> ? ? 
?>> >> Can you provide an example of what you might >> ? ? ?>> >> envision as a "more formal governance structure"? >> ? ? ?>> >> (I assume that any such structure will not put people >> ? ? ?>> >> who are not core contributors to NumPy in a position >> ? ? ?>> >> to tell core contributors what to spend their time on.) >> ? ? ?>> >> >> ? ? ?>> >> Early last December, Chuck Harris estimated that three >> ? ? ?>> >> people were active NumPy developers. ?I liked the idea of >> ? ? ?>> >> creating a "board" of these 3 and a rule that says any >> ? ? ?>> >> active developer can request to join the board, that >> ? ? ?>> >> additions are determined by majority vote of the existing >> ? ? ?>> >> board, and ?that having the board both small and odd >> ? ? ?>> >> numbered is a priority. ?I also suggested inviting to this >> ? ? ?>> >> board a developer or two from important projects that are >> ? ? ?>> >> very NumPy dependent (e.g., Matplotlib). >> ? ? ?>> >> >> ? ? ?>> >> I still like this idea. ?Would it fully satisfy you? >> ? ? ?>> >> >> ? ? ?>> > >> ? ? ?>> > I actually like that idea. ?Matthew, is this along the lines >> ? ? of what you >> ? ? ?>> > were thinking? >> ? ? ?>> >> ? ? ?>> Honestly it would make me very happy if the discussion moved to what >> ? ? ?>> form the governance should take. ?I would have thought that 3 >> ? ? was too >> ? ? ?>> small a number. >> ? ? ?> >> ? ? ?> >> ? ? ?> One thing to note about this point is that during the NA >> ? ? discussion, the >> ? ? ?> only people doing active C-level development were Charles and me. >> ? ? I suspect >> ? ? ?> a discussion about how to recruit more people into that group >> ? ? might be more >> ? ? ?> important than governance at this point in time. >> >> ? ? Mark - a) thanks for replying, it's good to hear your voice and b) I >> ? ? don't think there's any competition between the discussion about >> ? ? governance and the need to recruit more people into the group who >> ? ? understand the C code. >> >> >> There hasn't really been any discussion about recruiting developers to >> compete with the governance topic, now we can let the topics compete. :) >> >> Some of the mechanisms which will help are already being set in motion >> through the discussion about better infrastructure support like bug >> trackers and continuous integration. The forthcoming roadmap discussion >> Travis alluded to, where we will propose a roadmap for review by the >> numpy user community, will include many more such points. >> >> ? ? Remember we are deciding here between governance - of a form to be >> ? ? decided - and no governance - which I think is the current situation. >> ? ? I know your desire is to see more people contributing to the C code. >> ? ? It would help a lot if you could say what you think the barriers are, >> ? ? how they could be lowered, and the risks that you see as a result of >> ? ? the numpy C expertise moving essentially into one company. ?Then we >> ? ? can formulate some governance that would help lower those barriers and >> ? ? reduce those risks. >> >> >> There certainly is governance now, it's just informal. It's a >> combination of how the design discussions are carried out, how pull >> requests occur, and who has commit rights. 
> > +1 > > If non-contributing users came along on the Cython list demanding that > we set up a system to select non-developers along on a board that would > have discussions in order to veto pull requests, I don't know whether > we'd ignore it or ridicule it or try to show some patience, but we > certainly wouldn't take it seriously. > > It's obvious that one should try for consensus as long as possible, > including listening to users. But in the very end, when agreement can't > be reached by other means, the developers are the one making the calls. > (This is simply a consequence that they are the only ones who can > credibly threaten to fork the project.) > > Sure, structures that includes users in the process could be useful... > but, if the devs are fine with the current situation (and I don't see > Mark or Charles complaining), then I honestly think it is quite rude to > not let the matter drop after the first ten posts or so. > > Making things the way one wants it and scratching *ones own* itch is THE > engine of open source development (whether one is putting in spare time > or monetary funding). Trying to work against that with artificial > structures doesn't sound wise for a project with as few devs as NumPy... I don't think you can restrict the Numpy developer or contributor group just to the developers that work on the C core like Charles and Mark, and others over the years I have been following it ( Pauli and David, ...). There is a large part of non C numpy, Pierre for example, and Joe Harrington put money and a lot of effort into bringing the documentation into the current state, the documentation was mostly a community effort. Of course I only ever contributed to scipy, except of two or three bugfixes in numpy.random, but I still care about the direction numpy is going, as do developers of the "SciPy" community which crucially rely on numpy. Josef > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Wed Feb 15 20:02:26 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 17:02:26 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C4D58.6050007@astro.uio.no> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: Hi, On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn wrote: > On 02/15/2012 02:24 PM, Mark Wiebe wrote: >> On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett > > wrote: >> >> ? ? Hi, >> >> ? ? On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe > ? ? > wrote: >> ? ? ?> On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett >> ? ? > >> ? ? ?> wrote: >> ? ? ?>> >> ? ? ?>> Hi, >> ? ? ?>> >> ? ? ?>> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root > ? ? > wrote: >> ? ? ?>> > >> ? ? ?>> > >> ? ? ?>> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac >> ? ? > >> ? ? ?>> > wrote: >> ? ? ?>> >> Can you provide an example where a more formal >> ? ? ?>> >> governance structure for NumPy would have meant >> ? ? ?>> >> more or better code development? (Please do not >> ? ? ?>> >> suggest the NA discussion!) >> ? ? ?>> >> >> ? ? ?>> > >> ? ? ?>> > Why not the NA discussion? ?Would we really want to have that >> ? ? happen >> ? ? ?>> > again? >> ? ? ?>> > Note that it still isn't fully resolved and progress still >> ? ? needs to be >> ? ? ?>> > made >> ? ? ?>> > (I think the last thread did an excellent job of fleshing out >> ? ? 
the ideas, >> ? ? ?>> > but >> ? ? ?>> > it became too much to digest. ?We may need to have someone go >> ? ? through >> ? ? ?>> > the >> ? ? ?>> > information, reduce it down and make one last push to bring it >> ? ? to a >> ? ? ?>> > conclusion). ?The NA discussion is the perfect example where a >> ? ? ?>> > governance >> ? ? ?>> > structure would help resolve disputes. >> ? ? ?>> >> ? ? ?>> Yes, that was the most obvious example. I don't know about you, >> ? ? but I >> ? ? ?>> can't see any sign of that one being resolved. >> ? ? ?>> >> ? ? ?>> The other obvious example was the dispute about ABI breakage for >> ? ? numpy >> ? ? ?>> 1.5.0 where I believe Travis did invoke some sort of committee to >> ? ? ?>> vote, but (Travis can correct me if I'm wrong), the committee was >> ? ? ?>> named ad-hoc and contacted off-list. >> ? ? ?>> >> ? ? ?>> > >> ? ? ?>> >> >> ? ? ?>> >> Can you provide an example of what you might >> ? ? ?>> >> envision as a "more formal governance structure"? >> ? ? ?>> >> (I assume that any such structure will not put people >> ? ? ?>> >> who are not core contributors to NumPy in a position >> ? ? ?>> >> to tell core contributors what to spend their time on.) >> ? ? ?>> >> >> ? ? ?>> >> Early last December, Chuck Harris estimated that three >> ? ? ?>> >> people were active NumPy developers. ?I liked the idea of >> ? ? ?>> >> creating a "board" of these 3 and a rule that says any >> ? ? ?>> >> active developer can request to join the board, that >> ? ? ?>> >> additions are determined by majority vote of the existing >> ? ? ?>> >> board, and ?that having the board both small and odd >> ? ? ?>> >> numbered is a priority. ?I also suggested inviting to this >> ? ? ?>> >> board a developer or two from important projects that are >> ? ? ?>> >> very NumPy dependent (e.g., Matplotlib). >> ? ? ?>> >> >> ? ? ?>> >> I still like this idea. ?Would it fully satisfy you? >> ? ? ?>> >> >> ? ? ?>> > >> ? ? ?>> > I actually like that idea. ?Matthew, is this along the lines >> ? ? of what you >> ? ? ?>> > were thinking? >> ? ? ?>> >> ? ? ?>> Honestly it would make me very happy if the discussion moved to what >> ? ? ?>> form the governance should take. ?I would have thought that 3 >> ? ? was too >> ? ? ?>> small a number. >> ? ? ?> >> ? ? ?> >> ? ? ?> One thing to note about this point is that during the NA >> ? ? discussion, the >> ? ? ?> only people doing active C-level development were Charles and me. >> ? ? I suspect >> ? ? ?> a discussion about how to recruit more people into that group >> ? ? might be more >> ? ? ?> important than governance at this point in time. >> >> ? ? Mark - a) thanks for replying, it's good to hear your voice and b) I >> ? ? don't think there's any competition between the discussion about >> ? ? governance and the need to recruit more people into the group who >> ? ? understand the C code. >> >> >> There hasn't really been any discussion about recruiting developers to >> compete with the governance topic, now we can let the topics compete. :) >> >> Some of the mechanisms which will help are already being set in motion >> through the discussion about better infrastructure support like bug >> trackers and continuous integration. The forthcoming roadmap discussion >> Travis alluded to, where we will propose a roadmap for review by the >> numpy user community, will include many more such points. >> >> ? ? Remember we are deciding here between governance - of a form to be >> ? ? decided - and no governance - which I think is the current situation. >> ? ? 
I know your desire is to see more people contributing to the C code. >> ? ? It would help a lot if you could say what you think the barriers are, >> ? ? how they could be lowered, and the risks that you see as a result of >> ? ? the numpy C expertise moving essentially into one company. ?Then we >> ? ? can formulate some governance that would help lower those barriers and >> ? ? reduce those risks. >> >> >> There certainly is governance now, it's just informal. It's a >> combination of how the design discussions are carried out, how pull >> requests occur, and who has commit rights. > > +1 > > If non-contributing users came along on the Cython list demanding that > we set up a system to select non-developers along on a board that would > have discussions in order to veto pull requests, I don't know whether > we'd ignore it or ridicule it or try to show some patience, but we > certainly wouldn't take it seriously. Ouch. Is that me, one of the non-contributing users? Was I suggesting that we set up a system to select non-developers to a board? I must say, now you mention it, I do feel a bit ridiculous. > It's obvious that one should try for consensus as long as possible, > including listening to users. But in the very end, when agreement can't > be reached by other means, the developers are the one making the calls. > (This is simply a consequence that they are the only ones who can > credibly threaten to fork the project.) I think the following are not in question: 1) Consensus is desirable 2) Developers need to have the final say. But, I think it is clear in the both the ABI numpy 1.5.0 dispute, and the mask / NA dispute, that we could have gone further in negotiating to consensus. The question we're considering here is whether there is any way of setting up a set of guidelines or procedures that would help us work at and reach consensus. Or if we don't reach consensus, finding a way to decide that is clear and fair. I don't think working on that seems as silly and / or rude to me as it does to you. > Sure, structures that includes users in the process could be useful... > but, if the devs are fine with the current situation (and I don't see > Mark or Charles complaining), then I honestly think it is quite rude to > not let the matter drop after the first ten posts or so. I have clearly overestimated my own importance and wisdom, and have made myself appear foolish in the eyes of my peers. > Making things the way one wants it and scratching *ones own* itch is THE > engine of open source development (whether one is putting in spare time > or monetary funding). Trying to work against that with artificial > structures doesn't sound wise for a project with as few devs as NumPy... You believe, I suppose, that there are no significant risks in nearly all the numpy core development being done by a new company, or at least, that there can little benefit to a governance discussion in that situation. 
I think you are wrong, but of course it's a tenable point of view, Best, Matthew From mwwiebe at gmail.com Wed Feb 15 20:26:10 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 15 Feb 2012 17:26:10 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Wed, Feb 15, 2012 at 4:57 PM, wrote: > On Wed, Feb 15, 2012 at 7:27 PM, Dag Sverre Seljebotn > wrote: > > On 02/15/2012 02:24 PM, Mark Wiebe wrote: > >> On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett >> > wrote: > >> > >> Hi, > >> > >> On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe >> > wrote: > >> > On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett > >> > > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root < > ben.root at ou.edu > >> > wrote: > >> >> > > >> >> > > >> >> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac > >> > > >> >> > wrote: > >> >> >> Can you provide an example where a more formal > >> >> >> governance structure for NumPy would have meant > >> >> >> more or better code development? (Please do not > >> >> >> suggest the NA discussion!) > >> >> >> > >> >> > > >> >> > Why not the NA discussion? Would we really want to have that > >> happen > >> >> > again? > >> >> > Note that it still isn't fully resolved and progress still > >> needs to be > >> >> > made > >> >> > (I think the last thread did an excellent job of fleshing out > >> the ideas, > >> >> > but > >> >> > it became too much to digest. We may need to have someone go > >> through > >> >> > the > >> >> > information, reduce it down and make one last push to bring it > >> to a > >> >> > conclusion). The NA discussion is the perfect example where a > >> >> > governance > >> >> > structure would help resolve disputes. > >> >> > >> >> Yes, that was the most obvious example. I don't know about you, > >> but I > >> >> can't see any sign of that one being resolved. > >> >> > >> >> The other obvious example was the dispute about ABI breakage for > >> numpy > >> >> 1.5.0 where I believe Travis did invoke some sort of committee > to > >> >> vote, but (Travis can correct me if I'm wrong), the committee > was > >> >> named ad-hoc and contacted off-list. > >> >> > >> >> > > >> >> >> > >> >> >> Can you provide an example of what you might > >> >> >> envision as a "more formal governance structure"? > >> >> >> (I assume that any such structure will not put people > >> >> >> who are not core contributors to NumPy in a position > >> >> >> to tell core contributors what to spend their time on.) > >> >> >> > >> >> >> Early last December, Chuck Harris estimated that three > >> >> >> people were active NumPy developers. I liked the idea of > >> >> >> creating a "board" of these 3 and a rule that says any > >> >> >> active developer can request to join the board, that > >> >> >> additions are determined by majority vote of the existing > >> >> >> board, and that having the board both small and odd > >> >> >> numbered is a priority. I also suggested inviting to this > >> >> >> board a developer or two from important projects that are > >> >> >> very NumPy dependent (e.g., Matplotlib). > >> >> >> > >> >> >> I still like this idea. Would it fully satisfy you? > >> >> >> > >> >> > > >> >> > I actually like that idea. Matthew, is this along the lines > >> of what you > >> >> > were thinking? > >> >> > >> >> Honestly it would make me very happy if the discussion moved to > what > >> >> form the governance should take. 
I would have thought that 3 > >> was too > >> >> small a number. > >> > > >> > > >> > One thing to note about this point is that during the NA > >> discussion, the > >> > only people doing active C-level development were Charles and me. > >> I suspect > >> > a discussion about how to recruit more people into that group > >> might be more > >> > important than governance at this point in time. > >> > >> Mark - a) thanks for replying, it's good to hear your voice and b) I > >> don't think there's any competition between the discussion about > >> governance and the need to recruit more people into the group who > >> understand the C code. > >> > >> > >> There hasn't really been any discussion about recruiting developers to > >> compete with the governance topic, now we can let the topics compete. :) > >> > >> Some of the mechanisms which will help are already being set in motion > >> through the discussion about better infrastructure support like bug > >> trackers and continuous integration. The forthcoming roadmap discussion > >> Travis alluded to, where we will propose a roadmap for review by the > >> numpy user community, will include many more such points. > >> > >> Remember we are deciding here between governance - of a form to be > >> decided - and no governance - which I think is the current > situation. > >> I know your desire is to see more people contributing to the C code. > >> It would help a lot if you could say what you think the barriers > are, > >> how they could be lowered, and the risks that you see as a result of > >> the numpy C expertise moving essentially into one company. Then we > >> can formulate some governance that would help lower those barriers > and > >> reduce those risks. > >> > >> > >> There certainly is governance now, it's just informal. It's a > >> combination of how the design discussions are carried out, how pull > >> requests occur, and who has commit rights. > > > > +1 > > > > If non-contributing users came along on the Cython list demanding that > > we set up a system to select non-developers along on a board that would > > have discussions in order to veto pull requests, I don't know whether > > we'd ignore it or ridicule it or try to show some patience, but we > > certainly wouldn't take it seriously. > > > > It's obvious that one should try for consensus as long as possible, > > including listening to users. But in the very end, when agreement can't > > be reached by other means, the developers are the one making the calls. > > (This is simply a consequence that they are the only ones who can > > credibly threaten to fork the project.) > > > > Sure, structures that includes users in the process could be useful... > > but, if the devs are fine with the current situation (and I don't see > > Mark or Charles complaining), then I honestly think it is quite rude to > > not let the matter drop after the first ten posts or so. > > > > Making things the way one wants it and scratching *ones own* itch is THE > > engine of open source development (whether one is putting in spare time > > or monetary funding). Trying to work against that with artificial > > structures doesn't sound wise for a project with as few devs as NumPy... > > I don't think you can restrict the Numpy developer or contributor > group just to the developers that work on the C core like Charles and > Mark, and others over the years I have been following it ( Pauli and > David, ...). 
> There is a large part of non C numpy, Pierre for example, and Joe > Harrington put money and a lot of effort into bringing the > documentation into the current state, the documentation was mostly a > community effort. > This is very true, at the moment the number of people doing feature-work within numpy purely in Python is similarly small and sporadic. Here's a current example: https://github.com/numpy/numpy/pull/198 Having such a small development core is one of the reasons it often takes a while for such pull requests to get reviewed by someone, and a situation Continuum and anyone else with resources to contribute can help improve. One thing that's clear to me is that the current documentation on how to contribute code, documentation, and other help to NumPy is lacking, and this is something that needs improvement. An example I really like is LibreOffice's "get involved" page. http://www.libreoffice.org/get-involved/ Producing something similar for NumPy will take some work, but I believe it's needed. Cheers, Mark > > Of course I only ever contributed to scipy, except of two or three > bugfixes in numpy.random, but I still care about the direction numpy > is going, as do developers of the "SciPy" community which crucially > rely on numpy. > > Josef > > > > > > Dag > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Feb 15 20:46:58 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 19:46:58 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Wednesday, February 15, 2012, Mark Wiebe wrote: > On Wed, Feb 15, 2012 at 4:57 PM, wrote: > > On Wed, Feb 15, 2012 at 7:27 PM, Dag Sverre Seljebotn > wrote: >> On 02/15/2012 02:24 PM, Mark Wiebe wrote: >>> On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett >> > wrote: >>> >>> Hi, >>> >>> On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe >> > wrote: >>> > On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett >>> > >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root >> > wrote: >>> >> > >>> >> > >>> >> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac >>> > >>> >> > wrote: >>> >> >> Can you provide an example where a more formal >>> >> >> governance structure for NumPy would have meant >>> >> >> more or better code development? (Please do not >>> >> >> suggest the NA discussion!) >>> >> >> >>> >> > >>> >> > Why not the NA discussion? Would we really want to have that >>> happen >>> >> > again? >>> >> > Note that it still isn't fully resolved and progress still >>> needs to be >>> >> > made >>> >> > (I think the last thread did an excellent job of fleshing out >>> the ideas, >>> >> > but >>> >> > it became too much to digest. We may need to have someone go >>> through >>> >> > the >>> >> > information, reduce it down and make one last push to bring it >>> to a >>> >> > conclusion). The NA discussion is the perfect example where a >>> >> > governance >>> >> > structure would help resolve disputes. >>> >> >>> >> Yes, that was the most obvious example. 
I don't know about you, >>> but I >>> >> can't see any sign of that one being resolved. >>> >> >>> >> The other obvious example was the dispute about ABI breakage for >>> numpy >>> >> 1.5.0 where I believe Travis did invoke some sort of committee to >>> >> vote, but (Travis can correct me if I'm wrong), the committee was >>> >> named ad-hoc and contacted off-list. >>> >> >>> >> > >>> >> >> >>> >> >> Can you provide an example of what you might >>> >> >> envision as a "more formal governance structure"? >>> >> >> (I assume that any such structure will not put people >>> >> >> who are not core contributors to NumPy in a position >>> >> >> to tell core contributors what to spend their time on.) >>> >> >> >>> > > This is very true, at the moment the number of people doing feature-work within numpy purely in Python is similarly small and sporadic. Here's a current example: > https://github.com/numpy/numpy/pull/198 > Having such a small development core is one of the reasons it often takes a while for such pull requests to get reviewed by someone, and a situation Continuum and anyone else with resources to contribute can help improve. One thing that's clear to me is that the current documentation on how to contribute code, documentation, and other help to NumPy is lacking, and this is something that needs improvement. > An example I really like is LibreOffice's "get involved" page. > http://www.libreoffice.org/get-involved/ > Producing something similar for NumPy will take some work, but I believe it's needed. > Cheers, > Mark > +1000. Each time I have submitted a pull request, I always had to ask where the appropriate tests go. I still haven't gotten my head around the layout of the source tree, and my only saving grace has been 'git grep'. It is the little things that can keep someone from contributing. Anything to make this easier would be great. Maybe a protege system might be nice? Chuck ain't getting younger, ya'll! Ben Root P.S. - who knows? Maybe I will be one of those protoges depending on how my new job unfolds. -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 15 20:49:53 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 17:49:53 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C4D58.6050007@astro.uio.no> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: Hi, On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn wrote: > On 02/15/2012 02:24 PM, Mark Wiebe wrote: >> There certainly is governance now, it's just informal. It's a >> combination of how the design discussions are carried out, how pull >> requests occur, and who has commit rights. > > +1 > > If non-contributing users came along on the Cython list demanding that > we set up a system to select non-developers along on a board that would > have discussions in order to veto pull requests, I don't know whether > we'd ignore it or ridicule it or try to show some patience, but we > certainly wouldn't take it seriously. In the spirit (as I read) of Dag's post, maybe we should accept that this thread is not going anywhere much, and summarize: The current situation is the following: Travis is de-facto BDFL for Numpy Disputes get resolved by convening an ad-hoc group of interested and / or active developers to resolve or vote, maybe off-list. How this happens is for Travis to call. I think that's reasonable? 
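A quick sketch for Ben's question above about where tests go (the file names are just examples of the existing convention, not a prescription): tests live in a tests/ directory next to the subpackage they exercise, e.g. numpy/core/tests/test_multiarray.py or numpy/lib/tests/test_function_base.py, and the whole suite can be run from Python:

# run the fast test suite
import numpy
numpy.test()
# or include the slow tests as well
numpy.test('full')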
As far as I can make out, in favor of the current status quo with no significant modification are: Travis (is that right)? Mark Peter Bryan vdv Perry Dag In favor of some sort of formalization of governance to be decided are: Me Ben R (did I get that right?) Bruce Southey Souheil Inati TJ Joe H I am not quite sure which side of that fence are: Josef Alan Chuck If I missed someone who gave an opinion - sorry - please do speak up. I think it's clear that if - you, Travis, don't want to go this direction, there isn't much chance of anything happening, and I think those of us who think something needs doing will have to keep quiet, as Dag suggests. I would only suggest that you (Travis) specify that you will take the BDFL role so that we can be clear about the informal governance at least. Best, Matthew From josef.pktd at gmail.com Wed Feb 15 21:07:06 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Feb 2012 21:07:06 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Wed, Feb 15, 2012 at 8:49 PM, Matthew Brett wrote: > Hi, > > On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn > wrote: >> On 02/15/2012 02:24 PM, Mark Wiebe wrote: > >>> There certainly is governance now, it's just informal. It's a >>> combination of how the design discussions are carried out, how pull >>> requests occur, and who has commit rights. >> >> +1 >> >> If non-contributing users came along on the Cython list demanding that >> we set up a system to select non-developers along on a board that would >> have discussions in order to veto pull requests, I don't know whether >> we'd ignore it or ridicule it or try to show some patience, but we >> certainly wouldn't take it seriously. > > In the spirit (as I read) of Dag's post, maybe we should accept that > this thread is not going anywhere much, and summarize: > > The current situation is the following: > > Travis is de-facto BDFL for Numpy > Disputes get resolved by convening an ad-hoc group of interested and / > or active developers to resolve or vote, maybe off-list. ?How this > happens is for Travis to call. > > I think that's reasonable? > > As far as I can make out, in favor of the current status quo with no > significant modification are: > > Travis (is that right)? > Mark > Peter > Bryan vdv > Perry > Dag > > In favor of some sort of formalization of governance to be decided are: > > Me > Ben R (did I get that right?) > Bruce Southey > Souheil Inati > TJ > Joe H > > I am not quite sure which side of that fence are: > > Josef Actually in the sense of separation of powers, I would vote for Chuck as president, Travis as prime minister and an independent release manager as supreme court, and the noisy mailing list community as parliament. (I don't see a constitution yet.) Josef > Alan > Chuck > > If I missed someone who gave an opinion - sorry - please do speak up. > > I think it's clear that if - you, Travis, don't want to go this > direction, there isn't much chance of anything happening, and I think > those of us who think something needs doing will have to keep quiet, > as Dag suggests. > > I would only suggest that you (Travis) specify that you will take the > BDFL role so that we can be clear about the informal governance at > least. 
> > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Wed Feb 15 21:12:06 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 18:12:06 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: Hi, On Wed, Feb 15, 2012 at 6:07 PM, wrote: > On Wed, Feb 15, 2012 at 8:49 PM, Matthew Brett wrote: >> Hi, >> >> On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn >> wrote: >>> On 02/15/2012 02:24 PM, Mark Wiebe wrote: >> >>>> There certainly is governance now, it's just informal. It's a >>>> combination of how the design discussions are carried out, how pull >>>> requests occur, and who has commit rights. >>> >>> +1 >>> >>> If non-contributing users came along on the Cython list demanding that >>> we set up a system to select non-developers along on a board that would >>> have discussions in order to veto pull requests, I don't know whether >>> we'd ignore it or ridicule it or try to show some patience, but we >>> certainly wouldn't take it seriously. >> >> In the spirit (as I read) of Dag's post, maybe we should accept that >> this thread is not going anywhere much, and summarize: >> >> The current situation is the following: >> >> Travis is de-facto BDFL for Numpy >> Disputes get resolved by convening an ad-hoc group of interested and / >> or active developers to resolve or vote, maybe off-list. ?How this >> happens is for Travis to call. >> >> I think that's reasonable? >> >> As far as I can make out, in favor of the current status quo with no >> significant modification are: >> >> Travis (is that right)? >> Mark >> Peter >> Bryan vdv >> Perry >> Dag >> >> In favor of some sort of formalization of governance to be decided are: >> >> Me >> Ben R (did I get that right?) >> Bruce Southey >> Souheil Inati >> TJ >> Joe H >> >> I am not quite sure which side of that fence are: >> >> Josef > > Actually in the sense of separation of powers, I would vote for Chuck > as president, Travis as prime minister and an independent release > manager as supreme court, and the noisy mailing list community as > parliament. That sounds dangerously Canadian ... But actually - I was hoping for an answer to whether you felt there was a need for a more formal governance structure, or not. > (I don't see a constitution yet.) My feeling is there is not enough appetite for any change for that to be worth thinking about, but I might be wrong. See you, Matthew From ben.root at ou.edu Wed Feb 15 21:24:25 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Feb 2012 20:24:25 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Wednesday, February 15, 2012, Matthew Brett wrote: > Hi, > > On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn > wrote: >> On 02/15/2012 02:24 PM, Mark Wiebe wrote: > >>> There certainly is governance now, it's just informal. It's a >>> combination of how the design discussions are carried out, how pull >>> requests occur, and who has commit rights. 
>> >> +1 >> >> If non-contributing users came along on the Cython list demanding that >> we set up a system to select non-developers along on a board that would >> have discussions in order to veto pull requests, I don't know whether >> we'd ignore it or ridicule it or try to show some patience, but we >> certainly wouldn't take it seriously. > > In the spirit (as I read) of Dag's post, maybe we should accept that > this thread is not going anywhere much, and summarize: > > The current situation is the following: > > Travis is de-facto BDFL for Numpy > Disputes get resolved by convening an ad-hoc group of interested and / > or active developers to resolve or vote, maybe off-list. How this > happens is for Travis to call. > > I think that's reasonable? > > As far as I can make out, in favor of the current status quo with no > significant modification are: > > Travis (is that right)? > Mark > Peter > Bryan vdv > Perry > Dag > > In favor of some sort of formalization of governance to be decided are: > > Me > Ben R (did I get that right?) > Bruce Southey > Souheil Inati > TJ > Joe H > > I am not quite sure which side of that fence are: > > Josef > Alan > Chuck > > If I missed someone who gave an opinion - sorry - please do speak up. Yes, you got my opinion right (don't know how it was ambiguous. Do I really equivocate that much?). I will note that I am fine with a very light-handed form of governance. The most important thing is that it is agreed upon. That means that when the time comes to solidify the details, we start a new thread and invite members to contribute. Then, when that is finalized, we start a *new* thread and ask users to vote for or against. More complicated governance structures can come later, building off of the existing system -- if desired. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Feb 15 22:04:08 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Feb 2012 22:04:08 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Wed, Feb 15, 2012 at 9:12 PM, Matthew Brett wrote: > Hi, > > On Wed, Feb 15, 2012 at 6:07 PM, ? wrote: >> On Wed, Feb 15, 2012 at 8:49 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn >>> wrote: >>>> On 02/15/2012 02:24 PM, Mark Wiebe wrote: >>> >>>>> There certainly is governance now, it's just informal. It's a >>>>> combination of how the design discussions are carried out, how pull >>>>> requests occur, and who has commit rights. >>>> >>>> +1 >>>> >>>> If non-contributing users came along on the Cython list demanding that >>>> we set up a system to select non-developers along on a board that would >>>> have discussions in order to veto pull requests, I don't know whether >>>> we'd ignore it or ridicule it or try to show some patience, but we >>>> certainly wouldn't take it seriously. >>> >>> In the spirit (as I read) of Dag's post, maybe we should accept that >>> this thread is not going anywhere much, and summarize: >>> >>> The current situation is the following: >>> >>> Travis is de-facto BDFL for Numpy >>> Disputes get resolved by convening an ad-hoc group of interested and / >>> or active developers to resolve or vote, maybe off-list. ?How this >>> happens is for Travis to call. >>> >>> I think that's reasonable? 
>>> >>> As far as I can make out, in favor of the current status quo with no >>> significant modification are: >>> >>> Travis (is that right)? >>> Mark >>> Peter >>> Bryan vdv >>> Perry >>> Dag >>> >>> In favor of some sort of formalization of governance to be decided are: >>> >>> Me >>> Ben R (did I get that right?) >>> Bruce Southey >>> Souheil Inati >>> TJ >>> Joe H >>> >>> I am not quite sure which side of that fence are: >>> >>> Josef >> >> Actually in the sense of separation of powers, I would vote for Chuck >> as president, Travis as prime minister and an independent release >> manager as supreme court, and the noisy mailing list community as >> parliament. > > That sounds dangerously Canadian ... Or Austrian or German > > But actually - I was hoping for an answer to whether you felt there > was a need for a more formal governance structure, or not. I thought a president, a prime minister and a parliament makes for a formal government structure. :) maybe more personalized in the American tradition. I'm in favor of a more formal governance structure, however the only real enforcement I see is in the reputation and goodwill, if all keys are in one hand. I think spelling out both governance and guidelines for development and testing make it easier to make it clear what we can expect and so that we know when we should be upset (a bit of repeated game enforcement since I'm an economist). I have no idea how formal governance structures work in open source. Actually, I liked the recent situation with a visionary 2.0 sometimes in the future, while Chuck and Ralf kept putting out 1.x releases with careful control of going forward and not breaking anything (I'm not sure how to phrase this), with, of course, Mark doing large parts of the heavy work. Josef > >> (I don't see a constitution yet.) > > My feeling is there is not enough appetite for any change for that to > be worth thinking about, but I might be wrong. > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bsouthey at gmail.com Wed Feb 15 22:31:48 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 15 Feb 2012 21:31:48 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Wed, Feb 15, 2012 at 6:57 PM, wrote: > On Wed, Feb 15, 2012 at 7:27 PM, Dag Sverre Seljebotn > wrote: >> On 02/15/2012 02:24 PM, Mark Wiebe wrote: >>> On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett >> > wrote: >>> >>> ? ? Hi, >>> >>> ? ? On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe >> ? ? > wrote: >>> ? ? ?> On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett >>> ? ? > >>> ? ? ?> wrote: >>> ? ? ?>> >>> ? ? ?>> Hi, >>> ? ? ?>> >>> ? ? ?>> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root >> ? ? > wrote: >>> ? ? ?>> > >>> ? ? ?>> > >>> ? ? ?>> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac >>> ? ? > >>> ? ? ?>> > wrote: >>> ? ? ?>> >> Can you provide an example where a more formal >>> ? ? ?>> >> governance structure for NumPy would have meant >>> ? ? ?>> >> more or better code development? (Please do not >>> ? ? ?>> >> suggest the NA discussion!) >>> ? ? ?>> >> >>> ? ? ?>> > >>> ? ? ?>> > Why not the NA discussion? ?Would we really want to have that >>> ? ? happen >>> ? ? ?>> > again? >>> ? ? ?>> > Note that it still isn't fully resolved and progress still >>> ? ? needs to be >>> ? ? 
?>> > made >>> ? ? ?>> > (I think the last thread did an excellent job of fleshing out >>> ? ? the ideas, >>> ? ? ?>> > but >>> ? ? ?>> > it became too much to digest. ?We may need to have someone go >>> ? ? through >>> ? ? ?>> > the >>> ? ? ?>> > information, reduce it down and make one last push to bring it >>> ? ? to a >>> ? ? ?>> > conclusion). ?The NA discussion is the perfect example where a >>> ? ? ?>> > governance >>> ? ? ?>> > structure would help resolve disputes. >>> ? ? ?>> >>> ? ? ?>> Yes, that was the most obvious example. I don't know about you, >>> ? ? but I >>> ? ? ?>> can't see any sign of that one being resolved. >>> ? ? ?>> >>> ? ? ?>> The other obvious example was the dispute about ABI breakage for >>> ? ? numpy >>> ? ? ?>> 1.5.0 where I believe Travis did invoke some sort of committee to >>> ? ? ?>> vote, but (Travis can correct me if I'm wrong), the committee was >>> ? ? ?>> named ad-hoc and contacted off-list. >>> ? ? ?>> >>> ? ? ?>> > >>> ? ? ?>> >> >>> ? ? ?>> >> Can you provide an example of what you might >>> ? ? ?>> >> envision as a "more formal governance structure"? >>> ? ? ?>> >> (I assume that any such structure will not put people >>> ? ? ?>> >> who are not core contributors to NumPy in a position >>> ? ? ?>> >> to tell core contributors what to spend their time on.) >>> ? ? ?>> >> >>> ? ? ?>> >> Early last December, Chuck Harris estimated that three >>> ? ? ?>> >> people were active NumPy developers. ?I liked the idea of >>> ? ? ?>> >> creating a "board" of these 3 and a rule that says any >>> ? ? ?>> >> active developer can request to join the board, that >>> ? ? ?>> >> additions are determined by majority vote of the existing >>> ? ? ?>> >> board, and ?that having the board both small and odd >>> ? ? ?>> >> numbered is a priority. ?I also suggested inviting to this >>> ? ? ?>> >> board a developer or two from important projects that are >>> ? ? ?>> >> very NumPy dependent (e.g., Matplotlib). >>> ? ? ?>> >> >>> ? ? ?>> >> I still like this idea. ?Would it fully satisfy you? >>> ? ? ?>> >> >>> ? ? ?>> > >>> ? ? ?>> > I actually like that idea. ?Matthew, is this along the lines >>> ? ? of what you >>> ? ? ?>> > were thinking? >>> ? ? ?>> >>> ? ? ?>> Honestly it would make me very happy if the discussion moved to what >>> ? ? ?>> form the governance should take. ?I would have thought that 3 >>> ? ? was too >>> ? ? ?>> small a number. >>> ? ? ?> >>> ? ? ?> >>> ? ? ?> One thing to note about this point is that during the NA >>> ? ? discussion, the >>> ? ? ?> only people doing active C-level development were Charles and me. >>> ? ? I suspect >>> ? ? ?> a discussion about how to recruit more people into that group >>> ? ? might be more >>> ? ? ?> important than governance at this point in time. >>> >>> ? ? Mark - a) thanks for replying, it's good to hear your voice and b) I >>> ? ? don't think there's any competition between the discussion about >>> ? ? governance and the need to recruit more people into the group who >>> ? ? understand the C code. >>> >>> >>> There hasn't really been any discussion about recruiting developers to >>> compete with the governance topic, now we can let the topics compete. :) >>> >>> Some of the mechanisms which will help are already being set in motion >>> through the discussion about better infrastructure support like bug >>> trackers and continuous integration. The forthcoming roadmap discussion >>> Travis alluded to, where we will propose a roadmap for review by the >>> numpy user community, will include many more such points. 
>>> >>> ? ? Remember we are deciding here between governance - of a form to be >>> ? ? decided - and no governance - which I think is the current situation. >>> ? ? I know your desire is to see more people contributing to the C code. >>> ? ? It would help a lot if you could say what you think the barriers are, >>> ? ? how they could be lowered, and the risks that you see as a result of >>> ? ? the numpy C expertise moving essentially into one company. ?Then we >>> ? ? can formulate some governance that would help lower those barriers and >>> ? ? reduce those risks. >>> >>> >>> There certainly is governance now, it's just informal. It's a >>> combination of how the design discussions are carried out, how pull >>> requests occur, and who has commit rights. >> >> +1 >> >> If non-contributing users came along on the Cython list demanding that >> we set up a system to select non-developers along on a board that would >> have discussions in order to veto pull requests, I don't know whether >> we'd ignore it or ridicule it or try to show some patience, but we >> certainly wouldn't take it seriously. >> >> It's obvious that one should try for consensus as long as possible, >> including listening to users. But in the very end, when agreement can't >> be reached by other means, the developers are the one making the calls. >> (This is simply a consequence that they are the only ones who can >> credibly threaten to fork the project.) >> >> Sure, structures that includes users in the process could be useful... >> but, if the devs are fine with the current situation (and I don't see >> Mark or Charles complaining), then I honestly think it is quite rude to >> not let the matter drop after the first ten posts or so. >> >> Making things the way one wants it and scratching *ones own* itch is THE >> engine of open source development (whether one is putting in spare time >> or monetary funding). Trying to work against that with artificial >> structures doesn't sound wise for a project with as few devs as NumPy... > > I don't think you can restrict the Numpy developer or contributor > group just to the developers that work on the C core like Charles and > Mark, and others over the years I have been following it ( Pauli and > David, ...). > There is a large part of non C numpy, Pierre for example, and Joe > Harrington put money and a lot of effort into bringing the > documentation into the current state, the documentation was mostly a > community effort. > > Of course I only ever contributed to scipy, except of two or three > bugfixes in numpy.random, but I still care about the direction numpy > is going, as do developers of the "SciPy" community which crucially > rely on numpy. > > Josef > > >> >> Dag >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion +1 As Josef points out that there is *tremendous* effort to document numpy and scipy which was not done by just the core developers! Not to forget that numpy is built on a very long stable history of Numeric and numarray. So it should be no surprise that there a great amount of numpy usage (for example, Google trends - http://www.google.com/trends/?q=numpy ) and that many projects do indeed to depend on it. 
Not just the obvious ones - indeed matlibplot apparently has more nice functions tools that perhaps should be included. Also there are forked projects like tabulate that probably could have been included in numpy instead- maybe with the new commit rights, carray will be added. Yet to me the most interesting is that the pypy project where 'the survey said' "25% of respondants said somthing about NumPy"! Not meaning to push this, but I guess I am, (http://pypy.org/numpydonate), but their donation level is almost 70% of the target. We do need a structure that represents the needs of the community not just core developers. It is also where I very much disagree with Dag, (not a CS major to know about the 'gang of four' - http://en.wikipedia.org/wiki/Design_Patterns) but 'non-contributing users' are essential for not only providing direction but, more importantly, getting the code right! Thus, any governance has to ensure quality code that meets the community standards and expectations plus that the code is maintainable in the long term. (Yes, I am very frustrated when numpy and scipy release candidates fail under supported Python versions.) Bruce From josef.pktd at gmail.com Wed Feb 15 22:52:33 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Feb 2012 22:52:33 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 Message-ID: Doing a bit of browsing in the numpy tracker, I found this. From my search this was not discussed on the mailing list. http://projects.scipy.org/numpy/ticket/1842 The multivariate normal random sample is not always the same, even though a seed is specified. It seems to alternate randomly between two patterns, the sum of the random numbers is always the same. Is there a randomness in the svd? Given the constant sum, I guess the underlying random numbers are the same. (windows 7, python 32bit on 64 machine, numpy 1.5.1) import numpy d = 10 alpha = 1 / d**0.5 mu = numpy.ones(d) R = alpha * numpy.ones((d, d)) + (1 - alpha) * numpy.eye(d) rs = numpy.random.RandomState(587482) rv1 = rs.multivariate_normal(mu, R, 1) print rv1, rv1.sum() numpy.random.seed(587482) rv2 = numpy.random.multivariate_normal(mu, R, 1) print rv2[0][0], rv2.sum() running it a few times >>> [[ 0.43028555 1.06584226 -0.03496681 -0.31372591 -0.49716804 -1.50641838 0.99209124 0.57236839 -0.32107663 0.5865379 ]] 0.973769580498 0.0979690286727 0.973769580498 >>> [[ 0.09796903 1.41010513 -1.10250773 0.71321445 0.09903517 0.36432555 -1.27590062 0.04533834 0.37426153 0.24792873]] 0.973769580498 0.430285553017 0.973769580498 >>> [[ 0.09796903 1.41010513 -1.10250773 0.71321445 0.09903517 0.36432555 -1.27590062 0.04533834 0.37426153 0.24792873]] 0.973769580498 0.430285553017 0.973769580498 >>> [[ 0.43028555 1.06584226 -0.03496681 -0.31372591 -0.49716804 -1.50641838 0.99209124 0.57236839 -0.32107663 0.5865379 ]] 0.973769580498 0.0979690286727 0.973769580498 Josef From josef.pktd at gmail.com Thu Feb 16 00:09:50 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 00:09:50 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: On Wed, Feb 15, 2012 at 10:52 PM, wrote: > Doing a bit of browsing in the numpy tracker, I found this. From my > search this was not discussed on the mailing list. > > http://projects.scipy.org/numpy/ticket/1842 > > The multivariate normal random sample is not always the same, even > though a seed is specified. 
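A short sketch of what seems to be going on in ticket:1842 (an illustration, not the code from the ticket): the covariance R above has a nine-fold repeated eigenvalue, so its SVD is only determined up to the choice of basis (and signs) within that eigenspace. Two equally valid factorizations then map the same underlying standard normals to different samples. Fixing the factorization yourself, e.g. with a Cholesky factor, gives seed-reproducible draws, although not the same numbers multivariate_normal returns.

import numpy as np

d = 10
alpha = 1 / d**0.5
mu = np.ones(d)
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

# Two valid SVDs of R: flipping the sign of one singular pair changes the
# factors but still reconstructs R exactly.
u, s, v = np.linalg.svd(R)
u2, v2 = u.copy(), v.copy()
u2[:, -1] *= -1
v2[-1, :] *= -1
print(np.allclose((u * s).dot(v), R))    # True
print(np.allclose((u2 * s).dot(v2), R))  # also True

# A deterministic alternative: R is positive definite here, so a Cholesky
# factor exists and the draw depends only on the seed.
L = np.linalg.cholesky(R)
rs = np.random.RandomState(587482)
sample = mu + L.dot(rs.standard_normal(d))
print(sample.sum())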
> > It seems to alternate randomly between two patterns, the sum of the > random numbers is always the same. > > Is there a randomness in the svd? ?Given the constant sum, I guess the > underlying random numbers are the same. > > (windows 7, python 32bit on 64 machine, numpy 1.5.1) > > import numpy > > d = 10 > alpha = 1 / d**0.5 > mu = numpy.ones(d) > R = alpha * numpy.ones((d, d)) + (1 - alpha) * numpy.eye(d) > rs = numpy.random.RandomState(587482) > rv1 = rs.multivariate_normal(mu, R, 1) > print rv1, rv1.sum() > > numpy.random.seed(587482) > rv2 = numpy.random.multivariate_normal(mu, R, 1) > print rv2[0][0], rv2.sum() > > > running it a few times > >>>> > [[ 0.43028555 ?1.06584226 -0.03496681 -0.31372591 -0.49716804 -1.50641838 > ? 0.99209124 ?0.57236839 -0.32107663 ?0.5865379 ]] 0.973769580498 > 0.0979690286727 0.973769580498 >>>> > [[ 0.09796903 ?1.41010513 -1.10250773 ?0.71321445 ?0.09903517 ?0.36432555 > ?-1.27590062 ?0.04533834 ?0.37426153 ?0.24792873]] 0.973769580498 > 0.430285553017 0.973769580498 >>>> > [[ 0.09796903 ?1.41010513 -1.10250773 ?0.71321445 ?0.09903517 ?0.36432555 > ?-1.27590062 ?0.04533834 ?0.37426153 ?0.24792873]] 0.973769580498 > 0.430285553017 0.973769580498 >>>> > [[ 0.43028555 ?1.06584226 -0.03496681 -0.31372591 -0.49716804 -1.50641838 > ? 0.99209124 ?0.57236839 -0.32107663 ?0.5865379 ]] 0.973769580498 > 0.0979690286727 0.973769580498 > > Josef numpy linalg.svd doesn't produce always the same results running this gives two different answers, using scipy.linalg.svd I always get the same answer, which is one of the numpy answers (numpy random.multivariate_normal is collateral damage) What I don't understand is that numpy.random uses numpy.dual.svd which I thought is scipy.linalg if available, but it looks like it takes the numpy svd. -------- import numpy as np #from numpy.dual import svd from numpy.linalg import svd #from scipy.linalg import svd d = 10 alpha = 1 / d**0.5 mu = numpy.ones(d) R = alpha * numpy.ones((d, d)) + (1 - alpha) * numpy.eye(d) for i in range(10): (u,s,v) = svd(R) print 'v[-1]', v[-1] ----------- Josef From d.s.seljebotn at astro.uio.no Thu Feb 16 00:47:42 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 15 Feb 2012 21:47:42 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: <4F3C987E.4060005@astro.uio.no> On 02/15/2012 05:02 PM, Matthew Brett wrote: > Hi, > > On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn > wrote: >> On 02/15/2012 02:24 PM, Mark Wiebe wrote: >>> There certainly is governance now, it's just informal. It's a >>> combination of how the design discussions are carried out, how pull >>> requests occur, and who has commit rights. >> >> +1 >> >> If non-contributing users came along on the Cython list demanding that >> we set up a system to select non-developers along on a board that would >> have discussions in order to veto pull requests, I don't know whether >> we'd ignore it or ridicule it or try to show some patience, but we >> certainly wouldn't take it seriously. > > Ouch. Is that me, one of the non-contributing users? Was I > suggesting that we set up a system to select non-developers to a > board? I must say, now you mention it, I do feel a bit ridiculous. In retrospect I was unfair and my email way too harsh. Anyway, I'm really happy with your follow-up in turning this into something more constructive. 
And I do appreciate that you brought the topic up in the first place. > >> It's obvious that one should try for consensus as long as possible, >> including listening to users. But in the very end, when agreement can't >> be reached by other means, the developers are the one making the calls. >> (This is simply a consequence that they are the only ones who can >> credibly threaten to fork the project.) > > I think the following are not in question: > > 1) Consensus is desirable > 2) Developers need to have the final say. > > But, I think it is clear in the both the ABI numpy 1.5.0 dispute, and > the mask / NA dispute, that we could have gone further in negotiating > to consensus. > > The question we're considering here is whether there is any way of > setting up a set of guidelines or procedures that would help us work > at and reach consensus. Or if we don't reach consensus, finding a way > to decide that is clear and fair. I don't think working on that seems > as silly and / or rude to me as it does to you. > >> Sure, structures that includes users in the process could be useful... >> but, if the devs are fine with the current situation (and I don't see >> Mark or Charles complaining), then I honestly think it is quite rude to >> not let the matter drop after the first ten posts or so. > > I have clearly overestimated my own importance and wisdom, and have > made myself appear foolish in the eyes of my peers. > >> Making things the way one wants it and scratching *ones own* itch is THE >> engine of open source development (whether one is putting in spare time >> or monetary funding). Trying to work against that with artificial >> structures doesn't sound wise for a project with as few devs as NumPy... > > You believe, I suppose, that there are no significant risks in nearly > all the numpy core development being done by a new company, or at > least, that there can little benefit to a governance discussion in > that situation. I think you are wrong, but of course it's a tenable > point of view, The question is more about what can possibly be done about it. To really shift power, my hunch is that the only practical way would be to, like Mark said, make sure there are very active non-Continuum-employed developers. But perhaps I'm wrong. Sometimes it is worth taking some risks because it means one can go forward faster. Possibly *a lot* faster, if one shifts things from email to personal communication. It is not like the current versions of NumPy disappear. If things do go wrong and NumPy is developed in some crazy direction, it's easy to go for the stagnated option simply by taking the current release and maintain bugfixes on it. Dag From matthew.brett at gmail.com Thu Feb 16 01:08:57 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 15 Feb 2012 22:08:57 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C987E.4060005@astro.uio.no> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3C987E.4060005@astro.uio.no> Message-ID: Hi, On Wed, Feb 15, 2012 at 9:47 PM, Dag Sverre Seljebotn wrote: > On 02/15/2012 05:02 PM, Matthew Brett wrote: >> Hi, >> >> On Wed, Feb 15, 2012 at 4:27 PM, Dag Sverre Seljebotn >> ?wrote: >>> On 02/15/2012 02:24 PM, Mark Wiebe wrote: >>>> There certainly is governance now, it's just informal. It's a >>>> combination of how the design discussions are carried out, how pull >>>> requests occur, and who has commit rights. 
>>> >>> +1 >>> >>> If non-contributing users came along on the Cython list demanding that >>> we set up a system to select non-developers along on a board that would >>> have discussions in order to veto pull requests, I don't know whether >>> we'd ignore it or ridicule it or try to show some patience, but we >>> certainly wouldn't take it seriously. >> >> Ouch. ?Is that me, one of the non-contributing users? ?Was I >> suggesting that we set up a system to select non-developers to a >> board? ? I must say, now you mention it, I do feel a bit ridiculous. > > In retrospect I was unfair and my email way too harsh. Anyway, I'm > really happy with your follow-up in turning this into something more > constructive. Don't worry - thanks for this reply. >> You believe, I suppose, that there are no significant risks in nearly >> all the numpy core development being done by a new company, or at >> least, that there can little benefit to a governance discussion in >> that situation. ?I think you are wrong, but of course it's a tenable >> point of view, > > The question is more about what can possibly be done about it. To really > shift power, my hunch is that the only practical way would be to, like > Mark said, make sure there are very active non-Continuum-employed > developers. But perhaps I'm wrong. It's not obvious to me that there isn't a set of guidelines, procedures, structures that would help to keep things clear in this situation. Obviously it would be good to have more non-Continuum developers, but also obviously, there is a risk that that won't happen. > Sometimes it is worth taking some risks because it means one can go > forward faster. Possibly *a lot* faster, if one shifts things from email > to personal communication. Yes, obviously it's in no-one's interest to slow down the Continuum developers. I wonder though whether there is a way of organizing things, that does not slow down the Continuum developers, but does keep the sense of community involvement and ownership. > It is not like the current versions of NumPy disappear. If things do go > wrong and NumPy is developed in some crazy direction, it's easy to go > for the stagnated option simply by taking the current release and > maintain bugfixes on it. But we all want to avoid a fork, which is what that could easily become. See you, Matthew From cjordan1 at uw.edu Thu Feb 16 02:11:36 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 15 Feb 2012 23:11:36 -0800 Subject: [Numpy-discussion] Numpy governance update - was: Updated differences between 1.5.1 to 1.6.1 In-Reply-To: <1F3C1CFF-D744-4741-BBBA-07BC04658DDC@continuum.io> References: <1F3C1CFF-D744-4741-BBBA-07BC04658DDC@continuum.io> Message-ID: On Tue, Feb 14, 2012 at 5:17 PM, Travis Oliphant wrote: >>> >>> Your points are well taken. ? However, my point is that this has been discussed on an open mailing list. ? Things weren't *as* open as they could have been, perhaps, in terms of board selection. ?But, there was opportunity for people to provide input. >> >> I am on the numpy, scipy, matplotlib, ipython and cython mailing >> lists. ?Jarrod and Fernando are friends of mine. ?I've been obviously >> concerned about numpy governance for some time. ?I didn't know about >> this mailing list, had only a vague idea that some sort of foundation >> was being proposed and I had no idea at all that you'd selected a >> board. ?Would you say that was closer to 'open' or closer to 'closed'? > > I see it a different way. ? 
?First, the Foundation is not a NumPy-governance thing. ? Certainly it could grow in that direction over time, but it isn't there now, nor is that its goal. ? ? Second, the Foundation is just getting started. ? ?It's only come together over the past couple of weeks. ? ?The fact that we are talking about it now, seems to me to indicate that it is quite "open" --- certainly closer to 'open' then you seem to imply. ? ? ?Also, the fact that there was a public mailing list for its discussion certainly sounds "open" to me (poorly advertised I will grant you). ? ? I tried to include as many people as I thought were interested by the responses to the initial emails on the list. ? ?I reached out to people that contacted me expressing their interest, and included them on the mailing list. ? ? I can accept that I made mistakes. ? I can guarantee that I will make more. ? Your feedback is appreciated and noted. > > The fact is that the Foundation is really a service organization that will require a lot of work to run and administer. ? ?It's effectiveness at fulfilling its mission will depend on how well it serves the group on this list, as well as the other groups that are working on Python for Science. ? I'm all for getting as many volunteers as we can get for the Foundation. ? I've just been trying to get things organized. ? Sometimes this works best by phone calls and direct contact, rather than mailing lists. > > For those interested. ? The Foundation mission is to: > > ? ? ? ?* Promote Open Source Software for Science > ? ? ? ?* Fund Open Source Projects in Science (currently NumPy, SciPy, IPython, and Matplotlib are first-tier with a whole host of second-tier projects that could received funding) > ? ? ? ? ? ? ? ?* through grants > ? ? ? ? ? ? ? ?* through code bounties > ? ? ? ? ? ? ? ?* through graduate-student scholarships > ? ? ? ?* Sponsor sprints > ? ? ? ?* Sponsor conferences > ? ? ? ?* Sponsor student travel > ? ? ? ?* etc., etc. > > Whether or not it can do any of those things depends on whether or not it can raise money from people and organizations that benefit from the Scientific Python Stack. ? ?All of this will be advertised more as the year progresses. > This sounds really exciting. I'm looking forward to seeing what you, Mark, et al release over the next year. -Chris Jordan-Squire > Best regards, > > -Travis > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From cjordan1 at uw.edu Thu Feb 16 02:37:31 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 15 Feb 2012 23:37:31 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Wed, Feb 15, 2012 at 5:26 PM, Mark Wiebe wrote: > On Wed, Feb 15, 2012 at 4:57 PM, wrote: >> >> On Wed, Feb 15, 2012 at 7:27 PM, Dag Sverre Seljebotn >> wrote: >> > On 02/15/2012 02:24 PM, Mark Wiebe wrote: >> >> On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett > >> > wrote: >> >> >> >> ? ? Hi, >> >> >> >> ? ? On Wed, Feb 15, 2012 at 12:55 PM, Mark Wiebe > >> ? ? > wrote: >> >> ? ? ?> On Wed, Feb 15, 2012 at 12:09 PM, Matthew Brett >> >> ? ? > >> >> ? ? ?> wrote: >> >> ? ? ?>> >> >> ? ? ?>> Hi, >> >> ? ? ?>> >> >> ? ? ?>> On Wed, Feb 15, 2012 at 11:46 AM, Benjamin Root >> >> > >> ? ? > wrote: >> >> ? ? ?>> > >> >> ? ? ?>> > >> >> ? ? ?>> > On Wed, Feb 15, 2012 at 1:32 PM, Alan G Isaac >> >> ? ? 
> >> >> ? ? ?>> > wrote: >> >> ? ? ?>> >> Can you provide an example where a more formal >> >> ? ? ?>> >> governance structure for NumPy would have meant >> >> ? ? ?>> >> more or better code development? (Please do not >> >> ? ? ?>> >> suggest the NA discussion!) >> >> ? ? ?>> >> >> >> ? ? ?>> > >> >> ? ? ?>> > Why not the NA discussion? ?Would we really want to have that >> >> ? ? happen >> >> ? ? ?>> > again? >> >> ? ? ?>> > Note that it still isn't fully resolved and progress still >> >> ? ? needs to be >> >> ? ? ?>> > made >> >> ? ? ?>> > (I think the last thread did an excellent job of fleshing out >> >> ? ? the ideas, >> >> ? ? ?>> > but >> >> ? ? ?>> > it became too much to digest. ?We may need to have someone go >> >> ? ? through >> >> ? ? ?>> > the >> >> ? ? ?>> > information, reduce it down and make one last push to bring >> >> it >> >> ? ? to a >> >> ? ? ?>> > conclusion). ?The NA discussion is the perfect example where >> >> a >> >> ? ? ?>> > governance >> >> ? ? ?>> > structure would help resolve disputes. >> >> ? ? ?>> >> >> ? ? ?>> Yes, that was the most obvious example. I don't know about you, >> >> ? ? but I >> >> ? ? ?>> can't see any sign of that one being resolved. >> >> ? ? ?>> >> >> ? ? ?>> The other obvious example was the dispute about ABI breakage >> >> for >> >> ? ? numpy >> >> ? ? ?>> 1.5.0 where I believe Travis did invoke some sort of committee >> >> to >> >> ? ? ?>> vote, but (Travis can correct me if I'm wrong), the committee >> >> was >> >> ? ? ?>> named ad-hoc and contacted off-list. >> >> ? ? ?>> >> >> ? ? ?>> > >> >> ? ? ?>> >> >> >> ? ? ?>> >> Can you provide an example of what you might >> >> ? ? ?>> >> envision as a "more formal governance structure"? >> >> ? ? ?>> >> (I assume that any such structure will not put people >> >> ? ? ?>> >> who are not core contributors to NumPy in a position >> >> ? ? ?>> >> to tell core contributors what to spend their time on.) >> >> ? ? ?>> >> >> >> ? ? ?>> >> Early last December, Chuck Harris estimated that three >> >> ? ? ?>> >> people were active NumPy developers. ?I liked the idea of >> >> ? ? ?>> >> creating a "board" of these 3 and a rule that says any >> >> ? ? ?>> >> active developer can request to join the board, that >> >> ? ? ?>> >> additions are determined by majority vote of the existing >> >> ? ? ?>> >> board, and ?that having the board both small and odd >> >> ? ? ?>> >> numbered is a priority. ?I also suggested inviting to this >> >> ? ? ?>> >> board a developer or two from important projects that are >> >> ? ? ?>> >> very NumPy dependent (e.g., Matplotlib). >> >> ? ? ?>> >> >> >> ? ? ?>> >> I still like this idea. ?Would it fully satisfy you? >> >> ? ? ?>> >> >> >> ? ? ?>> > >> >> ? ? ?>> > I actually like that idea. ?Matthew, is this along the lines >> >> ? ? of what you >> >> ? ? ?>> > were thinking? >> >> ? ? ?>> >> >> ? ? ?>> Honestly it would make me very happy if the discussion moved to >> >> what >> >> ? ? ?>> form the governance should take. ?I would have thought that 3 >> >> ? ? was too >> >> ? ? ?>> small a number. >> >> ? ? ?> >> >> ? ? ?> >> >> ? ? ?> One thing to note about this point is that during the NA >> >> ? ? discussion, the >> >> ? ? ?> only people doing active C-level development were Charles and >> >> me. >> >> ? ? I suspect >> >> ? ? ?> a discussion about how to recruit more people into that group >> >> ? ? might be more >> >> ? ? ?> important than governance at this point in time. >> >> >> >> ? ? 
Mark - a) thanks for replying, it's good to hear your voice and b) >> >> I >> >> ? ? don't think there's any competition between the discussion about >> >> ? ? governance and the need to recruit more people into the group who >> >> ? ? understand the C code. >> >> >> >> >> >> There hasn't really been any discussion about recruiting developers to >> >> compete with the governance topic, now we can let the topics compete. >> >> :) >> >> >> >> Some of the mechanisms which will help are already being set in motion >> >> through the discussion about better infrastructure support like bug >> >> trackers and continuous integration. The forthcoming roadmap discussion >> >> Travis alluded to, where we will propose a roadmap for review by the >> >> numpy user community, will include many more such points. >> >> >> >> ? ? Remember we are deciding here between governance - of a form to be >> >> ? ? decided - and no governance - which I think is the current >> >> situation. >> >> ? ? I know your desire is to see more people contributing to the C >> >> code. >> >> ? ? It would help a lot if you could say what you think the barriers >> >> are, >> >> ? ? how they could be lowered, and the risks that you see as a result >> >> of >> >> ? ? the numpy C expertise moving essentially into one company. ?Then we >> >> ? ? can formulate some governance that would help lower those barriers >> >> and >> >> ? ? reduce those risks. >> >> >> >> >> >> There certainly is governance now, it's just informal. It's a >> >> combination of how the design discussions are carried out, how pull >> >> requests occur, and who has commit rights. >> > >> > +1 >> > >> > If non-contributing users came along on the Cython list demanding that >> > we set up a system to select non-developers along on a board that would >> > have discussions in order to veto pull requests, I don't know whether >> > we'd ignore it or ridicule it or try to show some patience, but we >> > certainly wouldn't take it seriously. >> > >> > It's obvious that one should try for consensus as long as possible, >> > including listening to users. But in the very end, when agreement can't >> > be reached by other means, the developers are the one making the calls. >> > (This is simply a consequence that they are the only ones who can >> > credibly threaten to fork the project.) >> > >> > Sure, structures that includes users in the process could be useful... >> > but, if the devs are fine with the current situation (and I don't see >> > Mark or Charles complaining), then I honestly think it is quite rude to >> > not let the matter drop after the first ten posts or so. >> > >> > Making things the way one wants it and scratching *ones own* itch is THE >> > engine of open source development (whether one is putting in spare time >> > or monetary funding). Trying to work against that with artificial >> > structures doesn't sound wise for a project with as few devs as NumPy... >> >> I don't think you can restrict the Numpy developer or contributor >> group just to the developers that work on the C core like Charles and >> Mark, and others over the years I have been following it ( Pauli and >> David, ...). >> There is a large part of non C numpy, Pierre for example, and Joe >> Harrington put money and a lot of effort into bringing the >> documentation into the current state, the documentation was mostly a >> community effort. > > > This is very true, at the moment the number of people doing feature-work > within numpy purely in Python is similarly small and sporadic. 
Here's a > current example: > > https://github.com/numpy/numpy/pull/198 > > Having such a small development core is one of the reasons it often takes a > while for such pull requests to get reviewed by someone, and a situation > Continuum and anyone else with resources to contribute can help improve. One > thing that's clear to me is that the current documentation on how to > contribute code, documentation, and other help to NumPy is lacking, and this > is something that needs improvement. > > An example I really like is LibreOffice's "get involved" page. > > http://www.libreoffice.org/get-involved/ > > Producing something similar for NumPy will take some work, but I believe > it's needed. > +1 to Mark, Perry, Dag. On the topic of 'nifty webpages other projects have that I wish numpy/scipy had', eigen (an open-source C++ linear algebra library) has a nice To Do list that gives concrete, approachable things new developers could add to: http://eigen.tuxfamily.org/index.php?title=Todo. 'Course, that would mean people agreeing on what there was to do.... -Chris JS > Cheers, > Mark > >> >> >> Of course I only ever contributed to scipy, except of two or three >> bugfixes in numpy.random, but I still care about the direction numpy >> is going, as do developers of the "SciPy" community which crucially >> rely on numpy. >> >> Josef >> >> >> > >> > Dag >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Thu Feb 16 02:58:36 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 16 Feb 2012 07:58:36 +0000 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <0EB48A59-AEAF-44F0-B910-8B1DC9B574B8@streamitive.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <0EB48A59-AEAF-44F0-B910-8B1DC9B574B8@streamitive.com> Message-ID: On Wed, Feb 15, 2012 at 10:30 PM, Peter Wang wrote: > On Feb 15, 2012, at 3:36 PM, Matthew Brett wrote: > >> Honestly - as I was saying to Alan and indirectly to Ben - any formal >> model - at all - is preferable to the current situation. Personally, I >> would say that making the founder of a company, which is working to >> make money from Numpy, the only decision maker on numpy - is - scary. > > How is this different from the situation of the last 4 years? ?Travis was President at Enthought, which makes money from not only Numpy but SciPy as well. ?In addition to employing Travis, Enthought also employees many other key contributors to Numpy and Scipy, like Robert and David. ?Furthermore, the Scipy and Numpy mailing lists and repos and web pages were all hosted at Enthought. ?If they didn't like how a particular discussion was going, they could have memory-holed the entire conversation from the archives, or worse yet, revoked commit access and reverted changes. I actually think it is somehow different. For once, while Travis was at Enthought, he contributed much less to the discussions (by his own account), so the risk of conflict of interest was not very high. 
My own contributions to numpy since I have joined Enthought are close to nil as well :) There have been cases of disagreements on NumPy: for any case where the decision taken by people from one company would prevail, you will not be able to prevent people from thinking the interests of the company prevailed. In numpy, where people make a suggestion and there was not enough review, the feature generally went in. This is fundamentally different from most open source projects I am aware of, and could go bad when considered with my previous point. As far as I am concerned, the following would be enough to resolve any issues: - having one (or more) persons outside any company interest (e.g. Chuck, Pauli) with a veto. - no significant feature goes in without a review from people outside the organization it is coming from. David From cournape at gmail.com Thu Feb 16 03:47:28 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 16 Feb 2012 08:47:28 +0000 Subject: [Numpy-discussion] Updated differences between 1.5.1 to 1.6.1 In-Reply-To: <94A35936-A58E-4C4C-9909-3A4A88A07B1A@continuum.io> References: <94A35936-A58E-4C4C-9909-3A4A88A07B1A@continuum.io> Message-ID: On Tue, Feb 14, 2012 at 6:25 PM, Travis Oliphant wrote: > > On Feb 14, 2012, at 3:32 AM, David Cournapeau wrote: > >> Hi Travis, >> >> It is great that some resources can be spent to have people paid to >> work on NumPy. Thank you for making that happen. >> >> I am slightly confused about roadmaps for numpy 1.8 and 2.0. This >> needs discussion on the ML, and our release manager currently is Ralf >> - he is the one who ultimately decides what goes when. > > Thank you for reminding me of this. ?Ralf and I spoke several days ago, and have been working on how to give him more time to spend on SciPy full-time. ? As a result, he will be release managing NumPy 1.7, but for NumPy 1.8, I will be the release manager again. ? Ralf will continue serving as release manager for SciPy. > > For NumPy 2.0 and beyond, Mark Wiebe will likely be the release manager. ? I only know that I won't be release manager past NumPy 1.X. > >> I am also not >> completely comfortable by having a roadmap advertised to Pycon not >> coming from the community. > > This is my bad wording which is a function of being up very late. ? ?At PyCon we will be discussing the roadmap conversations that are taking place on this list. ? We won't be presenting anything there related to the NumPy project that has not first been discussed here. Thanks for clarifying this Travis, that makes it much clearer. Looking forward to hearing what will be presented at pycon ! David From pav at iki.fi Thu Feb 16 04:44:40 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 16 Feb 2012 10:44:40 +0100 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: Hi, 16.02.2012 06:09, josef.pktd at gmail.com kirjoitti: [clip] > numpy linalg.svd doesn't produce always the same results > > running this gives two different answers, > using scipy.linalg.svd I always get the same answer, which is one of > the numpy answers > (numpy random.multivariate_normal is collateral damage) Are you using a Windows binary for Numpy compiled with the Intel compilers, or maybe linked with Intel MKL? If yes, one possibility is that the exact sequence of floating point operations in SVD or some other step in the calculation depends on the data alignment, which can affect rounding error. 
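One rough way to probe the alignment hypothesis (only a sketch, nothing anyone here reports having actually run; whether it reproduces the effect depends entirely on the LAPACK/BLAS build) is to copy the same matrix into differently offset slices of a larger buffer, so that the data pointer alignment changes, and compare the returned singular vectors:

--------
import numpy as np

d = 10
alpha = 1 / d**0.5
R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d)

# same matrix, copied into buffers whose data pointers start at different
# element offsets, so the 16-byte alignment of the data is likely to vary
for offset in range(4):
    buf = np.empty(d * d + offset)
    A = buf[offset:offset + d * d].reshape(d, d)
    A[...] = R
    u, s, vH = np.linalg.svd(A)
    print(offset, vH[-1, :3])
--------

The only thing varied between iterations is the alignment of the input; if the printed rows differ, that points at alignment-dependent rounding rather than at the random number generator itself.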
See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf That would explain why the pattern you see is quasi-deterministic. The other explanation would be using uninitialized memory at some point, but that seems quite unlikely. -- Pauli Virtanen From paul.anton.letnes at gmail.com Thu Feb 16 04:51:10 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 16 Feb 2012 10:51:10 +0100 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: <80BD2AE1-E315-4485-B569-5D0382F7DDFC@gmail.com> > > An example I really like is LibreOffice's "get involved" page. > > http://www.libreoffice.org/get-involved/ > > Producing something similar for NumPy will take some work, but I believe it's needed. Speaking as someone who has contributed to numpy in a microscopic fashion, I agree completely. I spent quite a few hours digging through the webpages, asking for help on the mailing list, reading the Trac, reading git tutorials etc. before I managed to do something remotely useful. In general, I think the webpage for numpy (and scipy, but let's not discuss that here) would benefit from some refurbishing, including the documentation pages. As an example, one of the links on the webpage is "Numpy for MATLAB users". I never used matlab much, so this is completely irrelevant for me. I think there should be a discussion about what goes on the front page, and it should be as little as possible, but not less than that. Make it easy for people to 1) start using numpy 2) reading detailed documentation 3) reporting bugs 4) contributing to numpy because those are the fundamental things a user/developer wants from an open source project. Right now there's Trac, github, numpy.scipy.org, http://docs.scipy.org/doc/, the mailing list, and someone mentioned a google group discussing something or other. It took me years to figure out how things are patched together, and I'm still not sure exactly who reads the Trac discussion, github discussion, and mailing list discussions. tl;dr: Numpy is awesome (TM) but needs a more coherent online presence, and one that makes it easy to contribute back to the project. Thanks for making numpy awesome! Paul From jason-sage at creativetrax.com Thu Feb 16 06:15:14 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Thu, 16 Feb 2012 05:15:14 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C4D58.6050007@astro.uio.no> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: <4F3CE542.9040808@creativetrax.com> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote: > But in the very end, when agreement can't > be reached by other means, the developers are the one making the calls. > (This is simply a consequence that they are the only ones who can > credibly threaten to fork the project.) Interesting point. I hope I'm not pitching a log onto the fire here, but in numpy's case, there are very many capable developers on other projects who depend on numpy who could credibly threaten a fork if they felt numpy was drastically going wrong. 
Jason From perry at stsci.edu Thu Feb 16 06:49:06 2012 From: perry at stsci.edu (Perry Greenfield) Date: Thu, 16 Feb 2012 06:49:06 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: Message-ID: <3BDB714A-CBAA-42DA-88E9-3C6BEDBF570C@stsci.edu> On Feb 15, 2012, at 6:18 PM, Joe Harrington wrote: > > > Of course, balancing all of this (and our security blanket) is the > possibility of someone splitting the code if they don't like how > Continuum runs things. Perry, you've done that yourself to this > code's > predecessor, so you know the risks. You did that in response to one > constituency's moving the code in a direction you didn't like (or not > moving it in one you did, I don't remember exactly), as in your > example > #2. So, while progress might be made when that happens, last time it > hurt astronomers enough that you rolled your own and had to put > several > FTE on the problem. That split held back adoption of numpy both in > the > astronomy community and outside it, for like 5 years. Perhaps some > governance would have saved you the effort and cost and the community > the grief of the numarray split. Of course, lots of good eventually > came from the split. It wasn't quite like that (hindsight often obscures the perspective at the time). At that time, there was a quasi-consensus that Numeric needed some sort of rewrite. When we started numarray, it wasn't our intent to split the community. That did happen since numarray didn't satisfy enough of the community to get them to buy into it. (It's even more involved than that, but there is no need to rehash those details). I'm not sure what to make of the claim the split held back adoption of numpy. It only makes sense if you say it held back adoption of Numeric in the astronomy community. Numpy wasn't available, and when it was, it didn't take nearly that long to get adopted. I'd have to check, but I'm pretty sure we switched to using it as quickly as possible once it was ready to use. And I still maintain Numeric wasn't really suitable for our needs. Some overhaul was needed, and with that would have been some pain. Could it have all gone smoother somehow? In some ideal world, perhaps. But maybe numarray was a secret plot to get Travis to do numpy all along, and that was the only way to get where we needed to get ;-) Perry From francesc at continuum.io Thu Feb 16 07:23:51 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 16 Feb 2012 13:23:51 +0100 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3CE542.9040808@creativetrax.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> Message-ID: On Feb 16, 2012, at 12:15 PM, Jason Grout wrote: > On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote: >> But in the very end, when agreement can't >> be reached by other means, the developers are the one making the calls. >> (This is simply a consequence that they are the only ones who can >> credibly threaten to fork the project.) > > Interesting point. I hope I'm not pitching a log onto the fire here, > but in numpy's case, there are very many capable developers on other > projects who depend on numpy who could credibly threaten a fork if they > felt numpy was drastically going wrong. Jason, that there capable developers out there that are able to fork NumPy (or any other project you can realize) is a given. 
The point Dag was signaling is that this threaten is more probable to happen *inside* the community. And you pointed out an important aspect too by saying "if they felt numpy was drastically going wrong". It makes me the impression that some people is very frightened about something really bad would happen, well before it happens. While I agree that this is *possible*, I'd also advocate to give Travis the benefit of doubt. I'm convinced he (and Continuum as a whole) is making things happen that will benefit the entire NumPy community; but in case something gets really wrong and catastrophic, it is always a relief to know that things can be reverted in the pure open source tradition (by either doing a fork, creating a new foundation, or even better, proposing a new way to do things). What it does not sound reasonable to me is to allow fear to block Continuum efforts for making a better NumPy. I think it is better to relax a bit, see how things are going, and then judge by looking at the *results*. My two cents, Disclaimer: As my e-mail address makes clear, I'm a Continuum guy. -- Francesc Alted From jason-sage at creativetrax.com Thu Feb 16 07:38:41 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Thu, 16 Feb 2012 06:38:41 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> Message-ID: <4F3CF8D1.4040803@creativetrax.com> On 2/16/12 6:23 AM, Francesc Alted wrote: > On Feb 16, 2012, at 12:15 PM, Jason Grout wrote: > >> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote: >>> But in the very end, when agreement can't be reached by other >>> means, the developers are the one making the calls. (This is >>> simply a consequence that they are the only ones who can credibly >>> threaten to fork the project.) >> >> Interesting point. I hope I'm not pitching a log onto the fire >> here, but in numpy's case, there are very many capable developers >> on other projects who depend on numpy who could credibly threaten a >> fork if they felt numpy was drastically going wrong. > > Jason, that there capable developers out there that are able to fork > NumPy (or any other project you can realize) is a given. The point > Dag was signaling is that this threaten is more probable to happen > *inside* the community. Sure. Given numpy's status as a fundamental building block of many systems, though, if there was a perceived problem by downstream, it's more liable to be forked than most other projects that aren't so close to the headwaters. > > And you pointed out an important aspect too by saying "if they felt > numpy was drastically going wrong". It makes me the impression that > some people is very frightened about something really bad would > happen, well before it happens. While I agree that this is > *possible*, I'd also advocate to give Travis the benefit of doubt. > I'm convinced he (and Continuum as a whole) is making things happen > that will benefit the entire NumPy community; but in case something > gets really wrong and catastrophic, it is always a relief to know > that things can be reverted in the pure open source tradition (by > either doing a fork, creating a new foundation, or even better, > proposing a new way to do things). What it does not sound reasonable > to me is to allow fear to block Continuum efforts for making a better > NumPy. 
I think it is better to relax a bit, see how things are > going, and then judge by looking at the *results*. I'm really happy about Continuum. I agree with Mark that numpy certainly could use a few more core developers. I've not decided on how much structure I feel numpy governance needs (nor do I think it's particularly important for me to decide how I feel at this point on the subject). Jason From takowl at gmail.com Thu Feb 16 08:08:37 2012 From: takowl at gmail.com (Thomas Kluyver) Date: Thu, 16 Feb 2012 13:08:37 +0000 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3CF8D1.4040803@creativetrax.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> <4F3CF8D1.4040803@creativetrax.com> Message-ID: If I can chime in as a newcomer on this list: I don't think a conflict of interest is at all likely, but I can see the point of those saying that it's worth thinking about this while everything is going well. If any tension does arise, it will be all but impossible to decide on a fair governance structure, because everyone will root for the system that looks likely to produce their favoured outcome. It strikes me that the effort everyone's put into this thread could have by now designed some way to resolve disputes. ;-) It could be as simple as 'so-and-so gets to make the final call', through to committees, voting systems, etc. So long as everything's going well, it shouldn't restrict anyone, and it would reassure anyone who does have concerns (justified or not) about conflicts of interest. Thanks, Thomas From josef.pktd at gmail.com Thu Feb 16 08:14:40 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 08:14:40 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 4:44 AM, Pauli Virtanen wrote: > Hi, > > 16.02.2012 06:09, josef.pktd at gmail.com kirjoitti: > [clip] >> numpy linalg.svd doesn't produce always the same results >> >> running this gives two different answers, >> using scipy.linalg.svd I always get the same answer, which is one of >> the numpy answers >> (numpy random.multivariate_normal is collateral damage) > > Are you using a Windows binary for Numpy compiled with the Intel > compilers, or maybe linked with Intel MKL? This was with the official numpy installer, compiled with MingW I just tried with 64 bit python 3.2 with MKL (Gohlke installer) and in several runs I always get the same answer. > > If yes, one possibility is that the exact sequence of floating point > operations in SVD or some other step in the calculation depends on the > data alignment, which can affect rounding error. > > See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf > > That would explain why the pattern you see is quasi-deterministic. The > other explanation would be using uninitialized memory at some point, but > that seems quite unlikely. Running the script on the commandline I always get several patterns, but running the script in the same process didn't converge to a unique pattern. We had other cases of several patterns in quasi-deterministic linalg before, but as far as I remember only in the final digits of precision, where it didn't matter much except for reducing test precision in my cases. In the random multivariate normal case in the ticket the differences are large, which makes them pretty unreliable and useless for reproducability. 
Josef > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Thu Feb 16 08:45:39 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 16 Feb 2012 14:45:39 +0100 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: 16.02.2012 14:14, josef.pktd at gmail.com kirjoitti: [clip] > We had other cases of several patterns in quasi-deterministic linalg > before, but as far as I remember only in the final digits of > precision, where it didn't matter much except for reducing test > precision in my cases. > > In the random multivariate normal case in the ticket the differences > are large, which makes them pretty unreliable and useless for > reproducability. Now that I read your mail more carefully, the following piece of code indeed does not give reproducible results on Linux with ATLAS either: -------- import numpy as np from numpy.linalg import svd d = 10 alpha = 1 / d**0.5 mu = np.ones(d) R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d) for i in range(10): u, s, vH = svd(R) print vH[-1,1], abs(u.dot(np.diag(s)).dot(vH)-R).max() print s ----------- Of course, the returned SVD decomposition *is* correct in all cases. The reason seems to be that the matrix has 9 coinciding singular values, and the (alignment-dependent) rounding error is sufficient to perturb the choice (or order?) of singular vectors. So, the algorithm used to generate multivariate normal random numbers is then actually numerically unstable, as it relies on the order of singular vectors returned by SVD. I'm not sure how to fix this. Maybe the vectors returned by SVD should be sorted if there are numerically close singular values. Just ensuring alignment of the input probably won't guarantee reproducibility across platforms. Please file a bug ticket, so this doesn't get forgotten... Pauli From josef.pktd at gmail.com Thu Feb 16 08:46:01 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 08:46:01 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 8:14 AM, wrote: > On Thu, Feb 16, 2012 at 4:44 AM, Pauli Virtanen wrote: >> Hi, >> >> 16.02.2012 06:09, josef.pktd at gmail.com kirjoitti: >> [clip] >>> numpy linalg.svd doesn't produce always the same results >>> >>> running this gives two different answers, >>> using scipy.linalg.svd I always get the same answer, which is one of >>> the numpy answers >>> (numpy random.multivariate_normal is collateral damage) >> >> Are you using a Windows binary for Numpy compiled with the Intel >> compilers, or maybe linked with Intel MKL? > > This was with the official numpy installer, compiled with MingW > > I just tried with 64 bit python 3.2 with MKL (Gohlke installer) and in > several runs I always get the same answer. > >> >> If yes, one possibility is that the exact sequence of floating point >> operations in SVD or some other step in the calculation depends on the >> data alignment, which can affect rounding error. >> >> See http://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf >> >> That would explain why the pattern you see is quasi-deterministic. The >> other explanation would be using uninitialized memory at some point, but >> that seems quite unlikely. 
> > Running the script on the commandline I always get several patterns, > but running the script in the same process didn't converge to a unique > pattern. > > We had other cases of several patterns in quasi-deterministic linalg > before, but as far as I remember only in the final digits of > precision, where it didn't matter much except for reducing test > precision in my cases. > > In the random multivariate normal case in the ticket the differences > are large, which makes them pretty unreliable and useless for > reproducability. linalg question Is there anything special, or are there specific numerical problems with an svd when most singular values are the same ? The example has all random variables equal correlated singular values >>> s array([ 3.84604989, 0.68377223, 0.68377223, 0.68377223, 0.68377223, 0.68377223, 0.68377223, 0.68377223, 0.68377223, 0.68377223]) Josef > > Josef > >> >> -- >> Pauli Virtanen >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Thu Feb 16 08:54:50 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 08:54:50 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 8:45 AM, Pauli Virtanen wrote: > 16.02.2012 14:14, josef.pktd at gmail.com kirjoitti: > [clip] >> We had other cases of several patterns in quasi-deterministic linalg >> before, but as far as I remember only in the final digits of >> precision, where it didn't matter much except for reducing test >> precision in my cases. >> >> In the random multivariate normal case in the ticket the differences >> are large, which makes them pretty unreliable and useless for >> reproducability. > > Now that I read your mail more carefully, the following piece of code > indeed does not give reproducible results on Linux with ATLAS either: > > -------- > import numpy as np > from numpy.linalg import svd > > d = 10 > alpha = 1 / d**0.5 > mu = np.ones(d) > R = alpha * np.ones((d, d)) + (1 - alpha) * np.eye(d) > > for i in range(10): > ? ?u, s, vH = svd(R) > ? ?print vH[-1,1], abs(u.dot(np.diag(s)).dot(vH)-R).max() > print s > ----------- > > Of course, the returned SVD decomposition *is* correct in all cases. > > The reason seems to be that the matrix has 9 coinciding singular values, > and the (alignment-dependent) rounding error is sufficient to perturb > the choice (or order?) of singular vectors. > > So, the algorithm used to generate multivariate normal random numbers is > then actually numerically unstable, as it relies on the order of > singular vectors returned by SVD. > > I'm not sure how to fix this. Maybe the vectors returned by SVD should > be sorted if there are numerically close singular values. Just ensuring > alignment of the input probably won't guarantee reproducibility across > platforms. > > Please file a bug ticket, so this doesn't get forgotten... the multivariate normal case is already http://projects.scipy.org/numpy/ticket/1842 I can add the diagnosis. If I interpret you correctly, this should be a svd ticket, or an svd ticket as "duplicate" ? Thanks, Josef > > ? ? ? 
?Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From scott.sinclair.za at gmail.com Thu Feb 16 09:06:26 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Thu, 16 Feb 2012 16:06:26 +0200 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> <4F3CF8D1.4040803@creativetrax.com> Message-ID: On 16 February 2012 15:08, Thomas Kluyver wrote: > It strikes me that the effort everyone's put into this thread could > have by now designed some way to resolve disputes. ;-) This is not intended to downplay the concerns raised in this thread, but I can't help myself. I propose the following (tongue-in-cheek) patch against the current numpy master branch. https://github.com/scottza/numpy/compare/constitution If this gets enough interest, I'll consider submitting a "real" pull request ;-) Cheers, Scott From pav at iki.fi Thu Feb 16 09:08:46 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 16 Feb 2012 15:08:46 +0100 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: 16.02.2012 14:54, josef.pktd at gmail.com kirjoitti: [clip] > If I interpret you correctly, this should be a svd ticket, or an svd > ticket as "duplicate" ? I think it should be a multivariate normal ticket. "Fixing" SVD is in my opinion not sensible: its only guarantee is that A = U S V^H down to numerical precision and S are sorted. If the algorithm assumes something extra, it is wrong. This sort of reproducibility issues affect potentially all code (depends on the compiler and libraries used), and trying to combat it at the linalg level is IMHO not our business --- if someone really wants it, they should tell their C compiler and all libraries to use a reproducible FP model. However, we should ensure the algorithms we provide are stable against rounding error. In this case, the random number generation is not, so it should be fixed. Pauli From jason-sage at creativetrax.com Thu Feb 16 09:45:09 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Thu, 16 Feb 2012 08:45:09 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> <4F3CF8D1.4040803@creativetrax.com> Message-ID: <4F3D1675.4040107@creativetrax.com> On 2/16/12 8:06 AM, Scott Sinclair wrote: > On 16 February 2012 15:08, Thomas Kluyver wrote: >> It strikes me that the effort everyone's put into this thread could >> have by now designed some way to resolve disputes. ;-) > > This is not intended to downplay the concerns raised in this thread, > but I can't help myself. > > I propose the following (tongue-in-cheek) patch against the current > numpy master branch. > > https://github.com/scottza/numpy/compare/constitution > > If this gets enough interest, I'll consider submitting a "real" pull request ;-) Time to start submitting lots of 1-line commits and typo fixes to pad my karma :). 
Jason From pwang at streamitive.com Thu Feb 16 10:17:53 2012 From: pwang at streamitive.com (Peter Wang) Date: Thu, 16 Feb 2012 09:17:53 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3C987E.4060005@astro.uio.no> Message-ID: <448F2982-602C-4D41-838B-A5ED171779E6@streamitive.com> On Feb 16, 2012, at 12:08 AM, Matthew Brett wrote: >> The question is more about what can possibly be done about it. To really >> shift power, my hunch is that the only practical way would be to, like >> Mark said, make sure there are very active non-Continuum-employed >> developers. But perhaps I'm wrong. > > It's not obvious to me that there isn't a set of guidelines, > procedures, structures that would help to keep things clear in this > situation. Matthew, I think this is the crux of the issue. There are two kinds of disagreements which could polarize Numpy development: disagreements over vision/values, and disagreements over implementation. The latter can be (and has been) resolved in an ad-hoc fashion because we are all consenting adults here, and as long as there is a consensus about the shared values (i.e. long-term vision) of the project, we can usually work something out. Disagreements over values and long-term vision are the ones that actually do split developer communities, and which procedural guidelines are really quite poor at resolving. In the realm of open source software, value differences (most commonly, licensing disagreements) generally manifest as forks, regardless of what governance may be in place. At the end of the day, you cannot compel people to continue committing to a project that they feel is going the *wrong direction*, not merely the right direction in the wrong way. In the physical world, where we are forced to share geographic space with people who may have vastly different values, it is useful to have a framework for resolution of value differences, because a fork attempt usually means physical warfare. Hence, constitutions, checks & balances, impeachment procedures, etc. are all there to avoid forking. But with software, forks are not so costly, and not always a bad thing. Numpy itself arose from merging Numeric and its fork, Numarray, and X.org and EGCS are examples of big forks of major projects which later became the mainline trunk. In short, even if you *could* put governance in place to prevent a fork, that's not always a Good Thing. Creative destruction is vital to the health of any organism or ecosystem, because that is how evolution frequently achieves its greatest results. Of course, this is not to say that I have any desire to see Numpy forked. What I *do* desire is a modular, extensible core of Numpy will allow the experimentation and creative destruction to occur, while minimizing the merge effort when people realize that someone cool has been developed. Lowering the barrier to entry for hacking on the core array code is not merely for Continuum's benefit, but rather will benefit the ecosystem as a whole. No matter how one feels about the potential conflicts of interest, I think we can all agree that the alternative of stagnation is far, far worse. The only way to avoid stagnation is to give the hackers and rebels plenty of room to play, while ensuring a stable base platform for end users and downstream projects to avoid code churn. 
Travis's and Mark's roadmap proposals for creating a modular core and an extensible C-level ABI are a key technical mechanism for achieving this. Ultimately, procedures and guidelines are only a means to an end, not an ends unto themselves. Curiously enough, I have not yet seen anyone articulate the desire for those *ends* themselves to be written down or manifest as a document. Now, if the Numpy developers want to produce a "vision document" or "values statement" for the project, I think that would help as a reference point for any potential disagreements over the direction of the project as commercial stakeholders become involved. But, of course, the request for such a document is itself an unfunded mandate, so it's perfectly possible we may get a one-liner like "make Python scientific computing awesome." :-) -Peter Disclaimer: I work with Travis at Continuum. From josef.pktd at gmail.com Thu Feb 16 10:20:48 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 10:20:48 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 9:08 AM, Pauli Virtanen wrote: > 16.02.2012 14:54, josef.pktd at gmail.com kirjoitti: > [clip] >> If I interpret you correctly, this should be a svd ticket, or an svd >> ticket as "duplicate" ? > > I think it should be a multivariate normal ticket. > > "Fixing" SVD is in my opinion not sensible: its only guarantee is that A > = U S V^H down to numerical precision and S are sorted. If the algorithm > assumes something extra, it is wrong. This sort of reproducibility > issues affect potentially all code (depends on the compiler and > libraries used), and trying to combat it at the linalg level is IMHO not > our business --- if someone really wants it, they should tell their C > compiler and all libraries to use a reproducible FP model. I agree, I added the comments to the ticket. > > However, we should ensure the algorithms we provide are stable against > rounding error. In this case, the random number generation is not, so it > should be fixed. storing the last column of v vli = [] for i in range(10): (u,s,v) = svd(R) print('v[:,-1]') print(v[:,-4:]) vli.append(v[:, -1]) >>> np.unique([tuple(vv.tolist()) for vv in vli]) array([[-0.31622777, -0.11785113, 0.08706383, 0.42953906, 0.75736963, -0.31048693, -0.01693654, 0.10328164, -0.04417299, -0.10540926], [-0.31622777, -0.03661979, 0.61237244, -0.15302481, 0.0664198 , 0.11341968, 0.38265194, 0.51112292, -0.10540926, 0.25335061]]) The different v are not just a reordering of each other. If my linear algebra is correct, then the algorithm provides different basis vectors for the subspace with identical singular values. I don't see any way to fix multivariate_normal for this case, except for dropping svd or for random perturbing a covariance matrix with multiplicity of singular values. Josef > > ? ? ? 
?Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bsouthey at gmail.com Thu Feb 16 10:31:41 2012 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 16 Feb 2012 09:31:41 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> <4F3CF8D1.4040803@creativetrax.com> Message-ID: <4F3D215D.9010504@gmail.com> On 02/16/2012 08:06 AM, Scott Sinclair wrote: > On 16 February 2012 15:08, Thomas Kluyver wrote: >> It strikes me that the effort everyone's put into this thread could >> have by now designed some way to resolve disputes. ;-) > This is not intended to downplay the concerns raised in this thread, > but I can't help myself. > > I propose the following (tongue-in-cheek) patch against the current > numpy master branch. > > https://github.com/scottza/numpy/compare/constitution > > If this gets enough interest, I'll consider submitting a "real" pull request ;-) > > Cheers, > Scott > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Now that is totally disrespectful and just plain ignorant! Not to mention the inability to count people correctly. Yes, 'you pushed my button' so to speak. As I understand it, all the pre-git history just contains the information of the person who actually committed the change into the numpy trunk. It does not hold any information of Numeric and the history of numarray so I really question the accuracy of the counting. Also it misses many of the 'user' patches that lead to those changes (perhaps these user-patches are now in git). The second aspect is time frame as you do get a very different list if you just restrict it to 'current developers' eg adding '--since="1 year ago". It is disrespectful because many of the heated discussions are not about code per se but about the design and expected behavior. Counting commits or lines will never tell you any of those things. So I do agree with David's suggestion. Bruce From pierre.haessig at crans.org Thu Feb 16 11:12:36 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 16 Feb 2012 17:12:36 +0100 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: <4F3D2AF4.5050300@crans.org> Le 16/02/2012 16:20, josef.pktd at gmail.com a ?crit : > I don't see any way to fix multivariate_normal for this case, except > for dropping svd or for random perturbing a covariance matrix with > multiplicity of singular values. Hi, I just made a quick search in what R guys are doing. It happens there are several codes (http://cran.r-project.org/web/views/Multivariate.html ). For instance, mvtnorm (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached the related function from the source code of this package. Interestingly enough, it seems they provide 3 different methods (svd, eigen values, and Cholesky). I don't have the time now to dive in the assessments of pros and cons of those three. Maybe one works for our problem, but I didn't check yet. Pierre -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: mvnorm_from_mvtnorm.R URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: mvrnorm_from_MASS.R URL: From warren.weckesser at enthought.com Thu Feb 16 11:20:19 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 16 Feb 2012 10:20:19 -0600 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: <4F3D2AF4.5050300@crans.org> References: <4F3D2AF4.5050300@crans.org> Message-ID: On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig wrote: > Le 16/02/2012 16:20, josef.pktd at gmail.com a ?crit : > > I don't see any way to fix multivariate_normal for this case, except >> for dropping svd or for random perturbing a covariance matrix with >> multiplicity of singular values. >> > Hi, > I just made a quick search in what R guys are doing. It happens there are > several codes (http://cran.r-project.org/**web/views/Multivariate.html). For instance, mvtnorm ( > http://cran.r-project.org/**web/packages/mvtnorm/index.**html). > I've attached the related function from the source code of this package. > > Interestingly enough, it seems they provide 3 different methods (svd, > eigen values, and Cholesky). > I don't have the time now to dive in the assessments of pros and cons of > those three. Maybe one works for our problem, but I didn't check yet. > > Pierre > > For some alternatives to numpy's multivariate_normal, see http://www.scipy.org/Cookbook/CorrelatedRandomSamples. Both versions (Cholesky and eigh) are just a couple lines of code. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Feb 16 11:29:39 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 16 Feb 2012 16:29:39 +0000 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: <4F3D2AF4.5050300@crans.org> References: <4F3D2AF4.5050300@crans.org> Message-ID: On Thu, Feb 16, 2012 at 16:12, Pierre Haessig wrote: > Le 16/02/2012 16:20, josef.pktd at gmail.com a ?crit : > >> I don't see any way to fix multivariate_normal for this case, except >> for dropping svd or for random perturbing a covariance matrix with >> multiplicity of singular values. > > Hi, > I just made a quick search in what R guys are doing. It happens there are > several codes (http://cran.r-project.org/web/views/Multivariate.html ). For > instance, mvtnorm > (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached > the related function from the source code of this package. > > Interestingly enough, it seems they provide 3 different methods (svd, eigen > values, and Cholesky). > I don't have the time now to dive in the assessments of pros and cons of > those three. Maybe one works for our problem, but I didn't check yet. The main reason I used the SVD variant is because the Cholesky decomposition failed on some covariance matrices that were nearly not positive definite (i.e. had a nearly-0 eigenvalue). In the application that I extracted this code from, this was a valid thing to do; the deviates just inhabit an infinitely thin subspace of the main space, but are otherwise multivariate-normally-distributed in that subspace. I'm not too attached to the semantics. We should check that the Cholesky decomposition is stable before switching, though. The eigenvalue algorithm probably suffers from instability just as much as the SVD one. 
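For concreteness, the two Cookbook-style transforms Warren points to really are only a few lines each. The sketch below is just an illustration (the covariance matrix, sample size and variable names are mine, and this is not how np.random.multivariate_normal is actually implemented); both branches build a matrix A with A A^T == cov and push iid standard normals through it:

import numpy as np

cov = np.array([[ 3.40, -2.75, -2.00],
                [-2.75,  5.50,  1.50],
                [-2.00,  1.50,  1.25]])   # illustrative positive definite matrix
num = 10000
z = np.random.standard_normal((num, 3))   # iid N(0, 1) draws

# Cholesky version: requires cov to be strictly positive definite.
c = np.linalg.cholesky(cov)               # cov == c.dot(c.T)
x_chol = z.dot(c.T)

# eigh version: also works for positive semi-definite (singular) cov,
# since v * sqrt(w) exists whenever all eigenvalues are >= 0.
w, v = np.linalg.eigh(cov)
x_eigh = z.dot((v * np.sqrt(w)).T)

# Both sample covariances approach cov as num grows.
print(np.cov(x_chol, rowvar=False))
print(np.cov(x_eigh, rowvar=False))
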
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From josef.pktd at gmail.com Thu Feb 16 11:30:31 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 11:30:31 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: <4F3D2AF4.5050300@crans.org> Message-ID: On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser wrote: > > > On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig > wrote: >> >> Le 16/02/2012 16:20, josef.pktd at gmail.com a ?crit : >> >>> I don't see any way to fix multivariate_normal for this case, except >>> for dropping svd or for random perturbing a covariance matrix with >>> multiplicity of singular values. >> >> Hi, >> I just made a quick search in what R guys are doing. It happens there are >> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For >> instance, mvtnorm >> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached >> the related function from the source code of this package. >> >> Interestingly enough, it seems they provide 3 different methods (svd, >> eigen values, and Cholesky). >> I don't have the time now to dive in the assessments of pros and cons of >> those three. Maybe one works for our problem, but I didn't check yet. >> >> Pierre >> > > > For some alternatives to numpy's multivariate_normal, see > http://www.scipy.org/Cookbook/CorrelatedRandomSamples.? Both versions > (Cholesky and eigh) are just a couple lines of code. Thanks both, The main point is that it is a "Needs decision" Robert argued several times on the mailing list why he chose svd. (with svd covariance can be closer to singular then with cholesky) In statsmodels we usually just use Cholesky for similar transformation, and I use occasionally an eigh version. (I need to look up the thread but I got puzzled about results with eig and multiplicity of eigenvalues before.) The R code is GPL, but the few lines of code look standard without any special provision for non-deterministic linear algebra. If multivariate_normal switches from svd to cholesky or eigh, we still need to check that we don't run into similar "determinacy" problems with numpy's linalg (I think in statsmodels we use mostly scipy, so I don't know.) Josef > > Warren > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Thu Feb 16 11:47:51 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 11:47:51 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: <4F3D2AF4.5050300@crans.org> Message-ID: On Thu, Feb 16, 2012 at 11:30 AM, wrote: > On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser > wrote: >> >> >> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig >> wrote: >>> >>> Le 16/02/2012 16:20, josef.pktd at gmail.com a ?crit : >>> >>>> I don't see any way to fix multivariate_normal for this case, except >>>> for dropping svd or for random perturbing a covariance matrix with >>>> multiplicity of singular values. >>> >>> Hi, >>> I just made a quick search in what R guys are doing. It happens there are >>> several codes (http://cran.r-project.org/web/views/Multivariate.html ). 
For >>> instance, mvtnorm >>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached >>> the related function from the source code of this package. >>> >>> Interestingly enough, it seems they provide 3 different methods (svd, >>> eigen values, and Cholesky). >>> I don't have the time now to dive in the assessments of pros and cons of >>> those three. Maybe one works for our problem, but I didn't check yet. >>> >>> Pierre >>> >> >> >> For some alternatives to numpy's multivariate_normal, see >> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.? Both versions >> (Cholesky and eigh) are just a couple lines of code. > > Thanks both, > > The main point is that it is a "Needs decision" > > Robert argued several times on the mailing list why he chose svd. > (with svd covariance can be closer to singular then with cholesky) > > In statsmodels we usually just use Cholesky for similar > transformation, and I use occasionally an eigh version. (I need to > look up the thread but I got puzzled about results with eig and > multiplicity of eigenvalues before.) > > The R code is GPL, but the few lines of code look standard without any > special provision for non-deterministic linear algebra. > > If multivariate_normal switches from svd to cholesky or eigh, we still > need to check that we don't run into similar "determinacy" problems > with numpy's linalg (I think in statsmodels we use mostly scipy, so I > don't know.) np.linalg.eigh always produces the same eigenvectors, both running repeatedly in the same session and running the script several times on the command line. so eigh looks good as alternative to svd for this case, I don't know if we buy numerical problems in other corner cases, but for near singularity it's always possible to check the smallest eigenvalue Josef > > Josef > >> >> Warren >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From njs at pobox.com Thu Feb 16 11:56:39 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Feb 2012 16:56:39 +0000 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3C4D58.6050007@astro.uio.no> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn wrote: > If non-contributing users came along on the Cython list demanding that > we set up a system to select non-developers along on a board that would > have discussions in order to veto pull requests, I don't know whether > we'd ignore it or ridicule it or try to show some patience, but we > certainly wouldn't take it seriously. I'm not really worried about the Continuum having some nefarious "corporate" intent. But I am worried about how these plans will affect numpy, and I think there serious risks if we don't think about process. Money has a dramatic effect on FOSS development, and not always in a positive way, even when -- or *especially* when -- everyone has the best of intentions. I'm actually *more* worried about altruistic full-time developers doing work on behalf of the community than I am about developers who are working strictly in some company's interests. Finding a good design for software is like a nasty optimization problem -- it's easy to get stuck in local maxima, and any one person has only an imperfect, noisy estimate of the objective function. 
So you need lots of eyes to catch mistakes, filter out the noise, and explore multiple maxima in parallel. The classic FOSS model of volunteer developers who are in charge of project direction does a *great* job of solving this problem. (Linux beat all the classic Unixen on technical quality, and it did it using college students and volunteers -- it's not like Sun, IBM, HP etc. couldn't afford better engineers! But they still lost.) Volunteers are intimately familiar with the itch they're trying to scratch and the trade-offs involved in doing so, and they need to work together to produce anything major, so you get lots of different, high-quality perspectives to help you figure out which approach is best. Developers who are working for some corporate interest alter this balance, because in a "do-ocracy", someone who can throw a few full-time developers at something suddenly is suddenly has effectively complete control over project direction. There's no moral problem here when the "dictator" is benevolent, but suddenly you have an informational bottleneck -- even benevolent dictators make mistakes, and they certainly aren't omniscient. Even this isn't *so* bad though, so long as the corporation is scratching their own itch -- at least you can be pretty sure that whatever they produce will at least make them happy, which implies a certain level of utility. The riskiest case is paying developers to scratch someone else's itch. IIUC, that's a major goal of Travis's here, to find a way to pay developers to make numpy better for everyone. But, now you need some way for the community to figure out what "better" means, because the developers themselves don't necessarily know. It's not their itch anymore. Running a poll or whatever might be a nice start, but we all know how tough it is to extract useful design information from users. You need a lot more than that if you want to keep the quality up. Travis's proposal is that we go from a large number of self-selecting people putting in little bits of time to a small number of designated people putting in lots of time. There's a major win in terms of total effort, but you inevitably lose a lot of diversity of viewpoints. My feeling is it will only be a net win if the new employees put serious, bend-over-backwards effort into taking advantage of the volunteer community's wisdom. This is why the NA discussion seems so relevant to me here -- everyone involved absolutely had good intentions, excellent skills, etc., and yet the outcome is still a huge unresolved mess. It was supposed to make numpy more attractive for a certain set of applications, like statistical analysis, where R is currently preferred. Instead, there have been massive changes merged into numpy mainline, but most of the intended "target market" for these changes is indifferent to them; they don't solve the problem they're supposed to. And along the way we've not just spent a bunch of Enthought's money, but also wasted dozens of hours of volunteer time while seriously alienating some of numpy's most dedicated advocates in that "target market". We could debate about blame, and I'm sure there's plenty to spread around, but I also think the fundamental problem isn't one of blame at all -- it's that Mark, Charles and Travis *aren't* scratching an itch; AFAICT the NA functionality is not something they actually need themselves. Which means they're fighting uphill when trying to find the best solutions, and haven't managed it yet. And were working on a deadline, to boot. 
> It's obvious that one should try for consensus as long as possible, > including listening to users. But in the very end, when agreement can't > be reached by other means, the developers are the one making the calls. > (This is simply a consequence that they are the only ones who can > credibly threaten to fork the project.) > > Sure, structures that includes users in the process could be useful... > but, if the devs are fine with the current situation (and I don't see > Mark or Charles complaining), then I honestly think it is quite rude to > not let the matter drop after the first ten posts or so. I'm not convinced we need a formal governing body, but I think we really, really need a community norm that takes consensus *very* seriously. That principle is more important than who exactly enforces it. I guess people are worried about that turning into obstructionism or something, but seriously, this is a practical approach that works well for lots of real actual successful FOSS projects. I think it's also worth distinguishing between "users" and "developers who happen not to be numpy core developers". There are lots of experienced and skilled developers who spend their time on, say, scipy or nipy or whatever, just because numpy already works for them. That doesn't mean they don't have valuable insights or a stake in how numpy develops going forward! IMHO, everyone who can credibly participate in the technical discussion should have a veto -- and should almost never use it. And yes, that means volunteers should be able to screw up corporate schedules if that's what's best for numpy-the-project. And, to be clear, I'm not saying that random list-members somehow *deserve* to screw around with generous corporate endowments; I'm saying that the people running the corporation are going to be a lot happier in the long run if they impose this rule on themselves. -- Nathaniel From njs at pobox.com Thu Feb 16 12:00:29 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Feb 2012 17:00:29 +0000 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 2:08 PM, Pauli Virtanen wrote: > 16.02.2012 14:54, josef.pktd at gmail.com kirjoitti: > [clip] >> If I interpret you correctly, this should be a svd ticket, or an svd >> ticket as "duplicate" ? > > I think it should be a multivariate normal ticket. > > "Fixing" SVD is in my opinion not sensible: its only guarantee is that A > = U S V^H down to numerical precision and S are sorted. I agree, but the behavior is still surprising -- people reasonably expect something like svd to be deterministic. So there's probably a doc bug for alerting people that their reasonable expectation is, in fact, wrong :-). -- Nathaniel From josef.pktd at gmail.com Thu Feb 16 12:07:58 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 12:07:58 -0500 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: <4F3D2AF4.5050300@crans.org> Message-ID: On Thu, Feb 16, 2012 at 11:47 AM, wrote: > On Thu, Feb 16, 2012 at 11:30 AM, ? 
wrote: >> On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser >> wrote: >>> >>> >>> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig >>> wrote: >>>> >>>> Le 16/02/2012 16:20, josef.pktd at gmail.com a ?crit : >>>> >>>>> I don't see any way to fix multivariate_normal for this case, except >>>>> for dropping svd or for random perturbing a covariance matrix with >>>>> multiplicity of singular values. >>>> >>>> Hi, >>>> I just made a quick search in what R guys are doing. It happens there are >>>> several codes (http://cran.r-project.org/web/views/Multivariate.html ). For >>>> instance, mvtnorm >>>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've attached >>>> the related function from the source code of this package. >>>> >>>> Interestingly enough, it seems they provide 3 different methods (svd, >>>> eigen values, and Cholesky). >>>> I don't have the time now to dive in the assessments of pros and cons of >>>> those three. Maybe one works for our problem, but I didn't check yet. >>>> >>>> Pierre >>>> >>> >>> >>> For some alternatives to numpy's multivariate_normal, see >>> http://www.scipy.org/Cookbook/CorrelatedRandomSamples.? Both versions >>> (Cholesky and eigh) are just a couple lines of code. >> >> Thanks both, >> >> The main point is that it is a "Needs decision" >> >> Robert argued several times on the mailing list why he chose svd. >> (with svd covariance can be closer to singular then with cholesky) >> >> In statsmodels we usually just use Cholesky for similar >> transformation, and I use occasionally an eigh version. (I need to >> look up the thread but I got puzzled about results with eig and >> multiplicity of eigenvalues before.) >> >> The R code is GPL, but the few lines of code look standard without any >> special provision for non-deterministic linear algebra. >> >> If multivariate_normal switches from svd to cholesky or eigh, we still >> need to check that we don't run into similar "determinacy" problems >> with numpy's linalg (I think in statsmodels we use mostly scipy, so I >> don't know.) > > np.linalg.eigh always produces the same eigenvectors, both running > repeatedly in the same session and running the script several times on > the command line. > > so eigh looks good as alternative to svd for this case, I don't know > if we buy numerical problems in other corner cases, but for near > singularity it's always possible to check the smallest eigenvalue cholesky is also deterministic in my runs What I would suggest is to use cholesky first, catch the singular exception and then use eigh. With eigh we would get perfectly correlated random variables. Again if my reading of linalg comments is correct, cholesky is the fastest way to detect singularity of a matrix, and is faster then eigh in the non-singular case. I have no idea if there is an almost singular case, where cholesky fails, but the current svd would produce a not perfectly correlated random sample (up to numerical precision). (Alternative, which I don't think I like so much, is to use a small Ridge correction (multiply diagonal by 1 + x*nulp ? This would bound it away from perfect correlation, I guess.) 
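A rough sketch of that fallback idea (the helper name _cov_factor, the tolerance and the usage line are only illustrative, not current numpy behaviour) might look like:

import numpy as np

def _cov_factor(cov):
    # Hypothetical helper: try the fast Cholesky route first, and fall
    # back to eigh when cov is singular or only positive semi-definite.
    try:
        # cholesky raises LinAlgError as soon as a negative pivot shows up.
        return np.linalg.cholesky(cov)
    except np.linalg.LinAlgError:
        w, v = np.linalg.eigh(cov)
        # Clip tiny negative eigenvalues caused by rounding; the
        # tolerance here is an arbitrary illustrative choice.
        if w.min() < -1e-8 * max(abs(w.max()), 1.0):
            raise ValueError("covariance is not positive semi-definite")
        return v * np.sqrt(np.maximum(w, 0.0))

# usage sketch:
# x = mean + np.random.standard_normal((n, mean.size)).dot(_cov_factor(cov).T)

Whether cholesky and eigh stay deterministic across different LAPACK builds and matrix sizes would of course still need checking.
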
Josef > > Josef > >> >> Josef >> >>> >>> Warren >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> From wfspotz at sandia.gov Thu Feb 16 12:09:36 2012 From: wfspotz at sandia.gov (Spotz, William F) Date: Thu, 16 Feb 2012 17:09:36 +0000 Subject: [Numpy-discussion] Strange PyArray_FromObject() behavior Message-ID: I have a user who is reporting tests that are failing on his platform. I have not been able to reproduce the error on my system, but working with him, we have isolated the problem to unexpected results when PyArray_FromObject() is called. Here is the chain of events: In python, an integer is calculated. Specifically, it is len(result.errors) + len(result.failures) where result is a unit test result object from the unittest module. I had him verify that this value was in fact a python integer. In my extension module, this PyObject gets passed to the PyArray_FromObject() function in a routine that comes from numpy.i. What I expect, and what I typically get, is a numpy scalar array of type C long. I had my user print the result using PyObject_Print() and what he got was array([0:00:00], dtype=timedelta64[us]) I am stuck as to why this might be happening. Any ideas? Thanks ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Feb 16 12:13:40 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 16 Feb 2012 17:13:40 +0000 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: <4F3D2AF4.5050300@crans.org> Message-ID: On Thu, Feb 16, 2012 at 17:07, wrote: > cholesky is also deterministic in my runs We will need to check a variety of builds with different LAPACK libraries and also different matrix sizes to be sure. Alas! -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From travis at vaught.net Thu Feb 16 12:17:14 2012 From: travis at vaught.net (Travis Vaught) Date: Thu, 16 Feb 2012 11:17:14 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: <3542A375-B1AD-44D9-8E10-BB7D748C51E2@vaught.net> On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote: > Travis's proposal is that we go from a large number of self-selecting > people putting in little bits of time to a small number of designated > people putting in lots of time. That's not what Travis, or anyone else, proposed. Travis V. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Thu Feb 16 12:18:12 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Feb 2012 10:18:12 -0700 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: <4F3D2AF4.5050300@crans.org> Message-ID: On Thu, Feb 16, 2012 at 10:07 AM, wrote: > On Thu, Feb 16, 2012 at 11:47 AM, wrote: > > On Thu, Feb 16, 2012 at 11:30 AM, wrote: > >> On Thu, Feb 16, 2012 at 11:20 AM, Warren Weckesser > >> wrote: > >>> > >>> > >>> On Thu, Feb 16, 2012 at 10:12 AM, Pierre Haessig < > pierre.haessig at crans.org> > >>> wrote: > >>>> > >>>> Le 16/02/2012 16:20, josef.pktd at gmail.com a ?crit : > >>>> > >>>>> I don't see any way to fix multivariate_normal for this case, except > >>>>> for dropping svd or for random perturbing a covariance matrix with > >>>>> multiplicity of singular values. > >>>> > >>>> Hi, > >>>> I just made a quick search in what R guys are doing. It happens there > are > >>>> several codes (http://cran.r-project.org/web/views/Multivariate.html). For > >>>> instance, mvtnorm > >>>> (http://cran.r-project.org/web/packages/mvtnorm/index.html). I've > attached > >>>> the related function from the source code of this package. > >>>> > >>>> Interestingly enough, it seems they provide 3 different methods (svd, > >>>> eigen values, and Cholesky). > >>>> I don't have the time now to dive in the assessments of pros and cons > of > >>>> those three. Maybe one works for our problem, but I didn't check yet. > >>>> > >>>> Pierre > >>>> > >>> > >>> > >>> For some alternatives to numpy's multivariate_normal, see > >>> http://www.scipy.org/Cookbook/CorrelatedRandomSamples. Both versions > >>> (Cholesky and eigh) are just a couple lines of code. > >> > >> Thanks both, > >> > >> The main point is that it is a "Needs decision" > >> > >> Robert argued several times on the mailing list why he chose svd. > >> (with svd covariance can be closer to singular then with cholesky) > >> > >> In statsmodels we usually just use Cholesky for similar > >> transformation, and I use occasionally an eigh version. (I need to > >> look up the thread but I got puzzled about results with eig and > >> multiplicity of eigenvalues before.) > >> > >> The R code is GPL, but the few lines of code look standard without any > >> special provision for non-deterministic linear algebra. > >> > >> If multivariate_normal switches from svd to cholesky or eigh, we still > >> need to check that we don't run into similar "determinacy" problems > >> with numpy's linalg (I think in statsmodels we use mostly scipy, so I > >> don't know.) > > > > np.linalg.eigh always produces the same eigenvectors, both running > > repeatedly in the same session and running the script several times on > > the command line. > > > > so eigh looks good as alternative to svd for this case, I don't know > > if we buy numerical problems in other corner cases, but for near > > singularity it's always possible to check the smallest eigenvalue > > cholesky is also deterministic in my runs > > What I would suggest is to use cholesky first, catch the singular > exception and then use eigh. With eigh we would get perfectly > correlated random variables. > > Again if my reading of linalg comments is correct, cholesky is the > fastest way to detect singularity of a matrix, and is faster then eigh > in the non-singular case. 
> > I have no idea if there is an almost singular case, where cholesky > fails, but the current svd would produce a not perfectly correlated > random sample (up to numerical precision). > > (Alternative, which I don't think I like so much, is to use a small > Ridge correction (multiply diagonal by 1 + x*nulp ? This would bound > it away from perfect correlation, I guess.) > > Cholesky doesn't do any reordering of the matrix, but proceeds downward factoring row by row, so to speak. It is Gauss elimination without row pivoting and is only stable when the matrix is positive definite. Fortunately, failure of positive definiteness shows up an attempt to take the real square root of a negative number, so is detected. The problem with svd is that the singular values are always non-negative, hence the resulting factorization isn't always of the form R^T * R, which is easily understood because that form is necessarily non-negative definite. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Feb 16 12:20:38 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 16 Feb 2012 18:20:38 +0100 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: Hi, 16.02.2012 18:00, Nathaniel Smith kirjoitti: [clip] > I agree, but the behavior is still surprising -- people reasonably > expect something like svd to be deterministic. So there's probably a > doc bug for alerting people that their reasonable expectation is, in > fact, wrong :-). The problem here is that these warnings should in principle appear in the documentation of every numerical algorithm that contains branches chosen on the basis of floating point data. For example, optimization algorithms --- they terminate after a tolerance is satisfied, and so the results can contain similar quasi-random error much larger than the rounding error, tol > |err| >> eps. Floating point sucks, it's full of gotchas for all ages :( Something like a FAQ could be good place to answer this, alongside more basic floating point questions. Pauli From chris.barker at noaa.gov Thu Feb 16 12:23:30 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 16 Feb 2012 09:23:30 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <6C69E7675B43074B8AD0C1106661B9B1036CCC836E@NIHMLBX10.nih.gov> References: <4F3BB853.9090100@gmail.com> <6C69E7675B43074B8AD0C1106661B9B1036CCC836E@NIHMLBX10.nih.gov> Message-ID: On Wed, Feb 15, 2012 at 11:23 AM, Inati, Souheil (NIH/NIMH) [E] >?As great and trustworthy as Travis is, there is a very real > potential for conflict of interest here. He is going to be leading an > organization to raise and distribute funding and at the same time > leading a commercial for profit enterprise that would apply to this > foundation for funds, as well as being a major player in the > direction of the open source project that his company is building > on. > > This is not in and of itself a problem, but the boundaries have to > be very clear and laid out in advance. I disagree here -- a business that contributes to an Open-Source project is really no different than an individual that contributes -- it (or he or she) contributes because it sees a benefit -- that could be financial, that could be just for fun, whatever. Sometime individuals get paid to contribute, sometimes companies do -- why is there a difference? 
To be personal about it -- Continuum writing a bunch of numpy code will be no different than when Travis personally on his own time wrote bunch of numpy code -- and I think we all agree that the project and the community is very grateful he did that when he did. If anyone (company or individual) goes off on their own and writes a bunch of code that community doesn't embrace, then we have a fork -- sometimes that for the better, but for the most part, I think everyone involved does not want to see that happen -- and I think there is a general consensus that a more formal governing stucture is a good idea, in part to prevent that. HOwever -- it's still an open source project -- no onde (or institution) can tell anyone else what to do or how to do it, the the project will be moved forward by those that actually do stuff: - write core code - write supporting code - document stuff - test stuff - package stuff - contribute tech support on the list - contribute to the conversion about development issues - ....... and yes -- actually getting around to forming a foundation, or securing funding, or other institutional activities. So while it may seem like a small group of people kind of went off on their own to form a foundation -- that's the only way things ever get done on an open-source project! There may or may not be a lot of discussion about something first, but it only gets down when someone sits down and does it. So Bravo for moving the project forward! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Thu Feb 16 12:35:37 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Feb 2012 10:35:37 -0700 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 10:20 AM, Pauli Virtanen wrote: > Hi, > > 16.02.2012 18:00, Nathaniel Smith kirjoitti: > [clip] > > I agree, but the behavior is still surprising -- people reasonably > > expect something like svd to be deterministic. So there's probably a > > doc bug for alerting people that their reasonable expectation is, in > > fact, wrong :-). > > The problem here is that these warnings should in principle appear in > the documentation of every numerical algorithm that contains branches > chosen on the basis of floating point data. For example, optimization > algorithms --- they terminate after a tolerance is satisfied, and so the > results can contain similar quasi-random error much larger than the > rounding error, tol > |err| >> eps. > > Floating point sucks, it's full of gotchas for all ages :( > > :). I believe that was one of the reasons that Von Neumann thought everyone should work with integers and scaling factors. But he lost that battle to the sheer convenience of floating point. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Thu Feb 16 12:38:38 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 16 Feb 2012 09:38:38 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 1:36 PM, Matthew Brett > Personally, I > would say that making the founder of a company, which is > working to > make money from Numpy, the only decision maker on numpy - > is - scary. not to me: -- power always goes to those that actually write the code -- as far as I can recall, there has never been a large group of folks contributing to the core code so have a company being the primary contributor and decision maker is no different that what we've always had -- particularly when Travis pretty much single-handedly re-factored Numeric to give us numpy. Oh -- one difference -- having a company with more that one coder means more code! If continuum does indeed develop a bloated, ugly mess to meet their client's needs -- none of us have to use it. -Chris > But maybe it's the best way. ? But, again, we're all high-functioning > sensible people, I'm sure it's possible for us to formulate what the > risks are, what the potential solutions are, and come up with the best > - maybe short-term - solution, > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Thu Feb 16 12:53:25 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Feb 2012 10:53:25 -0700 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Thu, Feb 16, 2012 at 9:56 AM, Nathaniel Smith wrote: > On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn > wrote: > > If non-contributing users came along on the Cython list demanding that > > we set up a system to select non-developers along on a board that would > > have discussions in order to veto pull requests, I don't know whether > > we'd ignore it or ridicule it or try to show some patience, but we > > certainly wouldn't take it seriously. > > I'm not really worried about the Continuum having some nefarious > "corporate" intent. But I am worried about how these plans will affect > numpy, and I think there serious risks if we don't think about > process. Money has a dramatic effect on FOSS development, and not > always in a positive way, even when -- or *especially* when -- > everyone has the best of intentions. I'm actually *more* worried about > altruistic full-time developers doing work on behalf of the community > than I am about developers who are working strictly in some company's > interests. > > Finding a good design for software is like a nasty optimization > problem -- it's easy to get stuck in local maxima, and any one person > has only an imperfect, noisy estimate of the objective function. So > you need lots of eyes to catch mistakes, filter out the noise, and > explore multiple maxima in parallel. 
> > The classic FOSS model of volunteer developers who are in charge of > project direction does a *great* job of solving this problem. (Linux > beat all the classic Unixen on technical quality, and it did it using > college students and volunteers -- it's not like Sun, IBM, HP etc. > couldn't afford better engineers! But they still lost.) Volunteers are > intimately familiar with the itch they're trying to scratch and the > trade-offs involved in doing so, and they need to work together to > produce anything major, so you get lots of different, high-quality > perspectives to help you figure out which approach is best. > > Linux is probably a bad choice as example here. Right up to about 2002 Linus was pretty much the only entry point into mainline as he applied all the patches by hand and reviewed all of them. This of course slowed Linux development considerably. I also had the opportunity to fix up some of the drivers for my own machine and can testify that the code quality of the patches was mixed. Now, of course, with 10000 or more patches going in during the open period of each development cycle, Linus relies on lieutenants to handle the subsystems, but he can be damn scathing when he takes an interest in some code and doesn't like what he sees. And he *can* be scathing, not just because he started the whole thing, but because he is darn good and the other developers respect that. But my point here is that Linus pretty much shapes Linux. Developers who are working for some corporate interest alter this > balance, because in a "do-ocracy", someone who can throw a few > full-time developers at something suddenly is suddenly has effectively > complete control over project direction. There's no moral problem here > when the "dictator" is benevolent, but suddenly you have an > informational bottleneck -- even benevolent dictators make mistakes, > and they certainly aren't omniscient. Even this isn't *so* bad though, > so long as the corporation is scratching their own itch -- at least > you can be pretty sure that whatever they produce will at least make > them happy, which implies a certain level of utility. > > Linus deals with this by saying, fork, fork, fork. Of course the gpl makes that a more viable response. > The riskiest case is paying developers to scratch someone else's itch. > IIUC, that's a major goal of Travis's here, to find a way to pay > developers to make numpy better for everyone. But, now you need some > way for the community to figure out what "better" means, because the > developers themselves don't necessarily know. It's not their itch > anymore. Running a poll or whatever might be a nice start, but we all > know how tough it is to extract useful design information from users. > You need a lot more than that if you want to keep the quality up. > > Travis's proposal is that we go from a large number of self-selecting > people putting in little bits of time to a small number of designated > people putting in lots of time. There's a major win in terms of total > effort, but you inevitably lose a lot of diversity of viewpoints. My > feeling is it will only be a net win if the new employees put serious, > bend-over-backwards effort into taking advantage of the volunteer > community's wisdom. > > This is why the NA discussion seems so relevant to me here -- everyone > involved absolutely had good intentions, excellent skills, etc., and > yet the outcome is still a huge unresolved mess. 
It was supposed to > make numpy more attractive for a certain set of applications, like > statistical analysis, where R is currently preferred. Instead, there > have been massive changes merged into numpy mainline, but most of the > intended "target market" for these changes is indifferent to them; > they don't solve the problem they're supposed to. And along the way > we've not just spent a bunch of Enthought's money, but also wasted > dozens of hours of volunteer time while seriously alienating some of > numpy's most dedicated advocates in that "target market". We could > debate about blame, and I'm sure there's plenty to spread around, but > I also think the fundamental problem isn't one of blame at all -- it's > that Mark, Charles and Travis *aren't* scratching an itch; AFAICT the > NA functionality is not something they actually need themselves. Which > means they're fighting uphill when trying to find the best solutions, > and haven't managed it yet. And were working on a deadline, to boot. > > > It's obvious that one should try for consensus as long as possible, > > including listening to users. But in the very end, when agreement can't > > be reached by other means, the developers are the one making the calls. > > (This is simply a consequence that they are the only ones who can > > credibly threaten to fork the project.) > > > > Sure, structures that includes users in the process could be useful... > > but, if the devs are fine with the current situation (and I don't see > > Mark or Charles complaining), then I honestly think it is quite rude to > > not let the matter drop after the first ten posts or so. > > I'm not convinced we need a formal governing body, but I think we > really, really need a community norm that takes consensus *very* > seriously. That principle is more important than who exactly enforces > it. I guess people are worried about that turning into obstructionism > or something, but seriously, this is a practical approach that works > well for lots of real actual successful FOSS projects. > > I think it's also worth distinguishing between "users" and "developers > who happen not to be numpy core developers". There are lots of > experienced and skilled developers who spend their time on, say, scipy > or nipy or whatever, just because numpy already works for them. That > doesn't mean they don't have valuable insights or a stake in how numpy > develops going forward! > > IMHO, everyone who can credibly participate in the technical > discussion should have a veto -- and should almost never use it. And > yes, that means volunteers should be able to screw up corporate > schedules if that's what's best for numpy-the-project. And, to be > clear, I'm not saying that random list-members somehow *deserve* to > screw around with generous corporate endowments; I'm saying that the > people running the corporation are going to be a lot happier in the > long run if they impose this rule on themselves. > > I'm more for the Linux model, Linus rules, the rest grovel ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.varoquaux at normalesup.org Thu Feb 16 12:54:10 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 16 Feb 2012 18:54:10 +0100 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: <20120216175410.GB13400@phare.normalesup.org> On Thu, Feb 16, 2012 at 05:00:29PM +0000, Nathaniel Smith wrote: > I agree, but the behavior is still surprising -- people reasonably > expect something like svd to be deterministic. People are wrong then. Trust me, I work enough with ill-conditionned problems, including SVDs, to know that the algorithms are not deterministic. You can improve them by controlling the random starting point, but in many case it is not enough. Decreasing the tolerance on the algorithm may help (I don't know if we can control that with the lapack interface), but at the cost of a lot of computing time. G From josef.pktd at gmail.com Thu Feb 16 13:09:27 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 13:09:27 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: On Thu, Feb 16, 2012 at 12:53 PM, Charles R Harris wrote: > > > On Thu, Feb 16, 2012 at 9:56 AM, Nathaniel Smith wrote: >> >> On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn >> wrote: >> > If non-contributing users came along on the Cython list demanding that >> > we set up a system to select non-developers along on a board that would >> > have discussions in order to veto pull requests, I don't know whether >> > we'd ignore it or ridicule it or try to show some patience, but we >> > certainly wouldn't take it seriously. >> >> I'm not really worried about the Continuum having some nefarious >> "corporate" intent. But I am worried about how these plans will affect >> numpy, and I think there serious risks if we don't think about >> process. Money has a dramatic effect on FOSS development, and not >> always in a positive way, even when -- or *especially* when -- >> everyone has the best of intentions. I'm actually *more* worried about >> altruistic full-time developers doing work on behalf of the community >> than I am about developers who are working strictly in some company's >> interests. >> >> Finding a good design for software is like a nasty optimization >> problem -- it's easy to get stuck in local maxima, and any one person >> has only an imperfect, noisy estimate of the objective function. So >> you need lots of eyes to catch mistakes, filter out the noise, and >> explore multiple maxima in parallel. >> >> The classic FOSS model of volunteer developers who are in charge of >> project direction does a *great* job of solving this problem. (Linux >> beat all the classic Unixen on technical quality, and it did it using >> college students and volunteers -- it's not like Sun, IBM, HP etc. >> couldn't afford better engineers! But they still lost.) Volunteers are >> intimately familiar with the itch they're trying to scratch and the >> trade-offs involved in doing so, and they need to work together to >> produce anything major, so you get lots of different, high-quality >> perspectives to help you figure out which approach is best. >> > > Linux is probably a bad choice as example here. Right up to about 2002 Linus > was pretty much the only entry point into mainline as he applied all the > patches by hand and reviewed all of them. 
This of course slowed Linux > development considerably. I also had the opportunity to fix up some of the > drivers for my own machine and can testify that the code quality of the > patches was mixed. Now, of course, with 10000 or more patches going in > during the open period of each development cycle, Linus relies on > lieutenants to handle the subsystems, but he can be damn scathing when he > takes an interest in some code and doesn't like what he sees. And he *can* > be scathing, not just because he started the whole thing, but because he is > darn good and the other developers respect that. But my point here is that > Linus pretty much shapes Linux. > > >> Developers who are working for some corporate interest alter this >> balance, because in a "do-ocracy", someone who can throw a few >> full-time developers at something suddenly is suddenly has effectively >> complete control over project direction. There's no moral problem here >> when the "dictator" is benevolent, but suddenly you have an >> informational bottleneck -- even benevolent dictators make mistakes, >> and they certainly aren't omniscient. Even this isn't *so* bad though, >> so long as the corporation is scratching their own itch -- at least >> you can be pretty sure that whatever they produce will at least make >> them happy, which implies a certain level of utility. >> > > Linus deals with this by saying, fork, fork, fork. Of course the gpl makes > that a more viable response. > >> >> The riskiest case is paying developers to scratch someone else's itch. >> IIUC, that's a major goal of Travis's here, to find a way to pay >> developers to make numpy better for everyone. But, now you need some >> way for the community to figure out what "better" means, because the >> developers themselves don't necessarily know. It's not their itch >> anymore. Running a poll or whatever might be a nice start, but we all >> know how tough it is to extract useful design information from users. >> You need a lot more than that if you want to keep the quality up. >> >> Travis's proposal is that we go from a large number of self-selecting >> people putting in little bits of time to a small number of designated >> people putting in lots of time. There's a major win in terms of total >> effort, but you inevitably lose a lot of diversity of viewpoints. My >> feeling is it will only be a net win if the new employees put serious, >> bend-over-backwards effort into taking advantage of the volunteer >> community's wisdom. >> >> This is why the NA discussion seems so relevant to me here -- everyone >> involved absolutely had good intentions, excellent skills, etc., and >> yet the outcome is still a huge unresolved mess. It was supposed to >> make numpy more attractive for a certain set of applications, like >> statistical analysis, where R is currently preferred. Instead, there >> have been massive changes merged into numpy mainline, but most of the >> intended "target market" for these changes is indifferent to them; >> they don't solve the problem they're supposed to. And along the way >> we've not just spent a bunch of Enthought's money, but also wasted >> dozens of hours of volunteer time while seriously alienating some of >> numpy's most dedicated advocates in that "target market". 
We could >> debate about blame, and I'm sure there's plenty to spread around, but >> I also think the fundamental problem isn't one of blame at all -- it's >> that Mark, Charles and Travis *aren't* scratching an itch; AFAICT the >> NA functionality is not something they actually need themselves. Which >> means they're fighting uphill when trying to find the best solutions, >> and haven't managed it yet. And were working on a deadline, to boot. >> >> > It's obvious that one should try for consensus as long as possible, >> > including listening to users. But in the very end, when agreement can't >> > be reached by other means, the developers are the one making the calls. >> > (This is simply a consequence that they are the only ones who can >> > credibly threaten to fork the project.) >> > >> > Sure, structures that includes users in the process could be useful... >> > but, if the devs are fine with the current situation (and I don't see >> > Mark or Charles complaining), then I honestly think it is quite rude to >> > not let the matter drop after the first ten posts or so. >> >> I'm not convinced we need a formal governing body, but I think we >> really, really need a community norm that takes consensus *very* >> seriously. That principle is more important than who exactly enforces >> it. I guess people are worried about that turning into obstructionism >> or something, but seriously, this is a practical approach that works >> well for lots of real actual successful FOSS projects. >> >> I think it's also worth distinguishing between "users" and "developers >> who happen not to be numpy core developers". There are lots of >> experienced and skilled developers who spend their time on, say, scipy >> or nipy or whatever, just because numpy already works for them. That >> doesn't mean they don't have valuable insights or a stake in how numpy >> develops going forward! >> >> IMHO, everyone who can credibly participate in the technical >> discussion should have a veto -- and should almost never use it. And >> yes, that means volunteers should be able to screw up corporate >> schedules if that's what's best for numpy-the-project. And, to be >> clear, I'm not saying that random list-members somehow *deserve* to >> screw around with generous corporate endowments; I'm saying that the >> people running the corporation are going to be a lot happier in the >> long run if they impose this rule on themselves. >> > > I'm more for the Linux model, Linus rules, the rest grovel ;) I would feel a lot more comfortable with a BDFL that has code coverage and ABI consistency high on his priority, and not just getting the greatest new features in as fast as possible. numpy is quite a bit more in use now than xx years ago. Josef someone who learned numpy and scipy by working through bugs. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ralf.gommers at googlemail.com Thu Feb 16 13:25:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 16 Feb 2012 19:25:36 +0100 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: On Wed, Feb 15, 2012 at 12:11 PM, Thouis (Ray) Jones wrote: > On Sat, Feb 11, 2012 at 21:54, Fernando Perez > wrote: > > On Sat, Feb 11, 2012 at 12:36 PM, Pauli Virtanen wrote: > >> The lack of attachments is the main problem with this transition. 
It's > >> not so seldom that numerical input data or scripts demonstrating an > >> issue come useful. This is probably less of an issue for Numpy than for > >> Scipy, though. > > > > We've taken to using gist for scripts/data and free image hosting > > sites for screenshots, using > > > > >> numpy, and I think there serious risks if we don't think about > >> process. Money has a dramatic effect on FOSS development, and not > >> always in a positive way, even when -- or *especially* when -- > >> everyone has the best of intentions. I'm actually *more* worried about > >> altruistic full-time developers doing work on behalf of the community > >> than I am about developers who are working strictly in some company's > >> interests. > >> > >> Finding a good design for software is like a nasty optimization > >> problem -- it's easy to get stuck in local maxima, and any one person > >> has only an imperfect, noisy estimate of the objective function. So > >> you need lots of eyes to catch mistakes, filter out the noise, and > >> explore multiple maxima in parallel. > >> > >> The classic FOSS model of volunteer developers who are in charge of > >> project direction does a *great* job of solving this problem. (Linux > >> beat all the classic Unixen on technical quality, and it did it using > >> college students and volunteers -- it's not like Sun, IBM, HP etc. > >> couldn't afford better engineers! But they still lost.) Volunteers are > >> intimately familiar with the itch they're trying to scratch and the > >> trade-offs involved in doing so, and they need to work together to > >> produce anything major, so you get lots of different, high-quality > >> perspectives to help you figure out which approach is best. > >> > > > > Linux is probably a bad choice as example here. Right up to about 2002 > Linus > > was pretty much the only entry point into mainline as he applied all the > > patches by hand and reviewed all of them. This of course slowed Linux > > development considerably. I also had the opportunity to fix up some of > the > > drivers for my own machine and can testify that the code quality of the > > patches was mixed. Now, of course, with 10000 or more patches going in > > during the open period of each development cycle, Linus relies on > > lieutenants to handle the subsystems, but he can be damn scathing when he > > takes an interest in some code and doesn't like what he sees. And he > *can* > > be scathing, not just because he started the whole thing, but because he > is > > darn good and the other developers respect that. But my point here is > that > > Linus pretty much shapes Linux. > > > > > >> Developers who are working for some corporate interest alter this > >> balance, because in a "do-ocracy", someone who can throw a few > >> full-time developers at something suddenly is suddenly has effectively > >> complete control over project direction. There's no moral problem here > >> when the "dictator" is benevolent, but suddenly you have an > >> informational bottleneck -- even benevolent dictators make mistakes, > >> and they certainly aren't omniscient. Even this isn't *so* bad though, > >> so long as the corporation is scratching their own itch -- at least > >> you can be pretty sure that whatever they produce will at least make > >> them happy, which implies a certain level of utility. > >> > > > > Linus deals with this by saying, fork, fork, fork. Of course the gpl > makes > > that a more viable response. 
> > > >> > >> The riskiest case is paying developers to scratch someone else's itch. > >> IIUC, that's a major goal of Travis's here, to find a way to pay > >> developers to make numpy better for everyone. But, now you need some > >> way for the community to figure out what "better" means, because the > >> developers themselves don't necessarily know. It's not their itch > >> anymore. Running a poll or whatever might be a nice start, but we all > >> know how tough it is to extract useful design information from users. > >> You need a lot more than that if you want to keep the quality up. > >> > >> Travis's proposal is that we go from a large number of self-selecting > >> people putting in little bits of time to a small number of designated > >> people putting in lots of time. There's a major win in terms of total > >> effort, but you inevitably lose a lot of diversity of viewpoints. My > >> feeling is it will only be a net win if the new employees put serious, > >> bend-over-backwards effort into taking advantage of the volunteer > >> community's wisdom. > >> > >> This is why the NA discussion seems so relevant to me here -- everyone > >> involved absolutely had good intentions, excellent skills, etc., and > >> yet the outcome is still a huge unresolved mess. It was supposed to > >> make numpy more attractive for a certain set of applications, like > >> statistical analysis, where R is currently preferred. Instead, there > >> have been massive changes merged into numpy mainline, but most of the > >> intended "target market" for these changes is indifferent to them; > >> they don't solve the problem they're supposed to. And along the way > >> we've not just spent a bunch of Enthought's money, but also wasted > >> dozens of hours of volunteer time while seriously alienating some of > >> numpy's most dedicated advocates in that "target market". We could > >> debate about blame, and I'm sure there's plenty to spread around, but > >> I also think the fundamental problem isn't one of blame at all -- it's > >> that Mark, Charles and Travis *aren't* scratching an itch; AFAICT the > >> NA functionality is not something they actually need themselves. Which > >> means they're fighting uphill when trying to find the best solutions, > >> and haven't managed it yet. And were working on a deadline, to boot. > >> > >> > It's obvious that one should try for consensus as long as possible, > >> > including listening to users. But in the very end, when agreement > can't > >> > be reached by other means, the developers are the one making the > calls. > >> > (This is simply a consequence that they are the only ones who can > >> > credibly threaten to fork the project.) > >> > > >> > Sure, structures that includes users in the process could be useful... > >> > but, if the devs are fine with the current situation (and I don't see > >> > Mark or Charles complaining), then I honestly think it is quite rude > to > >> > not let the matter drop after the first ten posts or so. > >> > >> I'm not convinced we need a formal governing body, but I think we > >> really, really need a community norm that takes consensus *very* > >> seriously. That principle is more important than who exactly enforces > >> it. I guess people are worried about that turning into obstructionism > >> or something, but seriously, this is a practical approach that works > >> well for lots of real actual successful FOSS projects. 
> >> > >> I think it's also worth distinguishing between "users" and "developers > >> who happen not to be numpy core developers". There are lots of > >> experienced and skilled developers who spend their time on, say, scipy > >> or nipy or whatever, just because numpy already works for them. That > >> doesn't mean they don't have valuable insights or a stake in how numpy > >> develops going forward! > >> > >> IMHO, everyone who can credibly participate in the technical > >> discussion should have a veto -- and should almost never use it. And > >> yes, that means volunteers should be able to screw up corporate > >> schedules if that's what's best for numpy-the-project. And, to be > >> clear, I'm not saying that random list-members somehow *deserve* to > >> screw around with generous corporate endowments; I'm saying that the > >> people running the corporation are going to be a lot happier in the > >> long run if they impose this rule on themselves. > >> > > > > I'm more for the Linux model, Linus rules, the rest grovel ;) > > To clarify this a bit more, there isn't an 'official' Linux. Linus' tree is the reference due to history and his own central position in the community, but he would argue that it is *his* tree, and what goes in, or out, is *his* choice. If you want to do things differently, you have your own tree, go for it. Linus is a bit a of a radical libertarian when it comes to code :) Debian is much more democratic, but the stable releases also tend to lag 2-3 years behind current development. That's one of the reasons there is a spot for Debian based distributions like Ubuntu. I would feel a lot more comfortable with a BDFL that has code coverage > and ABI consistency high on his priority, and not just getting the > greatest new features in as fast as possible. > > I think this is a good point, which is why the idea of a long term release is appealing. That release should be stodgy and safe, while the ongoing development can be much more radical in making changes. And numpy really does need a fairly radical rewrite, just to clarify and simplify the base code easier if nothing else. New features I'm more leery about, at least until the code base is improved, which would be my short term priority. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Feb 16 14:03:19 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Feb 2012 11:03:19 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> Message-ID: Hi, On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted wrote: > On Feb 16, 2012, at 12:15 PM, Jason Grout wrote: > >> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote: >>> But in the very end, when agreement can't >>> be reached by other means, the developers are the one making the calls. >>> (This is simply a consequence that they are the only ones who can >>> credibly threaten to fork the project.) >> >> Interesting point. ?I hope I'm not pitching a log onto the fire here, >> but in numpy's case, there are very many capable developers on other >> projects who depend on numpy who could credibly threaten a fork if they >> felt numpy was drastically going wrong. > > Jason, that there capable developers out there that are able to fork NumPy (or any other project you can realize) is a given. 
?The point Dag was signaling is that this threaten is more probable to happen *inside* the community. > > And you pointed out an important aspect too by saying "if they felt numpy was drastically going wrong". ?It makes me the impression that some people is very frightened about something really bad would happen, well before it happens. ?While I agree that this is *possible*, I'd also advocate to give Travis the benefit of doubt. ?I'm convinced he (and Continuum as a whole) is making things happen that will benefit the entire NumPy community; but in case something gets really wrong and catastrophic, it is always a relief to know that things can be reverted in the pure open source tradition (by either doing a fork, creating a new foundation, or even better, proposing a new way to do things). ?What it does not sound reasonable to me is to allow fear to block Continuum efforts for making a better NumPy. ?I think it is better to relax a bit, see how things are going, and then judge by looking at the *results*. I'm finding this conversation a bit frustrating. The question on the table as I understand it, is just the following: Is there any governance structure / procedure / set of guidelines that would help ensure the long-term health of the numpy project? The subtext of your response is that you regard *any structure at all* as damaging to the numpy effort and in particular, as damaging to the efforts of Continuum. It seems to me that is a very extreme point of view, and I think, honestly, it is not tenable. But surely - surely - the best thing to do here is to formulate something that might be acceptable, and for everyone to say what they think the problems would be. Do you agree? Best, Matthew From ralf.gommers at googlemail.com Thu Feb 16 14:05:06 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 16 Feb 2012 20:05:06 +0100 Subject: [Numpy-discussion] test errors on deprecation/runtime warnings Message-ID: Hi, Last week we merged https://github.com/numpy/numpy/pull/201, which causes DeprecationWarning's and RuntimeWarning's to be converted to errors if they occur when running the test suite. The purpose of that is to make sure that code that still uses other deprecated code (or code that for some reason generates warnings) to be cleaned up. In principle this is a good idea IMHO, but after merging we quickly found some problems with failing tests in scipy. Because this potentially affects any other projects or users that use the numpy NoseTester test runner, here's a proposal to deal with this issue: - make this behavior configurable by a keyword in `NoseTester.__init__()` - default to raising an error in numpy master - when making a branch for release, immediately set the default to not raise. Do this not only for the 1.7 release, but for any future release (at least until the oldest numpy version still in use has the keyword). The reason for this is that otherwise a new numpy release will trigger test failures in older releases of scipy or other packages. The pros: - We find issues that otherwise get ignored (see the issues Christoph just found when compiling with MSVC). - We're forced to clean up code when submitting PRs, instead of letting the warnings accumulate and having to deal with them just before release time. The con: - You may see test errors if you run a released version of scipy with a development version of numpy. Opinions? Concerns? Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
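A minimal sketch of the behaviour described in the proposal above, assuming a hypothetical `raise_warnings` keyword (the name is only illustrative, not an agreed NoseTester API); the mechanism underneath is just the standard warnings filters:

import warnings

def _apply_warning_policy(raise_warnings=True):
    # When enabled, escalate the two categories mentioned above to errors
    # so offending code fails the test run instead of passing silently;
    # a release branch would simply flip the default to False.
    if raise_warnings:
        for category in (DeprecationWarning, RuntimeWarning):
            warnings.filterwarnings("error", category=category)
    else:
        warnings.resetwarnings()

_apply_warning_policy(raise_warnings=True)
try:
    warnings.warn("this call is deprecated", DeprecationWarning)
except DeprecationWarning:
    pass  # the warning surfaced as an error, as intended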
URL: From ralf.gommers at googlemail.com Thu Feb 16 14:17:50 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 16 Feb 2012 20:17:50 +0100 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> Message-ID: On Thu, Feb 16, 2012 at 8:03 PM, Matthew Brett wrote: > Hi, > > On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted > wrote: > > On Feb 16, 2012, at 12:15 PM, Jason Grout wrote: > > > >> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote: > >>> But in the very end, when agreement can't > >>> be reached by other means, the developers are the one making the calls. > >>> (This is simply a consequence that they are the only ones who can > >>> credibly threaten to fork the project.) > >> > >> Interesting point. I hope I'm not pitching a log onto the fire here, > >> but in numpy's case, there are very many capable developers on other > >> projects who depend on numpy who could credibly threaten a fork if they > >> felt numpy was drastically going wrong. > > > > Jason, that there capable developers out there that are able to fork > NumPy (or any other project you can realize) is a given. The point Dag was > signaling is that this threaten is more probable to happen *inside* the > community. > > > > And you pointed out an important aspect too by saying "if they felt > numpy was drastically going wrong". It makes me the impression that some > people is very frightened about something really bad would happen, well > before it happens. While I agree that this is *possible*, I'd also > advocate to give Travis the benefit of doubt. I'm convinced he (and > Continuum as a whole) is making things happen that will benefit the entire > NumPy community; but in case something gets really wrong and catastrophic, > it is always a relief to know that things can be reverted in the pure open > source tradition (by either doing a fork, creating a new foundation, or > even better, proposing a new way to do things). What it does not sound > reasonable to me is to allow fear to block Continuum efforts for making a > better NumPy. I think it is better to relax a bit, see how things are > going, and then judge by looking at the *results*. > > I'm finding this conversation a bit frustrating. > > The question on the table as I understand it, is just the following: > > Is there any governance structure / procedure / set of guidelines that > would help ensure the long-term health of the numpy project? > > The subtext of your response is that you regard *any structure at all* > as damaging to the numpy effort and in particular, as damaging to the > efforts of Continuum. It seems to me that is a very extreme point of > view, and I think, honestly, it is not tenable. > That's not exactly how I'd interpret Peter's answer. > > But surely - surely - the best thing to do here is to formulate > something that might be acceptable, and for everyone to say what they > think the problems would be. Do you agree? > > David has made a concrete proposal for a procedure. It looks to me like that's an appropriate and adequate safeguard against Continuum pushing things into Numpy. Would that be enough for you? If not, would it at least be a good start? Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Thu Feb 16 14:51:34 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 16 Feb 2012 11:51:34 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> Message-ID: On Thu, Feb 16, 2012 at 11:03 AM, Matthew Brett wrote: > But surely - surely - the best thing to do here is to formulate > something that might be acceptable, and for everyone to say what they > think the problems would be. ?Do you agree? Absolutely -- but just like anything else in open source -- nothing gets done because people think it should get done -- it gets done because someone sits down and does it. having a governance structure is not my itch -- I'm not going to scratch it -- is someone itchy enough to scratch? if so, do so -- while we all continue to talk about what color the bicycle shed behind the foundation offices should be ;-) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From njs at pobox.com Thu Feb 16 14:53:18 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Feb 2012 19:53:18 +0000 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> Message-ID: On Wed, Feb 15, 2012 at 7:46 PM, Benjamin Root wrote: > Why not the NA discussion?? Would we really want to have that happen again? > Note that it still isn't fully resolved and progress still needs to be made > (I think the last thread did an excellent job of fleshing out the ideas, but > it became too much to digest.? We may need to have someone go through the > information, reduce it down and make one last push to bring it to a > conclusion). BTW, this is still on my todo list -- sorry for dropping the ball here. Perhaps once I find a flat here in Edinburgh. > The NA discussion is the perfect example where a governance > structure would help resolve disputes. I think the important question is, in an ideal world, what would have been done to help resolve this dispute? My best idea was to try and organize a document articulating points of consensus -- I'm not sure what sort of governance structure would have helped with that. A committee with an odd number of members is good at voting on things, but would a vote have helped? I dunno, I'm not saying it wouldn't -- just that it's something we might want to think about before we start writing bylaws. -- Nathaniel From cjordan1 at uw.edu Thu Feb 16 14:54:15 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Thu, 16 Feb 2012 11:54:15 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> Message-ID: On Thu, Feb 16, 2012 at 11:03 AM, Matthew Brett wrote: > Hi, > > On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted wrote: >> On Feb 16, 2012, at 12:15 PM, Jason Grout wrote: >> >>> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote: >>>> But in the very end, when agreement can't >>>> be reached by other means, the developers are the one making the calls. >>>> (This is simply a consequence that they are the only ones who can >>>> credibly threaten to fork the project.) 
>>> >>> Interesting point. ?I hope I'm not pitching a log onto the fire here, >>> but in numpy's case, there are very many capable developers on other >>> projects who depend on numpy who could credibly threaten a fork if they >>> felt numpy was drastically going wrong. >> >> Jason, that there capable developers out there that are able to fork NumPy (or any other project you can realize) is a given. ?The point Dag was signaling is that this threaten is more probable to happen *inside* the community. >> >> And you pointed out an important aspect too by saying "if they felt numpy was drastically going wrong". ?It makes me the impression that some people is very frightened about something really bad would happen, well before it happens. ?While I agree that this is *possible*, I'd also advocate to give Travis the benefit of doubt. ?I'm convinced he (and Continuum as a whole) is making things happen that will benefit the entire NumPy community; but in case something gets really wrong and catastrophic, it is always a relief to know that things can be reverted in the pure open source tradition (by either doing a fork, creating a new foundation, or even better, proposing a new way to do things). ?What it does not sound reasonable to me is to allow fear to block Continuum efforts for making a better NumPy. ?I think it is better to relax a bit, see how things are going, and then judge by looking at the *results*. > > I'm finding this conversation a bit frustrating. > > The question on the table as I understand it, is just the following: > > Is there any governance structure / procedure / set of guidelines that > would help ensure the long-term health of the numpy project? > > The subtext of your response is that you regard *any structure at all* > as damaging to the numpy effort and in particular, as damaging to the > efforts of Continuum. ?It seems to me that is a very extreme point of > view, and I think, honestly, it is not tenable. > > But surely - surely - the best thing to do here is to formulate > something that might be acceptable, and for everyone to say what they > think the problems would be. ?Do you agree? > Perhaps I'm mistaken, but I think the subtext has more been that worrying about potential problems--which aren't yet actual problems--isn't terribly productive. Particularly when the people involved are smart, invested in the success of the broader numpy package, and very deserving of the benefit of the doubt. Also, as Ralf said, David made a concrete proposal. What are your comments on his proposal? -Chris JS > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Thu Feb 16 15:13:36 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Feb 2012 20:13:36 +0000 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <3542A375-B1AD-44D9-8E10-BB7D748C51E2@vaught.net> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <3542A375-B1AD-44D9-8E10-BB7D748C51E2@vaught.net> Message-ID: On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught wrote: > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote: > >> Travis's proposal is that we go from a large number of self-selecting >> people putting in little bits of time to a small number of designated >> people putting in lots of time. > > > That's not what Travis, or anyone else, proposed. 
Maybe I was unclear -- all I mean here is that if we suddenly have a few people working full-time on numpy (as Travis proposed), then that will cause two things: -- a massive increase in the total number of person-hours going into numpy -- a smaller group of people will be responsible for a much larger proportion of those person-hours (and this is leaving aside the other ways that it can be difficult for full-time developers and volunteers to interact -- the volunteers aren't in the office, the full-timers may not have the patience to wait for a long email-paced conversation before making a decision, etc.) I think Travis' proposal is potentially a great thing, but it's not as simple as just saying "hey we hired some people now our software will be better". Ask Fred Brooks ;-) -- Nathaniel From ben.root at ou.edu Thu Feb 16 15:28:34 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 16 Feb 2012 14:28:34 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <3542A375-B1AD-44D9-8E10-BB7D748C51E2@vaught.net> Message-ID: On Thu, Feb 16, 2012 at 2:13 PM, Nathaniel Smith wrote: > On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught wrote: > > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote: > > > >> Travis's proposal is that we go from a large number of self-selecting > >> people putting in little bits of time to a small number of designated > >> people putting in lots of time. > > > > > > That's not what Travis, or anyone else, proposed. > > Maybe I was unclear -- all I mean here is that if we suddenly have a > few people working full-time on numpy (as Travis proposed), then that > will cause two things: > -- a massive increase in the total number of person-hours going into numpy > -- a smaller group of people will be responsible for a much larger > proportion of those person-hours > (and this is leaving aside the other ways that it can be difficult for > full-time developers and volunteers to interact -- the volunteers > aren't in the office, the full-timers may not have the patience to > wait for a long email-paced conversation before making a decision, > etc.) > > I think Travis' proposal is potentially a great thing, but it's not as > simple as just saying "hey we hired some people now our software will > be better". Ask Fred Brooks ;-) > > -- Nathaniel > Just a thought I had. >From the perspective of any company, they do not want to devote developer resources to an open-source project if the features are going to get rejected (either by the core-devs or by community backlash). Maybe the governance structure could be more along the lines of an advise/consent process for NEPs. This way, a company puts together a plan of action for some features and submits it to the central body (however that is defined). Comments and revisions are done. Finally, if the plan is approved, the company can feel confident that their efforts and resources won't get rejected after committing to the changes. Small changes and bugfixes are not effected by this. Large changes need planning and commentary anyway. This allows some official representation of the community to have some sort of light-handed control over the vision of the project. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Thu Feb 16 15:36:19 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Feb 2012 13:36:19 -0700 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <3542A375-B1AD-44D9-8E10-BB7D748C51E2@vaught.net> Message-ID: On Thu, Feb 16, 2012 at 1:13 PM, Nathaniel Smith wrote: > On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught wrote: > > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote: > > > >> Travis's proposal is that we go from a large number of self-selecting > >> people putting in little bits of time to a small number of designated > >> people putting in lots of time. > > > > > > That's not what Travis, or anyone else, proposed. > > Maybe I was unclear -- all I mean here is that if we suddenly have a > few people working full-time on numpy (as Travis proposed), then that > will cause two things: > -- a massive increase in the total number of person-hours going into numpy > -- a smaller group of people will be responsible for a much larger > proportion of those person-hours > (and this is leaving aside the other ways that it can be difficult for > full-time developers and volunteers to interact -- the volunteers > aren't in the office, the full-timers may not have the patience to > wait for a long email-paced conversation before making a decision, > etc.) > > I think Travis' proposal is potentially a great thing, but it's not as > simple as just saying "hey we hired some people now our software will > be better". Ask Fred Brooks ;-) > > What, you are invoking Fred Brooks for a team of, maybe, four? Numpy ain't OS/360. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Feb 16 15:39:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Feb 2012 20:39:41 +0000 Subject: [Numpy-discussion] strange behavior of numpy.random.multivariate_normal, ticket:1842 In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 5:20 PM, Pauli Virtanen wrote: > Hi, > > 16.02.2012 18:00, Nathaniel Smith kirjoitti: > [clip] >> I agree, but the behavior is still surprising -- people reasonably >> expect something like svd to be deterministic. So there's probably a >> doc bug for alerting people that their reasonable expectation is, in >> fact, wrong :-). > > The problem here is that these warnings should in principle appear in > the documentation of every numerical algorithm that contains branches > chosen on the basis of floating point data. For example, optimization > algorithms --- they terminate after a tolerance is satisfied, and so the > results can contain similar quasi-random error much larger than the > rounding error, tol > |err| >> eps. > > Floating point sucks, it's full of gotchas for all ages :( Yes, and maybe I'm just projecting my own particular naivete... I'm very familiar with numerical stability and rounding as issues, and of course optimization-based algorithms have the issue you raise. I'm still surprised to learn that on a single machine, with bit-identical inputs, using a mature low-level routine like svd, you can get *qualitatively* different results depending on memory alignment. (I wouldn't expect dense SVD to use a fixed tolerance optimization routine either!) 
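For concreteness, a small illustration (not taken from the ticket itself) of why elementwise comparison of svd output is fragile even when nothing is "wrong": the factorization is only unique up to sign flips of matching singular-vector pairs, so the robust check is the reconstruction, not the factors:

import numpy as np

rng = np.random.RandomState(0)
A = rng.randn(5, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Flip the sign of one matching pair of singular vectors: the factors
# change, but they describe exactly the same decomposition of A.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1

print(np.allclose(U, U2))                   # False: factors differ
print(np.allclose(A, np.dot(U2 * s, Vt2)))  # True: same decomposition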
-- Nathaniel From njs at pobox.com Thu Feb 16 15:45:42 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Feb 2012 20:45:42 +0000 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <3542A375-B1AD-44D9-8E10-BB7D748C51E2@vaught.net> Message-ID: On Thu, Feb 16, 2012 at 8:36 PM, Charles R Harris wrote: > > > On Thu, Feb 16, 2012 at 1:13 PM, Nathaniel Smith wrote: >> >> On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught wrote: >> > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote: >> > >> >> Travis's proposal is that we go from a large number of self-selecting >> >> people putting in little bits of time to a small number of designated >> >> people putting in lots of time. >> > >> > >> > That's not what Travis, or anyone else, proposed. >> >> Maybe I was unclear -- all I mean here is that if we suddenly have a >> few people working full-time on numpy (as Travis proposed), then that >> will cause two things: >> ?-- a massive increase in the total number of person-hours going into >> numpy >> ?-- a smaller group of people will be responsible for a much larger >> proportion of those person-hours >> (and this is leaving aside the other ways that it can be difficult for >> full-time developers and volunteers to interact -- the volunteers >> aren't in the office, the full-timers may not have the patience to >> wait for a long email-paced conversation before making a decision, >> etc.) >> >> I think Travis' proposal is potentially a great thing, but it's not as >> simple as just saying "hey we hired some people now our software will >> be better". Ask Fred Brooks ;-) >> > > What, you are invoking Fred Brooks for a team of, maybe, four? Numpy ain't > OS/360. For the general idea that you can't just translate person-hours of effort into results? Yes, though do note the winky emoticon, which is used to indicate that a statement is somewhat tongue in cheek ;-). Do you have any thoughts on the actual content of my concerns? Do you agree that there's a risk that in Travis's plan, you'll be losing out on valuable input from non-core-contributors who are nonetheless experts in particular areas? -- Nathaniel From matthew.brett at gmail.com Thu Feb 16 15:57:43 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Feb 2012 12:57:43 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> Message-ID: Hi, On Thu, Feb 16, 2012 at 11:54 AM, Christopher Jordan-Squire wrote: > On Thu, Feb 16, 2012 at 11:03 AM, Matthew Brett wrote: >> Hi, >> >> On Thu, Feb 16, 2012 at 4:23 AM, Francesc Alted wrote: >>> On Feb 16, 2012, at 12:15 PM, Jason Grout wrote: >>> >>>> On 2/15/12 6:27 PM, Dag Sverre Seljebotn wrote: >>>>> But in the very end, when agreement can't >>>>> be reached by other means, the developers are the one making the calls. >>>>> (This is simply a consequence that they are the only ones who can >>>>> credibly threaten to fork the project.) >>>> >>>> Interesting point. ?I hope I'm not pitching a log onto the fire here, >>>> but in numpy's case, there are very many capable developers on other >>>> projects who depend on numpy who could credibly threaten a fork if they >>>> felt numpy was drastically going wrong. 
>>> >>> Jason, that there capable developers out there that are able to fork NumPy (or any other project you can realize) is a given. ?The point Dag was signaling is that this threaten is more probable to happen *inside* the community. >>> >>> And you pointed out an important aspect too by saying "if they felt numpy was drastically going wrong". ?It makes me the impression that some people is very frightened about something really bad would happen, well before it happens. ?While I agree that this is *possible*, I'd also advocate to give Travis the benefit of doubt. ?I'm convinced he (and Continuum as a whole) is making things happen that will benefit the entire NumPy community; but in case something gets really wrong and catastrophic, it is always a relief to know that things can be reverted in the pure open source tradition (by either doing a fork, creating a new foundation, or even better, proposing a new way to do things). ?What it does not sound reasonable to me is to allow fear to block Continuum efforts for making a better NumPy. ?I think it is better to relax a bit, see how things are going, and then judge by looking at the *results*. >> >> I'm finding this conversation a bit frustrating. >> >> The question on the table as I understand it, is just the following: >> >> Is there any governance structure / procedure / set of guidelines that >> would help ensure the long-term health of the numpy project? >> >> The subtext of your response is that you regard *any structure at all* >> as damaging to the numpy effort and in particular, as damaging to the >> efforts of Continuum. ?It seems to me that is a very extreme point of >> view, and I think, honestly, it is not tenable. >> >> But surely - surely - the best thing to do here is to formulate >> something that might be acceptable, and for everyone to say what they >> think the problems would be. ?Do you agree? >> > > Perhaps I'm mistaken, but I think the subtext has more been that > worrying about potential problems--which aren't yet actual > problems--isn't terribly productive. Particularly when the people > involved are smart, invested in the success of the broader numpy > package, and very deserving of the benefit of the doubt. OK - that is one point of view. I'll state the most extreme version thus: "There is no possible benefit to thinking of a governance structure before problems arise". That seems to me to be untenable. As others have pointed out, the kind of problems that might arise are fairly obvious, these kinds of things have been thought about before, and designing a solution to these problems after they have arisen may be considerably harder than doing it before. Here's another version of the argument: "It is not possible to imagine a governance structure that would be better than the current one". That does seem to me to be extreme, and untenable until various schemes have been seriously considered. A more reasonable version of the same argument might be: "The costs of working on an a governance structure are greater than the potential benefits". Is that defensible? I don't think so. But if it is, what are the costs, exactly? > Also, as Ralf said, David made a concrete proposal. What are your > comments on his proposal? Right - and so did Alan Isaac some time ago, which I responded to, and got no reply. I think - as Benjamin has said previously, we first have to establish that *any* governance structure is worth discussing. Including the current one. I'm hearing a lot of "no" to that. 
Do you think that any governance structure is worth discussing? If so, which do you prefer? For David's proposal - I think it is likely to be impractical because, at the moment, almost all agreement is informal and sometimes off-list. Given that situation, it would take an enormous amount of balls to reject a Continuum pull-request. If Continuum started to lose interest in that mechanism, if I was them, I'd make the pull requests unmanageably large and therefore impossible to review. It would require one person to be the gatekeeper for the whole community, putting enormous strain on that person. Lastly, I'm not seeing much commitment to what George Dafermos calls "unmediated participation of all community members". "The desire to realise democracy is not futile. Rather, the problem is that real democracy, that is, that mode of governance which is characterised by the unmediated participation of all community members in the process of formulating problems and negotiating decisions, is unattainable once a group is split into a fraction that decides and commands and another that obeys. Such structures make a travesty of the notion of democracy" http://shareable.net/blog/governance-of-open-source-george-dafermos-interview It seems to me that it is that we need to strive for, and we need to think how that would best come about. With Nathaniel, I also think that will lead to the most harmonious, enjoyable and productive relationship between Continuum and the rest of the community. Best, Matthew From cjordan1 at uw.edu Thu Feb 16 16:06:11 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Thu, 16 Feb 2012 13:06:11 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <3542A375-B1AD-44D9-8E10-BB7D748C51E2@vaught.net> Message-ID: On Thu, Feb 16, 2012 at 12:45 PM, Nathaniel Smith wrote: > On Thu, Feb 16, 2012 at 8:36 PM, Charles R Harris > wrote: >> >> >> On Thu, Feb 16, 2012 at 1:13 PM, Nathaniel Smith wrote: >>> >>> On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught wrote: >>> > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote: >>> > >>> >> Travis's proposal is that we go from a large number of self-selecting >>> >> people putting in little bits of time to a small number of designated >>> >> people putting in lots of time. >>> > >>> > >>> > That's not what Travis, or anyone else, proposed. >>> >>> Maybe I was unclear -- all I mean here is that if we suddenly have a >>> few people working full-time on numpy (as Travis proposed), then that >>> will cause two things: >>> ?-- a massive increase in the total number of person-hours going into >>> numpy >>> ?-- a smaller group of people will be responsible for a much larger >>> proportion of those person-hours >>> (and this is leaving aside the other ways that it can be difficult for >>> full-time developers and volunteers to interact -- the volunteers >>> aren't in the office, the full-timers may not have the patience to >>> wait for a long email-paced conversation before making a decision, >>> etc.) >>> >>> I think Travis' proposal is potentially a great thing, but it's not as >>> simple as just saying "hey we hired some people now our software will >>> be better". Ask Fred Brooks ;-) >>> >> >> What, you are invoking Fred Brooks for a team of, maybe, four? Numpy ain't >> OS/360. > > For the general idea that you can't just translate person-hours of > effort into results? 
Yes, though do note the winky emoticon, which is > used to indicate that a statement is somewhat tongue in cheek ;-). > > Do you have any thoughts on the actual content of my concerns? Do you > agree that there's a risk that in Travis's plan, you'll be losing out > on valuable input from non-core-contributors who are nonetheless > experts in particular areas? > I'm not really sure how. All the developers involved are attentive enough to make announcements of pull requests and requests for comments on proposed changes. So if there's expert opinion to be had easily, i.e. through the mailing list, then I can only imagine they'd go out and get it. This also jibes with Benjamin Root's comment. Major changes will be discussed anyways. So I'm not sure how this particular objection is relevant. -Chris > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Thu Feb 16 16:08:25 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Feb 2012 14:08:25 -0700 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <3542A375-B1AD-44D9-8E10-BB7D748C51E2@vaught.net> Message-ID: On Thu, Feb 16, 2012 at 1:45 PM, Nathaniel Smith wrote: > On Thu, Feb 16, 2012 at 8:36 PM, Charles R Harris > wrote: > > > > > > On Thu, Feb 16, 2012 at 1:13 PM, Nathaniel Smith wrote: > >> > >> On Thu, Feb 16, 2012 at 5:17 PM, Travis Vaught > wrote: > >> > On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote: > >> > > >> >> Travis's proposal is that we go from a large number of self-selecting > >> >> people putting in little bits of time to a small number of designated > >> >> people putting in lots of time. > >> > > >> > > >> > That's not what Travis, or anyone else, proposed. > >> > >> Maybe I was unclear -- all I mean here is that if we suddenly have a > >> few people working full-time on numpy (as Travis proposed), then that > >> will cause two things: > >> -- a massive increase in the total number of person-hours going into > >> numpy > >> -- a smaller group of people will be responsible for a much larger > >> proportion of those person-hours > >> (and this is leaving aside the other ways that it can be difficult for > >> full-time developers and volunteers to interact -- the volunteers > >> aren't in the office, the full-timers may not have the patience to > >> wait for a long email-paced conversation before making a decision, > >> etc.) > >> > >> I think Travis' proposal is potentially a great thing, but it's not as > >> simple as just saying "hey we hired some people now our software will > >> be better". Ask Fred Brooks ;-) > >> > > > > What, you are invoking Fred Brooks for a team of, maybe, four? Numpy > ain't > > OS/360. > > For the general idea that you can't just translate person-hours of > effort into results? Yes, though do note the winky emoticon, which is > used to indicate that a statement is somewhat tongue in cheek ;-). > > Do you have any thoughts on the actual content of my concerns? Do you > agree that there's a risk that in Travis's plan, you'll be losing out > on valuable input from non-core-contributors who are nonetheless > experts in particular areas? > I'd be more concerned if I saw more input from non-core-contributors. 
The sticky issues I see are more along the lines of 1) Trademarking Numpy (TM), which probably needs doing, but who holds the trademark? 2) Distribution of money, accounting, and maybe meeting minutes. If donations are targeted to specific uses, that probably isn't a problem. Advertizing income could be in issue, though. I don't know how much transparency is required by 501(c), probably not much judging by the organizations that have that status. I think Mark's proposal to revisit the issue if/when the number of core contributors reaches maybe 5 is a good one. But in order to attract that many developers long term requires making the code more attractive and laying out a direction. I hope that the initial work along that line is soon published on the list, the sooner the better IMHO. It's not difficult to become a core developer at this point, apart from the non-trivial task of understanding the code and wanting to scratch an itch, since we are pretty desperate for developers. That is to say, the barriers are technical, not social. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Feb 16 16:11:19 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 16 Feb 2012 15:11:19 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> This has been a clarifying discussion for some people. I'm glad people are speaking up. I believe in the value of consensus and the value of users opinions. I want to make sure that people who use NumPy and haven't yet learned how to contribute, feel like they have a voice. I have always been very open about adding people to the lists that I have influence over and giving people permissions to contribute even when they disagree with me. I recognize the value of multiple points of view. That is why in addition to creating the company (with a goal to allow at least some people to spend their day-job working on NumPy), I've pushed to organize a Foundation whose essential mission is to make sure that the core tools used for Python in Science stay open, maintained, and available. I will work very hard to do all I can to make these ventures successful. I had thought I would be able to spend more time on NumPy and SciPy over the past 4 years. This did not work out --- which is why I made a career change. All I can point to is my previous work and say thank you to all who have done so much for the communities I have been able to participate in. I believe in the power of community development, but I also believe in the power of directed development towards solving people's problems in an open market where people can choose to either interact with the provider or find another supplier. Having two organizations that I support helps me direct my energies towards both of those values. I resonate with Linus' s individual leanings. I'm not a big fan of design-by-committee as I haven't seen it be very successful in creating new technologies. It is pretty good at enforcing the status-quo. If I felt like that is what NumPy needed I would be fine with it. However, I feel that NumPy is going to be surpassed with other solutions if steps are not taken to improve the code-base *and* add new features. I'm very interested in discussions about how this work is to be accomplished. 
I'm with Mark that I believe this discussion will be more useful in 6 months when we have made it easier for more people to get involved with core code development. At the end of the day it is about people and what they spend there time doing. Whatever I do, inside or outside the community, people are free to accept or reject. I can only promise to do my best. It's all I ask of everyone I work with. It is gratifying to see that NumPy has become a well-used project and that there are significant numbers of stake-holders who want to see the project continue to succeed and be useful for them. My goal with the Foundation, with the Company and with my professional life is to see that the growth of Python in Science, Technology, and Data Analysis continues and even accelerates. My view right now is similar to Mark's in that we don't have enough core developers. Charles and Ralf and Pauli and David before them have done an amazing job at "pruning, cleaning, and maintaining what is there". Obviously, Jim, Perry, Todd, Rick, Konrad, David A., and Paul Dubois have had significant impact before them. NumPy has always been a community project, but it needs some energy put into it. As someone who is intimately familiar with the code-base (having worked on Numeric as early as 1998 as well as been part of many discussions about Scientific Computing on Python), I'm trying to infuse that energy as best I can. NumPy has a chance to be far more than it is. There are people using inferior solutions because of missing features in NumPy and the lack of awareness of how to use NumPy. There are other important use-cases that NumPy is an "almost-there" solution for. As it solves these problems, even more users will come to our community, and there needs to be a way to hear their voice as well. Just for the record, I don't believe the NA discussion has been finalized. In fact, the NA discussion this summer was one of the factors that led to my decision to put myself back into NumPy development full time --- I just had to figure out how to do it in a way that my family could accept. I think I could have contributed to that discussion as someone who understands both the code and how it is and has been used. For the next 6-12 months, I am comfortable taking the "benevolent dictator role". During that time, I hope we can find many more core developers and then re-visit the discussion. My view is that design decisions should be a consensus based on current contributors to the code base and major users. To continue to be relevant, NumPy has to serve it's customers. They are the ones who will have the final say. If others feel like they can do better, a fork is an option. I don't want that to happen, but it is the only effective and practical "governance" structure that exists in my mind outside of the self-governance of the people that participate. I and others that I work with will be working on code that we plan to put into NumPy. Some of this will be bug-fixes, code-cleanup, and new tests. Some of this will be new features --- features that should have been there from the beginning (If I had understood the use-cases then like I do now). Some of this work will be features that have already been proposed and talked about but nobody has stepped up to write the code (group-by, meta-data, labeled-arrays, etc.). All of the major features will be proposed on this list, and we will use the github process. I do care about code-quality, tests, and maintenance issues. 
Now that I am putting some actual resources into the project (and not just late-nights and stolen hours from my academic career and full-time job), I can actually put some energy and money behind those things --- along with new features. We need build-bots and a good issue-tracking system. I have applied to JetBrains for NumPy to have free access to YouTrack and TeamCity. We are looking for machines to host the build-systems on. Some people have approached me already volunteering to help. All of this help will be graciously accepted. The task before us is large, not small. It will require people working together, trusting each other, and looking for ways to find common ground instead of ways to disagree. No organizational structure can make up for the lack of great people putting their hearts and efforts into a great cause. At the AMS, SIAM, and other scientific meetings, I have seen hundreds of scientists using NumPy to analyze their large data-sets, and needing threading support, labels, units, data-persistence and management, and the ability to perform data-base-like queries on their data. On Wall Street I have seen quants use NumPy to quickly analyze trading data and create systems for risk analysis. In marketing companies, I have seen NumPy get used (somewhat inefficiently) to manage customer lists and do analysis about where to build retail stores. In several other companies, I have watched NumPy get used to analyze data coming from instruments and determine engineering direction based on those data. I have watched NumPy get used by insurance companies trying to charge a more effective premium. I have also seen people write solutions without NumPy --- because they don't understand the power of array-oriented computing, or have enough math to be comfortable with thinking of an array of data as a single *thing*. I've seen software architectures develop in the data-base world without awareness of NumPy (and its ancestors, J and APL) and seen people struggle with maintaining bulky solutions that could be a few lines of array-code. I've seen people write compilers for Python while ignoring the NumPy use case and then later trying to bolt it back on. I finally feel that I've gained enough experience and awareness to know what NumPy can and should be. I recognize that others have contributed to NumPy and SciPy, and I also recognize that people with great skill will want to comment on proposals. I will do my best to listen. I will encourage those I work with and have any influence over to do the same. There are literally thousands of use-cases that NumPy can help people with. NumPy needs life-blood to make it go where it needs to go. All of this will happen in the full light of day. People will be free to comment, complain, argue, and ultimately fork if they don't like what we are doing. The NumPy developers will disagree and not everything I want will happen. I've been over-ruled before --- I expect it will happen in the future. The door is open to all who want to contribute. It remains so. There is a lot that needs to be done. I appreciate the concerns that have been raised and the people that have raised them. My limited energies remain devoted to improving the NumPy code, building Continuum, and building the Foundation to be able to support all of the Python for Science projects that it possibly can. Eventually, perhaps, I can even participate substantially with SciPy again --- where all of this started for me in 1998. 
Best regards, -Travis On Feb 16, 2012, at 10:56 AM, Nathaniel Smith wrote: > On Thu, Feb 16, 2012 at 12:27 AM, Dag Sverre Seljebotn > wrote: >> If non-contributing users came along on the Cython list demanding that >> we set up a system to select non-developers along on a board that would >> have discussions in order to veto pull requests, I don't know whether >> we'd ignore it or ridicule it or try to show some patience, but we >> certainly wouldn't take it seriously. > > I'm not really worried about the Continuum having some nefarious > "corporate" intent. But I am worried about how these plans will affect > numpy, and I think there serious risks if we don't think about > process. Money has a dramatic effect on FOSS development, and not > always in a positive way, even when -- or *especially* when -- > everyone has the best of intentions. I'm actually *more* worried about > altruistic full-time developers doing work on behalf of the community > than I am about developers who are working strictly in some company's > interests. > > Finding a good design for software is like a nasty optimization > problem -- it's easy to get stuck in local maxima, and any one person > has only an imperfect, noisy estimate of the objective function. So > you need lots of eyes to catch mistakes, filter out the noise, and > explore multiple maxima in parallel. > > The classic FOSS model of volunteer developers who are in charge of > project direction does a *great* job of solving this problem. (Linux > beat all the classic Unixen on technical quality, and it did it using > college students and volunteers -- it's not like Sun, IBM, HP etc. > couldn't afford better engineers! But they still lost.) Volunteers are > intimately familiar with the itch they're trying to scratch and the > trade-offs involved in doing so, and they need to work together to > produce anything major, so you get lots of different, high-quality > perspectives to help you figure out which approach is best. > > Developers who are working for some corporate interest alter this > balance, because in a "do-ocracy", someone who can throw a few > full-time developers at something suddenly is suddenly has effectively > complete control over project direction. There's no moral problem here > when the "dictator" is benevolent, but suddenly you have an > informational bottleneck -- even benevolent dictators make mistakes, > and they certainly aren't omniscient. Even this isn't *so* bad though, > so long as the corporation is scratching their own itch -- at least > you can be pretty sure that whatever they produce will at least make > them happy, which implies a certain level of utility. > > The riskiest case is paying developers to scratch someone else's itch. > IIUC, that's a major goal of Travis's here, to find a way to pay > developers to make numpy better for everyone. But, now you need some > way for the community to figure out what "better" means, because the > developers themselves don't necessarily know. It's not their itch > anymore. Running a poll or whatever might be a nice start, but we all > know how tough it is to extract useful design information from users. > You need a lot more than that if you want to keep the quality up. > > Travis's proposal is that we go from a large number of self-selecting > people putting in little bits of time to a small number of designated > people putting in lots of time. There's a major win in terms of total > effort, but you inevitably lose a lot of diversity of viewpoints. 
My > feeling is it will only be a net win if the new employees put serious, > bend-over-backwards effort into taking advantage of the volunteer > community's wisdom. > > This is why the NA discussion seems so relevant to me here -- everyone > involved absolutely had good intentions, excellent skills, etc., and > yet the outcome is still a huge unresolved mess. It was supposed to > make numpy more attractive for a certain set of applications, like > statistical analysis, where R is currently preferred. Instead, there > have been massive changes merged into numpy mainline, but most of the > intended "target market" for these changes is indifferent to them; > they don't solve the problem they're supposed to. And along the way > we've not just spent a bunch of Enthought's money, but also wasted > dozens of hours of volunteer time while seriously alienating some of > numpy's most dedicated advocates in that "target market". We could > debate about blame, and I'm sure there's plenty to spread around, but > I also think the fundamental problem isn't one of blame at all -- it's > that Mark, Charles and Travis *aren't* scratching an itch; AFAICT the > NA functionality is not something they actually need themselves. Which > means they're fighting uphill when trying to find the best solutions, > and haven't managed it yet. And were working on a deadline, to boot. > >> It's obvious that one should try for consensus as long as possible, >> including listening to users. But in the very end, when agreement can't >> be reached by other means, the developers are the one making the calls. >> (This is simply a consequence that they are the only ones who can >> credibly threaten to fork the project.) >> >> Sure, structures that includes users in the process could be useful... >> but, if the devs are fine with the current situation (and I don't see >> Mark or Charles complaining), then I honestly think it is quite rude to >> not let the matter drop after the first ten posts or so. > > I'm not convinced we need a formal governing body, but I think we > really, really need a community norm that takes consensus *very* > seriously. That principle is more important than who exactly enforces > it. I guess people are worried about that turning into obstructionism > or something, but seriously, this is a practical approach that works > well for lots of real actual successful FOSS projects. > > I think it's also worth distinguishing between "users" and "developers > who happen not to be numpy core developers". There are lots of > experienced and skilled developers who spend their time on, say, scipy > or nipy or whatever, just because numpy already works for them. That > doesn't mean they don't have valuable insights or a stake in how numpy > develops going forward! > > IMHO, everyone who can credibly participate in the technical > discussion should have a veto -- and should almost never use it. And > yes, that means volunteers should be able to screw up corporate > schedules if that's what's best for numpy-the-project. And, to be > clear, I'm not saying that random list-members somehow *deserve* to > screw around with generous corporate endowments; I'm saying that the > people running the corporation are going to be a lot happier in the > long run if they impose this rule on themselves. 
> > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From thouis at gmail.com Thu Feb 16 16:20:24 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Thu, 16 Feb 2012 22:20:24 +0100 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 19:25, Ralf Gommers wrote: > In another thread Jira was proposed as an alternative to Trac. Can you point > out some of its strengths and weaknesses, and tell us why you decided to > move away from it? The two primary reasons were that our Jira server was behind a firewall and we wanted to open it up, and the integration with github, where we were moving our source. My own impression is that Jira is much more complicated. It was nice that it was integrated with Fisheye and some reporting tools, but I found them so complicated to deal with that I usually didn't go beyond "show me my bugs", some bulk bug editing, and adding users to projects. As a group, we had difficulties keeping track of how we were indicating priority and planned work, even with wiki pages to tell us what we intended the different labels to mean. Jira's integration with other tools (Fisheye, Crucible) was useful in some ways, but in no way critical. There were all kinds of reports (LOC, bug count, etc.) that one could get from these, but nothing that couldn't be created with pylab and a free hour or two. I like github's issues for their simplicity and the http-based API. We miss having direct attachements, but we have a workaround. It would be nice if the github issues page were more customizable, but with the API, a motivated group could create whatever frontend they wanted. Github's issues remind me of python, Jira reminded me of Java. I guess Jira would be more suited to a large developments effort with multiple groups of programmers, which we were not. Moving bugs from Jira to github wasn't too bad (we dropped most of the metadata, except for our current/next/future label for which release fixes would go into). I think it would be easier to move from github to Jira, primarily because github has fewer possible bits of metadata on each bug. As I said, I avoided using Jira for anything really complicated, so perhaps I just needed to spend more time with it. My opinion should probably not be given undue weight. From ralf.gommers at googlemail.com Thu Feb 16 16:32:10 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 16 Feb 2012 22:32:10 +0100 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 10:20 PM, Thouis (Ray) Jones wrote: > On Thu, Feb 16, 2012 at 19:25, Ralf Gommers > wrote: > > In another thread Jira was proposed as an alternative to Trac. Can you > point > > out some of its strengths and weaknesses, and tell us why you decided to > > move away from it? > > .... > Jira reminded me of Java. > OK, you convinced me:) Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
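For readers curious about the http-based API Thouis mentions, listing a repository's open issues takes nothing beyond the Python 2 standard library. The sketch below is only illustrative: numpy's own tracker had not yet moved to GitHub at this point, so numpy/numpy is used purely as an example repository, and pagination, authentication and rate limits are ignored.

    import json
    import urllib2

    # GitHub v3 API: list open issues for a repository
    # (numpy/numpy is just an example repository here)
    url = "https://api.github.com/repos/numpy/numpy/issues?state=open&per_page=100"
    issues = json.load(urllib2.urlopen(url))

    for issue in issues:
        labels = ", ".join(label["name"] for label in issue["labels"])
        print "#%d %s [%s]" % (issue["number"], issue["title"], labels)

A custom frontend of the kind described above would just be more of the same: fetch the JSON, then render or bulk-edit it however the project likes.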
URL: From matthew.brett at gmail.com Thu Feb 16 17:32:19 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Feb 2012 14:32:19 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> Message-ID: Hi, Just for my own sake, can I clarify what you are saying here? On Thu, Feb 16, 2012 at 1:11 PM, Travis Oliphant wrote: > I'm not a big fan of design-by-committee as I haven't seen it be very successful in creating new technologies. ? It is pretty good at enforcing the status-quo. ?If I felt like that is what NumPy needed I would be fine with it. Was it your impression that what was being proposed, was design by committee? > However, I feel that NumPy is going to be surpassed with other solutions if steps are not taken to improve the code-base *and* add new features. As far as you are concerned, is there any controversy about that? > For the next 6-12 months, I am comfortable taking the "benevolent dictator role". ? During that time, I hope we can find many more core developers and then re-visit the discussion. ?My view is that design decisions should be a consensus based on current contributors to the code base and major users. ? To continue to be relevant, NumPy has to serve it's customers. ? They are the ones who will have the final say. ? If others feel like they can do better, a fork is an option. ?I don't want that to happen, but it is the only effective and practical "governance" structure that exists in my mind outside of the self-governance of the people that participate. To confirm, you are saying that you can imagine no improvement in the current governance structure? > No organizational structure can make up for the lack of great people putting their hearts and efforts into a great cause. But you agree that there might be an organizational structure that would make this harder or easier? Best, Matthew From travis at continuum.io Thu Feb 16 17:39:26 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 16 Feb 2012 16:39:26 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview Message-ID: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Mark Wiebe and I have been discussing off and on (as well as talking with Charles) a good way forward to balance two competing desires: * addition of new features that are needed in NumPy * improving the code-base generally and moving towards a more maintainable NumPy I know there are load voices for just focusing on the second of these and avoiding the first until we have finished that. I recognize the need to improve the code base, but I will also be pushing for improvements to the feature-set and user experience in the process. As a result, I am proposing a rough outline for releases over the next year: * NumPy 1.7 to come out as soon as the serious bugs can be eliminated. Bryan, Francesc, Mark, and I are able to help triage some of those. * NumPy 1.8 to come out in July which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post to this list more details of what we plan to address with it later. 
Included for possible inclusion are: * resolving the NA/missing-data issues * finishing group-by * incorporating the start of label arrays * incorporating a meta-object * a few new dtypes (variable-length string, varialbe-length unicode and an enum type) * adding ufunc support for flexible dtypes and possibly structured arrays * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities * adding "filters" to Input and Output * simple computed fields for dtypes * accepting a Data-Type specification as a class or JSON file * work towards improving the dtype-addition mechanism * re-factoring of code so that it can compile with a C++ compiler and be minimally dependent on Python data-structures. * NumPy 2.0 to come out in January of 2013. Mark Wiebe and I will post to this list a document that explains some of it's proposed features and enhancements. I won't steal his thunder for some of the things he is working on. If there are code issues people would like to see addressed, it would be a great time to speak up and/or propose something that you would like to see. In general NumPy 1.8 will have new features that need to be explored in order that NumPy 2.0 has enough code "experience" in order to be as useful as possible. I recognize that NumPy 1.8 has quite a few proposed features. These have been building up and are the big reason I've committed so many resources to NumPy. The feature-list did not just come out of my head. They are the result of talking and interacting with many NumPy users and watching the code get used (and not used) in the real world. This will be a faster pace of development. But, all of this will be in the open. If the NumPy 2.0 schedule is too aggressive, then we will have a NumPy 1.9 release in order to allow features to come out. Thanks, -Travis From warren.weckesser at enthought.com Thu Feb 16 17:56:28 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 16 Feb 2012 16:56:28 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant wrote: > Mark Wiebe and I have been discussing off and on (as well as talking with > Charles) a good way forward to balance two competing desires: > > * addition of new features that are needed in NumPy > * improving the code-base generally and moving towards a more > maintainable NumPy > > I know there are load voices for just focusing on the second of these and > avoiding the first until we have finished that. I recognize the need to > improve the code base, but I will also be pushing for improvements to the > feature-set and user experience in the process. > > As a result, I am proposing a rough outline for releases over the next > year: > > * NumPy 1.7 to come out as soon as the serious bugs can be > eliminated. Bryan, Francesc, Mark, and I are able to help triage some of > those. > > * NumPy 1.8 to come out in July which will have as many > ABI-compatible feature enhancements as we can add while improving test > coverage and code cleanup. I will post to this list more details of what > we plan to address with it later. 
Included for possible inclusion are: > * resolving the NA/missing-data issues > * finishing group-by > * incorporating the start of label arrays > * incorporating a meta-object > * a few new dtypes (variable-length string, varialbe-length unicode > and an enum type) > * adding ufunc support for flexible dtypes and possibly structured > arrays > * allowing generalized ufuncs to work on more kinds of arrays > besides just contiguous > * improving the ability for NumPy to receive JIT-generated function > pointers for ufuncs and other calculation opportunities > * adding "filters" to Input and Output > * simple computed fields for dtypes > * accepting a Data-Type specification as a class or JSON file > * work towards improving the dtype-addition mechanism > * re-factoring of code so that it can compile with a C++ compiler > and be minimally dependent on Python data-structures. > > * NumPy 2.0 to come out in January of 2013. Mark Wiebe and I will > post to this list a document that explains some of it's proposed features > and enhancements. I won't steal his thunder for some of the things he is > working on. > > If there are code issues people would like to see addressed, it would be a > great time to speak up and/or propose something that you would like to see. > The above list looks great. Another request that comes up occasionally on the mailing list is for the efficient computation of order statistics, the simplest case being a combined min/max function. Longish thread starts here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/ Warren > In general NumPy 1.8 will have new features that need to be explored in > order that NumPy 2.0 has enough code "experience" in order to be as useful > as possible. I recognize that NumPy 1.8 has quite a few proposed > features. These have been building up and are the big reason I've > committed so many resources to NumPy. The feature-list did not just come > out of my head. They are the result of talking and interacting with many > NumPy users and watching the code get used (and not used) in the real > world. This will be a faster pace of development. But, all of this > will be in the open. If the NumPy 2.0 schedule is too aggressive, then > we will have a NumPy 1.9 release in order to allow features to come out. > > Thanks, > > -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Feb 16 18:20:39 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Feb 2012 18:20:39 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Thu, Feb 16, 2012 at 5:56 PM, Warren Weckesser wrote: > > > On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant > wrote: >> >> Mark Wiebe and I have been discussing off and on (as well as talking with >> Charles) a good way forward to balance two competing desires: >> >> ? ? ? ?* addition of new features that are needed in NumPy >> ? ? ? ?* improving the code-base generally and moving towards a more >> maintainable NumPy >> >> I know there are load voices for just focusing on the second of these and >> avoiding the first until we have finished that. 
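Warren's combined min/max request above is easy to sketch at the Python level, though the point of the feature is a single-pass C implementation; the two-pass version below only illustrates the intended interface, and the name minmax is hypothetical rather than an existing NumPy function.

    import numpy as np

    def minmax(a, axis=None):
        # interface sketch only: this still makes two passes over the data,
        # whereas a C implementation could return both results in one pass
        a = np.asanyarray(a)
        return a.min(axis=axis), a.max(axis=axis)

    lo, hi = minmax(np.random.rand(1000))

A real single-pass version would roughly halve the memory traffic for large arrays, which is what the linked thread is after.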
?I recognize the need to >> improve the code base, but I will also be pushing for improvements to the >> feature-set and user experience in the process. >> >> As a result, I am proposing a rough outline for releases over the next >> year: >> >> ? ? ? ?* NumPy 1.7 to come out as soon as the serious bugs can be >> eliminated. ?Bryan, Francesc, Mark, and I are able to help triage some of >> those. >> >> ? ? ? ?* NumPy 1.8 to come out in July which will have as many >> ABI-compatible feature enhancements as we can add while improving test >> coverage and code cleanup. ? I will post to this list more details of what >> we plan to address with it later. ? ?Included for possible inclusion are: >> ? ? ? ?* resolving the NA/missing-data issues >> ? ? ? ?* finishing group-by >> ? ? ? ?* incorporating the start of label arrays >> ? ? ? ?* incorporating a meta-object >> ? ? ? ?* a few new dtypes (variable-length string, varialbe-length unicode >> and an enum type) >> ? ? ? ?* adding ufunc support for flexible dtypes and possibly structured >> arrays >> ? ? ? ?* allowing generalized ufuncs to work on more kinds of arrays >> besides just contiguous >> ? ? ? ?* improving the ability for NumPy to receive JIT-generated function >> pointers for ufuncs and other calculation opportunities >> ? ? ? ?* adding "filters" to Input and Output >> ? ? ? ?* simple computed fields for dtypes >> ? ? ? ?* accepting a Data-Type specification as a class or JSON file >> ? ? ? ?* work towards improving the dtype-addition mechanism >> ? ? ? ?* re-factoring of code so that it can compile with a C++ compiler >> and be minimally dependent on Python data-structures. >> >> ? ? ? ?* NumPy 2.0 to come out in January of 2013. ? Mark Wiebe and I will >> post to this list a document that explains some of it's proposed features >> and enhancements. ? ?I won't steal his thunder for some of the things he is >> working on. >> >> If there are code issues people would like to see addressed, it would be a >> great time to speak up and/or propose something that you would like to see. > > > > The above list looks great.? Another request that comes up occasionally on > the mailing list is for the efficient computation of order statistics, the > simplest case being a combined min/max function.? Longish thread starts > here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/ The list looks great, but for the time table I expect there will be at least a 1.9 and 1.10 necessary to improve what "we didn't get quite right in the first place", or what not many users had time to try out. Josef > > Warren > > >> >> In general NumPy 1.8 will have new features that need to be explored in >> order that NumPy 2.0 has enough code "experience" in order to be as useful >> as possible. ? I recognize that NumPy 1.8 has quite a few proposed features. >> ? These have been building up and are the big reason I've committed so many >> resources to NumPy. ? The feature-list did not just come out of my head. >> They are the result of talking and interacting with many NumPy users and >> watching the code get used (and not used) in the real world. ? ?This will be >> a faster pace of development. ? But, all of this will be in the open. ? ?If >> the NumPy 2.0 schedule is too aggressive, then we will have a NumPy 1.9 >> release in order to allow features to come out. 
>> >> Thanks, >> >> -Travis >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Thu Feb 16 18:24:50 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Feb 2012 16:24:50 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Thu, Feb 16, 2012 at 4:20 PM, wrote: > On Thu, Feb 16, 2012 at 5:56 PM, Warren Weckesser > wrote: > > > > > > On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant > > wrote: > >> > >> Mark Wiebe and I have been discussing off and on (as well as talking > with > >> Charles) a good way forward to balance two competing desires: > >> > >> * addition of new features that are needed in NumPy > >> * improving the code-base generally and moving towards a more > >> maintainable NumPy > >> > >> I know there are load voices for just focusing on the second of these > and > >> avoiding the first until we have finished that. I recognize the need to > >> improve the code base, but I will also be pushing for improvements to > the > >> feature-set and user experience in the process. > >> > >> As a result, I am proposing a rough outline for releases over the next > >> year: > >> > >> * NumPy 1.7 to come out as soon as the serious bugs can be > >> eliminated. Bryan, Francesc, Mark, and I are able to help triage some > of > >> those. > >> > >> * NumPy 1.8 to come out in July which will have as many > >> ABI-compatible feature enhancements as we can add while improving test > >> coverage and code cleanup. I will post to this list more details of > what > >> we plan to address with it later. Included for possible inclusion > are: > >> * resolving the NA/missing-data issues > >> * finishing group-by > >> * incorporating the start of label arrays > >> * incorporating a meta-object > >> * a few new dtypes (variable-length string, varialbe-length > unicode > >> and an enum type) > >> * adding ufunc support for flexible dtypes and possibly > structured > >> arrays > >> * allowing generalized ufuncs to work on more kinds of arrays > >> besides just contiguous > >> * improving the ability for NumPy to receive JIT-generated > function > >> pointers for ufuncs and other calculation opportunities > >> * adding "filters" to Input and Output > >> * simple computed fields for dtypes > >> * accepting a Data-Type specification as a class or JSON file > >> * work towards improving the dtype-addition mechanism > >> * re-factoring of code so that it can compile with a C++ compiler > >> and be minimally dependent on Python data-structures. > >> > >> * NumPy 2.0 to come out in January of 2013. Mark Wiebe and I > will > >> post to this list a document that explains some of it's proposed > features > >> and enhancements. I won't steal his thunder for some of the things > he is > >> working on. > >> > >> If there are code issues people would like to see addressed, it would > be a > >> great time to speak up and/or propose something that you would like to > see. > > > > > > > > The above list looks great. 
Another request that comes up occasionally > on > > the mailing list is for the efficient computation of order statistics, > the > > simplest case being a combined min/max function. Longish thread starts > > here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/ > > The list looks great, but for the time table I expect there will be at > least a 1.9 and 1.10 necessary to improve what "we didn't get quite > right in the first place", or what not many users had time to try out. > > That's my sense also. I think the long list needs to be prioritized and broken up into smaller chunks. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ceball at gmail.com Thu Feb 16 18:52:00 2012 From: ceball at gmail.com (Chris Ball) Date: Thu, 16 Feb 2012 23:52:00 +0000 (UTC) Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) References: Message-ID: Ralf Gommers googlemail.com> writes: ... > While we're at it, our buildbot situation is much worse than our issue > tracker situation. This also looks good (and free): > http://www.jetbrains.com/teamcity/ I'd like to help with the NumPy Buildbot situation, and below I propose a plan for myself to do this. However, I realize there are people who know more about continuous integration than I do. So, if someone is already planning to do something, I'd be happy to help with a different plan instead! I know how to set up and run Buildbot (and how much effort that takes), but I'm not familiar with the alternatives, so I can only propose one concrete plan: I'll find a machine on which to run a build master, then start to add slaves (real machines or virtual machines). At first I'll focus on the NumPy master branch, (a) testing it over different operating systems and versions of Python and (b) reporting things such as test coverage. I'll keep the Buildbot configuration in a github project, along with documentation (in case I disappear...). After getting to this initial stage, I'll discuss about adding more features (such as testing pull requests, performance testing, building binaries on the different operating systems, etc). Also, if it's working well, this Buildbot setup could replace/be merged with the one at buildbot.scipy.org (I don't know who is currently running that). Buildbot is used by some big projects (e.g. Python, Chromium, and Mozilla), but I'm aware that several projects in the scientific/numeric Python ecosystem use Jenkins (including Cython, IPython, and SymPy), often using a hosted Jenkins solution such as Shining Panda. A difficult part of running a Buildbot service is finding hardware for the slaves and keeping them alive, so a hosted solution sounds wonderful (assuming hosted solutions offer an adequate range of operating systems etc). Also, earlier in the "Issue Tracking" thread some commercial packages were mentioned; I don't know anything about those. So, as I said at the beginning, if someone is already planning to do something (or wants to) I'd be happy to help with a different plan instead! Otherwise, I can proceed with the plan I suggested. 
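For concreteness, the kind of master.cfg this plan implies might start out like the sketch below. It assumes the Buildbot 0.8.x series current at the time; module paths and keyword names move around between Buildbot releases, so treat the imports as indicative rather than exact, and the slave name, password and builder name are placeholders.

    # master.cfg sketch -- imports reflect the 0.8.x layout approximately, not exactly
    from buildbot.buildslave import BuildSlave
    from buildbot.changes.gitpoller import GitPoller
    from buildbot.process.factory import BuildFactory
    from buildbot.steps.source.git import Git
    from buildbot.steps.shell import ShellCommand
    from buildbot.config import BuilderConfig

    c = BuildmasterConfig = {}

    # one entry per donated machine (XP, OS X, 32- and 64-bit Linux, ...)
    c['slaves'] = [BuildSlave("linux64", "secret")]
    c['slavePortnum'] = 9989

    # watch the master branch on github
    c['change_source'] = [GitPoller("git://github.com/numpy/numpy.git", branch="master")]

    # checkout, build in place, run the full test suite
    factory = BuildFactory([
        Git(repourl="git://github.com/numpy/numpy.git"),
        ShellCommand(command=["python", "setup.py", "build_ext", "--inplace"]),
        ShellCommand(command=["python", "-c", "import numpy; numpy.test('full')"]),
    ])

    c['builders'] = [BuilderConfig(name="numpy-linux64",
                                   slavenames=["linux64"],
                                   factory=factory)]
    # schedulers and status targets (waterfall, mail/IRC notifiers) are omitted here

Testing pull requests, coverage reporting and binary builds would then be additional builders and steps layered on the same skeleton, whichever CI tool ends up hosting it.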
Chris From travis at continuum.io Thu Feb 16 18:58:43 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 16 Feb 2012 17:58:43 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> Message-ID: <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> Matthew, What you should take from my post is that I appreciate your concern for the future of the NumPy project, and am grateful that you have an eye to the sort of things that can go wrong --- it will help ensure they don't go wrong. But, I personally don't agree that it is necessary to put any more formal structure in place at this time, and we should wait for 6-12 months, and see where we are at while doing everything we can to get more people interested in contributing to the project. I'm comfortable playing the role of BDF12 with a cadre of developers/contributors who seeks to come to consensus. I believe there are sufficient checks on the process that will make it quite difficult for me to *abuse* that in the short term. Charles, Rolf, Mark, David, Robert, Josef, you, and many others are already quite adept at calling me out when I do things they don't like or think are problematic. I encourage them to continue this. I can't promise I'll do everything you want, but I can promise I will listen and take your opinions seriously --- just like I take the opinions of every contributor to the NumPy and SciPy lists seriously (though weighted by the work-effort they have put on the project). We can all only continue to do our best to help out wherever we can. Just so we are clear: Continuum's current major client is the larger NumPy/SciPy community itself and this will remain the case for at least several months. You have nothing to fear from "other clients" we are trying to please. Thus, we are incentivized to keep as many people happy as possible. In the second place, the Foundation's major client is the same community (and even broader) and the rest of the board is committed to the overall success of the ecosystem. There is a reason the board is comprised of a wide-representation of that eco-system. I am very hopeful that numfocus will evolve over time to have an active community of people who participate in it's processes and plans to support as many projects as it can given the bandwidth and funding available to it. So, if I don't participate in this discussion, anymore, it's because I am working on some open-source things I'd like to show at PyCon, and time is clicking down. If you really feel strongly about this, then I would suggest that you come up with a proposal for governance that you would like us all to review. At the SciPy conference in Austin this summer we can talk about it --- when many of us will be face-to-face. Best regards, -Travis On Feb 16, 2012, at 4:32 PM, Matthew Brett wrote: > Hi, > > Just for my own sake, can I clarify what you are saying here? > > On Thu, Feb 16, 2012 at 1:11 PM, Travis Oliphant wrote: >> I'm not a big fan of design-by-committee as I haven't seen it be very successful in creating new technologies. It is pretty good at enforcing the status-quo. If I felt like that is what NumPy needed I would be fine with it. > > Was it your impression that what was being proposed, was design by committee? 
> >> However, I feel that NumPy is going to be surpassed with other solutions if steps are not taken to improve the code-base *and* add new features. > > As far as you are concerned, is there any controversy about that? > >> For the next 6-12 months, I am comfortable taking the "benevolent dictator role". During that time, I hope we can find many more core developers and then re-visit the discussion. My view is that design decisions should be a consensus based on current contributors to the code base and major users. To continue to be relevant, NumPy has to serve it's customers. They are the ones who will have the final say. If others feel like they can do better, a fork is an option. I don't want that to happen, but it is the only effective and practical "governance" structure that exists in my mind outside of the self-governance of the people that participate. > > To confirm, you are saying that you can imagine no improvement in the > current governance structure? > >> No organizational structure can make up for the lack of great people putting their hearts and efforts into a great cause. > > But you agree that there might be an organizational structure that > would make this harder or easier? > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Thu Feb 16 19:01:46 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 16 Feb 2012 18:01:46 -0600 Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) In-Reply-To: References: Message-ID: <7237DB36-3598-4C05-A19C-5D19CFA678E6@continuum.io> We never turn down good help like this. Thank's Chris. I have applied for an unlimited license for TeamCity for the NumPy project. I have heard good things about TeamCity, although getting the slaves cranking and staying cranking is the goal and not the CI architecture. If you know build-bot, it's a good place to start. I have heard very positive things about Jenkins. I also think that hosted solutions are going to be easier to manage over time. But, your offer is very generous. -Travis On Feb 16, 2012, at 5:52 PM, Chris Ball wrote: > Ralf Gommers googlemail.com> writes: > ... >> While we're at it, our buildbot situation is much worse than our issue >> tracker situation. This also looks good (and free): >> http://www.jetbrains.com/teamcity/ > > I'd like to help with the NumPy Buildbot situation, and below I propose > a plan for myself to do this. However, I realize there are people who > know more about continuous integration than I do. So, if someone is > already planning to do something, I'd be happy to help with a different > plan instead! > > > I know how to set up and run Buildbot (and how much effort that takes), > but I'm not familiar with the alternatives, so I can only propose one > concrete plan: > > I'll find a machine on which to run a build master, then start to add > slaves (real machines or virtual machines). At first I'll focus on the > NumPy master branch, (a) testing it over different operating systems and > versions of Python and (b) reporting things such as test coverage. I'll > keep the Buildbot configuration in a github project, along with > documentation (in case I disappear...). > > After getting to this initial stage, I'll discuss about adding more > features (such as testing pull requests, performance testing, building > binaries on the different operating systems, etc). 
Also, if it's working > well, this Buildbot setup could replace/be merged with the one at > buildbot.scipy.org (I don't know who is currently running that). > > > Buildbot is used by some big projects (e.g. Python, Chromium, and > Mozilla), but I'm aware that several projects in the scientific/numeric > Python ecosystem use Jenkins (including Cython, IPython, and SymPy), > often using a hosted Jenkins solution such as Shining Panda. A difficult > part of running a Buildbot service is finding hardware for the slaves > and keeping them alive, so a hosted solution sounds wonderful (assuming > hosted solutions offer an adequate range of operating systems etc). > Also, earlier in the "Issue Tracking" thread some commercial packages > were mentioned; I don't know anything about those. So, as I said at the > beginning, if someone is already planning to do something (or wants to) > I'd be happy to help with a different plan instead! Otherwise, I can > proceed with the plan I suggested. > > Chris > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From takowl at gmail.com Thu Feb 16 19:07:44 2012 From: takowl at gmail.com (Thomas Kluyver) Date: Fri, 17 Feb 2012 00:07:44 +0000 Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) In-Reply-To: References: Message-ID: On 16 February 2012 23:52, Chris Ball wrote: > I'm aware that several projects in the scientific/numeric > Python ecosystem use Jenkins (including Cython, IPython, and SymPy), > often using a hosted Jenkins solution such as Shining Panda. A difficult > part of running a Buildbot service is finding hardware for the slaves > and keeping them alive, so a hosted solution sounds wonderful (assuming > hosted solutions offer an adequate range of operating systems etc). We're using ShiningPanda's hosted CI for IPython: https://jenkins.shiningpanda.com/ipython/ It has a number of things going for it - not least that the basic service is free for FLOSS - but, to misquote, you can have any OS you like, so long as it's Debian 6. I get the feeling that they're still developing things, so maybe there will be more options in the future, but that's the state now. You may notice an ipython-mac job in the list - one of our contributors kindly set up his Mac to run the test suite overnight, and we have ShiningPanda download the zipped results. It's a neat trick, but it's not really a solution if you're testing many OS flavours. Thomas From ognen at enthought.com Thu Feb 16 19:11:31 2012 From: ognen at enthought.com (Ognen Duzlevski) Date: Thu, 16 Feb 2012 18:11:31 -0600 Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 6:07 PM, Thomas Kluyver wrote: > On 16 February 2012 23:52, Chris Ball wrote: >> I'm aware that several projects in the scientific/numeric >> Python ecosystem use Jenkins (including Cython, IPython, and SymPy), >> often using a hosted Jenkins solution such as Shining Panda. A difficult >> part of running a Buildbot service is finding hardware for the slaves >> and keeping them alive, so a hosted solution sounds wonderful (assuming >> hosted solutions offer an adequate range of operating systems etc). 
> > We're using ShiningPanda's hosted CI for IPython: > https://jenkins.shiningpanda.com/ipython/ > > It has a number of things going for it - not least that the basic > service is free for FLOSS - but, to misquote, you can have any OS you > like, so long as it's Debian 6. I get the feeling that they're still > developing things, so maybe there will be more options in the future, > but that's the state now. > > You may notice an ipython-mac job in the list - one of our > contributors kindly set up his Mac to run the test suite overnight, > and we have ShiningPanda download the zipped results. It's a neat > trick, but it's not really a solution if you're testing many OS > flavours. You may also set up a few (free) EC2 instances on Amazon, some with Linux and some with Windows Server 2008 and install your own Jenkins CI solution on them. Unfortunately, OS X is not offered on Amazon either... Many ways to skin a cat.. ;-) Ognen From njs at pobox.com Thu Feb 16 19:12:14 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 17 Feb 2012 00:12:14 +0000 Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 11:52 PM, Chris Ball wrote: > Buildbot is used by some big projects (e.g. Python, Chromium, and > Mozilla), but I'm aware that several projects in the scientific/numeric > Python ecosystem use Jenkins (including Cython, IPython, and SymPy), > often using a hosted Jenkins solution such as Shining Panda. A difficult > part of running a Buildbot service is finding hardware for the slaves > and keeping them alive, so a hosted solution sounds wonderful (assuming > hosted solutions offer an adequate range of operating systems etc). A quick look at Shining Panda suggests that you get no coverage for anything but Linux, which is a good start but rather limiting. IME by far the most annoying part of a useful buildbot setup is keeping all the build slaves up and working. It's one thing to set up a build environment in one OS, it's quite another to keep like 5 of them working, each on a different volunteered machine where you don't have root and the person who does isn't answering email... the total effort isn't large, but it's really poorly suited to the nature of volunteer labor, because it needs prompt attention at random intervals. (Also, this doesn't become obvious until after one's already gotten everything set up, so then you're stuck limping along because who wants to start over and build something more maintainable...) If anyone has existing sysadmin resources then keeping build-slaves running is a place where they'd be a huge contribution. -- Nathaniel From matthew.brett at gmail.com Thu Feb 16 19:22:26 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Feb 2012 16:22:26 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> Message-ID: Hi, On Thu, Feb 16, 2012 at 3:58 PM, Travis Oliphant wrote: > > Matthew, > > What you should take from my post is that I appreciate your concern for the future of the NumPy project, and am grateful that you have an eye to the sort of things that can go wrong --- it will help ensure they don't go wrong. 
> > But, I personally don't agree that it is necessary to put any more formal structure in place at this time, and we should wait for 6-12 months, and see where we are at while doing everything we can to get more people interested in contributing to the project. ? ? I'm comfortable playing the role of BDF12 with a cadre of developers/contributors who seeks to come to consensus. ? ?I believe there are sufficient checks on the process that will make it quite difficult for me to *abuse* that in the short term. ? Charles, Rolf, Mark, David, Robert, Josef, you, and many others are already quite adept at calling me out when I do things they don't like or think are problematic. ? ?I encourage them to continue this. ? I can't promise I'll do everything you want, but I can promise I will listen and take your opinions seriously --- just like I take the opinions of every contributor to the NumPy and SciPy lists seriously (though weighted by the work-effort they have put on the project). > ?We can all only continue to do our best to help out wherever we can. > > Just so we are clear: ?Continuum's current major client ?is the larger NumPy/SciPy community itself and this will remain the case for at least several months. ? ?You have nothing to fear from "other clients" we are trying to please. ? Thus, we are incentivized to keep as many people happy as possible. ? ?In the second place, the Foundation's major client is the same community (and even broader) and the rest of the board is committed to the overall success of the ecosystem. ? There is a reason the board is comprised of a wide-representation of that eco-system. ? I am very hopeful that numfocus will evolve over time to have an active community of people who participate in it's processes and plans to support as many projects as it can given the bandwidth and funding available to it. > > So, if I don't participate in this discussion, anymore, it's because I am working on some open-source things I'd like to show at PyCon, and time is clicking down. ? ?If you really feel strongly about this, then I would suggest that you come up with a proposal for governance that you would like us all to review. ?At the SciPy conference in Austin this summer we can talk about it --- when many of us will be face-to-face. This has not been an encouraging episode in striving for consensus. I see virtually no movement from your implied position at the beginning of this thread, other than the following 1) yes you are in charge 2) you'll consider other options in 6 to 12 months. I think you're saying here that you won't reply any more on this thread, and I suppose that reflects the importance you attach to this problem. I will not myself propose a governance model because I do not consider myself to have enough influence (on various metrics) to make it likely it would be supported. I wish that wasn't my perception of how things are done here. Best, Matthew From matthew.brett at gmail.com Thu Feb 16 19:36:18 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Feb 2012 16:36:18 -0800 Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) In-Reply-To: References: Message-ID: Hi, On Thu, Feb 16, 2012 at 4:12 PM, Nathaniel Smith wrote: > On Thu, Feb 16, 2012 at 11:52 PM, Chris Ball wrote: >> Buildbot is used by some big projects (e.g. 
Python, Chromium, and >> Mozilla), but I'm aware that several projects in the scientific/numeric >> Python ecosystem use Jenkins (including Cython, IPython, and SymPy), >> often using a hosted Jenkins solution such as Shining Panda. A difficult >> part of running a Buildbot service is finding hardware for the slaves >> and keeping them alive, so a hosted solution sounds wonderful (assuming >> hosted solutions offer an adequate range of operating systems etc). > > A quick look at Shining Panda suggests that you get no coverage for > anything but Linux, which is a good start but rather limiting. IME by > far the most annoying part of a useful buildbot setup is keeping all > the build slaves up and working. It's one thing to set up a build > environment in one OS, it's quite another to keep like 5 of them > working, each on a different volunteered machine where you don't have > root and the person who does isn't answering email... the total effort > isn't large, but it's really poorly suited to the nature of volunteer > labor, because it needs prompt attention at random intervals. (Also, > this doesn't become obvious until after one's already gotten > everything set up, so then you're stuck limping along because who > wants to start over and build something more maintainable...) > > If anyone has existing sysadmin resources then keeping build-slaves > running is a place where they'd be a huge contribution. Yup - keeping the slaves running is the big problem. We do have various slaves running here that at least are all accessible by me, and Jarrod, and (at a pinch) Fernando, Stefan and others. These are: XP (when I'm not using the machine, which is the large majority of the time) OSX 10.5 OSX 10.4 PPC Linux 32 bit Linux 64 bit These are all real machines not virtual machines. I'm happy to give some reliable person ssh access to the buildslave user on these machines. They won't necessarily be available for all time, they are dotted around campus doing various jobs like being gateways, project machines, occasional desktops. See you, Matthew From 00ai99 at gmail.com Thu Feb 16 19:41:14 2012 From: 00ai99 at gmail.com (David Gowers (kampu)) Date: Fri, 17 Feb 2012 11:11:14 +1030 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Fri, Feb 17, 2012 at 9:09 AM, Travis Oliphant wrote: > * incorporating a meta-object > * a few new dtypes (variable-length string, varialbe-length unicode and an enum type) > * simple computed fields for dtypes >From the sound of that, I'm certainly looking forward to seeing some details (like: Do you mean Pascal (length, content) style strings, AKA struct code 'p'?; Read-only dtype fields computed via a callback function?). > ? ? ? ?* accepting a Data-Type specification as a class or JSON file On that subject, I incidentally have implemented a pair of functions (freeze()/thaw()) that make de/serialization to JSON or YAML fairly simple. (currently they leave fundamental dtypes as is. Basically the only thing that would be necessary to render the result serializable to/from JSON, is representing fundamental dtypes as JSON-safe objects .. a string would probably do.) http://paste.pocoo.org/show/552311/ (Modified slightly from code in my project here: https://gitorious.org/bits/bits/blobs/master/dtype.py) I've tried and failed to find a bug report for dtype serialization. Should I create a new ticket for JSON deserialization? 
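In the same spirit as the freeze()/thaw() helpers linked above (this is not a copy of that code), a round-trip for simple structured dtypes can be built on dtype.descr; the function names here are illustrative only, and nested shapes or exotic dtypes need the extra care David alludes to.

    import json
    import numpy as np

    def dtype_to_json(dt):
        # for simple structured dtypes, .descr is already a JSON-friendly
        # list of (name, format) pairs
        return json.dumps(dt.descr)

    def dtype_from_json(s):
        # JSON hands back lists of unicode strings; np.dtype wants
        # tuples of plain strings
        return np.dtype([(str(name), str(fmt)) for name, fmt in json.loads(s)])

    dt = np.dtype([('x', '<f8'), ('y', '<i4'), ('name', 'S16')])
    assert dtype_from_json(dtype_to_json(dt)) == dt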
(serialization wouldn't hurt either, since that would let us store both an array's data/shape/etc and its dtype in the same JSON document.) From wardefar at iro.umontreal.ca Thu Feb 16 19:51:35 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Thu, 16 Feb 2012 19:51:35 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> Message-ID: <2C1C5F7C-F4B3-4B8B-987E-BE00A05EF8A9@iro.umontreal.ca> On 2012-02-16, at 1:28 PM, Charles R Harris wrote: > I think this is a good point, which is why the idea of a long term release is appealing. That release should be stodgy and safe, while the ongoing development can be much more radical in making changes. I sort of thought this *was* the state of affairs re: NumPy 2.0. > And numpy really does need a fairly radical rewrite, just to clarify and simplify the base code easier if nothing else. New features I'm more leery about, at least until the code base is improved, which would be my short term priority. As someone who has now thrice ventured into the NumPy C code (twice to add features, once to fix a nasty bug I encountered) I simply could not agree more. While it's not a completely hopeless exercise for someone comfortable with C to get themselves acquainted with NumPy's C internals, the code base as is could be simpler. A refactoring and *documentation* effort would be a good way to get more people contributing to this side of NumPy. I believe the suggestion of seeing more of the C moved to Cython has also been floated before. David From alan.isaac at gmail.com Thu Feb 16 20:26:35 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 16 Feb 2012 20:26:35 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> Message-ID: <4F3DACCB.2040004@gmail.com> On 2/16/2012 7:22 PM, Matthew Brett wrote: > This has not been an encouraging episode in striving for consensus. I disagree. Failure to reach consensus does not imply lack of striving. I see parallels with a recent hiring decision process I observed. There were fundamentally different views of how to rank the top candidates. Because those involved value consensus, these views were extensively aired and discussed. *That* is where the commitment to consensus showed. It proved not possible to reach a consensus on the candidate choice, so the decision of the search committee carried the day. (Even there, there was not consensus.) In the end, there is work to be done, and getting the work done is something *everyone* agrees trumps other disagreements. Striving for consensus does not mean that a minority automatically gets veto rights. 
Cheers, Alan From ben.root at ou.edu Thu Feb 16 20:35:52 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 16 Feb 2012 19:35:52 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Thursday, February 16, 2012, Warren Weckesser wrote: > > > On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant wrote: > >> Mark Wiebe and I have been discussing off and on (as well as talking with >> Charles) a good way forward to balance two competing desires: >> >> * addition of new features that are needed in NumPy >> * improving the code-base generally and moving towards a more >> maintainable NumPy >> >> I know there are load voices for just focusing on the second of these and >> avoiding the first until we have finished that. I recognize the need to >> improve the code base, but I will also be pushing for improvements to the >> feature-set and user experience in the process. >> >> As a result, I am proposing a rough outline for releases over the next >> year: >> >> * NumPy 1.7 to come out as soon as the serious bugs can be >> eliminated. Bryan, Francesc, Mark, and I are able to help triage some of >> those. >> >> * NumPy 1.8 to come out in July which will have as many >> ABI-compatible feature enhancements as we can add while improving test >> coverage and code cleanup. I will post to this list more details of what >> we plan to address with it later. Included for possible inclusion are: >> * resolving the NA/missing-data issues >> * finishing group-by >> * incorporating the start of label arrays >> * incorporating a meta-object >> * a few new dtypes (variable-length string, varialbe-length >> unicode and an enum type) >> * adding ufunc support for flexible dtypes and possibly structured >> arrays >> * allowing generalized ufuncs to work on more kinds of arrays >> besides just contiguous >> * improving the ability for NumPy to receive JIT-generated >> function pointers for ufuncs and other calculation opportunities >> * adding "filters" to Input and Output >> * simple computed fields for dtypes >> * accepting a Data-Type specification as a class or JSON file >> * work towards improving the dtype-addition mechanism >> * re-factoring of code so that it can compile with a C++ compiler >> and be minimally dependent on Python data-structures. >> >> * NumPy 2.0 to come out in January of 2013. Mark Wiebe and I >> will post to this list a document that explains some of it's proposed >> features and enhancements. I won't steal his thunder for some of the >> things he is working on. >> >> If there are code issues people would like to see addressed, it would be >> a great time to speak up and/or propose something that you would like to >> see. >> > > > The above list looks great. Another request that comes up occasionally on > the mailing list is for the efficient computation of order statistics, the > simplest case being a combined min/max function. Longish thread starts > here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/ > > Warren > > > +1 on this. Also, before I forget, it looks like as of matlab 2011, they also have a "minmax" function, but for the neural network toolbox. Also, what it does is so constrained and different that at the very least, a note about it should go into the "numpy for matlab users" webpage. Ben Root > >> In general NumPy 1.8 will have new features that need to be explored in >> order that NumPy 2.0 has enough code "experience" in order to be as useful >> as possible. 
I recognize that NumPy 1.8 has quite a few proposed >> features. These have been building up and are the big reason I've >> committed so many resources to NumPy. The feature-list did not just come >> out of my head. They are the result of talking and interacting with many >> NumPy users and watching the code get used (and not used) in the real >> world. This will be a faster pace of development. But, all of this >> will be in the open. If the NumPy 2.0 schedule is too aggressive, then >> we will have a NumPy 1.9 release in order to allow features to come out. >> >> Thanks, >> >> -Travis >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Feb 16 22:11:22 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Feb 2012 19:11:22 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3DACCB.2040004@gmail.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> <4F3DACCB.2040004@gmail.com> Message-ID: Hi, On Thu, Feb 16, 2012 at 5:26 PM, Alan G Isaac wrote: > On 2/16/2012 7:22 PM, Matthew Brett wrote: >> This has not been an encouraging episode in striving for consensus. > Striving for consensus does not mean that a minority > automatically gets veto rights. 'Striving' for consensus does imply some attempt to get to grips with the arguments, and working on some compromise to accommodate both parties. It seems to me there was very great latitude for finding such a comprise here, but Travis has terminated the discussion and I see no sign of a compromise. "Striving for consensus" can't of course be regulated. The desire has to be there. It's probably true, as Nathaniel says, that there isn't much you can do to legislate on that. We can only try to persuade. I was trying to do that, I failed, I'll have to look back and see if there was something else I could have done that would have been more useful to the same end, Best, Matthew From jdh2358 at gmail.com Thu Feb 16 23:20:00 2012 From: jdh2358 at gmail.com (John Hunter) Date: Thu, 16 Feb 2012 22:20:00 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3DACCB.2040004@gmail.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> <4F3DACCB.2040004@gmail.com> Message-ID: On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac wrote: > On 2/16/2012 7:22 PM, Matthew Brett wrote: > > This has not been an encouraging episode in striving for consensus. > > I disagree. > Failure to reach consensus does not imply lack of striving. > > Hey Alan, thanks for your thoughtful and nuanced views. I agree with everything you've said, but have a few additional points. At the risk of wading into a thread that has grown far too long, and echoing Eric's comments that the idea of governance is murky at best when there is no provision for enforceability, I have a few comments. Full disclosure: Travis has asked me and I have agreed to to serve on a board for "numfocus", the not-for-profit arm of his efforts to promote numpy and related tools. 
Although I have no special numpy developer chops, as the original author of matplotlib, which is one of the leading "numpy clients", he asked me to join his organization as a "community representative". I support his efforts, and so agreed to join the numfocus board. My first and most important point is that the subtext of many postings here about the fear of undue and inappropriate influence of Continuum under Travis' leadership is far overblown. Travis created numpy -- it is his baby. Undeniably, he created it by standing on the shoulders of giants: Jim Hugunin, Paul Dubois, Perry Greenfield and his team, and many others. But the idea that we need to guard against the possibility that his corporate interests will compromise his interests in "what is best for numpy" is academic at best. As someone who has created a significant project in the realm of "scientific computing in Python", I can tell you that it is something I take quite a bit of pride in and it is very important to me that the project thrives as it was intended to: as a free, open-source, best-practice way of doing science. I know Travis well enough to know he feels the same way -- numpy doing well is *at least* important to him his company doing well. All of his recent actions to start a company and foundation which focuses resources on numpy and related tools reinforce that view. If he had a different temperament, he wouldn't have devoted five to ten years of is life to Numeric, scipy and numpy. He is a BDFL for a reason: he has earned our trust. And he has proven his ability to lead when *almost everyone* was against him. At the height of the Numeric/numarray split, and I was deeply involved in this as the mpl author because we had a "numerix" compatibility layer to allow users to use one or the other, Travis proposed writing numpy to solve both camp's problems. I really can't remember a single individual who supported him. What I remember is the cacophony of voices who though this was a bad idea, because of the "third fork" problem. But Travis forged ahead, on his own, wrote numpy, and re-united the Numeric and numarray camps. And all-the-while he maintained his friendship with the numarray developers (Perry Greenfield who led the numarray development effort has also been invited by Travis to the numfocus board, as has Fernando Perez and Jarrod Millman). Although MPL at the time agreed to support a third version in its numerix compatibility layer for numpy, I can thankfully say we have since dropped support for the compatibility layer entirely as we all use numpy now. This to me is the distilled essence of leadership, against the voices of the masses, and it bears remembering. I have two more points I want to make: one is on democracy, and one is on corporate control. On corporate control: there have been a number of posts in this thread about the worries and dangers that Continuum poses as the corporate sponser of numpy development, about how this may cause numpy to shift from a model of a few loosely connected, decentralized cadre of volunteers to a centrally controlled steering committee of programmers who are controlled by corporate headquarters and who make all their decisions around the water cooler unobserved by the community of users. I want to make a connection to something that happened in the history of matplotlib development, something that is not strictly analogous but I think close enough to be informative. 
Sometime around 2005, Perry Greenfield, who heads the development team of the Space Telescope Science Institute (STScI) that is charged with processing the Hubble image pipeline, emailed me that he was considering using matplotlib as their primary image visualization tool. I can't tell you how excited I was at the time. The idea of having institutional sponsorship from someone as prestigious and resourceful as STScI was hugely motivating. I worked feverishly for months to add stuff they needed: better rendering, better image support, mathtext and lots more. But more importantly, Perry was offering to bring institutional support to my project: well qualified full-time employees who dedicated a significant part of their time to matplotlib development. He had done this before with numarray development, and the contributions of his team are enormous. Many mpl features owe their support to institutional sponsorship: Perry's group deserves the biggest props, but Ted Drain's group at the JPL and corporate sponsors as well are on the list. What I want you to think about are the parallels between Perry and his team joining matplotlib's development effort and Continuum's stated desire to contribute to numpy development. Yes, STScI is a not-for-profit entity operated by NASA, and Continuum is a for-profit entity with a not-for-profit arm (numfocus). But the differences are not so great in my experience. Both for-profits and not-for-profits experience institutional pressures to get code out on a deadline. In fact, perhaps our "finest hour" in matplotlib development came as a result of one of our not-for-profit clients' deadlines. My favorite story is when the Jet Propulsion Labs at Caltech emailed me about the inadequacy of our ellipse approximations, and gave us the constraint that the Mars Rover was scheduled to land in the next few months. Talk about a hard deadline! Michael Droettboom, under Perry's direction, implemented an "8-cubic-spline-approximation-to-curves-in-the-viewport" solution that I honestly think gives matplotlib the *best* approximation to such curves anywhere. Period. Institutional deadlines to get working code into the codebase, whether from a for-profit or not-for-profit entity, are usually a good thing. It may not be perfect going in, but it is usually better for being there. That is one example from matplotlib's history that illustrates the benefit of institutional sponsors in a project. In this example, the organizational goal -- getting the Rover to land without crashing -- is one we can all relate to and support. And the resolution to the story, in which a heroically talented developer (Michael D) steps up to solve the problem, is one we can all aspire to. But the essential ingredients of the story are not so different from what we face here: an organization needs to solve a problem on a deadline; another organization, possibly related, has the resources to get the job done; all efforts are contributed to the public domain. Now that brings me to my final and perhaps most controversial point. I don't believe democracy is the right solution for most open source problems. As exhibit A, I reference the birth of numpy itself that I discussed above. Numpy would have never happened if community input were considered. I'm pretty sure that all of us that were there at the time can attest to this. Democracy is something that many of us have grown up by default to consider as the right solution to many, if not most, problems of governance.
I believe it is a solution to a specific problem of governance. I do not believe democracy is a panacea or an ideal solution for most problems: rather it is the right solution for situations in which the consequences of failure are too high. In a state (by which I mean a government with a power to subject its people to its will by force of arms) where the consequences of failure to submit include the death, dismemberment, or imprisonment of dissenters, democracy is a safeguard against the excesses of the powerful. Generally, there is no reason to believe that the simple majority of people polled gives the "best" or "right" answer, but there is also no reason to believe that those who hold power will rule beneficently. The democratic ability of the people to check the rule of the few and powerful is essential to ensure the survival of the minority. In open source software development, we face none of these problems. Our power to fork is precisely the power the minority in a tyrannical democracy lacks: no one will kill us for going off the reservation. We are free to use the product or not, to modify it or not, to enhance it or not. The power to fork is not abstract: it is essential. matplotlib, and chaco, both rely *heavily* on agg, the Antigrain C++ rendering library. At some point many years ago, Maxim, the author of Agg, decided to change the license of Agg (circa version 2.5) to GPL rather than BSD. Obviously, this was a non-starter for projects like mpl, scipy and chaco which assumed BSD licensing terms. Unfortunately, Maxim had a new employer which appeared to us to be dictating the terms and our best arguments fell on deaf ears. No matter: mpl and Enthought chaco have continued to ship agg 2.4, pre-GPL, and I think that less than 1% of our users have even noticed. Yes, we forked the project, and yes, no one has noticed. To me this is the ultimate reason why governance of open source, free projects does not need to be democratic. As painful as a fork may be, it is the ultimate antidote to a leader who may not have your interests in mind. It is an antidote that we citizens in a state government may not have. It is true that numpy exists in a privileged position in a way that matplotlib or scipy does not. Numpy is the core. Yes, Continuum is different than STScI because Travis is both the lead of Numpy and the lead of the company sponsoring numpy. These are important differences. In the worst cases, we might imagine that these differences will negatively impact numpy and associated tools. But these worst case scenarios that we imagine will most likely simply distract us from what is going on: Travis, one of the most prolific and valuable contributors to the scientific python community, has decided to refocus his efforts to do more. And that is a very happy moment for all of us. -------------- next part -------------- An HTML attachment was scrubbed... URL: From kalatsky at gmail.com Thu Feb 16 23:37:10 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Thu, 16 Feb 2012 22:37:10 -0600 Subject: [Numpy-discussion] Strange PyArray_FromObject() behavior In-Reply-To: References: Message-ID: Hi Bill, Looks like you are running a very fresh version of numpy. Without knowing the build version and what's going on in the extension module I can't tell you much. The usual suspects would be: 1) Numpy bug, not too likely. 2) Incorrect use of PyArray_FromObject, you'll need to send more info. 3) Something is seriously corrupted, probably not the case, because segfault would follow quickly. Please provide more info.
Val PS Is it something related to what we'll be working on (Trilinos)? On Thu, Feb 16, 2012 at 11:09 AM, Spotz, William F wrote: > I have a user who is reporting tests that are failing on his platform. > I have not been able to reproduce the error on my system, but working with > him, we have isolated the problem to unexpected results when > PyArray_FromObject() is called. Here is the chain of events: > > In python, an integer is calculated. Specifically, it is > > len(result.errors) + len(result.failures) > > where result is a unit test result object from the unittest module. I > had him verify that this value was in fact a python integer. In my > extension module, this PyObject gets passed to the PyArray_FromObject() > function in a routine that comes from numpy.i. What I expect, and what I > typically get, is a numpy scalar array of type C long. I had my user print > the result using PyObject_Print() and what he got was > > array([0:00:00], dtype=timedelta64[us]) > > I am stuck as to why this might be happening. Any ideas? > > Thanks > > ** Bill Spotz ** > ** Sandia National Laboratories Voice: (505)845-0170 ** > ** P.O. Box 5800 Fax: (505)284-0154 ** > ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Feb 16 23:50:35 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Feb 2012 20:50:35 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> <4F3DACCB.2040004@gmail.com> Message-ID: Hi John, On Thu, Feb 16, 2012 at 8:20 PM, John Hunter wrote: > > > On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac wrote: >> >> On 2/16/2012 7:22 PM, Matthew Brett wrote: >> > This has not been an encouraging episode in striving for consensus. >> >> I disagree. >> Failure to reach consensus does not imply lack of striving. >> > > Hey Alan, thanks for your thoughtful and nuanced views. ?I agree ?with > everything you've said, but have a few additional points. I thought I'd looked deep in my heart and failed to find paranoia about corporate involvement in numpy. I am happy that Travis formed Continuum and look forward to the progress we can expect for numpy. I don't think the conversation was much about 'democracy'. As far as I was concerned, anything on the range of "no change but at least being specific" to "full veto power from mailing list members" was up for discussion and anything in between. I wish we had not had to deal with the various red herrings here, such as whether Continuum is good or bad, whether Travis has been given adequate credit, or whether companies are bad for software. But, we did. It's fine. Argument over now. 
Best, Matthew From wfspotz at sandia.gov Fri Feb 17 00:41:02 2012 From: wfspotz at sandia.gov (Bill Spotz) Date: Thu, 16 Feb 2012 22:41:02 -0700 Subject: [Numpy-discussion] [EXTERNAL] Re: Strange PyArray_FromObject() behavior In-Reply-To: References: Message-ID: <10BE641C-1DE7-49C8-9AF3-D4654DE648A3@sandia.gov> Val, The problem occurs in function PyArrayObject* obj_to_array_allow_conversion(PyObject* input, int typecode, int* is_new_object) in numpy.i (which is the numpy SWIG interface file that I authored and is in the numpy distribution). The argument "input" comes in as a python int of value 0, "typecode" is NPY_NOTYPE to signify that the type should be detected, and "is_new_object" is an output flag. This function calls PyArray_FromObject(input, typecode, 0, 0). This is, in fact, a part of the PyTrilinos package, specifically the Teuchos module (Teuchos is our general tools package). The context here is the Teuchos Comm classes' reduce() method, in this case a summation over processors. We will be working with Tpetra classes that are built on top of a Teuchos Comm class. Thanks, Bill On Feb 16, 2012, at 9:37 PM, Val Kalatsky wrote: > > Hi Bill, > > Looks like you are running a very fresh version of numpy. > Without knowing the build version and what's going on in the extension module I can't tell you much. > The usual suspects would be: > 1) Numpy bug, not too likely. > 2) Incorrect use of PyArray_FromObject, you'll need to send more info. > 3) Something is seriously corrupted, probably not the case, because segfault would follow quickly. > > Please provide more info. > Val > > PS Is it something related to what we'll be working on (Trilinos)? > > > On Thu, Feb 16, 2012 at 11:09 AM, Spotz, William F wrote: > I have a user who is reporting tests that are failing on his platform. > I have not been able to reproduce the error on my system, but working with > him, we have isolated the problem to unexpected results when > PyArray_FromObject() is called. Here is the chain of events: > > In python, an integer is calculated. Specifically, it is > > len(result.errors) + len(result.failures) > > where result is a unit test result object from the unittest module. I > had him verify that this value was in fact a python integer. In my > extension module, this PyObject gets passed to the PyArray_FromObject() > function in a routine that comes from numpy.i. What I expect, and what I > typically get, is a numpy scalar array of type C long. I had my user print > the result using PyObject_Print() and what he got was > > array([0:00:00], dtype=timedelta64[us]) > > I am stuck as to why this might be happening. Any ideas? > > Thanks > > ** Bill Spotz ** > ** Sandia National Laboratories Voice: (505)845-0170 ** > ** P.O.
Box 5800 Fax: (505)284-0154 ** > ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Fri Feb 17 00:54:39 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 16 Feb 2012 23:54:39 -0600 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> <4F3DACCB.2040004@gmail.com> Message-ID: On Thursday, February 16, 2012, John Hunter wrote: > > > On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac > > wrote: > >> On 2/16/2012 7:22 PM, Matthew Brett wrote: >> > This has not been an encouraging episode in striving for consensus. >> >> I disagree. >> Failure to reach consensus does not imply lack of striving. >> >> > Hey Alan, thanks for your thoughtful and nuanced views. I agree with > everything you've said, but have a few additional points. > > At the risk of wading into a thread that has grown far too long, and > echoing Eric's comments that the idea of governance is murky at best > when there is no provision for enforceability, I have a few comments. > Full disclosure: Travis has asked me and I have agreed to to serve on > a board for "numfocus", the not-for-profit arm of his efforts to > promote numpy and related tools. Although I have no special numpy > developer chops, as the original author of matplotlib, which is one of > the leading "numpy clients", he asked me to join his organization as a > "community representative". I support his efforts, and so agreed to > join the numfocus board. > > My first and most important point is that the subtext of many postings here > about the fear of undue and inappropriate influence of Continuum under > Travis' leadership is far overblown. Travis created numpy -- it is > his baby. Undeniably, he created it by standing on the shoulders of > giants: Jim Hugunin, Paul Dubois, Perry Greenfield and his team, and > many others. But the idea that we need to guard against the > possibility that his corporate interests will compromise his interests > in "what is best for numpy" is academic at best. > > As someone who has created a significant project in the realm of > "scientific computing in Python", I can tell you that it is something > I take quite a bit of pride in and it is very important to me that the > project thrives as it was intended to: as a free, open-source, > best-practice way of doing science. I know Travis well enough to know > he feels the same way -- numpy doing well is *at least* important to > him his company doing well. All of his recent actions to start a > company and foundation which focuses resources on numpy and related > tools reinforce that view. If he had a different temperament, he > wouldn't have devoted five to ten years of is life to Numeric, scipy > and numpy. He is a BDFL for a reason: he has earned our trust. > > And he has proven his ability to lead when *almost everyone* was > against him. At the height of the Numeric/numarray split, and I was > deeply involved in this as the mpl author because we had a "numerix" > compatibility layer to allow users to use one or the other, Travis > proposed writing numpy to solve both camp's problems. I really can't > remember a single individual who supported him. 
What I remember is > the cacophony of voices who though this was a bad idea, because of the > "third fork" problem. But Travis forged ahead, on his own, wrote > numpy, and re-united the Numeric and numarray camps. And > all-the-while he maintained his friendship with the numarray > developers (Perry Greenfield who led the numarray development effort > has also been invited by Travis to the numfocus board, as has Fernando > Perez and Jarrod Millman). Although MPL at the time agreed to support > a third version in its numerix compatibility layer for numpy, I can > thankfully say we have since dropped support for the compatibility > layer entirely as we all use numpy now. This to me is the distilled > essence of leadership, against the voices of the masses, and it bears > remembering. > > I have two more points I want to make: one is on democracy, and one is > on corporate control. On corporate control: there have been a number > of posts in this thread about the worries and dangers that Continuum > poses as the corporate sponser of numpy development, about how this > may cause numpy to shift from a model of a few loosely connected, > decentralized cadre of volunteers to a centrally controlled steering > committee of programmers who are controlled by corporate headquarters > and who make all their decisions around the water cooler unobserved by > the community of users. > > I want to make a connection to something that happened in the history > of matplotlib development, something that is not strictly analogous > but I think close enough to be informative. Sometime around 2005, > Perry Greenfield, who heads the development team of the Space > Telescope Science Institute (STScI) that is charged with processing > the Hubble image pipeline, emailed me that he was considering using > matplotlib as their primary image visualization tool. I can't tell > you how excited I was at the time. The idea of having institutional > sponsorship from someone as prestigious and resourceful as STScI was > hugely motivating. I worked feverishly for months to add stuff they > needed: better rendering, better image support, mathtext and lots > more. But more importantly, Perry was offering to bring institutional > support to my project: well qualified full-time employees who > dedicated a significant part of their time to matplotlib > development. He had done this before with numarray development, and > the contributions of his team are enormous. Many mpl features owe > their support to institutional sopnsership: Perry's group deserves the > biggest props, but Ted Drain's group at the JPL and corporate sponsors > as well are on the list. > > What I want you to think about are the parallels between Perry and his > team joining matplotlib's development effort and Continuum's stated > desire to contribute to numpy development. Yes, STScI is a > not-for-profit entity operated by NASA, and Continuum is a > for-profit-entity with a not-for-profit arm (numfocus). But the > differences are not so great in my experience. Both for-profits and > not-for-profits experience institutional pressures to get code out on > a deadline. In fact, perhaps our "finest hour" in matplotlib > development came as a result of one of out not-for-profit client's > deadlines. My favorite story is when the Jet Propulsion Labs at > Caltech emailed me about the inadequacy of our ellipse approximations, > and gave us the constraint that the Mars Rover was scheduled to land > in the next few months. Talk about a hard deadline! 
Michael > Droettboom, under Perry's direction, implemented a > "8-cubic-spline-approximation-to-curves-in-the-viewport" solution that > I honestly think gives matplotlib the *best* approximation to such > curves anywhere. Period. Institutional deadlines to get working code > into the codebase, whether from a for-profit or not-for-profit entity, > and usually are a good thing. It may not be perfect going in, but it is > usually better for being there. > > That is one example from matplotlib's history that illustrates the > benefit of institutional sponsers in a project. In this example, the > organizational goal -- getting the Rover to land without crashing -- is > one we can all relate to and support. And the resolution to the story, > in which a heroically talented developer (Michael D) steps up to > solve the problem, is one we can all aspire to. But the essential > ingredients of the story are not so different from what we face here: > an organization needs to solve a problem on a deadline; another > organization, possibly related, has the resources to get the job done; > all efforts are contributed to the public domain. > > Now that brings me to my final and perhaps most controverisal point. > I don't believe democracy is the right solution for most open source > problems. As exhibit A, I reference the birth of numpy itself that I > discussed above. Numpy would have never happened if community input > were considered. I'm pretty sure that all of us that were there at > the time can attest to this. > > Democracy is something that many of us have grown up by default to > consider as the right solution to many, if not most or, problems of > governance. I believe it is a solution to a specific problem of > governance. I do not believe democracy is a panacea or an ideal > solution for most problems: rather it is the right solution for which > the consequences of failure are too high. In a state (by which I mean > a government with a power to subject its people to its will by force > of arms) where the consequences of failure to submit include the > death, dismemberment, or imprisonment of dissenters, democracy is a > safeguard against the excesses of the powerful. Generally, there is > no reason to believe that the simple majority of people polled is the > "best" or "right" answer, but there is also no reason to believe that > those who hold power will rule beneficiently. The democratic ability > of the people to check to the rule of the few and powerful is > essential to insure the survival of the minority. > > In open source software development, we face none of these problems. > Our power to fork is precisely the power the minority in a tyranical > democracy lacks: noone will kill us for going off the reservation. We > are free to use the product or not, to modify it or not, to enhance it > or not. > > The power to fork is not abstract: it is essential. matplotlib, and > chaco, both rely *heavily* on agg, the Antigrain C++ rendering > library. At some point many years ago, Maxim, the author of Agg, > decided to change the license of Agg (circa version 2.5) to GPL rather > than BSD. Obviously, this was a non-starter for projects like mpl, > scipy and chaco which assumed BSD licensing terms. Unfortunately, > Maxim had a new employer which appeared to us to be dictating the > terms and our best arguments fell on deaf ears. No matter: mpl and > Enthought chaco have continued to ship agg 2.4, pre-GPL, and I think > that less than 1% of our users have even noticed. 
Yes, we forked the > project, and yes, noone has noticed. To me this is the ultimate > reason why governance of open source, free projects does not need to > be democratic. As painful as a fork may be, it is the ultimate > antidote to a leader who may not have your interests in mind. It is > an antidote that we citizens in a state government may not have. > > It is true that numpy exists in a privileged position in a way that > matplotlib or scipy does not. Numpy is the core. Yes, Continuum is > different than STScI because Travis is both the lead of Numpy and the > lead of the company sponsoring numpy. These are important > differences. In the worst cases, we might imagine that these > differences will negatively impact numpy and associated tools. But > these worst case scenarios that we imagine will most likely simply > distract us from what is going on: Travis, one of the most prolific > and valuable contributers to the scientific python community, has > decided to refocus his efforts to do more. And that is a very happy > moment for all of us. > John, Thank you for taking the time to share your perspective. As always, it is very interesting. I have only been involved with numpy for the past 3 years. From my perspective, Chuck has been BDFL, in a sense. It is hard for me to imagine python without numpy, and it is difficult to fully appreciate the effort Travis has put in. Perspectives like your are useful for this. So, power is derived by a mandate from the masses. That part is always democratic. However, the form of governance is not required to be democratic. If we are agreeing to Travis being BDF12, then that is just as valid in my mind as us agreeing to some other structure. What is important is the community coming together and agreeing on this. Would that be fine for everybody? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Feb 17 01:11:51 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 17 Feb 2012 00:11:51 -0600 Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) In-Reply-To: References: Message-ID: The OS X slaves (especially PPC) are very valuable for testing. We have an intern who could help keep the build-bots going if you would give her access to those machines. Thanks for being willing to offer them. -Travis On Feb 16, 2012, at 6:36 PM, Matthew Brett wrote: > Hi, > > On Thu, Feb 16, 2012 at 4:12 PM, Nathaniel Smith wrote: >> On Thu, Feb 16, 2012 at 11:52 PM, Chris Ball wrote: >>> Buildbot is used by some big projects (e.g. Python, Chromium, and >>> Mozilla), but I'm aware that several projects in the scientific/numeric >>> Python ecosystem use Jenkins (including Cython, IPython, and SymPy), >>> often using a hosted Jenkins solution such as Shining Panda. A difficult >>> part of running a Buildbot service is finding hardware for the slaves >>> and keeping them alive, so a hosted solution sounds wonderful (assuming >>> hosted solutions offer an adequate range of operating systems etc). >> >> A quick look at Shining Panda suggests that you get no coverage for >> anything but Linux, which is a good start but rather limiting. IME by >> far the most annoying part of a useful buildbot setup is keeping all >> the build slaves up and working. 
It's one thing to set up a build >> environment in one OS, it's quite another to keep like 5 of them >> working, each on a different volunteered machine where you don't have >> root and the person who does isn't answering email... the total effort >> isn't large, but it's really poorly suited to the nature of volunteer >> labor, because it needs prompt attention at random intervals. (Also, >> this doesn't become obvious until after one's already gotten >> everything set up, so then you're stuck limping along because who >> wants to start over and build something more maintainable...) >> >> If anyone has existing sysadmin resources then keeping build-slaves >> running is a place where they'd be a huge contribution. > > Yup - keeping the slaves running is the big problem. > > We do have various slaves running here that at least are all > accessible by me, and Jarrod, and (at a pinch) Fernando, Stefan and > others. > > These are: > > XP (when I'm not using the machine, which is the large majority of the time) > OSX 10.5 > OSX 10.4 PPC > Linux 32 bit > Linux 64 bit > > These are all real machines not virtual machines. I'm happy to give > some reliable person ssh access to the buildslave user on these > machines. They won't necessarily be available for all time, they are > dotted around campus doing various jobs like being gateways, project > machines, occasional desktops. > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Fri Feb 17 01:17:20 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Feb 2012 22:17:20 -0800 Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) In-Reply-To: References: Message-ID: Hi, On Thu, Feb 16, 2012 at 10:11 PM, Travis Oliphant wrote: > The OS X slaves (especially PPC) are very valuable for testing. ? ?We have an intern who could help keep the build-bots going if you would give her access to those machines. > > Thanks for being willing to offer them. No problem. The OSX machines should be reliably available. Please do put your intern in touch, I'll give her access. Best, Matthew From josef.pktd at gmail.com Fri Feb 17 01:22:16 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Feb 2012 01:22:16 -0500 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> <4F3DACCB.2040004@gmail.com> Message-ID: On Fri, Feb 17, 2012 at 12:54 AM, Benjamin Root wrote: > > > On Thursday, February 16, 2012, John Hunter wrote: >> >> >> >> On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac >> wrote: >>> >>> On 2/16/2012 7:22 PM, Matthew Brett wrote: >>> > This has not been an encouraging episode in striving for consensus. >>> >>> I disagree. >>> Failure to reach consensus does not imply lack of striving. >>> >> >> Hey Alan, thanks for your thoughtful and nuanced views. ?I agree ?with >> everything you've said, but have a few additional points. >> >> At the risk of wading into a thread that has grown far too long, and >> echoing Eric's comments that the idea of governance is murky at best >> when there is no provision for enforceability, I have a few comments. 
>> Full disclosure: Travis has asked me and I have agreed to to serve on >> a board for "numfocus", the not-for-profit arm of his efforts to >> promote numpy and related tools. ?Although I have no special numpy >> developer chops, as the original author of matplotlib, which is one of >> the leading "numpy clients", he asked me to join his organization as a >> "community representative". ?I support his efforts, and so agreed to >> join the numfocus board. >> >> My first and most important point is that the subtext of many postings >> here >> about the fear of undue and inappropriate influence of Continuum under >> Travis' leadership is far overblown. ?Travis created numpy -- it is >> his baby. ?Undeniably, he created it by standing on the shoulders of >> giants: Jim Hugunin, Paul Dubois, Perry Greenfield and his team, and >> many others. ?But the idea that we need to guard against the >> possibility that his corporate interests will compromise his interests >> in "what is best for numpy" is academic at best. >> >> As someone who has created a significant project in the realm of >> "scientific computing in Python", I can tell you that it is something >> I take quite a bit of pride in and it is very important to me that the >> project thrives as it was intended to: as a free, open-source, >> best-practice way of doing science. ?I know Travis well enough to know >> he feels the same way -- numpy doing well is *at least* important to >> him his company doing well. ?All of his recent actions to start a >> company and foundation which focuses resources on numpy and related >> tools reinforce that view. ?If he had a different temperament, he >> wouldn't have devoted five to ten years of is life to Numeric, scipy >> and numpy. ?He is a BDFL for a reason: he has earned our trust. >> >> And he has proven his ability to lead when *almost everyone* was >> against him. ?At the height of the Numeric/numarray split, and I was >> deeply involved in this as the mpl author because we had a "numerix" >> compatibility layer to allow users to use one or the other, Travis >> proposed writing numpy to solve both camp's problems. ?I really can't >> remember a single individual who supported him. ?What I remember is >> the cacophony of voices who though this was a bad idea, because of the >> "third fork" problem. ?But Travis forged ahead, on his own, wrote >> numpy, and re-united the Numeric and numarray camps. ?And >> all-the-while he maintained his friendship with the numarray >> developers (Perry Greenfield who led the numarray development effort >> has also been invited by Travis to the numfocus board, as has Fernando >> Perez and Jarrod Millman). ?Although MPL at the time agreed to support >> a third version in its numerix compatibility layer for numpy, I can >> thankfully say we have since dropped support for the compatibility >> layer entirely as we all use numpy now. ?This to me is the distilled >> essence of leadership, against the voices of the masses, and it bears >> remembering. >> >> I have two more points I want to make: one is on democracy, and one is >> on corporate control. 
?On corporate control: there have been a number >> of posts in this thread about the worries and dangers that Continuum >> poses as the corporate sponser of numpy development, about how this >> may cause numpy to shift from a model of a few loosely connected, >> decentralized cadre of volunteers to a centrally controlled steering >> committee of programmers who are controlled by corporate headquarters >> and who make all their decisions around the water cooler unobserved by >> the community of users. >> >> I want to make a connection to something that happened in the history >> of matplotlib development, something that is not strictly analogous >> but I think close enough to be informative. ?Sometime around 2005, >> Perry Greenfield, who heads the development team of the Space >> Telescope Science Institute (STScI) that is charged with processing >> the Hubble image pipeline, emailed me that he was considering using >> matplotlib as their primary image visualization tool. ?I can't tell >> you how excited I was at the time. ?The idea of having institutional >> sponsorship from someone as prestigious and resourceful as STScI was >> hugely motivating. ?I worked feverishly for months to add stuff they >> needed: better rendering, better image support, mathtext and lots >> more. ?But more importantly, Perry was offering to bring institutional >> support to my project: well qualified full-time employees who >> dedicated a significant part of their time to matplotlib >> development. He had done this before with numarray development, and >> the contributions of his team are enormous. ?Many mpl features owe >> their support to institutional sopnsership: Perry's group deserves the >> biggest props, but Ted Drain's group at the JPL and corporate sponsors >> as well are on the list. >> >> What I want you to think about are the parallels between Perry and his >> team joining matplotlib's development effort and Continuum's stated >> desire to contribute to numpy development. ?Yes, STScI is a >> not-for-profit entity operated by NASA, and Continuum is a >> for-profit-entity with a not-for-profit arm (numfocus). ?But the >> differences are not so great in my experience. ?Both for-profits and >> not-for-profits experience institutional pressures to get code out on >> a deadline. ?In fact, perhaps our "finest hour" in matplotlib >> development came as a result of one of out not-for-profit client's >> deadlines. ?My favorite story is when the Jet Propulsion Labs at >> Caltech emailed me about the inadequacy of our ellipse approximations, >> and gave us the constraint that the Mars Rover was scheduled to land >> in the next few months. ?Talk about a hard deadline! ?Michael >> Droettboom, under Perry's direction, implemented a >> "8-cubic-spline-approximation-to-curves-in-the-viewport" solution that >> I honestly think gives matplotlib the *best* approximation to such >> curves anywhere. ?Period. ?Institutional deadlines to get working code >> into the codebase, whether from a for-profit or not-for-profit entity, >> and usually are a good thing. It may not be perfect going in, but it is >> usually better for being there. >> >> That is one example from matplotlib's history that illustrates the >> benefit of institutional sponsers in a project. ?In this example, the >> organizational goal -- getting the Rover to land without crashing -- is >> one we can all relate to and support. 
?And the resolution to the story, >> in which a heroically talented developer (Michael D) steps up to >> solve the problem, is one we can all aspire to. ?But the essential >> ingredients of the story are not so different from what we face here: >> an organization needs to solve a problem on a deadline; another >> organization, possibly related, has the resources to get the job done; >> all efforts are contributed to the public domain. >> >> Now that brings me to my final and perhaps most controverisal point. >> I don't believe democracy is the right solution for most open source >> problems. ?As exhibit A, I reference the birth of numpy itself that I >> discussed above. ?Numpy would have never happened if community input >> were considered. ?I'm pretty sure that all of us that were there at >> the time can attest to this. >> >> Democracy is something that many of us have grown up by default to >> consider as the right solution to many, if not most or, problems of >> governance. ?I believe it is a solution to a specific problem of >> governance. ?I do not believe democracy is a panacea or an ideal >> solution for most problems: rather it is the right solution for which >> the consequences of failure are too high. ?In a state (by which I mean >> a government with a power to subject its people to its will by force >> of arms) where the consequences of failure to submit include the >> death, dismemberment, or imprisonment of dissenters, democracy is a >> safeguard against the excesses of the powerful. ?Generally, there is >> no reason to believe that the simple majority of people polled is the >> "best" or "right" answer, but there is also no reason to believe that >> those who hold power will rule beneficiently. ?The democratic ability >> of the people to check to the rule of the few and powerful is >> essential to insure the survival of the minority. >> >> In open source software development, we face none of these problems. >> Our power to fork is precisely the power the minority in a tyranical >> democracy lacks: noone will kill us for going off the reservation. ?We >> are free to use the product or not, to modify it or not, to enhance it >> or not. >> >> The power to fork is not abstract: it is essential. ?matplotlib, and >> chaco, both rely *heavily* on agg, the Antigrain C++ rendering >> library. ?At some point many years ago, Maxim, the author of Agg, >> decided to change the license of Agg (circa version 2.5) to GPL rather >> than BSD. ?Obviously, this was a non-starter for projects like mpl, >> scipy and chaco which assumed BSD licensing terms. ?Unfortunately, >> Maxim had a new employer which appeared to us to be dictating the >> terms and our best arguments fell on deaf ears. ?No matter: mpl and >> Enthought chaco have continued to ship agg 2.4, pre-GPL, and I think >> that less than 1% of our users have even noticed. ?Yes, we forked the >> project, and yes, noone has noticed. ?To me this is the ultimate >> reason why governance of open source, free projects does not need to >> be democratic. ?As painful as a fork may be, it is the ultimate >> antidote to a leader who may not have your interests in mind. ?It is >> an antidote that we citizens in a state government may not have. >> >> It is true that numpy exists in a privileged position in a way that >> matplotlib or scipy does not. ?Numpy is the core. ?Yes, Continuum is >> different than STScI because Travis is both the lead of Numpy and the >> lead of the company sponsoring numpy. ?These are important >> differences. 
?In the worst cases, we might imagine that these >> differences will negatively impact numpy and associated tools. ?But >> these worst case scenarios that we imagine will most likely simply >> distract us from what is going on: Travis, one of the most prolific >> and valuable contributers to the scientific python community, has >> decided to refocus his efforts to do more. ?And that is a very happy >> moment for all of us. > > > > John, > > Thank you for taking the time to share your perspective. ?As always, it is > very interesting. ?I have only been involved with numpy for the past 3 > years. From my perspective, Chuck has been BDFL, in a sense. ?It is hard for > me to imagine python without numpy, and it is difficult to fully appreciate > the effort Travis has put in. Perspectives like your are useful for this. > > So, power is derived by a mandate from the masses. That part is always > democratic. However, the form of governance is not required to be > democratic. > > If we are agreeing to Travis being BDF12, then that is just as valid in my > mind as us agreeing to some other structure. What is important is the > community coming together and agreeing on this. ?Would that be fine for > everybody? If Ralf, Charles and David manage to get a stable 1.7 out. Switching to MingW 4.x and gfortran for Windows would by high on my personal priority. Josef > > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From scott.sinclair.za at gmail.com Fri Feb 17 01:44:56 2012 From: scott.sinclair.za at gmail.com (Scott Sinclair) Date: Fri, 17 Feb 2012 08:44:56 +0200 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: <4F3D215D.9010504@gmail.com> References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <4F3CE542.9040808@creativetrax.com> <4F3CF8D1.4040803@creativetrax.com> <4F3D215D.9010504@gmail.com> Message-ID: On 16 February 2012 17:31, Bruce Southey wrote: > On 02/16/2012 08:06 AM, Scott Sinclair wrote: >> This is not intended to downplay the concerns raised in this thread, >> but I can't help myself. >> >> I propose the following (tongue-in-cheek) patch against the current >> numpy master branch. >> >> https://github.com/scottza/numpy/compare/constitution >> >> If this gets enough interest, I'll consider submitting a "real" pull request ;-) > Now that is totally disrespectful and just plain ignorant! Not to > mention the inability to count people correctly. I'm sorry that you feel that way and apologize if I've offended you. I didn't expect to and assure you that was not my intention. That said, I do hope that we can continue to make allowance for (very occasional) levity in the community. Cheers, Scott From charlesr.harris at gmail.com Fri Feb 17 01:50:24 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Feb 2012 23:50:24 -0700 Subject: [Numpy-discussion] Strange PyArray_FromObject() behavior In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 10:09 AM, Spotz, William F wrote: > I have a user who is reporting tests that are failing on his platform. > I have not been able to reproduce the error on my system, but working with > him, we have isolated the problem to unexpected results when > PyArray_FromObject() is called. Here is the chain of events: > > In python, an integer is calculated. 
Specifically, it is > > len(result.errors) + len(result.failures) > > where result is a unit test result object from the unittest module. I > had him verify that this value was in fact a python integer. In my > extension module, this PyObject gets passed to the PyArray_FromObject() > function in a routine that comes from numpy.i. What I expect, and what I > typically get, is a numpy scalar array of type C long. I had my user print > the result using PyObject_Print() and what he got was > > array([0:00:00], dtype=timedelta64[us]) > > That's strange. Is the output always a zero and the type a timedelta64? In the absence of better info I'd guess a stray pointer or, unlikely, byte order. The numpy version would be nice to know. If you have an old version of numpy you could also give it a shot to see what happens. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Fri Feb 17 04:35:31 2012 From: francesc at continuum.io (Francesc Alted) Date: Fri, 17 Feb 2012 10:35:31 +0100 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> <4F3DACCB.2040004@gmail.com> Message-ID: On Feb 17, 2012, at 5:20 AM, John Hunter wrote: > And he has proven his ability to lead when *almost everyone* was > against him. At the height of the Numeric/numarray split, and I was > deeply involved in this as the mpl author because we had a "numerix" > compatibility layer to allow users to use one or the other, Travis > proposed writing numpy to solve both camp's problems. I really can't > remember a single individual who supported him. What I remember is > the cacophony of voices who though this was a bad idea, because of the > "third fork" problem. But Travis forged ahead, on his own, wrote > numpy, and re-united the Numeric and numarray camps. Thanks John for including this piece of NumPy history in this context. My voice was part of the "cacophony" about the "third fork" problem back in 2005. I was pretty darn uncomfortable at the prospect of adding support for a third numerical package on PyTables. However, Travis started to work with all engines powered on, and in a few months he had built a thing that was far better than Numeric and numarray together. The rest is history. But I remember this period (2005) as one of the most dramatic examples of how the capacity and dedication of a single individual can shape the world. -- Francesc Alted From pav at iki.fi Fri Feb 17 06:41:33 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 17 Feb 2012 12:41:33 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: Hi, 16.02.2012 23:39, Travis Oliphant wrote: [clip] > * NumPy 1.8 to come out in July which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post to this list more details of what we plan to address with it later.
Included for possible inclusion are: > * resolving the NA/missing-data issues > * finishing group-by > * incorporating the start of label arrays > * incorporating a meta-object > * a few new dtypes (variable-length string, varialbe-length unicode and an enum type) > * adding ufunc support for flexible dtypes and possibly structured arrays > * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous > * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities > * adding "filters" to Input and Output > * simple computed fields for dtypes > * accepting a Data-Type specification as a class or JSON file > * work towards improving the dtype-addition mechanism > * re-factoring of code so that it can compile with a C++ compiler and be minimally dependent on Python data-structures. That looks a pretty great heap of work -- it's great that you're going to tackle it! Here's one additional wishlist point: - Add necessary hooks to the ufunc machinery, dot products, etc., so that the behavior of sparse matrices can be made nice. Sparse matrices are pretty ubiquitous in many fields, but right now it seems that there are dark corners in the interplay between dense and sparse. This is a bit of a sticky API design problem though: what should be done to make the ufunc machinery "subclassable"? Addressing this could also resolve problems coming up with the `matrix` ndarray subclass. Pauli From cournape at gmail.com Fri Feb 17 10:01:06 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Feb 2012 15:01:06 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: Hi Travis, On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant wrote: > Mark Wiebe and I have been discussing off and on (as well as talking with Charles) a good way forward to balance two competing desires: > > ? ? ? ?* addition of new features that are needed in NumPy > ? ? ? ?* improving the code-base generally and moving towards a more maintainable NumPy > > I know there are load voices for just focusing on the second of these and avoiding the first until we have finished that. ?I recognize the need to improve the code base, but I will also be pushing for improvements to the feature-set and user experience in the process. > > As a result, I am proposing a rough outline for releases over the next year: > > ? ? ? ?* NumPy 1.7 to come out as soon as the serious bugs can be eliminated. ?Bryan, Francesc, Mark, and I are able to help triage some of those. > > ? ? ? ?* NumPy 1.8 to come out in July which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. ? I will post to this list more details of what we plan to address with it later. ? ?Included for possible inclusion are: > ? ? ? ?* resolving the NA/missing-data issues > ? ? ? ?* finishing group-by > ? ? ? ?* incorporating the start of label arrays > ? ? ? ?* incorporating a meta-object > ? ? ? ?* a few new dtypes (variable-length string, varialbe-length unicode and an enum type) > ? ? ? ?* adding ufunc support for flexible dtypes and possibly structured arrays > ? ? ? ?* allowing generalized ufuncs to work on more kinds of arrays besides just contiguous > ? ? ? ?* improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities > ? ? ? 
?* adding "filters" to Input and Output > ? ? ? ?* simple computed fields for dtypes > ? ? ? ?* accepting a Data-Type specification as a class or JSON file > ? ? ? ?* work towards improving the dtype-addition mechanism > ? ? ? ?* re-factoring of code so that it can compile with a C++ compiler and be minimally dependent on Python data-structures. This is a pretty exciting list of features. What is the rationale for code being compiled as C++ ? IMO, it will be difficult to do so without preventing useful C constructs, and without removing some of the existing features (like our use of C99 complex). The subset that is both C and C++ compatible is quite constraining. cheers, David From charlesr.harris at gmail.com Fri Feb 17 10:39:25 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 08:39:25 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau wrote: > Hi Travis, > > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant > wrote: > > Mark Wiebe and I have been discussing off and on (as well as talking > with Charles) a good way forward to balance two competing desires: > > > > * addition of new features that are needed in NumPy > > * improving the code-base generally and moving towards a more > maintainable NumPy > > > > I know there are load voices for just focusing on the second of these > and avoiding the first until we have finished that. I recognize the need > to improve the code base, but I will also be pushing for improvements to > the feature-set and user experience in the process. > > > > As a result, I am proposing a rough outline for releases over the next > year: > > > > * NumPy 1.7 to come out as soon as the serious bugs can be > eliminated. Bryan, Francesc, Mark, and I are able to help triage some of > those. > > > > * NumPy 1.8 to come out in July which will have as many > ABI-compatible feature enhancements as we can add while improving test > coverage and code cleanup. I will post to this list more details of what > we plan to address with it later. Included for possible inclusion are: > > * resolving the NA/missing-data issues > > * finishing group-by > > * incorporating the start of label arrays > > * incorporating a meta-object > > * a few new dtypes (variable-length string, varialbe-length > unicode and an enum type) > > * adding ufunc support for flexible dtypes and possibly > structured arrays > > * allowing generalized ufuncs to work on more kinds of arrays > besides just contiguous > > * improving the ability for NumPy to receive JIT-generated > function pointers for ufuncs and other calculation opportunities > > * adding "filters" to Input and Output > > * simple computed fields for dtypes > > * accepting a Data-Type specification as a class or JSON file > > * work towards improving the dtype-addition mechanism > > * re-factoring of code so that it can compile with a C++ compiler > and be minimally dependent on Python data-structures. > > This is a pretty exciting list of features. What is the rationale for > code being compiled as C++ ? IMO, it will be difficult to do so > without preventing useful C constructs, and without removing some of > the existing features (like our use of C99 complex). The subset that > is both C and C++ compatible is quite constraining. 
>
>
I'm in favor of this myself, C++ would allow a lot of code cleanup and make it
easier to provide an extensible base, I think it would be a natural fit with
numpy. Of course, some C++ projects become tangled messes of inheritance, but
I'd be very interested in seeing what a good C++ designer like Mark, intimately
familiar with the numpy code base, could do. This opportunity might not come by
again anytime soon and I think we should grab onto it. The initial step would be
a release whose code would compile in both C/C++, which mostly comes down to
removing C++ keywords like 'new'.

I did suggest running it by you for build issues, so please raise any you can
think of. Note that MatPlotLib is in C++, so I don't think the problems are
insurmountable. And choosing a set of compilers to support is something that
will need to be done.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cournape at gmail.com  Fri Feb 17 11:27:32 2012
From: cournape at gmail.com (David Cournapeau)
Date: Fri, 17 Feb 2012 16:27:32 +0000
Subject: [Numpy-discussion] Proposed Roadmap Overview
In-Reply-To: 
References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io>
Message-ID: 

On Fri, Feb 17, 2012 at 3:39 PM, Charles R Harris
 wrote:
>
>
> On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau 
> wrote:
>>
>> Hi Travis,
>>
>> On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant 
>> wrote:
>> > Mark Wiebe and I have been discussing off and on (as well as talking
>> > with Charles) a good way forward to balance two competing desires:
>> >
>> >        * addition of new features that are needed in NumPy
>> >        * improving the code-base generally and moving towards a more
>> > maintainable NumPy
>> >
>> > I know there are loud voices for just focusing on the second of these
>> > and avoiding the first until we have finished that.  I recognize the need to
>> > improve the code base, but I will also be pushing for improvements to the
>> > feature-set and user experience in the process.
>> >
>> > As a result, I am proposing a rough outline for releases over the next
>> > year:
>> >
>> >        * NumPy 1.7 to come out as soon as the serious bugs can be
>> > eliminated.  Bryan, Francesc, Mark, and I are able to help triage some of
>> > those.
>> >
>> >        * NumPy 1.8 to come out in July which will have as many
>> > ABI-compatible feature enhancements as we can add while improving test
>> > coverage and code cleanup.  I will post to this list more details of what
>> > we plan to address with it later.  Included for possible inclusion are:
>> >        * resolving the NA/missing-data issues
>> >        * finishing group-by
>> >        * incorporating the start of label arrays
>> >        * incorporating a meta-object
>> >        * a few new dtypes (variable-length string, variable-length
>> > unicode and an enum type)
>> >        * adding ufunc support for flexible dtypes and possibly
>> > structured arrays
>> >        * allowing generalized ufuncs to work on more kinds of arrays
>> > besides just contiguous
>> >        * improving the ability for NumPy to receive JIT-generated
>> > function pointers for ufuncs and other calculation opportunities
>> >        * adding "filters" to Input and Output
>> >        * simple computed fields for dtypes
>> >        * accepting a Data-Type specification as a class or JSON file
>> >        * work towards improving the dtype-addition mechanism
>> >        * re-factoring of code so that it can compile with a C++ compiler
>> > and be minimally dependent on Python data-structures.
>>
>> This is a pretty exciting list of features. What is the rationale for
>> code being compiled as C++ ? IMO, it will be difficult to do so
>> without preventing useful C constructs, and without removing some of
>> the existing features (like our use of C99 complex). The subset that
>> is both C and C++ compatible is quite constraining.
>>
>
> I'm in favor of this myself, C++ would allow a lot of code cleanup and make it
> easier to provide an extensible base, I think it would be a natural fit with
> numpy. Of course, some C++ projects become tangled messes of inheritance,
> but I'd be very interested in seeing what a good C++ designer like Mark,
> intimately familiar with the numpy code base, could do. This opportunity
> might not come by again anytime soon and I think we should grab onto it. The
> initial step would be a release whose code would compile in both C/C++,
> which mostly comes down to removing C++ keywords like 'new'.

C++ will make integration with external environments much harder
(calling a C++ library from a non C++ program is very hard, especially
for cross-platform projects), and I am not convinced by the more
extensible argument.

Making the numpy C code buildable by a C++ compiler is harder than
removing keywords.

> I did suggest running it by you for build issues, so please raise any you
> can think of. Note that MatPlotLib is in C++, so I don't think the problems
> are insurmountable. And choosing a set of compilers to support is something
> that will need to be done.

I don't know for matplotlib, but for scipy, quite a few issues were
caused by our C++ extensions in scipy.sparse. But build issues are not a
strong argument against C++ - I am sure those could be worked out.

regards,

David

From bryanv at continuum.io  Fri Feb 17 12:02:56 2012
From: bryanv at continuum.io (Bryan Van de Ven)
Date: Fri, 17 Feb 2012 11:02:56 -0600
Subject: [Numpy-discussion] Proposed Roadmap Overview
In-Reply-To: 
References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io>
Message-ID: <4F3E8840.8070405@continuum.io>

On 2/17/12 10:27 AM, David Cournapeau wrote:
> Making the numpy C code buildable by a C++ compiler is harder than
> removing keywords.

Just as a data point, I took the cpp branch Mark started and got numpy
built and running with multiarray compiled using C++ (OSX llvm-g++ 4.2).
All I really did was rename reserved keywords and add extern "C" where
necessary. Although, AFAIK C99 complex support is included as an
extension, so I believe you are correct that there would be more work
there to get that working under more platforms.

Bryan Van de Ven

From wfspotz at sandia.gov  Fri Feb 17 12:31:35 2012
From: wfspotz at sandia.gov (Bill Spotz)
Date: Fri, 17 Feb 2012 10:31:35 -0700
Subject: [Numpy-discussion] [EXTERNAL] Re: Strange PyArray_FromObject() behavior
In-Reply-To: 
References: 
Message-ID: <72A51B26-D531-485E-8B95-EBB11D0F44B1@sandia.gov>

Chuck,

I provided a little more context in another email. The user is using
numpy 1.6.1 with python 2.6. I asked him to try an earlier version --
we'll see how it goes.

This is code that has worked for a long time. It still works on my
laptop and on our test platforms. The behavior on the user's system
appears to be consistent, as far as I can tell.
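The kind of mechanical change Bryan describes above -- renaming identifiers that collide
with C++ reserved words and guarding declarations with extern "C" -- looks roughly like
this (a sketch with hypothetical names, not the actual numpy headers):

#include <stddef.h>

#ifdef __cplusplus
extern "C" {            /* keep C linkage for exported symbols when built as C++ */
#endif

typedef struct {
    size_t len;
    int newdim;         /* was "int new;" -- "new" is a reserved word in C++ */
} example_dims;

/* was: int example_resize(example_dims *d, int new); */
int example_resize(example_dims *d, int newdim);

#ifdef __cplusplus
}
#endif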
-Bill On Feb 16, 2012, at 11:50 PM, Charles R Harris wrote: > On Thu, Feb 16, 2012 at 10:09 AM, Spotz, William F wrote: > I have a user who is reporting tests that are failing on his platform. I have not been able to reproduce the error on my system, but working with him, we have isolated the problem to unexpected results when PyArray_FromObject() is called. Here is the chain of events: > > In python, an integer is calculated. Specifically, it is > > len(result.errors) + len(result.failures) > > where result is a unit test result object from the unittest module. I had him verify that this value was in fact a python integer. In my extension module, this PyObject gets passed to the PyArray_FromObject() function in a routine that comes from numpy.i. What I expect, and what I typically get, is a numpy scalar array of type C long. I had my user print the result using PyObject_Print() and what he got was > > array([0:00:00], dtype=timedelta64[us]) > > > That's strange. Is the output always a zero and the type a timedelta64? In the absence of better info I'd quess a stray pointer or, unlikely, byte order. The numpy version would be nice to know. If you have an old version of numpy you could also give it a shot to see what happens. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From efiring at hawaii.edu Fri Feb 17 12:52:10 2012 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 17 Feb 2012 07:52:10 -1000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: <4F3E93CA.8020703@hawaii.edu> On 02/17/2012 05:39 AM, Charles R Harris wrote: > > > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau > wrote: > > Hi Travis, > > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant > > wrote: > > Mark Wiebe and I have been discussing off and on (as well as > talking with Charles) a good way forward to balance two competing > desires: > > > > * addition of new features that are needed in NumPy > > * improving the code-base generally and moving towards a > more maintainable NumPy > > > > I know there are load voices for just focusing on the second of > these and avoiding the first until we have finished that. I > recognize the need to improve the code base, but I will also be > pushing for improvements to the feature-set and user experience in > the process. > > > > As a result, I am proposing a rough outline for releases over the > next year: > > > > * NumPy 1.7 to come out as soon as the serious bugs can be > eliminated. Bryan, Francesc, Mark, and I are able to help triage > some of those. > > > > * NumPy 1.8 to come out in July which will have as many > ABI-compatible feature enhancements as we can add while improving > test coverage and code cleanup. I will post to this list more > details of what we plan to address with it later. 
Included for > possible inclusion are: > > * resolving the NA/missing-data issues > > * finishing group-by > > * incorporating the start of label arrays > > * incorporating a meta-object > > * a few new dtypes (variable-length string, > varialbe-length unicode and an enum type) > > * adding ufunc support for flexible dtypes and possibly > structured arrays > > * allowing generalized ufuncs to work on more kinds of > arrays besides just contiguous > > * improving the ability for NumPy to receive JIT-generated > function pointers for ufuncs and other calculation opportunities > > * adding "filters" to Input and Output > > * simple computed fields for dtypes > > * accepting a Data-Type specification as a class or JSON file > > * work towards improving the dtype-addition mechanism > > * re-factoring of code so that it can compile with a C++ > compiler and be minimally dependent on Python data-structures. > > This is a pretty exciting list of features. What is the rationale for > code being compiled as C++ ? IMO, it will be difficult to do so > without preventing useful C constructs, and without removing some of > the existing features (like our use of C99 complex). The subset that > is both C and C++ compatible is quite constraining. > > > I'm in favor of this myself, C++ would allow a lot code cleanup and make > it easier to provide an extensible base, I think it would be a natural > fit with numpy. Of course, some C++ projects become tangled messes of > inheritance, but I'd be very interested in seeing what a good C++ > designer like Mark, intimately familiar with the numpy code base, could > do. This opportunity might not come by again anytime soon and I think we > should grab onto it. The initial step would be a release whose code that > would compile in both C/C++, which mostly comes down to removing C++ > keywords like 'new'. > > I did suggest running it by you for build issues, so please raise any > you can think of. Note that MatPlotLib is in C++, so I don't think the > problems are insurmountable. And choosing a set of compilers to support > is something that will need to be done. It's true that matplotlib relies heavily on C++, both via the Agg library and in its own extension code. Personally, I don't like this; I think it raises the barrier to contributing. C++ is an order of magnitude more complicated than C--harder to read, and much harder to write, unless one is a true expert. In mpl it brings reliance on the CXX library, which Mike D. has had to help maintain. And if it does increase compiler specificity, that's bad. I would much rather see development in the direction of sticking with C where direct low-level control and speed are needed, and using cython to gain higher level language benefits where appropriate. Of course, that brings in the danger of reliance on another complex tool, cython. If that danger is considered excessive, then just stick with C. 
Eric > > Chuck From mwwiebe at gmail.com Fri Feb 17 12:57:57 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 17 Feb 2012 11:57:57 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Fri, Feb 17, 2012 at 10:27 AM, David Cournapeau wrote: > On Fri, Feb 17, 2012 at 3:39 PM, Charles R Harris > wrote: > > > > > > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau > > wrote: > >> > >> Hi Travis, > >> > >> On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant > >> wrote: > >> > Mark Wiebe and I have been discussing off and on (as well as talking > >> > with Charles) a good way forward to balance two competing desires: > >> > > >> > * addition of new features that are needed in NumPy > >> > * improving the code-base generally and moving towards a more > >> > maintainable NumPy > >> > > >> > I know there are load voices for just focusing on the second of these > >> > and avoiding the first until we have finished that. I recognize the > need to > >> > improve the code base, but I will also be pushing for improvements to > the > >> > feature-set and user experience in the process. > >> > > >> > As a result, I am proposing a rough outline for releases over the next > >> > year: > >> > > >> > * NumPy 1.7 to come out as soon as the serious bugs can be > >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage > some of > >> > those. > >> > > >> > * NumPy 1.8 to come out in July which will have as many > >> > ABI-compatible feature enhancements as we can add while improving test > >> > coverage and code cleanup. I will post to this list more details of > what > >> > we plan to address with it later. Included for possible inclusion > are: > >> > * resolving the NA/missing-data issues > >> > * finishing group-by > >> > * incorporating the start of label arrays > >> > * incorporating a meta-object > >> > * a few new dtypes (variable-length string, varialbe-length > >> > unicode and an enum type) > >> > * adding ufunc support for flexible dtypes and possibly > >> > structured arrays > >> > * allowing generalized ufuncs to work on more kinds of arrays > >> > besides just contiguous > >> > * improving the ability for NumPy to receive JIT-generated > >> > function pointers for ufuncs and other calculation opportunities > >> > * adding "filters" to Input and Output > >> > * simple computed fields for dtypes > >> > * accepting a Data-Type specification as a class or JSON file > >> > * work towards improving the dtype-addition mechanism > >> > * re-factoring of code so that it can compile with a C++ > compiler > >> > and be minimally dependent on Python data-structures. > >> > >> This is a pretty exciting list of features. What is the rationale for > >> code being compiled as C++ ? IMO, it will be difficult to do so > >> without preventing useful C constructs, and without removing some of > >> the existing features (like our use of C99 complex). The subset that > >> is both C and C++ compatible is quite constraining. > >> > > > > I'm in favor of this myself, C++ would allow a lot code cleanup and make > it > > easier to provide an extensible base, I think it would be a natural fit > with > > numpy. Of course, some C++ projects become tangled messes of inheritance, > > but I'd be very interested in seeing what a good C++ designer like Mark, > > intimately familiar with the numpy code base, could do. This opportunity > > might not come by again anytime soon and I think we should grab onto it. 
> The > > initial step would be a release whose code that would compile in both > C/C++, > > which mostly comes down to removing C++ keywords like 'new'. > > C++ will make integration with external environments much harder > (calling a C++ library from a non C++ program is very hard, especially > for cross-platform projects), and I am not convinced by the more > extensible argument. > The whole of NumPy could be written utilizing C++ extensively while still using exactly the same API and ABI numpy has now. C++ does not force anything about API/ABI design decisions. One good document to read about how a major open source project transitioned from C to C++ is about gcc. Their points comparing C and C++ apply to numpy quite well, and being compiler authors, they're intimately familiar with ABI and performance issues: http://gcc.gnu.org/wiki/gcc-in-cxx#The_gcc-in-cxx_branch Making the numpy C code buildable by a C++ compiler is harder than > removing keywords. Certainly, but it's not a difficult task for someone who's familiar with both C and C++. > > I did suggest running it by you for build issues, so please raise any you > > can think of. Note that MatPlotLib is in C++, so I don't think the > problems > > are insurmountable. And choosing a set of compilers to support is > something > > that will need to be done. > > I don't know for matplotlib, but for scipy, quite a few issues were > caused by our C++ extensions in scipy.sparse. But build issues are a > not a strong argument against C++ - I am sure those could be worked > out. > On this topic, I'd like to ask what it would take to change the default warning levels in all the build configurations? Building with no warnings under high warning levels is a pretty standard practice as a basic mechanisms for catching some classes of bugs, and it would be nice for numpy to do this. The only way this is reasonable, though, is if it's the default in the build system. Thanks, Mark > regards, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Feb 17 13:21:11 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 17 Feb 2012 12:21:11 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F3E93CA.8020703@hawaii.edu> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing wrote: > On 02/17/2012 05:39 AM, Charles R Harris wrote: > > > > > > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau > > wrote: > > > > Hi Travis, > > > > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant > > > wrote: > > > Mark Wiebe and I have been discussing off and on (as well as > > talking with Charles) a good way forward to balance two competing > > desires: > > > > > > * addition of new features that are needed in NumPy > > > * improving the code-base generally and moving towards a > > more maintainable NumPy > > > > > > I know there are load voices for just focusing on the second of > > these and avoiding the first until we have finished that. I > > recognize the need to improve the code base, but I will also be > > pushing for improvements to the feature-set and user experience in > > the process. 
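A minimal sketch (hypothetical function names, nothing from the numpy API) of the point
Mark makes above: internals compiled as C++ can still export exactly the same C API and
ABI, since the exported symbol keeps C linkage and a plain-C signature.

#include <cstddef>
#include <numeric>
#include <vector>

namespace {
// Internal code is free to use C++ facilities.
double sum_impl(const double *data, std::size_t n) {
    std::vector<double> v(data, data + n);
    return std::accumulate(v.begin(), v.end(), 0.0);
}
}  // unnamed namespace

// Callers see the same symbol and calling convention a plain C build would provide.
extern "C" double example_sum(const double *data, unsigned long n) {
    return sum_impl(data, static_cast<std::size_t>(n));
}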
> > > > > > As a result, I am proposing a rough outline for releases over the > > next year: > > > > > > * NumPy 1.7 to come out as soon as the serious bugs can be > > eliminated. Bryan, Francesc, Mark, and I are able to help triage > > some of those. > > > > > > * NumPy 1.8 to come out in July which will have as many > > ABI-compatible feature enhancements as we can add while improving > > test coverage and code cleanup. I will post to this list more > > details of what we plan to address with it later. Included for > > possible inclusion are: > > > * resolving the NA/missing-data issues > > > * finishing group-by > > > * incorporating the start of label arrays > > > * incorporating a meta-object > > > * a few new dtypes (variable-length string, > > varialbe-length unicode and an enum type) > > > * adding ufunc support for flexible dtypes and possibly > > structured arrays > > > * allowing generalized ufuncs to work on more kinds of > > arrays besides just contiguous > > > * improving the ability for NumPy to receive JIT-generated > > function pointers for ufuncs and other calculation opportunities > > > * adding "filters" to Input and Output > > > * simple computed fields for dtypes > > > * accepting a Data-Type specification as a class or JSON > file > > > * work towards improving the dtype-addition mechanism > > > * re-factoring of code so that it can compile with a C++ > > compiler and be minimally dependent on Python data-structures. > > > > This is a pretty exciting list of features. What is the rationale for > > code being compiled as C++ ? IMO, it will be difficult to do so > > without preventing useful C constructs, and without removing some of > > the existing features (like our use of C99 complex). The subset that > > is both C and C++ compatible is quite constraining. > > > > > > I'm in favor of this myself, C++ would allow a lot code cleanup and make > > it easier to provide an extensible base, I think it would be a natural > > fit with numpy. Of course, some C++ projects become tangled messes of > > inheritance, but I'd be very interested in seeing what a good C++ > > designer like Mark, intimately familiar with the numpy code base, could > > do. This opportunity might not come by again anytime soon and I think we > > should grab onto it. The initial step would be a release whose code that > > would compile in both C/C++, which mostly comes down to removing C++ > > keywords like 'new'. > > > > I did suggest running it by you for build issues, so please raise any > > you can think of. Note that MatPlotLib is in C++, so I don't think the > > problems are insurmountable. And choosing a set of compilers to support > > is something that will need to be done. > > It's true that matplotlib relies heavily on C++, both via the Agg > library and in its own extension code. Personally, I don't like this; I > think it raises the barrier to contributing. C++ is an order of > magnitude more complicated than C--harder to read, and much harder to > write, unless one is a true expert. In mpl it brings reliance on the CXX > library, which Mike D. has had to help maintain. And if it does > increase compiler specificity, that's bad. > This gets to the recruitment issue, which is one of the most important problems I see numpy facing. I personally have contributed a lot of code to NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was the biggest negative point when I considered whether it was worth contributing to the project. 
I suspect there are many programmers out there who are skilled in low-level, high-performance C++, who would be willing to contribute, but don't want to code in C. I believe NumPy should be trying to find people who want to make high performance, close to the metal, libraries. This is a very different type of programmer than one who wants to program in Python, but is willing to dabble in a lower level language to make something run faster. High performance library development is one of the things the C++ developer community does very well, and that community is where we have a good chance of finding the programmers NumPy needs. I would much rather see development in the direction of sticking with C > where direct low-level control and speed are needed, and using cython to > gain higher level language benefits where appropriate. Of course, that > brings in the danger of reliance on another complex tool, cython. If > that danger is considered excessive, then just stick with C. > There are many small benefits C++ can offer, even if numpy chooses only to use a tiny subset of the C++ language. For example, RAII can be used to reliably eliminate PyObject reference leaks. Consider a regression like this: http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html Fixing this in C would require switching all the relevant usages of NPY_MAXARGS to use a dynamic memory allocation. This brings with it the potential of easily introducing a memory leak, and is a lot of work to do. In C++, this functionality could be placed inside a class, where the deterministic construction/destruction semantics eliminate the risk of memory leaks and make the code easier to read at the same time. There are other examples like this where the C language has forced a suboptimal design choice because of how hard it would be to do it better. Cheers, Mark > Eric > > > > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Fri Feb 17 13:37:06 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 17 Feb 2012 13:37:06 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Mark Wiebe wrote: > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing wrote: > >> On 02/17/2012 05:39 AM, Charles R Harris wrote: >> > >> > >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau > > > wrote: >> > >> > Hi Travis, >> > >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >> > > wrote: >> > > Mark Wiebe and I have been discussing off and on (as well as >> > talking with Charles) a good way forward to balance two competing >> > desires: >> > > >> > > * addition of new features that are needed in NumPy >> > > * improving the code-base generally and moving towards a >> > more maintainable NumPy >> > > >> > > I know there are load voices for just focusing on the second of >> > these and avoiding the first until we have finished that. I >> > recognize the need to improve the code base, but I will also be >> > pushing for improvements to the feature-set and user experience in >> > the process. >> > > >> > > As a result, I am proposing a rough outline for releases over the >> > next year: >> > > >> > > * NumPy 1.7 to come out as soon as the serious bugs can be >> > eliminated. 
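The RAII idea described above, reduced to a minimal sketch (illustrative only, not NumPy
code; a real version would also spell out copy and move behaviour):

#include <Python.h>

// A scope-bound PyObject reference holder: the reference is released on every
// exit path, which is exactly the class of leak RAII removes.
class py_ref {
public:
    explicit py_ref(PyObject *obj = NULL) : obj_(obj) {}   // takes ownership
    ~py_ref() { Py_XDECREF(obj_); }
    PyObject *get() const { return obj_; }
private:
    py_ref(const py_ref &);              // non-copyable in this sketch
    py_ref &operator=(const py_ref &);
    PyObject *obj_;
};

// Hypothetical usage: an early return no longer needs a matching Py_DECREF.
int example(PyObject *seq) {
    py_ref item(PySequence_GetItem(seq, 0));    // new reference
    if (item.get() == NULL) {
        return -1;                              // reference released by ~py_ref
    }
    /* ... work with item.get() ... */
    return 0;
}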
Bryan, Francesc, Mark, and I are able to help triage >> > some of those. >> > > >> > > * NumPy 1.8 to come out in July which will have as many >> > ABI-compatible feature enhancements as we can add while improving >> > test coverage and code cleanup. I will post to this list more >> > details of what we plan to address with it later. Included for >> > possible inclusion are: >> > > * resolving the NA/missing-data issues >> > > * finishing group-by >> > > * incorporating the start of label arrays >> > > * incorporating a meta-object >> > > * a few new dtypes (variable-length string, >> > varialbe-length unicode and an enum type) >> > > * adding ufunc support for flexible dtypes and possibly >> > structured arrays >> > > * allowing generalized ufuncs to work on more kinds of >> > arrays besides just contiguous >> > > * improving the ability for NumPy to receive JIT-generated >> > function pointers for ufuncs and other calculation opportunities >> > > * adding "filters" to Input and Output >> > > * simple computed fields for dtypes >> > > * accepting a Data-Type specification as a class or JSON >> file >> > > * work towards improving the dtype-addition mechanism >> > > * re-factoring of code so that it can compile with a C++ >> > compiler and be minimally dependent on Python data-structures. >> > >> > This is a pretty exciting list of features. What is the rationale for >> > code being compiled as C++ ? IMO, it will be difficult to do so >> > without preventing useful C constructs, and without removing some of >> > the existing features (like our use of C99 complex). The subset that >> > is both C and C++ compatible is quite constraining. >> > >> > >> > I'm in favor of this myself, C++ would allow a lot code cleanup and make >> > it easier to provide an extensible base, I think it would be a natural >> > fit with numpy. Of course, some C++ projects become tangled messes of >> > inheritance, but I'd be very interested in seeing what a good C++ >> > designer like Mark, intimately familiar with the numpy code base, could >> > do. This opportunity might not come by again anytime soon and I think we >> > should grab onto it. The initial step would be a release whose code that >> > would compile in both C/C++, which mostly comes down to removing C++ >> > keywords like 'new'. >> > >> > I did suggest running it by you for build issues, so please raise any >> > you can think of. Note that MatPlotLib is in C++, so I don't think the >> > problems are insurmountable. And choosing a set of compilers to support >> > is something that will need to be done. >> >> It's true that matplotlib relies heavily on C++, both via the Agg >> library and in its own extension code. Personally, I don't like this; I >> think it raises the barrier to contributing. C++ is an order of >> magnitude more complicated than C--harder to read, and much harder to >> write, unless one is a true expert. In mpl it brings reliance on the CXX >> library, which Mike D. has had to help maintain. And if it does >> increase compiler specificity, that's bad. >> > > This gets to the recruitment issue, which is one of the most important > problems I see numpy facing. I personally have contributed a lot of code to > NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was > the biggest negative point when I considered whether it was worth > contributing to the project. 
I suspect there are many programmers out there > who are skilled in low-level, high-performance C++, who would be willing to > contribute, but don't want to code in C. > > I believe NumPy should be trying to find people who want to make high > performance, close to the metal, libraries. This is a very different type > of programmer than one who wants to program in Python, but is willing to > dabble in a lower level language to make something run faster. High > performance library development is one of the things the C++ developer > community does very well, and that community is where we have a good chance > of finding the programmers NumPy needs. > > I would much rather see development in the direction of sticking with C >> where direct low-level control and speed are needed, and using cython to >> gain higher level language benefits where appropriate. Of course, that >> brings in the danger of reliance on another complex tool, cython. If >> that danger is considered excessive, then just stick with C. >> > > There are many small benefits C++ can offer, even if numpy chooses only to > use a tiny subset of the C++ language. For example, RAII can be used to > reliably eliminate PyObject reference leaks. > > Consider a regression like this: > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html > > Fixing this in C would require switching all the relevant usages of > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the > potential of easily introducing a memory leak, and is a lot of work to do. > In C++, this functionality could be placed inside a class, where the > deterministic construction/destruction semantics eliminate the risk of > memory leaks and make the code easier to read at the same time. There are > other examples like this where the C language has forced a suboptimal > design choice because of how hard it would be to do it better. > > Cheers, > Mark > > I think numpy really wants to use c++ templates to generate specific instantiations of algorithms for each dtype from a generic version, rather than the current code that uses cpp. From charlesr.harris at gmail.com Fri Feb 17 13:46:27 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 11:46:27 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 11:37 AM, Neal Becker wrote: > Mark Wiebe wrote: > > > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing > wrote: > > > >> On 02/17/2012 05:39 AM, Charles R Harris wrote: > >> > > >> > > >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau >> > > wrote: > >> > > >> > Hi Travis, > >> > > >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant > >> > > wrote: > >> > > Mark Wiebe and I have been discussing off and on (as well as > >> > talking with Charles) a good way forward to balance two competing > >> > desires: > >> > > > >> > > * addition of new features that are needed in NumPy > >> > > * improving the code-base generally and moving towards a > >> > more maintainable NumPy > >> > > > >> > > I know there are load voices for just focusing on the second of > >> > these and avoiding the first until we have finished that. I > >> > recognize the need to improve the code base, but I will also be > >> > pushing for improvements to the feature-set and user experience in > >> > the process. 
> >> > > > >> > > As a result, I am proposing a rough outline for releases over > the > >> > next year: > >> > > > >> > > * NumPy 1.7 to come out as soon as the serious bugs can > be > >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage > >> > some of those. > >> > > > >> > > * NumPy 1.8 to come out in July which will have as many > >> > ABI-compatible feature enhancements as we can add while improving > >> > test coverage and code cleanup. I will post to this list more > >> > details of what we plan to address with it later. Included for > >> > possible inclusion are: > >> > > * resolving the NA/missing-data issues > >> > > * finishing group-by > >> > > * incorporating the start of label arrays > >> > > * incorporating a meta-object > >> > > * a few new dtypes (variable-length string, > >> > varialbe-length unicode and an enum type) > >> > > * adding ufunc support for flexible dtypes and possibly > >> > structured arrays > >> > > * allowing generalized ufuncs to work on more kinds of > >> > arrays besides just contiguous > >> > > * improving the ability for NumPy to receive > JIT-generated > >> > function pointers for ufuncs and other calculation opportunities > >> > > * adding "filters" to Input and Output > >> > > * simple computed fields for dtypes > >> > > * accepting a Data-Type specification as a class or JSON > >> file > >> > > * work towards improving the dtype-addition mechanism > >> > > * re-factoring of code so that it can compile with a C++ > >> > compiler and be minimally dependent on Python data-structures. > >> > > >> > This is a pretty exciting list of features. What is the rationale > for > >> > code being compiled as C++ ? IMO, it will be difficult to do so > >> > without preventing useful C constructs, and without removing some > of > >> > the existing features (like our use of C99 complex). The subset > that > >> > is both C and C++ compatible is quite constraining. > >> > > >> > > >> > I'm in favor of this myself, C++ would allow a lot code cleanup and > make > >> > it easier to provide an extensible base, I think it would be a natural > >> > fit with numpy. Of course, some C++ projects become tangled messes of > >> > inheritance, but I'd be very interested in seeing what a good C++ > >> > designer like Mark, intimately familiar with the numpy code base, > could > >> > do. This opportunity might not come by again anytime soon and I think > we > >> > should grab onto it. The initial step would be a release whose code > that > >> > would compile in both C/C++, which mostly comes down to removing C++ > >> > keywords like 'new'. > >> > > >> > I did suggest running it by you for build issues, so please raise any > >> > you can think of. Note that MatPlotLib is in C++, so I don't think the > >> > problems are insurmountable. And choosing a set of compilers to > support > >> > is something that will need to be done. > >> > >> It's true that matplotlib relies heavily on C++, both via the Agg > >> library and in its own extension code. Personally, I don't like this; I > >> think it raises the barrier to contributing. C++ is an order of > >> magnitude more complicated than C--harder to read, and much harder to > >> write, unless one is a true expert. In mpl it brings reliance on the CXX > >> library, which Mike D. has had to help maintain. And if it does > >> increase compiler specificity, that's bad. > >> > > > > This gets to the recruitment issue, which is one of the most important > > problems I see numpy facing. 
I personally have contributed a lot of code > to > > NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ > was > > the biggest negative point when I considered whether it was worth > > contributing to the project. I suspect there are many programmers out > there > > who are skilled in low-level, high-performance C++, who would be willing > to > > contribute, but don't want to code in C. > > > > I believe NumPy should be trying to find people who want to make high > > performance, close to the metal, libraries. This is a very different type > > of programmer than one who wants to program in Python, but is willing to > > dabble in a lower level language to make something run faster. High > > performance library development is one of the things the C++ developer > > community does very well, and that community is where we have a good > chance > > of finding the programmers NumPy needs. > > > > I would much rather see development in the direction of sticking with C > >> where direct low-level control and speed are needed, and using cython to > >> gain higher level language benefits where appropriate. Of course, that > >> brings in the danger of reliance on another complex tool, cython. If > >> that danger is considered excessive, then just stick with C. > >> > > > > There are many small benefits C++ can offer, even if numpy chooses only > to > > use a tiny subset of the C++ language. For example, RAII can be used to > > reliably eliminate PyObject reference leaks. > > > > Consider a regression like this: > > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html > > > > Fixing this in C would require switching all the relevant usages of > > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the > > potential of easily introducing a memory leak, and is a lot of work to > do. > > In C++, this functionality could be placed inside a class, where the > > deterministic construction/destruction semantics eliminate the risk of > > memory leaks and make the code easier to read at the same time. There are > > other examples like this where the C language has forced a suboptimal > > design choice because of how hard it would be to do it better. > > > > Cheers, > > Mark > > > > > > I think numpy really wants to use c++ templates to generate specific > instantiations of algorithms for each dtype from a generic version, rather > than > the current code that uses cpp. > > One of many places. Exception handling, smart pointers, and iterators are the first things that come to my mind. Note that smart pointers also provide a nice way to do some high performance stuff, like transparent pointer swapping with memory deallocation. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Fri Feb 17 14:00:07 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Fri, 17 Feb 2012 11:00:07 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe wrote: > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing wrote: >> >> On 02/17/2012 05:39 AM, Charles R Harris wrote: >> > >> > >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau > > > wrote: >> > >> > ? ? Hi Travis, >> > >> > ? ? On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >> > ? ? > wrote: >> > ? ? ?> Mark Wiebe and I have been discussing off and on (as well as >> > ? ? 
talking with Charles) a good way forward to balance two competing >> > ? ? desires: >> > ? ? ?> >> > ? ? ?> ? ? ? ?* addition of new features that are needed in NumPy >> > ? ? ?> ? ? ? ?* improving the code-base generally and moving towards a >> > ? ? more maintainable NumPy >> > ? ? ?> >> > ? ? ?> I know there are load voices for just focusing on the second of >> > ? ? these and avoiding the first until we have finished that. ?I >> > ? ? recognize the need to improve the code base, but I will also be >> > ? ? pushing for improvements to the feature-set and user experience in >> > ? ? the process. >> > ? ? ?> >> > ? ? ?> As a result, I am proposing a rough outline for releases over the >> > ? ? next year: >> > ? ? ?> >> > ? ? ?> ? ? ? ?* NumPy 1.7 to come out as soon as the serious bugs can be >> > ? ? eliminated. ?Bryan, Francesc, Mark, and I are able to help triage >> > ? ? some of those. >> > ? ? ?> >> > ? ? ?> ? ? ? ?* NumPy 1.8 to come out in July which will have as many >> > ? ? ABI-compatible feature enhancements as we can add while improving >> > ? ? test coverage and code cleanup. ? I will post to this list more >> > ? ? details of what we plan to address with it later. ? ?Included for >> > ? ? possible inclusion are: >> > ? ? ?> ? ? ? ?* resolving the NA/missing-data issues >> > ? ? ?> ? ? ? ?* finishing group-by >> > ? ? ?> ? ? ? ?* incorporating the start of label arrays >> > ? ? ?> ? ? ? ?* incorporating a meta-object >> > ? ? ?> ? ? ? ?* a few new dtypes (variable-length string, >> > ? ? varialbe-length unicode and an enum type) >> > ? ? ?> ? ? ? ?* adding ufunc support for flexible dtypes and possibly >> > ? ? structured arrays >> > ? ? ?> ? ? ? ?* allowing generalized ufuncs to work on more kinds of >> > ? ? arrays besides just contiguous >> > ? ? ?> ? ? ? ?* improving the ability for NumPy to receive JIT-generated >> > ? ? function pointers for ufuncs and other calculation opportunities >> > ? ? ?> ? ? ? ?* adding "filters" to Input and Output >> > ? ? ?> ? ? ? ?* simple computed fields for dtypes >> > ? ? ?> ? ? ? ?* accepting a Data-Type specification as a class or JSON >> > file >> > ? ? ?> ? ? ? ?* work towards improving the dtype-addition mechanism >> > ? ? ?> ? ? ? ?* re-factoring of code so that it can compile with a C++ >> > ? ? compiler and be minimally dependent on Python data-structures. >> > >> > ? ? This is a pretty exciting list of features. What is the rationale >> > for >> > ? ? code being compiled as C++ ? IMO, it will be difficult to do so >> > ? ? without preventing useful C constructs, and without removing some of >> > ? ? the existing features (like our use of C99 complex). The subset that >> > ? ? is both C and C++ compatible is quite constraining. >> > >> > >> > I'm in favor of this myself, C++ would allow a lot code cleanup and make >> > it easier to provide an extensible base, I think it would be a natural >> > fit with numpy. Of course, some C++ projects become tangled messes of >> > inheritance, but I'd be very interested in seeing what a good C++ >> > designer like Mark, intimately familiar with the numpy code base, could >> > do. This opportunity might not come by again anytime soon and I think we >> > should grab onto it. The initial step would be a release whose code that >> > would compile in both C/C++, which mostly comes down to removing C++ >> > keywords like 'new'. >> > >> > I did suggest running it by you for build issues, so please raise any >> > you can think of. 
Note that MatPlotLib is in C++, so I don't think the >> > problems are insurmountable. And choosing a set of compilers to support >> > is something that will need to be done. >> >> It's true that matplotlib relies heavily on C++, both via the Agg >> library and in its own extension code. ?Personally, I don't like this; I >> think it raises the barrier to contributing. ?C++ is an order of >> magnitude more complicated than C--harder to read, and much harder to >> write, unless one is a true expert. In mpl it brings reliance on the CXX >> library, which Mike D. has had to help maintain. ?And if it does >> increase compiler specificity, that's bad. > > > This gets to the recruitment issue, which is one of the most important > problems I see numpy facing. I personally have contributed a lot of code to > NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was > the biggest negative point when I considered whether it was worth > contributing to the project. I suspect there are many programmers out there > who are skilled in low-level, high-performance C++, who would be willing to > contribute, but don't want to code in C. > > I believe NumPy should be trying to find people who want to make high > performance, close to the metal, libraries. This is a very different type of > programmer than one who wants to program in Python, but is willing to dabble > in a lower level language to make something run faster. High performance > library development is one of the things the C++ developer community does > very well, and that community is where we have a good chance of finding the > programmers NumPy needs. > >> I would much rather see development in the direction of sticking with C >> where direct low-level control and speed are needed, and using cython to >> gain higher level language benefits where appropriate. ?Of course, that >> brings in the danger of reliance on another complex tool, cython. ?If >> that danger is considered excessive, then just stick with C. > > > There are many small benefits C++ can offer, even if numpy chooses only to > use a tiny subset of the C++ language. For example, RAII can be used to > reliably eliminate PyObject reference leaks. > > Consider a regression like this: > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html > > Fixing this in C would require switching all the relevant usages of > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the > potential of easily introducing a memory leak, and is a lot of work to do. > In C++, this functionality could be placed inside a class, where the > deterministic construction/destruction semantics eliminate the risk of > memory leaks and make the code easier to read at the same time. There are > other examples like this where the C language has forced a suboptimal design > choice because of how hard it would be to do it better. > > Cheers, > Mark > In a similar vein, could incorporating C++ lead to a simpler low-level API for numpy? I know Mark has talked before about--in the long-term, as a dream project to scratch his own itch, and something the BDF12 doesn't necessarily agree with--implementing the great ideas in numpy as a layered C++ library. (Which would have the added benefit of making numpy more of a general array library that could be exposed to any language which can call C++ libraries.) 
I don't imagine that's on the table for anything near-term, but I wonder if making more of the low-level stuff C++ would make it easier for performance nuts to write their own code in C/C++ interfacing with numpy, and then expose it to python. After playing around with ufuncs at the C level for a little while last summer, I quickly realized any simplifications would be greatly appreciated. -Chris >> >> Eric >> >> > >> > Chuck >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Fri Feb 17 14:07:20 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 17 Feb 2012 11:07:20 -0800 Subject: [Numpy-discussion] Numpy governance update In-Reply-To: References: <4F3BB853.9090100@gmail.com> <4F3C0866.3000703@gmail.com> <4F3C4D58.6050007@astro.uio.no> <52C9DA42-4070-46D5-88FF-D2BDCA44F34B@continuum.io> <5C553755-83A5-4234-AF4E-FCFD4B59578A@continuum.io> <4F3DACCB.2040004@gmail.com> Message-ID: Hi Ben, On Thu, Feb 16, 2012 at 9:54 PM, Benjamin Root wrote: > > > On Thursday, February 16, 2012, John Hunter wrote: >> >> >> >> On Thu, Feb 16, 2012 at 7:26 PM, Alan G Isaac >> wrote: >>> >>> On 2/16/2012 7:22 PM, Matthew Brett wrote: >>> > This has not been an encouraging episode in striving for consensus. >>> >>> I disagree. >>> Failure to reach consensus does not imply lack of striving. >>> >> >> Hey Alan, thanks for your thoughtful and nuanced views. ?I agree ?with >> everything you've said, but have a few additional points. >> >> At the risk of wading into a thread that has grown far too long, and >> echoing Eric's comments that the idea of governance is murky at best >> when there is no provision for enforceability, I have a few comments. >> Full disclosure: Travis has asked me and I have agreed to to serve on >> a board for "numfocus", the not-for-profit arm of his efforts to >> promote numpy and related tools. ?Although I have no special numpy >> developer chops, as the original author of matplotlib, which is one of >> the leading "numpy clients", he asked me to join his organization as a >> "community representative". ?I support his efforts, and so agreed to >> join the numfocus board. >> >> My first and most important point is that the subtext of many postings >> here >> about the fear of undue and inappropriate influence of Continuum under >> Travis' leadership is far overblown. ?Travis created numpy -- it is >> his baby. ?Undeniably, he created it by standing on the shoulders of >> giants: Jim Hugunin, Paul Dubois, Perry Greenfield and his team, and >> many others. ?But the idea that we need to guard against the >> possibility that his corporate interests will compromise his interests >> in "what is best for numpy" is academic at best. >> >> As someone who has created a significant project in the realm of >> "scientific computing in Python", I can tell you that it is something >> I take quite a bit of pride in and it is very important to me that the >> project thrives as it was intended to: as a free, open-source, >> best-practice way of doing science. ?I know Travis well enough to know >> he feels the same way -- numpy doing well is *at least* important to >> him his company doing well. 
?All of his recent actions to start a >> company and foundation which focuses resources on numpy and related >> tools reinforce that view. ?If he had a different temperament, he >> wouldn't have devoted five to ten years of is life to Numeric, scipy >> and numpy. ?He is a BDFL for a reason: he has earned our trust. >> >> And he has proven his ability to lead when *almost everyone* was >> against him. ?At the height of the Numeric/numarray split, and I was >> deeply involved in this as the mpl author because we had a "numerix" >> compatibility layer to allow users to use one or the other, Travis >> proposed writing numpy to solve both camp's problems. ?I really can't >> remember a single individual who supported him. ?What I remember is >> the cacophony of voices who though this was a bad idea, because of the >> "third fork" problem. ?But Travis forged ahead, on his own, wrote >> numpy, and re-united the Numeric and numarray camps. ?And >> all-the-while he maintained his friendship with the numarray >> developers (Perry Greenfield who led the numarray development effort >> has also been invited by Travis to the numfocus board, as has Fernando >> Perez and Jarrod Millman). ?Although MPL at the time agreed to support >> a third version in its numerix compatibility layer for numpy, I can >> thankfully say we have since dropped support for the compatibility >> layer entirely as we all use numpy now. ?This to me is the distilled >> essence of leadership, against the voices of the masses, and it bears >> remembering. >> >> I have two more points I want to make: one is on democracy, and one is >> on corporate control. ?On corporate control: there have been a number >> of posts in this thread about the worries and dangers that Continuum >> poses as the corporate sponser of numpy development, about how this >> may cause numpy to shift from a model of a few loosely connected, >> decentralized cadre of volunteers to a centrally controlled steering >> committee of programmers who are controlled by corporate headquarters >> and who make all their decisions around the water cooler unobserved by >> the community of users. >> >> I want to make a connection to something that happened in the history >> of matplotlib development, something that is not strictly analogous >> but I think close enough to be informative. ?Sometime around 2005, >> Perry Greenfield, who heads the development team of the Space >> Telescope Science Institute (STScI) that is charged with processing >> the Hubble image pipeline, emailed me that he was considering using >> matplotlib as their primary image visualization tool. ?I can't tell >> you how excited I was at the time. ?The idea of having institutional >> sponsorship from someone as prestigious and resourceful as STScI was >> hugely motivating. ?I worked feverishly for months to add stuff they >> needed: better rendering, better image support, mathtext and lots >> more. ?But more importantly, Perry was offering to bring institutional >> support to my project: well qualified full-time employees who >> dedicated a significant part of their time to matplotlib >> development. He had done this before with numarray development, and >> the contributions of his team are enormous. ?Many mpl features owe >> their support to institutional sopnsership: Perry's group deserves the >> biggest props, but Ted Drain's group at the JPL and corporate sponsors >> as well are on the list. 
>> >> What I want you to think about are the parallels between Perry and his >> team joining matplotlib's development effort and Continuum's stated >> desire to contribute to numpy development. ?Yes, STScI is a >> not-for-profit entity operated by NASA, and Continuum is a >> for-profit-entity with a not-for-profit arm (numfocus). ?But the >> differences are not so great in my experience. ?Both for-profits and >> not-for-profits experience institutional pressures to get code out on >> a deadline. ?In fact, perhaps our "finest hour" in matplotlib >> development came as a result of one of out not-for-profit client's >> deadlines. ?My favorite story is when the Jet Propulsion Labs at >> Caltech emailed me about the inadequacy of our ellipse approximations, >> and gave us the constraint that the Mars Rover was scheduled to land >> in the next few months. ?Talk about a hard deadline! ?Michael >> Droettboom, under Perry's direction, implemented a >> "8-cubic-spline-approximation-to-curves-in-the-viewport" solution that >> I honestly think gives matplotlib the *best* approximation to such >> curves anywhere. ?Period. ?Institutional deadlines to get working code >> into the codebase, whether from a for-profit or not-for-profit entity, >> and usually are a good thing. It may not be perfect going in, but it is >> usually better for being there. >> >> That is one example from matplotlib's history that illustrates the >> benefit of institutional sponsers in a project. ?In this example, the >> organizational goal -- getting the Rover to land without crashing -- is >> one we can all relate to and support. ?And the resolution to the story, >> in which a heroically talented developer (Michael D) steps up to >> solve the problem, is one we can all aspire to. ?But the essential >> ingredients of the story are not so different from what we face here: >> an organization needs to solve a problem on a deadline; another >> organization, possibly related, has the resources to get the job done; >> all efforts are contributed to the public domain. >> >> Now that brings me to my final and perhaps most controverisal point. >> I don't believe democracy is the right solution for most open source >> problems. ?As exhibit A, I reference the birth of numpy itself that I >> discussed above. ?Numpy would have never happened if community input >> were considered. ?I'm pretty sure that all of us that were there at >> the time can attest to this. >> >> Democracy is something that many of us have grown up by default to >> consider as the right solution to many, if not most or, problems of >> governance. ?I believe it is a solution to a specific problem of >> governance. ?I do not believe democracy is a panacea or an ideal >> solution for most problems: rather it is the right solution for which >> the consequences of failure are too high. ?In a state (by which I mean >> a government with a power to subject its people to its will by force >> of arms) where the consequences of failure to submit include the >> death, dismemberment, or imprisonment of dissenters, democracy is a >> safeguard against the excesses of the powerful. ?Generally, there is >> no reason to believe that the simple majority of people polled is the >> "best" or "right" answer, but there is also no reason to believe that >> those who hold power will rule beneficiently. ?The democratic ability >> of the people to check to the rule of the few and powerful is >> essential to insure the survival of the minority. 
>> >> In open source software development, we face none of these problems. >> Our power to fork is precisely the power the minority in a tyranical >> democracy lacks: noone will kill us for going off the reservation. ?We >> are free to use the product or not, to modify it or not, to enhance it >> or not. >> >> The power to fork is not abstract: it is essential. ?matplotlib, and >> chaco, both rely *heavily* on agg, the Antigrain C++ rendering >> library. ?At some point many years ago, Maxim, the author of Agg, >> decided to change the license of Agg (circa version 2.5) to GPL rather >> than BSD. ?Obviously, this was a non-starter for projects like mpl, >> scipy and chaco which assumed BSD licensing terms. ?Unfortunately, >> Maxim had a new employer which appeared to us to be dictating the >> terms and our best arguments fell on deaf ears. ?No matter: mpl and >> Enthought chaco have continued to ship agg 2.4, pre-GPL, and I think >> that less than 1% of our users have even noticed. ?Yes, we forked the >> project, and yes, noone has noticed. ?To me this is the ultimate >> reason why governance of open source, free projects does not need to >> be democratic. ?As painful as a fork may be, it is the ultimate >> antidote to a leader who may not have your interests in mind. ?It is >> an antidote that we citizens in a state government may not have. >> >> It is true that numpy exists in a privileged position in a way that >> matplotlib or scipy does not. ?Numpy is the core. ?Yes, Continuum is >> different than STScI because Travis is both the lead of Numpy and the >> lead of the company sponsoring numpy. ?These are important >> differences. ?In the worst cases, we might imagine that these >> differences will negatively impact numpy and associated tools. ?But >> these worst case scenarios that we imagine will most likely simply >> distract us from what is going on: Travis, one of the most prolific >> and valuable contributers to the scientific python community, has >> decided to refocus his efforts to do more. ?And that is a very happy >> moment for all of us. > > > > John, > > Thank you for taking the time to share your perspective. ?As always, it is > very interesting. ?I have only been involved with numpy for the past 3 > years. From my perspective, Chuck has been BDFL, in a sense. ?It is hard for > me to imagine python without numpy, and it is difficult to fully appreciate > the effort Travis has put in. Perspectives like your are useful for this. > > So, power is derived by a mandate from the masses. That part is always > democratic. However, the form of governance is not required to be > democratic. > > If we are agreeing to Travis being BDF12, then that is just as valid in my > mind as us agreeing to some other structure. What is important is the > community coming together and agreeing on this. ?Would that be fine for > everybody? Actually this is more to thank you for the tone of your email than anything else. I know we've disagreed from time to time, but I do appreciate your constant and kind attempts to bring people together. I think there's no controversy about Travis being BDF$N$ and yes, I agree, to specify that is a step forward. 
Best, Matthew From ben.root at ou.edu Fri Feb 17 14:09:36 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 17 Feb 2012 13:09:36 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 1:00 PM, Christopher Jordan-Squire wrote: > On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe wrote: > > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing > wrote: > >> > >> On 02/17/2012 05:39 AM, Charles R Harris wrote: > >> > > >> > > >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau >> > > wrote: > >> > > >> > Hi Travis, > >> > > >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant > >> > > wrote: > >> > > Mark Wiebe and I have been discussing off and on (as well as > >> > talking with Charles) a good way forward to balance two competing > >> > desires: > >> > > > >> > > * addition of new features that are needed in NumPy > >> > > * improving the code-base generally and moving towards a > >> > more maintainable NumPy > >> > > > >> > > I know there are load voices for just focusing on the second of > >> > these and avoiding the first until we have finished that. I > >> > recognize the need to improve the code base, but I will also be > >> > pushing for improvements to the feature-set and user experience in > >> > the process. > >> > > > >> > > As a result, I am proposing a rough outline for releases over > the > >> > next year: > >> > > > >> > > * NumPy 1.7 to come out as soon as the serious bugs can > be > >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage > >> > some of those. > >> > > > >> > > * NumPy 1.8 to come out in July which will have as many > >> > ABI-compatible feature enhancements as we can add while improving > >> > test coverage and code cleanup. I will post to this list more > >> > details of what we plan to address with it later. Included for > >> > possible inclusion are: > >> > > * resolving the NA/missing-data issues > >> > > * finishing group-by > >> > > * incorporating the start of label arrays > >> > > * incorporating a meta-object > >> > > * a few new dtypes (variable-length string, > >> > varialbe-length unicode and an enum type) > >> > > * adding ufunc support for flexible dtypes and possibly > >> > structured arrays > >> > > * allowing generalized ufuncs to work on more kinds of > >> > arrays besides just contiguous > >> > > * improving the ability for NumPy to receive > JIT-generated > >> > function pointers for ufuncs and other calculation opportunities > >> > > * adding "filters" to Input and Output > >> > > * simple computed fields for dtypes > >> > > * accepting a Data-Type specification as a class or JSON > >> > file > >> > > * work towards improving the dtype-addition mechanism > >> > > * re-factoring of code so that it can compile with a C++ > >> > compiler and be minimally dependent on Python data-structures. > >> > > >> > This is a pretty exciting list of features. What is the rationale > >> > for > >> > code being compiled as C++ ? IMO, it will be difficult to do so > >> > without preventing useful C constructs, and without removing some > of > >> > the existing features (like our use of C99 complex). The subset > that > >> > is both C and C++ compatible is quite constraining. > >> > > >> > > >> > I'm in favor of this myself, C++ would allow a lot code cleanup and > make > >> > it easier to provide an extensible base, I think it would be a natural > >> > fit with numpy. 
Of course, some C++ projects become tangled messes of > >> > inheritance, but I'd be very interested in seeing what a good C++ > >> > designer like Mark, intimately familiar with the numpy code base, > could > >> > do. This opportunity might not come by again anytime soon and I think > we > >> > should grab onto it. The initial step would be a release whose code > that > >> > would compile in both C/C++, which mostly comes down to removing C++ > >> > keywords like 'new'. > >> > > >> > I did suggest running it by you for build issues, so please raise any > >> > you can think of. Note that MatPlotLib is in C++, so I don't think the > >> > problems are insurmountable. And choosing a set of compilers to > support > >> > is something that will need to be done. > >> > >> It's true that matplotlib relies heavily on C++, both via the Agg > >> library and in its own extension code. Personally, I don't like this; I > >> think it raises the barrier to contributing. C++ is an order of > >> magnitude more complicated than C--harder to read, and much harder to > >> write, unless one is a true expert. In mpl it brings reliance on the CXX > >> library, which Mike D. has had to help maintain. And if it does > >> increase compiler specificity, that's bad. > > > > > > This gets to the recruitment issue, which is one of the most important > > problems I see numpy facing. I personally have contributed a lot of code > to > > NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ > was > > the biggest negative point when I considered whether it was worth > > contributing to the project. I suspect there are many programmers out > there > > who are skilled in low-level, high-performance C++, who would be willing > to > > contribute, but don't want to code in C. > > > > I believe NumPy should be trying to find people who want to make high > > performance, close to the metal, libraries. This is a very different > type of > > programmer than one who wants to program in Python, but is willing to > dabble > > in a lower level language to make something run faster. High performance > > library development is one of the things the C++ developer community does > > very well, and that community is where we have a good chance of finding > the > > programmers NumPy needs. > > > >> I would much rather see development in the direction of sticking with C > >> where direct low-level control and speed are needed, and using cython to > >> gain higher level language benefits where appropriate. Of course, that > >> brings in the danger of reliance on another complex tool, cython. If > >> that danger is considered excessive, then just stick with C. > > > > > > There are many small benefits C++ can offer, even if numpy chooses only > to > > use a tiny subset of the C++ language. For example, RAII can be used to > > reliably eliminate PyObject reference leaks. > > > > Consider a regression like this: > > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html > > > > Fixing this in C would require switching all the relevant usages of > > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the > > potential of easily introducing a memory leak, and is a lot of work to > do. > > In C++, this functionality could be placed inside a class, where the > > deterministic construction/destruction semantics eliminate the risk of > > memory leaks and make the code easier to read at the same time. 
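To make the RAII point above a little more concrete, here is a minimal sketch of the kind of guard class being described. It is purely illustrative and assumes C++11 plus the plain CPython C API; the names py_ref_vector and sum_as_longs are invented for the example and are not numpy code or part of any concrete proposal.

    // Illustrative sketch only: a scope guard that owns PyObject references.
    #include <Python.h>
    #include <cstddef>
    #include <vector>

    // Holds a dynamically sized set of PyObject* references and releases
    // them automatically when the guard goes out of scope, on every exit
    // path (normal return or early error return).
    class py_ref_vector {
    public:
        explicit py_ref_vector(std::size_t n) : refs_(n, nullptr) {}
        ~py_ref_vector() {
            for (PyObject *obj : refs_)
                Py_XDECREF(obj);
        }
        // Non-copyable, so a reference can never be released twice.
        py_ref_vector(const py_ref_vector &) = delete;
        py_ref_vector &operator=(const py_ref_vector &) = delete;

        PyObject *&operator[](std::size_t i) { return refs_[i]; }

    private:
        std::vector<PyObject *> refs_;
    };

    // Example use: sum the items of a sequence as C longs. There is no
    // explicit cleanup block; any early return releases whatever
    // references were already acquired.
    static PyObject *sum_as_longs(PyObject *seq)
    {
        Py_ssize_t n = PySequence_Size(seq);
        if (n < 0)
            return NULL;

        // Replaces a fixed, NPY_MAXARGS-style stack array with a
        // dynamically sized, automatically released buffer.
        py_ref_vector items(static_cast<std::size_t>(n));
        long total = 0;
        for (Py_ssize_t i = 0; i < n; ++i) {
            items[i] = PySequence_GetItem(seq, i);  // new reference, owned by the guard
            if (items[i] == NULL)
                return NULL;
            total += PyLong_AsLong(items[i]);
            if (PyErr_Occurred())
                return NULL;
        }
        return PyLong_FromLong(total);
    }

The interesting part is what is missing: there is no goto-style cleanup label and no hand-written Py_XDECREF loop on the error paths, which is exactly where reference leaks tend to creep into the equivalent C code.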
There are > > other examples like this where the C language has forced a suboptimal > design > > choice because of how hard it would be to do it better. > > > > Cheers, > > Mark > > > > In a similar vein, could incorporating C++ lead to a simpler low-level > API for numpy? I know Mark has talked before about--in the long-term, > as a dream project to scratch his own itch, and something the BDF12 > doesn't necessarily agree with--implementing the great ideas in numpy > as a layered C++ library. (Which would have the added benefit of > making numpy more of a general array library that could be exposed to > any language which can call C++ libraries.) > > I don't imagine that's on the table for anything near-term, but I > wonder if making more of the low-level stuff C++ would make it easier > for performance nuts to write their own code in C/C++ interfacing with > numpy, and then expose it to python. After playing around with ufuncs > at the C level for a little while last summer, I quickly realized any > simplifications would be greatly appreciated. > > -Chris > > > I am also in favor of moving towards a C++ oriented library. Personally, I find C++ easier to read and understand, most likely because I learned it first. I only learned C in the context of learning C++. Just a thought, with the upcoming revisions to the C++ standard, this does open up the possibility of some nice templating features that would make the library easier to use in native C++ programs. On a side note, does anybody use std::valarray? Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 17 14:15:55 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 12:15:55 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 12:09 PM, Benjamin Root wrote: > > > On Fri, Feb 17, 2012 at 1:00 PM, Christopher Jordan-Squire < > cjordan1 at uw.edu> wrote: > >> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe wrote: >> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing >> wrote: >> >> >> >> On 02/17/2012 05:39 AM, Charles R Harris wrote: >> >> > >> >> > >> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau < >> cournape at gmail.com >> >> > > wrote: >> >> > >> >> > Hi Travis, >> >> > >> >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >> >> > > wrote: >> >> > > Mark Wiebe and I have been discussing off and on (as well as >> >> > talking with Charles) a good way forward to balance two competing >> >> > desires: >> >> > > >> >> > > * addition of new features that are needed in NumPy >> >> > > * improving the code-base generally and moving towards >> a >> >> > more maintainable NumPy >> >> > > >> >> > > I know there are load voices for just focusing on the second >> of >> >> > these and avoiding the first until we have finished that. I >> >> > recognize the need to improve the code base, but I will also be >> >> > pushing for improvements to the feature-set and user experience >> in >> >> > the process. >> >> > > >> >> > > As a result, I am proposing a rough outline for releases over >> the >> >> > next year: >> >> > > >> >> > > * NumPy 1.7 to come out as soon as the serious bugs >> can be >> >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage >> >> > some of those. 
>> >> > > >> >> > > * NumPy 1.8 to come out in July which will have as many >> >> > ABI-compatible feature enhancements as we can add while improving >> >> > test coverage and code cleanup. I will post to this list more >> >> > details of what we plan to address with it later. Included for >> >> > possible inclusion are: >> >> > > * resolving the NA/missing-data issues >> >> > > * finishing group-by >> >> > > * incorporating the start of label arrays >> >> > > * incorporating a meta-object >> >> > > * a few new dtypes (variable-length string, >> >> > varialbe-length unicode and an enum type) >> >> > > * adding ufunc support for flexible dtypes and possibly >> >> > structured arrays >> >> > > * allowing generalized ufuncs to work on more kinds of >> >> > arrays besides just contiguous >> >> > > * improving the ability for NumPy to receive >> JIT-generated >> >> > function pointers for ufuncs and other calculation opportunities >> >> > > * adding "filters" to Input and Output >> >> > > * simple computed fields for dtypes >> >> > > * accepting a Data-Type specification as a class or >> JSON >> >> > file >> >> > > * work towards improving the dtype-addition mechanism >> >> > > * re-factoring of code so that it can compile with a >> C++ >> >> > compiler and be minimally dependent on Python data-structures. >> >> > >> >> > This is a pretty exciting list of features. What is the rationale >> >> > for >> >> > code being compiled as C++ ? IMO, it will be difficult to do so >> >> > without preventing useful C constructs, and without removing >> some of >> >> > the existing features (like our use of C99 complex). The subset >> that >> >> > is both C and C++ compatible is quite constraining. >> >> > >> >> > >> >> > I'm in favor of this myself, C++ would allow a lot code cleanup and >> make >> >> > it easier to provide an extensible base, I think it would be a >> natural >> >> > fit with numpy. Of course, some C++ projects become tangled messes of >> >> > inheritance, but I'd be very interested in seeing what a good C++ >> >> > designer like Mark, intimately familiar with the numpy code base, >> could >> >> > do. This opportunity might not come by again anytime soon and I >> think we >> >> > should grab onto it. The initial step would be a release whose code >> that >> >> > would compile in both C/C++, which mostly comes down to removing C++ >> >> > keywords like 'new'. >> >> > >> >> > I did suggest running it by you for build issues, so please raise any >> >> > you can think of. Note that MatPlotLib is in C++, so I don't think >> the >> >> > problems are insurmountable. And choosing a set of compilers to >> support >> >> > is something that will need to be done. >> >> >> >> It's true that matplotlib relies heavily on C++, both via the Agg >> >> library and in its own extension code. Personally, I don't like this; >> I >> >> think it raises the barrier to contributing. C++ is an order of >> >> magnitude more complicated than C--harder to read, and much harder to >> >> write, unless one is a true expert. In mpl it brings reliance on the >> CXX >> >> library, which Mike D. has had to help maintain. And if it does >> >> increase compiler specificity, that's bad. >> > >> > >> > This gets to the recruitment issue, which is one of the most important >> > problems I see numpy facing. I personally have contributed a lot of >> code to >> > NumPy *in spite of* the fact it's in C. 
NumPy being in C instead of C++ >> was >> > the biggest negative point when I considered whether it was worth >> > contributing to the project. I suspect there are many programmers out >> there >> > who are skilled in low-level, high-performance C++, who would be >> willing to >> > contribute, but don't want to code in C. >> > >> > I believe NumPy should be trying to find people who want to make high >> > performance, close to the metal, libraries. This is a very different >> type of >> > programmer than one who wants to program in Python, but is willing to >> dabble >> > in a lower level language to make something run faster. High performance >> > library development is one of the things the C++ developer community >> does >> > very well, and that community is where we have a good chance of finding >> the >> > programmers NumPy needs. >> > >> >> I would much rather see development in the direction of sticking with C >> >> where direct low-level control and speed are needed, and using cython >> to >> >> gain higher level language benefits where appropriate. Of course, that >> >> brings in the danger of reliance on another complex tool, cython. If >> >> that danger is considered excessive, then just stick with C. >> > >> > >> > There are many small benefits C++ can offer, even if numpy chooses only >> to >> > use a tiny subset of the C++ language. For example, RAII can be used to >> > reliably eliminate PyObject reference leaks. >> > >> > Consider a regression like this: >> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html >> > >> > Fixing this in C would require switching all the relevant usages of >> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the >> > potential of easily introducing a memory leak, and is a lot of work to >> do. >> > In C++, this functionality could be placed inside a class, where the >> > deterministic construction/destruction semantics eliminate the risk of >> > memory leaks and make the code easier to read at the same time. There >> are >> > other examples like this where the C language has forced a suboptimal >> design >> > choice because of how hard it would be to do it better. >> > >> > Cheers, >> > Mark >> > >> >> In a similar vein, could incorporating C++ lead to a simpler low-level >> API for numpy? I know Mark has talked before about--in the long-term, >> as a dream project to scratch his own itch, and something the BDF12 >> doesn't necessarily agree with--implementing the great ideas in numpy >> as a layered C++ library. (Which would have the added benefit of >> making numpy more of a general array library that could be exposed to >> any language which can call C++ libraries.) >> >> I don't imagine that's on the table for anything near-term, but I >> wonder if making more of the low-level stuff C++ would make it easier >> for performance nuts to write their own code in C/C++ interfacing with >> numpy, and then expose it to python. After playing around with ufuncs >> at the C level for a little while last summer, I quickly realized any >> simplifications would be greatly appreciated. >> >> -Chris >> >> >> > I am also in favor of moving towards a C++ oriented library. Personally, > I find C++ easier to read and understand, most likely because I learned it > first. I only learned C in the context of learning C++. > > Just a thought, with the upcoming revisions to the C++ standard, this does > open up the possibility of some nice templating features that would make > the library easier to use in native C++ programs. 
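As a loose illustration of the kind of templating the newer C++ standard enables, here is a tiny sketch assuming C++11. The function names and signatures are invented for the example; this is not an existing or proposed numpy interface, only a hint of what a native C++ caller could get on top of raw strided buffers.

    #include <cstddef>

    // Apply an arbitrary callable elementwise over a strided 1-D buffer.
    // Both the element type and the operation are fixed at compile time,
    // so there is no per-element switch on a runtime dtype enum.
    template <typename T, typename Func>
    void apply_strided(char *data, std::size_t n, std::ptrdiff_t stride, Func f)
    {
        for (std::size_t i = 0; i < n; ++i, data += stride) {
            T &elem = *reinterpret_cast<T *>(data);
            elem = f(elem);
        }
    }

    // Example: double every element of a contiguous double buffer,
    // passing the operation as a C++11 lambda.
    void double_in_place(double *buf, std::size_t n)
    {
        apply_strided<double>(reinterpret_cast<char *>(buf), n,
                              static_cast<std::ptrdiff_t>(sizeof(double)),
                              [](double x) { return 2.0 * x; });
    }

Whether anything like this belongs in numpy itself is a separate question; the sketch only shows the flavor of interface that templates and lambdas make cheap to provide.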
On a side note, does > anybody use std::valarray? > > My impression is that std::valarray didn't really solve the problems it was intended to solve. IIRC, the valarray author himself said as much, but I don't recall where. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Feb 17 14:31:58 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 17 Feb 2012 11:31:58 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 11:00 AM, Christopher Jordan-Squire wrote: > On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe wrote: > > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing > wrote: > >> > >> On 02/17/2012 05:39 AM, Charles R Harris wrote: > >> > > >> > > >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau >> > > wrote: > >> > > >> > Hi Travis, > >> > > >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant > >> > > wrote: > >> > > Mark Wiebe and I have been discussing off and on (as well as > >> > talking with Charles) a good way forward to balance two competing > >> > desires: > >> > > > >> > > * addition of new features that are needed in NumPy > >> > > * improving the code-base generally and moving towards a > >> > more maintainable NumPy > >> > > > >> > > I know there are load voices for just focusing on the second of > >> > these and avoiding the first until we have finished that. I > >> > recognize the need to improve the code base, but I will also be > >> > pushing for improvements to the feature-set and user experience in > >> > the process. > >> > > > >> > > As a result, I am proposing a rough outline for releases over > the > >> > next year: > >> > > > >> > > * NumPy 1.7 to come out as soon as the serious bugs can > be > >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage > >> > some of those. > >> > > > >> > > * NumPy 1.8 to come out in July which will have as many > >> > ABI-compatible feature enhancements as we can add while improving > >> > test coverage and code cleanup. I will post to this list more > >> > details of what we plan to address with it later. Included for > >> > possible inclusion are: > >> > > * resolving the NA/missing-data issues > >> > > * finishing group-by > >> > > * incorporating the start of label arrays > >> > > * incorporating a meta-object > >> > > * a few new dtypes (variable-length string, > >> > varialbe-length unicode and an enum type) > >> > > * adding ufunc support for flexible dtypes and possibly > >> > structured arrays > >> > > * allowing generalized ufuncs to work on more kinds of > >> > arrays besides just contiguous > >> > > * improving the ability for NumPy to receive > JIT-generated > >> > function pointers for ufuncs and other calculation opportunities > >> > > * adding "filters" to Input and Output > >> > > * simple computed fields for dtypes > >> > > * accepting a Data-Type specification as a class or JSON > >> > file > >> > > * work towards improving the dtype-addition mechanism > >> > > * re-factoring of code so that it can compile with a C++ > >> > compiler and be minimally dependent on Python data-structures. > >> > > >> > This is a pretty exciting list of features. What is the rationale > >> > for > >> > code being compiled as C++ ? IMO, it will be difficult to do so > >> > without preventing useful C constructs, and without removing some > of > >> > the existing features (like our use of C99 complex). 
The subset > that > >> > is both C and C++ compatible is quite constraining. > >> > > >> > > >> > I'm in favor of this myself, C++ would allow a lot code cleanup and > make > >> > it easier to provide an extensible base, I think it would be a natural > >> > fit with numpy. Of course, some C++ projects become tangled messes of > >> > inheritance, but I'd be very interested in seeing what a good C++ > >> > designer like Mark, intimately familiar with the numpy code base, > could > >> > do. This opportunity might not come by again anytime soon and I think > we > >> > should grab onto it. The initial step would be a release whose code > that > >> > would compile in both C/C++, which mostly comes down to removing C++ > >> > keywords like 'new'. > >> > > >> > I did suggest running it by you for build issues, so please raise any > >> > you can think of. Note that MatPlotLib is in C++, so I don't think the > >> > problems are insurmountable. And choosing a set of compilers to > support > >> > is something that will need to be done. > >> > >> It's true that matplotlib relies heavily on C++, both via the Agg > >> library and in its own extension code. Personally, I don't like this; I > >> think it raises the barrier to contributing. C++ is an order of > >> magnitude more complicated than C--harder to read, and much harder to > >> write, unless one is a true expert. In mpl it brings reliance on the CXX > >> library, which Mike D. has had to help maintain. And if it does > >> increase compiler specificity, that's bad. > > > > > > This gets to the recruitment issue, which is one of the most important > > problems I see numpy facing. I personally have contributed a lot of code > to > > NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ > was > > the biggest negative point when I considered whether it was worth > > contributing to the project. I suspect there are many programmers out > there > > who are skilled in low-level, high-performance C++, who would be willing > to > > contribute, but don't want to code in C. > > > > I believe NumPy should be trying to find people who want to make high > > performance, close to the metal, libraries. This is a very different > type of > > programmer than one who wants to program in Python, but is willing to > dabble > > in a lower level language to make something run faster. High performance > > library development is one of the things the C++ developer community does > > very well, and that community is where we have a good chance of finding > the > > programmers NumPy needs. > > > >> I would much rather see development in the direction of sticking with C > >> where direct low-level control and speed are needed, and using cython to > >> gain higher level language benefits where appropriate. Of course, that > >> brings in the danger of reliance on another complex tool, cython. If > >> that danger is considered excessive, then just stick with C. > > > > > > There are many small benefits C++ can offer, even if numpy chooses only > to > > use a tiny subset of the C++ language. For example, RAII can be used to > > reliably eliminate PyObject reference leaks. > > > > Consider a regression like this: > > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html > > > > Fixing this in C would require switching all the relevant usages of > > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the > > potential of easily introducing a memory leak, and is a lot of work to > do. 
> > In C++, this functionality could be placed inside a class, where the > > deterministic construction/destruction semantics eliminate the risk of > > memory leaks and make the code easier to read at the same time. There are > > other examples like this where the C language has forced a suboptimal > design > > choice because of how hard it would be to do it better. > > > > Cheers, > > Mark > > > > In a similar vein, could incorporating C++ lead to a simpler low-level > API for numpy? This could definitely happen. One way to do it is to have a stable C API which remains fixed over many releases, and a C++ library which is allowed to change significantly at each release. This is what the LLVM project does, for example. OpenCV is an example of another project which was previously just C, but now has an extensive C++ API. > I know Mark has talked before about--in the long-term, > as a dream project to scratch his own itch, and something the BDF12 > doesn't necessarily agree with--implementing the great ideas in numpy > as a layered C++ library. (Which would have the added benefit of > making numpy more of a general array library that could be exposed to > any language which can call C++ libraries.) > > I don't imagine that's on the table for anything near-term, but I > wonder if making more of the low-level stuff C++ would make it easier > for performance nuts to write their own code in C/C++ interfacing with > numpy, and then expose it to python. After playing around with ufuncs > at the C level for a little while last summer, I quickly realized any > simplifications would be greatly appreciated. > This is all possible, yes. The way this typically works is that library authors use advanced C++ techniques to get generality, performance, and usability. The library user can then write code which is very simple and written in a way which makes simple errors very difficult to make compared to using a C-like API. -Mark > -Chris > > > >> > >> Eric > >> > >> > > >> > Chuck > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 17 15:23:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 13:23:49 -0700 Subject: [Numpy-discussion] [EXTERNAL] Re: Strange PyArray_FromObject() behavior In-Reply-To: <72A51B26-D531-485E-8B95-EBB11D0F44B1@sandia.gov> References: <72A51B26-D531-485E-8B95-EBB11D0F44B1@sandia.gov> Message-ID: On Fri, Feb 17, 2012 at 10:31 AM, Bill Spotz wrote: > Chuck, > > I provided a little more context in another email. The user is using > numpy 1.6.1 with python 2.6. I asked him to try an earlier version -- > we'll see how it goes. This is code that has worked for a long time. It > still works on my laptop and on our test platforms. > > The behavior on the user's system appears to be consistent, as far as I > can tell. > > Maybe a compiler issue? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Fri Feb 17 15:38:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 17 Feb 2012 21:38:36 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 8:31 PM, Mark Wiebe wrote: > On Fri, Feb 17, 2012 at 11:00 AM, Christopher Jordan-Squire < > cjordan1 at uw.edu> wrote: > >> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe wrote: >> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing >> wrote: >> >> >> >> On 02/17/2012 05:39 AM, Charles R Harris wrote: >> >> > >> >> > >> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau < >> cournape at gmail.com >> >> > > wrote: >> >> > >> >> > Hi Travis, >> >> > >> >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >> >> > > wrote: >> >> > > Mark Wiebe and I have been discussing off and on (as well as >> >> > talking with Charles) a good way forward to balance two competing >> >> > desires: >> >> > > >> >> > > * addition of new features that are needed in NumPy >> >> > > * improving the code-base generally and moving towards >> a >> >> > more maintainable NumPy >> >> > > >> >> > > I know there are load voices for just focusing on the second >> of >> >> > these and avoiding the first until we have finished that. I >> >> > recognize the need to improve the code base, but I will also be >> >> > pushing for improvements to the feature-set and user experience >> in >> >> > the process. >> >> > > >> >> > > As a result, I am proposing a rough outline for releases over >> the >> >> > next year: >> >> > > >> >> > > * NumPy 1.7 to come out as soon as the serious bugs >> can be >> >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage >> >> > some of those. >> >> > > >> >> > > * NumPy 1.8 to come out in July which will have as many >> >> > ABI-compatible feature enhancements as we can add while improving >> >> > test coverage and code cleanup. I will post to this list more >> >> > details of what we plan to address with it later. Included for >> >> > possible inclusion are: >> >> > > * resolving the NA/missing-data issues >> >> > > * finishing group-by >> >> > > * incorporating the start of label arrays >> >> > > * incorporating a meta-object >> >> > > * a few new dtypes (variable-length string, >> >> > varialbe-length unicode and an enum type) >> >> > > * adding ufunc support for flexible dtypes and possibly >> >> > structured arrays >> >> > > * allowing generalized ufuncs to work on more kinds of >> >> > arrays besides just contiguous >> >> > > * improving the ability for NumPy to receive >> JIT-generated >> >> > function pointers for ufuncs and other calculation opportunities >> >> > > * adding "filters" to Input and Output >> >> > > * simple computed fields for dtypes >> >> > > * accepting a Data-Type specification as a class or >> JSON >> >> > file >> >> > > * work towards improving the dtype-addition mechanism >> >> > > * re-factoring of code so that it can compile with a >> C++ >> >> > compiler and be minimally dependent on Python data-structures. >> >> > >> >> > This is a pretty exciting list of features. What is the rationale >> >> > for >> >> > code being compiled as C++ ? IMO, it will be difficult to do so >> >> > without preventing useful C constructs, and without removing >> some of >> >> > the existing features (like our use of C99 complex). The subset >> that >> >> > is both C and C++ compatible is quite constraining. 
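Since the "common subset" point comes up repeatedly in this thread, here are a few generic examples of C idioms that are fine in C but have to be rewritten before the same file will go through a C++ compiler. These are illustrative only and are not taken from the numpy sources.

    #include <complex>   // C99 <complex.h> and 'double complex' have no direct equivalent here
    #include <cstddef>
    #include <cstdlib>

    struct buffer {
        double *data;
        std::size_t len;
    };

    buffer make_buffer(std::size_t n)
    {
        buffer buf;
        // In C:   buf.data = malloc(n * sizeof *buf.data);
        // C++ rejects the implicit conversion from void*, so a cast is needed.
        buf.data = static_cast<double *>(std::malloc(n * sizeof(double)));
        buf.len = n;
        return buf;
    }

    // In C:   int new = 0;   /* 'new' is a legal identifier */
    // In C++, 'new' is a keyword, so such names must be renamed.
    int new_count = 0;

    // In C99: double complex z = 1.0 + 2.0 * I;
    // A C++ build has to spell this as std::complex<double> instead,
    // which is a different type with a different API.
    std::complex<double> z(1.0, 2.0);

Each rewrite is small on its own; the cost is having to apply them consistently across a large existing C code base, which is presumably part of what makes the subset feel constraining.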
>> >> > >> >> > >> >> > I'm in favor of this myself, C++ would allow a lot code cleanup and >> make >> >> > it easier to provide an extensible base, I think it would be a >> natural >> >> > fit with numpy. Of course, some C++ projects become tangled messes of >> >> > inheritance, but I'd be very interested in seeing what a good C++ >> >> > designer like Mark, intimately familiar with the numpy code base, >> could >> >> > do. This opportunity might not come by again anytime soon and I >> think we >> >> > should grab onto it. The initial step would be a release whose code >> that >> >> > would compile in both C/C++, which mostly comes down to removing C++ >> >> > keywords like 'new'. >> >> > >> >> > I did suggest running it by you for build issues, so please raise any >> >> > you can think of. Note that MatPlotLib is in C++, so I don't think >> the >> >> > problems are insurmountable. And choosing a set of compilers to >> support >> >> > is something that will need to be done. >> >> >> >> It's true that matplotlib relies heavily on C++, both via the Agg >> >> library and in its own extension code. Personally, I don't like this; >> I >> >> think it raises the barrier to contributing. C++ is an order of >> >> magnitude more complicated than C--harder to read, and much harder to >> >> write, unless one is a true expert. In mpl it brings reliance on the >> CXX >> >> library, which Mike D. has had to help maintain. And if it does >> >> increase compiler specificity, that's bad. >> > >> > >> > This gets to the recruitment issue, which is one of the most important >> > problems I see numpy facing. I personally have contributed a lot of >> code to >> > NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ >> was >> > the biggest negative point when I considered whether it was worth >> > contributing to the project. I suspect there are many programmers out >> there >> > who are skilled in low-level, high-performance C++, who would be >> willing to >> > contribute, but don't want to code in C. >> > >> > I believe NumPy should be trying to find people who want to make high >> > performance, close to the metal, libraries. This is a very different >> type of >> > programmer than one who wants to program in Python, but is willing to >> dabble >> > in a lower level language to make something run faster. High performance >> > library development is one of the things the C++ developer community >> does >> > very well, and that community is where we have a good chance of finding >> the >> > programmers NumPy needs. >> > >> >> I would much rather see development in the direction of sticking with C >> >> where direct low-level control and speed are needed, and using cython >> to >> >> gain higher level language benefits where appropriate. Of course, that >> >> brings in the danger of reliance on another complex tool, cython. If >> >> that danger is considered excessive, then just stick with C. >> > >> > >> > There are many small benefits C++ can offer, even if numpy chooses only >> to >> > use a tiny subset of the C++ language. For example, RAII can be used to >> > reliably eliminate PyObject reference leaks. >> > >> > Consider a regression like this: >> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html >> > >> > Fixing this in C would require switching all the relevant usages of >> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the >> > potential of easily introducing a memory leak, and is a lot of work to >> do. 
>> > In C++, this functionality could be placed inside a class, where the >> > deterministic construction/destruction semantics eliminate the risk of >> > memory leaks and make the code easier to read at the same time. There >> are >> > other examples like this where the C language has forced a suboptimal >> design >> > choice because of how hard it would be to do it better. >> > >> > Cheers, >> > Mark >> > >> >> In a similar vein, could incorporating C++ lead to a simpler low-level >> API for numpy? > > > This could definitely happen. One way to do it is to have a stable C API > which remains fixed over many releases, and a C++ library which is allowed > to change significantly at each release. This is what the LLVM project > does, for example. OpenCV is an example of another project which was > previously just C, but now has an extensive C++ API. > > >> I know Mark has talked before about--in the long-term, >> as a dream project to scratch his own itch, and something the BDF12 >> doesn't necessarily agree with--implementing the great ideas in numpy >> as a layered C++ library. (Which would have the added benefit of >> making numpy more of a general array library that could be exposed to >> any language which can call C++ libraries.) >> >> I don't imagine that's on the table for anything near-term, but I >> wonder if making more of the low-level stuff C++ would make it easier >> for performance nuts to write their own code in C/C++ interfacing with >> numpy, and then expose it to python. After playing around with ufuncs >> at the C level for a little while last summer, I quickly realized any >> simplifications would be greatly appreciated. >> > > This is all possible, yes. The way this typically works is that library > authors use advanced C++ techniques to get generality, performance, and > usability. The library user can then write code which is very simple and > written in a way which makes simple errors very difficult to make compared > to using a C-like API. > While the longer compile times are going to annoy me, I don't have a strong opinion on using C++. One thing to keep in mind though is portability. Numpy is used on many platforms and with many compilers. Keeping things working on AIX or with a PathScale compiler for example will be a lot more difficult when using C++. Or will support for not-so-common platforms be reduced? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Fri Feb 17 15:46:48 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 17 Feb 2012 21:46:48 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F3E8840.8070405@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E8840.8070405@continuum.io> Message-ID: Hi Bryan, On Fri, Feb 17, 2012 at 6:02 PM, Bryan Van de Ven wrote: > On 2/17/12 10:27 AM, David Cournapeau wrote: > > Making the numpy C code buildable by a C++ compiler is harder than > > removing keywords. > Just as a data point, I took the cpp branch mark started and got numpy > built and running with multiarray compiled using C++ (OSX llvm-g++ 4.2). > That sounds promising. So far llvm-gcc has proved to be painful. Are you by any chance using scipy too? So far no one has managed to build the numpy/scipy combo with the LLVM-based compilers, so if you were willing to have a go at fixing that it would be hugely appreciated. See http://projects.scipy.org/scipy/ticket/1500 for details. 
Once that's fixed, numpy can switch to using it for releases. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Fri Feb 17 15:49:48 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 17 Feb 2012 21:49:48 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Fri, Feb 17, 2012 at 12:24 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, Feb 16, 2012 at 4:20 PM, wrote: > >> On Thu, Feb 16, 2012 at 5:56 PM, Warren Weckesser >> wrote: >> > >> > >> > On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant >> > wrote: >> >> >> >> Mark Wiebe and I have been discussing off and on (as well as talking >> with >> >> Charles) a good way forward to balance two competing desires: >> >> >> >> * addition of new features that are needed in NumPy >> >> * improving the code-base generally and moving towards a more >> >> maintainable NumPy >> >> >> >> I know there are load voices for just focusing on the second of these >> and >> >> avoiding the first until we have finished that. I recognize the need >> to >> >> improve the code base, but I will also be pushing for improvements to >> the >> >> feature-set and user experience in the process. >> >> >> >> As a result, I am proposing a rough outline for releases over the next >> >> year: >> >> >> >> * NumPy 1.7 to come out as soon as the serious bugs can be >> >> eliminated. Bryan, Francesc, Mark, and I are able to help triage some >> of >> >> those. >> >> >> >> * NumPy 1.8 to come out in July which will have as many >> >> ABI-compatible feature enhancements as we can add while improving test >> >> coverage and code cleanup. I will post to this list more details of >> what >> >> we plan to address with it later. Included for possible inclusion >> are: >> >> * resolving the NA/missing-data issues >> >> * finishing group-by >> >> * incorporating the start of label arrays >> >> * incorporating a meta-object >> >> * a few new dtypes (variable-length string, varialbe-length >> unicode >> >> and an enum type) >> >> * adding ufunc support for flexible dtypes and possibly >> structured >> >> arrays >> >> * allowing generalized ufuncs to work on more kinds of arrays >> >> besides just contiguous >> >> * improving the ability for NumPy to receive JIT-generated >> function >> >> pointers for ufuncs and other calculation opportunities >> >> * adding "filters" to Input and Output >> >> * simple computed fields for dtypes >> >> * accepting a Data-Type specification as a class or JSON file >> >> * work towards improving the dtype-addition mechanism >> >> * re-factoring of code so that it can compile with a C++ >> compiler >> >> and be minimally dependent on Python data-structures. >> >> >> >> * NumPy 2.0 to come out in January of 2013. Mark Wiebe and I >> will >> >> post to this list a document that explains some of it's proposed >> features >> >> and enhancements. I won't steal his thunder for some of the things >> he is >> >> working on. >> >> >> >> If there are code issues people would like to see addressed, it would >> be a >> >> great time to speak up and/or propose something that you would like to >> see. >> > >> > >> > >> > The above list looks great. Another request that comes up occasionally >> on >> > the mailing list is for the efficient computation of order statistics, >> the >> > simplest case being a combined min/max function. 
Longish thread starts >> > here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/ >> >> The list looks great, but for the time table I expect there will be at >> least a 1.9 and 1.10 necessary to improve what "we didn't get quite >> right in the first place", or what not many users had time to try out. >> >> > > That's my sense also. I think the long list needs to be prioritized and > broken up into smaller chunks. > +1 for an extra release (or two). Looking at the list of features, which looks great by the way, I think the last release before adding a whole bunch of new features should be the LTS. Ideally 1.8 would be mostly the refactoring and the LTS, with 1.9 containing most of the new features. If not, 1.7 should probably be the LTS. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chanley at gmail.com Fri Feb 17 15:54:08 2012 From: chanley at gmail.com (Christopher Hanley) Date: Fri, 17 Feb 2012 15:54:08 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 3:38 PM, Ralf Gommers wrote: > > > On Fri, Feb 17, 2012 at 8:31 PM, Mark Wiebe wrote: > >> On Fri, Feb 17, 2012 at 11:00 AM, Christopher Jordan-Squire < >> cjordan1 at uw.edu> wrote: >> >>> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe wrote: >>> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing >>> wrote: >>> >> >>> >> On 02/17/2012 05:39 AM, Charles R Harris wrote: >>> >> > >>> >> > >>> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau < >>> cournape at gmail.com >>> >> > > wrote: >>> >> > >>> >> > Hi Travis, >>> >> > >>> >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >>> >> > > wrote: >>> >> > > Mark Wiebe and I have been discussing off and on (as well as >>> >> > talking with Charles) a good way forward to balance two >>> competing >>> >> > desires: >>> >> > > >>> >> > > * addition of new features that are needed in NumPy >>> >> > > * improving the code-base generally and moving >>> towards a >>> >> > more maintainable NumPy >>> >> > > >>> >> > > I know there are load voices for just focusing on the second >>> of >>> >> > these and avoiding the first until we have finished that. I >>> >> > recognize the need to improve the code base, but I will also be >>> >> > pushing for improvements to the feature-set and user experience >>> in >>> >> > the process. >>> >> > > >>> >> > > As a result, I am proposing a rough outline for releases >>> over the >>> >> > next year: >>> >> > > >>> >> > > * NumPy 1.7 to come out as soon as the serious bugs >>> can be >>> >> > eliminated. Bryan, Francesc, Mark, and I are able to help >>> triage >>> >> > some of those. >>> >> > > >>> >> > > * NumPy 1.8 to come out in July which will have as >>> many >>> >> > ABI-compatible feature enhancements as we can add while >>> improving >>> >> > test coverage and code cleanup. I will post to this list more >>> >> > details of what we plan to address with it later. 
Included >>> for >>> >> > possible inclusion are: >>> >> > > * resolving the NA/missing-data issues >>> >> > > * finishing group-by >>> >> > > * incorporating the start of label arrays >>> >> > > * incorporating a meta-object >>> >> > > * a few new dtypes (variable-length string, >>> >> > varialbe-length unicode and an enum type) >>> >> > > * adding ufunc support for flexible dtypes and >>> possibly >>> >> > structured arrays >>> >> > > * allowing generalized ufuncs to work on more kinds of >>> >> > arrays besides just contiguous >>> >> > > * improving the ability for NumPy to receive >>> JIT-generated >>> >> > function pointers for ufuncs and other calculation opportunities >>> >> > > * adding "filters" to Input and Output >>> >> > > * simple computed fields for dtypes >>> >> > > * accepting a Data-Type specification as a class or >>> JSON >>> >> > file >>> >> > > * work towards improving the dtype-addition mechanism >>> >> > > * re-factoring of code so that it can compile with a >>> C++ >>> >> > compiler and be minimally dependent on Python data-structures. >>> >> > >>> >> > This is a pretty exciting list of features. What is the >>> rationale >>> >> > for >>> >> > code being compiled as C++ ? IMO, it will be difficult to do so >>> >> > without preventing useful C constructs, and without removing >>> some of >>> >> > the existing features (like our use of C99 complex). The subset >>> that >>> >> > is both C and C++ compatible is quite constraining. >>> >> > >>> >> > >>> >> > I'm in favor of this myself, C++ would allow a lot code cleanup and >>> make >>> >> > it easier to provide an extensible base, I think it would be a >>> natural >>> >> > fit with numpy. Of course, some C++ projects become tangled messes >>> of >>> >> > inheritance, but I'd be very interested in seeing what a good C++ >>> >> > designer like Mark, intimately familiar with the numpy code base, >>> could >>> >> > do. This opportunity might not come by again anytime soon and I >>> think we >>> >> > should grab onto it. The initial step would be a release whose code >>> that >>> >> > would compile in both C/C++, which mostly comes down to removing C++ >>> >> > keywords like 'new'. >>> >> > >>> >> > I did suggest running it by you for build issues, so please raise >>> any >>> >> > you can think of. Note that MatPlotLib is in C++, so I don't think >>> the >>> >> > problems are insurmountable. And choosing a set of compilers to >>> support >>> >> > is something that will need to be done. >>> >> >>> >> It's true that matplotlib relies heavily on C++, both via the Agg >>> >> library and in its own extension code. Personally, I don't like >>> this; I >>> >> think it raises the barrier to contributing. C++ is an order of >>> >> magnitude more complicated than C--harder to read, and much harder to >>> >> write, unless one is a true expert. In mpl it brings reliance on the >>> CXX >>> >> library, which Mike D. has had to help maintain. And if it does >>> >> increase compiler specificity, that's bad. >>> > >>> > >>> > This gets to the recruitment issue, which is one of the most important >>> > problems I see numpy facing. I personally have contributed a lot of >>> code to >>> > NumPy *in spite of* the fact it's in C. NumPy being in C instead of >>> C++ was >>> > the biggest negative point when I considered whether it was worth >>> > contributing to the project. 
I suspect there are many programmers out >>> there >>> > who are skilled in low-level, high-performance C++, who would be >>> willing to >>> > contribute, but don't want to code in C. >>> > >>> > I believe NumPy should be trying to find people who want to make high >>> > performance, close to the metal, libraries. This is a very different >>> type of >>> > programmer than one who wants to program in Python, but is willing to >>> dabble >>> > in a lower level language to make something run faster. High >>> performance >>> > library development is one of the things the C++ developer community >>> does >>> > very well, and that community is where we have a good chance of >>> finding the >>> > programmers NumPy needs. >>> > >>> >> I would much rather see development in the direction of sticking with >>> C >>> >> where direct low-level control and speed are needed, and using cython >>> to >>> >> gain higher level language benefits where appropriate. Of course, >>> that >>> >> brings in the danger of reliance on another complex tool, cython. If >>> >> that danger is considered excessive, then just stick with C. >>> > >>> > >>> > There are many small benefits C++ can offer, even if numpy chooses >>> only to >>> > use a tiny subset of the C++ language. For example, RAII can be used to >>> > reliably eliminate PyObject reference leaks. >>> > >>> > Consider a regression like this: >>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html >>> > >>> > Fixing this in C would require switching all the relevant usages of >>> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the >>> > potential of easily introducing a memory leak, and is a lot of work to >>> do. >>> > In C++, this functionality could be placed inside a class, where the >>> > deterministic construction/destruction semantics eliminate the risk of >>> > memory leaks and make the code easier to read at the same time. There >>> are >>> > other examples like this where the C language has forced a suboptimal >>> design >>> > choice because of how hard it would be to do it better. >>> > >>> > Cheers, >>> > Mark >>> > >>> >>> In a similar vein, could incorporating C++ lead to a simpler low-level >>> API for numpy? >> >> >> This could definitely happen. One way to do it is to have a stable C API >> which remains fixed over many releases, and a C++ library which is allowed >> to change significantly at each release. This is what the LLVM project >> does, for example. OpenCV is an example of another project which was >> previously just C, but now has an extensive C++ API. >> >> >>> I know Mark has talked before about--in the long-term, >>> as a dream project to scratch his own itch, and something the BDF12 >>> doesn't necessarily agree with--implementing the great ideas in numpy >>> as a layered C++ library. (Which would have the added benefit of >>> making numpy more of a general array library that could be exposed to >>> any language which can call C++ libraries.) >>> >>> I don't imagine that's on the table for anything near-term, but I >>> wonder if making more of the low-level stuff C++ would make it easier >>> for performance nuts to write their own code in C/C++ interfacing with >>> numpy, and then expose it to python. After playing around with ufuncs >>> at the C level for a little while last summer, I quickly realized any >>> simplifications would be greatly appreciated. >>> >> >> This is all possible, yes. 
The way this typically works is that library >> authors use advanced C++ techniques to get generality, performance, and >> usability. The library user can then write code which is very simple and >> written in a way which makes simple errors very difficult to make compared >> to using a C-like API. >> > > While the longer compile times are going to annoy me, I don't have a > strong opinion on using C++. One thing to keep in mind though is > portability. Numpy is used on many platforms and with many compilers. > Keeping things working on AIX or with a PathScale compiler for example will > be a lot more difficult when using C++. Or will support for not-so-common > platforms be reduced? > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Ralf makes a good point. During the early numpy development days I was eternally fighting with Solaris compilers. It's not really a big issue for us anymore since we have dropped Solaris support. But I'm '+1' for having easy numpy distribution being something to consider. Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Feb 17 15:56:25 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 17 Feb 2012 12:56:25 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Fri, Feb 17, 2012 at 12:49 PM, Ralf Gommers wrote: > > > On Fri, Feb 17, 2012 at 12:24 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Thu, Feb 16, 2012 at 4:20 PM, wrote: >> >>> On Thu, Feb 16, 2012 at 5:56 PM, Warren Weckesser >>> wrote: >>> > >>> > >>> > On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant >>> > wrote: >>> >> >>> >> Mark Wiebe and I have been discussing off and on (as well as talking >>> with >>> >> Charles) a good way forward to balance two competing desires: >>> >> >>> >> * addition of new features that are needed in NumPy >>> >> * improving the code-base generally and moving towards a more >>> >> maintainable NumPy >>> >> >>> >> I know there are load voices for just focusing on the second of these >>> and >>> >> avoiding the first until we have finished that. I recognize the need >>> to >>> >> improve the code base, but I will also be pushing for improvements to >>> the >>> >> feature-set and user experience in the process. >>> >> >>> >> As a result, I am proposing a rough outline for releases over the next >>> >> year: >>> >> >>> >> * NumPy 1.7 to come out as soon as the serious bugs can be >>> >> eliminated. Bryan, Francesc, Mark, and I are able to help triage >>> some of >>> >> those. >>> >> >>> >> * NumPy 1.8 to come out in July which will have as many >>> >> ABI-compatible feature enhancements as we can add while improving test >>> >> coverage and code cleanup. I will post to this list more details of >>> what >>> >> we plan to address with it later. 
Included for possible inclusion >>> are: >>> >> * resolving the NA/missing-data issues >>> >> * finishing group-by >>> >> * incorporating the start of label arrays >>> >> * incorporating a meta-object >>> >> * a few new dtypes (variable-length string, varialbe-length >>> unicode >>> >> and an enum type) >>> >> * adding ufunc support for flexible dtypes and possibly >>> structured >>> >> arrays >>> >> * allowing generalized ufuncs to work on more kinds of arrays >>> >> besides just contiguous >>> >> * improving the ability for NumPy to receive JIT-generated >>> function >>> >> pointers for ufuncs and other calculation opportunities >>> >> * adding "filters" to Input and Output >>> >> * simple computed fields for dtypes >>> >> * accepting a Data-Type specification as a class or JSON file >>> >> * work towards improving the dtype-addition mechanism >>> >> * re-factoring of code so that it can compile with a C++ >>> compiler >>> >> and be minimally dependent on Python data-structures. >>> >> >>> >> * NumPy 2.0 to come out in January of 2013. Mark Wiebe and I >>> will >>> >> post to this list a document that explains some of it's proposed >>> features >>> >> and enhancements. I won't steal his thunder for some of the things >>> he is >>> >> working on. >>> >> >>> >> If there are code issues people would like to see addressed, it would >>> be a >>> >> great time to speak up and/or propose something that you would like >>> to see. >>> > >>> > >>> > >>> > The above list looks great. Another request that comes up >>> occasionally on >>> > the mailing list is for the efficient computation of order statistics, >>> the >>> > simplest case being a combined min/max function. Longish thread starts >>> > here: http://thread.gmane.org/gmane.comp.python.numeric.general/44130/ >>> >>> The list looks great, but for the time table I expect there will be at >>> least a 1.9 and 1.10 necessary to improve what "we didn't get quite >>> right in the first place", or what not many users had time to try out. >>> >>> >> >> That's my sense also. I think the long list needs to be prioritized and >> broken up into smaller chunks. >> > > +1 for an extra release (or two). > > Looking at the list of features, which looks great by the way, I think the > last release before adding a whole bunch of new features should be the LTS. > Ideally 1.8 would be mostly the refactoring and the LTS, with 1.9 > containing most of the new features. If not, 1.7 should probably be the LTS. > To be clear, the purpose behind an LTS release is to provide ongoing bugfixes for users to whom one of the following applies: * Must use Python 2.4. * Are on a platform whose C/C++ compiler will never be updated anymore This way, developing NumPy can be made easier by not having to keep compatibility with really old systems. Am I understanding this correctly, or am I missing some aspect of the LTS strategy? Thanks, Mark > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Fri Feb 17 16:02:38 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 17 Feb 2012 22:02:38 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Fri, Feb 17, 2012 at 9:56 PM, Mark Wiebe wrote: > On Fri, Feb 17, 2012 at 12:49 PM, Ralf Gommers < > ralf.gommers at googlemail.com> wrote: > >> >> >> On Fri, Feb 17, 2012 at 12:24 AM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Thu, Feb 16, 2012 at 4:20 PM, wrote: >>> >>>> On Thu, Feb 16, 2012 at 5:56 PM, Warren Weckesser >>>> wrote: >>>> > >>>> > >>>> > On Thu, Feb 16, 2012 at 4:39 PM, Travis Oliphant >>> > >>>> > wrote: >>>> >> >>>> >> Mark Wiebe and I have been discussing off and on (as well as talking >>>> with >>>> >> Charles) a good way forward to balance two competing desires: >>>> >> >>>> >> * addition of new features that are needed in NumPy >>>> >> * improving the code-base generally and moving towards a more >>>> >> maintainable NumPy >>>> >> >>>> >> I know there are load voices for just focusing on the second of >>>> these and >>>> >> avoiding the first until we have finished that. I recognize the >>>> need to >>>> >> improve the code base, but I will also be pushing for improvements >>>> to the >>>> >> feature-set and user experience in the process. >>>> >> >>>> >> As a result, I am proposing a rough outline for releases over the >>>> next >>>> >> year: >>>> >> >>>> >> * NumPy 1.7 to come out as soon as the serious bugs can be >>>> >> eliminated. Bryan, Francesc, Mark, and I are able to help triage >>>> some of >>>> >> those. >>>> >> >>>> >> * NumPy 1.8 to come out in July which will have as many >>>> >> ABI-compatible feature enhancements as we can add while improving >>>> test >>>> >> coverage and code cleanup. I will post to this list more details >>>> of what >>>> >> we plan to address with it later. Included for possible inclusion >>>> are: >>>> >> * resolving the NA/missing-data issues >>>> >> * finishing group-by >>>> >> * incorporating the start of label arrays >>>> >> * incorporating a meta-object >>>> >> * a few new dtypes (variable-length string, varialbe-length >>>> unicode >>>> >> and an enum type) >>>> >> * adding ufunc support for flexible dtypes and possibly >>>> structured >>>> >> arrays >>>> >> * allowing generalized ufuncs to work on more kinds of arrays >>>> >> besides just contiguous >>>> >> * improving the ability for NumPy to receive JIT-generated >>>> function >>>> >> pointers for ufuncs and other calculation opportunities >>>> >> * adding "filters" to Input and Output >>>> >> * simple computed fields for dtypes >>>> >> * accepting a Data-Type specification as a class or JSON file >>>> >> * work towards improving the dtype-addition mechanism >>>> >> * re-factoring of code so that it can compile with a C++ >>>> compiler >>>> >> and be minimally dependent on Python data-structures. >>>> >> >>>> >> * NumPy 2.0 to come out in January of 2013. Mark Wiebe and >>>> I will >>>> >> post to this list a document that explains some of it's proposed >>>> features >>>> >> and enhancements. I won't steal his thunder for some of the >>>> things he is >>>> >> working on. >>>> >> >>>> >> If there are code issues people would like to see addressed, it >>>> would be a >>>> >> great time to speak up and/or propose something that you would like >>>> to see. >>>> > >>>> > >>>> > >>>> > The above list looks great. 
Another request that comes up >>>> occasionally on >>>> > the mailing list is for the efficient computation of order >>>> statistics, the >>>> > simplest case being a combined min/max function. Longish thread >>>> starts >>>> > here: >>>> http://thread.gmane.org/gmane.comp.python.numeric.general/44130/ >>>> >>>> The list looks great, but for the time table I expect there will be at >>>> least a 1.9 and 1.10 necessary to improve what "we didn't get quite >>>> right in the first place", or what not many users had time to try out. >>>> >>>> >>> >>> That's my sense also. I think the long list needs to be prioritized and >>> broken up into smaller chunks. >>> >> >> +1 for an extra release (or two). >> >> Looking at the list of features, which looks great by the way, I think >> the last release before adding a whole bunch of new features should be the >> LTS. Ideally 1.8 would be mostly the refactoring and the LTS, with 1.9 >> containing most of the new features. If not, 1.7 should probably be the LTS. >> > > To be clear, the purpose behind an LTS release is to provide ongoing > bugfixes for users to whom one of the following applies: > > * Must use Python 2.4. > * Are on a platform whose C/C++ compiler will never be updated anymore > > Those both apply. > This way, developing NumPy can be made easier by not having to keep > compatibility with really old systems. Am I understanding this correctly, > or am I missing some aspect of the LTS strategy? > > The main reason is to allow starting to clean up the code, as Chuck said in his initial message: http://comments.gmane.org/gmane.comp.python.numeric.general/47765. So this would include old macros, maybe things like numarray support. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Fri Feb 17 17:51:01 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 17 Feb 2012 14:51:01 -0800 Subject: [Numpy-discussion] Buildbot/continuous integration (was Re: Issue Tracking) In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 3:52 PM, Chris Ball wrote: > After getting to this initial stage, I'll discuss about adding more > features (such as testing pull requests, performance testing, building > binaries on the different operating systems, etc). Also, if it's working > well, this Buildbot setup could replace/be merged with the one at > buildbot.scipy.org (I don't know who is currently running that). That machine is up and running at Stellenbosch University; the main problem is that it is behind a firewall, and we've had some issues working around that. I can gladly share the current configuration, and the instructions I send out to the maintainers of "slave" machines (I agree--putting it in a Git repo--minus passwords!--is a good idea). At the moment, there are three active build slaves: one Windows XP and two different setups of Linux Sparc. To trigger a build, click on the machine name and then "Force Build". St?fan From stefan at sun.ac.za Fri Feb 17 17:57:53 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 17 Feb 2012 14:57:53 -0800 Subject: [Numpy-discussion] test errors on deprecation/runtime warnings In-Reply-To: References: Message-ID: Hi Ralf On Thu, Feb 16, 2012 at 11:05 AM, Ralf Gommers wrote: > Last week we merged https://github.com/numpy/numpy/pull/201, which causes > DeprecationWarning's and RuntimeWarning's to be converted to errors if they > occur when running the test suite. 
It looks like this change affects other packages, too, which may legitimately raise RuntimeWarnings while running their test suites (unless I read the patch wrong). Would it be an option to rather add a flag (False by default) to enable this behaviour, and enable it inside of numpy.test() ? Regards St?fan From cournape at gmail.com Fri Feb 17 18:44:34 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Feb 2012 17:44:34 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in a open source context, where the level of people is far from consistent. I doubt many people did not contribute to numoy because it is in c instead if c++. While this is somehow subjective, there are reasons that c is much more common than c++ in that context. I would much rather move most part to cython to solve subtle ref counting issues, typically. The only way that i know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are template or exceptions handled across languages ? it will also be a significant issue on windows with open source compilers. Interestingly, the api from clang exported to other languages is in c... David Le 17 f?vr. 2012 18:21, "Mark Wiebe" a ?crit : > > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing wrote: >> >> On 02/17/2012 05:39 AM, Charles R Harris wrote: >> > >> > >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau > > > wrote: >> > >> > Hi Travis, >> > >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >> > > wrote: >> > > Mark Wiebe and I have been discussing off and on (as well as >> > talking with Charles) a good way forward to balance two competing >> > desires: >> > > >> > > * addition of new features that are needed in NumPy >> > > * improving the code-base generally and moving towards a >> > more maintainable NumPy >> > > >> > > I know there are load voices for just focusing on the second of >> > these and avoiding the first until we have finished that. I >> > recognize the need to improve the code base, but I will also be >> > pushing for improvements to the feature-set and user experience in >> > the process. >> > > >> > > As a result, I am proposing a rough outline for releases over the >> > next year: >> > > >> > > * NumPy 1.7 to come out as soon as the serious bugs can be >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage >> > some of those. >> > > >> > > * NumPy 1.8 to come out in July which will have as many >> > ABI-compatible feature enhancements as we can add while improving >> > test coverage and code cleanup. I will post to this list more >> > details of what we plan to address with it later. 
Included for >> > possible inclusion are: >> > > * resolving the NA/missing-data issues >> > > * finishing group-by >> > > * incorporating the start of label arrays >> > > * incorporating a meta-object >> > > * a few new dtypes (variable-length string, >> > varialbe-length unicode and an enum type) >> > > * adding ufunc support for flexible dtypes and possibly >> > structured arrays >> > > * allowing generalized ufuncs to work on more kinds of >> > arrays besides just contiguous >> > > * improving the ability for NumPy to receive JIT-generated >> > function pointers for ufuncs and other calculation opportunities >> > > * adding "filters" to Input and Output >> > > * simple computed fields for dtypes >> > > * accepting a Data-Type specification as a class or JSON file >> > > * work towards improving the dtype-addition mechanism >> > > * re-factoring of code so that it can compile with a C++ >> > compiler and be minimally dependent on Python data-structures. >> > >> > This is a pretty exciting list of features. What is the rationale for >> > code being compiled as C++ ? IMO, it will be difficult to do so >> > without preventing useful C constructs, and without removing some of >> > the existing features (like our use of C99 complex). The subset that >> > is both C and C++ compatible is quite constraining. >> > >> > >> > I'm in favor of this myself, C++ would allow a lot code cleanup and make >> > it easier to provide an extensible base, I think it would be a natural >> > fit with numpy. Of course, some C++ projects become tangled messes of >> > inheritance, but I'd be very interested in seeing what a good C++ >> > designer like Mark, intimately familiar with the numpy code base, could >> > do. This opportunity might not come by again anytime soon and I think we >> > should grab onto it. The initial step would be a release whose code that >> > would compile in both C/C++, which mostly comes down to removing C++ >> > keywords like 'new'. >> > >> > I did suggest running it by you for build issues, so please raise any >> > you can think of. Note that MatPlotLib is in C++, so I don't think the >> > problems are insurmountable. And choosing a set of compilers to support >> > is something that will need to be done. >> >> It's true that matplotlib relies heavily on C++, both via the Agg >> library and in its own extension code. Personally, I don't like this; I >> think it raises the barrier to contributing. C++ is an order of >> magnitude more complicated than C--harder to read, and much harder to >> write, unless one is a true expert. In mpl it brings reliance on the CXX >> library, which Mike D. has had to help maintain. And if it does >> increase compiler specificity, that's bad. > > > This gets to the recruitment issue, which is one of the most important problems I see numpy facing. I personally have contributed a lot of code to NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was the biggest negative point when I considered whether it was worth contributing to the project. I suspect there are many programmers out there who are skilled in low-level, high-performance C++, who would be willing to contribute, but don't want to code in C. > > I believe NumPy should be trying to find people who want to make high performance, close to the metal, libraries. This is a very different type of programmer than one who wants to program in Python, but is willing to dabble in a lower level language to make something run faster. 
High performance library development is one of the things the C++ developer community does very well, and that community is where we have a good chance of finding the programmers NumPy needs. > >> I would much rather see development in the direction of sticking with C >> where direct low-level control and speed are needed, and using cython to >> gain higher level language benefits where appropriate. Of course, that >> brings in the danger of reliance on another complex tool, cython. If >> that danger is considered excessive, then just stick with C. > > > There are many small benefits C++ can offer, even if numpy chooses only to use a tiny subset of the C++ language. For example, RAII can be used to reliably eliminate PyObject reference leaks. > > Consider a regression like this: > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html > > Fixing this in C would require switching all the relevant usages of NPY_MAXARGS to use a dynamic memory allocation. This brings with it the potential of easily introducing a memory leak, and is a lot of work to do. In C++, this functionality could be placed inside a class, where the deterministic construction/destruction semantics eliminate the risk of memory leaks and make the code easier to read at the same time. There are other examples like this where the C language has forced a suboptimal design choice because of how hard it would be to do it better. > > Cheers, > Mark > >> >> Eric >> >> > >> > Chuck >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Feb 17 18:55:38 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Feb 2012 17:55:38 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: Le 17 f?vr. 2012 17:58, "Mark Wiebe" a ?crit : > > On Fri, Feb 17, 2012 at 10:27 AM, David Cournapeau wrote: >> >> On Fri, Feb 17, 2012 at 3:39 PM, Charles R Harris >> wrote: >> > >> > >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau >> > wrote: >> >> >> >> Hi Travis, >> >> >> >> On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >> >> wrote: >> >> > Mark Wiebe and I have been discussing off and on (as well as talking >> >> > with Charles) a good way forward to balance two competing desires: >> >> > >> >> > * addition of new features that are needed in NumPy >> >> > * improving the code-base generally and moving towards a more >> >> > maintainable NumPy >> >> > >> >> > I know there are load voices for just focusing on the second of these >> >> > and avoiding the first until we have finished that. I recognize the need to >> >> > improve the code base, but I will also be pushing for improvements to the >> >> > feature-set and user experience in the process. >> >> > >> >> > As a result, I am proposing a rough outline for releases over the next >> >> > year: >> >> > >> >> > * NumPy 1.7 to come out as soon as the serious bugs can be >> >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage some of >> >> > those. 
>> >> > >> >> > * NumPy 1.8 to come out in July which will have as many >> >> > ABI-compatible feature enhancements as we can add while improving test >> >> > coverage and code cleanup. I will post to this list more details of what >> >> > we plan to address with it later. Included for possible inclusion are: >> >> > * resolving the NA/missing-data issues >> >> > * finishing group-by >> >> > * incorporating the start of label arrays >> >> > * incorporating a meta-object >> >> > * a few new dtypes (variable-length string, varialbe-length >> >> > unicode and an enum type) >> >> > * adding ufunc support for flexible dtypes and possibly >> >> > structured arrays >> >> > * allowing generalized ufuncs to work on more kinds of arrays >> >> > besides just contiguous >> >> > * improving the ability for NumPy to receive JIT-generated >> >> > function pointers for ufuncs and other calculation opportunities >> >> > * adding "filters" to Input and Output >> >> > * simple computed fields for dtypes >> >> > * accepting a Data-Type specification as a class or JSON file >> >> > * work towards improving the dtype-addition mechanism >> >> > * re-factoring of code so that it can compile with a C++ compiler >> >> > and be minimally dependent on Python data-structures. >> >> >> >> This is a pretty exciting list of features. What is the rationale for >> >> code being compiled as C++ ? IMO, it will be difficult to do so >> >> without preventing useful C constructs, and without removing some of >> >> the existing features (like our use of C99 complex). The subset that >> >> is both C and C++ compatible is quite constraining. >> >> >> > >> > I'm in favor of this myself, C++ would allow a lot code cleanup and make it >> > easier to provide an extensible base, I think it would be a natural fit with >> > numpy. Of course, some C++ projects become tangled messes of inheritance, >> > but I'd be very interested in seeing what a good C++ designer like Mark, >> > intimately familiar with the numpy code base, could do. This opportunity >> > might not come by again anytime soon and I think we should grab onto it. The >> > initial step would be a release whose code that would compile in both C/C++, >> > which mostly comes down to removing C++ keywords like 'new'. >> >> C++ will make integration with external environments much harder >> (calling a C++ library from a non C++ program is very hard, especially >> for cross-platform projects), and I am not convinced by the more >> extensible argument. > > > The whole of NumPy could be written utilizing C++ extensively while still using exactly the same API and ABI numpy has now. C++ does not force anything about API/ABI design decisions. > > One good document to read about how a major open source project transitioned from C to C++ is about gcc. Their points comparing C and C++ apply to numpy quite well, and being compiler authors, they're intimately familiar with ABI and performance issues: > > http://gcc.gnu.org/wiki/gcc-in-cxx#The_gcc-in-cxx_branch > >> Making the numpy C code buildable by a C++ compiler is harder than >> removing keywords. > > > Certainly, but it's not a difficult task for someone who's familiar with both C and C++. > >> >> > I did suggest running it by you for build issues, so please raise any you >> > can think of. Note that MatPlotLib is in C++, so I don't think the problems >> > are insurmountable. And choosing a set of compilers to support is something >> > that will need to be done. 
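To make concrete what "making the C code buildable by a C++ compiler" involves, here is a small hypothetical fragment -- not actual NumPy code -- already rewritten so that both a C and a C++ compiler accept it. The two classic offenders are an identifier that happens to be a C++ keyword and the implicit conversion from void* that C allows but C++ rejects:

/* Hypothetical fragment, not taken from NumPy. The original C version
   used "int new = 3;" and "struct item *p = malloc(sizeof *p);", both of
   which a C++ compiler rejects ('new' is a keyword in C++, and C++ does
   not allow the implicit void* conversion). The form below compiles as
   both C and C++. */

#include <stdlib.h>

struct item { int value; };

int grow(void)
{
    int count = 3;                                      /* renamed from 'new' */
    struct item *p = (struct item *)malloc(sizeof *p);  /* explicit cast */
    if (p == NULL)
        return -1;
    p->value = count;
    free(p);
    return 0;
}

Whether the whole code base can be moved over this mechanically, or whether C99-only features such as complex types get in the way, is exactly the point being debated above.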
>> >> I don't know for matplotlib, but for scipy, quite a few issues were >> caused by our C++ extensions in scipy.sparse. But build issues are a >> not a strong argument against C++ - I am sure those could be worked >> out. > > > On this topic, I'd like to ask what it would take to change the default warning levels in all the build configurations? Building with no warnings under high warning levels is a pretty standard practice as a basic mechanisms for catching some classes of bugs, and it would be nice for numpy to do this. The only way this is reasonable, though, is if it's the default in the build system. Doing it for say just gcc is not that complicated. Generally, easy custimization of cimpilation flags is one of the stated goal of bento :) david > > Thanks, > Mark > >> >> regards, >> >> David >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 17 19:58:42 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 17:58:42 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau wrote: > I don't think c++ has any significant advantage over c for high > performance libraries. I am not convinced by the number of people argument > either: it is not my experience that c++ is easier to maintain in a open > source context, where the level of people is far from consistent. I doubt > many people did not contribute to numoy because it is in c instead if c++. > While this is somehow subjective, there are reasons that c is much more > common than c++ in that context. > I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor. > I would much rather move most part to cython to solve subtle ref counting > issues, typically. > Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) Cython good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things. > The only way that i know of to have a stable and usable abi is to wrap the > c++ code in c. Wrapping c++ libraries in python has always been a pain in > my experience. How are template or exceptions handled across languages ? it > will also be a significant issue on windows with open source compilers. > > Interestingly, the api from clang exported to other languages is in c... > The api isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Fri Feb 17 20:54:01 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 17 Feb 2012 17:54:01 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Hi, On Fri, Feb 17, 2012 at 4:58 PM, Charles R Harris wrote: > > > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau > wrote: >> >> I don't think c++ has any significant advantage over c for high >> performance libraries. I am not convinced by the number of people argument >> either: it is not my experience that c++ is easier to maintain in a open >> source context, where the level of people is far from consistent. I doubt >> many people did not contribute to numoy because it is in c instead if c++. >> While this is somehow subjective, there are reasons that c is much more >> common than c++ in that context. > > > I think C++ offers much better tools than C for the sort of things in Numpy. > The compiler will take care of lots of things that now have to be hand > crafted and I wouldn't be surprised to see the code size shrink by a > significant factor. >> >> I would much rather move most part to cython to solve subtle ref counting >> issues, typically. > > > Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) > Cython good for the Python interface, but once past that barrier C is > easier, and C++ has lots of useful things. Maybe a straw poll of the number of recent contributors to numpy who know: C C++ Cython would help resolve this. I suspect using C++ would reduce the number of people who feel able to contribute, compared to: Simplifying the C code Rewriting in Cython Unless there is some reason to think that neither of these approaches would work in the particular case of numpy? Best, Matthew From charlesr.harris at gmail.com Fri Feb 17 21:04:01 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 19:04:01 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 6:54 PM, Matthew Brett wrote: > Hi, > > On Fri, Feb 17, 2012 at 4:58 PM, Charles R Harris > wrote: > > > > > > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau > > wrote: > >> > >> I don't think c++ has any significant advantage over c for high > >> performance libraries. I am not convinced by the number of people > argument > >> either: it is not my experience that c++ is easier to maintain in a open > >> source context, where the level of people is far from consistent. I > doubt > >> many people did not contribute to numoy because it is in c instead if > c++. > >> While this is somehow subjective, there are reasons that c is much more > >> common than c++ in that context. > > > > > > I think C++ offers much better tools than C for the sort of things in > Numpy. > > The compiler will take care of lots of things that now have to be hand > > crafted and I wouldn't be surprised to see the code size shrink by a > > significant factor. > >> > >> I would much rather move most part to cython to solve subtle ref > counting > >> issues, typically. > > > > > > Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) > > Cython good for the Python interface, but once past that barrier C is > > easier, and C++ has lots of useful things. 
> > Maybe a straw poll of the number of recent contributors to numpy who know: > > C > C++ > Cython > > would help resolve this. > > I suspect using C++ would reduce the number of people who feel able to > contribute, compared to: > > Simplifying the C code > Rewriting in Cython > > Unless there is some reason to think that neither of these approaches > would work in the particular case of numpy? > > How about a different variation. How many people writing Python would > happily give up the following: > > 1) lists > 2) dictionaries > 3) default types > 4) classes > 5) automatic deallocation of memory You gain some things and lose a lot of potential developers. Cython of course does give you access to classes, and to much of the automatic deallocation. Lists and dictionaries are fast when used from Cython, just as they are in Python. @Dag, @David, @anyone - have you ever had time to look and see what could be done with Cython in the numpy core?
See you, Matthew From cournape at gmail.com Fri Feb 17 21:29:35 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Feb 2012 20:29:35 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Le 18 f?vr. 2012 00:58, "Charles R Harris" a ?crit : > > > > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau wrote: >> >> I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in a open source context, where the level of people is far from consistent. I doubt many people did not contribute to numoy because it is in c instead if c++. While this is somehow subjective, there are reasons that c is much more common than c++ in that context. > > > I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor. There are two arguments here: that c code in numpy could be improved, and that c++ is the best way to do it. Nobody so far has argued against the first argument. i think there is a lot of space to improve things while still be in C. You say that the compiler would take care of a lot of things: so far, the main thing that has been mentionned is raii. While it is certainly a useful concept, I find it ewtremely difficult to use correctly in real applications. Things that are simple to do on simple examples become really hard to deal with when features start to interact with each other (which is always in c++). Writing robust code that is exception safe with the stl requires a lot of knowledge. I don't have this knowledge. I have .o doubt Mark has this knowledge. Does anyone else on this list has ? >> >> I would much rather move most part to cython to solve subtle ref counting issues, typically. > > > Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) Cython good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things. >> >> The only way that i know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are template or exceptions handled across languages ? it will also be a significant issue on windows with open source compilers. >> >> Interestingly, the api from clang exported to other languages is in c... > > > The api isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. I understand that api and inplementation language are not the same: you just quoted the part where I was mentioning it :) Assuming a c++ inplementation with a c api, how will you deal with templates ? how will you deal with exception ? How will you deal with exception crossing dll/so between different compilers, which is a very common situation in our community ? david > > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
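For readers trying to picture the two idioms being argued about in the message above -- RAII for PyObject reference counts, and a C-callable entry point that stops C++ exceptions at the language boundary -- a minimal sketch follows. The class PyRef and the function make_pair_sketch are made-up names for illustration only; they are not NumPy or CPython API, although Py_XDECREF, PyLong_FromLong, PyTuple_Pack and PyErr_SetString are the ordinary CPython calls:

#include <Python.h>

/* RAII guard: owns one reference and releases it on every exit path. */
class PyRef {
public:
    explicit PyRef(PyObject* obj) : obj_(obj) {}
    ~PyRef() { Py_XDECREF(obj_); }
    PyObject* get() const { return obj_; }
private:
    PyRef(const PyRef&);              /* non-copyable (C++03 style) */
    PyRef& operator=(const PyRef&);
    PyObject* obj_;
};

/* C-callable entry point: no C++ exception is allowed to cross the ABI;
   errors are reported the usual CPython way, by returning NULL. */
extern "C" PyObject* make_pair_sketch(void)
{
    try {
        PyRef first(PyLong_FromLong(1));
        PyRef second(PyLong_FromLong(2));
        if (first.get() == NULL || second.get() == NULL)
            return NULL;              /* guards release whatever was created */
        /* PyTuple_Pack takes its own references, so the guards can safely
           drop ours when the scope ends. */
        return PyTuple_Pack(2, first.get(), second.get());
    }
    catch (...) {
        PyErr_SetString(PyExc_RuntimeError, "unexpected C++ exception");
        return NULL;
    }
}

This is the style Mark alludes to for eliminating reference leaks; David's objection above is that writing such exception-safe code consistently, across many contributors, is itself a discipline that is hard to enforce.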
URL: From josef.pktd at gmail.com Fri Feb 17 21:40:27 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Feb 2012 21:40:27 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 9:29 PM, David Cournapeau wrote: > > Le 18 f?vr. 2012 00:58, "Charles R Harris" a > ?crit?: > > >> >> >> >> On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau >> wrote: >>> >>> I don't think c++ has any significant advantage over c for high >>> performance libraries. I am not convinced by the number of people argument >>> either: it is not my experience that c++ is easier to maintain in a open >>> source context, where the level of people is far from consistent. I doubt >>> many people did not contribute to numoy because it is in c instead if c++. >>> While this is somehow subjective, there are reasons that c is much more >>> common than c++ in that context. >> >> >> I think C++ offers much better tools than C for the sort of things in >> Numpy. The compiler will take care of lots of things that now have to be >> hand crafted and I wouldn't be surprised to see the code size shrink by a >> significant factor. > > There are two arguments here: that c code in numpy could be improved, and > that c++ is the best way to do it. Nobody so far has argued against the > first argument. i think there is a lot of space to improve things while > still be in C. > > You say that the compiler would take care of a lot of things: so far, the > main thing that has been mentionned is raii. While it is certainly a useful > concept, I find it ewtremely difficult to use correctly in real > applications. Things that are simple to do on simple examples become really > hard to deal with when features start to interact with each other (which is > always in c++). Writing robust code that is exception safe with the stl > requires a lot of knowledge. I don't have this knowledge. I have .o doubt > Mark has this knowledge. Does anyone else on this list has ? > >>> >>> I would much rather move most part to cython to solve subtle ref counting >>> issues, typically. >> >> >> Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) >> Cython good for the Python interface, but once past that barrier C is >> easier, and C++ has lots of useful things. What happened with the IronPython implementation of numpy that was translating into cython, as far as I understood. (just as curious bystander, I have no idea about any of the low level stuff.) Josef >>> >>> The only way that i know of to have a stable and usable abi is to wrap >>> the c++ code in c. Wrapping c++ libraries in python? has always been a pain >>> in my experience. How are template or exceptions handled across languages ? >>> it will also be a significant issue on windows with open source compilers. >>> >>> Interestingly, the api from clang exported to other languages is in c... >> >> >> The api isn't the same as the implementation language. I wouldn't prejudge >> these issues, but some indication of how they would be solved might be >> helpful. > > I understand that api and inplementation language are not the same: you just > quoted the part where I was mentioning it :) > > Assuming a c++ inplementation with a c api, how will you deal with templates > ? how will you deal with exception ? 
How will you deal with exception > crossing dll/so between different compilers, which is a very common > situation in our community ? > > david > >> >> > > >> >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cournape at gmail.com Fri Feb 17 21:59:10 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Feb 2012 20:59:10 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Le 17 f?vr. 2012 18:21, "Mark Wiebe" a ?crit : > > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing wrote: >> >> On 02/17/2012 05:39 AM, Charles R Harris wrote: >> > >> > >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau > > > wrote: >> > >> > Hi Travis, >> > >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >> > > wrote: >> > > Mark Wiebe and I have been discussing off and on (as well as >> > talking with Charles) a good way forward to balance two competing >> > desires: >> > > >> > > * addition of new features that are needed in NumPy >> > > * improving the code-base generally and moving towards a >> > more maintainable NumPy >> > > >> > > I know there are load voices for just focusing on the second of >> > these and avoiding the first until we have finished that. I >> > recognize the need to improve the code base, but I will also be >> > pushing for improvements to the feature-set and user experience in >> > the process. >> > > >> > > As a result, I am proposing a rough outline for releases over the >> > next year: >> > > >> > > * NumPy 1.7 to come out as soon as the serious bugs can be >> > eliminated. Bryan, Francesc, Mark, and I are able to help triage >> > some of those. >> > > >> > > * NumPy 1.8 to come out in July which will have as many >> > ABI-compatible feature enhancements as we can add while improving >> > test coverage and code cleanup. I will post to this list more >> > details of what we plan to address with it later. Included for >> > possible inclusion are: >> > > * resolving the NA/missing-data issues >> > > * finishing group-by >> > > * incorporating the start of label arrays >> > > * incorporating a meta-object >> > > * a few new dtypes (variable-length string, >> > varialbe-length unicode and an enum type) >> > > * adding ufunc support for flexible dtypes and possibly >> > structured arrays >> > > * allowing generalized ufuncs to work on more kinds of >> > arrays besides just contiguous >> > > * improving the ability for NumPy to receive JIT-generated >> > function pointers for ufuncs and other calculation opportunities >> > > * adding "filters" to Input and Output >> > > * simple computed fields for dtypes >> > > * accepting a Data-Type specification as a class or JSON file >> > > * work towards improving the dtype-addition mechanism >> > > * re-factoring of code so that it can compile with a C++ >> > compiler and be minimally dependent on Python data-structures. >> > >> > This is a pretty exciting list of features. What is the rationale for >> > code being compiled as C++ ? 
IMO, it will be difficult to do so >> > without preventing useful C constructs, and without removing some of >> > the existing features (like our use of C99 complex). The subset that >> > is both C and C++ compatible is quite constraining. >> > >> > >> > I'm in favor of this myself, C++ would allow a lot code cleanup and make >> > it easier to provide an extensible base, I think it would be a natural >> > fit with numpy. Of course, some C++ projects become tangled messes of >> > inheritance, but I'd be very interested in seeing what a good C++ >> > designer like Mark, intimately familiar with the numpy code base, could >> > do. This opportunity might not come by again anytime soon and I think we >> > should grab onto it. The initial step would be a release whose code that >> > would compile in both C/C++, which mostly comes down to removing C++ >> > keywords like 'new'. >> > >> > I did suggest running it by you for build issues, so please raise any >> > you can think of. Note that MatPlotLib is in C++, so I don't think the >> > problems are insurmountable. And choosing a set of compilers to support >> > is something that will need to be done. >> >> It's true that matplotlib relies heavily on C++, both via the Agg >> library and in its own extension code. Personally, I don't like this; I >> think it raises the barrier to contributing. C++ is an order of >> magnitude more complicated than C--harder to read, and much harder to >> write, unless one is a true expert. In mpl it brings reliance on the CXX >> library, which Mike D. has had to help maintain. And if it does >> increase compiler specificity, that's bad. > > > This gets to the recruitment issue, which is one of the most important problems I see numpy facing. I personally have contributed a lot of code to NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ was the biggest negative point when I considered whether it was worth contributing to the project. I suspect there are many programmers out there who are skilled in low-level, high-performance C++, who would be willing to contribute, but don't want to code in C. This is a really important issue, because accessibility is the essential reason why I am so strongly against it. It trumps by far all my technical reservations. Maybe this is just a coincidence that you use this word but "recrutment" is not what is happening in an open community, and finding people who want to make close to the metal, high performance is very different from making the codebase more accessible. I would argue that they are actually contradictory, but I would concede this is slightly more subjective claim. To be used approprietly, c++ requires much more discipline than c. Doing this for a community-based project is very hard. Doing this with people who often are scientist first and programmers second even harder. I have been contributing to numpy for quite a few years and I have seen/been told many times that numpy c code was hard to dive in, people did not know where to start, etc... I cannot remember a case where people said that C itself was the reason: other contributors can correct me if I am wrong, but I believe you are the first person who considered c/c++ to be a fundamental reason. I have no reason to believe you would not be able to produce better code in c++. But I believe you are in a minority within the people I would like to see contributing to numpy. David > > I believe NumPy should be trying to find people who want to make high performance, close to the metal, libraries. 
This is a very different type of programmer than one who wants to program in Python, but is willing to dabble in a lower level language to make something run faster. High performance library development is one of the things the C++ developer community does very well, and that community is where we have a good chance of finding the programmers NumPy needs. > >> I would much rather see development in the direction of sticking with C >> where direct low-level control and speed are needed, and using cython to >> gain higher level language benefits where appropriate. Of course, that >> brings in the danger of reliance on another complex tool, cython. If >> that danger is considered excessive, then just stick with C. > > > There are many small benefits C++ can offer, even if numpy chooses only to use a tiny subset of the C++ language. For example, RAII can be used to reliably eliminate PyObject reference leaks. > > Consider a regression like this: > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html > > Fixing this in C would require switching all the relevant usages of NPY_MAXARGS to use a dynamic memory allocation. This brings with it the potential of easily introducing a memory leak, and is a lot of work to do. In C++, this functionality could be placed inside a class, where the deterministic construction/destruction semantics eliminate the risk of memory leaks and make the code easier to read at the same time. There are other examples like this where the C language has forced a suboptimal design choice because of how hard it would be to do it better. > > Cheers, > Mark > >> >> Eric >> >> > >> > Chuck >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Feb 17 22:07:17 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 04:07:17 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Den 18. feb. 2012 kl. 01:58 skrev Charles R Harris : > > > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau wrote: > I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in a open source context, where the level of people is far from consistent. I doubt many people did not contribute to numoy because it is in c instead if c++. While this is somehow subjective, there are reasons that c is much more common than c++ in that context. > > > I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor. The C++11 standard is fantastic. There are automatic data types, closures, reference counting, weak references, an improved STL with datatypes that map almost 1:1 against any built-in Python type, a sane threading API, regex, ect. Even prng is Mersenne Twister by standard. 
With C++11 it is finally possible to "write C++ (almost) like Python". On the downside, C++ takes a long term to learn, most C++ text books teach bad programming habits from the beginning to the end, and C++ becomes inherently dangerous if you write C++ like C. Many also abuse C++ as an bloatware generator. Templates can also be abused to write code that are impossible to debug. While it in theory could be better, C is a much smaller language. Personally I prefer C++ to C, but I am not convinced it will be better for NumPy. I agree about Cython. It is nice for writing a Python interface for C, but get messy and unclean when used for anything else. It also has too much focus on adding all sorts of "new features" instead of correctness and stability. I don't trust it to generate bug-free code anymore. For wrapping C, Swig might be just as good. For C++, SIP, CXX or Boost.Pyton work well too. If cracy ideas are allowed, what about PyPy RPython? Or perhaps Go? Or even C# if a native compuler could be found? Sturla > I would much rather move most part to cython to solve subtle ref counting issues, typically. > > > Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) Cython good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things. > The only way that i know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are template or exceptions handled across languages ? it will also be a significant issue on windows with open source compilers. > > Interestingly, the api from clang exported to other languages is in c... > > > The api isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Feb 17 22:20:42 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 04:20:42 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: > . > > To be used approprietly, c++ requires much more discipline than c. Doing this for a community-based project is very hard. Doing this with people who often are scientist first and programmers second even harder. > This is very important. I am not sure it is doable. Bad C++ is far worse than bad C. We would have to invent strict coding style and idiom rules, and enforce them like a totalitarian government. That begs the question if something else than C or C++ should be used. D C# Fortran 2003 Go RPython Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Feb 17 22:25:14 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 04:25:14 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: > > How about a different variation. 
How many people writing Python would happily give up the following: > > 1) lists > 2) dictionaries > 3) default types > 4) classes > 5) automatic dellocation of memory > 1) std::vector 2) std::unordered_map 3) auto 4) class 5) std::shared_ptr Sturla From jason-sage at creativetrax.com Fri Feb 17 22:27:02 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Fri, 17 Feb 2012 21:27:02 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: <4F3F1A86.30301@creativetrax.com> On 2/17/12 9:07 PM, Sturla Molden wrote: > > Den 18. feb. 2012 kl. 01:58 skrev Charles R Harris > >: > >> >> >> On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau > > wrote: >> >> I don't think c++ has any significant advantage over c for high >> performance libraries. I am not convinced by the number of people >> argument either: it is not my experience that c++ is easier to >> maintain in a open source context, where the level of people is >> far from consistent. I doubt many people did not contribute to >> numoy because it is in c instead if c++. While this is somehow >> subjective, there are reasons that c is much more common than c++ >> in that context. >> >> >> I think C++ offers much better tools than C for the sort of things in >> Numpy. The compiler will take care of lots of things that now have to >> be hand crafted and I wouldn't be surprised to see the code size >> shrink by a significant factor. > > The C++11 standard is fantastic. There are automatic data types, > closures, reference counting, weak references, an improved STL with > datatypes that map almost 1:1 against any built-in Python type, a sane > threading API, regex, ect. Even prng is Mersenne Twister by standard. > With C++11 it is finally possible to "write C++ (almost) like Python". > On the downside, C++ takes a long term to learn, most C++ text books > teach bad programming habits from the beginning to the end, and C++ > becomes inherently dangerous if you write C++ like C. Many also abuse > C++ as an bloatware generator. Templates can also be abused to write > code that are impossible to debug. While it in theory could be better, C > is a much smaller language. Personally I prefer C++ to C, but I am not > convinced it will be better for NumPy. > > I agree about Cython. It is nice for writing a Python interface for C, > but get messy and unclean when used for anything else. It also has too > much focus on adding all sorts of "new features" instead of correctness > and stability. I don't trust it to generate bug-free code anymore. For what it's worth, Cython supports C++ now. I'm sure there are people on this list that know much better than me the extent of this support, so I will let them chime in, but here are some docs on it: http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html If you have specific examples of new features trumping correctness and stability, I'm sure the Cython devel list would love to hear about it. They seem to be pretty concerned about stability and correctness to me, though I admit I don't follow the list extremely deeply. I don't trust any automated tool to generate bug-free code. I don't even trust myself to generate bug-free code :). 
Jason From charlesr.harris at gmail.com Fri Feb 17 22:52:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 20:52:56 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau wrote: > > Le 18 f?vr. 2012 00:58, "Charles R Harris" a > ?crit : > > > > > > > > > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau > wrote: > >> > >> I don't think c++ has any significant advantage over c for high > performance libraries. I am not convinced by the number of people argument > either: it is not my experience that c++ is easier to maintain in a open > source context, where the level of people is far from consistent. I doubt > many people did not contribute to numoy because it is in c instead if c++. > While this is somehow subjective, there are reasons that c is much more > common than c++ in that context. > > > > > > I think C++ offers much better tools than C for the sort of things in > Numpy. The compiler will take care of lots of things that now have to be > hand crafted and I wouldn't be surprised to see the code size shrink by a > significant factor. > > There are two arguments here: that c code in numpy could be improved, and > that c++ is the best way to do it. Nobody so far has argued against the > first argument. i think there is a lot of space to improve things while > still be in C. > > You say that the compiler would take care of a lot of things: so far, the > main thing that has been mentionned is raii. While it is certainly a useful > concept, I find it ewtremely difficult to use correctly in real > applications. Things that are simple to do on simple examples become really > hard to deal with when features start to interact with each other (which is > always in c++). Writing robust code that is exception safe with the stl > requires a lot of knowledge. I don't have this knowledge. I have .o doubt > Mark has this knowledge. Does anyone else on this list has ? > I have the sense you have written much in C++. Exception handling is maybe one of the weakest aspects of C, that is, it basically doesn't have any. The point is, I'd rather not *have* to worry much about the C/C++ side of things, and I think once a solid foundation is in place I won't have to nearly as much. Back in the late 80's I used rather nice Fortran and C++ compilers for writing code to run in extended DOS (the dos limit was 640 KB at that time). They were written in - wait for it - Pascal. The authors explained this seemingly odd decision by claiming that Pascal was better for bigger projects than C, and I agreed with them ;) Now you can point to Linux, which is 30 million + lines of C, but that is rather exceptional and the barriers to entry at this point are pretty darn high. My own experience is that beginners can seldom write more than a page of C and get it right, mostly because of pointers. Now C++ has a ton of subtleties and one needs to decide up front what parts to use and what not, but once a well designed system is in place, many things become easier because a lot of housekeeping is done for you. My own concern here is that the project is bigger than Mark thinks and he might get sucked off into a sideline, but I'd sure like to see the experiment made. > >> I would much rather move most part to cython to solve subtle ref > counting issues, typically. 
> > > > > > Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner > ;) Cython good for the Python interface, but once past that barrier C is > easier, and C++ has lots of useful things. > >> > >> The only way that i know of to have a stable and usable abi is to wrap > the c++ code in c. Wrapping c++ libraries in python has always been a pain > in my experience. How are template or exceptions handled across languages ? > it will also be a significant issue on windows with open source compilers. > >> > >> Interestingly, the api from clang exported to other languages is in c... > > > > > > The api isn't the same as the implementation language. I wouldn't > prejudge these issues, but some indication of how they would be solved > might be helpful. > > I understand that api and inplementation language are not the same: you > just quoted the part where I was mentioning it :) > > Assuming a c++ inplementation with a c api, how will you deal with > templates ? how will you deal with exception ? How will you deal with > exception crossing dll/so between different compilers, which is a very > common situation in our community ? > None of these strike me as relevant, I mean, they are internals, not api problems, and shouldn't be visible to the user. How Mark would implement the C++ API, as opposed to the C API I don't know, but since both would be there I don't see the problem. But really, we need more details on how these things would work. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Feb 17 22:54:12 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 04:54:12 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F3E93CA.8020703@hawaii.edu> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> Den 17. feb. 2012 kl. 18:52 skrev Eric Firing :. > > It's true that matplotlib relies heavily on C++, both via the Agg > library and in its own extension code. Personally, I don't like this; I > think it raises the barrier to contributing. C++ is an order of > magnitude more complicated than C--harder to read, and much harder to > write, unless one is a true expert. This is not true. C++ can be much easier, particularly for those who already know Python. The problem: C++ textbooks teach C++ as a subset of C. Writing C in C++ just adds the complexity of C++ on top of C, for no good reason. I can write FORTRAN in any language, it does not mean it is a good idea. We would have to start by teaching people to write good C++. E.g., always use the STL like Python built-in types if possible. Dynamic memory should be std::vector, not new or malloc. Pointers should be replaced with references. We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. Sturla From sturla at molden.no Fri Feb 17 22:55:03 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 04:55:03 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F3F1A86.30301@creativetrax.com> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <4F3F1A86.30301@creativetrax.com> Message-ID: <4526272F-5600-4191-84F6-129207798753@molden.no> Den 18. feb. 2012 kl. 04:27 skrev Jason Grout : > On 2/17/12 9:07 PM, Sturla Molden wrote: >> >> Den 18. feb. 2012 kl. 
01:58 skrev Charles R Harris >> >: >> >>> >>> >>> On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau >> > wrote: >>> >>> I don't think c++ has any significant advantage over c for high >>> performance libraries. I am not convinced by the number of people >>> argument either: it is not my experience that c++ is easier to >>> maintain in a open source context, where the level of people is >>> far from consistent. I doubt many people did not contribute to >>> numoy because it is in c instead if c++. While this is somehow >>> subjective, there are reasons that c is much more common than c++ >>> in that context. >>> >>> >>> I think C++ offers much better tools than C for the sort of things in >>> Numpy. The compiler will take care of lots of things that now have to >>> be hand crafted and I wouldn't be surprised to see the code size >>> shrink by a significant factor. >> >> The C++11 standard is fantastic. There are automatic data types, >> closures, reference counting, weak references, an improved STL with >> datatypes that map almost 1:1 against any built-in Python type, a sane >> threading API, regex, ect. Even prng is Mersenne Twister by standard. >> With C++11 it is finally possible to "write C++ (almost) like Python". >> On the downside, C++ takes a long term to learn, most C++ text books >> teach bad programming habits from the beginning to the end, and C++ >> becomes inherently dangerous if you write C++ like C. Many also abuse >> C++ as an bloatware generator. Templates can also be abused to write >> code that are impossible to debug. While it in theory could be better, C >> is a much smaller language. Personally I prefer C++ to C, but I am not >> convinced it will be better for NumPy. >> >> I agree about Cython. It is nice for writing a Python interface for C, >> but get messy and unclean when used for anything else. It also has too >> much focus on adding all sorts of "new features" instead of correctness >> and stability. I don't trust it to generate bug-free code anymore. > > For what it's worth, Cython supports C++ now. I'm sure there are people > on this list that know much better than me the extent of this support, > so I will let them chime in, but here are some docs on it: > > http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html > > If you have specific examples of new features trumping correctness and > stability, I'm sure the Cython devel list would love to hear about it. > They seem to be pretty concerned about stability and correctness to me, > though I admit I don't follow the list extremely deeply. > > I don't trust any automated tool to generate bug-free code. I don't > even trust myself to generate bug-free code :). > > Jason > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jason-sage at creativetrax.com Fri Feb 17 23:01:27 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Fri, 17 Feb 2012 22:01:27 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> Message-ID: <4F3F2297.7010703@creativetrax.com> On 2/17/12 9:54 PM, Sturla Molden wrote: > We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. I personally would love such a thing. 
It's been a while since I did anything nontrivial on my own in C++. Jason From charlesr.harris at gmail.com Fri Feb 17 23:03:21 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 21:03:21 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 8:07 PM, Sturla Molden wrote: > > Den 18. feb. 2012 kl. 01:58 skrev Charles R Harris < > charlesr.harris at gmail.com>: > > > > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau wrote: > >> I don't think c++ has any significant advantage over c for high >> performance libraries. I am not convinced by the number of people argument >> either: it is not my experience that c++ is easier to maintain in a open >> source context, where the level of people is far from consistent. I doubt >> many people did not contribute to numoy because it is in c instead if c++. >> While this is somehow subjective, there are reasons that c is much more >> common than c++ in that context. >> > > I think C++ offers much better tools than C for the sort of things in > Numpy. The compiler will take care of lots of things that now have to be > hand crafted and I wouldn't be surprised to see the code size shrink by a > significant factor. > > > The C++11 standard is fantastic. There are automatic data types, closures, > reference counting, weak references, an improved STL with datatypes that > map almost 1:1 against any built-in Python type, a sane threading API, > regex, ect. Even prng is Mersenne Twister by standard. With C++11 it is > finally possible to "write C++ (almost) like Python". On the downside, > C++ takes a long term to learn, most C++ text books > Are crap ;) Yeah, that is a downside. > teach bad programming habits from the beginning to the end, and C++ > becomes inherently dangerous if you write C++ like C. Many also abuse C++ > as an bloatware generator. Templates can also be abused to write code that > are impossible to debug. While it in theory could be better, C is a much > smaller language. Personally I prefer C++ to C, but I am not convinced it > will be better for NumPy. > > I agree about Cython. It is nice for writing a Python interface for C, but > get messy and unclean when used for anything else. It also has too much > focus on adding all sorts of "new features" instead of correctness and > stability. I don't trust it to generate bug-free code anymore. > > For wrapping C, Swig might be just as good. For C++, SIP, CXX or > Boost.Pyton work well too. > > If cracy ideas are allowed, what about PyPy RPython? Or perhaps Go? Or > even C# if a native compuler could be found? > > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Feb 17 23:10:44 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 05:10:44 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F3F1A86.30301@creativetrax.com> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <4F3F1A86.30301@creativetrax.com> Message-ID: <39860981-F69C-4E14-BE9D-2F65AC4F283F@molden.no> > > For what it's worth, Cython supports C++ now. I'm sure there are people > on this list that know much better than me the extent of this support, > so I will let them chime in, but here are some docs on it: > > http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html > Sure. 
They just keep adding features for the expence of stability. No focus or sence of direction. Focus on a small feature set, make it right, then don't add to it. That is the root of the successes of C, Python and Java. NumPy needs a stabile compiler that don't make mistakes everywhere. You cannot trust that to Cython. Sturla From charlesr.harris at gmail.com Fri Feb 17 23:16:22 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 21:16:22 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <39860981-F69C-4E14-BE9D-2F65AC4F283F@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <4F3F1A86.30301@creativetrax.com> <39860981-F69C-4E14-BE9D-2F65AC4F283F@molden.no> Message-ID: On Fri, Feb 17, 2012 at 9:10 PM, Sturla Molden wrote: > > > > For what it's worth, Cython supports C++ now. I'm sure there are people > > on this list that know much better than me the extent of this support, > > so I will let them chime in, but here are some docs on it: > > > > http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html > > > > > Sure. They just keep adding features for the expence of stability. No > focus or sence of direction. Focus on a small feature set, make it right, > then don't add to it. That is the root of the successes of C, Python and > Java. NumPy needs a stabile compiler that don't make mistakes everywhere. > You cannot trust that to Cython. > > I'm staying out of this fight. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Feb 17 23:18:58 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Feb 2012 22:18:58 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Le 18 f?vr. 2012 03:53, "Charles R Harris" a ?crit : > > > > On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau wrote: >> >> >> Le 18 f?vr. 2012 00:58, "Charles R Harris" a ?crit : >> >> >> > >> > >> > >> > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau wrote: >> >> >> >> I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in a open source context, where the level of people is far from consistent. I doubt many people did not contribute to numoy because it is in c instead if c++. While this is somehow subjective, there are reasons that c is much more common than c++ in that context. >> > >> > >> > I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor. >> >> There are two arguments here: that c code in numpy could be improved, and that c++ is the best way to do it. Nobody so far has argued against the first argument. i think there is a lot of space to improve things while still be in C. >> >> You say that the compiler would take care of a lot of things: so far, the main thing that has been mentionned is raii. While it is certainly a useful concept, I find it ewtremely difficult to use correctly in real applications. Things that are simple to do on simple examples become really hard to deal with when features start to interact with each other (which is always in c++). 
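For anyone following along who has not met the term: RAII in its simplest form just means tying a resource to an object's scope, so cleanup happens automatically even if an exception is thrown. A minimal, purely illustrative sketch (nothing NumPy-specific, names made up):

#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

std::mutex table_lock;

void update_table(std::vector<double>& table, double value)
{
    std::lock_guard<std::mutex> guard(table_lock);  // lock acquired here
    std::vector<double> scratch(table.size());      // memory owned by vector

    if (table.empty())
        throw std::runtime_error("empty table");    // guard still unlocks,
                                                     // scratch is still freed
    for (std::size_t i = 0; i < table.size(); ++i)
        scratch[i] = table[i] + value;
    table.swap(scratch);
}   // guard releases the lock, scratch releases its memory

The difficulty being pointed at is not this simple case, but what happens when many such objects, containers and exceptions interact across a large code base.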
Writing robust code that is exception safe with the stl requires a lot of knowledge. I don't have this knowledge. I have .o doubt Mark has this knowledge. Does anyone else on this list has ? > > > I have the sense you have written much in C++. Exception handling is maybe one of the weakest aspects of C, that is, it basically doesn't have any. The point is, I'd rather not *have* to worry much about the C/C++ side of things, and I think once a solid foundation is in place I won't have to nearly as much. > > Back in the late 80's I used rather nice Fortran and C++ compilers for writing code to run in extended DOS (the dos limit was 640 KB at that time). They were written in - wait for it - Pascal. The authors explained this seemingly odd decision by claiming that Pascal was better for bigger projects than C, and I agreed with them ;) Now you can point to Linux, which is 30 million + lines of C, but that is rather exceptional and the barriers to entry at this point are pretty darn high. My own experience is that beginners can seldom write more than a page of C and get it right, mostly because of pointers. Now C++ has a ton of subtleties and one needs to decide up front what parts to use and what not, but once a well designed system is in place, many things become easier because a lot of housekeeping is done for you. > > My own concern here is that the project is bigger than Mark thinks and he might get sucked off into a sideline, but I'd sure like to see the experiment made. >> >> >> I would much rather move most part to cython to solve subtle ref counting issues, typically. >> > >> > >> > Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) Cython good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things. >> >> >> >> The only way that i know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are template or exceptions handled across languages ? it will also be a significant issue on windows with open source compilers. >> >> >> >> Interestingly, the api from clang exported to other languages is in c... >> > >> > >> > The api isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. >> >> I understand that api and inplementation language are not the same: you just quoted the part where I was mentioning it :) >> >> Assuming a c++ inplementation with a c api, how will you deal with templates ? how will you deal with exception ? How will you deal with exception crossing dll/so between different compilers, which is a very common situation in our community ? > > > None of these strike me as relevant, I mean, they are internals, not api problems, and shouldn't be visible to the user. How Mark would implement the C++ API, as opposed to the C API I don't know, but since both would be there I don't see the problem. But really, we need more details on how these things would work. I don't understand why you think this is not relevant ? If numpy is in c++, with a C API, most users of numpy C/C++ API will use the C API, at least at first, since most of them are in C. Changes of restrictions on how this API xan be used is visible. To be more concrete, if numpy is built by MS compiler, and an exception is thrown, you will have a lots of trouble with an extension built with gcc. I have also observed some weird things in linux when mixing intel and gcc. 
This will have significant impacts on how people will be able to use extensions. I am a bit surprised by the claim.that abi and cross language API are not an issue with c++: it is a widely shared issue even within c++ proponents. David > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Fri Feb 17 23:23:16 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Fri, 17 Feb 2012 22:23:16 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <39860981-F69C-4E14-BE9D-2F65AC4F283F@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <4F3F1A86.30301@creativetrax.com> <39860981-F69C-4E14-BE9D-2F65AC4F283F@molden.no> Message-ID: <4F3F27B4.4060808@creativetrax.com> On 2/17/12 10:10 PM, Sturla Molden wrote: > Sure. They just keep adding features for the expence of stability. No > focus or sence of direction. Focus on a small feature set, make it > right, then don't add to it. That is the root of the successes of C, > Python and Java. NumPy needs a stabile compiler that don't make > mistakes everywhere. You cannot trust that to Cython. Again, if you have specific examples of stability being sacrificed, I'm sure the Cython list would like to hear about it. Your statements, as-is, are raising huge FUD flags for me. Anyways, I've said enough on this, and we've seen enough problems in discussions on this list already. Many people in the numpy community know Cython well enough to judge these things for themselves. Thanks, Jason From sturla at molden.no Fri Feb 17 23:30:00 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 05:30:00 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F3F2297.7010703@creativetrax.com> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Den 18. feb. 2012 kl. 05:01 skrev Jason Grout : > On 2/17/12 9:54 PM, Sturla Molden wrote: >> We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. > > I personally would love such a thing. It's been a while since I did > anything nontrivial on my own in C++. > One example: How do we code multiple return values? In Python: - Return a tuple. In C: - Use pointers (evilness) In C++: - Return a std::tuple, as you would in Python. - Use references, as you would in Fortran or Pascal. - Use pointers, as you would in C. C++ textbooks always pick the last... I would show the first and the second method, and perhaps intentionally forget the last. Sturla From sturla at molden.no Fri Feb 17 23:35:03 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 05:35:03 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F3F27B4.4060808@creativetrax.com> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <4F3F1A86.30301@creativetrax.com> <39860981-F69C-4E14-BE9D-2F65AC4F283F@molden.no> <4F3F27B4.4060808@creativetrax.com> Message-ID: <47BE38F2-31B6-47DD-96F1-A065B50DBBC9@molden.no> Den 18. feb. 2012 kl. 05:23 skrev Jason Grout : > On 2/17/12 10:10 PM, Sturla Molden wrote: > >> Sure. 
They just keep adding features for the expence of stability. No >> focus or sence of direction. Focus on a small feature set, make it >> right, then don't add to it. That is the root of the successes of C, >> Python and Java. NumPy needs a stabile compiler that don't make >> mistakes everywhere. You cannot trust that to Cython. > > Again, if you have specific examples of stability being sacrificed, I'm > sure the Cython list would like to hear about it. Your statements, > as-is, are raising huge FUD flags for me. > Cython is still 0.16, it is still unfinished. We cannot base NumPy on an unfinished compiler. Sturla From charlesr.harris at gmail.com Fri Feb 17 23:37:24 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 21:37:24 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 9:18 PM, David Cournapeau wrote: > > Le 18 f?vr. 2012 03:53, "Charles R Harris" a > ?crit : > > > > > > > > > On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau > wrote: > >> > >> > >> Le 18 f?vr. 2012 00:58, "Charles R Harris" > a ?crit : > >> > >> > >> > > >> > > >> > > >> > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau > wrote: > >> >> > >> >> I don't think c++ has any significant advantage over c for high > performance libraries. I am not convinced by the number of people argument > either: it is not my experience that c++ is easier to maintain in a open > source context, where the level of people is far from consistent. I doubt > many people did not contribute to numoy because it is in c instead if c++. > While this is somehow subjective, there are reasons that c is much more > common than c++ in that context. > >> > > >> > > >> > I think C++ offers much better tools than C for the sort of things in > Numpy. The compiler will take care of lots of things that now have to be > hand crafted and I wouldn't be surprised to see the code size shrink by a > significant factor. > >> > >> There are two arguments here: that c code in numpy could be improved, > and that c++ is the best way to do it. Nobody so far has argued against the > first argument. i think there is a lot of space to improve things while > still be in C. > >> > >> You say that the compiler would take care of a lot of things: so far, > the main thing that has been mentionned is raii. While it is certainly a > useful concept, I find it ewtremely difficult to use correctly in real > applications. Things that are simple to do on simple examples become really > hard to deal with when features start to interact with each other (which is > always in c++). Writing robust code that is exception safe with the stl > requires a lot of knowledge. I don't have this knowledge. I have .o doubt > Mark has this knowledge. Does anyone else on this list has ? > > > > > > I have the sense you have written much in C++. Exception handling is > maybe one of the weakest aspects of C, that is, it basically doesn't have > any. The point is, I'd rather not *have* to worry much about the C/C++ side > of things, and I think once a solid foundation is in place I won't have to > nearly as much. > > > > Back in the late 80's I used rather nice Fortran and C++ compilers for > writing code to run in extended DOS (the dos limit was 640 KB at that > time). They were written in - wait for it - Pascal. 
The authors explained > this seemingly odd decision by claiming that Pascal was better for bigger > projects than C, and I agreed with them ;) Now you can point to Linux, > which is 30 million + lines of C, but that is rather exceptional and the > barriers to entry at this point are pretty darn high. My own experience is > that beginners can seldom write more than a page of C and get it right, > mostly because of pointers. Now C++ has a ton of subtleties and one needs > to decide up front what parts to use and what not, but once a well designed > system is in place, many things become easier because a lot of housekeeping > is done for you. > > > > My own concern here is that the project is bigger than Mark thinks and > he might get sucked off into a sideline, but I'd sure like to see the > experiment made. > >> > >> >> I would much rather move most part to cython to solve subtle ref > counting issues, typically. > >> > > >> > > >> > Not me, I'd rather write most stuff in C/C++ than Cython, C is > cleaner ;) Cython good for the Python interface, but once past that barrier > C is easier, and C++ has lots of useful things. > >> >> > >> >> The only way that i know of to have a stable and usable abi is to > wrap the c++ code in c. Wrapping c++ libraries in python has always been a > pain in my experience. How are template or exceptions handled across > languages ? it will also be a significant issue on windows with open source > compilers. > >> >> > >> >> Interestingly, the api from clang exported to other languages is in > c... > >> > > >> > > >> > The api isn't the same as the implementation language. I wouldn't > prejudge these issues, but some indication of how they would be solved > might be helpful. > >> > >> I understand that api and inplementation language are not the same: you > just quoted the part where I was mentioning it :) > >> > >> Assuming a c++ inplementation with a c api, how will you deal with > templates ? how will you deal with exception ? How will you deal with > exception crossing dll/so between different compilers, which is a very > common situation in our community ? > > > > > > None of these strike me as relevant, I mean, they are internals, not api > problems, and shouldn't be visible to the user. How Mark would implement > the C++ API, as opposed to the C API I don't know, but since both would be > there I don't see the problem. But really, we need more details on how > these things would work. > > I don't understand why you think this is not relevant ? If numpy is in > c++, with a C API, most users of numpy C/C++ API will use the C API, at > least at first, since most of them are in C. Changes of restrictions on how > this API xan be used is visible. > > To be more concrete, if numpy is built by MS compiler, and an exception is > thrown, you will have a lots of trouble with an extension built with gcc. > Why would you even see an exception if it is caught before it escapes? I would expect the C API to behave just as it currently does. What am I missing? > I have also observed some weird things in linux when mixing intel and gcc. > This will have significant impacts on how people will be able to use > extensions. > > I am a bit surprised by the claim.that abi and cross language API are not > an issue with c++: it is a widely shared issue even within c++ proponents. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla at molden.no Fri Feb 17 23:47:53 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 05:47:53 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: <3226706D-A07C-42EC-B2CF-260B6BF0F08F@molden.no> > > Why would you even see an exception if it is caught before it escapes? I would expect the C API to behave just as it currently does. What am I missing? Structured exception handling in the OS. MSVC uses SEH for C++ exceptions. Memory allocation fails in gcc code. Instead of returning NULL, Windows jumps to the SEH handler set in the MSVC code... *poff* Sturla From charlesr.harris at gmail.com Fri Feb 17 23:54:38 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 21:54:38 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 9:18 PM, David Cournapeau wrote: > > Le 18 f?vr. 2012 03:53, "Charles R Harris" a > ?crit : > > > > > > > > > On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau > wrote: > >> > >> > >> Le 18 f?vr. 2012 00:58, "Charles R Harris" > a ?crit : > >> > >> > >> > > >> > > >> > > >> > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau > wrote: > >> >> > >> >> I don't think c++ has any significant advantage over c for high > performance libraries. I am not convinced by the number of people argument > either: it is not my experience that c++ is easier to maintain in a open > source context, where the level of people is far from consistent. I doubt > many people did not contribute to numoy because it is in c instead if c++. > While this is somehow subjective, there are reasons that c is much more > common than c++ in that context. > >> > > >> > > >> > I think C++ offers much better tools than C for the sort of things in > Numpy. The compiler will take care of lots of things that now have to be > hand crafted and I wouldn't be surprised to see the code size shrink by a > significant factor. > >> > >> There are two arguments here: that c code in numpy could be improved, > and that c++ is the best way to do it. Nobody so far has argued against the > first argument. i think there is a lot of space to improve things while > still be in C. > >> > >> You say that the compiler would take care of a lot of things: so far, > the main thing that has been mentionned is raii. While it is certainly a > useful concept, I find it ewtremely difficult to use correctly in real > applications. Things that are simple to do on simple examples become really > hard to deal with when features start to interact with each other (which is > always in c++). Writing robust code that is exception safe with the stl > requires a lot of knowledge. I don't have this knowledge. I have .o doubt > Mark has this knowledge. Does anyone else on this list has ? > > > > > > I have the sense you have written much in C++. Exception handling is > maybe one of the weakest aspects of C, that is, it basically doesn't have > any. The point is, I'd rather not *have* to worry much about the C/C++ side > of things, and I think once a solid foundation is in place I won't have to > nearly as much. > > > > Back in the late 80's I used rather nice Fortran and C++ compilers for > writing code to run in extended DOS (the dos limit was 640 KB at that > time). They were written in - wait for it - Pascal. 
The authors explained > this seemingly odd decision by claiming that Pascal was better for bigger > projects than C, and I agreed with them ;) Now you can point to Linux, > which is 30 million + lines of C, but that is rather exceptional and the > barriers to entry at this point are pretty darn high. My own experience is > that beginners can seldom write more than a page of C and get it right, > mostly because of pointers. Now C++ has a ton of subtleties and one needs > to decide up front what parts to use and what not, but once a well designed > system is in place, many things become easier because a lot of housekeeping > is done for you. > > > > My own concern here is that the project is bigger than Mark thinks and > he might get sucked off into a sideline, but I'd sure like to see the > experiment made. > >> > >> >> I would much rather move most part to cython to solve subtle ref > counting issues, typically. > >> > > >> > > >> > Not me, I'd rather write most stuff in C/C++ than Cython, C is > cleaner ;) Cython good for the Python interface, but once past that barrier > C is easier, and C++ has lots of useful things. > >> >> > >> >> The only way that i know of to have a stable and usable abi is to > wrap the c++ code in c. Wrapping c++ libraries in python has always been a > pain in my experience. How are template or exceptions handled across > languages ? it will also be a significant issue on windows with open source > compilers. > >> >> > >> >> Interestingly, the api from clang exported to other languages is in > c... > >> > > >> > > >> > The api isn't the same as the implementation language. I wouldn't > prejudge these issues, but some indication of how they would be solved > might be helpful. > >> > >> I understand that api and inplementation language are not the same: you > just quoted the part where I was mentioning it :) > >> > >> Assuming a c++ inplementation with a c api, how will you deal with > templates ? how will you deal with exception ? How will you deal with > exception crossing dll/so between different compilers, which is a very > common situation in our community ? > > > > > > None of these strike me as relevant, I mean, they are internals, not api > problems, and shouldn't be visible to the user. How Mark would implement > the C++ API, as opposed to the C API I don't know, but since both would be > there I don't see the problem. But really, we need more details on how > these things would work. > > I don't understand why you think this is not relevant ? If numpy is in > c++, with a C API, most users of numpy C/C++ API will use the C API, at > least at first, since most of them are in C. Changes of restrictions on how > this API xan be used is visible. > > To be more concrete, if numpy is built by MS compiler, and an exception is > thrown, you will have a lots of trouble with an extension built with gcc. > > I have also observed some weird things in linux when mixing intel and gcc. > This will have significant impacts on how people will be able to use > extensions. > > I am a bit surprised by the claim.that abi and cross language API are not > an issue with c++: it is a widely shared issue even within c++ proponents. > > I found this, which references 0mq (used by ipython) as an example of a C++ library with a C interface. It seems enums can have different sizes in C/C++, so that is something to watch. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Feb 17 23:56:43 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 21:56:43 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <3226706D-A07C-42EC-B2CF-260B6BF0F08F@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <3226706D-A07C-42EC-B2CF-260B6BF0F08F@molden.no> Message-ID: On Fri, Feb 17, 2012 at 9:47 PM, Sturla Molden wrote: > > > > > > > Why would you even see an exception if it is caught before it escapes? I > would expect the C API to behave just as it currently does. What am I > missing? > > Structured exception handling in the OS. > > MSVC uses SEH for C++ exceptions. > > Memory allocation fails in gcc code. Instead of returning NULL, Windows > jumps to the SEH handler set in the MSVC code... *poff* > But won't a C++ wrapper catch that? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Feb 18 00:00:20 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 17 Feb 2012 23:00:20 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Le 18 f?vr. 2012 04:37, "Charles R Harris" a ?crit : > > > > On Fri, Feb 17, 2012 at 9:18 PM, David Cournapeau wrote: >> >> >> Le 18 f?vr. 2012 03:53, "Charles R Harris" a ?crit : >> >> >> > >> > >> > >> > On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau wrote: >> >> >> >> >> >> Le 18 f?vr. 2012 00:58, "Charles R Harris" a ?crit : >> >> >> >> >> >> > >> >> > >> >> > >> >> > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau < cournape at gmail.com> wrote: >> >> >> >> >> >> I don't think c++ has any significant advantage over c for high performance libraries. I am not convinced by the number of people argument either: it is not my experience that c++ is easier to maintain in a open source context, where the level of people is far from consistent. I doubt many people did not contribute to numoy because it is in c instead if c++. While this is somehow subjective, there are reasons that c is much more common than c++ in that context. >> >> > >> >> > >> >> > I think C++ offers much better tools than C for the sort of things in Numpy. The compiler will take care of lots of things that now have to be hand crafted and I wouldn't be surprised to see the code size shrink by a significant factor. >> >> >> >> There are two arguments here: that c code in numpy could be improved, and that c++ is the best way to do it. Nobody so far has argued against the first argument. i think there is a lot of space to improve things while still be in C. >> >> >> >> You say that the compiler would take care of a lot of things: so far, the main thing that has been mentionned is raii. While it is certainly a useful concept, I find it ewtremely difficult to use correctly in real applications. Things that are simple to do on simple examples become really hard to deal with when features start to interact with each other (which is always in c++). Writing robust code that is exception safe with the stl requires a lot of knowledge. I don't have this knowledge. I have .o doubt Mark has this knowledge. Does anyone else on this list has ? >> > >> > >> > I have the sense you have written much in C++. Exception handling is maybe one of the weakest aspects of C, that is, it basically doesn't have any. 
The point is, I'd rather not *have* to worry much about the C/C++ side of things, and I think once a solid foundation is in place I won't have to nearly as much. >> > >> > Back in the late 80's I used rather nice Fortran and C++ compilers for writing code to run in extended DOS (the dos limit was 640 KB at that time). They were written in - wait for it - Pascal. The authors explained this seemingly odd decision by claiming that Pascal was better for bigger projects than C, and I agreed with them ;) Now you can point to Linux, which is 30 million + lines of C, but that is rather exceptional and the barriers to entry at this point are pretty darn high. My own experience is that beginners can seldom write more than a page of C and get it right, mostly because of pointers. Now C++ has a ton of subtleties and one needs to decide up front what parts to use and what not, but once a well designed system is in place, many things become easier because a lot of housekeeping is done for you. >> > >> > My own concern here is that the project is bigger than Mark thinks and he might get sucked off into a sideline, but I'd sure like to see the experiment made. >> >> >> >> >> I would much rather move most part to cython to solve subtle ref counting issues, typically. >> >> > >> >> > >> >> > Not me, I'd rather write most stuff in C/C++ than Cython, C is cleaner ;) Cython good for the Python interface, but once past that barrier C is easier, and C++ has lots of useful things. >> >> >> >> >> >> The only way that i know of to have a stable and usable abi is to wrap the c++ code in c. Wrapping c++ libraries in python has always been a pain in my experience. How are template or exceptions handled across languages ? it will also be a significant issue on windows with open source compilers. >> >> >> >> >> >> Interestingly, the api from clang exported to other languages is in c... >> >> > >> >> > >> >> > The api isn't the same as the implementation language. I wouldn't prejudge these issues, but some indication of how they would be solved might be helpful. >> >> >> >> I understand that api and inplementation language are not the same: you just quoted the part where I was mentioning it :) >> >> >> >> Assuming a c++ inplementation with a c api, how will you deal with templates ? how will you deal with exception ? How will you deal with exception crossing dll/so between different compilers, which is a very common situation in our community ? >> > >> > >> > None of these strike me as relevant, I mean, they are internals, not api problems, and shouldn't be visible to the user. How Mark would implement the C++ API, as opposed to the C API I don't know, but since both would be there I don't see the problem. But really, we need more details on how these things would work. >> >> I don't understand why you think this is not relevant ? If numpy is in c++, with a C API, most users of numpy C/C++ API will use the C API, at least at first, since most of them are in C. Changes of restrictions on how this API xan be used is visible. >> >> To be more concrete, if numpy is built by MS compiler, and an exception is thrown, you will have a lots of trouble with an extension built with gcc. > > > Why would you even see an exception if it is caught before it escapes? I would expect the C API to behave just as it currently does. What am I missing? I believe that you cannot always guarantee that no exception will go through even with a catch all at the c++ -> c layer. 
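The kind of catch-all boundary being talked about here looks roughly like the sketch below. It is illustrative only, with invented function names, not the actual NumPy C API: exceptions are translated into error codes so that, in principle, nothing C++ ever crosses into the C caller.

#include <exception>
#include <new>
#include <stdexcept>

// Hypothetical C++ internal routine (made-up name, not real NumPy code).
static int core_compute(double* data, int n)
{
    if (n < 0)
        throw std::invalid_argument("negative length");
    for (int i = 0; i < n; ++i)
        data[i] *= 2.0;
    return 0;
}

// C-callable entry point: every exception is caught and turned into a code.
extern "C" int api_compute(double* data, int n)
{
    try {
        return core_compute(data, n);
    }
    catch (const std::bad_alloc&)  { return -1; }  // out of memory
    catch (const std::exception&)  { return -2; }  // other standard exceptions
    catch (...)                    { return -3; }  // anything else from C++
}

The open question in this sub-thread is whether such a wrapper is really airtight once different compilers and runtimes (MSVC, gcc/MinGW, intel) are mixed in the same process, which is what the SEH discussion elsewhere in the thread is about.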
I will try to find more about it, as I cannot remember the exact details I have in mind (need to look at the customer's code). David >> >> I have also observed some weird things in linux when mixing intel and gcc. This will have significant impacts on how people will be able to use extensions. >> >> I am a bit surprised by the claim.that abi and cross language API are not an issue with c++: it is a widely shared issue even within c++ proponents. > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Feb 18 00:06:36 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Feb 2012 22:06:36 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Fri, Feb 17, 2012 at 10:00 PM, David Cournapeau wrote: > > Le 18 f?vr. 2012 04:37, "Charles R Harris" a > ?crit : > > > > > > > > > On Fri, Feb 17, 2012 at 9:18 PM, David Cournapeau > wrote: > >> > >> > >> Le 18 f?vr. 2012 03:53, "Charles R Harris" > a ?crit : > >> > >> > >> > > >> > > >> > > >> > On Fri, Feb 17, 2012 at 7:29 PM, David Cournapeau > wrote: > >> >> > >> >> > >> >> Le 18 f?vr. 2012 00:58, "Charles R Harris" < > charlesr.harris at gmail.com> a ?crit : > >> >> > >> >> > >> >> > > >> >> > > >> >> > > >> >> > On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau < > cournape at gmail.com> wrote: > >> >> >> > >> >> >> I don't think c++ has any significant advantage over c for high > performance libraries. I am not convinced by the number of people argument > either: it is not my experience that c++ is easier to maintain in a open > source context, where the level of people is far from consistent. I doubt > many people did not contribute to numoy because it is in c instead if c++. > While this is somehow subjective, there are reasons that c is much more > common than c++ in that context. > >> >> > > >> >> > > >> >> > I think C++ offers much better tools than C for the sort of things > in Numpy. The compiler will take care of lots of things that now have to be > hand crafted and I wouldn't be surprised to see the code size shrink by a > significant factor. > >> >> > >> >> There are two arguments here: that c code in numpy could be > improved, and that c++ is the best way to do it. Nobody so far has argued > against the first argument. i think there is a lot of space to improve > things while still be in C. > >> >> > >> >> You say that the compiler would take care of a lot of things: so > far, the main thing that has been mentionned is raii. While it is certainly > a useful concept, I find it ewtremely difficult to use correctly in real > applications. Things that are simple to do on simple examples become really > hard to deal with when features start to interact with each other (which is > always in c++). Writing robust code that is exception safe with the stl > requires a lot of knowledge. I don't have this knowledge. I have .o doubt > Mark has this knowledge. Does anyone else on this list has ? > >> > > >> > > >> > I have the sense you have written much in C++. Exception handling is > maybe one of the weakest aspects of C, that is, it basically doesn't have > any. 
The point is, I'd rather not *have* to worry much about the C/C++ side > of things, and I think once a solid foundation is in place I won't have to > nearly as much. > >> > > >> > Back in the late 80's I used rather nice Fortran and C++ compilers > for writing code to run in extended DOS (the dos limit was 640 KB at that > time). They were written in - wait for it - Pascal. The authors explained > this seemingly odd decision by claiming that Pascal was better for bigger > projects than C, and I agreed with them ;) Now you can point to Linux, > which is 30 million + lines of C, but that is rather exceptional and the > barriers to entry at this point are pretty darn high. My own experience is > that beginners can seldom write more than a page of C and get it right, > mostly because of pointers. Now C++ has a ton of subtleties and one needs > to decide up front what parts to use and what not, but once a well designed > system is in place, many things become easier because a lot of housekeeping > is done for you. > >> > > >> > My own concern here is that the project is bigger than Mark thinks > and he might get sucked off into a sideline, but I'd sure like to see the > experiment made. > >> >> > >> >> >> I would much rather move most part to cython to solve subtle ref > counting issues, typically. > >> >> > > >> >> > > >> >> > Not me, I'd rather write most stuff in C/C++ than Cython, C is > cleaner ;) Cython good for the Python interface, but once past that barrier > C is easier, and C++ has lots of useful things. > >> >> >> > >> >> >> The only way that i know of to have a stable and usable abi is to > wrap the c++ code in c. Wrapping c++ libraries in python has always been a > pain in my experience. How are template or exceptions handled across > languages ? it will also be a significant issue on windows with open source > compilers. > >> >> >> > >> >> >> Interestingly, the api from clang exported to other languages is > in c... > >> >> > > >> >> > > >> >> > The api isn't the same as the implementation language. I wouldn't > prejudge these issues, but some indication of how they would be solved > might be helpful. > >> >> > >> >> I understand that api and inplementation language are not the same: > you just quoted the part where I was mentioning it :) > >> >> > >> >> Assuming a c++ inplementation with a c api, how will you deal with > templates ? how will you deal with exception ? How will you deal with > exception crossing dll/so between different compilers, which is a very > common situation in our community ? > >> > > >> > > >> > None of these strike me as relevant, I mean, they are internals, not > api problems, and shouldn't be visible to the user. How Mark would > implement the C++ API, as opposed to the C API I don't know, but since both > would be there I don't see the problem. But really, we need more details on > how these things would work. > >> > >> I don't understand why you think this is not relevant ? If numpy is in > c++, with a C API, most users of numpy C/C++ API will use the C API, at > least at first, since most of them are in C. Changes of restrictions on how > this API xan be used is visible. > >> > >> To be more concrete, if numpy is built by MS compiler, and an exception > is thrown, you will have a lots of trouble with an extension built with gcc. > > > > > > Why would you even see an exception if it is caught before it escapes? I > would expect the C API to behave just as it currently does. What am I > missing? 
> > I believe that you cannot always guarantee that no exception will go > through even with a catch all at the c++ -> c layer. I will try to find > more about it, as I cannot remember the exact details I have in mind (need > to look at the customer's code). > Stackoverflowsays you can catch all MSVC MEH exceptions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sat Feb 18 00:16:08 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 06:16:08 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <3226706D-A07C-42EC-B2CF-260B6BF0F08F@molden.no> Message-ID: Den 18. feb. 2012 kl. 05:56 skrev Charles R Harris : > > > But won't a C++ wrapper catch that? A try-catch block with MSVC will register an SEH with the operating system. GCC (g++) implements exceptions without SEH. What happens if GCC code tries to catch a std::bad_alloc? Windows intervenes and sends control to a registered SEH. So the flow of control jumps out of GCC's hands, and goes to some catch or __except block set by MSVC instead. And now the stack is FUBAR... But this can always happen when you mix MSVC and MinGW. Even pure C code can set an SEH with MSVC, so it's not a C++ issue. You cannot wrap in a way that protects you from an intervention by the operating system. It's better to stick with MS and Intel compilers on Windows. MinGW code must execute in an SEH free environment. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Sat Feb 18 01:18:09 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Fri, 17 Feb 2012 22:18:09 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden wrote: > > > Den 18. feb. 2012 kl. 05:01 skrev Jason Grout : > >> On 2/17/12 9:54 PM, Sturla Molden wrote: >>> We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. >> >> I personally would love such a thing. ?It's been a while since I did >> anything nontrivial on my own in C++. >> > > One example: How do we code multiple return values? > > In Python: > - Return a tuple. > > In C: > - Use pointers (evilness) > > In C++: > - Return a std::tuple, as you would in Python. > - Use references, as you would in Fortran or Pascal. > - Use pointers, as you would in C. > > C++ textbooks always pick the last... > > I would show the first and the second method, and perhaps intentionally forget the last. > > Sturla > I can add my own 2 cents about cython vs. C vs. C++, based on summer coding experiences. I was an intern at Enthought, sharing an office with Mark W. (Which was a treat. I recommend you all quit your day jobs and haunt whatever office Mark is inhabiting.) I was trying to optimize some code and that lead to experimenting with both cython and C. Dealing with the C internals of numpy was frustrating. Since C doesn't have templating but numpy kinda needs it, instead python scripts go over and manually perform templating. Not the most obvious thing. There were other issues in the background--including C doesn't allow for abstraction (i.e. 
easy to read), lots of pointer-fu is required, and the C API is lightly documented and already plenty difficult. On the flip side, cython looked pretty...but I didn't get the performance gains I wanted, and had to spend a lot of time figuring out if it was cython, needing to add types, buggy support for numpy, or actually the algorithm. The C files generated by cython were enormous and difficult to read. They really weren't meant for human consumption. As Sturla has said, regardless of the quality of the current product, it isn't stable. And even if it looks friendly there's magic going on under the hood. Magic means it's hard to diagnose and fix problems. At least one very smart person has told me they find cython most useful for wrapping C/C++ libraries and exposing them to python, which is a far cry from library writing. (Of course Wes McKinney, a cython evangelist, uses it all over his pandas library.) In comparison, there are a number of high quality, performant, open-source C++ based array libraries out there with very friendly API's. Things like eigen (http://eigen.tuxfamily.org/index.php?title=Main_Page) and Armadillo (http://arma.sourceforge.net/). They seem to have plenty of users and more devs than numpy. On the broader topic of recruitment...sure, cython has a lower barrier to entry than C++. But there are many, many more C++ developers and resources out there than cython resources. And it likely will stay that way for quite some time. -Chris > > > > > > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sat Feb 18 02:31:45 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 17 Feb 2012 23:31:45 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Hi, On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire wrote: > On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden wrote: >> >> >> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout : >> >>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>>> We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. >>> >>> I personally would love such a thing. ?It's been a while since I did >>> anything nontrivial on my own in C++. >>> >> >> One example: How do we code multiple return values? >> >> In Python: >> - Return a tuple. >> >> In C: >> - Use pointers (evilness) >> >> In C++: >> - Return a std::tuple, as you would in Python. >> - Use references, as you would in Fortran or Pascal. >> - Use pointers, as you would in C. >> >> C++ textbooks always pick the last... >> >> I would show the first and the second method, and perhaps intentionally forget the last. >> >> Sturla >> > On the flip side, cython looked pretty...but I didn't get the > performance gains I wanted, and had to spend a lot of time figuring > out if it was cython, needing to add types, buggy support for numpy, > or actually the algorithm. At the time, was the numpy support buggy? I personally haven't had many problems with Cython and numpy. > The C files generated by cython were > enormous and difficult to read. They really weren't meant for human > consumption. Yes, it takes some practice to get used to what Cython will do, and how to optimize the output. 
> As Sturla has said, regardless of the quality of the > current product, it isn't stable. I've personally found it more or less rock solid. Could you say what you mean by "it isn't stable"? Best, Matthew From matthew.brett at gmail.com Sat Feb 18 02:47:25 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 17 Feb 2012 23:47:25 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Hi, again (sorry), On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire wrote: > On the broader topic of recruitment...sure, cython has a lower barrier > to entry than C++. But there are many, many more C++ developers and > resources out there than cython resources. And it likely will stay > that way for quite some time. On the other hand, in the current development community around numpy, and among the subscribers to this mailing list, I suspect there is more Cython experience than C++ experience. Of course it might be that so-far undiscovered C++ developers are drawn to a C++ rewrite of Numpy. But it that really likely? I can see a C++ developer being drawn to C++ performance library they would use in their C++ applications, but it's harder for me to imagine a C++ programmer being drawn to a Python library because the internals are C++. Best, Matthew From cournape at gmail.com Sat Feb 18 02:55:40 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 18 Feb 2012 01:55:40 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Le 18 f?vr. 2012 06:18, "Christopher Jordan-Squire" a ?crit : > > On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden wrote: > > > > > > Den 18. feb. 2012 kl. 05:01 skrev Jason Grout < jason-sage at creativetrax.com>: > > > >> On 2/17/12 9:54 PM, Sturla Molden wrote: > >>> We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. > >> > >> I personally would love such a thing. It's been a while since I did > >> anything nontrivial on my own in C++. > >> > > > > One example: How do we code multiple return values? > > > > In Python: > > - Return a tuple. > > > > In C: > > - Use pointers (evilness) > > > > In C++: > > - Return a std::tuple, as you would in Python. > > - Use references, as you would in Fortran or Pascal. > > - Use pointers, as you would in C. > > > > C++ textbooks always pick the last... > > > > I would show the first and the second method, and perhaps intentionally forget the last. > > > > Sturla > > > > I can add my own 2 cents about cython vs. C vs. C++, based on summer > coding experiences. > > I was an intern at Enthought, sharing an office with Mark W. (Which > was a treat. I recommend you all quit your day jobs and haunt whatever > office Mark is inhabiting.) I was trying to optimize some code and > that lead to experimenting with both cython and C. > > Dealing with the C internals of numpy was frustrating. Since C doesn't > have templating but numpy kinda needs it, instead python scripts go > over and manually perform templating. Not the most obvious thing. > There were other issues in the background--including C doesn't allow > for abstraction (i.e. 
easy to read), lots of pointer-fu is required, > and the C API is lightly documented and already plenty difficult. Please understand that the argument is not to maintain a status quo. Lack of API documentation, internals that need significant work are certainly issues. I fail to see how writing in C++ will solve the documentation issues. On the abstraction side of things, let's agree to disagree. Plenty of complex projects are written in both languages to make this a matter of mostly subjective matter. > > On the flip side, cython looked pretty...but I didn't get the > performance gains I wanted, and had to spend a lot of time figuring > out if it was cython, needing to add types, buggy support for numpy, > or actually the algorithm. The C files generated by cython were > enormous and difficult to read. They really weren't meant for human > consumption. As Sturla has said, regardless of the quality of the > current product, it isn't stable. Sturla represents only himself on this issue. Cython is widely held as a successful and very useful tool. Many more projects in the scipy community uses cython compared to C++. And even if it looks friendly > there's magic going on under the hood. Magic means it's hard to > diagnose and fix problems. At least one very smart person has told me > they find cython most useful for wrapping C/C++ libraries and exposing > them to python, which is a far cry from library writing. (Of course > Wes McKinney, a cython evangelist, uses it all over his pandas > library.) I am not very smart, but this is certainly close to what I had in mind as well :) As you know, the lack of clear abstraction between c and c python wrapping is one of the major issue in numpy. Cython is certainly one of the most capable tool out there to avoid tedious reference bug chasing. > > In comparison, there are a number of high quality, performant, > open-source C++ based array libraries out there with very friendly > API's. Things like eigen > (http://eigen.tuxfamily.org/index.php?title=Main_Page) and Armadillo > (http://arma.sourceforge.net/). They seem to have plenty of users and > more devs than eigen is a typical example of code i hope numpy will never be close to. This is again quite subjective, but it also shows that we have quite different ideas on what maintainable/readable code means. Which is of course quite alright. But it means a choice needs to be made. If a majority of people find eigen more readable than a well written C library, then I don't think anyone can reasonably argue against going to c++. > > On the broader topic of recruitment...sure, cython has a lower barrier > to entry than C++. But there are many, many more C++ developers and > resources out there than cython resources. And it likely will stay > that way for quite some I may not have explained it very well: my whole point is that we don't recruite people, where I understand recruit as hiring full time, profesional programmers.We need more people who can casually spend a few hours - typically grad students, scientists with an itch. There is no doubt that more professional programmers know c++ compared to C. But a community project like numpy has different requirements than a "professional" project. 
David > > -Chris > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sat Feb 18 03:16:15 2012 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 17 Feb 2012 22:16:15 -1000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: <4F3F5E4F.3010100@hawaii.edu> On 02/17/2012 09:55 PM, David Cournapeau wrote: > I may not have explained it very well: my whole point is that we don't > recruite people, where I understand recruit as hiring full time, > profesional programmers.We need more people who can casually spend a few > hours - typically grad students, scientists with an itch. There is no > doubt that more professional programmers know c++ compared to C. But a > community project like numpy has different requirements than a > "professional" project. > My sense from the thread so far is that the C++ push is part of the new vision, in which numpy will make the transition to a more "professional" level, with paid developers, and there will no longer be the expectation that "grad students, scientists with an itch" will dive into the innermost guts of the code. The guts will be more like Qt or AGG or 0MQ--solid, documented libraries that just work (I think--I don't really know that much about these examples), so we can take them for granted and worry about other things instead. If that can be accomplished, it is certainly more than fine with me; and if the best way to accomplish that is with C++, so be it. Eric From cjordan1 at uw.edu Sat Feb 18 03:17:29 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Sat, 18 Feb 2012 00:17:29 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Fri, Feb 17, 2012 at 11:55 PM, David Cournapeau wrote: > > Le 18 f?vr. 2012 06:18, "Christopher Jordan-Squire" a > ?crit?: > > >> >> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden wrote: >> > >> > >> > Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >> > : >> > >> >> On 2/17/12 9:54 PM, Sturla Molden wrote: >> >>> We would have to write a C++ programming tutorial that is based on >> >>> Pyton knowledge instead of C knowledge. >> >> >> >> I personally would love such a thing. ?It's been a while since I did >> >> anything nontrivial on my own in C++. >> >> >> > >> > One example: How do we code multiple return values? >> > >> > In Python: >> > - Return a tuple. >> > >> > In C: >> > - Use pointers (evilness) >> > >> > In C++: >> > - Return a std::tuple, as you would in Python. >> > - Use references, as you would in Fortran or Pascal. >> > - Use pointers, as you would in C. >> > >> > C++ textbooks always pick the last... >> > >> > I would show the first and the second method, and perhaps intentionally >> > forget the last. 
>> > >> > Sturla >> > >> >> I can add my own 2 cents about cython vs. C vs. C++, based on summer >> coding experiences. >> >> I was an intern at Enthought, sharing an office with Mark W. (Which >> was a treat. I recommend you all quit your day jobs and haunt whatever >> office Mark is inhabiting.) I was trying to optimize some code and >> that lead to experimenting with both cython and C. >> >> Dealing with the C internals of numpy was frustrating. Since C doesn't >> have templating but numpy kinda needs it, instead python scripts go >> over and manually perform templating. Not the most obvious thing. >> There were other issues ?in the background--including C doesn't allow >> for abstraction (i.e. easy to read), lots of pointer-fu is required, >> and the C API is lightly documented and already plenty difficult. > > Please understand that the argument is not to maintain a status quo. > > Lack of API documentation, internals that need significant work are > certainly issues. I fail to see how writing in C++ will solve the > documentation issues. > > On the abstraction side of things, let's agree to disagree. Plenty of > complex projects are written in both languages to make this a matter of > mostly subjective matter. > >> >> On the flip side, cython looked pretty...but I didn't get the >> performance gains I wanted, and had to spend a lot of time figuring >> out if it was cython, needing to add types, buggy support for numpy, >> or actually the algorithm. The C files generated by cython were >> enormous and difficult to read. They really weren't meant for human >> consumption. As Sturla has said, regardless of the quality of the >> current product, it isn't stable. > > Sturla represents only himself on this issue. Cython is widely held as a > successful and very useful tool. Many more projects in the scipy community > uses cython compared to C++. > > And even if it looks friendly >> there's magic going on under the hood. Magic means it's hard to >> diagnose and fix problems. At least one very smart person has told me >> they find cython most useful for wrapping C/C++ libraries and exposing >> them to python, which is a far cry from library writing. (Of course >> Wes McKinney, a cython evangelist, uses it all over his pandas >> library.) > > I am not very smart, but this is certainly close to what I had in mind as > well :) As you know, the lack of clear abstraction between c and c python > wrapping is one of the major issue in numpy. Cython is certainly one of the > most capable tool out there to avoid tedious reference bug chasing. > >> >> In comparison, there are a number of high quality, performant, >> open-source C++ based array libraries out there with very friendly >> API's. Things like eigen >> (http://eigen.tuxfamily.org/index.php?title=Main_Page) and Armadillo >> (http://arma.sourceforge.net/). They seem to have plenty of users and >> more devs than > > eigen is a typical example of code i hope numpy will never be close to. This > is again quite subjective, but it also shows that we have quite different > ideas on what maintainable/readable code means. Which is of course quite > alright. But it means a choice needs to be made. If a majority of people > find eigen more readable than a well written C library, then I don't think > anyone can reasonably argue against going to c++. > Fair point, obviously. I have't dug into eigen's internals much. I just like their performance benchmarks and API. 
Also their cute owl mascot, but I suppose that's not a meaningful standard for future coding practices. >> >> On the broader topic of recruitment...sure, cython has a lower barrier >> to entry than C++. But there are many, many more C++ developers and >> resources out there than cython resources. And it likely will stay >> that way for quite some > > I may not have explained it very well: my whole point is that we don't > recruite people, where I understand recruit as hiring full time, profesional > programmers.We need more people who can casually spend a few hours - > typically grad students, scientists with an itch. There is no doubt that > more professional programmers know c++ compared to C. But a community > project like numpy has different requirements than a "professional" project. > I'm not sure you really mean casually spend a few *hours*, but I get your point. It's important for people to be able to add onto it incrementally as an off-hours hobby. But for itches to scratch, is numpy the realistic place for scientists and grad students to go? As opposed to one of the extension packages, like scipy, sklearn, etc.? If anywhere is going to be more akin to a "professional" project, code-style wise, it seems like the numpy core is the place to do it. -Chris > David > > >> >> -Chris >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From cjordan1 at uw.edu Sat Feb 18 03:18:08 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Sat, 18 Feb 2012 00:18:08 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett wrote: > Hi, > > On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire > wrote: >> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden wrote: >>> >>> >>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout : >>> >>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>>>> We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. >>>> >>>> I personally would love such a thing. ?It's been a while since I did >>>> anything nontrivial on my own in C++. >>>> >>> >>> One example: How do we code multiple return values? >>> >>> In Python: >>> - Return a tuple. >>> >>> In C: >>> - Use pointers (evilness) >>> >>> In C++: >>> - Return a std::tuple, as you would in Python. >>> - Use references, as you would in Fortran or Pascal. >>> - Use pointers, as you would in C. >>> >>> C++ textbooks always pick the last... >>> >>> I would show the first and the second method, and perhaps intentionally forget the last. >>> >>> Sturla >>> > >> On the flip side, cython looked pretty...but I didn't get the >> performance gains I wanted, and had to spend a lot of time figuring >> out if it was cython, needing to add types, buggy support for numpy, >> or actually the algorithm. 
> > At the time, was the numpy support buggy? ?I personally haven't had > many problems with Cython and numpy. > It's not that the support WAS buggy, it's that it wasn't clear to me what was going on and where my performance bottleneck was. Even after microbenchmarking with ipython, using timeit and prun, and using the cython code visualization tool. Ultimately I don't think it was cython, so perhaps my comment was a bit unfair. But it was unfortunately difficult to verify that. Of course, as you say, diagnosing and solving such issues would become easier to resolve with more cython experience. >> The C files generated by cython were >> enormous and difficult to read. They really weren't meant for human >> consumption. > > Yes, it takes some practice to get used to what Cython will do, and > how to optimize the output. > >> As Sturla has said, regardless of the quality of the >> current product, it isn't stable. > > I've personally found it more or less rock solid. ?Could you say what > you mean by "it isn't stable"? > I just meant what Sturla said, nothing more: "Cython is still 0.16, it is still unfinished. We cannot base NumPy on an unfinished compiler." -Chris > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at googlemail.com Sat Feb 18 04:13:32 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 18 Feb 2012 10:13:32 +0100 Subject: [Numpy-discussion] test errors on deprecation/runtime warnings In-Reply-To: References: Message-ID: 2012/2/17 St?fan van der Walt > Hi Ralf > > On Thu, Feb 16, 2012 at 11:05 AM, Ralf Gommers > wrote: > > Last week we merged https://github.com/numpy/numpy/pull/201, which > causes > > DeprecationWarning's and RuntimeWarning's to be converted to errors if > they > > occur when running the test suite. > > It looks like this change affects other packages, too, It does, which is why I wanted to bring it up here. > which may legitimately raise RuntimeWarnings while running their test > suites > (unless I read the patch wrong). Would it be an option to rather add > a flag (False by default) to enable this behaviour, and enable it > inside of numpy.test() ? > Well, the idea is that this behavior is the correct one for all packages. It calls attention to those RuntimeWarnings, which may only occur on certain platforms. If they're legitimate, you silence them in the test suite of that package. If not, you fix them. Would you agree with that? Or would you prefer to just ignore DeprecationWarnings and/or RuntimeWarnings in skimage for example? Note that the changed behavior would only be visible for people running numpy master. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Sat Feb 18 04:46:25 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 18 Feb 2012 10:46:25 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: On Thu, Feb 16, 2012 at 11:39 PM, Travis Oliphant wrote: > Mark Wiebe and I have been discussing off and on (as well as talking with > Charles) a good way forward to balance two competing desires: > > * addition of new features that are needed in NumPy > * improving the code-base generally and moving towards a more > maintainable NumPy > > I know there are load voices for just focusing on the second of these and > avoiding the first until we have finished that. I recognize the need to > improve the code base, but I will also be pushing for improvements to the > feature-set and user experience in the process. > > As a result, I am proposing a rough outline for releases over the next > year: > > * NumPy 1.7 to come out as soon as the serious bugs can be > eliminated. Bryan, Francesc, Mark, and I are able to help triage some of > those. > > * NumPy 1.8 to come out in July which will have as many > ABI-compatible feature enhancements as we can add while improving test > coverage and code cleanup. I will post to this list more details of what > we plan to address with it later. Included for possible inclusion are: > * resolving the NA/missing-data issues > * finishing group-by > * incorporating the start of label arrays > * incorporating a meta-object > * a few new dtypes (variable-length string, varialbe-length unicode > and an enum type) > * adding ufunc support for flexible dtypes and possibly structured > arrays > * allowing generalized ufuncs to work on more kinds of arrays > besides just contiguous > * improving the ability for NumPy to receive JIT-generated function > pointers for ufuncs and other calculation opportunities > * adding "filters" to Input and Output > * simple computed fields for dtypes > * accepting a Data-Type specification as a class or JSON file > * work towards improving the dtype-addition mechanism > For some of these things it's not entirely (or at all, what's a meta-object?) clear to me what they mean or how they would work. How do you plan to go about working on these features? One NEP per feature? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Feb 18 06:25:26 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 18 Feb 2012 11:25:26 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: On Sat, Feb 18, 2012 at 04:54, Charles R Harris wrote: > I found this , which references 0mq (used by ipython) as an example of a C++ > library with a C interface. It seems enums can have different sizes in > C/C++, so that is something to watch. One of the ways they manage to do this is by scrupulously avoiding exceptions even in the internal, never-touches-C zone. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From cournape at gmail.com Sat Feb 18 08:38:54 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 18 Feb 2012 07:38:54 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Le 18 f?vr. 2012 11:25, "Robert Kern" a ?crit : > > On Sat, Feb 18, 2012 at 04:54, Charles R Harris > wrote: > > > I found this , which references 0mq (used by ipython) as an example of a C++ > > library with a C interface. It seems enums can have different sizes in > > C/C++, so that is something to watch. > > One of the ways they manage to do this is by scrupulously avoiding > exceptions even in the internal, never-touches-C zone. I took a superficial look at zeromq 2.x sources: it looks like they don't use much of the stl (beyond vector and some trivial usages of algorithm). I wonder if this is linked ? FWIW, I would be fine with using such a subset in numpy. David > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From marlin_rowley at hotmail.com Sat Feb 18 08:48:51 2012 From: marlin_rowley at hotmail.com (Marlin Rowley) Date: Sat, 18 Feb 2012 07:48:51 -0600 Subject: [Numpy-discussion] .. Message-ID: Madam: http://flooring-direct.com/folder1946/httpwww.timerteam332.php?oculuckyid=50 Sat, 18 Feb 2012 14:48:50 _____________________ "Hooper merely glanced at Bill." (c) Dale wumaofen -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Feb 18 10:20:30 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 18 Feb 2012 10:20:30 -0500 Subject: [Numpy-discussion] The end of numpy as we know it ? Message-ID: (on a ambiguous day, pessimistic or optimistic?) Numpy is a monster written by a bunch of amateurs (engineers and scientists), with a glacial pace of development. If we want to make any progress to the world dominance of python in science, we need to go professionally about it. First we need to streamline the development infrastructure and get the distribution under control. Then we need to streamline the code so the bunch of amateurs doesn't understand what's going on and cannot effectively threaten a fork anymore. Then, we need to get our version of nans, labeled arrays and other tools in so we have the right hooks to compete downstream. Of course, "This may also mean different business models and licensing around some of the NumPy-related code that the company writes." To a glorious professional future and long live python in science. ------- unsigned From sturla at molden.no Sat Feb 18 10:47:07 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 16:47:07 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: <524391C0-B41D-4C80-8D23-D2CEF9EF2F32@molden.no> Den 18. feb. 2012 kl. 
14:38 skrev David Cournapeau : > I took a superficial look at zeromq 2.x sources: it looks like they don't use much of the stl (beyond vector and some trivial usages of algorithm). I wonder if this is linked ? > > FWIW, I would be fine with using such a subset in numpy. > > > I think basing it on STL and perhaps Boost would be fine. The problem is not exposing C++ to C, it is mixing MSVC and MinGW. But that problem exists regardless of what we do. It's not a fair argument against C++. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Sat Feb 18 10:52:27 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sat, 18 Feb 2012 16:52:27 +0100 Subject: [Numpy-discussion] change the mask state of one element in a masked array Message-ID: Dear all, I built a new empty masked array: In [91]: a=np.ma.empty((2,5)) In [92]: a Out[92]: masked_array(data = [[ 1.20569155e-312 3.34730819e-316 1.13580079e-316 1.11459945e-316 9.69610549e-317] [ 6.94900258e-310 8.48292532e-317 6.94900258e-310 9.76397825e-317 6.94900258e-310]], mask = False, fill_value = 1e+20) as you see, the mask for all the elements are false. so how can I set for some elements to masked elements (mask state as true)? let's say, I want a[0,0] to be masked. thanks & cheers, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sat Feb 18 10:56:50 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 16:56:50 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: > > I just meant what Sturla said, nothing more: > > "Cython is still 0.16, it is still unfinished. We cannot base NumPy on > an unfinished compiler." > Albeit Cython has a special syntax for NumPy arrays, we are talking about implementation of NumPy, not using it. I would not consider Cython for this before e.g. memoryviews have been stable for a long period. The subset of Cython we could safely use is not better than plain C. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Sat Feb 18 11:12:36 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 18 Feb 2012 11:12:36 -0500 Subject: [Numpy-discussion] The end of numpy as we know it ? In-Reply-To: References: Message-ID: <4F3FCDF4.1050508@gmail.com> On 2/18/2012 10:20 AM, josef.pktd at gmail.com wrote: > we need to streamline the code so the bunch of amateurs doesn't > understand what's going on and cannot effectively threaten a fork > anymore. I don't mean to take today's peculiar post too seriously, and your opening line undermines that. But of all the oddities in it, this one seemed most peculiar. How does "stream-lined" code written for maintainability (i.e., with helpful comments and tests) become *less* accessible to amateurs?? That just does not match my experience in the least. 
And btw, as someone who has (backwardsly enough) translated a little C++ and C to Python, I found C++ much easier to grok without leaving my Python perspective. (The technical questions are entirely separate, of course.) Alan Isaac From shish at keba.be Sat Feb 18 11:19:49 2012 From: shish at keba.be (Olivier Delalleau) Date: Sat, 18 Feb 2012 11:19:49 -0500 Subject: [Numpy-discussion] change the mask state of one element in a masked array In-Reply-To: References: Message-ID: There may be a better way to do it, but you can first do: a.mask = np.zeros_like(a) then afterwards e.g. a.mask[0, 0] = True will work. -=- Olivier Le 18 f?vrier 2012 10:52, Chao YUE a ?crit : > Dear all, > > I built a new empty masked array: > > In [91]: a=np.ma.empty((2,5)) > > In [92]: a > Out[92]: > masked_array(data = > [[ 1.20569155e-312 3.34730819e-316 1.13580079e-316 1.11459945e-316 > 9.69610549e-317] > [ 6.94900258e-310 8.48292532e-317 6.94900258e-310 9.76397825e-317 > 6.94900258e-310]], > mask = > False, > fill_value = 1e+20) > > > as you see, the mask for all the elements are false. so how can I set for > some elements to masked elements (mask state as true)? > let's say, I want a[0,0] to be masked. > > thanks & cheers, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sat Feb 18 11:24:58 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 17:24:58 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: <0B19F941-9E41-4E81-A4A6-70BCE944AECF@molden.no> > > Albeit Cython has a special syntax for NumPy arrays, we are talking about implementation of NumPy, not using it. I would not consider Cython for this before e.g. memoryviews have been stable for a long period. The subset of Cython we could safely use is not better than plain C. > > If we want something more readable than C or C++, that looks like Python, Cython is not the only option. Another is RPython, which is the subset of Python used for PyPy. It can be translated to various languages, including C, Java and .NET. Since RPython is valid Python, it an also be debugged with CPython. Code translated by RPython is extremely fast (often "faster than C" due to human limitation in C coding) and RPython is a stabile compiler. http://doc.pypy.org/en/latest/coding-guide.html#id1 http://doc.pypy.org/en/latest/translation.html http://olliwang.com/2009/12/20/aes-implementation-in-rpython/ Sturla From sturla at molden.no Sat Feb 18 11:27:19 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 17:27:19 +0100 Subject: [Numpy-discussion] The end of numpy as we know it ? 
In-Reply-To: <4F3FCDF4.1050508@gmail.com> References: <4F3FCDF4.1050508@gmail.com> Message-ID: <11E39996-4857-4DFE-8A98-323DF443382B@molden.no> Den 18. feb. 2012 kl. 17:12 skrev Alan G Isaac : > > > How does "stream-lined" code written for maintainability > (i.e., with helpful comments and tests) become *less* > accessible to amateurs?? I think you missed the irony. Sturla From travis at continuum.io Sat Feb 18 11:32:07 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 18 Feb 2012 10:32:07 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: <9379E6A5-F691-4C35-93B1-3782F1D15648@continuum.io> Yes. Basically, one NEP per feature. Some of them might be merged. The NEP will be an outline and overview and then fleshed out as the code is developed in a branch. Some of the NEPs will be more detailed than others a first of course. I just wanted to provide a preview about the kind of things I see needed in the code. The details will emerge in the coming weeks and months. Thanks, Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 18, 2012, at 3:46 AM, Ralf Gommers wrote: > > > On Thu, Feb 16, 2012 at 11:39 PM, Travis Oliphant wrote: > Mark Wiebe and I have been discussing off and on (as well as talking with Charles) a good way forward to balance two competing desires: > > * addition of new features that are needed in NumPy > * improving the code-base generally and moving towards a more maintainable NumPy > > I know there are load voices for just focusing on the second of these and avoiding the first until we have finished that. I recognize the need to improve the code base, but I will also be pushing for improvements to the feature-set and user experience in the process. > > As a result, I am proposing a rough outline for releases over the next year: > > * NumPy 1.7 to come out as soon as the serious bugs can be eliminated. Bryan, Francesc, Mark, and I are able to help triage some of those. > > * NumPy 1.8 to come out in July which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. I will post to this list more details of what we plan to address with it later. Included for possible inclusion are: > * resolving the NA/missing-data issues > * finishing group-by > * incorporating the start of label arrays > * incorporating a meta-object > * a few new dtypes (variable-length string, varialbe-length unicode and an enum type) > * adding ufunc support for flexible dtypes and possibly structured arrays > * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous > * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities > * adding "filters" to Input and Output > * simple computed fields for dtypes > * accepting a Data-Type specification as a class or JSON file > * work towards improving the dtype-addition mechanism > > For some of these things it's not entirely (or at all, what's a meta-object?) clear to me what they mean or how they would work. How do you plan to go about working on these features? One NEP per feature? > > Ralf > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben.root at ou.edu Sat Feb 18 11:52:44 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 18 Feb 2012 10:52:44 -0600 Subject: [Numpy-discussion] The end of numpy as we know it ? In-Reply-To: <11E39996-4857-4DFE-8A98-323DF443382B@molden.no> References: <4F3FCDF4.1050508@gmail.com> <11E39996-4857-4DFE-8A98-323DF443382B@molden.no> Message-ID: On Saturday, February 18, 2012, Sturla Molden wrote: > > > Den 18. feb. 2012 kl. 17:12 skrev Alan G Isaac > >: > > > > > > > How does "stream-lined" code written for maintainability > > (i.e., with helpful comments and tests) become *less* > > accessible to amateurs?? > > > I think you missed the irony. > > Sturla Took me couple reads. Must be too early in the morning for me. For those who needs a clue, the last few lines seem to suggest that the only way forward is to relicense numpy so that it could be sold. This is obviously ridiculous and a give-away to the fact that everything else in the email was sarcastic. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sat Feb 18 12:06:14 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 18 Feb 2012 09:06:14 -0800 Subject: [Numpy-discussion] The end of numpy as we know it ? In-Reply-To: References: <4F3FCDF4.1050508@gmail.com> <11E39996-4857-4DFE-8A98-323DF443382B@molden.no> Message-ID: <4F3FDA86.2020707@astro.uio.no> On 02/18/2012 08:52 AM, Benjamin Root wrote: > > > On Saturday, February 18, 2012, Sturla Molden wrote: > > > > Den 18. feb. 2012 kl. 17:12 skrev Alan G Isaac >: > > > > > > > How does "stream-lined" code written for maintainability > > (i.e., with helpful comments and tests) become *less* > > accessible to amateurs?? > > > I think you missed the irony. > > Sturla > > > Took me couple reads. Must be too early in the morning for me. > > For those who needs a clue, the last few lines seem to suggest that the > only way forward is to relicense numpy so that it could be sold. This > is obviously ridiculous and a give-away to the fact that everything else > in the email was sarcastic. No, it was a quotation from Travis' blog: http://technicaldiscovery.blogspot.com/ (I think people should just get a grip on themselves...worst case scenario *ever* (and I highly doubt it) is a fork, and even that may well be better than the status quo) Dag From efiring at hawaii.edu Sat Feb 18 12:15:14 2012 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 18 Feb 2012 07:15:14 -1000 Subject: [Numpy-discussion] change the mask state of one element in a masked array In-Reply-To: References: Message-ID: <4F3FDCA2.7090900@hawaii.edu> On 02/18/2012 05:52 AM, Chao YUE wrote: > Dear all, > > I built a new empty masked array: > > In [91]: a=np.ma.empty((2,5)) Of course this only makes sense if you are going to immediately populate the array. > > In [92]: a > Out[92]: > masked_array(data = > [[ 1.20569155e-312 3.34730819e-316 1.13580079e-316 1.11459945e-316 > 9.69610549e-317] > [ 6.94900258e-310 8.48292532e-317 6.94900258e-310 9.76397825e-317 > 6.94900258e-310]], > mask = > False, > fill_value = 1e+20) > > > as you see, the mask for all the elements are false. so how can I set > for some elements to masked elements (mask state as true)? > let's say, I want a[0,0] to be masked. 
a[0,0] = np.ma.masked Eric > > thanks & cheers, > > Chao > > -- > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > ************************************************************************************ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Sat Feb 18 12:27:50 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 18 Feb 2012 18:27:50 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <0B19F941-9E41-4E81-A4A6-70BCE944AECF@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <0B19F941-9E41-4E81-A4A6-70BCE944AECF@molden.no> Message-ID: 18.02.2012 17:24, Sturla Molden kirjoitti: [clip] > If we want something more readable than C or C++, that looks like Python, > Cython is not the only option. Another is RPython, which is the subset [clip] Except that AFAIK integrating it with CPython efficiently or providing C APIs with it is not that much fun. -- Pauli Virtanen From matthew.brett at gmail.com Sat Feb 18 14:14:47 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 11:14:47 -0800 Subject: [Numpy-discussion] The end of numpy as we know it ? In-Reply-To: <4F3FDA86.2020707@astro.uio.no> References: <4F3FCDF4.1050508@gmail.com> <11E39996-4857-4DFE-8A98-323DF443382B@molden.no> <4F3FDA86.2020707@astro.uio.no> Message-ID: Hi, On Sat, Feb 18, 2012 at 9:06 AM, Dag Sverre Seljebotn wrote: > On 02/18/2012 08:52 AM, Benjamin Root wrote: >> >> >> On Saturday, February 18, 2012, Sturla Molden wrote: >> >> >> >> ? ? Den 18. feb. 2012 kl. 17:12 skrev Alan G Isaac > ? ? >: >> >> ? ? ?> >> ? ? ?> >> ? ? ?> How does "stream-lined" code written for maintainability >> ? ? ?> (i.e., with helpful comments and tests) become *less* >> ? ? ?> accessible to amateurs?? >> >> >> ? ? I think you missed the irony. >> >> ? ? Sturla >> >> >> Took me couple reads. ?Must be too early in the morning for me. >> >> For those who needs a clue, the last few lines seem to suggest that the >> only way forward is to relicense numpy so that it could be sold. ?This >> is obviously ridiculous and a give-away to the fact that everything else >> in the email was sarcastic. > > No, it was a quotation from Travis' blog: > > http://technicaldiscovery.blogspot.com/ Took me a couple of reads too. But I understand now, I think. I think Josef was indeed being ironic, and using the quote as part of the irony. > (I think people should just get a grip on themselves...worst case > scenario *ever* (and I highly doubt it) is a fork, and even that may > well be better than the status quo) This is nicely put, but very depressing. You say: "people should just get a grip on themselves" and we might also say: "shut up and stop whining". But, an environment like that is rich food for apathy, hostility and paranoia. Let's hope we're up to the challenge. 
Best, Matthew From matthew.brett at gmail.com Sat Feb 18 14:21:25 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 11:21:25 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Hi. On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire wrote: > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett wrote: >> Hi, >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >> wrote: >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden wrote: >>>> >>>> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout : >>>> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>>>>> We would have to write a C++ programming tutorial that is based on Pyton knowledge instead of C knowledge. >>>>> >>>>> I personally would love such a thing. ?It's been a while since I did >>>>> anything nontrivial on my own in C++. >>>>> >>>> >>>> One example: How do we code multiple return values? >>>> >>>> In Python: >>>> - Return a tuple. >>>> >>>> In C: >>>> - Use pointers (evilness) >>>> >>>> In C++: >>>> - Return a std::tuple, as you would in Python. >>>> - Use references, as you would in Fortran or Pascal. >>>> - Use pointers, as you would in C. >>>> >>>> C++ textbooks always pick the last... >>>> >>>> I would show the first and the second method, and perhaps intentionally forget the last. >>>> >>>> Sturla >>>> >> >>> On the flip side, cython looked pretty...but I didn't get the >>> performance gains I wanted, and had to spend a lot of time figuring >>> out if it was cython, needing to add types, buggy support for numpy, >>> or actually the algorithm. >> >> At the time, was the numpy support buggy? ?I personally haven't had >> many problems with Cython and numpy. >> > > It's not that the support WAS buggy, it's that it wasn't clear to me > what was going on and where my performance bottleneck was. Even after > microbenchmarking with ipython, using timeit and prun, and using the > cython code visualization tool. Ultimately I don't think it was > cython, so perhaps my comment was a bit unfair. But it was > unfortunately difficult to verify that. Of course, as you say, > diagnosing and solving such issues would become easier to resolve with > more cython experience. > >>> The C files generated by cython were >>> enormous and difficult to read. They really weren't meant for human >>> consumption. >> >> Yes, it takes some practice to get used to what Cython will do, and >> how to optimize the output. >> >>> As Sturla has said, regardless of the quality of the >>> current product, it isn't stable. >> >> I've personally found it more or less rock solid. ?Could you say what >> you mean by "it isn't stable"? >> > > I just meant what Sturla said, nothing more: > > "Cython is still 0.16, it is still unfinished. We cannot base NumPy on > an unfinished compiler." Y'all mean, it has a zero at the beginning of the version number and it is still adding new features? Yes, that is correct, but it seems more reasonable to me to phrase that as 'active development' rather than 'unstable', because they take considerable care to be backwards compatible, have a large automated Cython test suite, and a major stress-tester in the Sage test suite. 
Best, Matthew From josef.pktd at gmail.com Sat Feb 18 14:52:07 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 18 Feb 2012 14:52:07 -0500 Subject: [Numpy-discussion] The end of numpy as we know it ? In-Reply-To: References: <4F3FCDF4.1050508@gmail.com> <11E39996-4857-4DFE-8A98-323DF443382B@molden.no> <4F3FDA86.2020707@astro.uio.no> Message-ID: On Sat, Feb 18, 2012 at 2:14 PM, Matthew Brett wrote: > Hi, > > On Sat, Feb 18, 2012 at 9:06 AM, Dag Sverre Seljebotn > wrote: >> On 02/18/2012 08:52 AM, Benjamin Root wrote: >>> >>> >>> On Saturday, February 18, 2012, Sturla Molden wrote: >>> >>> >>> >>> ? ? Den 18. feb. 2012 kl. 17:12 skrev Alan G Isaac >> ? ? >: >>> >>> ? ? ?> >>> ? ? ?> >>> ? ? ?> How does "stream-lined" code written for maintainability >>> ? ? ?> (i.e., with helpful comments and tests) become *less* >>> ? ? ?> accessible to amateurs?? >>> >>> >>> ? ? I think you missed the irony. >>> >>> ? ? Sturla >>> >>> >>> Took me couple reads. ?Must be too early in the morning for me. >>> >>> For those who needs a clue, the last few lines seem to suggest that the >>> only way forward is to relicense numpy so that it could be sold. ?This >>> is obviously ridiculous and a give-away to the fact that everything else >>> in the email was sarcastic. >> >> No, it was a quotation from Travis' blog: >> >> http://technicaldiscovery.blogspot.com/ > > Took me a couple of reads too. ?But I understand now, I think. ?I > think Josef was indeed being ironic, and using the quote as part of > the irony. > >> (I think people should just get a grip on themselves...worst case >> scenario *ever* (and I highly doubt it) is a fork, and even that may >> well be better than the status quo) > > This is nicely put, but very depressing. ?You say: > > "people should just get a grip on themselves" > > and we might also say: > > "shut up and stop whining". > > But, an environment like that is rich food for apathy, hostility and > paranoia. ?Let's hope we're up to the challenge. > > Best, > > Matthew I'm an economist by training. For technical issues I completely rely on the judgement of Charles and of David (who managed to reduce the number of installation problems on Windows reported to the mailing lists to essentially zero). My only contact with C++ was building quantlib and reading some source of it, with C it's not more. Josef http://cppdepend.wordpress.com/2009/10/08/is-quantlib-over-engineered/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sat Feb 18 15:29:47 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2012 13:29:47 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <3226706D-A07C-42EC-B2CF-260B6BF0F08F@molden.no> Message-ID: On Fri, Feb 17, 2012 at 10:16 PM, Sturla Molden wrote: > > > Den 18. feb. 2012 kl. 05:56 skrev Charles R Harris < > charlesr.harris at gmail.com>: > > >> > But won't a C++ wrapper catch that? > > > A try-catch block with MSVC will register an SEH with the operating > system. GCC (g++) implements exceptions without SEH. What happens if GCC > code tries to catch a std::bad_alloc? Windows intervenes and sends control > to a registered SEH. So the flow of control jumps out of GCC's hands, and > goes to some catch or __except block set by MSVC instead. 
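It's also worth remembering that projects which use Cython don't usually make their users depend on any particular Cython version at all: the generated C is shipped in the source release, and Cython is only needed by developers who touch the .pyx files. The usual setup.py arrangement looks roughly like this (the module name "fastloops" is invented, purely for illustration):

import os
from distutils.core import setup
from distutils.extension import Extension

try:
    from Cython.Distutils import build_ext
    have_cython = True
except ImportError:
    have_cython = False

if have_cython and os.path.exists("fastloops.pyx"):
    # developer build: regenerate the C from the .pyx source
    ext = Extension("fastloops", ["fastloops.pyx"])
    cmdclass = {"build_ext": build_ext}
else:
    # end-user build: compile the pregenerated C shipped in the tarball
    ext = Extension("fastloops", ["fastloops.c"])
    cmdclass = {}

setup(name="fastloops", ext_modules=[ext], cmdclass=cmdclass)

So whatever churn there is in the compiler is confined to the people regenerating the C, not to everyone building the library.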
And now the > stack is FUBAR... But this can always happen when you mix MSVC and MinGW. > Even pure C code can set an SEH with MSVC, so it's not a C++ issue. You > cannot wrap in a way that protects you from an intervention by the > operating system. It's better to stick with MS and Intel compilers on > Windows. MinGW code must execute in an SEH free environment. > > Here's a link with some current commentson mingw-64. I have the impression that things are moving (slowly) towards interoperability. Chuck > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Feb 18 15:35:02 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2012 13:35:02 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett wrote: > Hi. > > On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire > wrote: > > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett > wrote: > >> Hi, > >> > >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire > >> wrote: > >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden > wrote: > >>>> > >>>> > >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout < > jason-sage at creativetrax.com>: > >>>> > >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: > >>>>>> We would have to write a C++ programming tutorial that is based on > Pyton knowledge instead of C knowledge. > >>>>> > >>>>> I personally would love such a thing. It's been a while since I did > >>>>> anything nontrivial on my own in C++. > >>>>> > >>>> > >>>> One example: How do we code multiple return values? > >>>> > >>>> In Python: > >>>> - Return a tuple. > >>>> > >>>> In C: > >>>> - Use pointers (evilness) > >>>> > >>>> In C++: > >>>> - Return a std::tuple, as you would in Python. > >>>> - Use references, as you would in Fortran or Pascal. > >>>> - Use pointers, as you would in C. > >>>> > >>>> C++ textbooks always pick the last... > >>>> > >>>> I would show the first and the second method, and perhaps > intentionally forget the last. > >>>> > >>>> Sturla > >>>> > >> > >>> On the flip side, cython looked pretty...but I didn't get the > >>> performance gains I wanted, and had to spend a lot of time figuring > >>> out if it was cython, needing to add types, buggy support for numpy, > >>> or actually the algorithm. > >> > >> At the time, was the numpy support buggy? I personally haven't had > >> many problems with Cython and numpy. > >> > > > > It's not that the support WAS buggy, it's that it wasn't clear to me > > what was going on and where my performance bottleneck was. Even after > > microbenchmarking with ipython, using timeit and prun, and using the > > cython code visualization tool. Ultimately I don't think it was > > cython, so perhaps my comment was a bit unfair. But it was > > unfortunately difficult to verify that. Of course, as you say, > > diagnosing and solving such issues would become easier to resolve with > > more cython experience. > > > >>> The C files generated by cython were > >>> enormous and difficult to read. They really weren't meant for human > >>> consumption. > >> > >> Yes, it takes some practice to get used to what Cython will do, and > >> how to optimize the output. 
> >> > >>> As Sturla has said, regardless of the quality of the > >>> current product, it isn't stable. > >> > >> I've personally found it more or less rock solid. Could you say what > >> you mean by "it isn't stable"? > >> > > > > I just meant what Sturla said, nothing more: > > > > "Cython is still 0.16, it is still unfinished. We cannot base NumPy on > > an unfinished compiler." > > Y'all mean, it has a zero at the beginning of the version number and > it is still adding new features? Yes, that is correct, but it seems > more reasonable to me to phrase that as 'active development' rather > than 'unstable', because they take considerable care to be backwards > compatible, have a large automated Cython test suite, and a major > stress-tester in the Sage test suite. > > Matthew, No one in their right mind would build a large performance library using Cython, it just isn't the right tool. For what it was designed for - wrapping existing c code or writing small and simple things close to Python - it does very well, but it was never designed for making core C/C++ libraries and in that role it just gets in the way. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Feb 18 15:39:31 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 12:39:31 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Hi, On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris wrote: > > > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett > wrote: >> >> Hi. >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >> wrote: >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >> > wrote: >> >> Hi, >> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >> >> wrote: >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >> >>> wrote: >> >>>> >> >>>> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >> >>>> : >> >>>> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >> >>>>>> We would have to write a C++ programming tutorial that is based on >> >>>>>> Pyton knowledge instead of C knowledge. >> >>>>> >> >>>>> I personally would love such a thing. ?It's been a while since I did >> >>>>> anything nontrivial on my own in C++. >> >>>>> >> >>>> >> >>>> One example: How do we code multiple return values? >> >>>> >> >>>> In Python: >> >>>> - Return a tuple. >> >>>> >> >>>> In C: >> >>>> - Use pointers (evilness) >> >>>> >> >>>> In C++: >> >>>> - Return a std::tuple, as you would in Python. >> >>>> - Use references, as you would in Fortran or Pascal. >> >>>> - Use pointers, as you would in C. >> >>>> >> >>>> C++ textbooks always pick the last... >> >>>> >> >>>> I would show the first and the second method, and perhaps >> >>>> intentionally forget the last. >> >>>> >> >>>> Sturla >> >>>> >> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >> >>> performance gains I wanted, and had to spend a lot of time figuring >> >>> out if it was cython, needing to add types, buggy support for numpy, >> >>> or actually the algorithm. >> >> >> >> At the time, was the numpy support buggy? ?I personally haven't had >> >> many problems with Cython and numpy. >> >> >> > >> > It's not that the support WAS buggy, it's that it wasn't clear to me >> > what was going on and where my performance bottleneck was. 
Even after >> > microbenchmarking with ipython, using timeit and prun, and using the >> > cython code visualization tool. Ultimately I don't think it was >> > cython, so perhaps my comment was a bit unfair. But it was >> > unfortunately difficult to verify that. Of course, as you say, >> > diagnosing and solving such issues would become easier to resolve with >> > more cython experience. >> > >> >>> The C files generated by cython were >> >>> enormous and difficult to read. They really weren't meant for human >> >>> consumption. >> >> >> >> Yes, it takes some practice to get used to what Cython will do, and >> >> how to optimize the output. >> >> >> >>> As Sturla has said, regardless of the quality of the >> >>> current product, it isn't stable. >> >> >> >> I've personally found it more or less rock solid. ?Could you say what >> >> you mean by "it isn't stable"? >> >> >> > >> > I just meant what Sturla said, nothing more: >> > >> > "Cython is still 0.16, it is still unfinished. We cannot base NumPy on >> > an unfinished compiler." >> >> Y'all mean, it has a zero at the beginning of the version number and >> it is still adding new features? ?Yes, that is correct, but it seems >> more reasonable to me to phrase that as 'active development' rather >> than 'unstable', because they take considerable care to be backwards >> compatible, have a large automated Cython test suite, and a major >> stress-tester in the Sage test suite. >> > > Matthew, > > No one in their right mind would build a large performance library using > Cython, it just isn't the right tool. For what it was designed for - > wrapping existing c code or writing small and simple things close to Python > - it does very well, but it was never designed for making core C/C++ > libraries and in that role it just gets in the way. I believe the proposal is to refactor the lowest levels in pure C and move the some or most of the library superstructure to Cython. Best, Matthew From charlesr.harris at gmail.com Sat Feb 18 15:45:24 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2012 13:45:24 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett wrote: > Hi, > > On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris > wrote: > > > > > > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett > > > wrote: > >> > >> Hi. > >> > >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire > >> wrote: > >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett > >> > wrote: > >> >> Hi, > >> >> > >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire > >> >> wrote: > >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden > >> >>> wrote: > >> >>>> > >> >>>> > >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout > >> >>>> : > >> >>>> > >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: > >> >>>>>> We would have to write a C++ programming tutorial that is based > on > >> >>>>>> Pyton knowledge instead of C knowledge. > >> >>>>> > >> >>>>> I personally would love such a thing. It's been a while since I > did > >> >>>>> anything nontrivial on my own in C++. > >> >>>>> > >> >>>> > >> >>>> One example: How do we code multiple return values? > >> >>>> > >> >>>> In Python: > >> >>>> - Return a tuple. 
> >> >>>> > >> >>>> In C: > >> >>>> - Use pointers (evilness) > >> >>>> > >> >>>> In C++: > >> >>>> - Return a std::tuple, as you would in Python. > >> >>>> - Use references, as you would in Fortran or Pascal. > >> >>>> - Use pointers, as you would in C. > >> >>>> > >> >>>> C++ textbooks always pick the last... > >> >>>> > >> >>>> I would show the first and the second method, and perhaps > >> >>>> intentionally forget the last. > >> >>>> > >> >>>> Sturla > >> >>>> > >> >> > >> >>> On the flip side, cython looked pretty...but I didn't get the > >> >>> performance gains I wanted, and had to spend a lot of time figuring > >> >>> out if it was cython, needing to add types, buggy support for numpy, > >> >>> or actually the algorithm. > >> >> > >> >> At the time, was the numpy support buggy? I personally haven't had > >> >> many problems with Cython and numpy. > >> >> > >> > > >> > It's not that the support WAS buggy, it's that it wasn't clear to me > >> > what was going on and where my performance bottleneck was. Even after > >> > microbenchmarking with ipython, using timeit and prun, and using the > >> > cython code visualization tool. Ultimately I don't think it was > >> > cython, so perhaps my comment was a bit unfair. But it was > >> > unfortunately difficult to verify that. Of course, as you say, > >> > diagnosing and solving such issues would become easier to resolve with > >> > more cython experience. > >> > > >> >>> The C files generated by cython were > >> >>> enormous and difficult to read. They really weren't meant for human > >> >>> consumption. > >> >> > >> >> Yes, it takes some practice to get used to what Cython will do, and > >> >> how to optimize the output. > >> >> > >> >>> As Sturla has said, regardless of the quality of the > >> >>> current product, it isn't stable. > >> >> > >> >> I've personally found it more or less rock solid. Could you say what > >> >> you mean by "it isn't stable"? > >> >> > >> > > >> > I just meant what Sturla said, nothing more: > >> > > >> > "Cython is still 0.16, it is still unfinished. We cannot base NumPy on > >> > an unfinished compiler." > >> > >> Y'all mean, it has a zero at the beginning of the version number and > >> it is still adding new features? Yes, that is correct, but it seems > >> more reasonable to me to phrase that as 'active development' rather > >> than 'unstable', because they take considerable care to be backwards > >> compatible, have a large automated Cython test suite, and a major > >> stress-tester in the Sage test suite. > >> > > > > Matthew, > > > > No one in their right mind would build a large performance library using > > Cython, it just isn't the right tool. For what it was designed for - > > wrapping existing c code or writing small and simple things close to > Python > > - it does very well, but it was never designed for making core C/C++ > > libraries and in that role it just gets in the way. > > I believe the proposal is to refactor the lowest levels in pure C and > move the some or most of the library superstructure to Cython. > Go for it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sat Feb 18 16:02:30 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 13:02:30 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Hi, On Sat, Feb 18, 2012 at 12:45 PM, Charles R Harris wrote: > > > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris >> wrote: >> > >> > >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett >> > >> > wrote: >> >> >> >> Hi. >> >> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >> >> wrote: >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >> >> > wrote: >> >> >> Hi, >> >> >> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >> >> >> wrote: >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >> >> >>> wrote: >> >> >>>> >> >> >>>> >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >> >> >>>> : >> >> >>>> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >> >> >>>>>> We would have to write a C++ programming tutorial that is based >> >> >>>>>> on >> >> >>>>>> Pyton knowledge instead of C knowledge. >> >> >>>>> >> >> >>>>> I personally would love such a thing. ?It's been a while since I >> >> >>>>> did >> >> >>>>> anything nontrivial on my own in C++. >> >> >>>>> >> >> >>>> >> >> >>>> One example: How do we code multiple return values? >> >> >>>> >> >> >>>> In Python: >> >> >>>> - Return a tuple. >> >> >>>> >> >> >>>> In C: >> >> >>>> - Use pointers (evilness) >> >> >>>> >> >> >>>> In C++: >> >> >>>> - Return a std::tuple, as you would in Python. >> >> >>>> - Use references, as you would in Fortran or Pascal. >> >> >>>> - Use pointers, as you would in C. >> >> >>>> >> >> >>>> C++ textbooks always pick the last... >> >> >>>> >> >> >>>> I would show the first and the second method, and perhaps >> >> >>>> intentionally forget the last. >> >> >>>> >> >> >>>> Sturla >> >> >>>> >> >> >> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >> >> >>> performance gains I wanted, and had to spend a lot of time figuring >> >> >>> out if it was cython, needing to add types, buggy support for >> >> >>> numpy, >> >> >>> or actually the algorithm. >> >> >> >> >> >> At the time, was the numpy support buggy? ?I personally haven't had >> >> >> many problems with Cython and numpy. >> >> >> >> >> > >> >> > It's not that the support WAS buggy, it's that it wasn't clear to me >> >> > what was going on and where my performance bottleneck was. Even after >> >> > microbenchmarking with ipython, using timeit and prun, and using the >> >> > cython code visualization tool. Ultimately I don't think it was >> >> > cython, so perhaps my comment was a bit unfair. But it was >> >> > unfortunately difficult to verify that. Of course, as you say, >> >> > diagnosing and solving such issues would become easier to resolve >> >> > with >> >> > more cython experience. >> >> > >> >> >>> The C files generated by cython were >> >> >>> enormous and difficult to read. They really weren't meant for human >> >> >>> consumption. >> >> >> >> >> >> Yes, it takes some practice to get used to what Cython will do, and >> >> >> how to optimize the output. >> >> >> >> >> >>> As Sturla has said, regardless of the quality of the >> >> >>> current product, it isn't stable. 
>> >> >> >> >> >> I've personally found it more or less rock solid. ?Could you say >> >> >> what >> >> >> you mean by "it isn't stable"? >> >> >> >> >> > >> >> > I just meant what Sturla said, nothing more: >> >> > >> >> > "Cython is still 0.16, it is still unfinished. We cannot base NumPy >> >> > on >> >> > an unfinished compiler." >> >> >> >> Y'all mean, it has a zero at the beginning of the version number and >> >> it is still adding new features? ?Yes, that is correct, but it seems >> >> more reasonable to me to phrase that as 'active development' rather >> >> than 'unstable', because they take considerable care to be backwards >> >> compatible, have a large automated Cython test suite, and a major >> >> stress-tester in the Sage test suite. >> >> >> > >> > Matthew, >> > >> > No one in their right mind would build a large performance library using >> > Cython, it just isn't the right tool. For what it was designed for - >> > wrapping existing c code or writing small and simple things close to >> > Python >> > - it does very well, but it was never designed for making core C/C++ >> > libraries and in that role it just gets in the way. >> >> I believe the proposal is to refactor the lowest levels in pure C and >> move the some or most of the library superstructure to Cython. > > > Go for it. My goal was to try and contribute to substantive discussion of the benefits / costs of the various approaches. It does require a realistic assessment of what is being proposed. It may be, that discussion is not fruitful. But then we all lose, I think, Best, Matthew From cournape at gmail.com Sat Feb 18 16:17:03 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 18 Feb 2012 21:17:03 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris wrote: > > > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris >> wrote: >> > >> > >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett >> > >> > wrote: >> >> >> >> Hi. >> >> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >> >> wrote: >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >> >> > wrote: >> >> >> Hi, >> >> >> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >> >> >> wrote: >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >> >> >>> wrote: >> >> >>>> >> >> >>>> >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >> >> >>>> : >> >> >>>> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >> >> >>>>>> We would have to write a C++ programming tutorial that is based >> >> >>>>>> on >> >> >>>>>> Pyton knowledge instead of C knowledge. >> >> >>>>> >> >> >>>>> I personally would love such a thing. ?It's been a while since I >> >> >>>>> did >> >> >>>>> anything nontrivial on my own in C++. >> >> >>>>> >> >> >>>> >> >> >>>> One example: How do we code multiple return values? >> >> >>>> >> >> >>>> In Python: >> >> >>>> - Return a tuple. >> >> >>>> >> >> >>>> In C: >> >> >>>> - Use pointers (evilness) >> >> >>>> >> >> >>>> In C++: >> >> >>>> - Return a std::tuple, as you would in Python. >> >> >>>> - Use references, as you would in Fortran or Pascal. >> >> >>>> - Use pointers, as you would in C. >> >> >>>> >> >> >>>> C++ textbooks always pick the last... 
>> >> >>>> >> >> >>>> I would show the first and the second method, and perhaps >> >> >>>> intentionally forget the last. >> >> >>>> >> >> >>>> Sturla >> >> >>>> >> >> >> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >> >> >>> performance gains I wanted, and had to spend a lot of time figuring >> >> >>> out if it was cython, needing to add types, buggy support for >> >> >>> numpy, >> >> >>> or actually the algorithm. >> >> >> >> >> >> At the time, was the numpy support buggy? ?I personally haven't had >> >> >> many problems with Cython and numpy. >> >> >> >> >> > >> >> > It's not that the support WAS buggy, it's that it wasn't clear to me >> >> > what was going on and where my performance bottleneck was. Even after >> >> > microbenchmarking with ipython, using timeit and prun, and using the >> >> > cython code visualization tool. Ultimately I don't think it was >> >> > cython, so perhaps my comment was a bit unfair. But it was >> >> > unfortunately difficult to verify that. Of course, as you say, >> >> > diagnosing and solving such issues would become easier to resolve >> >> > with >> >> > more cython experience. >> >> > >> >> >>> The C files generated by cython were >> >> >>> enormous and difficult to read. They really weren't meant for human >> >> >>> consumption. >> >> >> >> >> >> Yes, it takes some practice to get used to what Cython will do, and >> >> >> how to optimize the output. >> >> >> >> >> >>> As Sturla has said, regardless of the quality of the >> >> >>> current product, it isn't stable. >> >> >> >> >> >> I've personally found it more or less rock solid. ?Could you say >> >> >> what >> >> >> you mean by "it isn't stable"? >> >> >> >> >> > >> >> > I just meant what Sturla said, nothing more: >> >> > >> >> > "Cython is still 0.16, it is still unfinished. We cannot base NumPy >> >> > on >> >> > an unfinished compiler." >> >> >> >> Y'all mean, it has a zero at the beginning of the version number and >> >> it is still adding new features? ?Yes, that is correct, but it seems >> >> more reasonable to me to phrase that as 'active development' rather >> >> than 'unstable', because they take considerable care to be backwards >> >> compatible, have a large automated Cython test suite, and a major >> >> stress-tester in the Sage test suite. >> >> >> > >> > Matthew, >> > >> > No one in their right mind would build a large performance library using >> > Cython, it just isn't the right tool. For what it was designed for - >> > wrapping existing c code or writing small and simple things close to >> > Python >> > - it does very well, but it was never designed for making core C/C++ >> > libraries and in that role it just gets in the way. >> >> I believe the proposal is to refactor the lowest levels in pure C and >> move the some or most of the library superstructure to Cython. > > > Go for it. The proposal of moving to a core C + cython has been discussed by multiple contributors. It is certainly a valid proposal. *I* have worked on this (npymath, separate compilation), although certainly not as much as I would have wanted to. I think much can be done in that vein. Using the "shut up if you don't do it" is a straw man (and uncalled for). Moving away from subjective considerations on how to do things, is there a way that one can see the pros/cons of each approach. For the C++ approach, I would really like to see which C++ is being considered. I was. 
Once the choice is done, going back would be quite hard, so I can't see how we could go for it just because some people prefer it without very clear technical arguments. Saying that C++ is more readable, or scale better are frankly very weak and too subjective to be convincing. There are too many projects way more complex than numpy that have been done in either C or C++. David From ben.root at ou.edu Sat Feb 18 16:25:42 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 18 Feb 2012 15:25:42 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 2:45 PM, Charles R Harris wrote: > > > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett wrote: > >> Hi, >> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris >> wrote: >> > >> > >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett < >> matthew.brett at gmail.com> >> > wrote: >> >> >> >> Hi. >> >> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >> >> wrote: >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >> >> > wrote: >> >> >> Hi, >> >> >> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >> >> >> wrote: >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >> >> >>> wrote: >> >> >>>> >> >> >>>> >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >> >> >>>> : >> >> >>>> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >> >> >>>>>> We would have to write a C++ programming tutorial that is based >> on >> >> >>>>>> Pyton knowledge instead of C knowledge. >> >> >>>>> >> >> >>>>> I personally would love such a thing. It's been a while since I >> did >> >> >>>>> anything nontrivial on my own in C++. >> >> >>>>> >> >> >>>> >> >> >>>> One example: How do we code multiple return values? >> >> >>>> >> >> >>>> In Python: >> >> >>>> - Return a tuple. >> >> >>>> >> >> >>>> In C: >> >> >>>> - Use pointers (evilness) >> >> >>>> >> >> >>>> In C++: >> >> >>>> - Return a std::tuple, as you would in Python. >> >> >>>> - Use references, as you would in Fortran or Pascal. >> >> >>>> - Use pointers, as you would in C. >> >> >>>> >> >> >>>> C++ textbooks always pick the last... >> >> >>>> >> >> >>>> I would show the first and the second method, and perhaps >> >> >>>> intentionally forget the last. >> >> >>>> >> >> >>>> Sturla >> >> >>>> >> >> >> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >> >> >>> performance gains I wanted, and had to spend a lot of time figuring >> >> >>> out if it was cython, needing to add types, buggy support for >> numpy, >> >> >>> or actually the algorithm. >> >> >> >> >> >> At the time, was the numpy support buggy? I personally haven't had >> >> >> many problems with Cython and numpy. >> >> >> >> >> > >> >> > It's not that the support WAS buggy, it's that it wasn't clear to me >> >> > what was going on and where my performance bottleneck was. Even after >> >> > microbenchmarking with ipython, using timeit and prun, and using the >> >> > cython code visualization tool. Ultimately I don't think it was >> >> > cython, so perhaps my comment was a bit unfair. But it was >> >> > unfortunately difficult to verify that. Of course, as you say, >> >> > diagnosing and solving such issues would become easier to resolve >> with >> >> > more cython experience. 
>> >> > >> >> >>> The C files generated by cython were >> >> >>> enormous and difficult to read. They really weren't meant for human >> >> >>> consumption. >> >> >> >> >> >> Yes, it takes some practice to get used to what Cython will do, and >> >> >> how to optimize the output. >> >> >> >> >> >>> As Sturla has said, regardless of the quality of the >> >> >>> current product, it isn't stable. >> >> >> >> >> >> I've personally found it more or less rock solid. Could you say >> what >> >> >> you mean by "it isn't stable"? >> >> >> >> >> > >> >> > I just meant what Sturla said, nothing more: >> >> > >> >> > "Cython is still 0.16, it is still unfinished. We cannot base NumPy >> on >> >> > an unfinished compiler." >> >> >> >> Y'all mean, it has a zero at the beginning of the version number and >> >> it is still adding new features? Yes, that is correct, but it seems >> >> more reasonable to me to phrase that as 'active development' rather >> >> than 'unstable', because they take considerable care to be backwards >> >> compatible, have a large automated Cython test suite, and a major >> >> stress-tester in the Sage test suite. >> >> >> > >> > Matthew, >> > >> > No one in their right mind would build a large performance library using >> > Cython, it just isn't the right tool. For what it was designed for - >> > wrapping existing c code or writing small and simple things close to >> Python >> > - it does very well, but it was never designed for making core C/C++ >> > libraries and in that role it just gets in the way. >> >> I believe the proposal is to refactor the lowest levels in pure C and >> move the some or most of the library superstructure to Cython. >> > > Go for it. > > Chuck > > > Just a couple of quick questions: 1.) What is the status of the refactoring that was done for IronPython a couple of years ago? The last I heard, the branches diverged too much for merging the work back into numpy. Are there lessons that can be learned from that experience that can be applied to whatever happens next? 2.) My personal preference is an incremental refactor over to C++ using STL, however, I have to be realistic. First, the exception issue is problematic (unsolvable? I don't know). Second, one of Numpy/Scipy's greatest strengths is the relative ease it has in interfacing with BLAS, ATLAS, mkl and other optimizations. Will this still be possible from a C++ (or anything else) core? Third, I am only familiar with STL on gcc. Are there any subtle differences in implementations of STL in MSVC or any other compilers. Pointers are hard to mess up, in cross-platform ways. 3.) Will memory-mapped arrays still be possible after the refactor? I am not familiar with the implementation, but I am a big netcdf/hdf user and mem-mapped arrays are important to me. 4.) Wouldn't depending on Cython create a circular dependency? Can you build Cython without numpy-devel? (I never tried. I have only used packaged Cython). Also, because Cython generates code to compile, is there a possibility of producing different ABIs depending upon the combinations of numpy and cython versions (even if unintentional)? How difficult will it be for distro maintainers to package numpy and its extensions? How difficult will it be for users of Macs and Windows who may try combining different versions? Honest questions because I have never had more than a cursory exposure to Cython. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sat Feb 18 16:40:48 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2012 14:40:48 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau wrote: > On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris > wrote: > > > > > > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett > >> > > >> > wrote: > >> >> > >> >> Hi. > >> >> > >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire > >> >> wrote: > >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett > >> >> > wrote: > >> >> >> Hi, > >> >> >> > >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire > >> >> >> wrote: > >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden > > >> >> >>> wrote: > >> >> >>>> > >> >> >>>> > >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout > >> >> >>>> : > >> >> >>>> > >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: > >> >> >>>>>> We would have to write a C++ programming tutorial that is > based > >> >> >>>>>> on > >> >> >>>>>> Pyton knowledge instead of C knowledge. > >> >> >>>>> > >> >> >>>>> I personally would love such a thing. It's been a while since > I > >> >> >>>>> did > >> >> >>>>> anything nontrivial on my own in C++. > >> >> >>>>> > >> >> >>>> > >> >> >>>> One example: How do we code multiple return values? > >> >> >>>> > >> >> >>>> In Python: > >> >> >>>> - Return a tuple. > >> >> >>>> > >> >> >>>> In C: > >> >> >>>> - Use pointers (evilness) > >> >> >>>> > >> >> >>>> In C++: > >> >> >>>> - Return a std::tuple, as you would in Python. > >> >> >>>> - Use references, as you would in Fortran or Pascal. > >> >> >>>> - Use pointers, as you would in C. > >> >> >>>> > >> >> >>>> C++ textbooks always pick the last... > >> >> >>>> > >> >> >>>> I would show the first and the second method, and perhaps > >> >> >>>> intentionally forget the last. > >> >> >>>> > >> >> >>>> Sturla > >> >> >>>> > >> >> >> > >> >> >>> On the flip side, cython looked pretty...but I didn't get the > >> >> >>> performance gains I wanted, and had to spend a lot of time > figuring > >> >> >>> out if it was cython, needing to add types, buggy support for > >> >> >>> numpy, > >> >> >>> or actually the algorithm. > >> >> >> > >> >> >> At the time, was the numpy support buggy? I personally haven't > had > >> >> >> many problems with Cython and numpy. > >> >> >> > >> >> > > >> >> > It's not that the support WAS buggy, it's that it wasn't clear to > me > >> >> > what was going on and where my performance bottleneck was. Even > after > >> >> > microbenchmarking with ipython, using timeit and prun, and using > the > >> >> > cython code visualization tool. Ultimately I don't think it was > >> >> > cython, so perhaps my comment was a bit unfair. But it was > >> >> > unfortunately difficult to verify that. Of course, as you say, > >> >> > diagnosing and solving such issues would become easier to resolve > >> >> > with > >> >> > more cython experience. > >> >> > > >> >> >>> The C files generated by cython were > >> >> >>> enormous and difficult to read. They really weren't meant for > human > >> >> >>> consumption. 
> >> >> >> > >> >> >> Yes, it takes some practice to get used to what Cython will do, > and > >> >> >> how to optimize the output. > >> >> >> > >> >> >>> As Sturla has said, regardless of the quality of the > >> >> >>> current product, it isn't stable. > >> >> >> > >> >> >> I've personally found it more or less rock solid. Could you say > >> >> >> what > >> >> >> you mean by "it isn't stable"? > >> >> >> > >> >> > > >> >> > I just meant what Sturla said, nothing more: > >> >> > > >> >> > "Cython is still 0.16, it is still unfinished. We cannot base NumPy > >> >> > on > >> >> > an unfinished compiler." > >> >> > >> >> Y'all mean, it has a zero at the beginning of the version number and > >> >> it is still adding new features? Yes, that is correct, but it seems > >> >> more reasonable to me to phrase that as 'active development' rather > >> >> than 'unstable', because they take considerable care to be backwards > >> >> compatible, have a large automated Cython test suite, and a major > >> >> stress-tester in the Sage test suite. > >> >> > >> > > >> > Matthew, > >> > > >> > No one in their right mind would build a large performance library > using > >> > Cython, it just isn't the right tool. For what it was designed for - > >> > wrapping existing c code or writing small and simple things close to > >> > Python > >> > - it does very well, but it was never designed for making core C/C++ > >> > libraries and in that role it just gets in the way. > >> > >> I believe the proposal is to refactor the lowest levels in pure C and > >> move the some or most of the library superstructure to Cython. > > > > > > Go for it. > > The proposal of moving to a core C + cython has been discussed by > multiple contributors. It is certainly a valid proposal. *I* have > worked on this (npymath, separate compilation), although certainly not > as much as I would have wanted to. I think much can be done in that > vein. Using the "shut up if you don't do it" is a straw man (and > uncalled for). > OK, I was annoyed. > > Moving away from subjective considerations on how to do things, is > there a way that one can see the pros/cons of each approach. For the > C++ approach, I would really like to see which C++ is being > considered. I was. Once the choice is done, going back would be quite > hard, so I can't see how we could go for it just because some people > prefer it without very clear technical arguments. > Well, we already have code obfuscation (DOUBLE_your_pleasure, FLOAT_your_boat), so we might as well let the compiler handle it. Having classes, lists, and iterators would be a big plus. The current code is really a kludge trying to make C look like C++. Not inherently bad, the original C++ (C with classes), was a preprocessor that generated C code. I really think the best arguments against C++ is portability and I think that needs to be evaluated. But in many ways it supports the sort of things the Numpy C code does in a natural way. I'll let Mark expand on the virtues if he is so inclined, but C++ code offers a higher level of abstraction that is very useful and allows good reuse of properly constructed tools. The emphasis here on 'properly'. There is certainly bad C++ code out there. > Saying that C++ is more readable, or scale better are frankly very > weak and too subjective to be convincing. There are too many projects > way more complex than numpy that have been done in either C or C++. > > To some extent that is experience based. 
And to another extent, it is a question of what language people like to develop in. I myself would prefer C++. The main thing I really don't like about C++ is IO. But Boost offers some relief for that. I expect we will use small bits of Boost that can be excised without problems from the bigger library. I don't think we can count on C++11 at this point, so we would probably be conservative in our choice of features. Jim Hugunin was a keynote speaker at one of the scipy conventions. At dinner he said that if he was to do it again he would use managed code ;) I don't propose we do that, but tools do advance. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Feb 18 16:51:13 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 13:51:13 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris wrote: > > > On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau > wrote: >> >> On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris >> wrote: >> > >> > >> > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hi. >> >> >> >> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >> >> >> wrote: >> >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >> >> >> > wrote: >> >> >> >> Hi, >> >> >> >> >> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >> >> >> >> wrote: >> >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >> >> >> >>> >> >> >> >>> wrote: >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >> >> >> >>>> : >> >> >> >>>> >> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >> >> >> >>>>>> We would have to write a C++ programming tutorial that is >> >> >> >>>>>> based >> >> >> >>>>>> on >> >> >> >>>>>> Pyton knowledge instead of C knowledge. >> >> >> >>>>> >> >> >> >>>>> I personally would love such a thing. ?It's been a while since >> >> >> >>>>> I >> >> >> >>>>> did >> >> >> >>>>> anything nontrivial on my own in C++. >> >> >> >>>>> >> >> >> >>>> >> >> >> >>>> One example: How do we code multiple return values? >> >> >> >>>> >> >> >> >>>> In Python: >> >> >> >>>> - Return a tuple. >> >> >> >>>> >> >> >> >>>> In C: >> >> >> >>>> - Use pointers (evilness) >> >> >> >>>> >> >> >> >>>> In C++: >> >> >> >>>> - Return a std::tuple, as you would in Python. >> >> >> >>>> - Use references, as you would in Fortran or Pascal. >> >> >> >>>> - Use pointers, as you would in C. >> >> >> >>>> >> >> >> >>>> C++ textbooks always pick the last... >> >> >> >>>> >> >> >> >>>> I would show the first and the second method, and perhaps >> >> >> >>>> intentionally forget the last. >> >> >> >>>> >> >> >> >>>> Sturla >> >> >> >>>> >> >> >> >> >> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >> >> >> >>> performance gains I wanted, and had to spend a lot of time >> >> >> >>> figuring >> >> >> >>> out if it was cython, needing to add types, buggy support for >> >> >> >>> numpy, >> >> >> >>> or actually the algorithm. 
>> >> >> >> >> >> >> >> At the time, was the numpy support buggy? ?I personally haven't >> >> >> >> had >> >> >> >> many problems with Cython and numpy. >> >> >> >> >> >> >> > >> >> >> > It's not that the support WAS buggy, it's that it wasn't clear to >> >> >> > me >> >> >> > what was going on and where my performance bottleneck was. Even >> >> >> > after >> >> >> > microbenchmarking with ipython, using timeit and prun, and using >> >> >> > the >> >> >> > cython code visualization tool. Ultimately I don't think it was >> >> >> > cython, so perhaps my comment was a bit unfair. But it was >> >> >> > unfortunately difficult to verify that. Of course, as you say, >> >> >> > diagnosing and solving such issues would become easier to resolve >> >> >> > with >> >> >> > more cython experience. >> >> >> > >> >> >> >>> The C files generated by cython were >> >> >> >>> enormous and difficult to read. They really weren't meant for >> >> >> >>> human >> >> >> >>> consumption. >> >> >> >> >> >> >> >> Yes, it takes some practice to get used to what Cython will do, >> >> >> >> and >> >> >> >> how to optimize the output. >> >> >> >> >> >> >> >>> As Sturla has said, regardless of the quality of the >> >> >> >>> current product, it isn't stable. >> >> >> >> >> >> >> >> I've personally found it more or less rock solid. ?Could you say >> >> >> >> what >> >> >> >> you mean by "it isn't stable"? >> >> >> >> >> >> >> > >> >> >> > I just meant what Sturla said, nothing more: >> >> >> > >> >> >> > "Cython is still 0.16, it is still unfinished. We cannot base >> >> >> > NumPy >> >> >> > on >> >> >> > an unfinished compiler." >> >> >> >> >> >> Y'all mean, it has a zero at the beginning of the version number and >> >> >> it is still adding new features? ?Yes, that is correct, but it seems >> >> >> more reasonable to me to phrase that as 'active development' rather >> >> >> than 'unstable', because they take considerable care to be backwards >> >> >> compatible, have a large automated Cython test suite, and a major >> >> >> stress-tester in the Sage test suite. >> >> >> >> >> > >> >> > Matthew, >> >> > >> >> > No one in their right mind would build a large performance library >> >> > using >> >> > Cython, it just isn't the right tool. For what it was designed for - >> >> > wrapping existing c code or writing small and simple things close to >> >> > Python >> >> > - it does very well, but it was never designed for making core C/C++ >> >> > libraries and in that role it just gets in the way. >> >> >> >> I believe the proposal is to refactor the lowest levels in pure C and >> >> move the some or most of the library superstructure to Cython. >> > >> > >> > Go for it. >> >> The proposal of moving to a core C + cython has been discussed by >> multiple contributors. It is certainly a valid proposal. *I* have >> worked on this (npymath, separate compilation), although certainly not >> as much as I would have wanted to. I think much can be done in that >> vein. Using the "shut up if you don't do it" is a straw man (and >> uncalled for). > > > OK, I was annoyed. By what? 
Best, Matthew From charlesr.harris at gmail.com Sat Feb 18 16:55:40 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2012 14:55:40 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 2:51 PM, Matthew Brett wrote: > On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris > wrote: > > > > > > On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau > > wrote: > >> > >> On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris > >> >> wrote: > >> >> > > >> >> > > >> >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Hi. > >> >> >> > >> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire > >> >> >> wrote: > >> >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett > >> >> >> > wrote: > >> >> >> >> Hi, > >> >> >> >> > >> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire > >> >> >> >> wrote: > >> >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden > >> >> >> >>> > >> >> >> >>> wrote: > >> >> >> >>>> > >> >> >> >>>> > >> >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout > >> >> >> >>>> : > >> >> >> >>>> > >> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: > >> >> >> >>>>>> We would have to write a C++ programming tutorial that is > >> >> >> >>>>>> based > >> >> >> >>>>>> on > >> >> >> >>>>>> Pyton knowledge instead of C knowledge. > >> >> >> >>>>> > >> >> >> >>>>> I personally would love such a thing. It's been a while > since > >> >> >> >>>>> I > >> >> >> >>>>> did > >> >> >> >>>>> anything nontrivial on my own in C++. > >> >> >> >>>>> > >> >> >> >>>> > >> >> >> >>>> One example: How do we code multiple return values? > >> >> >> >>>> > >> >> >> >>>> In Python: > >> >> >> >>>> - Return a tuple. > >> >> >> >>>> > >> >> >> >>>> In C: > >> >> >> >>>> - Use pointers (evilness) > >> >> >> >>>> > >> >> >> >>>> In C++: > >> >> >> >>>> - Return a std::tuple, as you would in Python. > >> >> >> >>>> - Use references, as you would in Fortran or Pascal. > >> >> >> >>>> - Use pointers, as you would in C. > >> >> >> >>>> > >> >> >> >>>> C++ textbooks always pick the last... > >> >> >> >>>> > >> >> >> >>>> I would show the first and the second method, and perhaps > >> >> >> >>>> intentionally forget the last. > >> >> >> >>>> > >> >> >> >>>> Sturla > >> >> >> >>>> > >> >> >> >> > >> >> >> >>> On the flip side, cython looked pretty...but I didn't get the > >> >> >> >>> performance gains I wanted, and had to spend a lot of time > >> >> >> >>> figuring > >> >> >> >>> out if it was cython, needing to add types, buggy support for > >> >> >> >>> numpy, > >> >> >> >>> or actually the algorithm. > >> >> >> >> > >> >> >> >> At the time, was the numpy support buggy? I personally haven't > >> >> >> >> had > >> >> >> >> many problems with Cython and numpy. > >> >> >> >> > >> >> >> > > >> >> >> > It's not that the support WAS buggy, it's that it wasn't clear > to > >> >> >> > me > >> >> >> > what was going on and where my performance bottleneck was. 
Even > >> >> >> > after > >> >> >> > microbenchmarking with ipython, using timeit and prun, and using > >> >> >> > the > >> >> >> > cython code visualization tool. Ultimately I don't think it was > >> >> >> > cython, so perhaps my comment was a bit unfair. But it was > >> >> >> > unfortunately difficult to verify that. Of course, as you say, > >> >> >> > diagnosing and solving such issues would become easier to > resolve > >> >> >> > with > >> >> >> > more cython experience. > >> >> >> > > >> >> >> >>> The C files generated by cython were > >> >> >> >>> enormous and difficult to read. They really weren't meant for > >> >> >> >>> human > >> >> >> >>> consumption. > >> >> >> >> > >> >> >> >> Yes, it takes some practice to get used to what Cython will do, > >> >> >> >> and > >> >> >> >> how to optimize the output. > >> >> >> >> > >> >> >> >>> As Sturla has said, regardless of the quality of the > >> >> >> >>> current product, it isn't stable. > >> >> >> >> > >> >> >> >> I've personally found it more or less rock solid. Could you > say > >> >> >> >> what > >> >> >> >> you mean by "it isn't stable"? > >> >> >> >> > >> >> >> > > >> >> >> > I just meant what Sturla said, nothing more: > >> >> >> > > >> >> >> > "Cython is still 0.16, it is still unfinished. We cannot base > >> >> >> > NumPy > >> >> >> > on > >> >> >> > an unfinished compiler." > >> >> >> > >> >> >> Y'all mean, it has a zero at the beginning of the version number > and > >> >> >> it is still adding new features? Yes, that is correct, but it > seems > >> >> >> more reasonable to me to phrase that as 'active development' > rather > >> >> >> than 'unstable', because they take considerable care to be > backwards > >> >> >> compatible, have a large automated Cython test suite, and a major > >> >> >> stress-tester in the Sage test suite. > >> >> >> > >> >> > > >> >> > Matthew, > >> >> > > >> >> > No one in their right mind would build a large performance library > >> >> > using > >> >> > Cython, it just isn't the right tool. For what it was designed for > - > >> >> > wrapping existing c code or writing small and simple things close > to > >> >> > Python > >> >> > - it does very well, but it was never designed for making core > C/C++ > >> >> > libraries and in that role it just gets in the way. > >> >> > >> >> I believe the proposal is to refactor the lowest levels in pure C and > >> >> move the some or most of the library superstructure to Cython. > >> > > >> > > >> > Go for it. > >> > >> The proposal of moving to a core C + cython has been discussed by > >> multiple contributors. It is certainly a valid proposal. *I* have > >> worked on this (npymath, separate compilation), although certainly not > >> as much as I would have wanted to. I think much can be done in that > >> vein. Using the "shut up if you don't do it" is a straw man (and > >> uncalled for). > > > > > > OK, I was annoyed. > > By what? > > Exactly. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sat Feb 18 16:57:24 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 18 Feb 2012 15:57:24 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> > > * NumPy 1.8 to come out in July which will have as many ABI-compatible feature enhancements as we can add while improving test coverage and code cleanup. 
I will post to this list more details of what we plan to address with it later. Included for possible inclusion are:
> * resolving the NA/missing-data issues
> * finishing group-by
> * incorporating the start of label arrays
> * incorporating a meta-object
> * a few new dtypes (variable-length string, variable-length unicode and an enum type)
> * adding ufunc support for flexible dtypes and possibly structured arrays
> * allowing generalized ufuncs to work on more kinds of arrays besides just contiguous
> * improving the ability for NumPy to receive JIT-generated function pointers for ufuncs and other calculation opportunities
> * adding "filters" to Input and Output
> * simple computed fields for dtypes
> * accepting a Data-Type specification as a class or JSON file
> * work towards improving the dtype-addition mechanism
>
> For some of these things it's not entirely (or at all, what's a meta-object?) clear to me what they mean or how they would work. How do you plan to go about working on these features? One NEP per feature?
>

I thought I responded to this already, but it might have been from a different mail server.... Yes, these will each be discussed in due course as they are developed. I just wanted to get an outline started. More detail will come out on each feature as development proceeds. There is a larger list of features that we will be suggesting and discussing in the months ahead as NumPy 2.0 development is proposed and discussed. But this list includes things that are fairly straightforward to implement in the current data-model and calculation infrastructure.

There is a lot of criticism of the C-code, which is welcome. I wrote *a lot* of that code --- inspired by and following patterns laid out by other people. I am always interested in specific improvement ideas and/or proposals, as are most people. I especially appreciate targeted, constructive comments and not just general FUD.

There has been some criticism of the C-API documentation. After I gave away the content of my book, Guide to NumPy, 3 years ago, Joe Harrington and others adapted it to the web. The C-API portion was documented in my book (see starting with page 211 at http://www.tramy.us/numpybook.pdf). This material is now available online as well (where it has received updates and improvements): http://docs.scipy.org/doc/numpy/reference/c-api.array.html

There are under-documented sections of the code --- usually these are in areas where adoption has driven demand for an understanding of those features (adding new dtypes and array scalars, for example). In addition, there are always improvements to be made to the way something is said and described (and there are different ways people like to be taught).

The C/C++ discussion is just getting started. Everyone should keep in mind that this is not something that is going to happen quickly. This will be a point of discussion throughout the year. I'm not a huge supporter of C++, but C++11 does look like it's made some nice progress, and as I think about making a core-set of NumPy into a library that can be called by multiple languages (and even multiple implementations of Python), tempered C++ seems like it might be an appropriate way to go. Cython could be useful for Python interfaces to that Core and for extension modules on top, but Cython is *not* a solution for the core of NumPy. It was entertained as we did the IronPython work, but we realized it would have taken too long. I'm actually quite glad that we didn't go that direction, now.
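To make the "core as a language-neutral library" idea concrete, here is a minimal sketch of that kind of layering --- C++ internals kept behind a plain C interface so that anything with a C FFI (CPython, IronPython, other languages) can call it. Every name below is hypothetical and for illustration only; none of it is existing or planned NumPy API.

// Hypothetical sketch only -- these names do not exist in NumPy.
// C++ internals: free to use templates, classes and std:: facilities.
#include <cstddef>
#include <numeric>

namespace ndcore {
  // Templated kernel; the generic machinery stays on the C++ side.
  template <typename T>
  T sum(const T* data, std::size_t n) {
    return std::accumulate(data, data + n, T(0));
  }
}

// The only surface other languages would see is plain C: stable names,
// concrete types, no templates or exceptions crossing the boundary.
extern "C" double ndcore_sum_f64(const double* data, std::size_t n) {
  return ndcore::sum(data, n);
}

A CPython binding --- whether hand-written C-API code or Cython --- would then be just one more client of that C interface.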
Cython is a nice project, and I think will play a role in the stack that emerges, but I am more interested in an eventual NumPy core that does not rely on the Python C-API. Another thing that I would like to see happen for NumPy 1.8 is the use of bento by default for the build --- and encouraging down-stream projects to use it as well. We should deprecate as much of numpy.distutils as possilbe, in my mind. What happens during build is pretty hard to understand partly because distutils never really supported building complex extension modules --- that community is still pretty hostile to the needs of extension writers with a real build problem on their hands. We have gotten by with numpy.distutils, but it has not been the easiest thing to adapt. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Feb 18 17:03:17 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 14:03:17 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> Message-ID: Hi, On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant wrote: > The C/C++ discussion is just getting started. ?Everyone should keep in mind > that this is not something that is going to happening quickly. ? This will > be a point of discussion throughout the year. ? ?I'm not a huge supporter of > C++, but C++11 does look like it's made some nice progress, and as I think > about making a core-set of NumPy into a library that can be called by > multiple languages (and even multiple implementations of Python), tempered > C++ seems like it might be an appropriate way to go. Could you say more about this? Do you have any idea when the decision about C++ is likely to be made? At what point does it make most sense to make the argument for or against? Can you suggest a good way for us to be able to make more substantial arguments either way? Can you say a little more about your impression of the previous Cython refactor and why it was not successful? Thanks a lot, Matthew From robert.kern at gmail.com Sat Feb 18 17:03:39 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 18 Feb 2012 22:03:39 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 21:51, Matthew Brett wrote: > On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris > wrote: >> >> >> On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau >> wrote: >>> >>> On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris >>> wrote: >>> > >>> > >>> > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris >>> >> wrote: >>> >> > >>> >> > >>> >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett >>> >> > >>> >> > wrote: >>> >> >> >>> >> >> Hi. 
>>> >> >> >>> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >>> >> >> wrote: >>> >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >>> >> >> > wrote: >>> >> >> >> Hi, >>> >> >> >> >>> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >>> >> >> >> wrote: >>> >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >>> >> >> >>> >>> >> >> >>> wrote: >>> >> >> >>>> >>> >> >> >>>> >>> >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >>> >> >> >>>> : >>> >> >> >>>> >>> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>> >> >> >>>>>> We would have to write a C++ programming tutorial that is >>> >> >> >>>>>> based >>> >> >> >>>>>> on >>> >> >> >>>>>> Pyton knowledge instead of C knowledge. >>> >> >> >>>>> >>> >> >> >>>>> I personally would love such a thing. ?It's been a while since >>> >> >> >>>>> I >>> >> >> >>>>> did >>> >> >> >>>>> anything nontrivial on my own in C++. >>> >> >> >>>>> >>> >> >> >>>> >>> >> >> >>>> One example: How do we code multiple return values? >>> >> >> >>>> >>> >> >> >>>> In Python: >>> >> >> >>>> - Return a tuple. >>> >> >> >>>> >>> >> >> >>>> In C: >>> >> >> >>>> - Use pointers (evilness) >>> >> >> >>>> >>> >> >> >>>> In C++: >>> >> >> >>>> - Return a std::tuple, as you would in Python. >>> >> >> >>>> - Use references, as you would in Fortran or Pascal. >>> >> >> >>>> - Use pointers, as you would in C. >>> >> >> >>>> >>> >> >> >>>> C++ textbooks always pick the last... >>> >> >> >>>> >>> >> >> >>>> I would show the first and the second method, and perhaps >>> >> >> >>>> intentionally forget the last. >>> >> >> >>>> >>> >> >> >>>> Sturla >>> >> >> >>>> >>> >> >> >> >>> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >>> >> >> >>> performance gains I wanted, and had to spend a lot of time >>> >> >> >>> figuring >>> >> >> >>> out if it was cython, needing to add types, buggy support for >>> >> >> >>> numpy, >>> >> >> >>> or actually the algorithm. >>> >> >> >> >>> >> >> >> At the time, was the numpy support buggy? ?I personally haven't >>> >> >> >> had >>> >> >> >> many problems with Cython and numpy. >>> >> >> >> >>> >> >> > >>> >> >> > It's not that the support WAS buggy, it's that it wasn't clear to >>> >> >> > me >>> >> >> > what was going on and where my performance bottleneck was. Even >>> >> >> > after >>> >> >> > microbenchmarking with ipython, using timeit and prun, and using >>> >> >> > the >>> >> >> > cython code visualization tool. Ultimately I don't think it was >>> >> >> > cython, so perhaps my comment was a bit unfair. But it was >>> >> >> > unfortunately difficult to verify that. Of course, as you say, >>> >> >> > diagnosing and solving such issues would become easier to resolve >>> >> >> > with >>> >> >> > more cython experience. >>> >> >> > >>> >> >> >>> The C files generated by cython were >>> >> >> >>> enormous and difficult to read. They really weren't meant for >>> >> >> >>> human >>> >> >> >>> consumption. >>> >> >> >> >>> >> >> >> Yes, it takes some practice to get used to what Cython will do, >>> >> >> >> and >>> >> >> >> how to optimize the output. >>> >> >> >> >>> >> >> >>> As Sturla has said, regardless of the quality of the >>> >> >> >>> current product, it isn't stable. >>> >> >> >> >>> >> >> >> I've personally found it more or less rock solid. ?Could you say >>> >> >> >> what >>> >> >> >> you mean by "it isn't stable"? 
>>> >> >> >> >>> >> >> > >>> >> >> > I just meant what Sturla said, nothing more: >>> >> >> > >>> >> >> > "Cython is still 0.16, it is still unfinished. We cannot base >>> >> >> > NumPy >>> >> >> > on >>> >> >> > an unfinished compiler." >>> >> >> >>> >> >> Y'all mean, it has a zero at the beginning of the version number and >>> >> >> it is still adding new features? ?Yes, that is correct, but it seems >>> >> >> more reasonable to me to phrase that as 'active development' rather >>> >> >> than 'unstable', because they take considerable care to be backwards >>> >> >> compatible, have a large automated Cython test suite, and a major >>> >> >> stress-tester in the Sage test suite. >>> >> >> >>> >> > >>> >> > Matthew, >>> >> > >>> >> > No one in their right mind would build a large performance library >>> >> > using >>> >> > Cython, it just isn't the right tool. For what it was designed for - >>> >> > wrapping existing c code or writing small and simple things close to >>> >> > Python >>> >> > - it does very well, but it was never designed for making core C/C++ >>> >> > libraries and in that role it just gets in the way. >>> >> >>> >> I believe the proposal is to refactor the lowest levels in pure C and >>> >> move the some or most of the library superstructure to Cython. >>> > >>> > >>> > Go for it. >>> >>> The proposal of moving to a core C + cython has been discussed by >>> multiple contributors. It is certainly a valid proposal. *I* have >>> worked on this (npymath, separate compilation), although certainly not >>> as much as I would have wanted to. I think much can be done in that >>> vein. Using the "shut up if you don't do it" is a straw man (and >>> uncalled for). >> >> >> OK, I was annoyed. > > By what? Your misunderstanding of what was being discussed. The proposal being discussed is implementing the core of numpy in C++, wrapped in C to be usable as a C library that other extensions can use, and then exposed to Python in an unspecified way. Cython was raised as an alternative for this core, but as Chuck points out, it doesn't really fit. Your assertion that what was being discussed was putting the core in C and using Cython to wrap it was simply a non-sequitur. Discussion of alternatives is fine. You weren't doing that. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From matthew.brett at gmail.com Sat Feb 18 17:06:08 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 14:06:08 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Hi, On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern wrote: > On Sat, Feb 18, 2012 at 21:51, Matthew Brett wrote: >> On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris >> wrote: >>> >>> >>> On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau >>> wrote: >>>> >>>> On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris >>>> wrote: >>>> > >>>> > >>>> > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett >>>> > wrote: >>>> >> >>>> >> Hi, >>>> >> >>>> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris >>>> >> wrote: >>>> >> > >>>> >> > >>>> >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett >>>> >> > >>>> >> > wrote: >>>> >> >> >>>> >> >> Hi. 
>>>> >> >> >>>> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >>>> >> >> wrote: >>>> >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >>>> >> >> > wrote: >>>> >> >> >> Hi, >>>> >> >> >> >>>> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >>>> >> >> >> wrote: >>>> >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >>>> >> >> >>> >>>> >> >> >>> wrote: >>>> >> >> >>>> >>>> >> >> >>>> >>>> >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >>>> >> >> >>>> : >>>> >> >> >>>> >>>> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>>> >> >> >>>>>> We would have to write a C++ programming tutorial that is >>>> >> >> >>>>>> based >>>> >> >> >>>>>> on >>>> >> >> >>>>>> Pyton knowledge instead of C knowledge. >>>> >> >> >>>>> >>>> >> >> >>>>> I personally would love such a thing. ?It's been a while since >>>> >> >> >>>>> I >>>> >> >> >>>>> did >>>> >> >> >>>>> anything nontrivial on my own in C++. >>>> >> >> >>>>> >>>> >> >> >>>> >>>> >> >> >>>> One example: How do we code multiple return values? >>>> >> >> >>>> >>>> >> >> >>>> In Python: >>>> >> >> >>>> - Return a tuple. >>>> >> >> >>>> >>>> >> >> >>>> In C: >>>> >> >> >>>> - Use pointers (evilness) >>>> >> >> >>>> >>>> >> >> >>>> In C++: >>>> >> >> >>>> - Return a std::tuple, as you would in Python. >>>> >> >> >>>> - Use references, as you would in Fortran or Pascal. >>>> >> >> >>>> - Use pointers, as you would in C. >>>> >> >> >>>> >>>> >> >> >>>> C++ textbooks always pick the last... >>>> >> >> >>>> >>>> >> >> >>>> I would show the first and the second method, and perhaps >>>> >> >> >>>> intentionally forget the last. >>>> >> >> >>>> >>>> >> >> >>>> Sturla >>>> >> >> >>>> >>>> >> >> >> >>>> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >>>> >> >> >>> performance gains I wanted, and had to spend a lot of time >>>> >> >> >>> figuring >>>> >> >> >>> out if it was cython, needing to add types, buggy support for >>>> >> >> >>> numpy, >>>> >> >> >>> or actually the algorithm. >>>> >> >> >> >>>> >> >> >> At the time, was the numpy support buggy? ?I personally haven't >>>> >> >> >> had >>>> >> >> >> many problems with Cython and numpy. >>>> >> >> >> >>>> >> >> > >>>> >> >> > It's not that the support WAS buggy, it's that it wasn't clear to >>>> >> >> > me >>>> >> >> > what was going on and where my performance bottleneck was. Even >>>> >> >> > after >>>> >> >> > microbenchmarking with ipython, using timeit and prun, and using >>>> >> >> > the >>>> >> >> > cython code visualization tool. Ultimately I don't think it was >>>> >> >> > cython, so perhaps my comment was a bit unfair. But it was >>>> >> >> > unfortunately difficult to verify that. Of course, as you say, >>>> >> >> > diagnosing and solving such issues would become easier to resolve >>>> >> >> > with >>>> >> >> > more cython experience. >>>> >> >> > >>>> >> >> >>> The C files generated by cython were >>>> >> >> >>> enormous and difficult to read. They really weren't meant for >>>> >> >> >>> human >>>> >> >> >>> consumption. >>>> >> >> >> >>>> >> >> >> Yes, it takes some practice to get used to what Cython will do, >>>> >> >> >> and >>>> >> >> >> how to optimize the output. >>>> >> >> >> >>>> >> >> >>> As Sturla has said, regardless of the quality of the >>>> >> >> >>> current product, it isn't stable. >>>> >> >> >> >>>> >> >> >> I've personally found it more or less rock solid. ?Could you say >>>> >> >> >> what >>>> >> >> >> you mean by "it isn't stable"? 
>>>> >> >> >> >>>> >> >> > >>>> >> >> > I just meant what Sturla said, nothing more: >>>> >> >> > >>>> >> >> > "Cython is still 0.16, it is still unfinished. We cannot base >>>> >> >> > NumPy >>>> >> >> > on >>>> >> >> > an unfinished compiler." >>>> >> >> >>>> >> >> Y'all mean, it has a zero at the beginning of the version number and >>>> >> >> it is still adding new features? ?Yes, that is correct, but it seems >>>> >> >> more reasonable to me to phrase that as 'active development' rather >>>> >> >> than 'unstable', because they take considerable care to be backwards >>>> >> >> compatible, have a large automated Cython test suite, and a major >>>> >> >> stress-tester in the Sage test suite. >>>> >> >> >>>> >> > >>>> >> > Matthew, >>>> >> > >>>> >> > No one in their right mind would build a large performance library >>>> >> > using >>>> >> > Cython, it just isn't the right tool. For what it was designed for - >>>> >> > wrapping existing c code or writing small and simple things close to >>>> >> > Python >>>> >> > - it does very well, but it was never designed for making core C/C++ >>>> >> > libraries and in that role it just gets in the way. >>>> >> >>>> >> I believe the proposal is to refactor the lowest levels in pure C and >>>> >> move the some or most of the library superstructure to Cython. >>>> > >>>> > >>>> > Go for it. >>>> >>>> The proposal of moving to a core C + cython has been discussed by >>>> multiple contributors. It is certainly a valid proposal. *I* have >>>> worked on this (npymath, separate compilation), although certainly not >>>> as much as I would have wanted to. I think much can be done in that >>>> vein. Using the "shut up if you don't do it" is a straw man (and >>>> uncalled for). >>> >>> >>> OK, I was annoyed. >> >> By what? > > Your misunderstanding of what was being discussed. The proposal being > discussed is implementing the core of numpy in C++, wrapped in C to be > usable as a C library that other extensions can use, and then exposed > to Python in an unspecified way. Cython was raised as an alternative > for this core, but as Chuck points out, it doesn't really fit. Your > assertion that what was being discussed was putting the core in C and > using Cython to wrap it was simply a non-sequitur. Discussion of > alternatives is fine. You weren't doing that. You read David's email? Was he also being annoying? Best, Matthew From d.s.seljebotn at astro.uio.no Sat Feb 18 17:07:46 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 18 Feb 2012 14:07:46 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: <4F402132.8030600@astro.uio.no> On 02/18/2012 12:35 PM, Charles R Harris wrote: > > > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett > wrote: > > Hi. > > On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire > > wrote: > > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett > > wrote: > >> Hi, > >> > >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire > >> > wrote: > >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden > > wrote: > >>>> > >>>> > >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout > >: > >>>> > >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: > >>>>>> We would have to write a C++ programming tutorial that is > based on Pyton knowledge instead of C knowledge. 
> >>>>> > >>>>> I personally would love such a thing. It's been a while > since I did > >>>>> anything nontrivial on my own in C++. > >>>>> > >>>> > >>>> One example: How do we code multiple return values? > >>>> > >>>> In Python: > >>>> - Return a tuple. > >>>> > >>>> In C: > >>>> - Use pointers (evilness) > >>>> > >>>> In C++: > >>>> - Return a std::tuple, as you would in Python. > >>>> - Use references, as you would in Fortran or Pascal. > >>>> - Use pointers, as you would in C. > >>>> > >>>> C++ textbooks always pick the last... > >>>> > >>>> I would show the first and the second method, and perhaps > intentionally forget the last. > >>>> > >>>> Sturla > >>>> > >> > >>> On the flip side, cython looked pretty...but I didn't get the > >>> performance gains I wanted, and had to spend a lot of time figuring > >>> out if it was cython, needing to add types, buggy support for > numpy, > >>> or actually the algorithm. > >> > >> At the time, was the numpy support buggy? I personally haven't had > >> many problems with Cython and numpy. > >> > > > > It's not that the support WAS buggy, it's that it wasn't clear to me > > what was going on and where my performance bottleneck was. Even after > > microbenchmarking with ipython, using timeit and prun, and using the > > cython code visualization tool. Ultimately I don't think it was > > cython, so perhaps my comment was a bit unfair. But it was > > unfortunately difficult to verify that. Of course, as you say, > > diagnosing and solving such issues would become easier to resolve > with > > more cython experience. > > > >>> The C files generated by cython were > >>> enormous and difficult to read. They really weren't meant for human > >>> consumption. > >> > >> Yes, it takes some practice to get used to what Cython will do, and > >> how to optimize the output. > >> > >>> As Sturla has said, regardless of the quality of the > >>> current product, it isn't stable. > >> > >> I've personally found it more or less rock solid. Could you say > what > >> you mean by "it isn't stable"? > >> > > > > I just meant what Sturla said, nothing more: > > > > "Cython is still 0.16, it is still unfinished. We cannot base > NumPy on > > an unfinished compiler." > > Y'all mean, it has a zero at the beginning of the version number and > it is still adding new features? Yes, that is correct, but it seems > more reasonable to me to phrase that as 'active development' rather > than 'unstable', because they take considerable care to be backwards > compatible, have a large automated Cython test suite, and a major > stress-tester in the Sage test suite. > > > Matthew, > > No one in their right mind would build a large performance library using > Cython, it just isn't the right tool. For what it was designed for - > wrapping existing c code or writing small and simple things close to > Python - it does very well, but it was never designed for making core > C/C++ libraries and in that role it just gets in the way. +1. Even I who have contributed to Cython realize this; last autumn I implemented a library by writing it in C and wrapping it in Cython. 
Dag From sturla at molden.no Sat Feb 18 17:17:20 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 23:17:20 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: <4F402370.4050803@molden.no> Den 18.02.2012 22:25, skrev Benjamin Root: > 2.) My personal preference is an incremental refactor over to C++ > using STL, however, I have to be realistic. First, the exception > issue is problematic (unsolvable? I don't know). Second, one of > Numpy/Scipy's greatest strengths is the relative ease it has in > interfacing with BLAS, ATLAS, mkl and other optimizations. Will this > still be possible from a C++ (or anything else) core? Yes. > Third, I am only familiar with STL on gcc. Are there any subtle > differences in implementations of STL in MSVC or any other compilers. > Pointers are hard to mess up, in cross-platform ways. NumPy should stay with the standard, whether C or C++, ans not be written for one particular compiler. Writing code that depends on a set of known bugs in one implementation is why IE6 almost broke the internet. > 3.) Will memory-mapped arrays still be possible after the refactor? I > am not familiar with the implementation, but I am a big netcdf/hdf > user and mem-mapped arrays are important to me. Yes, that depends on the operating system, not the programming language. Sturla From robert.kern at gmail.com Sat Feb 18 17:20:21 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 18 Feb 2012 22:20:21 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 22:06, Matthew Brett wrote: > Hi, > > On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern wrote: >> On Sat, Feb 18, 2012 at 21:51, Matthew Brett wrote: >>> On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau >>>> wrote: >>>>> >>>>> On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris >>>>> wrote: >>>>> > >>>>> > >>>>> > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett >>>>> > wrote: >>>>> >> >>>>> >> Hi, >>>>> >> >>>>> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris >>>>> >> wrote: >>>>> >> > >>>>> >> > >>>>> >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett >>>>> >> > >>>>> >> > wrote: >>>>> >> >> >>>>> >> >> Hi. >>>>> >> >> >>>>> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >>>>> >> >> wrote: >>>>> >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >>>>> >> >> > wrote: >>>>> >> >> >> Hi, >>>>> >> >> >> >>>>> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >>>>> >> >> >> wrote: >>>>> >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >>>>> >> >> >>> >>>>> >> >> >>> wrote: >>>>> >> >> >>>> >>>>> >> >> >>>> >>>>> >> >> >>>> Den 18. feb. 2012 kl. 05:01 skrev Jason Grout >>>>> >> >> >>>> : >>>>> >> >> >>>> >>>>> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>>>> >> >> >>>>>> We would have to write a C++ programming tutorial that is >>>>> >> >> >>>>>> based >>>>> >> >> >>>>>> on >>>>> >> >> >>>>>> Pyton knowledge instead of C knowledge. >>>>> >> >> >>>>> >>>>> >> >> >>>>> I personally would love such a thing. 
?It's been a while since >>>>> >> >> >>>>> I >>>>> >> >> >>>>> did >>>>> >> >> >>>>> anything nontrivial on my own in C++. >>>>> >> >> >>>>> >>>>> >> >> >>>> >>>>> >> >> >>>> One example: How do we code multiple return values? >>>>> >> >> >>>> >>>>> >> >> >>>> In Python: >>>>> >> >> >>>> - Return a tuple. >>>>> >> >> >>>> >>>>> >> >> >>>> In C: >>>>> >> >> >>>> - Use pointers (evilness) >>>>> >> >> >>>> >>>>> >> >> >>>> In C++: >>>>> >> >> >>>> - Return a std::tuple, as you would in Python. >>>>> >> >> >>>> - Use references, as you would in Fortran or Pascal. >>>>> >> >> >>>> - Use pointers, as you would in C. >>>>> >> >> >>>> >>>>> >> >> >>>> C++ textbooks always pick the last... >>>>> >> >> >>>> >>>>> >> >> >>>> I would show the first and the second method, and perhaps >>>>> >> >> >>>> intentionally forget the last. >>>>> >> >> >>>> >>>>> >> >> >>>> Sturla >>>>> >> >> >>>> >>>>> >> >> >> >>>>> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >>>>> >> >> >>> performance gains I wanted, and had to spend a lot of time >>>>> >> >> >>> figuring >>>>> >> >> >>> out if it was cython, needing to add types, buggy support for >>>>> >> >> >>> numpy, >>>>> >> >> >>> or actually the algorithm. >>>>> >> >> >> >>>>> >> >> >> At the time, was the numpy support buggy? ?I personally haven't >>>>> >> >> >> had >>>>> >> >> >> many problems with Cython and numpy. >>>>> >> >> >> >>>>> >> >> > >>>>> >> >> > It's not that the support WAS buggy, it's that it wasn't clear to >>>>> >> >> > me >>>>> >> >> > what was going on and where my performance bottleneck was. Even >>>>> >> >> > after >>>>> >> >> > microbenchmarking with ipython, using timeit and prun, and using >>>>> >> >> > the >>>>> >> >> > cython code visualization tool. Ultimately I don't think it was >>>>> >> >> > cython, so perhaps my comment was a bit unfair. But it was >>>>> >> >> > unfortunately difficult to verify that. Of course, as you say, >>>>> >> >> > diagnosing and solving such issues would become easier to resolve >>>>> >> >> > with >>>>> >> >> > more cython experience. >>>>> >> >> > >>>>> >> >> >>> The C files generated by cython were >>>>> >> >> >>> enormous and difficult to read. They really weren't meant for >>>>> >> >> >>> human >>>>> >> >> >>> consumption. >>>>> >> >> >> >>>>> >> >> >> Yes, it takes some practice to get used to what Cython will do, >>>>> >> >> >> and >>>>> >> >> >> how to optimize the output. >>>>> >> >> >> >>>>> >> >> >>> As Sturla has said, regardless of the quality of the >>>>> >> >> >>> current product, it isn't stable. >>>>> >> >> >> >>>>> >> >> >> I've personally found it more or less rock solid. ?Could you say >>>>> >> >> >> what >>>>> >> >> >> you mean by "it isn't stable"? >>>>> >> >> >> >>>>> >> >> > >>>>> >> >> > I just meant what Sturla said, nothing more: >>>>> >> >> > >>>>> >> >> > "Cython is still 0.16, it is still unfinished. We cannot base >>>>> >> >> > NumPy >>>>> >> >> > on >>>>> >> >> > an unfinished compiler." >>>>> >> >> >>>>> >> >> Y'all mean, it has a zero at the beginning of the version number and >>>>> >> >> it is still adding new features? ?Yes, that is correct, but it seems >>>>> >> >> more reasonable to me to phrase that as 'active development' rather >>>>> >> >> than 'unstable', because they take considerable care to be backwards >>>>> >> >> compatible, have a large automated Cython test suite, and a major >>>>> >> >> stress-tester in the Sage test suite. 
>>>>> >> >> >>>>> >> > >>>>> >> > Matthew, >>>>> >> > >>>>> >> > No one in their right mind would build a large performance library >>>>> >> > using >>>>> >> > Cython, it just isn't the right tool. For what it was designed for - >>>>> >> > wrapping existing c code or writing small and simple things close to >>>>> >> > Python >>>>> >> > - it does very well, but it was never designed for making core C/C++ >>>>> >> > libraries and in that role it just gets in the way. >>>>> >> >>>>> >> I believe the proposal is to refactor the lowest levels in pure C and >>>>> >> move the some or most of the library superstructure to Cython. >>>>> > >>>>> > >>>>> > Go for it. >>>>> >>>>> The proposal of moving to a core C + cython has been discussed by >>>>> multiple contributors. It is certainly a valid proposal. *I* have >>>>> worked on this (npymath, separate compilation), although certainly not >>>>> as much as I would have wanted to. I think much can be done in that >>>>> vein. Using the "shut up if you don't do it" is a straw man (and >>>>> uncalled for). >>>> >>>> >>>> OK, I was annoyed. >>> >>> By what? >> >> Your misunderstanding of what was being discussed. The proposal being >> discussed is implementing the core of numpy in C++, wrapped in C to be >> usable as a C library that other extensions can use, and then exposed >> to Python in an unspecified way. Cython was raised as an alternative >> for this core, but as Chuck points out, it doesn't really fit. Your >> assertion that what was being discussed was putting the core in C and >> using Cython to wrap it was simply a non-sequitur. Discussion of >> alternatives is fine. You weren't doing that. > > You read David's email? ?Was he also being annoying? Not really, because he was responding on-topic to the bizarro-branch of the conversation that you spawned about the merits of moving from hand-written C extensions to a Cython-wrapped C library. Whatever annoyance his email might inspire is your fault, not his. The discussion was about whether to use C++ or Cython for the core. Chuck argued that Cython was not a suitable implementation language for the core. You responded that his objections to Cython didn't apply to what you thought was being discussed, using Cython to wrap a pure-C library. As Pauli (Wolfgang, not our Pauli) once phrased it, you were "not even wrong". It's hard to respond coherently to someone who is breaking the fundamental expectations of discourse. Even I had to stare at the thread for a few minutes to figure out where things went off the rails. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From cournape at gmail.com Sat Feb 18 17:24:33 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 18 Feb 2012 22:24:33 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris wrote: > > Well, we already have code obfuscation (DOUBLE_your_pleasure, > FLOAT_your_boat), so we might as well let the compiler handle it. Yes, those are not great, but on the other hand, it is not that a fundamental issue IMO. Iterators as we have it in NumPy is something that is clearly limited by C. 
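(To make that concrete, flat iteration through the current C API looks roughly like this -- written from memory, with error checking and the usual Python.h / import_array() boilerplate left out; sum_float64 is just a throwaway example function, only the PyArray_* calls are the real API:

    #include <numpy/arrayobject.h>

    static double sum_float64(PyArrayObject *arr)          /* assumes a float64 array */
    {
        double total = 0.0;
        PyObject *it = PyArray_IterNew((PyObject *)arr);   /* old-style flat iterator */
        while (PyArray_ITER_NOTDONE(it)) {
            total += *(double *)PyArray_ITER_DATA(it);
            PyArray_ITER_NEXT(it);
        }
        Py_DECREF(it);
        return total;
    }

and there is no cheap way in C to write that once, generically, over dtypes -- that part still goes through the .src code generators.)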
Writing the neighborhood iterator is the only case where I really felt that C++ *could* be a significant improvement. I use *could* because writing iterator in C++ is hard, and will be much harder to read (I find both boost and STL - e.g. stlport -- iterators to be close to write-only code). But there is the question on how you can make C++-based iterators available in C. I would be interested in a simple example of how this could be done, ignoring all the other issues (portability, exception, etc?). The STL is also potentially compelling, but that's where we go into my "beware of the dragons" area of C++. Portability loss, compilation time increase and warts are significant there. scipy.sparse.sparsetools has been a source of issues that was quite high compared to its proportion of scipy amount code (we *do* have some hard-won experience on C++-related issues). > > Jim Hugunin was a keynote speaker at one of the scipy conventions. At dinner > he said that if he was to do it again he would use managed code ;) I don't > propose we do that, but tools do advance. In an ideal world, we would have a better language than C++ that can be spit out as C for portability. I have looked for a way to do this for as long as I have been contributing to NumPy (I have looked at ooc, D, coccinelle at various stages). I believe the best way is actually in the vein of FFTW: written in a very high level language (OCAML) for the hard part, and spitting out C. This is better than C++ is many ways - this is also clearly not realistic :) David From matthew.brett at gmail.com Sat Feb 18 17:29:03 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 14:29:03 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Hi, On Sat, Feb 18, 2012 at 2:20 PM, Robert Kern wrote: > On Sat, Feb 18, 2012 at 22:06, Matthew Brett wrote: >> Hi, >> >> On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern wrote: >>> On Sat, Feb 18, 2012 at 21:51, Matthew Brett wrote: >>>> On Sat, Feb 18, 2012 at 1:40 PM, Charles R Harris >>>> wrote: >>>>> >>>>> >>>>> On Sat, Feb 18, 2012 at 2:17 PM, David Cournapeau >>>>> wrote: >>>>>> >>>>>> On Sat, Feb 18, 2012 at 8:45 PM, Charles R Harris >>>>>> wrote: >>>>>> > >>>>>> > >>>>>> > On Sat, Feb 18, 2012 at 1:39 PM, Matthew Brett >>>>>> > wrote: >>>>>> >> >>>>>> >> Hi, >>>>>> >> >>>>>> >> On Sat, Feb 18, 2012 at 12:35 PM, Charles R Harris >>>>>> >> wrote: >>>>>> >> > >>>>>> >> > >>>>>> >> > On Sat, Feb 18, 2012 at 12:21 PM, Matthew Brett >>>>>> >> > >>>>>> >> > wrote: >>>>>> >> >> >>>>>> >> >> Hi. >>>>>> >> >> >>>>>> >> >> On Sat, Feb 18, 2012 at 12:18 AM, Christopher Jordan-Squire >>>>>> >> >> wrote: >>>>>> >> >> > On Fri, Feb 17, 2012 at 11:31 PM, Matthew Brett >>>>>> >> >> > wrote: >>>>>> >> >> >> Hi, >>>>>> >> >> >> >>>>>> >> >> >> On Fri, Feb 17, 2012 at 10:18 PM, Christopher Jordan-Squire >>>>>> >> >> >> wrote: >>>>>> >> >> >>> On Fri, Feb 17, 2012 at 8:30 PM, Sturla Molden >>>>>> >> >> >>> >>>>>> >> >> >>> wrote: >>>>>> >> >> >>>> >>>>>> >> >> >>>> >>>>>> >> >> >>>> Den 18. feb. 2012 kl. 
05:01 skrev Jason Grout >>>>>> >> >> >>>> : >>>>>> >> >> >>>> >>>>>> >> >> >>>>> On 2/17/12 9:54 PM, Sturla Molden wrote: >>>>>> >> >> >>>>>> We would have to write a C++ programming tutorial that is >>>>>> >> >> >>>>>> based >>>>>> >> >> >>>>>> on >>>>>> >> >> >>>>>> Pyton knowledge instead of C knowledge. >>>>>> >> >> >>>>> >>>>>> >> >> >>>>> I personally would love such a thing. ?It's been a while since >>>>>> >> >> >>>>> I >>>>>> >> >> >>>>> did >>>>>> >> >> >>>>> anything nontrivial on my own in C++. >>>>>> >> >> >>>>> >>>>>> >> >> >>>> >>>>>> >> >> >>>> One example: How do we code multiple return values? >>>>>> >> >> >>>> >>>>>> >> >> >>>> In Python: >>>>>> >> >> >>>> - Return a tuple. >>>>>> >> >> >>>> >>>>>> >> >> >>>> In C: >>>>>> >> >> >>>> - Use pointers (evilness) >>>>>> >> >> >>>> >>>>>> >> >> >>>> In C++: >>>>>> >> >> >>>> - Return a std::tuple, as you would in Python. >>>>>> >> >> >>>> - Use references, as you would in Fortran or Pascal. >>>>>> >> >> >>>> - Use pointers, as you would in C. >>>>>> >> >> >>>> >>>>>> >> >> >>>> C++ textbooks always pick the last... >>>>>> >> >> >>>> >>>>>> >> >> >>>> I would show the first and the second method, and perhaps >>>>>> >> >> >>>> intentionally forget the last. >>>>>> >> >> >>>> >>>>>> >> >> >>>> Sturla >>>>>> >> >> >>>> >>>>>> >> >> >> >>>>>> >> >> >>> On the flip side, cython looked pretty...but I didn't get the >>>>>> >> >> >>> performance gains I wanted, and had to spend a lot of time >>>>>> >> >> >>> figuring >>>>>> >> >> >>> out if it was cython, needing to add types, buggy support for >>>>>> >> >> >>> numpy, >>>>>> >> >> >>> or actually the algorithm. >>>>>> >> >> >> >>>>>> >> >> >> At the time, was the numpy support buggy? ?I personally haven't >>>>>> >> >> >> had >>>>>> >> >> >> many problems with Cython and numpy. >>>>>> >> >> >> >>>>>> >> >> > >>>>>> >> >> > It's not that the support WAS buggy, it's that it wasn't clear to >>>>>> >> >> > me >>>>>> >> >> > what was going on and where my performance bottleneck was. Even >>>>>> >> >> > after >>>>>> >> >> > microbenchmarking with ipython, using timeit and prun, and using >>>>>> >> >> > the >>>>>> >> >> > cython code visualization tool. Ultimately I don't think it was >>>>>> >> >> > cython, so perhaps my comment was a bit unfair. But it was >>>>>> >> >> > unfortunately difficult to verify that. Of course, as you say, >>>>>> >> >> > diagnosing and solving such issues would become easier to resolve >>>>>> >> >> > with >>>>>> >> >> > more cython experience. >>>>>> >> >> > >>>>>> >> >> >>> The C files generated by cython were >>>>>> >> >> >>> enormous and difficult to read. They really weren't meant for >>>>>> >> >> >>> human >>>>>> >> >> >>> consumption. >>>>>> >> >> >> >>>>>> >> >> >> Yes, it takes some practice to get used to what Cython will do, >>>>>> >> >> >> and >>>>>> >> >> >> how to optimize the output. >>>>>> >> >> >> >>>>>> >> >> >>> As Sturla has said, regardless of the quality of the >>>>>> >> >> >>> current product, it isn't stable. >>>>>> >> >> >> >>>>>> >> >> >> I've personally found it more or less rock solid. ?Could you say >>>>>> >> >> >> what >>>>>> >> >> >> you mean by "it isn't stable"? >>>>>> >> >> >> >>>>>> >> >> > >>>>>> >> >> > I just meant what Sturla said, nothing more: >>>>>> >> >> > >>>>>> >> >> > "Cython is still 0.16, it is still unfinished. We cannot base >>>>>> >> >> > NumPy >>>>>> >> >> > on >>>>>> >> >> > an unfinished compiler." 
>>>>>> >> >> >>>>>> >> >> Y'all mean, it has a zero at the beginning of the version number and >>>>>> >> >> it is still adding new features? ?Yes, that is correct, but it seems >>>>>> >> >> more reasonable to me to phrase that as 'active development' rather >>>>>> >> >> than 'unstable', because they take considerable care to be backwards >>>>>> >> >> compatible, have a large automated Cython test suite, and a major >>>>>> >> >> stress-tester in the Sage test suite. >>>>>> >> >> >>>>>> >> > >>>>>> >> > Matthew, >>>>>> >> > >>>>>> >> > No one in their right mind would build a large performance library >>>>>> >> > using >>>>>> >> > Cython, it just isn't the right tool. For what it was designed for - >>>>>> >> > wrapping existing c code or writing small and simple things close to >>>>>> >> > Python >>>>>> >> > - it does very well, but it was never designed for making core C/C++ >>>>>> >> > libraries and in that role it just gets in the way. >>>>>> >> >>>>>> >> I believe the proposal is to refactor the lowest levels in pure C and >>>>>> >> move the some or most of the library superstructure to Cython. >>>>>> > >>>>>> > >>>>>> > Go for it. >>>>>> >>>>>> The proposal of moving to a core C + cython has been discussed by >>>>>> multiple contributors. It is certainly a valid proposal. *I* have >>>>>> worked on this (npymath, separate compilation), although certainly not >>>>>> as much as I would have wanted to. I think much can be done in that >>>>>> vein. Using the "shut up if you don't do it" is a straw man (and >>>>>> uncalled for). >>>>> >>>>> >>>>> OK, I was annoyed. >>>> >>>> By what? >>> >>> Your misunderstanding of what was being discussed. The proposal being >>> discussed is implementing the core of numpy in C++, wrapped in C to be >>> usable as a C library that other extensions can use, and then exposed >>> to Python in an unspecified way. Cython was raised as an alternative >>> for this core, but as Chuck points out, it doesn't really fit. Your >>> assertion that what was being discussed was putting the core in C and >>> using Cython to wrap it was simply a non-sequitur. Discussion of >>> alternatives is fine. You weren't doing that. >> >> You read David's email? ?Was he also being annoying? > > Not really, because he was responding on-topic to the bizarro-branch > of the conversation that you spawned about the merits of moving from > hand-written C extensions to a Cython-wrapped C library. Whatever > annoyance his email might inspire is your fault, not his. The > discussion was about whether to use C++ or Cython for the core. Chuck > argued that Cython was not a suitable implementation language for the > core. You responded that his objections to Cython didn't apply to what > you thought was being discussed, using Cython to wrap a pure-C > library. As Pauli (Wolfgang, not our Pauli) once phrased it, you were > "not even wrong". It's hard to respond coherently to someone who is > breaking the fundamental expectations of discourse. Even I had to > stare at the thread for a few minutes to figure out where things went > off the rails. I'm sorry but this seems to me to be aggressive, offensive, and unjust. The discussion was, from the beginning, mainly about the relative benefits of rewriting the core with C / Cython, or C++. I don't think anyone was proposing writing every line of the numpy core in Cython. Ergo (sorry to use the debating term), the proposal to use Cython was always to take some of the higher level code out of C and leave some of it in C. 
It does indeed make the debate ridiculous to oppose a proposal that no-one has made. Now I am sure it is obvious to you, that the proposal to refactor the current C code to into low-level C libraries, and higher level Cython wrappers, is absurd and off the table. It isn't obvious to me. I don't think I broke a fundamental rule of polite discourse to clarify that is what I meant, Best, Matthew From sturla at molden.no Sat Feb 18 17:50:46 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 18 Feb 2012 23:50:46 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: <4F402B46.4020103@molden.no> Den 18.02.2012 23:24, skrev David Cournapeau: > Iterators as we have it in NumPy is something that is clearly limited > by C. Computers tend to have more than one CPU now. Iterators are inherently bad, whether they are written in C or C++. NumPy core should be written with objects that are scalable on multiple processors. Remember the original numeric was written in a time where dektop computers only had one processor. > In an ideal world, we would have a better language than C++ that can be spit out as > C for portability. What about a statically typed Python? (That is, not Cython.) We just need to make the compiler :-) Sturla From robert.kern at gmail.com Sat Feb 18 17:51:42 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 18 Feb 2012 22:51:42 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 22:29, Matthew Brett wrote: > Hi, > > On Sat, Feb 18, 2012 at 2:20 PM, Robert Kern wrote: >> On Sat, Feb 18, 2012 at 22:06, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern wrote: >>>> Your misunderstanding of what was being discussed. The proposal being >>>> discussed is implementing the core of numpy in C++, wrapped in C to be >>>> usable as a C library that other extensions can use, and then exposed >>>> to Python in an unspecified way. Cython was raised as an alternative >>>> for this core, but as Chuck points out, it doesn't really fit. Your >>>> assertion that what was being discussed was putting the core in C and >>>> using Cython to wrap it was simply a non-sequitur. Discussion of >>>> alternatives is fine. You weren't doing that. >>> >>> You read David's email? ?Was he also being annoying? >> >> Not really, because he was responding on-topic to the bizarro-branch >> of the conversation that you spawned about the merits of moving from >> hand-written C extensions to a Cython-wrapped C library. Whatever >> annoyance his email might inspire is your fault, not his. The >> discussion was about whether to use C++ or Cython for the core. Chuck >> argued that Cython was not a suitable implementation language for the >> core. You responded that his objections to Cython didn't apply to what >> you thought was being discussed, using Cython to wrap a pure-C >> library. As Pauli (Wolfgang, not our Pauli) once phrased it, you were >> "not even wrong". It's hard to respond coherently to someone who is >> breaking the fundamental expectations of discourse. 
Even I had to >> stare at the thread for a few minutes to figure out where things went >> off the rails. > > I'm sorry but this seems to me to be aggressive, offensive, and unjust. > > The discussion was, from the beginning, mainly about the relative > benefits of rewriting the core with C / Cython, or C++. > > I don't think anyone was proposing writing every line of the numpy > core in Cython. ?Ergo (sorry to use the debating term), the proposal > to use Cython was always to take some of the higher level code out of > C and leave some of it in C. ? It does indeed make the debate > ridiculous to oppose a proposal that no-one has made. > > Now I am sure it is obvious to you, that the proposal to refactor the > current C code to into low-level C libraries, and higher level Cython > wrappers, is absurd and off the table. ?It isn't obvious to me. ?I > don't think I broke a fundamental rule of polite discourse to clarify > that is what I meant, It's not off the table, but it's not what this discussion was about. The proposal is to implement the core in C++. Regardless of whether the core is separated out as an independent non-Python library or not. Some people want to use higher level language features in the core. Cython was brought up as an alternative. If they were bringing up Cython in the context of C-core+Cython-wrapper, then they were also misunderstanding what the proposal was about. The discussion is about a C++-core versus a C-core (either the current one or a refactored one). If you want to argue for a C-core over a C++-core, that's great, but talking about Cython features and stability is not relevant to that discussion. It's an entirely orthogonal issue to what is motivating the request to use C++ in the core. C-core+Cython-wrapper is still a viable alternative, but the relevant bit of that is "C-core". I would wager that after any refactoring of the core, regardless of whether it is implemented in C++ or C, we would then wrap it in Cython. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From travis at continuum.io Sat Feb 18 17:54:55 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 18 Feb 2012 16:54:55 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> Message-ID: <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> On Feb 18, 2012, at 4:03 PM, Matthew Brett wrote: > Hi, > > On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant wrote: >> The C/C++ discussion is just getting started. Everyone should keep in mind >> that this is not something that is going to happening quickly. This will >> be a point of discussion throughout the year. I'm not a huge supporter of >> C++, but C++11 does look like it's made some nice progress, and as I think >> about making a core-set of NumPy into a library that can be called by >> multiple languages (and even multiple implementations of Python), tempered >> C++ seems like it might be an appropriate way to go. > > Could you say more about this? Do you have any idea when the decision > about C++ is likely to be made? At what point does it make most sense > to make the argument for or against? Can you suggest a good way for > us to be able to make more substantial arguments either way? 
I think early arguments against are always appropriate --- if you believe they have a chance of swaying Mark or Chuck who are the strongest supporters of C++ at this point. I will be quite nervous about going crazy with C++. It was suggested that I use C++ 7 years ago when I wrote NumPy. I didn't go that route then largely because of compiler issues, ABI-concerns, and I knew C better than C++ so I felt like it would have taken me longer to do something in C++. I made the right decision for me. If you think my C-code is horrible, you would have been completely offended by whatever C++ I might have done at the time. But I basically agree with Chuck that there is a lot of C-code in NumPy and template-based-code that is really trying to be C++ spelled differently. The decision will not be made until NumPy 2.0 work is farther along. The most likely outcome is that Mark will develop something quite nice in C++ which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not. I'm interested in sponsoring Mark and working as closely as I can with he and Chuck to see what emerges. I'm reading very carefully any arguments against using C++ because I've actually pushed back on Mark pretty hard as we've discussed these things over the past months. I am nervous about corner use-cases that will be unpleasant for some groups and some platforms. But, that vague nervousness is not enough to discount the clear benefits. I'm curious about the state of C++ compilers for Blue-Gene and other big-iron machines as well. My impression is that most of them use g++. which has pretty good support for C++. David and others raised some important concerns (merging multiple compilers seems like the biggest issue --- it already is...). If someone out there seriously opposes judicious and careful use of C++ and can show a clear reason why it would be harmful --- feel free to speak up at any time. We are leaning that way with Mark out in front of us leading the charge. > > Can you say a little more about your impression of the previous Cython > refactor and why it was not successful? > Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of it's current status. I'm still very supportive of that sort of thing. I don't know if Cython ever solved the "raising an exception in a Fortran-called call-back" issue. I used setjmp and longjmp in several places in SciPy originally in order to enable exceptions raised in a Python-callback that is wrapped in a C-function pointer and being handed to a Fortran-routine that asks for a function-pointer. What happend in NumPy, was that the code was re-factored to become a library. I don't think much NumPy code actually ended up in Cython (the random-number generators have been in Cython from the beginning). The biggest problem with merging the code was that Mark Wiebe got active at about that same time :-) He ended up changing several things in the code-base that made it difficult to merge-in the changes. Some of the bug-fixes and memory-leak patches, and tests did get into the code-base, but the essential creation of the NumPy library did not make it. There was some very good work done that I hope we can still take advantage of. Another factor. the decision to make an extra layer of indirection makes small arrays that much slower. 
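(Schematically, the difference is something like this -- these are not the real structs, just a cartoon of the two layouts:

    /* with an extra layer of indirection, every access chases one more pointer: */
    typedef struct {
        PyObject_HEAD
        void *impl;              /* separately allocated "core" object lives elsewhere */
    } indirect_array;            /* cartoon only, not an actual NumPy struct */

    /* versus small arrays whose metadata and data live inside the object itself: */
    typedef struct {
        PyObject_HEAD
        npy_intp shape[2];
        char *data;              /* points into inline_buf for small arrays */
        char inline_buf[64];     /* no extra dereference, no second allocation */
    } inline_array;              /* cartoon only */

)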
I agree with Mark that in a core library we need to go the other way with small arrays being completely allocated in the data-structure itself (reducing the number of pointer de-references). So, Cython did not play a major role on the NumPy side of things. It played a very nice role on the SciPy side of things. -Travis > Thanks a lot, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sat Feb 18 18:04:10 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 15:04:10 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Hi, On Sat, Feb 18, 2012 at 2:51 PM, Robert Kern wrote: > On Sat, Feb 18, 2012 at 22:29, Matthew Brett wrote: >> Hi, >> >> On Sat, Feb 18, 2012 at 2:20 PM, Robert Kern wrote: >>> On Sat, Feb 18, 2012 at 22:06, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Sat, Feb 18, 2012 at 2:03 PM, Robert Kern wrote: > >>>>> Your misunderstanding of what was being discussed. The proposal being >>>>> discussed is implementing the core of numpy in C++, wrapped in C to be >>>>> usable as a C library that other extensions can use, and then exposed >>>>> to Python in an unspecified way. Cython was raised as an alternative >>>>> for this core, but as Chuck points out, it doesn't really fit. Your >>>>> assertion that what was being discussed was putting the core in C and >>>>> using Cython to wrap it was simply a non-sequitur. Discussion of >>>>> alternatives is fine. You weren't doing that. >>>> >>>> You read David's email? ?Was he also being annoying? >>> >>> Not really, because he was responding on-topic to the bizarro-branch >>> of the conversation that you spawned about the merits of moving from >>> hand-written C extensions to a Cython-wrapped C library. Whatever >>> annoyance his email might inspire is your fault, not his. The >>> discussion was about whether to use C++ or Cython for the core. Chuck >>> argued that Cython was not a suitable implementation language for the >>> core. You responded that his objections to Cython didn't apply to what >>> you thought was being discussed, using Cython to wrap a pure-C >>> library. As Pauli (Wolfgang, not our Pauli) once phrased it, you were >>> "not even wrong". It's hard to respond coherently to someone who is >>> breaking the fundamental expectations of discourse. Even I had to >>> stare at the thread for a few minutes to figure out where things went >>> off the rails. >> >> I'm sorry but this seems to me to be aggressive, offensive, and unjust. >> >> The discussion was, from the beginning, mainly about the relative >> benefits of rewriting the core with C / Cython, or C++. >> >> I don't think anyone was proposing writing every line of the numpy >> core in Cython. ?Ergo (sorry to use the debating term), the proposal >> to use Cython was always to take some of the higher level code out of >> C and leave some of it in C. ? It does indeed make the debate >> ridiculous to oppose a proposal that no-one has made. >> >> Now I am sure it is obvious to you, that the proposal to refactor the >> current C code to into low-level C libraries, and higher level Cython >> wrappers, is absurd and off the table. ?It isn't obvious to me. 
?I >> don't think I broke a fundamental rule of polite discourse to clarify >> that is what I meant, > > It's not off the table, but it's not what this discussion was about. I beg to differ - which was why I replied the way I did. As I see it the two proposals being discussed were: 1) C++ rewrite of C core 2) Refactor current C core into C / Cython I think you can see from David's reply that that was also his understanding. Of course you could use Cython to interface to the 'core' in C or the 'core' in C++, but the difference would be, that some of the stuff in C++ for option 1) would be in Cython, in option 2). Now you might be saying, that you believe the discussion was only ever about whether the non-Cython bits would be in C or C++. That would indeed make sense of your lack of interest in discussion of Cython. I think you'd be hard pressed to claim it was only me discussing Cython though. Chuck was pointing out that it was completely ridiculous trying to implement the entire core in Cython. Yes it is. As no-one has proposed that, it seems to me only reasonable to point out what I meant, in the interests of productive discourse. Best, Matthew From cournape at gmail.com Sat Feb 18 18:09:32 2012 From: cournape at gmail.com (David Cournapeau) Date: Sat, 18 Feb 2012 23:09:32 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F402B46.4020103@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden wrote: > ?> In an ideal world, we would have a better language than C++ that can > be spit out as > C for portability. > > What about a statically typed Python? (That is, not Cython.) We just > need to make the compiler :-) There are better languages than C++ that has most of the technical benefits stated in this discussion (rust and D being the most "obvious" ones), but whose usage is unrealistic today for various reasons: knowledge, availability on "esoteric" platforms, etc? A new language is completely ridiculous. David From sturla at molden.no Sat Feb 18 18:17:58 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 00:17:58 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: <4F4031A6.5050208@molden.no> Den 19.02.2012 00:09, skrev David Cournapeau: > reasons: knowledge, availability on "esoteric" platforms, etc? A new > language is completely ridiculous. Yes, that is why I argued against Cython as well. Personally I prefer C++ to C, but only if it is written in a readable way. And if the purpose is to write C in C++, then it's brain dead. 
Sturla From sturla at molden.no Sat Feb 18 18:33:37 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 00:33:37 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: <4F403551.8080302@molden.no> Den 19.02.2012 00:09, skrev David Cournapeau: > There are better languages than C++ that has most of the technical > benefits stated in this discussion (rust and D being the most > "obvious" ones), What about Java? (compile with GJC for CPython) Or just write everything in Cython, even the core? Sturla From sturla at molden.no Sat Feb 18 18:36:13 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 00:36:13 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F403551.8080302@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F403551.8080302@molden.no> Message-ID: <4F4035ED.40105@molden.no> Den 19.02.2012 00:33, skrev Sturla Molden: > Or just write everything in Cython, even the core? That is, use memory view syntax and fused types for generics, and hope it is stable before we are done ;-) Sturla From charlesr.harris at gmail.com Sat Feb 18 18:59:24 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2012 16:59:24 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 3:24 PM, David Cournapeau wrote: > On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris > wrote: > > > > > Well, we already have code obfuscation (DOUBLE_your_pleasure, > > FLOAT_your_boat), so we might as well let the compiler handle it. > > Yes, those are not great, but on the other hand, it is not that a > fundamental issue IMO. > "Name mangling" is what I meant. But C++ does exactly the same thing, just more systematically. It's not whether it's great, it's whether the compiler or the programmer does the boring stuff. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sat Feb 18 19:07:39 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 01:07:39 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> Message-ID: <4F403D4B.3070809@molden.no> Den 18.02.2012 23:54, skrev Travis Oliphant: > Another factor. the decision to make an extra layer of indirection makes small arrays that much slower. I agree with Mark that in a core library we need to go the other way with small arrays being completely allocated in the data-structure itself (reducing the number of pointer de-references). > I am not sure there is much overhead to double *const data = (double*)PyArray_DATA(array); If C code calls PyArray_DATA(array) more than needed, the fix is not to store the data inside the struct, but rather fix the real problem. For example, the Cython syntax for NumPy arrays will under the hood unbox the ndarray struct into local variables. 
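In plain C that unboxing is just hoisting everything out of the object before the loop -- a sketch only, assuming arr is a 1-d float64 PyArrayObject* and skipping error checking:

    char     *data = PyArray_BYTES(arr);      /* cache pointer, length and stride ... */
    npy_intp  n    = PyArray_DIM(arr, 0);     /* ... in locals, outside the loop      */
    npy_intp  step = PyArray_STRIDE(arr, 0);
    double total = 0.0;
    for (npy_intp i = 0; i < n; i++) {
        total += *(double *)(data + i*step);  /* no struct or attribute access here   */
    }
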
That gives the fastest data access. The NumPy core could e.g. have macros that takes care of the unboxing. But for the purpose of cache use, it could be smart to make sure the data buffer is allocated directly after the PyObject struct (or at least in vicinity of it), so it will be loaded into cache along with the PyObject. That is, prefetched before dereferencing PyArray_DATA(array). But with respect to placement we must keep in mind the the PyObject can be subclassed. Putting e.g. 4 kb of static buffer space inside the PyArrayObject struct will bloat every ndarray. Sturla From njs at pobox.com Sat Feb 18 19:12:56 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Feb 2012 00:12:56 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> Message-ID: On Sat, Feb 18, 2012 at 10:54 PM, Travis Oliphant wrote: > I'm reading very carefully any arguments against using C++ because I've actually pushed back on Mark pretty hard as we've discussed these things over the past months. ?I am nervous about corner use-cases that will be unpleasant for some groups and some platforms. ? ?But, that vague nervousness is not enough to discount the clear benefits. ? I'm curious about the state of C++ compilers for Blue-Gene and other big-iron machines as well. ? My impression is that most of them use g++. ? which has pretty good support for C++. ? ?David and others raised some important concerns (merging multiple compilers seems like the biggest issue --- it already is...). ? ?If someone out there seriously opposes judicious and careful use of C++ and can show a clear reason why it would be harmful --- feel free to speak up at any time. ? We are leaning that way with Mark out in front of us leading the charge. I don't oppose it, but I admit I'm not really clear on what the supposed advantages would be. Everyone seems to agree that -- Only a carefully-chosen subset of C++ features should be used -- But this subset would be pretty useful I wonder if anyone is actually thinking of the same subset :-). Chuck mentioned iterators as one advantage. I don't understand, since iterators aren't even a C++ feature, they're just objects with "next" and "dereference" operators. The only difference between these is spelling: for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... } for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); my_iter_next(&i)) { ... } So I assume he's thinking about something more, but the discussion has been too high-level for me to figure out what. Using C++ templates to generate ufunc loops is an obvious application, but again, in the simple examples I'm thinking of (e.g., the stuff in numpy/core/src/umath/loops.c.src), this pretty much comes down to whether we want to spell the function names like "SHORT_add" or "add", and write the code like "*(T *))x[0] + ((T *)y)[0]" or "((@TYPE@ *)x)[0] + ((@TYPE@ *)y)[0]". Maybe there are other places where we'd get some advantage from the compiler knowing what was going on, like if we're doing type-based dispatch to overloaded functions, but I don't know if that'd be useful for the templates we actually use. RAII is pretty awesome, and RAII smart-pointers might help a lot with getting reference-counting right. 
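(Something as small as this already removes a whole family of missing-DECREF bugs -- py_ref is a made-up name here, not an existing CPython or NumPy class:

    class py_ref {                                   // hypothetical helper, sketch only
        PyObject *p_;
    public:
        explicit py_ref(PyObject *p) : p_(p) {}      // takes ownership of one reference
        ~py_ref() { Py_XDECREF(p_); }                // dropped on every exit path
        PyObject *get() const { return p_; }
        PyObject *release() { PyObject *q = p_; p_ = 0; return q; }   // hand ownership back
    private:
        py_ref(const py_ref &);                      // non-copyable, 2012-vintage C++
        py_ref &operator=(const py_ref &);
    };
)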
OTOH, you really only need RAII if you're using exceptions; otherwise, the goto-failure pattern usually works pretty well, esp. if used systematically. Do we know that the Python memory allocator plays well with the C++ allocation interfaces on all relevant systems? (Potentially you have to know for every pointer whether it was allocated by new, new[], malloc, or PyMem_Malloc, because they all have different deallocation functions. This is already an issue for malloc versus PyMem_Malloc, but C++ makes it worse.) Again, it really doesn't matter to me personally which approach is chosen. But getting more concrete might be useful... -- Nathaniel From njs at pobox.com Sat Feb 18 19:19:53 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Feb 2012 00:19:53 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: On Sat, Feb 18, 2012 at 11:09 PM, David Cournapeau wrote: > On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden wrote: > >> ?> In an ideal world, we would have a better language than C++ that can >> be spit out as > C for portability. >> >> What about a statically typed Python? (That is, not Cython.) We just >> need to make the compiler :-) > > There are better languages than C++ that has most of the technical > benefits stated in this discussion (rust and D being the most > "obvious" ones), but whose usage is unrealistic today for various > reasons: knowledge, availability on "esoteric" platforms, etc? A new > language is completely ridiculous. Off-topic: rust is an obvious one? That makes my day, Graydon is an old friend and collaborator :-). But FYI, it wouldn't be relevant anyway; its emphasis on concurrency means that it can easily call C, but you can't really call it from C -- it needs to "own" the overall runtime. And I failed to convince him to add numerical-array-relevant features like operator overloading to make it more convenient for numerical programmers attracted by the concurrency support :-(. There are some very small values of "new language" that might be relevant alternatives, like -- if templates are the big draw for C++, then making the existing code generators suck less might do just as well, while avoiding the build system and portability hassles of C++. *shrug* -- Nathaniel From charlesr.harris at gmail.com Sat Feb 18 19:24:42 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2012 17:24:42 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> Message-ID: On Sat, Feb 18, 2012 at 5:12 PM, Nathaniel Smith wrote: > On Sat, Feb 18, 2012 at 10:54 PM, Travis Oliphant > wrote: > > I'm reading very carefully any arguments against using C++ because I've > actually pushed back on Mark pretty hard as we've discussed these things > over the past months. I am nervous about corner use-cases that will be > unpleasant for some groups and some platforms. But, that vague > nervousness is not enough to discount the clear benefits. I'm curious > about the state of C++ compilers for Blue-Gene and other big-iron machines > as well. My impression is that most of them use g++. which has pretty > good support for C++. 
David and others raised some important concerns > (merging multiple compilers seems like the biggest issue --- it already > is...). If someone out there seriously opposes judicious and careful use > of C++ and can show a clear reason why it would be harmful --- feel free to > speak up at any time. We are leaning that way with Mark out in front of > us leading the charge. > > I don't oppose it, but I admit I'm not really clear on what the > supposed advantages would be. Everyone seems to agree that > -- Only a carefully-chosen subset of C++ features should be used > -- But this subset would be pretty useful > I wonder if anyone is actually thinking of the same subset :-). > > Chuck mentioned iterators as one advantage. I don't understand, since > iterators aren't even a C++ feature, they're just objects with "next" > and "dereference" operators. The only difference between these is > spelling: > for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... } > for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); > my_iter_next(&i)) { ... } > So I assume he's thinking about something more, but the discussion has > been too high-level for me to figure out what. > They are classes, data with methods in one cute little bundle. > Using C++ templates to generate ufunc loops is an obvious application, > but again, in the simple examples I'm thinking of (e.g., the stuff in > numpy/core/src/umath/loops.c.src), this pretty much comes down to > whether we want to spell the function names like "SHORT_add" or > "add", and write the code like "*(T *))x[0] + ((T *)y)[0]" or > "((@TYPE@ *)x)[0] + ((@TYPE@ *)y)[0]". Maybe there are other places > where we'd get some advantage from the compiler knowing what was going > on, like if we're doing type-based dispatch to overloaded functions, > but I don't know if that'd be useful for the templates we actually > use. > > RAII is pretty awesome, and RAII smart-pointers might help a lot with > getting reference-counting right. OTOH, you really only need RAII if > you're using exceptions; otherwise, the goto-failure pattern usually > works pretty well, esp. if used systematically. > > That's more like having destructors. Let the compiler do it, part of useful code abstraction is to hide those sort of sordid details. > Do we know that the Python memory allocator plays well with the C++ > allocation interfaces on all relevant systems? (Potentially you have > to know for every pointer whether it was allocated by new, new[], > malloc, or PyMem_Malloc, because they all have different deallocation > functions. This is already an issue for malloc versus PyMem_Malloc, > but C++ makes it worse.) > > I think the low level library will ignore the Python memory allocator, but there is a template for allocators that makes them selectable. > Again, it really doesn't matter to me personally which approach is > chosen. But getting more concrete might be useful... > > Agreed. I think much will be clarified once there is some actual code to look at. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla at molden.no Sat Feb 18 19:35:37 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 01:35:37 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> Message-ID: <4F4043D9.5080400@molden.no> Den 19.02.2012 01:12, skrev Nathaniel Smith: > > I don't oppose it, but I admit I'm not really clear on what the > supposed advantages would be. Everyone seems to agree that > -- Only a carefully-chosen subset of C++ features should be used > -- But this subset would be pretty useful > I wonder if anyone is actually thinking of the same subset :-). Probably not, everybody have their own favourite subset. > > Chuck mentioned iterators as one advantage. I don't understand, since > iterators aren't even a C++ feature, they're just objects with "next" > and "dereference" operators. The only difference between these is > spelling: > for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... } > for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); > my_iter_next(&i)) { ... } > So I assume he's thinking about something more, but the discussion has > been too high-level for me to figure out what. C++11 has this option: for (auto& item : container) { // iterate over the container object, // get a reference to each item // // "container" can be an STL class or // A C-style array with known size. } Which does this: for item in container: pass > Using C++ templates to generate ufunc loops is an obvious application, > but again, in the simple examples Template metaprogramming? Don't even think about it. It is brain dead to try to outsmart the compiler. Sturla From matthew.brett at gmail.com Sat Feb 18 20:18:21 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 17:18:21 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> Message-ID: Hi, On Sat, Feb 18, 2012 at 2:54 PM, Travis Oliphant wrote: > > On Feb 18, 2012, at 4:03 PM, Matthew Brett wrote: > >> Hi, >> >> On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant wrote: >>> The C/C++ discussion is just getting started. ?Everyone should keep in mind >>> that this is not something that is going to happening quickly. ? This will >>> be a point of discussion throughout the year. ? ?I'm not a huge supporter of >>> C++, but C++11 does look like it's made some nice progress, and as I think >>> about making a core-set of NumPy into a library that can be called by >>> multiple languages (and even multiple implementations of Python), tempered >>> C++ seems like it might be an appropriate way to go. >> >> Could you say more about this? ?Do you have any idea when the decision >> about C++ is likely to be made? ?At what point does it make most sense >> to make the argument for or against? ?Can you suggest a good way for >> us to be able to make more substantial arguments either way? > > I think early arguments against are always appropriate --- if you believe they have a chance of swaying Mark or Chuck who are the strongest supporters of C++ at this point. ? ? I will be quite nervous about going crazy with C++. ? It was suggested that I use C++ 7 years ago when I wrote NumPy. ? 
I didn't go that route then largely because of compiler issues, ?ABI-concerns, and I knew C better than C++ so I felt like it would have taken me longer to do something in C++. ? ? I made the right decision for me. ? If you think my C-code is horrible, you would have been completely offended by whatever C++ I might have done at the time. > > But I basically agree with Chuck that there is a lot of C-code in NumPy and template-based-code that is really trying to be C++ spelled differently. > > The decision will not be made until NumPy 2.0 work is farther along. ? ? The most likely outcome is that Mark will develop something quite nice in C++ which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not. ? I'm interested in sponsoring Mark and working as closely as I can with he and Chuck to see what emerges. Would it be fair to say then, that you are expecting the discussion about C++ will mainly arise after the Mark has written the code? I can see that it will be easier to specific at that point, but there must be a serious risk that it will be too late to seriously consider an alternative approach. >> Can you say a little more about your impression of the previous Cython >> refactor and why it was not successful? >> > > Sure. ?This list actually deserves a long writeup about that. ? First, there wasn't a "Cython-refactor" of NumPy. ? There was a Cython-refactor of SciPy. ? I'm not sure of it's current status. ? I'm still very supportive of that sort of thing. I think I missed that - is it on git somewhere? > I don't know if Cython ever solved the "raising an exception in a Fortran-called call-back" issue. ? I used setjmp and longjmp in several places in SciPy originally in order to enable exceptions raised in a Python-callback that is wrapped in a C-function pointer and being handed to a Fortran-routine that asks for a function-pointer. > > What happend in NumPy, was that the code was re-factored to become a library. ? I don't think much NumPy code actually ended up in Cython (the random-number generators have been in Cython from the beginning). > > > The biggest problem with merging the code was that Mark Wiebe got active at about that same time :-) ? He ended up changing several things in the code-base that made it difficult to merge-in the changes. ? Some of the bug-fixes and memory-leak patches, and tests did get into the code-base, but the essential creation of the NumPy library did not make it. ? There was some very good work done that I hope we can still take advantage of. > Another factor. ? the decision to make an extra layer of indirection makes small arrays that much slower. ? I agree with Mark that in a core library we need to go the other way with small arrays being completely allocated in the data-structure itself (reducing the number of pointer de-references Does that imply there was a review of the refactor at some point to do things like benchmarking? Are there any sources to get started trying to understand the nature of the Numpy refactor and where it ran into trouble? Was it just the small arrays? > So, Cython did not play a major role on the NumPy side of things. ? It played a very nice role on the SciPy side of things. I guess Cython was attractive because the desire was to make a stand-alone library? If that is still the goal, presumably that excludes Cython from serious consideration? What are the primary advantages of making the standalone library? Are there any serious disbenefits? 
Thanks a lot for the reply, Matthew From matthew.brett at gmail.com Sat Feb 18 20:32:34 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 17:32:34 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> Message-ID: On Sat, Feb 18, 2012 at 5:18 PM, Matthew Brett wrote: > Hi, > > On Sat, Feb 18, 2012 at 2:54 PM, Travis Oliphant wrote: >> >> On Feb 18, 2012, at 4:03 PM, Matthew Brett wrote: >> >>> Hi, >>> >>> On Sat, Feb 18, 2012 at 1:57 PM, Travis Oliphant wrote: >>>> The C/C++ discussion is just getting started. ?Everyone should keep in mind >>>> that this is not something that is going to happening quickly. ? This will >>>> be a point of discussion throughout the year. ? ?I'm not a huge supporter of >>>> C++, but C++11 does look like it's made some nice progress, and as I think >>>> about making a core-set of NumPy into a library that can be called by >>>> multiple languages (and even multiple implementations of Python), tempered >>>> C++ seems like it might be an appropriate way to go. >>> >>> Could you say more about this? ?Do you have any idea when the decision >>> about C++ is likely to be made? ?At what point does it make most sense >>> to make the argument for or against? ?Can you suggest a good way for >>> us to be able to make more substantial arguments either way? >> >> I think early arguments against are always appropriate --- if you believe they have a chance of swaying Mark or Chuck who are the strongest supporters of C++ at this point. ? ? I will be quite nervous about going crazy with C++. ? It was suggested that I use C++ 7 years ago when I wrote NumPy. ? I didn't go that route then largely because of compiler issues, ?ABI-concerns, and I knew C better than C++ so I felt like it would have taken me longer to do something in C++. ? ? I made the right decision for me. ? If you think my C-code is horrible, you would have been completely offended by whatever C++ I might have done at the time. >> >> But I basically agree with Chuck that there is a lot of C-code in NumPy and template-based-code that is really trying to be C++ spelled differently. >> >> The decision will not be made until NumPy 2.0 work is farther along. ? ? The most likely outcome is that Mark will develop something quite nice in C++ which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not. ? I'm interested in sponsoring Mark and working as closely as I can with he and Chuck to see what emerges. > > Would it be fair to say then, that you are expecting the discussion > about C++ will mainly arise after the Mark has written the code? ? I > can see that it will be easier to specific at that point, but there > must be a serious risk that it will be too late to seriously consider > an alternative approach. > >>> Can you say a little more about your impression of the previous Cython >>> refactor and why it was not successful? >>> >> >> Sure. ?This list actually deserves a long writeup about that. ? First, there wasn't a "Cython-refactor" of NumPy. ? There was a Cython-refactor of SciPy. ? I'm not sure of it's current status. ? I'm still very supportive of that sort of thing. > > I think I missed that - is it on git somewhere? > >> I don't know if Cython ever solved the "raising an exception in a Fortran-called call-back" issue. ? 
I used setjmp and longjmp in several places in SciPy originally in order to enable exceptions raised in a Python-callback that is wrapped in a C-function pointer and being handed to a Fortran-routine that asks for a function-pointer. >> >> What happend in NumPy, was that the code was re-factored to become a library. ? I don't think much NumPy code actually ended up in Cython (the random-number generators have been in Cython from the beginning). >> >> >> The biggest problem with merging the code was that Mark Wiebe got active at about that same time :-) ? He ended up changing several things in the code-base that made it difficult to merge-in the changes. ? Some of the bug-fixes and memory-leak patches, and tests did get into the code-base, but the essential creation of the NumPy library did not make it. ? There was some very good work done that I hope we can still take advantage of. > >> Another factor. ? the decision to make an extra layer of indirection makes small arrays that much slower. ? I agree with Mark that in a core library we need to go the other way with small arrays being completely allocated in the data-structure itself (reducing the number of pointer de-references > > Does that imply there was a review of the refactor at some point to do > things like benchmarking? ? Are there any sources to get started > trying to understand the nature of the Numpy refactor and where it ran > into trouble? ?Was it just the small arrays? > >> So, Cython did not play a major role on the NumPy side of things. ? It played a very nice role on the SciPy side of things. > > I guess Cython was attractive because the desire was to make a Sorry - that should read "I guess Cython was _not_ attractive ... " > stand-alone library? ? If that is still the goal, presumably that > excludes Cython from serious consideration? ?What are the primary > advantages of making the standalone library? ?Are there any serious > disbenefits? Best, Matthew From hugadams at gwmail.gwu.edu Sat Feb 18 21:12:20 2012 From: hugadams at gwmail.gwu.edu (Adam Hughes) Date: Sat, 18 Feb 2012 21:12:20 -0500 Subject: [Numpy-discussion] Forbidden charcter in the "names" argument of genfromtxt? Message-ID: Hey everyone, I have timeseries data in which the column label is simply a filename from which the original data was taken. Here's some sample data: name1.txt name2.txt name3.txt 32 34 953 32 03 402 I've noticed that the standard genfromtxt() method works great; however, the names aren't written correctly. That is, if I use the command: print data['name1.txt'] Nothing happens. However, when I remove the file extension, Eg: name1 name2 name3 32 34 953 32 03 402 Then print data['name1'] return (32, 32) as expected. It seems that the period in the name isn't compatible with the genfromtxt() names attribute. Is there a workaround, or do I need to restructure my program to get the extension removed? I'd rather not do this if possible for reasons that aren't important for the discussion at hand. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... 
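As an aside on the setjmp/longjmp trick Travis mentions above (propagating a Python exception out of a callback handed to a Fortran routine), a rough sketch of the pattern looks like the following. This is not the actual SciPy code; fortran_solver_ is a stand-in for whatever wrapped Fortran routine takes the function pointer, and the error handling is reduced to the bare mechanism.

    #include <Python.h>
    #include <setjmp.h>

    /* assumed prototype for the wrapped Fortran routine (hypothetical) */
    extern void fortran_solver_(double (*callback)(double *));

    static jmp_buf callback_env;     /* where to unwind to on error */
    static PyObject *py_callback;    /* the user's Python function  */

    /* C callback with the signature the Fortran routine expects. */
    static double callback_thunk(double *x)
    {
        PyObject *result = PyObject_CallFunction(py_callback, "d", *x);
        if (result == NULL) {
            /* A Python exception is pending: jump straight over the
               Fortran stack frames back to the setjmp point below. */
            longjmp(callback_env, 1);
        }
        double value = PyFloat_AsDouble(result);
        Py_DECREF(result);
        return value;
    }

    static PyObject *call_solver(PyObject *func)
    {
        py_callback = func;
        if (setjmp(callback_env)) {
            /* longjmp landed here; the Python exception is still set. */
            return NULL;
        }
        fortran_solver_(callback_thunk);
        Py_RETURN_NONE;
    }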
URL: From wardefar at iro.umontreal.ca Sat Feb 18 23:06:11 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Sat, 18 Feb 2012 23:06:11 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On 2012-02-18, at 2:47 AM, Matthew Brett wrote: > Of course it might be that so-far undiscovered C++ developers are > drawn to a C++ rewrite of Numpy. But it that really likely? If we can trick them into thinking the GIL doesn't exist, then maybe... David From travis at continuum.io Sat Feb 18 23:38:48 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 18 Feb 2012 22:38:48 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> Message-ID: <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> >> >> The decision will not be made until NumPy 2.0 work is farther along. The most likely outcome is that Mark will develop something quite nice in C++ which he is already toying with, and we will either choose to use it in NumPy to build 2.0 on --- or not. I'm interested in sponsoring Mark and working as closely as I can with he and Chuck to see what emerges. > > Would it be fair to say then, that you are expecting the discussion > about C++ will mainly arise after the Mark has written the code? I > can see that it will be easier to specific at that point, but there > must be a serious risk that it will be too late to seriously consider > an alternative approach. We will need to see examples of what Mark is talking about and clarify some of the compiler issues. Certainly there is some risk that once code is written that it will be tempting to just use it. Other approaches are certainly worth exploring in the mean-time, but C++ has some strong arguments for it. >>> Can you say a little more about your impression of the previous Cython >>> refactor and why it was not successful? >>> >> >> Sure. This list actually deserves a long writeup about that. First, there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of SciPy. I'm not sure of it's current status. I'm still very supportive of that sort of thing. > > I think I missed that - is it on git somewhere? I thought so, but I can't find it either. We should ask Jason McCampbell of Enthought where the code is located. Here are the distributed eggs: http://www.enthought.com/repo/.iron/ -Travis > >> Another factor. the decision to make an extra layer of indirection makes small arrays that much slower. I agree with Mark that in a core library we need to go the other way with small arrays being completely allocated in the data-structure itself (reducing the number of pointer de-references > > Does that imply there was a review of the refactor at some point to do > things like benchmarking? Are there any sources to get started > trying to understand the nature of the Numpy refactor and where it ran > into trouble? Was it just the small arrays? The main trouble was just the pace of development of NumPy and the divergence of the trees so that the re-factor branch did not keep up. It's changes were quite extensive, and so were some of Mark's. So, that created the difficulty in merging them together. 
Mark's review of the re-factor was that small-array support was going to get worse. I'm not sure if we ever did any bench-marking in that direction. > >> So, Cython did not play a major role on the NumPy side of things. It played a very nice role on the SciPy side of things. > > I guess Cython was attractive because the desire was to make a > stand-alone library? If that is still the goal, presumably that > excludes Cython from serious consideration? What are the primary > advantages of making the standalone library? Are there any serious > disbenefits? From my perspective having a standalone core NumPy is still a goal. The primary advantages of having a NumPy library (call it NumLib for the sake of argument) are 1) Ability for projects like PyPy, IronPython, and Jython to use it more easily 2) Ability for Ruby, Perl, Node.JS, and other new languages to use the code for their technical computing projects. 3) increasing the number of users who can help make it more solid 4) being able to build the user-base (and corresponding performance with eye-balls from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the code). The disadvantages I can think of: 1) More users also means we might risk "lowest-commond-denominator" problems --- i.e. trying to be too much to too many may make it not useful for anyone. Also, more users means more people with opinions that might be difficult to re-concile. 2) The work of doing the re-write is not small: probably at least 6 person-months 3) Not being able to rely on Python objects (dictionaries, lists, and tuples are currently used in the code-base quite a bit --- though the re-factor did show some examples of how to remove this usage). 4) Handling of "Object" arrays requires some re-design. I'm sure there are other factors that could be added to both lists. -Travis > > Thanks a lot for the reply, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Feb 19 00:19:30 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 21:19:30 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: Hi, On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant wrote: > We will need to see examples of what Mark is talking about and clarify some > of the compiler issues. ? Certainly there is some risk that once code is > written that it will be tempting to just use it. ? Other approaches are > certainly worth exploring in the mean-time, but C++ has some strong > arguments for it. The worry as I understand it is that a C++ rewrite might make the numpy core effectively a read-only project for anyone but Mark. Do you have any feeling for whether that is likely? > I thought so, but I can't find it either. ?We should ask Jason McCampbell of > Enthought where the code is located. ? Here are the distributed eggs: > ??http://www.enthought.com/repo/.iron/ Should I email him? Happy to do that. > From my perspective having a standalone core NumPy is still a goal. ? 
The > primary advantages of having a NumPy library (call it NumLib for the sake of > argument) are > > 1) Ability for projects like PyPy, IronPython, and Jython to use it more > easily > 2) Ability for Ruby, Perl, Node.JS, and other new languages to use the code > for their technical computing projects. > 3) increasing the number of users who can help make it more solid > 4)?being able to build the user-base (and corresponding performance with > eye-balls from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the > code). > > The disadvantages I can think of: > 1) More users also means we might risk "lowest-commond-denominator" problems > --- i.e. trying to be too much to too many may make it not useful for > anyone. Also, more users means more people with opinions that might be > difficult to re-concile. > 2) The work of doing the re-write is not small: ?probably at least 6 > person-months > 3) Not being able to rely on Python objects (dictionaries, lists, and tuples > are currently used in the code-base quite a bit --- though the re-factor did > show some examples of how to remove this usage). > 4) Handling of "Object" arrays requires some re-design. How would numpylib compare to libraries like eigen? How likely do you think it would be that unrelated projects would use numpylib rather than eigen or other numerical libraries? Do you think the choice of C++ rather than C will influence whether other projects will take it up? See you, Matthew From ben.root at ou.edu Sun Feb 19 00:47:53 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 18 Feb 2012 23:47:53 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: On Saturday, February 18, 2012, Matthew Brett wrote: > Hi, > > On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant > > wrote: > > > We will need to see examples of what Mark is talking about and clarify > some > > of the compiler issues. Certainly there is some risk that once code is > > written that it will be tempting to just use it. Other approaches are > > certainly worth exploring in the mean-time, but C++ has some strong > > arguments for it. > > The worry as I understand it is that a C++ rewrite might make the > numpy core effectively a read-only project for anyone but Mark. Do > you have any feeling for whether that is likely? > > Dude, have you seen the .c files in numpy/core? They are already read-only for pretty much everybody but Mark. All kidding aside, is your concern that when Mark starts this that no one will be able to contribute until he is done? I can tell you right now that won't be the case as I will be trying to flesh out issues with datetime64 with him. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sun Feb 19 01:09:11 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Feb 2012 23:09:11 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: On Sat, Feb 18, 2012 at 9:38 PM, Travis Oliphant wrote: > > The decision will not be made until NumPy 2.0 work is farther along. > The most likely outcome is that Mark will develop something quite nice in > C++ which he is already toying with, and we will either choose to use it in > NumPy to build 2.0 on --- or not. I'm interested in sponsoring Mark and > working as closely as I can with he and Chuck to see what emerges. > > > Would it be fair to say then, that you are expecting the discussion > about C++ will mainly arise after the Mark has written the code? I > can see that it will be easier to specific at that point, but there > must be a serious risk that it will be too late to seriously consider > an alternative approach. > > > We will need to see examples of what Mark is talking about and clarify > some of the compiler issues. Certainly there is some risk that once code > is written that it will be tempting to just use it. Other approaches are > certainly worth exploring in the mean-time, but C++ has some strong > arguments for it. > > > Can you say a little more about your impression of the previous Cython > > refactor and why it was not successful? > > > > Sure. This list actually deserves a long writeup about that. First, > there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of > SciPy. I'm not sure of it's current status. I'm still very supportive > of that sort of thing. > > > I think I missed that - is it on git somewhere? > > > I thought so, but I can't find it either. We should ask Jason McCampbell > of Enthought where the code is located. Here are the distributed eggs: > http://www.enthought.com/repo/.iron/ > Refactor is with the other numpy repos here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Feb 19 01:11:29 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 22:11:29 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: Hi, On Sat, Feb 18, 2012 at 9:47 PM, Benjamin Root wrote: > > > On Saturday, February 18, 2012, Matthew Brett wrote: >> >> Hi, >> >> On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant >> wrote: >> >> > We will need to see examples of what Mark is talking about and clarify >> > some >> > of the compiler issues. ? Certainly there is some risk that once code is >> > written that it will be tempting to just use it. ? Other approaches are >> > certainly worth exploring in the mean-time, but C++ has some strong >> > arguments for it. >> >> The worry as I understand it is that a C++ rewrite might make the >> numpy core effectively a read-only project for anyone but Mark. ?Do >> you have any feeling for whether that is likely? >> > > Dude, have you seen the .c files in numpy/core? 
They are already read-only > for pretty much everybody but Mark. I think the question is whether refactoring in C would be preferable to refactoring in C++. > All kidding aside, is your concern that when Mark starts this that no one > will be able to contribute until he is done? I can tell you right now that > won't be the case as I will be trying to flesh out issues with datetime64 > with him. No - can I refer you back to the emails from David in particular about the difficulties of sharing development in C++? I can find the links - but do you remember the ones I'm referring to? See you, Matthew From matthew.brett at gmail.com Sun Feb 19 01:15:05 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Feb 2012 22:15:05 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: Hi, On Sat, Feb 18, 2012 at 10:09 PM, Charles R Harris wrote: > > > On Sat, Feb 18, 2012 at 9:38 PM, Travis Oliphant > wrote: >> >> Sure. ?This list actually deserves a long writeup about that. ? First, >> there wasn't a "Cython-refactor" of NumPy. ? There was a Cython-refactor of >> SciPy. ? I'm not sure of it's current status. ? I'm still very supportive of >> that sort of thing. >> >> >> I think I missed that - is it on git somewhere? >> >> >> I thought so, but I can't find it either. ?We should ask Jason McCampbell >> of Enthought where the code is located. ? Here are the distributed eggs: >> ??http://www.enthought.com/repo/.iron/ > > > Refactor is with the other numpy repos here. I think Travis is referring to the _scipy_ refactor here. I can't see that with the numpy repos, or with the scipy repos, but I may have missed it, See you, Matthew From mwwiebe at gmail.com Sun Feb 19 02:18:20 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 01:18:20 -0600 Subject: [Numpy-discussion] How a transition to C++ could work Message-ID: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. - C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. Some people have pointed out that the Python templating preprocessor used in NumPy is suggestive of C++ templates. A nice advantage of using C++ templates instead of this preprocessor is that third party tools to improve software quality, like static analysis tools, will be able to run directly on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will be able to provide the full suite of tab-completion/intellisense features that programmers working in those environments are accustomed to. There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. 
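One way to read the ABI and exception concerns is through the usual facade pattern: keep C++ (and anything it might throw) entirely behind an extern "C" boundary, so the exported symbols are plain unmangled C symbols and no exception ever crosses a shared-library edge. A minimal sketch follows; the npylib_* names are invented for illustration and error reporting is reduced to an integer return code.

    /* ---- npylib.h : the public, C-callable surface -------------------- */
    #ifdef __cplusplus
    extern "C" {
    #endif

    typedef struct npylib_array npylib_array;   /* opaque handle */

    int  npylib_array_create(npylib_array **out, int ndim, const long *shape);
    void npylib_array_destroy(npylib_array *arr);

    #ifdef __cplusplus
    }
    #endif

    /* ---- npylib.cpp : free to use C++ internally ----------------------- */
    #include <vector>

    struct npylib_array {
        std::vector<long> shape;
    };

    extern "C" int npylib_array_create(npylib_array **out, int ndim,
                                       const long *shape)
    {
        try {
            npylib_array *arr = new npylib_array;
            arr->shape.assign(shape, shape + ndim);
            *out = arr;
            return 0;
        }
        catch (...) {        /* nothing is allowed to throw past this point */
            *out = 0;
            return -1;       /* e.g. out of memory */
        }
    }

    extern "C" void npylib_array_destroy(npylib_array *arr)
    {
        delete arr;
    }

Because the exported names are unmangled C symbols, they are also exactly what ctypes or dlsym would look up.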
My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of Numpy are C, C++, and D. For D, I suspect the tooling is not mature enough, but I'm not 100% certain of that. The biggest question for any of these possibilities is how do you get the code from its current state to a state which fully utilizes the target language. C++, being nearly a superset of C, offers a strategy to gradually absorb C++ features. Any of the other language choices requires a rewrite, which would be quite disruptive. Because of all these reasons taken together, I believe the only realistic language to use, other than sticking with C, is C++. Finally, here's what I think is the best strategy for transitioning to C++. First, let's consider what we do if 1.7 becomes an LTS release. 1) Immediately after branching for 1.7, we minimally patch all the .c files so that they can build with a C++ compiler and with a C compiler at the same time. Then we rename all .c -> .cpp, and update the build systems for C++. 2) During the 1.8 development cycle, we heavily restrict C++ feature usage. But, where a feature implementation would be arguably easier and less error-prone with C++, we allow it. This is a period for learning about C++ and how it can benefit NumPy. 3) After the 1.8 release, the community will have developed more experience with C++, and will be in a better position to discuss a way forward. If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to restrict the 1.8 release to the subset of both C and C++. I would much prefer using the 1.8 development cycle to dip our toes into the C++ world to get some of the low-hanging benefits without doing anything disruptive. A really important point to emphasize is that C++ allows for a strategy where we gradually evolve the codebase to better incorporate its language features. This is what I'm advocating. No massive rewrite, no disruptive changes. Gradual code evolution, with ABI and API compatibility comparable to what we've delivered in 1.6 and the upcoming 1.7 releases. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sun Feb 19 03:08:43 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 02:08:43 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau wrote: > On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris > wrote: > > > > > Well, we already have code obfuscation (DOUBLE_your_pleasure, > > FLOAT_your_boat), so we might as well let the compiler handle it. 
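For readers who have not looked at numpy/core/src/umath/loops.c.src, the trade-off being debated is roughly the one below: the .src preprocessor expands a @TYPE@ placeholder into FLOAT_add, DOUBLE_add and friends, while a C++ template lets the compiler do that expansion. This is only an illustration, not a proposed NumPy interface; npy_intp_t here is a stand-in for npy_intp and the loop signature is simplified.

    typedef long npy_intp_t;   /* stand-in for npy_intp, for this sketch only */

    /* One generic strided inner loop instead of one expanded copy per type. */
    template <typename T>
    void add_loop(char **args, npy_intp_t n, const npy_intp_t *steps)
    {
        char *in1 = args[0], *in2 = args[1], *out = args[2];
        for (npy_intp_t i = 0; i < n; ++i) {
            *(T *)out = *(T *)in1 + *(T *)in2;
            in1 += steps[0];
            in2 += steps[1];
            out += steps[2];
        }
    }

    /* Explicit instantiation pins down the handful of concrete loops the
       ufunc machinery would register, where the .src file now spells out
       FLOAT_add, DOUBLE_add, SHORT_add by hand. */
    template void add_loop<float>(char **, npy_intp_t, const npy_intp_t *);
    template void add_loop<double>(char **, npy_intp_t, const npy_intp_t *);
    template void add_loop<short>(char **, npy_intp_t, const npy_intp_t *);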
> > Yes, those are not great, but on the other hand, it is not that a > fundamental issue IMO. > > Iterators as we have it in NumPy is something that is clearly limited > by C. Writing the neighborhood iterator is the only case where I > really felt that C++ *could* be a significant improvement. I use > *could* because writing iterator in C++ is hard, and will be much > harder to read (I find both boost and STL - e.g. stlport -- iterators > to be close to write-only code). But there is the question on how you > can make C++-based iterators available in C. I would be interested in > a simple example of how this could be done, ignoring all the other > issues (portability, exception, etc?). > > The STL is also potentially compelling, but that's where we go into my > "beware of the dragons" area of C++. Portability loss, compilation > time increase and warts are significant there. > scipy.sparse.sparsetools has been a source of issues that was quite > high compared to its proportion of scipy amount code (we *do* have > some hard-won experience on C++-related issues). These standard library issues were definitely valid 10 years ago, but all the major C++ compilers have great C++98 support now. Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C? -Mark > > > > Jim Hugunin was a keynote speaker at one of the scipy conventions. At > dinner > > he said that if he was to do it again he would use managed code ;) I > don't > > propose we do that, but tools do advance. > > In an ideal world, we would have a better language than C++ that can > be spit out as C for portability. I have looked for a way to do this > for as long as I have been contributing to NumPy (I have looked at > ooc, D, coccinelle at various stages). I believe the best way is > actually in the vein of FFTW: written in a very high level language > (OCAML) for the hard part, and spitting out C. This is better than C++ > is many ways - this is also clearly not realistic :) > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sun Feb 19 03:24:26 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 19 Feb 2012 00:24:26 -0800 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: Hey, Mark On Feb 18, 2012 11:18 PM, "Mark Wiebe" wrote: > My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. Interfacing to compiled C++ libs have been tricky, so can this concern be dismissed so easily? (Some examples that came to mind were _import_array--easy to fix because it is ours, I guess--or Cython generated code). > A really important point to emphasize is that C++ allows for a strategy where we gradually evolve the codebase to better incorporate its language features. This is what I'm advocating. No massive rewrite, no disruptive changes. Gradual code evolution, with ABI and API compatibility comparable to what we've delivered in 1.6 and the upcoming 1.7 releases. 
If we're to switch to C++ (a language that can very easily be wielded in terrible ways), then this certainly seems like a sound approach. Regards St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Feb 19 03:32:40 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 19 Feb 2012 00:32:40 -0800 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: Hi, Thanks for this - it's very helpful. On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe wrote: > The suggestion of transitioning the NumPy core code from C to C++ has > sparked a?vigorous?debate, and I thought I'd start a new thread to give my > perspective on some of the issues raised, and describe how such a transition > could occur. > > First, I'd like to reiterate the gcc rationale for their choice to switch: > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > > In particular, these points deserve emphasis: > > The C subset of C++ is just as efficient as C. > C++ supports cleaner code in several significant cases. > C++ makes it easier to write cleaner interfaces by making it harder to break > interface boundaries. > C++ never requires uglier code. > > Some people have pointed out that the Python templating preprocessor used in > NumPy is suggestive of C++ templates. A nice advantage of using C++ > templates instead of this preprocessor is that third party tools to improve > software quality, like static analysis tools, will be able to run directly > on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will > be able to provide the full suite of tab-completion/intellisense features > that programmers working in those environments are accustomed to. > > There are concerns about ABI/API interoperability and interactions with C++ > exceptions. I've dealt with these types of issues on enough platforms to > know that while they're important, they're a lot easier to handle than the > issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that > providing a C API from a C++ library is no harder than providing a C API > from a C library. > > It's worth comparing the possibility of C++ versus the possibility of other > languages, and the ones that have been suggested for consideration are D, > Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language > has to interact naturally with the CPython API. It needs to provide direct > access to all the various sizes of signed int, unsigned int, and float. It > needs to have mature compiler support wherever we want to deploy NumPy. > Taken together, these requirements eliminate a majority of these > possibilities. From these criteria, the only languages which seem to have a > clear possibility for the implementation of Numpy are C, C++, and D. On which criteria did you eliminate Cython? > The biggest question for any of these possibilities is how do you get the > code from its current state to a state which fully utilizes the target > language.?C++, being nearly a superset of C, offers a strategy to gradually > absorb C++ features. Any of the other language choices requires a rewrite, > which would be quite disruptive. Because of all these reasons taken > together, I believe the only realistic language to use, other than sticking > with C, is C++. > > Finally, here's what I think is the best strategy for transitioning to C++. > First, let's consider what we do if 1.7 becomes an LTS release. 
> > 1) Immediately after branching for 1.7, we minimally patch all the .c files > so that they can build with a C++ compiler and with a C compiler at the same > time. Then we rename all .c -> .cpp, and update the build systems for C++. > 2) During the 1.8 development cycle, we heavily restrict C++ feature usage. > But, where a feature implementation would be arguably easier and less > error-prone with C++, we allow it. This is a period for learning about C++ > and how it can benefit NumPy. > 3) After the 1.8 release, the community will have developed more experience > with C++, and will be in a better position to discuss a way forward. > > If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to > restrict the 1.8 release to the subset of both C and C++. I would much > prefer using the 1.8 development cycle to dip our toes into the C++ world to > get some of the low-hanging benefits without doing anything disruptive. > > A really important point to emphasize is that C++ allows for a strategy > where we gradually evolve the codebase to better incorporate its language > features. This is what I'm advocating. No massive rewrite, no disruptive > changes. Gradual code evolution, with ABI and API compatibility comparable > to what we've delivered in 1.6 and the upcoming 1.7 releases. Do you have any comment on the need for coding standards when using C++? I saw the warning in: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale about using C++ unwisely. See you, Matthew From mwwiebe at gmail.com Sun Feb 19 03:33:52 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 02:33:52 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 2:24 AM, St?fan van der Walt wrote: > Hey, Mark > > On Feb 18, 2012 11:18 PM, "Mark Wiebe" wrote: > > My experience has been that providing a C API from a C++ library is no > harder than providing a C API from a C library. > > Interfacing to compiled C++ libs have been tricky, so can this concern be > dismissed so easily? (Some examples that came to mind were > _import_array--easy to fix because it is ours, I guess--or Cython generated > code). > I'm speaking from personal experience having dealt with these types of issues extensively before. If people have more detailed examples of problems, possibly links to discussions where one of these problems has occurred, that would be helpful. This argument isn't very useful if it's just my positive experience versus others negative experience, we need to get into specifics to advance the discussion. -Mark > > A really important point to emphasize is that C++ allows for a strategy > where we gradually evolve the codebase to better incorporate its language > features. This is what I'm advocating. No massive rewrite, no disruptive > changes. Gradual code evolution, with ABI and API compatibility comparable > to what we've delivered in 1.6 and the upcoming 1.7 releases. > > If we're to switch to C++ (a language that can very easily be wielded in > terrible ways), then this certainly seems like a sound approach. > > Regards > St?fan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
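As a footnote to step 1 of the transition plan quoted above, the "minimal patching" needed to make a file build as both C and C++ is mostly mechanical. A generic illustration (not an actual NumPy diff) of the three most common edits: adding casts on allocation results, renaming identifiers that collide with C++ keywords, and wrapping shared headers in extern "C" guards.

    /* example.h -- usable from both C and C++ translation units */
    #ifndef EXAMPLE_H
    #define EXAMPLE_H

    #include <stdlib.h>

    #ifdef __cplusplus
    extern "C" {
    #endif

    /* 'new' is a C++ keyword, so a parameter that used to be called 'new'
       gets renamed, e.g. to 'new_size'. */
    static double *grow_buffer(double *buf, size_t new_size)
    {
        /* C++ will not implicitly convert from void *, so the cast is
           added; it is still perfectly legal C. */
        return (double *)realloc(buf, new_size * sizeof(double));
    }

    #ifdef __cplusplus
    }
    #endif

    #endif /* EXAMPLE_H */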
URL: From chaoyuejoy at gmail.com Sun Feb 19 03:39:47 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Sun, 19 Feb 2012 09:39:47 +0100 Subject: [Numpy-discussion] change the mask state of one element in a masked array In-Reply-To: <4F3FDCA2.7090900@hawaii.edu> References: <4F3FDCA2.7090900@hawaii.edu> Message-ID: thanks. 2012/2/18 Eric Firing > On 02/18/2012 05:52 AM, Chao YUE wrote: > > Dear all, > > > > I built a new empty masked array: > > > > In [91]: a=np.ma.empty((2,5)) > > Of course this only makes sense if you are going to immediately populate > the array. > > > > > In [92]: a > > Out[92]: > > masked_array(data = > > [[ 1.20569155e-312 3.34730819e-316 1.13580079e-316 > 1.11459945e-316 > > 9.69610549e-317] > > [ 6.94900258e-310 8.48292532e-317 6.94900258e-310 > 9.76397825e-317 > > 6.94900258e-310]], > > mask = > > False, > > fill_value = 1e+20) > > > > > > as you see, the mask for all the elements are false. so how can I set > > for some elements to masked elements (mask state as true)? > > let's say, I want a[0,0] to be masked. > > a[0,0] = np.ma.masked > > Eric > > > > > thanks & cheers, > > > > Chao > > > > -- > > > *********************************************************************************** > > Chao YUE > > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > > UMR 1572 CEA-CNRS-UVSQ > > Batiment 712 - Pe 119 > > 91191 GIF Sur YVETTE Cedex > > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > > ************************************************************************************ > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Feb 19 03:44:53 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 19 Feb 2012 09:44:53 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: On Sun, Feb 19, 2012 at 6:47 AM, Benjamin Root wrote: > > All kidding aside, is your concern that when Mark starts this that no one > will be able to contribute until he is done? I can tell you right now that > won't be the case as I will be trying to flesh out issues with datetime64 > with him. > If you're interested in that, you may be interested in https://github.com/numpy/numpy/pull/156. It's about datetime behavior and compile issues, which are the main reason we can't have a 1.7 release right now. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwwiebe at gmail.com Sun Feb 19 03:49:49 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 02:49:49 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 2:32 AM, Matthew Brett wrote: > Hi, > > Thanks for this - it's very helpful. > > On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe wrote: > > The suggestion of transitioning the NumPy core code from C to C++ has > > sparked a vigorous debate, and I thought I'd start a new thread to give > my > > perspective on some of the issues raised, and describe how such a > transition > > could occur. > > > > First, I'd like to reiterate the gcc rationale for their choice to > switch: > > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > > > > In particular, these points deserve emphasis: > > > > The C subset of C++ is just as efficient as C. > > C++ supports cleaner code in several significant cases. > > C++ makes it easier to write cleaner interfaces by making it harder to > break > > interface boundaries. > > C++ never requires uglier code. > > > > Some people have pointed out that the Python templating preprocessor > used in > > NumPy is suggestive of C++ templates. A nice advantage of using C++ > > templates instead of this preprocessor is that third party tools to > improve > > software quality, like static analysis tools, will be able to run > directly > > on the NumPy source code. Additionally, IDEs like XCode and Visual C++ > will > > be able to provide the full suite of tab-completion/intellisense features > > that programmers working in those environments are accustomed to. > > > > There are concerns about ABI/API interoperability and interactions with > C++ > > exceptions. I've dealt with these types of issues on enough platforms to > > know that while they're important, they're a lot easier to handle than > the > > issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been > that > > providing a C API from a C++ library is no harder than providing a C API > > from a C library. > > > > It's worth comparing the possibility of C++ versus the possibility of > other > > languages, and the ones that have been suggested for consideration are D, > > Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language > > has to interact naturally with the CPython API. It needs to provide > direct > > access to all the various sizes of signed int, unsigned int, and float. > It > > needs to have mature compiler support wherever we want to deploy NumPy. > > Taken together, these requirements eliminate a majority of these > > possibilities. From these criteria, the only languages which seem to > have a > > clear possibility for the implementation of Numpy are C, C++, and D. > > On which criteria did you eliminate Cython? The "mature compiler support" one. As glue between C/C++ and Python, it looks great, but Dag's evaluation of Cython's maturity for implementing the style of functionality in NumPy seems pretty authoritative. So people don't have to dig through the giant email thread, here's the specific message content from Dag, and it's context: On 02/18/2012 12:35 PM, Charles R Harris wrote: > > No one in their right mind would build a large performance library using > Cython, it just isn't the right tool. For what it was designed for - > wrapping existing c code or writing small and simple things close to > Python - it does very well, but it was never designed for making core > C/C++ libraries and in that role it just gets in the way. 
+1. Even I who have contributed to Cython realize this; last autumn I implemented a library by writing it in C and wrapping it in Cython. > > The biggest question for any of these possibilities is how do you get the > > code from its current state to a state which fully utilizes the target > > language. C++, being nearly a superset of C, offers a strategy to > gradually > > absorb C++ features. Any of the other language choices requires a > rewrite, > > which would be quite disruptive. Because of all these reasons taken > > together, I believe the only realistic language to use, other than > sticking > > with C, is C++. > > > > Finally, here's what I think is the best strategy for transitioning to > C++. > > First, let's consider what we do if 1.7 becomes an LTS release. > > > > 1) Immediately after branching for 1.7, we minimally patch all the .c > files > > so that they can build with a C++ compiler and with a C compiler at the > same > > time. Then we rename all .c -> .cpp, and update the build systems for > C++. > > 2) During the 1.8 development cycle, we heavily restrict C++ feature > usage. > > But, where a feature implementation would be arguably easier and less > > error-prone with C++, we allow it. This is a period for learning about > C++ > > and how it can benefit NumPy. > > 3) After the 1.8 release, the community will have developed more > experience > > with C++, and will be in a better position to discuss a way forward. > > > > If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea > to > > restrict the 1.8 release to the subset of both C and C++. I would much > > prefer using the 1.8 development cycle to dip our toes into the C++ > world to > > get some of the low-hanging benefits without doing anything disruptive. > > > > A really important point to emphasize is that C++ allows for a strategy > > where we gradually evolve the codebase to better incorporate its language > > features. This is what I'm advocating. No massive rewrite, no disruptive > > changes. Gradual code evolution, with ABI and API compatibility > comparable > > to what we've delivered in 1.6 and the upcoming 1.7 releases. > > Do you have any comment on the need for coding standards when using > C++? I saw the warning in: > > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > > about using C++ unwisely. > Yes, coding standards are very important. I think they are important for C as well, and it's a problem that NumPy hasn't had any standards written down yet. Chuck is presently the most rigorous enforcer of standards within the current C codebase, so I would nominate him to take a first pass at writing them down. The same applies to Python, and that's what PEP 8 is for. Cheers, Mark > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sun Feb 19 03:51:08 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 19 Feb 2012 00:51:08 -0800 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Feb 19, 2012 12:34 AM, "Mark Wiebe" wrote: > > I'm speaking from personal experience having dealt with these types of issues extensively before. If people have more detailed examples of problems, possibly links to discussions where one of these problems has occurred, that would be helpful. 
This argument isn't very useful if it's just my positive experience versus others negative experience, we need to get into specifics to advance the discussion. Wow, the NumPy list has gotten so serious :) I'm certainly not doubting anyone's experience--just trying to get a handle on possible transition risks. OK, so let's talk specifics: how do you dynamically grab a function pointer to a compiled C++ library, a la ctypes? Feel free to point me to StackOverflow or elsewhere. St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sun Feb 19 03:55:19 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 19 Feb 2012 00:55:19 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Feb 19, 2012 12:09 AM, "Mark Wiebe" wrote: > > These standard library issues were definitely valid 10 years ago, but all the major C++ compilers have great C++98 support now. Is there a specific target platform/compiler combination you're thinking of where we can do tests on this? I don't believe the compile times are as bad as many people suspect, can you give some simple examples of things we might do in NumPy you expect to compile slower in C++ vs C? The concern may be more that this will be an issue once we start templating (scipy.sparse as an example). Compiling templates requires a lot of memory (more than with the current Heath Robbinson solution). St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Feb 19 03:56:00 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 19 Feb 2012 08:56:00 +0000 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: Hi Mark, thank you for joining this discussion. On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe wrote: > The suggestion of transitioning the NumPy core code from C to C++ has > sparked a?vigorous?debate, and I thought I'd start a new thread to give my > perspective on some of the issues raised, and describe how such a transition > could occur. > > First, I'd like to reiterate the gcc rationale for their choice to switch: > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > > In particular, these points deserve emphasis: > > The C subset of C++ is just as efficient as C. > C++ supports cleaner code in several significant cases. > C++ makes it easier to write cleaner interfaces by making it harder to break > interface boundaries. > C++ never requires uglier code. I think those arguments will not be very useful: they are subjective, and unlikely to convince people who prefer C to C++. > > There are concerns about ABI/API interoperability and interactions with C++ > exceptions. I've dealt with these types of issues on enough platforms to > know that while they're important, they're a lot easier to handle than the > issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that > providing a C API from a C++ library is no harder than providing a C API > from a C library. This needs more details. I have some experience in both areas as well, and mine is quite different. Reiterating a few examples that worry me: - how can you ensure that exceptions happening in C++ will never cross different .so/.dll ? 
How can one make sure C++ extensions built by different compilers can work ? Is not using exceptions like it is done in zeromq acceptable ? (would be nice to find out more about the decisions made by the zeromq team about their usage of C++). I cannot find a recent example, but I have seen errors similar to this(http://software.intel.com/en-us/forums/showthread.php?t=42940) quite a few times. - how can you expose in C some heavily-using C++ features ? I would expect you would like to use templates for iterators in numpy - you can you make them available to 3rd party extensions without requiring C++. > > It's worth comparing the possibility of C++ versus the possibility of other > languages, and the ones that have been suggested for consideration are D, > Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language > has to interact naturally with the CPython API. It needs to provide direct > access to all the various sizes of signed int, unsigned int, and float. It > needs to have mature compiler support wherever we want to deploy NumPy. > Taken together, these requirements eliminate a majority of these > possibilities. From these criteria, the only languages which seem to have a > clear possibility for the implementation of Numpy are C, C++, and D. For D, > I suspect the tooling is not mature enough, but I'm not 100% certain of > that. While I agree that no other language is realistic, staying in C has the nice advantage that we can more easily use one of them if they mature (rust/D - go, rpython, C#/java can be dismissed for fundamental technical reasons right away). This is not a very strong argument against using C++, obviously. > > 1) Immediately after branching for 1.7, we minimally patch all the .c files > so that they can build with a C++ compiler and with a C compiler at the same > time. Then we rename all .c -> .cpp, and update the build systems for C++. > 2) During the 1.8 development cycle, we heavily restrict C++ feature usage. > But, where a feature implementation would be arguably easier and less > error-prone with C++, we allow it. This is a period for learning about C++ > and how it can benefit NumPy. > 3) After the 1.8 release, the community will have developed more experience > with C++, and will be in a better position to discuss a way forward. A step that would be useful sooner rather than later is one where numpy has been split into smaller extensions (instead of multiarray/ufunc, essentially). This would help avoiding recompilation of lots of code for any small change. It is already quite painful with C, but with C++, it will be unbearable. This can be done in C, and would be useful whether the decision to move to C++ is accepted or not. cheers, David From mwwiebe at gmail.com Sun Feb 19 03:59:40 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 02:59:40 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 2:51 AM, St?fan van der Walt wrote: > > On Feb 19, 2012 12:34 AM, "Mark Wiebe" wrote: > > > > I'm speaking from personal experience having dealt with these types of > issues extensively before. If people have more detailed examples of > problems, possibly links to discussions where one of these problems has > occurred, that would be helpful. This argument isn't very useful if it's > just my positive experience versus others negative experience, we need to > get into specifics to advance the discussion. 
> > Wow, the NumPy list has gotten so serious :) I'm certainly not doubting > anyone's experience--just trying to get a handle on possible transition > risks. > > Heh, when threads get longer than 50 message, I think that's a sign something is serious! > OK, so let's talk specifics: how do you dynamically grab a function > pointer to a compiled C++ library, a la ctypes? Feel free to point me to > StackOverflow or elsewhere. > If the C++ library is exposing a C-API, it's identical to the case for C. If it's not, and you must access the functions via ctypes anyway, you need to determine the mangled name of the function. The mangled name encodes the types of the parameters, to support function polymorphism, and is different for each OS platform. Also, if the function takes a class object as a parameter, or returns one, ctypes doesn't give you a way to forward that parameter. In general, the standard advice is to wrap the C++ library using Boost.Python, Cython, or something similar. Dealing directly with the mangled names, while possible, is not likely to make you happy. Cheers, Mark > St?fan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Feb 19 04:10:13 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 19 Feb 2012 01:10:13 -0800 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: Hi, On Sun, Feb 19, 2012 at 12:49 AM, Mark Wiebe wrote: > On Sun, Feb 19, 2012 at 2:32 AM, Matthew Brett > wrote: >> >> Hi, >> >> Thanks for this - it's very helpful. >> >> On Sat, Feb 18, 2012 at 11:18 PM, Mark Wiebe wrote: >> > The suggestion of transitioning the NumPy core code from C to C++ has >> > sparked a?vigorous?debate, and I thought I'd start a new thread to give >> > my >> > perspective on some of the issues raised, and describe how such a >> > transition >> > could occur. >> > >> > First, I'd like to reiterate the gcc rationale for their choice to >> > switch: >> > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale >> > >> > In particular, these points deserve emphasis: >> > >> > The C subset of C++ is just as efficient as C. >> > C++ supports cleaner code in several significant cases. >> > C++ makes it easier to write cleaner interfaces by making it harder to >> > break >> > interface boundaries. >> > C++ never requires uglier code. >> > >> > Some people have pointed out that the Python templating preprocessor >> > used in >> > NumPy is suggestive of C++ templates. A nice advantage of using C++ >> > templates instead of this preprocessor is that third party tools to >> > improve >> > software quality, like static analysis tools, will be able to run >> > directly >> > on the NumPy source code. Additionally, IDEs like XCode and Visual C++ >> > will >> > be able to provide the full suite of tab-completion/intellisense >> > features >> > that programmers working in those environments are accustomed to. >> > >> > There are concerns about ABI/API interoperability and interactions with >> > C++ >> > exceptions. I've dealt with these types of issues on enough platforms to >> > know that while they're important, they're a lot easier to handle than >> > the >> > issues with Fortran, BLAS, and LAPACK in SciPy. 
My experience has been >> > that >> > providing a C API from a C++ library is no harder than providing a C API >> > from a C library. >> > >> > It's worth comparing the possibility of C++ versus the possibility of >> > other >> > languages, and the ones that have been suggested for consideration are >> > D, >> > Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target >> > language >> > has to interact naturally with the CPython API. It needs to provide >> > direct >> > access to all the various sizes of signed int, unsigned int, and float. >> > It >> > needs to have mature compiler support wherever we want to deploy NumPy. >> > Taken together, these requirements eliminate a majority of these >> > possibilities. From these criteria, the only languages which seem to >> > have a >> > clear possibility for the implementation of Numpy are C, C++, and D. >> >> On which criteria did you eliminate Cython? > > > The "mature compiler support" one. I took you to mean that the code would compile on any platform. As Cython compiles to C, I think Cython passes, if that is what you meant. Maybe you meant you thought that Cython was not mature in some sense, but if so, I'm not sure which sense you mean. > As glue between C/C++ and Python, it > looks great, but Dag's evaluation of Cython's maturity for implementing the > style of functionality in NumPy seems pretty authoritative. So people don't > have to dig through the giant email thread, here's the specific message > content from Dag, and it's context: > > On 02/18/2012 12:35 PM, Charles R Harris wrote: >> >> No one in their right mind would build a large performance library using >> Cython, it just isn't the right tool. For what it was designed for - >> wrapping existing c code or writing small and simple things close to >> Python - it does very well, but it was never designed for making core >> C/C++ libraries and in that role it just gets in the way. > > +1. Even I who have contributed to Cython realize this; last autumn I > implemented a library by writing it in C and wrapping it in Cython. As you probably saw, I think the proposal was indeed to use Cython to provide the higher-level parts of the core, while refactoring the rest of the C code underneath it. Obviously one could also refactor the C into C++, so the proposal to use Cython for some of the core is to some extent orthogonal to the choice of C / C++. I don't know the core, perhaps there isn't much of it that would benefit from being in Cython, I'd be interested to know your views. But, superficially, it seems like an attractive solution to making (some of) the core easier to maintain. Best, Matthew From ben_w_123 at yahoo.co.uk Sun Feb 19 04:10:24 2012 From: ben_w_123 at yahoo.co.uk (Ben Walsh) Date: Sun, 19 Feb 2012 09:10:24 +0000 (GMT) Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: > Date: Sun, 19 Feb 2012 01:18:20 -0600 > From: Mark Wiebe > Subject: [Numpy-discussion] How a transition to C++ could work > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > The suggestion of transitioning the NumPy core code from C to C++ has > sparked a vigorous debate, and I thought I'd start a new thread to give my > perspective on some of the issues raised, and describe how such a > transition could occur. 
> > First, I'd like to reiterate the gcc rationale for their choice to switch: > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > > In particular, these points deserve emphasis: > > - The C subset of C++ is just as efficient as C. > - C++ supports cleaner code in several significant cases. > - C++ makes it easier to write cleaner interfaces by making it harder to > break interface boundaries. > - C++ never requires uglier code. > I think they're trying to solve a different problem. I thought the problem that numpy was trying to solve is "make inner loops of numerical algorithms very fast". C is great for this because you can write C code and picture precisely what assembly code will be generated. C++ removes some of this advantage -- now there is extra code generated by the compiler to handle constructors, destructors, operators etc which can make a material difference to fast inner loops. So you end up just writing "C-style" anyway. On the other hand, if your problem really is "write lots of OO code with virtual methods and have it turned into machine code" (probably like the GCC guys) then maybe C++ is the way to go. Some more opinions on C++: http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ Sorry if this all seems a bit negative about C++. It's just been my experience that C++ adds complexity while C keeps things nice and simple. Looking forward to seeing some more concrete examples. Cheers Ben From cournape at gmail.com Sun Feb 19 04:16:21 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 19 Feb 2012 09:16:21 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe wrote: > On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau > wrote: >> >> On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris >> wrote: >> >> > >> > Well, we already have code obfuscation (DOUBLE_your_pleasure, >> > FLOAT_your_boat), so we might as well let the compiler handle it. >> >> Yes, those are not great, but on the other hand, it is not that a >> fundamental issue IMO. >> >> Iterators as we have it in NumPy is something that is clearly limited >> by C. Writing the neighborhood iterator is the only case where I >> really felt that C++ *could* be a significant improvement. I use >> *could* because writing iterator in C++ is hard, and will be much >> harder to read (I find both boost and STL - e.g. stlport -- iterators >> to be close to write-only code). But there is the question on how you >> can make C++-based iterators available in C. I would be interested in >> a simple example of how this could be done, ignoring all the other >> issues (portability, exception, etc?). >> >> The STL is also potentially compelling, but that's where we go into my >> "beware of the dragons" area of C++. Portability loss, compilation >> time increase and warts are significant there. >> scipy.sparse.sparsetools has been a source of issues that was quite >> high compared to its proportion of scipy amount code (we *do* have >> some hard-won experience on C++-related issues). > > > These standard library issues were definitely valid 10 years ago, but all > the major C++ compilers have great C++98 support now. STL varies significantly between platforms, I believe it is still the case today. Do you know the status of the STL on bluegen, on small devices ? 
We unfortunately cannot restrict ourselves to one well known implementation (e.g. STLPort). > Is there a specific > target platform/compiler combination you're thinking of where we can do > tests on this? I don't believe the compile times are as bad as many people > suspect, can you give some simple examples of things we might do in NumPy > you expect to compile slower in C++ vs C? Switching from gcc to g++ on the same codebase should not change much compilation times. We should test, but that's not what worries me. What worries me is when we start using C++ specific code, STL and co. Today, scipy.sparse.sparsetools takes half of the build time of the whole scipy, and it does not even use fancy features. It also takes Gb of ram when building in parallel. David From mwwiebe at gmail.com Sun Feb 19 04:19:05 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 03:19:05 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau wrote: > Hi Mark, > > thank you for joining this discussion. > > On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe wrote: > > The suggestion of transitioning the NumPy core code from C to C++ has > > sparked a vigorous debate, and I thought I'd start a new thread to give > my > > perspective on some of the issues raised, and describe how such a > transition > > could occur. > > > > First, I'd like to reiterate the gcc rationale for their choice to > switch: > > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > > > > In particular, these points deserve emphasis: > > > > The C subset of C++ is just as efficient as C. > > C++ supports cleaner code in several significant cases. > > C++ makes it easier to write cleaner interfaces by making it harder to > break > > interface boundaries. > > C++ never requires uglier code. > > I think those arguments will not be very useful: they are subjective, > and unlikely to convince people who prefer C to C++. They are arguments from a team which implement both a C and a C++ compiler. In the spectrum of possible authorities on the matter, they rate about as high as I can imagine. > > > > There are concerns about ABI/API interoperability and interactions with > C++ > > exceptions. I've dealt with these types of issues on enough platforms to > > know that while they're important, they're a lot easier to handle than > the > > issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been > that > > providing a C API from a C++ library is no harder than providing a C API > > from a C library. > > This needs more details. I have some experience in both areas as well, > and mine is quite different. Reiterating a few examples that worry me: > - how can you ensure that exceptions happening in C++ will never > cross different .so/.dll ? This is a necessary part of providing a C API, and is included as a requirement of doing that. All C++ libraries which expose a C API deal with this. > How can one make sure C++ extensions built > by different compilers can work ? This is no different from the situation in C. Already in C on Windows, one can't build NumPy with a different version of Visual C++ than the one used to build CPython. > Is not using exceptions like it is > done in zeromq acceptable ? (would be nice to find out more about the > decisions made by the zeromq team about their usage of C++). I prefer to use exceptions in C++, but some major projects have decided to disable them. LLVM/Clang is the most notable example. 
My experience working with high-performance graphics code has been that appropriate use of exceptions (i.e. not doing something like using them for control flow) do not pose a problem. I cannot > find a recent example, but I have seen errors similar to > this(http://software.intel.com/en-us/forums/showthread.php?t=42940) > quite a few times. > This kind of thing would happen when using 'new' to allocate memory, and with the compiler setting enabled to raise bad_alloc on such allocation failures (the default for most compilers nowadays). If exception handling is disabled in the compiler, new will return NULL instead. Unless the compiler has a bizarre issue, catching either std::exception or std::bad_alloc specifically within NumPy should be sufficient to deal with it. Also note that the possibility of something like this will only arise once more advanced C++ features are being adopted. - how can you expose in C some heavily-using C++ features ? If the advantages of those C++ features depend on the C++ language, you have to map them to a limited subset of the feature in C. For example, if a feature is based on a C++ template, you can instantiate specific instances of the template for all the types you want to support from C. > I would > expect you would like to use templates for iterators in numpy - you > can you make them available to 3rd party extensions without requiring > C++. > Yes, something like the nditer is a good example. From C, it would have to retain an API in the current style, but C++ users could gain an easier-to-use variant. > > > > > It's worth comparing the possibility of C++ versus the possibility of > other > > languages, and the ones that have been suggested for consideration are D, > > Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language > > has to interact naturally with the CPython API. It needs to provide > direct > > access to all the various sizes of signed int, unsigned int, and float. > It > > needs to have mature compiler support wherever we want to deploy NumPy. > > Taken together, these requirements eliminate a majority of these > > possibilities. From these criteria, the only languages which seem to > have a > > clear possibility for the implementation of Numpy are C, C++, and D. For > D, > > I suspect the tooling is not mature enough, but I'm not 100% certain of > > that. > > While I agree that no other language is realistic, staying in C has > the nice advantage that we can more easily use one of them if they > mature (rust/D - go, rpython, C#/java can be dismissed for fundamental > technical reasons right away). This is not a very strong argument > against using C++, obviously. > To provide a counterpoint to this argument, switching to C++ could actually make a transition to another language easier. C++ classes and templates map to equivalent features in D quite naturally, to provide a specific example. > > > > > 1) Immediately after branching for 1.7, we minimally patch all the .c > files > > so that they can build with a C++ compiler and with a C compiler at the > same > > time. Then we rename all .c -> .cpp, and update the build systems for > C++. > > 2) During the 1.8 development cycle, we heavily restrict C++ feature > usage. > > But, where a feature implementation would be arguably easier and less > > error-prone with C++, we allow it. This is a period for learning about > C++ > > and how it can benefit NumPy. 
> > 3) After the 1.8 release, the community will have developed more > experience > > with C++, and will be in a better position to discuss a way forward. > > A step that would be useful sooner rather than later is one where > numpy has been split into smaller extensions (instead of > multiarray/ufunc, essentially). This would help avoiding recompilation > of lots of code for any small change. It is already quite painful with > C, but with C++, it will be unbearable. This can be done in C, and > would be useful whether the decision to move to C++ is accepted or > not. > I'm pretty confident that the current code will compile in C++ in nearly identical time to C. Having a properly working incremental build system would be a nice step to take numpy builds out of the dark ages, though. Your tireless efforts to make this happen are appreciated! -Mark > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sun Feb 19 04:28:05 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 03:28:05 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 3:16 AM, David Cournapeau wrote: > On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe wrote: > > On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau > > wrote: > >> > >> On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris > >> wrote: > >> > >> > > >> > Well, we already have code obfuscation (DOUBLE_your_pleasure, > >> > FLOAT_your_boat), so we might as well let the compiler handle it. > >> > >> Yes, those are not great, but on the other hand, it is not that a > >> fundamental issue IMO. > >> > >> Iterators as we have it in NumPy is something that is clearly limited > >> by C. Writing the neighborhood iterator is the only case where I > >> really felt that C++ *could* be a significant improvement. I use > >> *could* because writing iterator in C++ is hard, and will be much > >> harder to read (I find both boost and STL - e.g. stlport -- iterators > >> to be close to write-only code). But there is the question on how you > >> can make C++-based iterators available in C. I would be interested in > >> a simple example of how this could be done, ignoring all the other > >> issues (portability, exception, etc?). > >> > >> The STL is also potentially compelling, but that's where we go into my > >> "beware of the dragons" area of C++. Portability loss, compilation > >> time increase and warts are significant there. > >> scipy.sparse.sparsetools has been a source of issues that was quite > >> high compared to its proportion of scipy amount code (we *do* have > >> some hard-won experience on C++-related issues). > > > > > > These standard library issues were definitely valid 10 years ago, but all > > the major C++ compilers have great C++98 support now. > > STL varies significantly between platforms, I believe it is still the > case today. Do you know the status of the STL on bluegen, on small > devices ? We unfortunately cannot restrict ourselves to one well known > implementation (e.g. STLPort). 
Is there anyone who uses a blue gene or small device which needs up-to-date numpy support, that I could talk to directly? We really need a list of supported platforms on the numpy wiki we can refer to when discussing this stuff, it all seems very nebulous to me. > Is there a specific > > target platform/compiler combination you're thinking of where we can do > > tests on this? I don't believe the compile times are as bad as many > people > > suspect, can you give some simple examples of things we might do in NumPy > > you expect to compile slower in C++ vs C? > > Switching from gcc to g++ on the same codebase should not change much > compilation times. We should test, but that's not what worries me. > What worries me is when we start using C++ specific code, STL and co. > Today, scipy.sparse.sparsetools takes half of the build time of the > whole scipy, and it does not even use fancy features. It also takes Gb > of ram when building in parallel. > Particular styles of using templates can cause this, yes. To properly do this kind of advanced C++ library work, it's important to think about the big-O notation behavior of your template instantiations, not just the big-O notation of run-time. C++ templates have a turing-complete language (which is said to be quite similar to haskell, but spelled vastly different) running at compile time in them. This is what gives template meta-programming in C++ great power, but since templates weren't designed for this style of programming originally, template meta-programming is not very easy. Cheers, Mark > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sun Feb 19 04:34:34 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 03:34:34 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 3:10 AM, Matthew Brett wrote: > > As you probably saw, I think the proposal was indeed to use Cython to > provide the higher-level parts of the core, while refactoring the rest > of the C code underneath it. Obviously one could also refactor the C > into C++, so the proposal to use Cython for some of the core is to > some extent orthogonal to the choice of C / C++. I don't know the > core, perhaps there isn't much of it that would benefit from being in > Cython, I'd be interested to know your views. But, superficially, it > seems like an attractive solution to making (some of) the core easier > to maintain. > Using Cython in the binding role is orthogonal to the choice of C versus C++, you are right. This binding aspect isn't the part where C++ provides most of the benefits I envision, so increasing (or decreasing) the use of Cython within NumPy seems like a good topic for a separate thread just about Cython. Cheers, Mark > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Sun Feb 19 04:45:42 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 19 Feb 2012 09:45:42 +0000 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 9:19 AM, Mark Wiebe wrote: > On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau > wrote: >> >> Hi Mark, >> >> thank you for joining this discussion. >> >> On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe wrote: >> > The suggestion of transitioning the NumPy core code from C to C++ has >> > sparked a?vigorous?debate, and I thought I'd start a new thread to give >> > my >> > perspective on some of the issues raised, and describe how such a >> > transition >> > could occur. >> > >> > First, I'd like to reiterate the gcc rationale for their choice to >> > switch: >> > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale >> > >> > In particular, these points deserve emphasis: >> > >> > The C subset of C++ is just as efficient as C. >> > C++ supports cleaner code in several significant cases. >> > C++ makes it easier to write cleaner interfaces by making it harder to >> > break >> > interface boundaries. >> > C++ never requires uglier code. >> >> I think those arguments will not be very useful: they are subjective, >> and unlikely to convince people who prefer C to C++. > > > They are arguments from a team which implement both a C and a C++ compiler. > In the spectrum of possible authorities on the matter, they rate about as > high as I can imagine. There are quite a few arguments who are as authoritative and think those arguments are not very strong. They are as unlikely to change your mind as the gcc's arguments are unlikely to convince me I am afraid. > > This is a necessary part of providing a C API, and is included as a > requirement of doing that. All C++ libraries which expose a C API deal with > this. The only two given examples given so far for a C library around C++ code (clang and zeromq) do not use exceptions. Can you provide an example of a C++ library that has a C API and does use exception ? If not, I would like to know the technical details if you don't mind expanding on them. > >> >> How can one make sure C++ extensions built >> by different compilers can work ? > > > This is no different from the situation in C. Already in C on Windows, one > can't build NumPy with a different version of Visual C++ than the one used > to build CPython. This is a different situation. On windows, the mismatch between VS is due to the way win32 has been used by python itself - it could actually be fixed eventually by python (there are efforts in that regard). It is not a language issue. Except for that case, numpy has a pretty good record of allowing people to mix and match compilers. Using mingw on windows and intel compilers on linux are the typical cases, but not the only ones. >> >> I would >> expect you would like to use templates for iterators in numpy - you >> can you make them available to 3rd party extensions without requiring >> C++. > > > Yes, something like the nditer is a good example. From C, it would have to > retain an API in the current style, but C++ users could gain an > easier-to-use variant. Providing an "official" C++ library on top of the current C API would certainly be nice for people who prefer C++ to C. But this is quite different from using C++ at the core. The current way iterators work would be very hard (if at all possible ?) to rewrite in idiomatic in C++ while keeping even API compatibility with the existing C one. 
For numpy 2.0, we can somehow relax on this. If it is not too time consuming, could you show a simplified example of how it would work to write the iterator in C++ while providing a C API in the spirit of what we have now ? David From mwwiebe at gmail.com Sun Feb 19 04:52:22 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 03:52:22 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh wrote: > > > > Date: Sun, 19 Feb 2012 01:18:20 -0600 > > From: Mark Wiebe > > Subject: [Numpy-discussion] How a transition to C++ could work > > To: Discussion of Numerical Python > > Message-ID: > > KduRpZKtgUi516oQtqD4vAzm746HmpqgpFXNqQ at mail.gmail.com> > > Content-Type: text/plain; charset="utf-8" > > > > The suggestion of transitioning the NumPy core code from C to C++ has > > sparked a vigorous debate, and I thought I'd start a new thread to give > my > > perspective on some of the issues raised, and describe how such a > > transition could occur. > > > > First, I'd like to reiterate the gcc rationale for their choice to > switch: > > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > > > > In particular, these points deserve emphasis: > > > > - The C subset of C++ is just as efficient as C. > > - C++ supports cleaner code in several significant cases. > > - C++ makes it easier to write cleaner interfaces by making it harder > to > > break interface boundaries. > > - C++ never requires uglier code. > > > > I think they're trying to solve a different problem. > > I thought the problem that numpy was trying to solve is "make inner loops > of numerical algorithms very fast". C is great for this because you can > write C code and picture precisely what assembly code will be generated. > What you're describing is also the C subset of C++, so your experience applies just as well to C++! > C++ removes some of this advantage -- now there is extra code generated by > the compiler to handle constructors, destructors, operators etc which can > make a material difference to fast inner loops. So you end up just writing > "C-style" anyway. > This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself. > On the other hand, if your problem really is "write lots of OO code with > virtual methods and have it turned into machine code" (probably like the > GCC guys) then maybe C++ is the way to go. > Managing the complexity of the dtype subsystem, the ufunc subsystem, the nditer component, and other parts of NumPy could benefit from C++ Not in a stereotypical "OO code with virtual methods" way, that is not how typical modern C++ is done. > Some more opinions on C++: > http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ > > Sorry if this all seems a bit negative about C++. It's just been my > experience that C++ adds complexity while C keeps things nice and simple. > Yes, there are lots of negative opinions about C++ out there, it's true. Just like there are negative opinions about C, Java, C#, and any other language which has become popular. 
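To ground the qsort vs std::sort point above with a sketch (illustrative only, not proposed NumPy code): the practical difference is whether the element comparison is an opaque function pointer, or a template parameter whose body the compiler can see and inline into the sorting loop.

#include <cstddef>

/* C style: qsort gets an opaque comparison function pointer and calls
   through it for every pair of elements it examines. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}
/* usage: qsort(values, n, sizeof(double), cmp_double); */

// C++ style: the comparison is a template parameter, so the compiler
// sees its body at the call site. (Insertion sort keeps the sketch
// short; std::sort applies the same idea to a real sorting algorithm.)
struct less_double {
    bool operator()(double a, double b) const { return a < b; }
};

template <class T, class Compare>
void insertion_sort(T *data, std::size_t n, Compare less)
{
    for (std::size_t i = 1; i < n; ++i) {
        T key = data[i];
        std::size_t j = i;
        while (j > 0 && less(key, data[j - 1])) {
            data[j] = data[j - 1];
            --j;
        }
        data[j] = key;
    }
}
/* usage: insertion_sort(values, n, less_double()); */

With the template version the optimizer can typically remove the comparison call entirely, which is where the measured qsort vs std::sort gap comes from.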
My experience with regard to complexity and C vs C++ is that C forces the complexity of dealing with resource lifetimes out into all the code everyone writes, while C++ allows one to encapsulate that sort of complexity into a class which is small and more easily verifiable. This is about code quality, and the best quality C++ code I've worked with has been way easier to program in than the best quality C code I've worked with. Looking forward to seeing some more concrete examples. > In the interests of starting small, here's one that I mentioned in the other thread: Consider a regression like this: http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html Fixing this in C would require switching all the relevant usages of NPY_MAXARGS to use a dynamic memory allocation. This brings with it the potential of easily introducing a memory leak, and is a lot of work to do. In C++, this functionality could be placed inside a class, where the deterministic construction/destruction semantics eliminate the risk of memory leaks and make the code easier to read at the same time. Cheers, Mark > Cheers > > Ben > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Feb 19 05:03:05 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 19 Feb 2012 10:03:05 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 9:28 AM, Mark Wiebe wrote: > Is there anyone who uses a blue gene or small device which needs up-to-date > numpy support, that I could talk to directly? We really need a list of > supported platforms on the numpy wiki we can refer to when discussing this > stuff, it all seems very nebulous to me. They may not need an up to date numpy version now, but if stopping support for them is a requirement for C++, it must be kept in mind. I actually suspect Travis to have more details on the big iron side of things. On the small side of things: http://projects.scipy.org/numpy/ticket/1969 This may seem like not very useful - but that's part of what a open source project is all about in my mind. > > Particular styles of using templates can cause this, yes. To properly do > this kind of advanced C++ library work, it's important to think about the > big-O notation behavior of your template instantiations, not just the big-O > notation of run-time. C++ templates have a turing-complete language (which > is said to be quite similar to haskell, but spelled vastly different) > running at compile time in them. This is what gives template > meta-programming in C++ great power, but since templates weren't designed > for this style of programming originally, template meta-programming is not > very easy. scipy.sparse.sparsetools is quite straightforward in its usage of templates (would be great if you could suggest improvement BTW, e.g. scipy/sparse/sparsetools/csr.h), and does not by itself use any meta-template programming. I like that numpy can be built in a few seconds (at least without optimization), and consider this to be a useful feature. 
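For reference, the kernels I am referring to are roughly of this shape -- a simplified sketch in the spirit of csr.h, not the actual scipy code:

// Sketch of a sparsetools-style kernel: computes Y += A * X for a CSR
// matrix A, templated on the index type I and the value type T.
template <class I, class T>
void csr_matvec(const I n_row,
                const I Ap[], const I Aj[], const T Ax[],
                const T Xx[], T Yx[])
{
    for (I i = 0; i < n_row; i++) {
        T sum = Yx[i];
        for (I jj = Ap[i]; jj < Ap[i + 1]; jj++) {
            sum += Ax[jj] * Xx[Aj[jj]];
        }
        Yx[i] = sum;
    }
}

The template itself is easy to read; the cost is at build time, when this and the many kernels like it get instantiated for every supported combination of index and value types, together with the generated wrapper code around them.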
cheers, David From ralf.gommers at googlemail.com Sun Feb 19 05:03:15 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 19 Feb 2012 11:03:15 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 10:28 AM, Mark Wiebe wrote: > On Sun, Feb 19, 2012 at 3:16 AM, David Cournapeau wrote: > >> On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe wrote: >> > On Sat, Feb 18, 2012 at 4:24 PM, David Cournapeau >> > wrote: >> >> >> >> On Sat, Feb 18, 2012 at 9:40 PM, Charles R Harris >> >> wrote: >> >> >> >> > >> >> > Well, we already have code obfuscation (DOUBLE_your_pleasure, >> >> > FLOAT_your_boat), so we might as well let the compiler handle it. >> >> >> >> Yes, those are not great, but on the other hand, it is not that a >> >> fundamental issue IMO. >> >> >> >> Iterators as we have it in NumPy is something that is clearly limited >> >> by C. Writing the neighborhood iterator is the only case where I >> >> really felt that C++ *could* be a significant improvement. I use >> >> *could* because writing iterator in C++ is hard, and will be much >> >> harder to read (I find both boost and STL - e.g. stlport -- iterators >> >> to be close to write-only code). But there is the question on how you >> >> can make C++-based iterators available in C. I would be interested in >> >> a simple example of how this could be done, ignoring all the other >> >> issues (portability, exception, etc?). >> >> >> >> The STL is also potentially compelling, but that's where we go into my >> >> "beware of the dragons" area of C++. Portability loss, compilation >> >> time increase and warts are significant there. >> >> scipy.sparse.sparsetools has been a source of issues that was quite >> >> high compared to its proportion of scipy amount code (we *do* have >> >> some hard-won experience on C++-related issues). >> > >> > >> > These standard library issues were definitely valid 10 years ago, but >> all >> > the major C++ compilers have great C++98 support now. >> >> STL varies significantly between platforms, I believe it is still the >> case today. Do you know the status of the STL on bluegen, on small >> devices ? We unfortunately cannot restrict ourselves to one well known >> implementation (e.g. STLPort). > > > Is there anyone who uses a blue gene or small device which needs > up-to-date numpy support, that I could talk to directly? We really need a > list of supported platforms on the numpy wiki we can refer to when > discussing this stuff, it all seems very nebulous to me. > The list of officially supported platforms, where supported means we test and release binaries if appropriate, is short: Windows, Linux, OS X. There are many platforms which are "supported" in the form of feedback on the mailing list or Trac. This explanation is written down somewhere, not sure where right now. The best way to get an overview of those is to look at the distutils code for various compilers, and at npy_cpu.h and similar. We're not talking about expanding the number of officially supported platforms here, but not breaking those unofficially supported ones (too badly). It's possible we break those once in a while, which becomes apparent only when we get a patch of a few lines long that fixes it. What should be avoided is that those few-line patches have to turn into very large patches. 
The most practical way to deal with this is probably to take two or three non-standard platforms/compilers, set up a buildbot on them, and when things break ensure that fixing it is not too hard. >From recent history, I'd suggest AIX, an ARM device and a PathScale compiler. But the limitation is probably finding someone willing to run a buildbot. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Feb 19 05:14:23 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 19 Feb 2012 10:14:23 +0000 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe wrote: > On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh wrote: >> >> >> >> > Date: Sun, 19 Feb 2012 01:18:20 -0600 >> > From: Mark Wiebe >> > Subject: [Numpy-discussion] How a transition to C++ could work >> > To: Discussion of Numerical Python >> > Message-ID: >> > >> > >> > Content-Type: text/plain; charset="utf-8" >> > >> > The suggestion of transitioning the NumPy core code from C to C++ has >> > sparked a vigorous debate, and I thought I'd start a new thread to give >> > my >> > perspective on some of the issues raised, and describe how such a >> > transition could occur. >> > >> > First, I'd like to reiterate the gcc rationale for their choice to >> > switch: >> > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale >> > >> > In particular, these points deserve emphasis: >> > >> > ? - The C subset of C++ is just as efficient as C. >> > ? - C++ supports cleaner code in several significant cases. >> > ? - C++ makes it easier to write cleaner interfaces by making it harder >> > to >> > ? break interface boundaries. >> > ? - C++ never requires uglier code. >> > >> >> I think they're trying to solve a different problem. >> >> I thought the problem that numpy was trying to solve is "make inner loops >> of numerical algorithms very fast". C is great for this because you can >> write C code and picture precisely what assembly code will be generated. > > > What you're describing is also the C subset of C++, so your experience > applies just as well to C++! > >> >> C++ removes some of this advantage -- now there is extra code generated by >> the compiler to handle constructors, destructors, operators etc which can >> make a material difference to fast inner loops. So you end up just writing >> "C-style" anyway. > > > This is in fact not true, and writing in C++ style can often produce faster > code. A classic example of this is C qsort vs C++ std::sort. You may be > thinking of using virtual functions in a class hierarchy, where a tradeoff > between performance and run-time polymorphism is being done. Emulating the > functionality that virtual functions provide in C will give similar > performance characteristics as the C++ language feature itself. > >> >> On the other hand, if your problem really is "write lots of OO code with >> virtual methods and have it turned into machine code" (probably like the >> GCC guys) then maybe C++ is the way to go. > > > Managing the complexity of the dtype subsystem, the ufunc subsystem, the > nditer component, and other parts of NumPy could benefit from C++ Not in a > stereotypical "OO code with virtual methods" way, that is not how typical > modern C++ is done. > >> >> Some more opinions on C++: >> http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ >> >> Sorry if this all seems a bit negative about C++. 
It's just been my >> experience that C++ adds complexity while C keeps things nice and simple. > > > Yes, there are lots of negative opinions about C++ out there, it's true. > Just like there are negative opinions about C, Java, C#, and any other > language which has become popular. My experience with regard to complexity > and C vs C++ is that C forces the complexity of dealing with resource > lifetimes out into all the code everyone writes, while C++ allows one to > encapsulate that sort of complexity into a class which is small and more > easily verifiable. This is about code quality, and the best quality C++ code > I've worked with has been way easier to program in than the best quality C > code I've worked with. While I actually believe this to be true (very good C++ can be easier to read/use than very good C). Good C is also much more common than good C++, at least in open source. On the good C++ codebases you have been working on, could you rely on everybody being a very good C++ programmer ? Because this will most likely never happen for numpy. This is the crux of the argument from an organizational POV: the variance in C++ code quality is much more difficult to control. I have seen C++ code that is certainly much poorer and more complex than numpy, to a point where not much could be done to save the codebase. cheers, David From mwwiebe at gmail.com Sun Feb 19 05:26:23 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 04:26:23 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 3:45 AM, David Cournapeau wrote: > On Sun, Feb 19, 2012 at 9:19 AM, Mark Wiebe wrote: > > On Sun, Feb 19, 2012 at 2:56 AM, David Cournapeau > > wrote: > >> > >> Hi Mark, > >> > >> thank you for joining this discussion. > >> > >> On Sun, Feb 19, 2012 at 7:18 AM, Mark Wiebe wrote: > >> > The suggestion of transitioning the NumPy core code from C to C++ has > >> > sparked a vigorous debate, and I thought I'd start a new thread to > give > >> > my > >> > perspective on some of the issues raised, and describe how such a > >> > transition > >> > could occur. > >> > > >> > First, I'd like to reiterate the gcc rationale for their choice to > >> > switch: > >> > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > >> > > >> > In particular, these points deserve emphasis: > >> > > >> > The C subset of C++ is just as efficient as C. > >> > C++ supports cleaner code in several significant cases. > >> > C++ makes it easier to write cleaner interfaces by making it harder to > >> > break > >> > interface boundaries. > >> > C++ never requires uglier code. > >> > >> I think those arguments will not be very useful: they are subjective, > >> and unlikely to convince people who prefer C to C++. > > > > > > They are arguments from a team which implement both a C and a C++ > compiler. > > In the spectrum of possible authorities on the matter, they rate about as > > high as I can imagine. > > There are quite a few arguments who are as authoritative and think > those arguments are not very strong. They are as unlikely to change > your mind as the gcc's arguments are unlikely to convince me I am > afraid. I imagine only points 2 and 3 are controversial for you, 1 and 4 are pretty straightforward, yes? We could dig into the specifics of these points if you'd like. > > > > This is a necessary part of providing a C API, and is included as a > > requirement of doing that. All C++ libraries which expose a C API deal > with > > this. 
> > The only two given examples given so far for a C library around C++ > code (clang and zeromq) do not use exceptions. Can you provide an > example of a C++ library that has a C API and does use exception ? > I couldn't find a nice example with a short search, unfortunately. > If not, I would like to know the technical details if you don't mind > expanding on them. > Sure. First, one would standardize on having all exceptions be derived from std::exception. (std::bad_alloc, which we discussed before, and all other standard exceptions, do). Then, each function exposed to the C API where internally C++ exceptions are used would look roughly like:

int api_function(int param1, float param2, PyArrayObject *param3)
{
    try {
        ... implementation ...
        return 0;
    } catch(std::bad_alloc&) {
        PyErr_NoMemory();
        return -1;
    } catch(numpy::convergence_error& e) {
        PyErr_SetString(NpyExc_ConvergenceError, e.what());
        return -1;
    } catch(std::exception& e) {
        PyErr_SetString(PyExc_RuntimeError, e.what());
        return -1;
    }
}

> > > > >> > >> How can one make sure C++ extensions built > >> by different compilers can work ? > > > > > > This is no different from the situation in C. Already in C on Windows, > one > > can't build NumPy with a different version of Visual C++ than the one > used > > to build CPython. > > This is a different situation. On windows, the mismatch between VS is > due to the way win32 has been used by python itself - it could > actually be fixed eventually by python (there are efforts in that > regard). It is not a language issue. > I've already tried fixing this and building NumPy with Visual C++ 2010, and the memory allocation/deallocation issues were pretty easy to fix. The problem was that NumPy C code takes a FILE* object from a file opened from within Python code. The root of the issue is when CPython uses a different C runtime library (MSVCR##.dll) than NumPy. > Except for that case, numpy has a pretty good record of allowing > people to mix and match compilers. Using mingw on windows and intel > compilers on linux are the typical cases, but not the only ones. In these cases the compiler is adopting the name-mangling ABI of the compiler it's matching. On Windows, the intel compiler uses the Visual C++ ABI, and on Linux, it uses the gcc ABI. But, since the CPython API is a C API, things would still work fine even if the name-mangling were different. > >> > >> I would > >> expect you would like to use templates for iterators in numpy - you > >> can you make them available to 3rd party extensions without requiring > >> C++. > > > > > > Yes, something like the nditer is a good example. From C, it would have > to > > retain an API in the current style, but C++ users could gain an > > easier-to-use variant. > > Providing an "official" C++ library on top of the current C API would > certainly be nice for people who prefer C++ to C. But this is quite > different from using C++ at the core. > That's true. > The current way iterators work would be very hard (if at all possible > ?) to rewrite in idiomatic in C++ while keeping even API compatibility > with the existing C one. For numpy 2.0, we can somehow relax on this. > If it is not too time consuming, could you show a simplified example > of how it would work to write the iterator in C++ while providing a C > API in the spirit of what we have now ? I think the C iterator API could stay pretty much the same as it is now. Implementing it on top of a more flexible C++ iterator API should be no problem.
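To make that concrete, here is a minimal sketch of the pattern I have in mind. The names (flat_iter, NpyFlatIter_*) are hypothetical, not the real nditer API, and it ignores error handling and nearly all of the real functionality -- the point is only that a C++ iterator class can sit behind an extern "C" surface, so no mangled names and no exceptions ever reach C callers:

#include <cstddef>
#include <exception>

// C++ side: the iterator is an ordinary class. npy_intp is the existing
// NumPy typedef from the public headers.
namespace numpy {

class flat_iter {
public:
    flat_iter(char *data, npy_intp size, npy_intp stride)
        : m_data(data), m_index(0), m_size(size), m_stride(stride) {}

    // Advance to the next element; returns false once exhausted.
    bool next() { return ++m_index < m_size; }
    char *dataptr() const { return m_data + m_index * m_stride; }

private:
    char *m_data;
    npy_intp m_index, m_size, m_stride;
};

} // namespace numpy

// C API side: an opaque handle plus extern "C" wrapper functions.
extern "C" {

typedef struct NpyFlatIter NpyFlatIter;  /* opaque to C code */

NpyFlatIter *NpyFlatIter_New(char *data, npy_intp size, npy_intp stride)
{
    try {
        return (NpyFlatIter *)new numpy::flat_iter(data, size, stride);
    } catch (std::exception &) {
        return NULL;  /* translate any C++ failure into the usual C convention */
    }
}

int NpyFlatIter_Next(NpyFlatIter *it)
{
    return ((numpy::flat_iter *)it)->next() ? 1 : 0;
}

char *NpyFlatIter_DataPtr(NpyFlatIter *it)
{
    return ((numpy::flat_iter *)it)->dataptr();
}

void NpyFlatIter_Delete(NpyFlatIter *it)
{
    delete (numpy::flat_iter *)it;
}

} /* extern "C" */

The C-facing functions keep the familiar new/next/dataptr/delete style, while C++ users could work with the class (or a richer templated version of it) directly.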
I'm thinking mostly of the nditer API, which does not use macros. For the other iterators, like the neighborhood iterator, which depends on macros for inlining functionality (correct me if I'm mischaracterizing it), they would probably stay essentially the same. Equivalent C++ iterators which provide a simpler interface for C++ programmers could be done beside them, but because of the desire to push functionality into header files via macros, that particular part of the implementation couldn't be shared. -Mark > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Sun Feb 19 05:30:21 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Sun, 19 Feb 2012 02:30:21 -0800 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 2:14 AM, David Cournapeau wrote: > On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe wrote: >> On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh wrote: >>> >>> >>> >>> > Date: Sun, 19 Feb 2012 01:18:20 -0600 >>> > From: Mark Wiebe >>> > Subject: [Numpy-discussion] How a transition to C++ could work >>> > To: Discussion of Numerical Python >>> > Message-ID: >>> > >>> > >>> > Content-Type: text/plain; charset="utf-8" >>> > >>> > The suggestion of transitioning the NumPy core code from C to C++ has >>> > sparked a vigorous debate, and I thought I'd start a new thread to give >>> > my >>> > perspective on some of the issues raised, and describe how such a >>> > transition could occur. >>> > >>> > First, I'd like to reiterate the gcc rationale for their choice to >>> > switch: >>> > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale >>> > >>> > In particular, these points deserve emphasis: >>> > >>> > ? - The C subset of C++ is just as efficient as C. >>> > ? - C++ supports cleaner code in several significant cases. >>> > ? - C++ makes it easier to write cleaner interfaces by making it harder >>> > to >>> > ? break interface boundaries. >>> > ? - C++ never requires uglier code. >>> > >>> >>> I think they're trying to solve a different problem. >>> >>> I thought the problem that numpy was trying to solve is "make inner loops >>> of numerical algorithms very fast". C is great for this because you can >>> write C code and picture precisely what assembly code will be generated. >> >> >> What you're describing is also the C subset of C++, so your experience >> applies just as well to C++! >> >>> >>> C++ removes some of this advantage -- now there is extra code generated by >>> the compiler to handle constructors, destructors, operators etc which can >>> make a material difference to fast inner loops. So you end up just writing >>> "C-style" anyway. >> >> >> This is in fact not true, and writing in C++ style can often produce faster >> code. A classic example of this is C qsort vs C++ std::sort. You may be >> thinking of using virtual functions in a class hierarchy, where a tradeoff >> between performance and run-time polymorphism is being done. Emulating the >> functionality that virtual functions provide in C will give similar >> performance characteristics as the C++ language feature itself. 
>> >>> >>> On the other hand, if your problem really is "write lots of OO code with >>> virtual methods and have it turned into machine code" (probably like the >>> GCC guys) then maybe C++ is the way to go. >> >> >> Managing the complexity of the dtype subsystem, the ufunc subsystem, the >> nditer component, and other parts of NumPy could benefit from C++ Not in a >> stereotypical "OO code with virtual methods" way, that is not how typical >> modern C++ is done. >> >>> >>> Some more opinions on C++: >>> http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ >>> >>> Sorry if this all seems a bit negative about C++. It's just been my >>> experience that C++ adds complexity while C keeps things nice and simple. >> >> >> Yes, there are lots of negative opinions about C++ out there, it's true. >> Just like there are negative opinions about C, Java, C#, and any other >> language which has become popular. My experience with regard to complexity >> and C vs C++ is that C forces the complexity of dealing with resource >> lifetimes out into all the code everyone writes, while C++ allows one to >> encapsulate that sort of complexity into a class which is small and more >> easily verifiable. This is about code quality, and the best quality C++ code >> I've worked with has been way easier to program in than the best quality C >> code I've worked with. > > While I actually believe this to be true (very good C++ can be easier > to read/use than very good C). Good C is also much more common than > good C++, at least in open source. > > On the good C++ codebases you have been working on, could you rely on > everybody being a very good C++ programmer ? Because this will most > likely never happen for numpy. This is the crux of the argument from > an organizational POV: the variance in C++ code quality is much more > difficult to control. I have seen C++ code that is certainly much > poorer and more complex than numpy, to a point where not much could be > done to save the codebase. > Can this possibly be extended to the following: How will Mark's (extensive) experience about performance and long-term consequences of design decisions be communicated to future developers? We not only want new numpy developers, we want them to write good code without unintentional performance regressions. It seems like something more than just code guidelines would be required. There's also the issue that c++ compilation error messages can be awful and disheartening. Are there ways of making them not as bad by following certain coding styles, or is that baked in? (I know clang is moving towards making them much better, though.) 
-Chris > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mwwiebe at gmail.com Sun Feb 19 05:41:20 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 04:41:20 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 4:14 AM, David Cournapeau wrote: > On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe wrote: > > On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh > wrote: > >> > >> > >> > >> > Date: Sun, 19 Feb 2012 01:18:20 -0600 > >> > From: Mark Wiebe > >> > Subject: [Numpy-discussion] How a transition to C++ could work > >> > To: Discussion of Numerical Python > >> > Message-ID: > >> > > >> > > >> > Content-Type: text/plain; charset="utf-8" > >> > > >> > The suggestion of transitioning the NumPy core code from C to C++ has > >> > sparked a vigorous debate, and I thought I'd start a new thread to > give > >> > my > >> > perspective on some of the issues raised, and describe how such a > >> > transition could occur. > >> > > >> > First, I'd like to reiterate the gcc rationale for their choice to > >> > switch: > >> > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > >> > > >> > In particular, these points deserve emphasis: > >> > > >> > - The C subset of C++ is just as efficient as C. > >> > - C++ supports cleaner code in several significant cases. > >> > - C++ makes it easier to write cleaner interfaces by making it > harder > >> > to > >> > break interface boundaries. > >> > - C++ never requires uglier code. > >> > > >> > >> I think they're trying to solve a different problem. > >> > >> I thought the problem that numpy was trying to solve is "make inner > loops > >> of numerical algorithms very fast". C is great for this because you can > >> write C code and picture precisely what assembly code will be generated. > > > > > > What you're describing is also the C subset of C++, so your experience > > applies just as well to C++! > > > >> > >> C++ removes some of this advantage -- now there is extra code generated > by > >> the compiler to handle constructors, destructors, operators etc which > can > >> make a material difference to fast inner loops. So you end up just > writing > >> "C-style" anyway. > > > > > > This is in fact not true, and writing in C++ style can often produce > faster > > code. A classic example of this is C qsort vs C++ std::sort. You may be > > thinking of using virtual functions in a class hierarchy, where a > tradeoff > > between performance and run-time polymorphism is being done. Emulating > the > > functionality that virtual functions provide in C will give similar > > performance characteristics as the C++ language feature itself. > > > >> > >> On the other hand, if your problem really is "write lots of OO code with > >> virtual methods and have it turned into machine code" (probably like the > >> GCC guys) then maybe C++ is the way to go. > > > > > > Managing the complexity of the dtype subsystem, the ufunc subsystem, the > > nditer component, and other parts of NumPy could benefit from C++ Not in > a > > stereotypical "OO code with virtual methods" way, that is not how typical > > modern C++ is done. > > > >> > >> Some more opinions on C++: > >> http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ > >> > >> Sorry if this all seems a bit negative about C++. 
It's just been my > >> experience that C++ adds complexity while C keeps things nice and > simple. > > > > > > Yes, there are lots of negative opinions about C++ out there, it's true. > > Just like there are negative opinions about C, Java, C#, and any other > > language which has become popular. My experience with regard to > complexity > > and C vs C++ is that C forces the complexity of dealing with resource > > lifetimes out into all the code everyone writes, while C++ allows one to > > encapsulate that sort of complexity into a class which is small and more > > easily verifiable. This is about code quality, and the best quality C++ > code > > I've worked with has been way easier to program in than the best quality > C > > code I've worked with. > > While I actually believe this to be true (very good C++ can be easier > to read/use than very good C). Good C is also much more common than > good C++, at least in open source. > > On the good C++ codebases you have been working on, could you rely on > everybody being a very good C++ programmer? Not initially, but I designed the coding standards and taught the programmers I hired how to write good C++ code. > Because this will most > likely never happen for numpy. This is the role I see good coding standards and consistent code review playing. Programmers who don't know how to write good C++ code can be taught. There are also good books to read, like "C++ Coding Standards," "Effective C++", and others that can help people learn proper technique. > This is the crux of the argument from > an organizational POV: the variance in C++ code quality is much more > difficult to control. I have seen C++ code that is certainly much > poorer and more complex than numpy, to a point where not much could be > done to save the codebase. > That's a consequence of the power C++ provides. It assumes the programmer knows what he or she is doing, and provides the tools to make things great or shoot oneself in the foot. I'd like to use that power to make NumPy better, in a way which uses high quality modern C++ style. I'm willing to help anyone contributing C-level code to NumPy to learn this style. I'd rather not have to write any more C code, where it's easy to get a crash because the C compiler allowed an implicit type conversion to slip through when I typed the wrong thing. -Mark > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wardefar at iro.umontreal.ca Sun Feb 19 05:44:27 2012 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Sun, 19 Feb 2012 05:44:27 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: <996B3429-7357-44D9-8CD5-45EB4D397B31@iro.umontreal.ca> On 2012-02-19, at 12:47 AM, Benjamin Root wrote: > Dude, have you seen the .c files in numpy/core? They are already read-only for pretty much everybody but Mark. I've managed to patch several of them without incident, and I do not do a lot of programming in C. It could be simpler, but it's not really a big deal to navigate once you've spent some time reading it. 
I think the comments about the developer audience NumPy will attract are important. There may be lots of C++ developers out there, but the intersection of (truly competent in C++) and (likely to involve oneself in NumPy development) may well be quite small. David From mwwiebe at gmail.com Sun Feb 19 05:49:29 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 04:49:29 -0600 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Sun, Feb 19, 2012 at 4:30 AM, Christopher Jordan-Squire wrote: > On Sun, Feb 19, 2012 at 2:14 AM, David Cournapeau > wrote: > > On Sun, Feb 19, 2012 at 9:52 AM, Mark Wiebe wrote: > >> On Sun, Feb 19, 2012 at 3:10 AM, Ben Walsh > wrote: > >>> > >>> > >>> > >>> > Date: Sun, 19 Feb 2012 01:18:20 -0600 > >>> > From: Mark Wiebe > >>> > Subject: [Numpy-discussion] How a transition to C++ could work > >>> > To: Discussion of Numerical Python > >>> > Message-ID: > >>> > > >>> > > >>> > Content-Type: text/plain; charset="utf-8" > >>> > > >>> > The suggestion of transitioning the NumPy core code from C to C++ has > >>> > sparked a vigorous debate, and I thought I'd start a new thread to > give > >>> > my > >>> > perspective on some of the issues raised, and describe how such a > >>> > transition could occur. > >>> > > >>> > First, I'd like to reiterate the gcc rationale for their choice to > >>> > switch: > >>> > http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale > >>> > > >>> > In particular, these points deserve emphasis: > >>> > > >>> > - The C subset of C++ is just as efficient as C. > >>> > - C++ supports cleaner code in several significant cases. > >>> > - C++ makes it easier to write cleaner interfaces by making it > harder > >>> > to > >>> > break interface boundaries. > >>> > - C++ never requires uglier code. > >>> > > >>> > >>> I think they're trying to solve a different problem. > >>> > >>> I thought the problem that numpy was trying to solve is "make inner > loops > >>> of numerical algorithms very fast". C is great for this because you can > >>> write C code and picture precisely what assembly code will be > generated. > >> > >> > >> What you're describing is also the C subset of C++, so your experience > >> applies just as well to C++! > >> > >>> > >>> C++ removes some of this advantage -- now there is extra code > generated by > >>> the compiler to handle constructors, destructors, operators etc which > can > >>> make a material difference to fast inner loops. So you end up just > writing > >>> "C-style" anyway. > >> > >> > >> This is in fact not true, and writing in C++ style can often produce > faster > >> code. A classic example of this is C qsort vs C++ std::sort. You may be > >> thinking of using virtual functions in a class hierarchy, where a > tradeoff > >> between performance and run-time polymorphism is being done. Emulating > the > >> functionality that virtual functions provide in C will give similar > >> performance characteristics as the C++ language feature itself. > >> > >>> > >>> On the other hand, if your problem really is "write lots of OO code > with > >>> virtual methods and have it turned into machine code" (probably like > the > >>> GCC guys) then maybe C++ is the way to go. > >> > >> > >> Managing the complexity of the dtype subsystem, the ufunc subsystem, the > >> nditer component, and other parts of NumPy could benefit from C++ Not > in a > >> stereotypical "OO code with virtual methods" way, that is not how > typical > >> modern C++ is done. 
> >> > >>> > >>> Some more opinions on C++: > >>> http://gigamonkeys.wordpress.com/2009/10/16/coders-c-plus-plus/ > >>> > >>> Sorry if this all seems a bit negative about C++. It's just been my > >>> experience that C++ adds complexity while C keeps things nice and > simple. > >> > >> > >> Yes, there are lots of negative opinions about C++ out there, it's true. > >> Just like there are negative opinions about C, Java, C#, and any other > >> language which has become popular. My experience with regard to > complexity > >> and C vs C++ is that C forces the complexity of dealing with resource > >> lifetimes out into all the code everyone writes, while C++ allows one to > >> encapsulate that sort of complexity into a class which is small and more > >> easily verifiable. This is about code quality, and the best quality C++ > code > >> I've worked with has been way easier to program in than the best > quality C > >> code I've worked with. > > > > While I actually believe this to be true (very good C++ can be easier > > to read/use than very good C). Good C is also much more common than > > good C++, at least in open source. > > > > On the good C++ codebases you have been working on, could you rely on > > everybody being a very good C++ programmer ? Because this will most > > likely never happen for numpy. This is the crux of the argument from > > an organizational POV: the variance in C++ code quality is much more > > difficult to control. I have seen C++ code that is certainly much > > poorer and more complex than numpy, to a point where not much could be > > done to save the codebase. > > > > Can this possibly be extended to the following: How will Mark's > (extensive) experience about performance and long-term consequences of > design decisions be communicated to future developers? We not only > want new numpy developers, we want them to write good code without > unintentional performance regressions. It seems like something more > than just code guidelines would be required. > I've tried to set a bit of an example to start with the NEPs I've written. The NEPs for both the nditer and the NA functionality are very long and detailed. Some documents giving general code tours of NumPy would be very helpful, however, and this kind of document could communicate both the current code and what direction it might evolve in the future. It might be worth creating a performance test suite to protect against performance regressions. Wes McKinney has made some noise in that direction. ( http://wesmckinney.com/blog/?p=373) > There's also the issue that c++ compilation error messages can be > awful and disheartening. Are there ways of making them not as bad by > following certain coding styles, or is that baked in? (I know clang is > moving towards making them much better, though.) > Yes, this is a problem. Clang has already made this a lot better than the status quo if you have the good fortune of using it. There are ways of making them not as bad, the boost library developers for example have put a lot of thought into this issue, and came up with the boost static assert library as one mechanism to help improve error messages. C++11 introduces static_assert as a language feature motivated by that experience. 
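To make that concrete, here's a small illustrative sketch (a hypothetical helper, not anything from the NumPy sources) of the C++11 feature in question. The point is that misuse fails at compile time with the quoted message rather than with a long template instantiation backtrace:

#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical helper, for illustration only.
template <typename T>
T* allocate_buffer(std::size_t n)
{
    // C++11 static_assert: the custom message is what the user sees on misuse.
    static_assert(std::is_pod<T>::value,
                  "allocate_buffer only supports plain-old-data element types");
    return static_cast<T*>(::operator new(n * sizeof(T)));
}

int main()
{
    double* ok = allocate_buffer<double>(16);   // compiles
    ::operator delete(ok);
    // allocate_buffer<std::string>(16);        // would fail with the message above
    return 0;
}
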
Cheers, Mark > -Chris > > > cheers, > > > > David > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Feb 19 06:25:15 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Feb 2012 11:25:15 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau wrote: > On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe wrote: >> Is there a specific >> target platform/compiler combination you're thinking of where we can do >> tests on this? I don't believe the compile times are as bad as many people >> suspect, can you give some simple examples of things we might do in NumPy >> you expect to compile slower in C++ vs C? > > Switching from gcc to g++ on the same codebase should not change much > compilation times. We should test, but that's not what worries me. > What worries me is when we start using C++ specific code, STL and co. > Today, scipy.sparse.sparsetools takes half of the build time ?of the > whole scipy, and it does not even use fancy features. It also takes Gb > of ram when building in parallel. I like C++ but it definitely does have issues with compilation times. IIRC the main problem is very simple: STL and friends (e.g. Boost) are huge libraries, and because they use templates, the entire source code is in the header files. That means that as soon as you #include a few standard C++ headers, your innocent little source file has suddenly become hundreds of thousands of lines long, and it just takes the compiler a while to churn through megabytes of source code, no matter what it is. (Effectively you recompile some significant fraction of STL from scratch on every file, and then throw it away.) Precompiled headers can help some, but require complex and highly non-portable build-system support. (E.g., gcc's precompiled header constraints are here: http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one per source file, etc.) To demonstrate: a trivial hello-world in C using , versus a trivial version in C++ using . On my laptop (gcc 4.5.2), compiling each program 100 times in a loop requires: C: 2.28 CPU seconds C compiled with C++ compiler: 4.61 CPU seconds C++: 17.66 CPU seconds Slowdown for using g++ instead of gcc: 2.0x Slowdown for using C++ standard library: 3.8x Total C++ penalty: 7.8x Lines of code compiled in each case: $ gcc -E hello.c | wc 855 2039 16934 $ g++ -E hello.cc | wc 18569 40994 437954 (I.e., the C++ hello world is almost half a megabyte.) Of course we won't be using , but , etc. all have the same basic character. -- Nathaniel (Test files attached, times were from: time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done' cp hello.c c-hello.cc time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done' time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done' and then summing the resulting user and system times.) 
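Since the archive scrubs attachments, here's roughly what the two test files contain (reconstructed from the description above; the exact byte counts will differ slightly):

/* hello.c */
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}

// hello.cc
#include <iostream>

int main()
{
    std::cout << "hello, world" << std::endl;
    return 0;
}
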
-------------- next part -------------- A non-text attachment was scrubbed... Name: hello.c Type: text/x-csrc Size: 98 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: hello.cc Type: text/x-c++src Size: 114 bytes Desc: not available URL: From ndbecker2 at gmail.com Sun Feb 19 08:27:24 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Sun, 19 Feb 2012 08:27:24 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <4F4043D9.5080400@molden.no> Message-ID: Sturla Molden wrote: > Den 19.02.2012 01:12, skrev Nathaniel Smith: >> >> I don't oppose it, but I admit I'm not really clear on what the >> supposed advantages would be. Everyone seems to agree that >> -- Only a carefully-chosen subset of C++ features should be used >> -- But this subset would be pretty useful >> I wonder if anyone is actually thinking of the same subset :-). > > Probably not, everybody have their own favourite subset. > > >> >> Chuck mentioned iterators as one advantage. I don't understand, since >> iterators aren't even a C++ feature, they're just objects with "next" >> and "dereference" operators. The only difference between these is >> spelling: >> for (my_iter i = foo.begin(); i != foo.end(); ++i) { ... } >> for (my_iter i = my_iter_begin(foo); !my_iter_ended(&i); >> my_iter_next(&i)) { ... } >> So I assume he's thinking about something more, but the discussion has >> been too high-level for me to figure out what. > > I find range interface (i.e., boost::range) is far more useful than raw iterator interface. I always write all my algorithms using this abstraction. From ndbecker2 at gmail.com Sun Feb 19 08:42:14 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Sun, 19 Feb 2012 08:42:14 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: Nathaniel Smith wrote: > On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau wrote: >> On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe wrote: >>> Is there a specific >>> target platform/compiler combination you're thinking of where we can do >>> tests on this? I don't believe the compile times are as bad as many people >>> suspect, can you give some simple examples of things we might do in NumPy >>> you expect to compile slower in C++ vs C? >> >> Switching from gcc to g++ on the same codebase should not change much >> compilation times. We should test, but that's not what worries me. >> What worries me is when we start using C++ specific code, STL and co. >> Today, scipy.sparse.sparsetools takes half of the build time of the >> whole scipy, and it does not even use fancy features. It also takes Gb >> of ram when building in parallel. > > I like C++ but it definitely does have issues with compilation times. > > IIRC the main problem is very simple: STL and friends (e.g. Boost) are > huge libraries, and because they use templates, the entire source code > is in the header files. That means that as soon as you #include a few > standard C++ headers, your innocent little source file has suddenly > become hundreds of thousands of lines long, and it just takes the > compiler a while to churn through megabytes of source code, no matter > what it is. 
(Effectively you recompile some significant fraction of > STL from scratch on every file, and then throw it away.) > > Precompiled headers can help some, but require complex and highly > non-portable build-system support. (E.g., gcc's precompiled header > constraints are here: > http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one > per source file, etc.) > > To demonstrate: a trivial hello-world in C using , versus a > trivial version in C++ using . > > On my laptop (gcc 4.5.2), compiling each program 100 times in a loop requires: > C: 2.28 CPU seconds > C compiled with C++ compiler: 4.61 CPU seconds > C++: 17.66 CPU seconds > Slowdown for using g++ instead of gcc: 2.0x > Slowdown for using C++ standard library: 3.8x > Total C++ penalty: 7.8x > > Lines of code compiled in each case: > $ gcc -E hello.c | wc > 855 2039 16934 > $ g++ -E hello.cc | wc > 18569 40994 437954 > (I.e., the C++ hello world is almost half a megabyte.) > > Of course we won't be using , but , > etc. all have the same basic character. > > -- Nathaniel > > (Test files attached, times were from: > time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done' > cp hello.c c-hello.cc > time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done' > time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done' > and then summing the resulting user and system times.) On Fedora linux I use ccache, which is completely transparant and makes a huge difference in build times. From ndbecker2 at gmail.com Sun Feb 19 09:00:03 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Sun, 19 Feb 2012 09:00:03 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Sturla Molden wrote: > > Den 18. feb. 2012 kl. 01:58 skrev Charles R Harris > : > >> >> >> On Fri, Feb 17, 2012 at 4:44 PM, David Cournapeau wrote: >> I don't think c++ has any significant advantage over c for high performance >> libraries. I am not convinced by the number of people argument either: it is >> not my experience that c++ is easier to maintain in a open source context, >> where the level of people is far from consistent. I doubt many people did not >> contribute to numoy because it is in c instead if c++. While this is somehow >> subjective, there are reasons that c is much more common than c++ in that >> context. >> >> >> I think C++ offers much better tools than C for the sort of things in Numpy. >> The compiler will take care of lots of things that now have to be hand >> crafted and I wouldn't be surprised to see the code size shrink by a >> significant factor. > > The C++11 standard is fantastic. There are automatic data types, closures, > reference counting, weak references, an improved STL with datatypes that map > almost 1:1 against any built-in Python type, a sane threading API, regex, ect. > Even prng is Mersenne Twister by standard. With C++11 it is finally possible > to "write C++ (almost) like Python". On the downside, C++ takes a long term to > learn, most C++ text books teach bad programming habits from the beginning to > the end, and C++ becomes inherently dangerous if you write C++ like C. Many > also abuse C++ as an bloatware generator. Templates can also be abused to > write code that are impossible to debug. While it in theory could be better, C > is a much smaller language. Personally I prefer C++ to C, but I am not > convinced it will be better for NumPy. 
> I'm all for c++11, but if you are worried about portability, dude, you have a bit of a problem here. > I agree about Cython. It is nice for writing a Python interface for C, but get > messy and unclean when used for anything else. It also has too much focus on > adding all sorts of "new features" instead of correctness and stability. I > don't trust it to generate bug-free code anymore. > > For wrapping C, Swig might be just as good. For C++, SIP, CXX or Boost.Pyton > work well too. > > If cracy ideas are allowed, what about PyPy RPython? Or perhaps Go? Or even C# > if a native compuler could be found? > > c# is a non-starter if you want to run on linux. From mjldehoon at yahoo.com Sun Feb 19 09:23:54 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 19 Feb 2012 06:23:54 -0800 (PST) Subject: [Numpy-discussion] Flag this message Re: [Matplotlib-users] Discussion with Guido van Rossum and (hopefully) core python-dev on scientific Python and Python3 Message-ID: <1329661434.62926.YahooMailClassic@web161205.mail.bf1.yahoo.com> > While a number of scientific Python packages are already available for > Python 3 (either in released form or in their master git branches), > it's fair to say that there hasn't been a major transition of the > scientific community to Python3. Since there is no more development > being done on the Python2 series, eventually we will all want to find > ways to make this transition, and we think that this is an excellent > time to engage the core python development team and consider ideas > that would make Python3 generally a more appealing language for > scientific work. For scientific visualization, in particular for matplotlib, it would be really good if Python3 had event loop support. In Python2 extension modules such as Tkinter, PyGTK, PyQT each implement their own event loop, leading to incompatibilities. For reliable event handling, Python should be in charge of the event loop, rather than the extension modules. For comparison, see the event loop support provided by Tcl / Tk. -Michiel. From sturla at molden.no Sun Feb 19 09:42:18 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 15:42:18 +0100 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: <99D9768D-B593-415F-A915-DCD1431847A2@molden.no> Den 19. feb. 2012 kl. 09:51 skrev St?fan van der Walt : > > OK, so let's talk specifics: how do you dynamically grab a function pointer to a compiled C++ library, a la ctypes? Feel free to point me to StackOverflow or elsewhere. > You declare the function with the signature extern "C". Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From adam at lambdafoundry.com Sun Feb 19 10:20:12 2012 From: adam at lambdafoundry.com (Adam Klein) Date: Sun, 19 Feb 2012 10:20:12 -0500 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: <-5964722698688473837@unknownmsgid> On Feb 19, 2012, at 2:18 AM, Mark Wiebe wrote: The suggestion of transitioning the NumPy core code from C to C++ has sparked a vigorous debate, and I thought I'd start a new thread to give my perspective on some of the issues raised, and describe how such a transition could occur. First, I'd like to reiterate the gcc rationale for their choice to switch: http://gcc.gnu.org/wiki/gcc-in-cxx#Rationale In particular, these points deserve emphasis: - The C subset of C++ is just as efficient as C. - C++ supports cleaner code in several significant cases. 
- C++ makes it easier to write cleaner interfaces by making it harder to break interface boundaries. - C++ never requires uglier code. Some people have pointed out that the Python templating preprocessor used in NumPy is suggestive of C++ templates. A nice advantage of using C++ templates instead of this preprocessor is that third party tools to improve software quality, like static analysis tools, will be able to run directly on the NumPy source code. Additionally, IDEs like XCode and Visual C++ will be able to provide the full suite of tab-completion/intellisense features that programmers working in those environments are accustomed to. There are concerns about ABI/API interoperability and interactions with C++ exceptions. I've dealt with these types of issues on enough platforms to know that while they're important, they're a lot easier to handle than the issues with Fortran, BLAS, and LAPACK in SciPy. My experience has been that providing a C API from a C++ library is no harder than providing a C API from a C library. It's worth comparing the possibility of C++ versus the possibility of other languages, and the ones that have been suggested for consideration are D, Cython, Rust, Fortran 2003, Go, RPython, C# and Java. The target language has to interact naturally with the CPython API. It needs to provide direct access to all the various sizes of signed int, unsigned int, and float. It needs to have mature compiler support wherever we want to deploy NumPy. Taken together, these requirements eliminate a majority of these possibilities. From these criteria, the only languages which seem to have a clear possibility for the implementation of Numpy are C, C++, and D. For D, I suspect the tooling is not mature enough, but I'm not 100% certain of that. I am a huge fan of D, but you are dead on about its tooling, so +1 on the observation. Its code generation especially with respect to floating point is also a known area needing improvement IIRC. The biggest question for any of these possibilities is how do you get the code from its current state to a state which fully utilizes the target language. C++, being nearly a superset of C, offers a strategy to gradually absorb C++ features. Any of the other language choices requires a rewrite, which would be quite disruptive. Because of all these reasons taken together, I believe the only realistic language to use, other than sticking with C, is C++. Finally, here's what I think is the best strategy for transitioning to C++. First, let's consider what we do if 1.7 becomes an LTS release. 1) Immediately after branching for 1.7, we minimally patch all the .c files so that they can build with a C++ compiler and with a C compiler at the same time. Then we rename all .c -> .cpp, and update the build systems for C++. 2) During the 1.8 development cycle, we heavily restrict C++ feature usage. But, where a feature implementation would be arguably easier and less error-prone with C++, we allow it. This is a period for learning about C++ and how it can benefit NumPy. 3) After the 1.8 release, the community will have developed more experience with C++, and will be in a better position to discuss a way forward. If, for some reason, a 1.7 LTS is unacceptable, it might be a good idea to restrict the 1.8 release to the subset of both C and C++. I would much prefer using the 1.8 development cycle to dip our toes into the C++ world to get some of the low-hanging benefits without doing anything disruptive. 
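To give a flavour of how small the step 1 changes typically are, here is an illustrative fragment (hypothetical code, not taken from the NumPy sources) that is accepted by both a C and a C++ compiler:

/* illustrative only -- not from the NumPy sources */
#include <stdlib.h>

#ifdef __cplusplus
extern "C" {    /* exported symbols keep a plain C ABI */
#endif

double *make_buffer(size_t n)
{
    /* C++ requires the explicit cast from void*; C merely tolerates it,
       so this spelling compiles under both gcc and g++. */
    double *buf = (double *)malloc(n * sizeof(double));
    return buf;
}

#ifdef __cplusplus
}
#endif

The other common edits are of the same flavour: renaming identifiers that happen to be C++ keywords (new, class, typename, ...) and tightening the few implicit conversions that C allows but C++ rejects.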
A really important point to emphasize is that C++ allows for a strategy where we gradually evolve the codebase to better incorporate its language features. This is what I'm advocating. No massive rewrite, no disruptive changes. Gradual code evolution, with ABI and API compatibility comparable to what we've delivered in 1.6 and the upcoming 1.7 releases. Thanks, Mark _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sun Feb 19 10:23:58 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 16:23:58 +0100 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: <4F41140E.9080004@molden.no> Den 19.02.2012 11:30, skrev Christopher Jordan-Squire: > > Can this possibly be extended to the following: How will Mark's > (extensive) experience about performance and long-term consequences of > design decisions be communicated to future developers? We not only > want new numpy developers, we want them to write good code without > unintentional performance regressions. It seems like something more > than just code guidelines would be required. There are more examples of crappy than good C++ out there. There are tons of litterature on how to write crappy C++. And most programmers do not have the skill or knowledge to write good C++. My biggest issue with C++ is the variability of skills among programmers. It will result in code that are: - unncessesary complex - ugly looking - difficult to understand - verbose and long - inefficient - full of subtile errors - impossible to debug - impossible to maintain - not scalable with hardware - dependent on one particular compiler It is easier to achive this with C++ than C. But it is also easier to avoid. Double-edged sword. It will take more than guidelines. Sturla From pav at iki.fi Sun Feb 19 10:35:04 2012 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 19 Feb 2012 16:35:04 +0100 Subject: [Numpy-discussion] Scipy Cython refactor In-Reply-To: <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: 19.02.2012 05:38, Travis Oliphant kirjoitti: [clip] >>> Sure. This list actually deserves a long writeup about that. >>> First, there wasn't a "Cython-refactor" of NumPy. There was a >>> Cython-refactor of SciPy. I'm not sure of it's current status. >>> I'm still very supportive of that sort of thing. >> >> I think I missed that - is it on git somewhere? > > I thought so, but I can't find it either. We should ask Jason > McCampbell of Enthought where the code is located. Here are the > distributed eggs: http://www.enthought.com/repo/.iron/ They're here: https://github.com/dagss/private-scipy-refactor https://github.com/jasonmccampbell/scipy-refactor The main problem with merging this was the experimental status of FWrap, and the fact that the wrappers it generates are big compared to f2py and required manual editing of the generated code. So, there were maintainability concerns with the Fortran pieces. These could probably be solved, however, and I wouldn't be opposed to e.g. cleaning up the generated code and using manually crafted Cython. 
Cherry picking the Cython replacements for all the modules wrapped in C probably should be done in any case. The parts of Scipy affected by the refactoring have not changed significantly, so there are no significant problems in re-raising the issue of merging the work back. Pauli From sturla at molden.no Sun Feb 19 10:38:01 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 16:38:01 +0100 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: <4F411759.6070700@molden.no> Den 19.02.2012 10:52, skrev Mark Wiebe: > > C++ removes some of this advantage -- now there is extra code > generated by > the compiler to handle constructors, destructors, operators etc > which can > make a material difference to fast inner loops. So you end up just > writing > "C-style" anyway. > > > This is in fact not true, and writing in C++ style can often produce > faster code. A classic example of this is C qsort vs C++ std::sort. > You may be thinking of using virtual functions in a class hierarchy, > where a tradeoff between performance and run-time polymorphism is > being done. Emulating the functionality that virtual functions provide > in C will give similar performance characteristics as the C++ language > feature itself. I agree with Mark here. C++ usually produces the faster code. C++ has abstractions that makes it easier to write more efficient code. C++ provides more and better information to the compiler (e.g. strict aliasing rules). C++ compilers are also getting insanely good at optimisation, usually better than C compilers. But C++ also makes it easy to write sluggish bloatware, so the effect on performance is not predictable. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.gnata at gmail.com Sun Feb 19 10:45:06 2012 From: xavier.gnata at gmail.com (xavier.gnata at gmail.com) Date: Sun, 19 Feb 2012 16:45:06 +0100 Subject: [Numpy-discussion] NumPy in PyPy ? Message-ID: <4F411902.8090407@gmail.com> Hi, I'm trying to understand what's going on with : http://morepypy.blogspot.com/2012/01/numpypy-status-update.html What's your opinion on such a numpy rewrite?? Thanks, Xavier From adam at lambdafoundry.com Sun Feb 19 10:45:36 2012 From: adam at lambdafoundry.com (Adam Klein) Date: Sun, 19 Feb 2012 10:45:36 -0500 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: <4F411759.6070700@molden.no> References: <4F411759.6070700@molden.no> Message-ID: <-7163173211858824926@unknownmsgid> On Feb 19, 2012, at 10:38 AM, Sturla Molden wrote: Den 19.02.2012 10:52, skrev Mark Wiebe: C++ removes some of this advantage -- now there is extra code generated by > the compiler to handle constructors, destructors, operators etc which can > make a material difference to fast inner loops. So you end up just writing > "C-style" anyway. > This is in fact not true, and writing in C++ style can often produce faster code. A classic example of this is C qsort vs C++ std::sort. You may be thinking of using virtual functions in a class hierarchy, where a tradeoff between performance and run-time polymorphism is being done. Emulating the functionality that virtual functions provide in C will give similar performance characteristics as the C++ language feature itself. I agree with Mark here. C++ usually produces the faster code. C++ has abstractions that makes it easier to write more efficient code. C++ provides more and better information to the compiler (e.g. strict aliasing rules). 
C++ compilers are also getting insanely good at optimisation, usually better than C compilers. But C++ also makes it easy to write sluggish bloatware, so the effect on performance is not predictable. Just to add, with respect to acceptable compilation times, a judicious choice of C++ features is critical. Sturla _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sun Feb 19 10:48:04 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 16:48:04 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: <4F4119B4.7060209@molden.no> Den 19.02.2012 10:28, skrev Mark Wiebe: > > Particular styles of using templates can cause this, yes. To properly > do this kind of advanced C++ library work, it's important to think > about the big-O notation behavior of your template instantiations, not > just the big-O notation of run-time. C++ templates have a > turing-complete language (which is said to be quite similar to > haskell, but spelled vastly different) running at compile time in > them. This is what gives template meta-programming in C++ great power, > but since templates weren't designed for this style of programming > originally, template meta-programming is not very easy. > > The problem with metaprogramming is that we are doing manually the work that belongs to the compiler. Blitz++ was supposed to be a library that "thought like a compiler". But then compilers just got better. Today, it is no longer possible for a numerical library programmer to outsmart an optimizing C++ compiler. All metaprogramming can do today is produce error messages noone can understand. And the resulting code will often be slower because the compiler has less opportunities to do its work. Sturla From sturla at molden.no Sun Feb 19 10:53:33 2012 From: sturla at molden.no (Sturla Molden) Date: Sun, 19 Feb 2012 16:53:33 +0100 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: <-7163173211858824926@unknownmsgid> References: <4F411759.6070700@molden.no> <-7163173211858824926@unknownmsgid> Message-ID: <4F411AFD.7080402@molden.no> Den 19.02.2012 16:45, skrev Adam Klein: > > Just to add, with respect to acceptable compilation times, a judicious > choice of C++ features is critical. > I use Python to avoid recompiling my code all the time. I don't recompile NumPy every time I use it. (I know you are thinking about development, but you have the wrong perspective.) Sturla From xavier.gnata at gmail.com Sun Feb 19 11:13:39 2012 From: xavier.gnata at gmail.com (xavier.gnata at gmail.com) Date: Sun, 19 Feb 2012 17:13:39 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F4119B4.7060209@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F4119B4.7060209@molden.no> Message-ID: <4F411FB3.6050902@gmail.com> On 02/19/2012 04:48 PM, Sturla Molden wrote: > Den 19.02.2012 10:28, skrev Mark Wiebe: >> Particular styles of using templates can cause this, yes. To properly >> do this kind of advanced C++ library work, it's important to think >> about the big-O notation behavior of your template instantiations, not >> just the big-O notation of run-time. 
C++ templates have a >> turing-complete language (which is said to be quite similar to >> haskell, but spelled vastly different) running at compile time in >> them. This is what gives template meta-programming in C++ great power, >> but since templates weren't designed for this style of programming >> originally, template meta-programming is not very easy. >> >> > The problem with metaprogramming is that we are doing manually the work > that belongs to the compiler. Blitz++ was supposed to be a library that > "thought like a compiler". But then compilers just got better. Today, it > is no longer possible for a numerical library programmer to outsmart an > optimizing C++ compiler. All metaprogramming can do today is produce > error messages noone can understand. And the resulting code will often > be slower because the compiler has less opportunities to do its work. > > Sturla "Today, it is no longer possible for a numerical library programmer to outsmart an optimizing C++ compiler." I'm no sure. If you want to be able to write A=B+C+D; with decent performances, I think you have to use a lib based on expression templates. It would be great if C++ compilers could automatically optimize out spurious copies into temporaries. However, I don't think the compilers are smart enough to do so...not yet. Xavier From ralf.gommers at googlemail.com Sun Feb 19 11:19:23 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 19 Feb 2012 17:19:23 +0100 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: <4F411AFD.7080402@molden.no> References: <4F411759.6070700@molden.no> <-7163173211858824926@unknownmsgid> <4F411AFD.7080402@molden.no> Message-ID: On Sun, Feb 19, 2012 at 4:53 PM, Sturla Molden wrote: > Den 19.02.2012 16:45, skrev Adam Klein: > > > > Just to add, with respect to acceptable compilation times, a judicious > > choice of C++ features is critical. > > > > I use Python to avoid recompiling my code all the time. I don't > recompile NumPy every time I use it. > > (I know you are thinking about development, but you have the wrong > perspective.) > No he doesn't. Perspectives aren't wrong, just different. I compile both numpy and scipy on a regular (almost daily) basis, and long compile times are very annoying. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sun Feb 19 12:34:59 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 19 Feb 2012 09:34:59 -0800 Subject: [Numpy-discussion] How a transition to C++ could work In-Reply-To: References: Message-ID: On Feb 19, 2012 2:41 AM, "Mark Wiebe" wrote: > > This is the role I see good coding standards and consistent code review playing. Programmers who don't know how to write good C++ code can be taught. There are also good books to read, like "C++ Coding Standards," "Effective C++", and others that can help people learn proper technique. I recommended this book (one in the list avove) to anyone who is not afraid of C++ yet: http://search.barnesandnoble.com/Effective-C/Scott-Meyers/e/9780321334879 With great power comes great responsibility. St?fan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sun Feb 19 13:32:02 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Feb 2012 18:32:02 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F411FB3.6050902@gmail.com> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F4119B4.7060209@molden.no> <4F411FB3.6050902@gmail.com> Message-ID: On Sun, Feb 19, 2012 at 4:13 PM, xavier.gnata at gmail.com wrote: > I'm no sure. If you want to be able to write A=B+C+D; with decent > performances, I think you have to use a lib based on expression templates. > It would be great if C++ compilers could automatically optimize out > spurious copies into temporaries. > However, I don't think the compilers are smart enough to do so...not yet. But isn't this all irrelevant to numpy? Numpy is basically a large collection of bare inner loops, plus a bunch of dynamic dispatch machinery to make sure that the right one gets called at the right time. Since these are exposed directly to Python, there's really no way for the compiler to optimize out spurious copies or anything like that -- even a very smart fortran-esque static compiler can't optimize complex expressions like A=B+C+D if they simply aren't present at compile time. And I guess even less-fancy C compilers will still be able to optimize simple ufunc loops pretty well. IIUC the important thing for numpy speed is the code that works out at runtime whether this particular array would benefit from a column-based or row-based strategy, chooses the right buffer sizes, etc., which isn't really something a compiler can help with. -- Nathaniel From mwwiebe at gmail.com Sun Feb 19 14:13:20 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 13:13:20 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 5:25 AM, Nathaniel Smith wrote: > On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau > wrote: > > On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe wrote: > >> Is there a specific > >> target platform/compiler combination you're thinking of where we can do > >> tests on this? I don't believe the compile times are as bad as many > people > >> suspect, can you give some simple examples of things we might do in > NumPy > >> you expect to compile slower in C++ vs C? > > > > Switching from gcc to g++ on the same codebase should not change much > > compilation times. We should test, but that's not what worries me. > > What worries me is when we start using C++ specific code, STL and co. > > Today, scipy.sparse.sparsetools takes half of the build time of the > > whole scipy, and it does not even use fancy features. It also takes Gb > > of ram when building in parallel. > > I like C++ but it definitely does have issues with compilation times. > > IIRC the main problem is very simple: STL and friends (e.g. Boost) are > huge libraries, and because they use templates, the entire source code > is in the header files. That means that as soon as you #include a few > standard C++ headers, your innocent little source file has suddenly > become hundreds of thousands of lines long, and it just takes the > compiler a while to churn through megabytes of source code, no matter > what it is. (Effectively you recompile some significant fraction of > STL from scratch on every file, and then throw it away.) 
> > Precompiled headers can help some, but require complex and highly > non-portable build-system support. (E.g., gcc's precompiled header > constraints are here: > http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one > per source file, etc.) > This doesn't look too bad, I think it would be worth setting these up in NumPy. The complexity you see is because its pretty close to the only way that precompiled headers could be set up. > To demonstrate: a trivial hello-world in C using , versus a > trivial version in C++ using . > > On my laptop (gcc 4.5.2), compiling each program 100 times in a loop > requires: > C: 2.28 CPU seconds > C compiled with C++ compiler: 4.61 CPU seconds > C++: 17.66 CPU seconds > Slowdown for using g++ instead of gcc: 2.0x > Slowdown for using C++ standard library: 3.8x > Total C++ penalty: 7.8x > > Lines of code compiled in each case: > $ gcc -E hello.c | wc > 855 2039 16934 > $ g++ -E hello.cc | wc > 18569 40994 437954 > (I.e., the C++ hello world is almost half a megabyte.) > > Of course we won't be using , but , > etc. all have the same basic character. > Thanks for doing the benchmark. It is a bit artificial, however, and when I tried these trivial examples with -O0 and -O2, the difference (in gcc 4.7) of the C++ compile time was about 4%. In NumPy presently as it is in C, the difference between -O0 and -O2 is very significant, and any comparisons need to take this kind of thing into account. When I said I thought the compile-time differences would be smaller than many people expect, I was thinking about how this optimization phase, which is shared between C and C++, often dominating the compile times. Cheers, Mark > -- Nathaniel > > (Test files attached, times were from: > time sh -c 'for i in $(seq 100); do gcc hello.c -o hello-c; done' > cp hello.c c-hello.cc > time sh -c 'for i in $(seq 100); do g++ c-hello.cc -o c-hello-cc; done' > time sh -c 'for i in $(seq 100); do g++ hello.cc -o hello-cc; done' > and then summing the resulting user and system times.) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sun Feb 19 14:23:14 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 19 Feb 2012 13:23:14 -0600 Subject: [Numpy-discussion] NumPy in PyPy ? In-Reply-To: <4F411902.8090407@gmail.com> References: <4F411902.8090407@gmail.com> Message-ID: I have written up a summary of my views here: http://technicaldiscovery.blogspot.com/2011/10/thoughts-on-porting-numpy-to-pypy.html -Travis On Feb 19, 2012, at 9:45 AM, xavier.gnata at gmail.com wrote: > Hi, > > I'm trying to understand what's going on with : > http://morepypy.blogspot.com/2012/01/numpypy-status-update.html > > What's your opinion on such a numpy rewrite?? > > Thanks, > Xavier > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Sun Feb 19 15:32:51 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 19 Feb 2012 12:32:51 -0800 Subject: [Numpy-discussion] Scipy Cython refactor In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: Hi, On Sun, Feb 19, 2012 at 7:35 AM, Pauli Virtanen wrote: > 19.02.2012 05:38, Travis Oliphant kirjoitti: > [clip] >>>> Sure. ?This list actually deserves a long writeup about that. >>>> First, there wasn't a "Cython-refactor" of NumPy. ? There was a >>>> Cython-refactor of SciPy. ? I'm not sure of it's current status. >>>> I'm still very supportive of that sort of thing. >>> >>> I think I missed that - is it on git somewhere? >> >> I thought so, but I can't find it either. ?We should ask Jason >> McCampbell of Enthought where the code is located. ? Here are the >> distributed eggs: ? http://www.enthought.com/repo/.iron/ > > They're here: > > ? ?https://github.com/dagss/private-scipy-refactor > ? ?https://github.com/jasonmccampbell/scipy-refactor > > The main problem with merging this was the experimental status of FWrap, > and the fact that the wrappers it generates are big compared to f2py and > required manual editing of the generated code. So, there were > maintainability concerns with the Fortran pieces. > > These could probably be solved, however, and I wouldn't be opposed to > e.g. cleaning up the generated code and using manually crafted Cython. > Cherry picking the Cython replacements for all the modules wrapped in C > probably should be done in any case. > > The parts of Scipy affected by the refactoring have not changed > significantly, so there are no significant problems in re-raising the > issue of merging the work back. Thanks for making a new thread. Who knows this work best? Who do you think should join the discussion to plan the work? I might have some time for this - maybe a sprint would be in order, Best, Matthew From mwwiebe at gmail.com Sun Feb 19 15:36:02 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 14:36:02 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 4:03 AM, David Cournapeau wrote: > On Sun, Feb 19, 2012 at 9:28 AM, Mark Wiebe wrote: > > > Is there anyone who uses a blue gene or small device which needs > up-to-date > > numpy support, that I could talk to directly? We really need a list of > > supported platforms on the numpy wiki we can refer to when discussing > this > > stuff, it all seems very nebulous to me. > > They may not need an up to date numpy version now, but if stopping > support for them is a requirement for C++, it must be kept in mind. I > actually suspect Travis to have more details on the big iron side of > things. On the small side of things: > http://projects.scipy.org/numpy/ticket/1969 > > This may seem like not very useful - but that's part of what a open > source project is all about in my mind. > > > > > Particular styles of using templates can cause this, yes. 
To properly do > > this kind of advanced C++ library work, it's important to think about the > > big-O notation behavior of your template instantiations, not just the > big-O > > notation of run-time. C++ templates have a turing-complete language > (which > > is said to be quite similar to haskell, but spelled vastly different) > > running at compile time in them. This is what gives template > > meta-programming in C++ great power, but since templates weren't designed > > for this style of programming originally, template meta-programming is > not > > very easy. > > scipy.sparse.sparsetools is quite straightforward in its usage of > templates (would be great if you could suggest improvement BTW, e.g. > scipy/sparse/sparsetools/csr.h), and does not by itself use any > meta-template programming. > I took a look, and I think the reason this is so slow to compile and uses so much memory is visible as follows: [sparsetools]$ wc *.cxx | sort -n 4039 13276 116263 csgraph_wrap.cxx 6464 21385 189537 dia_wrap.cxx 14002 45406 412262 coo_wrap.cxx 32385 102534 963688 csc_wrap.cxx 42997 140896 1313797 bsr_wrap.cxx 50041 161127 1501400 csr_wrap.cxx 149928 484624 4496947 total That's almost 4.5MB of code, in 6 files. C/C++ compilers are not optimized to compile this sort of thing fast, they are focused on more "human-style" coding with smaller individual files. Looking at some of these SWIG-generated files, the way they dispatch based on the input Python types is bloated as well. Probably the main question I would ask is, does scipy really need sparse matrix variants for all of int8, uint8, int16, uint16, etc? Trimming away some of these might be reasonable, and would be a start to improve compile times. The reason for the slowness is not C++ templates in this example. Cheers, Mark > I like that numpy can be built in a few seconds (at least without > optimization), and consider this to be a useful feature. > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From xavier.gnata at gmail.com Sun Feb 19 16:10:22 2012 From: xavier.gnata at gmail.com (xavier.gnata at gmail.com) Date: Sun, 19 Feb 2012 22:10:22 +0100 Subject: [Numpy-discussion] NumPy in PyPy ? In-Reply-To: References: <4F411902.8090407@gmail.com> Message-ID: <4F41653E.6010506@gmail.com> I'm trying to promote the usage of python and scientific python modules at work. I fully agree with the fact that numpy is only the entrance point to scientific python. Without at least scipy and matplotlib, it is hopeless to forget about matlab. Speed : In my usecases, numpy is decently fast. However, I'm using numexpr and would like get to same about of vanilla numpy (and then with an option to use the GPU...) Backward compatibility is key : We want to reuse our code during let say at least 5 to 10 years. Matlab is ok in terms of backward compatibility. numpy should be (and not only numpy...scipy...matplotlib and so on). It is ok to run an *automated* tool to correct the code if some breakage occurs...but it shall be reliable and automated. go,scientific python, go! 
:) Cheers, Xavier > I have written up a summary of my views here: > > http://technicaldiscovery.blogspot.com/2011/10/thoughts-on-porting-numpy-to-pypy.html > > > -Travis > > On Feb 19, 2012, at 9:45 AM, xavier.gnata at gmail.com > wrote: > >> Hi, >> >> I'm trying to understand what's going on with : >> http://morepypy.blogspot.com/2012/01/numpypy-status-update.html >> >> What's your opinion on such a numpy rewrite?? >> >> Thanks, >> Xavier >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mwwiebe at gmail.com Sun Feb 19 16:16:34 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 19 Feb 2012 13:16:34 -0800 Subject: [Numpy-discussion] Scipy Cython refactor In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: On Sun, Feb 19, 2012 at 7:35 AM, Pauli Virtanen wrote: > 19.02.2012 05:38, Travis Oliphant kirjoitti: > [clip] > >>> Sure. This list actually deserves a long writeup about that. > >>> First, there wasn't a "Cython-refactor" of NumPy. There was a > >>> Cython-refactor of SciPy. I'm not sure of it's current status. > >>> I'm still very supportive of that sort of thing. > >> > >> I think I missed that - is it on git somewhere? > > > > I thought so, but I can't find it either. We should ask Jason > > McCampbell of Enthought where the code is located. Here are the > > distributed eggs: http://www.enthought.com/repo/.iron/ > > They're here: > > https://github.com/dagss/private-scipy-refactor > https://github.com/jasonmccampbell/scipy-refactor > > The main problem with merging this was the experimental status of FWrap, > and the fact that the wrappers it generates are big compared to f2py and > required manual editing of the generated code. So, there were > maintainability concerns with the Fortran pieces. > > These could probably be solved, however, and I wouldn't be opposed to > e.g. cleaning up the generated code and using manually crafted Cython. > Cherry picking the Cython replacements for all the modules wrapped in C > probably should be done in any case. > > The parts of Scipy affected by the refactoring have not changed > significantly, so there are no significant problems in re-raising the > issue of merging the work back. > >From the numpy roadmap discussion, the sparsetools code might be a good candidate for Cythonization. The 4.5MB of code SWIG is generating is mostly parameter checking boilerplate, and if Cython lives up to its reputation, it will be able to easily make this smaller and compile a lot faster. It looks like neither of those two branches switched this code to Cython, unfortunately. -Mark > Pauli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sun Feb 19 18:39:45 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Feb 2012 23:39:45 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 7:13 PM, Mark Wiebe wrote: > On Sun, Feb 19, 2012 at 5:25 AM, Nathaniel Smith wrote: >> Precompiled headers can help some, but require complex and highly >> non-portable build-system support. (E.g., gcc's precompiled header >> constraints are here: >> http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html -- only one >> per source file, etc.) > > This doesn't look too bad, I think it would be worth setting these up in > NumPy. The complexity you see is because its pretty close to the only way > that precompiled headers could be set up. Sure, so long as you know what headers every file needs. (Or more likely, figure out a more-or-less complete set of all the headers might ever need, and then -include that into every file.) >> To demonstrate: a trivial hello-world in C using , versus a >> trivial version in C++ using . >> >> On my laptop (gcc 4.5.2), compiling each program 100 times in a loop >> requires: >> ?C: 2.28 CPU seconds >> ?C compiled with C++ compiler: 4.61 CPU seconds >> ?C++: 17.66 CPU seconds >> Slowdown for using g++ instead of gcc: 2.0x >> Slowdown for using C++ standard library: 3.8x >> Total C++ penalty: 7.8x >> >> Lines of code compiled in each case: >> ?$ gcc -E hello.c | wc >> ? ? ?855 ? ?2039 ? 16934 >> ?$ g++ -E hello.cc | wc >> ? ?18569 ? 40994 ?437954 >> (I.e., the C++ hello world is almost half a megabyte.) >> >> Of course we won't be using , but , >> etc. all have the same basic character. > > > Thanks for doing the benchmark. It is a bit artificial, however, and when I > tried these trivial examples with -O0 and -O2, the difference (in gcc 4.7) > of the C++ compile time was about 4%. In NumPy presently as it is in C, the > difference between -O0 and -O2 is very significant, and any comparisons need > to take this kind of thing into account. When I said I thought the > compile-time differences would be smaller than many people expect, I was > thinking about how this optimization phase, which is shared between C and > C++, often dominating the compile times. Sure -- but the effective increased code-size for STL-using C++ affects the optimizer too; it's effectively re-optimizing all the used parts of STL again for each source file. (Presumably in this benchmark that half megabyte of extra code is mostly unused, and therefore getting thrown out before the optimizer does any work on it -- but that doesn't happen if you're actually using the library!) Maybe things have gotten better in the last year or two, I dunno; if you run a better benchmark I'll listen. But there's an order-of-magnitude difference in compile times between most real-world C projects and most real-world C++ projects. It might not be a deal-breaker and it might not apply for subset of C++ you're planning to use, but AFAICT that's the facts. 
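For anyone who wants to reproduce numbers of this kind, a rough harness along the following lines is enough. This is only a sketch, not the exact script used for the figures above: it assumes hello.c and hello.cc sit in the current directory, and it measures wall-clock rather than CPU time, but it is enough to compare the two:

    # Time repeated compiler invocations and count preprocessed lines,
    # as in the "gcc -E hello.c | wc" comparison above.
    import subprocess
    import time

    def time_compiles(compiler, source, n=100):
        start = time.time()
        for _ in range(n):
            subprocess.check_call([compiler, "-c", source, "-o", "/dev/null"])
        return time.time() - start

    print("gcc  hello.c : %.2f s" % time_compiles("gcc", "hello.c"))
    print("g++  hello.c : %.2f s" % time_compiles("g++", "hello.c"))
    print("g++  hello.cc: %.2f s" % time_compiles("g++", "hello.cc"))

    for compiler, source in [("gcc", "hello.c"), ("g++", "hello.cc")]:
        out = subprocess.check_output([compiler, "-E", source])
        print("%s: %d preprocessed lines" % (source, len(out.splitlines())))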
-- Nathaniel From njs at pobox.com Sun Feb 19 18:42:54 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Feb 2012 23:42:54 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 1:42 PM, Neal Becker wrote: > On Fedora linux I use ccache, which is completely transparant and makes a huge > difference in build times. ccache is fabulous (and it's fabulous for C too), but it only helps when 'make' has screwed up and decided to rebuild some file that didn't really need rebuilding, or when doing a clean build (which is more or less the same thing, if you think about it). -- Nathaniel From charlesr.harris at gmail.com Sun Feb 19 19:12:06 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 19 Feb 2012 17:12:06 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: On Sun, Feb 19, 2012 at 4:42 PM, Nathaniel Smith wrote: > On Sun, Feb 19, 2012 at 1:42 PM, Neal Becker wrote: > > On Fedora linux I use ccache, which is completely transparant and makes > a huge > > difference in build times. > > ccache is fabulous (and it's fabulous for C too), but it only helps > when 'make' has screwed up and decided to rebuild some file that > didn't really need rebuilding, or when doing a clean build (which is > more or less the same thing, if you think about it). > > For Numpy, there are also other things going on. My clean builds finish in about 30 seconds using one cpu, not so clean builds take longer. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Sun Feb 19 19:14:12 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 01:14:12 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> Message-ID: <4F419054.8040203@molden.no> Den 20.02.2012 00:39, skrev Nathaniel Smith: > But there's an order-of-magnitude difference in compile times between > most real-world C projects and most real-world C++ projects. It might > not be a deal-breaker and it might not apply for subset of C++ you're > planning to use, but AFAICT that's the facts. This is mainly a complaint about the build-process. Maybe make or distutis are broken, I don't know. But with a sane build tool (e.g. MS Visual Studio or Eclipse) this is not a problem. You just recompile the file you are working with, not the rest (unless you do a clean build). Sturla From stefan at sun.ac.za Sun Feb 19 23:20:25 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 19 Feb 2012 20:20:25 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F419054.8040203@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: On Feb 19, 2012 4:14 PM, "Sturla Molden" wrote: > > Den 20.02.2012 00:39, skrev Nathaniel Smith: > > But there's an order-of-magnitude difference in compile times between > > most real-world C projects and most real-world C++ projects. 
It might > > not be a deal-breaker and it might not apply for subset of C++ you're > > planning to use, but AFAICT that's the facts. > > This is mainly a complaint about the build-process. This has nothing to do with the build process. More complex languages take longer to compile. The benchmark shown is also entirely independent of build system. St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Mon Feb 20 02:35:46 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 20 Feb 2012 08:35:46 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: In the language wars, I have one question. Why is Fortran not being considered? Fortran already implements many of the features that we want in NumPy: - slicing and similar operations, at least some of the fancy indexing kind - element-wise array operations and function calls - array bounds-checking and other debugging aid (with debugging flags) - arrays that mentally map very well onto numpy arrays. To me, this spells +1 to ease of contribution, over some abstract C/C++ template - in newer standards it has some nontrivial mathematical functions: gamma, bessel, etc. that numpy lacks right now - compilers that are good at optimizing for floating-point performance, because that's what Fortran is all about - not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran - possibly other numerical libraries that can be helpful - Fortran has, in its newer standards, thought of C interoperability. We could still keep bits of the code in C (or even C++?) if we'd like to, or perhaps f2py/Cython could do the wrapping. - some programmers know Fortran better than C++. Fortran is at least used by many science guys, like me. Until someone comes along with actual numbers or at least anecdotal evidence, I don't think the "more programmers know X than Y" argument is too interesting. Personally I've learned both, and Fortran is much more accessible than C++ (to me) if you're used to the "work with (numpy) arrays" mentality. As far as I can understand, implementing element-wise operations, slicing, and a host of other NumPy features is in some sense pointless - the Fortran compiler authors have already done it for us. Of course some nice wrapping will be needed in C, Cython, f2py, or similar. Since my understanding is limited, I'd be interested in being proved wrong, though :) Paul From scipy at samueljohn.de Mon Feb 20 04:34:43 2012 From: scipy at samueljohn.de (Samuel John) Date: Mon, 20 Feb 2012 10:34:43 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E8840.8070405@continuum.io> Message-ID: On 17.02.2012, at 21:46, Ralf Gommers wrote: > [...] > So far no one has managed to build the numpy/scipy combo with the LLVM-based compilers, so if you were willing to have a go at fixing that it would be hugely appreciated. See http://projects.scipy.org/scipy/ticket/1500 for details. > > Once that's fixed, numpy can switch to using it for releases. Well, I had great success with using clang and clang++ (which uses llvm) to compile both numpy and scipy on OS X 10.7.3. 
Samuel From pav at iki.fi Mon Feb 20 04:54:19 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 20 Feb 2012 10:54:19 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: 20.02.2012 08:35, Paul Anton Letnes kirjoitti: > In the language wars, I have one question. > Why is Fortran not being considered? Fortran is OK for simple numerical algorithms, but starts to suck heavily if you need to do any string handling, I/O, complicated logic, or data structures. Most of the work in Numpy implementation is not actually in numerics, but in figuring out the correct operation to dispatch the computations to. So, this is one reason why Fortran is not considered. -- Pauli Virtanen From stefan at sun.ac.za Mon Feb 20 05:24:23 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 20 Feb 2012 02:24:23 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: On Mon, Feb 20, 2012 at 1:54 AM, Pauli Virtanen wrote: > 20.02.2012 08:35, Paul Anton Letnes kirjoitti: >> In the language wars, I have one question. >> Why is Fortran not being considered? > > Fortran is OK for simple numerical algorithms, but starts to suck > heavily if you need to do any string handling, I/O, complicated logic, > or data structures. Out of curiosity, is this still true for the latest Fortran versions? I guess there the problem may be compiler support over various platforms. St?fan From charlesr.harris at gmail.com Mon Feb 20 06:43:40 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 20 Feb 2012 04:43:40 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: On Mon, Feb 20, 2012 at 2:54 AM, Pauli Virtanen wrote: > 20.02.2012 08:35, Paul Anton Letnes kirjoitti: > > In the language wars, I have one question. > > Why is Fortran not being considered? > > Fortran is OK for simple numerical algorithms, but starts to suck > heavily if you need to do any string handling, I/O, complicated logic, > or data structures. > > Most of the work in Numpy implementation is not actually in numerics, > but in figuring out the correct operation to dispatch the computations > to. So, this is one reason why Fortran is not considered. > > There also used to be a problem with unsigned types not being available. I don't know if that is still the case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Mon Feb 20 10:09:46 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 16:09:46 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: <4F42623A.1070406@molden.no> Den 20.02.2012 12:43, skrev Charles R Harris: > > > There also used to be a problem with unsigned types not being > available. I don't know if that is still the case. > Fortran -- like Python and Java -- does not have built-in unsigned integer types. It is never really a problem though. One can e.g. use a longer integer or keep them in an array of bytes. (Fortran 2003 is OOP so it is possible to define one if needed. Not saying it is a good idea.) 
Sturla From sturla at molden.no Mon Feb 20 10:15:59 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 16:15:59 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: <4F4263AF.9020502@molden.no> Den 20.02.2012 10:54, skrev Pauli Virtanen: > Fortran is OK for simple numerical algorithms, but starts to suck > heavily if you need to do any string handling, I/O, complicated logic, > or data structures For string handling, C is actually worse than Fortran. In Fortran a string can be sliced like in Python. It is not as nice as Python, but far better than C. Fortran's built-in I/O syntax is archaic, but the ISO C bindings in Fortran 2003 means one can use other means of I/O (posix, win api, C stdio) in a portable way. Sturla From sturla at molden.no Mon Feb 20 10:29:46 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 16:29:46 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: <4F4266EA.702@molden.no> Den 20.02.2012 08:35, skrev Paul Anton Letnes: > In the language wars, I have one question. Why is Fortran not being considered? Fortran already implements many of the features that we want in NumPy: Yes ... but it does not make Fortran a systems programming language. Making NumPy is different from using it. > - slicing and similar operations, at least some of the fancy indexing kind > - element-wise array operations and function calls > - array bounds-checking and other debugging aid (with debugging flags) That is nice for numerical computing, but not really needed to make NumPy. > - arrays that mentally map very well onto numpy arrays. To me, this spells +1 to ease of contribution, over some abstract C/C++ template Mentally perhaps, but not binary. NumPy needs uniformly strided memory on the binary level. Fortran just gives this at the mental level. E.g. there is nothing that dictates a Fortran pointer has to be a view, the compiler is free to employ copy-in copy-out. In Fortran, a function call can invalidate a pointer. One would therefore have to store the array in an array of integer*1, and use the intrinsic function transfer() to parse the contents into NumPy dtypes. > - in newer standards it has some nontrivial mathematical functions: gamma, bessel, etc. that numpy lacks right now That belongs to SciPy. > - compilers that are good at optimizing for floating-point performance, because that's what Fortran is all about Insanely good, but not when we start to do the (binary, not mentally) strided access that NumPy needs. (Not that C compilers would be any better.) > - not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran > - possibly other numerical libraries that can be helpful > - Fortran has, in its newer standards, thought of C interoperability. We could still keep bits of the code in C (or even C++?) if we'd like to, or perhaps f2py/Cython could do the wrapping. Not f2py, as it depends on NumPy. - some programmers know Fortran better than C++. Fortran is at least used by many science guys, like me. That is a valid arguments. Fortran is also much easier to read and debug. 
Sturla From sturla at molden.no Mon Feb 20 11:35:26 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 17:35:26 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> Message-ID: <4F42764E.1040504@molden.no> Den 20.02.2012 08:35, skrev Paul Anton Letnes: > As far as I can understand, implementing element-wise operations, slicing, and a host of other NumPy features is in some sense pointless - the Fortran compiler authors have already done it for us. Only if you know the array dimensions in advance. Sturla From sturla at molden.no Mon Feb 20 11:42:31 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 17:42:31 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: <4F4277F7.4000401@molden.no> Den 19.02.2012 00:09, skrev David Cournapeau: > There are better languages than C++ that has most of the technical > benefits stated in this discussion (rust and D being the most > "obvious" ones), but whose usage is unrealistic today for various > reasons: knowledge, availability on "esoteric" platforms, etc? A new > language is completely ridiculous. There are still other options than C or C++ that are worth considering. One would be to write NumPy in Python. E.g. we could use LLVM as a JIT-compiler and produce the performance critical code we need on the fly. Sturla From sturla at molden.no Mon Feb 20 11:55:09 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 17:55:09 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F4277F7.4000401@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> Message-ID: <4F427AED.3030903@molden.no> Den 20.02.2012 17:42, skrev Sturla Molden: > There are still other options than C or C++ that are worth considering. > One would be to write NumPy in Python. E.g. we could use LLVM as a > JIT-compiler and produce the performance critical code we need on the fly. > > LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster than GCC and often produces better machine code. They can therefore be used inside an array library. It would give a faster NumPy, and we could keep most of it in Python. Sturla From charlesr.harris at gmail.com Mon Feb 20 12:14:31 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 20 Feb 2012 10:14:31 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F427AED.3030903@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> Message-ID: On Mon, Feb 20, 2012 at 9:55 AM, Sturla Molden wrote: > Den 20.02.2012 17:42, skrev Sturla Molden: > > There are still other options than C or C++ that are worth considering. > > One would be to write NumPy in Python. E.g. we could use LLVM as a > > JIT-compiler and produce the performance critical code we need on the > fly. > > > > > > LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster > than GCC and often produces better machine code. They can therefore be > used inside an array library. 
It would give a faster NumPy, and we could > keep most of it in Python. > > Would that work for Ruby also? One of the advantages of C++ is that the code doesn't need to be refactored to start with, just modified step by step going into the future. I think PyPy is close to what you are talking about. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Mon Feb 20 12:18:59 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 20 Feb 2012 09:18:59 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F427AED.3030903@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> Message-ID: <4F428083.50809@astro.uio.no> On 02/20/2012 08:55 AM, Sturla Molden wrote: > Den 20.02.2012 17:42, skrev Sturla Molden: >> There are still other options than C or C++ that are worth considering. >> One would be to write NumPy in Python. E.g. we could use LLVM as a >> JIT-compiler and produce the performance critical code we need on the fly. >> >> > > LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster > than GCC and often produces better machine code. They can therefore be > used inside an array library. It would give a faster NumPy, and we could > keep most of it in Python. I think it is moot to focus on improving NumPy performance as long as in practice all NumPy operations are memory bound due to the need to take a trip through system memory for almost any operation. C/C++ is simply "good enough". JIT is when you're chasing a 2x improvement or so, but today NumPy can be 10-20x slower than a Cython loop. You need at least a slightly different Python API to get anywhere, so numexpr/Theano is the right place to work on an implementation of this idea. Of course it would be nice if numexpr/Theano offered something as convenient as with lazy: arr = A + B + C # with all of these NumPy arrays # compute upon exiting... Dag From matthieu.brucher at gmail.com Mon Feb 20 12:19:52 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 20 Feb 2012 18:19:52 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F4043D9.5080400@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <4F4043D9.5080400@molden.no> Message-ID: > C++11 has this option: > > for (auto& item : container) { > // iterate over the container object, > // get a reference to each item > // > // "container" can be an STL class or > // A C-style array with known size. > } > > Which does this: > > for item in container: > pass > It is even better than using the macro way because the compiler knows everything is constant (start and end), so it can do better things. > > Using C++ templates to generate ufunc loops is an obvious application, > > but again, in the simple examples > > Template metaprogramming? > > Don't even think about it. It is brain dead to try to outsmart the > compiler. > It is really easy to outsmart the compiler. Really. I use metaprogramming for loop creation to optimize cache behavior, communication in parallel environments, and there is no way the compiler would have done things as efficiently (and there is a lot of leeway to enhance my code). -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Mon Feb 20 12:22:20 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 20 Feb 2012 18:22:20 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: > Would it be fair to say then, that you are expecting the discussion > about C++ will mainly arise after the Mark has written the code? I > can see that it will be easier to specific at that point, but there > must be a serious risk that it will be too late to seriously consider > an alternative approach. > > > We will need to see examples of what Mark is talking about and clarify > some of the compiler issues. Certainly there is some risk that once code > is written that it will be tempting to just use it. Other approaches are > certainly worth exploring in the mean-time, but C++ has some strong > arguments for it. > Compilers for C++98 are now stable enough (except on Bluegene, see the Boost distribution with xlc++) C++ helps a lot to enhance robustness.ts? > > From my perspective having a standalone core NumPy is still a goal. The > primary advantages of having a NumPy library (call it NumLib for the sake > of argument) are > > 1) Ability for projects like PyPy, IronPython, and Jython to use it more > easily > 2) Ability for Ruby, Perl, Node.JS, and other new languages to use the > code for their technical computing projects. > 3) increasing the number of users who can help make it more solid > 4) being able to build the user-base (and corresponding performance with > eye-balls from Intel, NVidia, AMD, Microsoft, Google, etc. looking at the > code). > > The disadvantages I can think of: > 1) More users also means we might risk "lowest-commond-denominator" > problems --- i.e. trying to be too much to too many may make it not useful > for anyone. Also, more users means more people with opinions that might be > difficult to re-concile. > 2) The work of doing the re-write is not small: probably at least 6 > person-months > 3) Not being able to rely on Python objects (dictionaries, lists, and > tuples are currently used in the code-base quite a bit --- though the > re-factor did show some examples of how to remove this usage). > 4) Handling of "Object" arrays requires some re-design. > > I'm sure there are other factors that could be added to both lists. > > -Travis > > > > Thanks a lot for the reply, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthieu.brucher at gmail.com Mon Feb 20 12:24:44 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 20 Feb 2012 18:24:44 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: 2012/2/19 Matthew Brett > Hi, > > On Sat, Feb 18, 2012 at 8:38 PM, Travis Oliphant > wrote: > > > We will need to see examples of what Mark is talking about and clarify > some > > of the compiler issues. Certainly there is some risk that once code is > > written that it will be tempting to just use it. Other approaches are > > certainly worth exploring in the mean-time, but C++ has some strong > > arguments for it. > > The worry as I understand it is that a C++ rewrite might make the > numpy core effectively a read-only project for anyone but Mark. Do > you have any feeling for whether that is likely? > Some of us are C developers, other are C++. It will depend on the background of each of us. > How would numpylib compare to libraries like eigen? How likely do you > think it would be that unrelated projects would use numpylib rather > than eigen or other numerical libraries? Do you think the choice of > C++ rather than C will influence whether other projects will take it > up? > I guess that the C++ port may open a door to change the back-end, and perhaps use Eigen, or ArBB. As those guys (ArBB) wanted to provided a Python interface compatible with Numpy to their VM, it may be interesting to be able to change back-ends (although it is limited to one platform and 2 OS). -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Mon Feb 20 12:26:06 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 18:26:06 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> Message-ID: <4F42822E.5040600@molden.no> Den 20.02.2012 18:14, skrev Charles R Harris: > > Would that work for Ruby also? One of the advantages of C++ is that > the code doesn't need to be refactored to start with, just modified > step by step going into the future. I think PyPy is close to what you > are talking about. > If we plant to support more languages than Python, it might be better to use C++ (sorry). But it does not mean that LLVM cannot be used. Either one can generate C or C++, or just use the assembly language (which is very simple and readable too: http://llvm.org/docs/LangRef.html). We have exact knowledge about an ndarray at runtime: - dtype - dimensions - strides - whether the array is contiguous or not This can be JIT-compiled into specialized looping code by LLVM. These kernels can then be stored in a database and resued. If it matters, LLVM is embeddable in C++. 
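To make the "kernel database" idea concrete, here is a toy sketch in plain Python. No LLVM is involved: the generated source below merely stands in for the machine code LLVM would emit, and the function and cache names are made up for illustration. What it shows is the cache key (expression, dtype, contiguity) and the generate-once-reuse-later structure, not performance:

    import numpy as np

    _kernel_cache = {}   # (expr, dtype, contiguous) -> compiled loop

    def get_kernel(expr, dtype, contiguous):
        # Return a loop specialized for this array signature, reusing a
        # previously generated kernel when one exists.
        key = (expr, np.dtype(dtype).str, contiguous)
        if key not in _kernel_cache:
            # A real implementation would emit LLVM IR here; we just
            # generate and compile Python source as a stand-in.
            src = ("def kernel(out, a, b):\n"
                   "    for i in range(out.shape[0]):\n"
                   "        out[i] = %s\n" % expr)
            ns = {}
            exec(compile(src, "<generated>", "exec"), ns)
            _kernel_cache[key] = ns["kernel"]
        return _kernel_cache[key]

    a = np.arange(5.0)
    b = np.ones(5)
    out = np.empty(5)
    k = get_kernel("a[i] + 2.0 * b[i]", a.dtype, a.flags["C_CONTIGUOUS"])
    k(out, a, b)      # out is now [ 2.  3.  4.  5.  6.]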
Sturla From matthieu.brucher at gmail.com Mon Feb 20 12:28:19 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 20 Feb 2012 18:28:19 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> Message-ID: 2012/2/19 Nathaniel Smith > On Sun, Feb 19, 2012 at 9:16 AM, David Cournapeau > wrote: > > On Sun, Feb 19, 2012 at 8:08 AM, Mark Wiebe wrote: > >> Is there a specific > >> target platform/compiler combination you're thinking of where we can do > >> tests on this? I don't believe the compile times are as bad as many > people > >> suspect, can you give some simple examples of things we might do in > NumPy > >> you expect to compile slower in C++ vs C? > > > > Switching from gcc to g++ on the same codebase should not change much > > compilation times. We should test, but that's not what worries me. > > What worries me is when we start using C++ specific code, STL and co. > > Today, scipy.sparse.sparsetools takes half of the build time of the > > whole scipy, and it does not even use fancy features. It also takes Gb > > of ram when building in parallel. > > I like C++ but it definitely does have issues with compilation times. > > IIRC the main problem is very simple: STL and friends (e.g. Boost) are > huge libraries, and because they use templates, the entire source code > is in the header files. That means that as soon as you #include a few > standard C++ headers, your innocent little source file has suddenly > become hundreds of thousands of lines long, and it just takes the > compiler a while to churn through megabytes of source code, no matter > what it is. (Effectively you recompile some significant fraction of > STL from scratch on every file, and then throw it away.) > In fact Boost tries to be clean about this. Up to a few minor releases of GCC, their headers were a mess. When you included something, a lot of additional code was brought, and the compile-time exploded. But this is no longer the case. If we restrict the core to a few includes, even with templates, it should not be long to compile. -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Mon Feb 20 12:28:59 2012 From: francesc at continuum.io (Francesc Alted) Date: Mon, 20 Feb 2012 18:28:59 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F428083.50809@astro.uio.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> Message-ID: On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote: > You need at least a slightly different Python API to get anywhere, so > numexpr/Theano is the right place to work on an implementation of this > idea. Of course it would be nice if numexpr/Theano offered something as > convenient as > > with lazy: > arr = A + B + C # with all of these NumPy arrays > # compute upon exiting? Hmm, that would be cute indeed. Do you have an idea on how the code in the with context could be passed to the Python AST compiler (? la numexpr.evaluate("A + B + C"))? 
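For reference, the spelling one is stuck with today is the string form, here assuming A, B and C are ordinary NumPy arrays:

    import numpy as np
    import numexpr

    A = np.random.rand(1000)
    B = np.random.rand(1000)
    C = np.random.rand(1000)

    # numexpr parses the string, compiles it to its own bytecode and
    # evaluates it blockwise, without full-size temporaries.
    arr = numexpr.evaluate("A + B + C")

The open question is exactly how an ordinary with block could be reduced to such a string, or to an equivalent expression graph, without the user spelling it out.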
-- Francesc Alted From matthieu.brucher at gmail.com Mon Feb 20 12:30:04 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 20 Feb 2012 18:30:04 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F4119B4.7060209@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F4119B4.7060209@molden.no> Message-ID: 2012/2/19 Sturla Molden > Den 19.02.2012 10:28, skrev Mark Wiebe: > > > > Particular styles of using templates can cause this, yes. To properly > > do this kind of advanced C++ library work, it's important to think > > about the big-O notation behavior of your template instantiations, not > > just the big-O notation of run-time. C++ templates have a > > turing-complete language (which is said to be quite similar to > > haskell, but spelled vastly different) running at compile time in > > them. This is what gives template meta-programming in C++ great power, > > but since templates weren't designed for this style of programming > > originally, template meta-programming is not very easy. > > > > > > The problem with metaprogramming is that we are doing manually the work > that belongs to the compiler. Blitz++ was supposed to be a library that > "thought like a compiler". But then compilers just got better. Today, it > is no longer possible for a numerical library programmer to outsmart an > optimizing C++ compiler. All metaprogramming can do today is produce > error messages noone can understand. And the resulting code will often > be slower because the compiler has less opportunities to do its work. > As I've said, the compiler is pretty much stupid. It cannot do what Blitzz++ did, or what Eigen is currently doing, mainly because of the basis different languages (C or C++). -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Mon Feb 20 12:34:38 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Mon, 20 Feb 2012 09:34:38 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F428083.50809@astro.uio.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> Message-ID: On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn wrote: > On 02/20/2012 08:55 AM, Sturla Molden wrote: >> Den 20.02.2012 17:42, skrev Sturla Molden: >>> There are still other options than C or C++ that are worth considering. >>> One would be to write NumPy in Python. E.g. we could use LLVM as a >>> JIT-compiler and produce the performance critical code we need on the fly. >>> >>> >> >> LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster >> than GCC and often produces better machine code. They can therefore be >> used inside an array library. It would give a faster NumPy, and we could >> keep most of it in Python. > > I think it is moot to focus on improving NumPy performance as long as in > practice all NumPy operations are memory bound due to the need to take a > trip through system memory for almost any operation. C/C++ is simply > "good enough". JIT is when you're chasing a 2x improvement or so, but > today NumPy can be 10-20x slower than a Cython loop. > I don't follow this. Could you expand a bit more? 
(Specifically, I wasn't aware that numpy could be 10-20x slower than a cython loop, if we're talking about the base numpy library--so core operations. I'm also not totally sure why a JIT is a 2x improvement or so vs. cython. Not that a disagree on either of these points, I'd just like a bit more detail.) Thanks, Chris > You need at least a slightly different Python API to get anywhere, so > numexpr/Theano is the right place to work on an implementation of this > idea. Of course it would be nice if numexpr/Theano offered something as > convenient as > > with lazy: > ? ? arr = A + B + C # with all of these NumPy arrays > # compute upon exiting... > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla at molden.no Mon Feb 20 12:44:50 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 18:44:50 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F428083.50809@astro.uio.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> Message-ID: <4F428692.6040800@molden.no> Den 20.02.2012 18:18, skrev Dag Sverre Seljebotn: > > I think it is moot to focus on improving NumPy performance as long as in > practice all NumPy operations are memory bound due to the need to take a > trip through system memory for almost any operation. C/C++ is simply > "good enough". JIT is when you're chasing a 2x improvement or so, but > today NumPy can be 10-20x slower than a Cython loop. > > You need at least a slightly different Python API to get anywhere, so > numexpr/Theano is the right place to work on an implementation of this > idea. Of course it would be nice if numexpr/Theano offered something as > convenient as > > with lazy: > arr = A + B + C # with all of these NumPy arrays > # compute upon exiting... > > Lazy evaluation is nice. But I was thinking more about how to avoid C++ in the NumPy core, so more than 2 or 3 programmers could contribute. I.e. my point was not that loops in LLVM would be much faster than C++ (that is besides the point), but the code could be written in Python instead of C++. But if the idea is to support other languages as well (which I somehow forgot), then this approach certainly becomes less useful. (OTOH, lazy evaluation is certainly easier to achieve with JIT compilation. But that will have to wait until NumPy 5.0 perhaps...) Sturla From d.s.seljebotn at astro.uio.no Mon Feb 20 12:46:49 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 20 Feb 2012 09:46:49 -0800 Subject: [Numpy-discussion] ndarray and lazy evaluation (was: Proposed Rodmap Overview) In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> Message-ID: <4F428709.9020206@astro.uio.no> On 02/20/2012 09:24 AM, Olivier Delalleau wrote: > Hi Dag, > > Would you mind elaborating a bit on that example you mentioned at the > end of your email? I don't quite understand what behavior you would like > to achieve Sure, see below. I think we should continue discussion on numpy-discuss. I wrote: > You need at least a slightly different Python API to get anywhere, so > numexpr/Theano is the right place to work on an implementation of this > idea. 
Of course it would be nice if numexpr/Theano offered something as > convenient as > > with lazy: > arr = A + B + C # with all of these NumPy arrays > # compute upon exiting... More information: The disadvantage today of using Theano (or numexpr) is that they require using a different API, so that one has to learn and use Theano "from the ground up", rather than just slap it on in an optimization phase. The alternative would require extensive changes to NumPy, so I guess Theano authors or Francesc would need to push for this. The alternative would be (with A, B, C ndarray instances): with theano.lazy: arr = A + B + C On __enter__, the context manager would hook into NumPy to override it's arithmetic operators. Then it would build a Theano symbolic tree instead of performing computations right away. In addition to providing support for overriding arithmetic operators, slicing etc., it would be necesarry for "arr" to be an ndarray instance which is "not yet computed" (data-pointer set to NULL, and store a compute-me callback and some context information). Finally, the __exit__ would trigger computation. For other operations which need the data pointer (e.g., single element lookup) one could either raise an exception or trigger computation. This is just a rough sketch. It is not difficult "in principle", but of course there's really a massive amount of work involved to work support for this into the NumPy APIs. Probably, we're talking a NumPy 3.0 thing, after the current round of refactorings have settled... Please: Before discussing this further one should figure out if there's manpower available for it; no sense in hashing out a castle in the sky in details. Also it would be better to talk in person about this if possible (I'm in Berkeley now and will attend PyData and PyCon). Dag From sturla at molden.no Mon Feb 20 13:02:17 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 19:02:17 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> Message-ID: <4F428AA9.7020203@molden.no> Den 20.02.2012 18:34, skrev Christopher Jordan-Squire: > I don't follow this. Could you expand a bit more? (Specifically, I > wasn't aware that numpy could be 10-20x slower than a cython loop, if > we're talking about the base numpy library--so core operations. I'm > also not totally sure why a JIT is a 2x improvement or so vs. cython. > Not that a disagree on either of these points, I'd just like a bit > more detail.) Dag Sverre is right about this. NumPy is memory bound, Cython loops are (usually) CPU bound. If you write: x[:] = a + b + c # numpy arrays then this happens (excluding reference counting): - allocate temporary array - loop over a and b, add to temporary - allocate 2nd temporary array - loop over 1st temporary array and c, add to 2nd - deallocate 1st temporary array - loop over 2nd temporary array, assign to x - deallocate 2nd temporary array Since memory access is slow, memory allocation and deallocation is slow, and computation is fast, this will be perhaps 10 times slower than what we could do with a loop in Cython: for i in range(n): x[i] = a[i] + b[i] + c[i] I.e. we get rid of the temporary arrays and the multiple loops. All the temporaries here are put in registers. It is streaming data into the CPU that is slow, not computing! 
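One way to see that it is the temporaries and the extra memory traffic, not the arithmetic, that hurt: evaluating the same expression in cache-sized blocks from pure Python already removes the full-size temporaries. A minimal sketch, with an arbitrarily chosen block size (only the slices and their small temporaries have to fit in cache):

    import numpy as np

    n = 10**7
    a, b, c = np.random.rand(n), np.random.rand(n), np.random.rand(n)
    x = np.empty(n)

    # One pass per operator, with two temporary arrays of n elements:
    x[:] = a + b + c

    # Same result, but the temporaries never exceed one block:
    block = 1 << 16
    for i in range(0, n, block):
        x[i:i + block] = a[i:i + block] + b[i:i + block] + c[i:i + block]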
It has actually been experimented with streaming data in a compressed form, and decompressing on the fly, as data access still dominates the runtime (even if you do a lot of computing per element). Sturla From francesc at continuum.io Mon Feb 20 13:04:03 2012 From: francesc at continuum.io (Francesc Alted) Date: Mon, 20 Feb 2012 19:04:03 +0100 Subject: [Numpy-discussion] ndarray and lazy evaluation (was: Proposed Rodmap Overview) In-Reply-To: <4F428709.9020206@astro.uio.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> <4F428709.9020206@astro.uio.no> Message-ID: On Feb 20, 2012, at 6:46 PM, Dag Sverre Seljebotn wrote: > On 02/20/2012 09:24 AM, Olivier Delalleau wrote: >> Hi Dag, >> >> Would you mind elaborating a bit on that example you mentioned at the >> end of your email? I don't quite understand what behavior you would like >> to achieve > > Sure, see below. I think we should continue discussion on numpy-discuss. > > I wrote: > >> You need at least a slightly different Python API to get anywhere, so >> numexpr/Theano is the right place to work on an implementation of this >> idea. Of course it would be nice if numexpr/Theano offered something as >> convenient as >> >> with lazy: >> arr = A + B + C # with all of these NumPy arrays >> # compute upon exiting... > > More information: > > The disadvantage today of using Theano (or numexpr) is that they require > using a different API, so that one has to learn and use Theano "from the > ground up", rather than just slap it on in an optimization phase. > > The alternative would require extensive changes to NumPy, so I guess > Theano authors or Francesc would need to push for this. > > The alternative would be (with A, B, C ndarray instances): > > with theano.lazy: > arr = A + B + C > > On __enter__, the context manager would hook into NumPy to override it's > arithmetic operators. Then it would build a Theano symbolic tree instead > of performing computations right away. > > In addition to providing support for overriding arithmetic operators, > slicing etc., it would be necesarry for "arr" to be an ndarray instance > which is "not yet computed" (data-pointer set to NULL, and store a > compute-me callback and some context information). > > Finally, the __exit__ would trigger computation. For other operations > which need the data pointer (e.g., single element lookup) one could > either raise an exception or trigger computation. > > This is just a rough sketch. It is not difficult "in principle", but of > course there's really a massive amount of work involved to work support > for this into the NumPy APIs. > > Probably, we're talking a NumPy 3.0 thing, after the current round of > refactorings have settled... > > Please: Before discussing this further one should figure out if there's > manpower available for it; no sense in hashing out a castle in the sky > in details. I see. Mark Wiebe already suggested the same thing some time ago: https://github.com/numpy/numpy/blob/master/doc/neps/deferred-ufunc-evaluation.rst > Also it would be better to talk in person about this if > possible (I'm in Berkeley now and will attend PyData and PyCon). Nice. Most of Continuum crew (me included) will be attending to both conferences. Mark W. will make PyCon only, but will be a good occasion to discuss this further. 
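For anyone following along, here is a toy sketch of the deferred-evaluation idea, using wrapper objects instead of real ndarrays. Hooking this into ndarray itself, inside a with block, is exactly the hard part described above, so treat the names and structure as illustration only:

    import numpy as np

    class Deferred(object):
        # Toy stand-in for a "not yet computed" array: instead of
        # evaluating, it records the expression tree.
        def __init__(self, op, args):
            self.op = op
            self.args = args

        def __add__(self, other):
            return Deferred(np.add, (self, other))

        def compute(self):
            # A real implementation would hand the whole tree to
            # numexpr/Theano and evaluate it in one fused pass; here we
            # just walk it recursively.
            forced = [a.compute() if isinstance(a, Deferred) else a
                      for a in self.args]
            return self.op(*forced)

    def lazy(arr):
        # Wrap a concrete array as a leaf of the expression tree.
        return Deferred(lambda x: x, (arr,))

    A = np.arange(3.0)
    B = np.ones(3)
    C = np.ones(3) * 10.0

    arr = lazy(A) + lazy(B) + lazy(C)   # nothing is computed yet
    print(arr.compute())                # [ 11.  12.  13.]

The point of being lazy is that the captured tree can then be evaluated in a single blockwise pass over the operands, instead of one full pass (and one full-size temporary) per operator.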
See you, -- Francesc Alted From shish at keba.be Mon Feb 20 13:07:13 2012 From: shish at keba.be (Olivier Delalleau) Date: Mon, 20 Feb 2012 13:07:13 -0500 Subject: [Numpy-discussion] ndarray and lazy evaluation (was: Proposed Rodmap Overview) In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> <4F428709.9020206@astro.uio.no> Message-ID: Never mind. The link Francesc posted answered my question :) -=- Olivier Le 20 f?vrier 2012 12:54, Olivier Delalleau a ?crit : > Le 20 f?vrier 2012 12:46, Dag Sverre Seljebotn > a ?crit : > > On 02/20/2012 09:24 AM, Olivier Delalleau wrote: >> > Hi Dag, >> > >> > Would you mind elaborating a bit on that example you mentioned at the >> > end of your email? I don't quite understand what behavior you would like >> > to achieve >> >> Sure, see below. I think we should continue discussion on numpy-discuss. >> >> I wrote: >> >> > You need at least a slightly different Python API to get anywhere, so >> > numexpr/Theano is the right place to work on an implementation of this >> > idea. Of course it would be nice if numexpr/Theano offered something as >> > convenient as >> > >> > with lazy: >> > arr = A + B + C # with all of these NumPy arrays >> > # compute upon exiting... >> >> More information: >> >> The disadvantage today of using Theano (or numexpr) is that they require >> using a different API, so that one has to learn and use Theano "from the >> ground up", rather than just slap it on in an optimization phase. >> >> The alternative would require extensive changes to NumPy, so I guess >> Theano authors or Francesc would need to push for this. >> >> The alternative would be (with A, B, C ndarray instances): >> >> with theano.lazy: >> arr = A + B + C >> >> On __enter__, the context manager would hook into NumPy to override it's >> arithmetic operators. Then it would build a Theano symbolic tree instead >> of performing computations right away. >> >> In addition to providing support for overriding arithmetic operators, >> slicing etc., it would be necesarry for "arr" to be an ndarray instance >> which is "not yet computed" (data-pointer set to NULL, and store a >> compute-me callback and some context information). >> >> Finally, the __exit__ would trigger computation. For other operations >> which need the data pointer (e.g., single element lookup) one could >> either raise an exception or trigger computation. >> >> This is just a rough sketch. It is not difficult "in principle", but of >> course there's really a massive amount of work involved to work support >> for this into the NumPy APIs. >> >> Probably, we're talking a NumPy 3.0 thing, after the current round of >> refactorings have settled... >> >> Please: Before discussing this further one should figure out if there's >> manpower available for it; no sense in hashing out a castle in the sky >> in details. Also it would be better to talk in person about this if >> possible (I'm in Berkeley now and will attend PyData and PyCon). >> >> Dag >> > > Thanks for the additional details. > > I feel like this must be a stupid question, but I have to ask: what is the > point of being lazy here, since the computation is performed on exit anyway? > > -=- Olivier > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d.s.seljebotn at astro.uio.no Mon Feb 20 13:08:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 20 Feb 2012 10:08:50 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> Message-ID: <4F428C32.1050706@astro.uio.no> On 02/20/2012 09:34 AM, Christopher Jordan-Squire wrote: > On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn > wrote: >> On 02/20/2012 08:55 AM, Sturla Molden wrote: >>> Den 20.02.2012 17:42, skrev Sturla Molden: >>>> There are still other options than C or C++ that are worth considering. >>>> One would be to write NumPy in Python. E.g. we could use LLVM as a >>>> JIT-compiler and produce the performance critical code we need on the fly. >>>> >>>> >>> >>> LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster >>> than GCC and often produces better machine code. They can therefore be >>> used inside an array library. It would give a faster NumPy, and we could >>> keep most of it in Python. >> >> I think it is moot to focus on improving NumPy performance as long as in >> practice all NumPy operations are memory bound due to the need to take a >> trip through system memory for almost any operation. C/C++ is simply >> "good enough". JIT is when you're chasing a 2x improvement or so, but >> today NumPy can be 10-20x slower than a Cython loop. >> > > I don't follow this. Could you expand a bit more? (Specifically, I > wasn't aware that numpy could be 10-20x slower than a cython loop, if > we're talking about the base numpy library--so core operations. I'm The problem with NumPy is the temporaries needed -- if you want to compute A + B + np.sqrt(D) then, if the arrays are larger than cache size (a couple of megabytes), then each of those operations will first transfer the data in and out over the memory bus. I.e. first you compute an element of sqrt(D), then the result of that is put in system memory, then later the same number is read back in order to add it to an element in B, and so on. The compute-to-bandwidth ratio of modern CPUs is between 30:1 and 60:1... so in extreme cases it's cheaper to do 60 additions than to transfer a single number from system memory. It is much faster to only transfer an element (or small block) from each of A, B, and D to CPU cache, then do the entire expression, then transfer the result back. This is easy to code in Cython/Fortran/C and impossible with NumPy/Python. This is why numexpr/Theano exists. You can make the slowdown over Cython/Fortran/C almost arbitrarily large by adding terms to the equation above. So of course, the actual slowdown depends on your usecase. > also not totally sure why a JIT is a 2x improvement or so vs. cython. > Not that a disagree on either of these points, I'd just like a bit > more detail.) I meant that the JIT may be a 2x improvement over the current NumPy C code. There's some logic when iterating arrays that could perhaps be specialized away depending on the actual array layout at runtime. But I'm thinking that probably a JIT wouldn't help all that much, so it's probably 1x -- the 2x was just to be very conservative w.r.t. the argument I was making, as I don't know the NumPy C sources well enough. 
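A quick way to put numbers on this (only a sketch; the ratio depends heavily on the expression, the array size and the machine, and numexpr is used here simply because it already does the blockwise evaluation described above):

    import numpy as np
    import numexpr as ne
    from timeit import timeit

    n = 10**7    # well beyond cache size
    A, B, D = np.random.rand(n), np.random.rand(n), np.random.rand(n)

    t_np = timeit(lambda: A + B + np.sqrt(D), number=10)
    t_ne = timeit(lambda: ne.evaluate("A + B + sqrt(D)",
                                      local_dict={"A": A, "B": B, "D": D}),
                  number=10)
    print("numpy   : %.3f s" % t_np)
    print("numexpr : %.3f s" % t_ne)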
Dag From d.s.seljebotn at astro.uio.no Mon Feb 20 13:14:34 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 20 Feb 2012 10:14:34 -0800 Subject: [Numpy-discussion] ndarray and lazy evaluation In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> <4F428709.9020206@astro.uio.no> Message-ID: <4F428D8A.70509@astro.uio.no> On 02/20/2012 10:04 AM, Francesc Alted wrote: > On Feb 20, 2012, at 6:46 PM, Dag Sverre Seljebotn wrote: > >> On 02/20/2012 09:24 AM, Olivier Delalleau wrote: >>> Hi Dag, >>> >>> Would you mind elaborating a bit on that example you mentioned at the >>> end of your email? I don't quite understand what behavior you would like >>> to achieve >> >> Sure, see below. I think we should continue discussion on numpy-discuss. >> >> I wrote: >> >>> You need at least a slightly different Python API to get anywhere, so >>> numexpr/Theano is the right place to work on an implementation of this >>> idea. Of course it would be nice if numexpr/Theano offered something as >>> convenient as >>> >>> with lazy: >>> arr = A + B + C # with all of these NumPy arrays >>> # compute upon exiting... >> >> More information: >> >> The disadvantage today of using Theano (or numexpr) is that they require >> using a different API, so that one has to learn and use Theano "from the >> ground up", rather than just slap it on in an optimization phase. >> >> The alternative would require extensive changes to NumPy, so I guess >> Theano authors or Francesc would need to push for this. >> >> The alternative would be (with A, B, C ndarray instances): >> >> with theano.lazy: >> arr = A + B + C >> >> On __enter__, the context manager would hook into NumPy to override it's >> arithmetic operators. Then it would build a Theano symbolic tree instead >> of performing computations right away. >> >> In addition to providing support for overriding arithmetic operators, >> slicing etc., it would be necesarry for "arr" to be an ndarray instance >> which is "not yet computed" (data-pointer set to NULL, and store a >> compute-me callback and some context information). >> >> Finally, the __exit__ would trigger computation. For other operations >> which need the data pointer (e.g., single element lookup) one could >> either raise an exception or trigger computation. >> >> This is just a rough sketch. It is not difficult "in principle", but of >> course there's really a massive amount of work involved to work support >> for this into the NumPy APIs. >> >> Probably, we're talking a NumPy 3.0 thing, after the current round of >> refactorings have settled... >> >> Please: Before discussing this further one should figure out if there's >> manpower available for it; no sense in hashing out a castle in the sky >> in details. > > I see. Mark Wiebe already suggested the same thing some time ago: > > https://github.com/numpy/numpy/blob/master/doc/neps/deferred-ufunc-evaluation.rst Thanks, I didn't know about that (though I did really assume this was on Mark's radar already). > >> Also it would be better to talk in person about this if >> possible (I'm in Berkeley now and will attend PyData and PyCon). > > Nice. Most of Continuum crew (me included) will be attending to both conferences. Mark W. will make PyCon only, but will be a good occasion to discuss this further. I certainly don't think I have anything to add to this discussion beyond what Mark wrote up. 
But will be nice to meet up anyway. Dag From francesc at continuum.io Mon Feb 20 13:18:40 2012 From: francesc at continuum.io (Francesc Alted) Date: Mon, 20 Feb 2012 19:18:40 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F428C32.1050706@astro.uio.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> <4F428C32.1050706@astro.uio.no> Message-ID: On Feb 20, 2012, at 7:08 PM, Dag Sverre Seljebotn wrote: > On 02/20/2012 09:34 AM, Christopher Jordan-Squire wrote: >> On Mon, Feb 20, 2012 at 9:18 AM, Dag Sverre Seljebotn >> wrote: >>> On 02/20/2012 08:55 AM, Sturla Molden wrote: >>>> Den 20.02.2012 17:42, skrev Sturla Molden: >>>>> There are still other options than C or C++ that are worth considering. >>>>> One would be to write NumPy in Python. E.g. we could use LLVM as a >>>>> JIT-compiler and produce the performance critical code we need on the fly. >>>>> >>>>> >>>> >>>> LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster >>>> than GCC and often produces better machine code. They can therefore be >>>> used inside an array library. It would give a faster NumPy, and we could >>>> keep most of it in Python. >>> >>> I think it is moot to focus on improving NumPy performance as long as in >>> practice all NumPy operations are memory bound due to the need to take a >>> trip through system memory for almost any operation. C/C++ is simply >>> "good enough". JIT is when you're chasing a 2x improvement or so, but >>> today NumPy can be 10-20x slower than a Cython loop. >>> >> >> I don't follow this. Could you expand a bit more? (Specifically, I >> wasn't aware that numpy could be 10-20x slower than a cython loop, if >> we're talking about the base numpy library--so core operations. I'm > > The problem with NumPy is the temporaries needed -- if you want to compute > > A + B + np.sqrt(D) > > then, if the arrays are larger than cache size (a couple of megabytes), > then each of those operations will first transfer the data in and out > over the memory bus. I.e. first you compute an element of sqrt(D), then > the result of that is put in system memory, then later the same number > is read back in order to add it to an element in B, and so on. > > The compute-to-bandwidth ratio of modern CPUs is between 30:1 and > 60:1... so in extreme cases it's cheaper to do 60 additions than to > transfer a single number from system memory. > > It is much faster to only transfer an element (or small block) from each > of A, B, and D to CPU cache, then do the entire expression, then > transfer the result back. This is easy to code in Cython/Fortran/C and > impossible with NumPy/Python. > > This is why numexpr/Theano exists. Well, I can't speak for Theano (it is quite more general than numexpr, and more geared towards using GPUs, right?), but this was certainly the issue that make David Cooke to create numexpr. A more in-deep explanation about this problem can be seen in: http://www.euroscipy.org/talk/1657 which includes some graphical explanations. -- Francesc Alted From brett.olsen at gmail.com Mon Feb 20 13:35:46 2012 From: brett.olsen at gmail.com (Brett Olsen) Date: Mon, 20 Feb 2012 12:35:46 -0600 Subject: [Numpy-discussion] Forbidden charcter in the "names" argument of genfromtxt? 
In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes wrote: > Hey everyone, > > I have timeseries data in which the column label is simply a filename from > which the original data was taken.? Here's some sample data: > > name1.txt? name2.txt? name3.txt > 32????????????? 34??????????? 953 > 32????????????? 03??????????? 402 > > I've noticed that the standard genfromtxt() method works great; however, the > names aren't written correctly.? That is, if I use the command: > > print data['name1.txt'] > > Nothing happens. > > However, when I remove the file extension, Eg: > > name1? name2? name3 > 32????????????? 34??????????? 953 > 32????????????? 03??????????? 402 > > Then print data['name1'] return (32, 32) as expected.? It seems that the > period in the name isn't compatible with the genfromtxt() names attribute. > Is there a workaround, or do I need to restructure my program to get the > extension removed?? I'd rather not do this if possible for reasons that > aren't important for the discussion at hand. It looks like the period is just getting stripped out of the names: In [1]: import numpy as N In [2]: N.genfromtxt('sample.txt', names=True) Out[2]: array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], dtype=[('name1txt', ' References: Message-ID: Thanks Brett. I appreciate you taking the time to help me out. In particular, I did not know the correct syntax for this: data.dtype.names = names Which is very helpful. If I would have known how to access data.dtype.names, I think it would have saved me a great deal of trouble. I guess it's all part of a learning curve. I'll keep in mind that the period may cause problems later; however, as far as I can tell so far, there's nothing going wrong when I access the data. On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen wrote: > On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes > wrote: > > Hey everyone, > > > > I have timeseries data in which the column label is simply a filename > from > > which the original data was taken. Here's some sample data: > > > > name1.txt name2.txt name3.txt > > 32 34 953 > > 32 03 402 > > > > I've noticed that the standard genfromtxt() method works great; however, > the > > names aren't written correctly. That is, if I use the command: > > > > print data['name1.txt'] > > > > Nothing happens. > > > > However, when I remove the file extension, Eg: > > > > name1 name2 name3 > > 32 34 953 > > 32 03 402 > > > > Then print data['name1'] return (32, 32) as expected. It seems that the > > period in the name isn't compatible with the genfromtxt() names > attribute. > > Is there a workaround, or do I need to restructure my program to get the > > extension removed? I'd rather not do this if possible for reasons that > > aren't important for the discussion at hand. 
> > It looks like the period is just getting stripped out of the names: > > In [1]: import numpy as N > > In [2]: N.genfromtxt('sample.txt', names=True) > Out[2]: > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > dtype=[('name1txt', ' > Interestingly, this still happens if you supply the names manually: > > In [17]: def reader(filename): > ....: infile = open(filename, 'r') > ....: names = infile.readline().split() > ....: data = N.genfromtxt(infile, names=names) > ....: infile.close() > ....: return data > ....: > > In [20]: data = reader('sample.txt') > > In [21]: data > Out[21]: > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > dtype=[('name1txt', ' > What you can do is reset the names after genfromtxt is through with it, > though: > > In [34]: def reader(filename): > ....: infile = open(filename, 'r') > ....: names = infile.readline().split() > ....: infile.close() > ....: data = N.genfromtxt(filename, names=True) > ....: data.dtype.names = names > ....: return data > ....: > > In [35]: data = reader('sample.txt') > > In [36]: data > Out[36]: > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > dtype=[('name1.txt', ' ' > Be warned, I don't know why the period is getting stripped; there may > be a good reason, and adding it in might cause problems. > > ~Brett > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Feb 20 13:58:53 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 20 Feb 2012 13:58:53 -0500 Subject: [Numpy-discussion] Forbidden charcter in the "names" argument of genfromtxt? In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen wrote: > On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes wrote: >> Hey everyone, >> >> I have timeseries data in which the column label is simply a filename from >> which the original data was taken.? Here's some sample data: >> >> name1.txt? name2.txt? name3.txt >> 32????????????? 34??????????? 953 >> 32????????????? 03??????????? 402 >> >> I've noticed that the standard genfromtxt() method works great; however, the >> names aren't written correctly.? That is, if I use the command: >> >> print data['name1.txt'] >> >> Nothing happens. >> >> However, when I remove the file extension, Eg: >> >> name1? name2? name3 >> 32????????????? 34??????????? 953 >> 32????????????? 03??????????? 402 >> >> Then print data['name1'] return (32, 32) as expected.? It seems that the >> period in the name isn't compatible with the genfromtxt() names attribute. >> Is there a workaround, or do I need to restructure my program to get the >> extension removed?? I'd rather not do this if possible for reasons that >> aren't important for the discussion at hand. > > It looks like the period is just getting stripped out of the names: > > In [1]: import numpy as N > > In [2]: N.genfromtxt('sample.txt', names=True) > Out[2]: > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > ? ? ?dtype=[('name1txt', ' > Interestingly, this still happens if you supply the names manually: > > In [17]: def reader(filename): > ? ....: ? ? infile = open(filename, 'r') > ? ....: ? ? names = infile.readline().split() > ? ....: ? ? data = N.genfromtxt(infile, names=names) > ? ....: ? ? infile.close() > ? ....: ? ? return data > ? 
....: > > In [20]: data = reader('sample.txt') > > In [21]: data > Out[21]: > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > ? ? ?dtype=[('name1txt', ' > What you can do is reset the names after genfromtxt is through with it, though: > > In [34]: def reader(filename): > ? ....: ? ? infile = open(filename, 'r') > ? ....: ? ? names = infile.readline().split() > ? ....: ? ? infile.close() > ? ....: ? ? data = N.genfromtxt(filename, names=True) > ? ....: ? ? data.dtype.names = names > ? ....: ? ? return data > ? ....: > > In [35]: data = reader('sample.txt') > > In [36]: data > Out[36]: > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > ? ? ?dtype=[('name1.txt', ' > Be warned, I don't know why the period is getting stripped; there may > be a good reason, and adding it in might cause problems. I think the period is stripped because recarrays also offer attribute access of names. So you wouldn't be able to do your_array.sample.txt All the names get passed through a name validator. IIRC it's something like from numpy.lib import _iotools validator = _iotools.NameValidator() validator.validate('sample1.txt') validator.validate('a name with spaces') NameValidator has a good docstring and the gist of this should be in the genfromtxt docs, if it's not already. Skipper From hugadams at gwmail.gwu.edu Mon Feb 20 14:02:16 2012 From: hugadams at gwmail.gwu.edu (Adam Hughes) Date: Mon, 20 Feb 2012 14:02:16 -0500 Subject: [Numpy-discussion] Forbidden charcter in the "names" argument of genfromtxt? In-Reply-To: References: Message-ID: Thanks for clearing that up. On Mon, Feb 20, 2012 at 1:58 PM, Skipper Seabold wrote: > On Mon, Feb 20, 2012 at 1:35 PM, Brett Olsen > wrote: > > On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes > wrote: > >> Hey everyone, > >> > >> I have timeseries data in which the column label is simply a filename > from > >> which the original data was taken. Here's some sample data: > >> > >> name1.txt name2.txt name3.txt > >> 32 34 953 > >> 32 03 402 > >> > >> I've noticed that the standard genfromtxt() method works great; > however, the > >> names aren't written correctly. That is, if I use the command: > >> > >> print data['name1.txt'] > >> > >> Nothing happens. > >> > >> However, when I remove the file extension, Eg: > >> > >> name1 name2 name3 > >> 32 34 953 > >> 32 03 402 > >> > >> Then print data['name1'] return (32, 32) as expected. It seems that the > >> period in the name isn't compatible with the genfromtxt() names > attribute. > >> Is there a workaround, or do I need to restructure my program to get the > >> extension removed? I'd rather not do this if possible for reasons that > >> aren't important for the discussion at hand. 
> > > > It looks like the period is just getting stripped out of the names: > > > > In [1]: import numpy as N > > > > In [2]: N.genfromtxt('sample.txt', names=True) > > Out[2]: > > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > > dtype=[('name1txt', ' ' > > > Interestingly, this still happens if you supply the names manually: > > > > In [17]: def reader(filename): > > ....: infile = open(filename, 'r') > > ....: names = infile.readline().split() > > ....: data = N.genfromtxt(infile, names=names) > > ....: infile.close() > > ....: return data > > ....: > > > > In [20]: data = reader('sample.txt') > > > > In [21]: data > > Out[21]: > > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > > dtype=[('name1txt', ' ' > > > What you can do is reset the names after genfromtxt is through with it, > though: > > > > In [34]: def reader(filename): > > ....: infile = open(filename, 'r') > > ....: names = infile.readline().split() > > ....: infile.close() > > ....: data = N.genfromtxt(filename, names=True) > > ....: data.dtype.names = names > > ....: return data > > ....: > > > > In [35]: data = reader('sample.txt') > > > > In [36]: data > > Out[36]: > > array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)], > > dtype=[('name1.txt', ' ' > > > Be warned, I don't know why the period is getting stripped; there may > > be a good reason, and adding it in might cause problems. > > I think the period is stripped because recarrays also offer attribute > access of names. So you wouldn't be able to do > > your_array.sample.txt > > All the names get passed through a name validator. IIRC it's something like > > from numpy.lib import _iotools > > validator = _iotools.NameValidator() > > validator.validate('sample1.txt') > validator.validate('a name with spaces') > > NameValidator has a good docstring and the gist of this should be in > the genfromtxt docs, if it's not already. > > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Mon Feb 20 14:14:09 2012 From: daniele at grinta.net (Daniele Nicolodi) Date: Mon, 20 Feb 2012 20:14:09 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> Message-ID: <4F429B81.50304@grinta.net> On 18/02/12 04:54, Sturla Molden wrote: > This is not true. C++ can be much easier, particularly for those who > already know Python. The problem: C++ textbooks teach C++ as a subset > of C. Writing C in C++ just adds the complexity of C++ on top of C, > for no good reason. I can write FORTRAN in any language, it does not > mean it is a good idea. We would have to start by teaching people to > write good C++. E.g., always use the STL like Python built-in types > if possible. Dynamic memory should be std::vector, not new or malloc. > Pointers should be replaced with references. We would have to write a > C++ programming tutorial that is based on Pyton knowledge instead of > C knowledge. Hello Sturla, unrelated to the numpy tewrite debate, can you please suggest some resources you think can be used to learn how to program C++ "the proper way"? Thank you. 
Cheers, -- Daniele From matthieu.brucher at gmail.com Mon Feb 20 14:17:18 2012 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 20 Feb 2012 20:17:18 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F429B81.50304@grinta.net> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F429B81.50304@grinta.net> Message-ID: 2012/2/20 Daniele Nicolodi > On 18/02/12 04:54, Sturla Molden wrote: > > This is not true. C++ can be much easier, particularly for those who > > already know Python. The problem: C++ textbooks teach C++ as a subset > > of C. Writing C in C++ just adds the complexity of C++ on top of C, > > for no good reason. I can write FORTRAN in any language, it does not > > mean it is a good idea. We would have to start by teaching people to > > write good C++. E.g., always use the STL like Python built-in types > > if possible. Dynamic memory should be std::vector, not new or malloc. > > Pointers should be replaced with references. We would have to write a > > C++ programming tutorial that is based on Pyton knowledge instead of > > C knowledge. > > Hello Sturla, > > unrelated to the numpy tewrite debate, can you please suggest some > resources you think can be used to learn how to program C++ "the proper > way"? > One of the best books may be "Accelerated C++" or the new Stroutrup's book (not the C++ language) Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher -------------- next part -------------- An HTML attachment was scrubbed... URL: From bergstrj at iro.umontreal.ca Mon Feb 20 14:26:24 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Mon, 20 Feb 2012 14:26:24 -0500 Subject: [Numpy-discussion] ndarray and lazy evaluation Message-ID: On Mon, Feb 20, 2012 at 12:28 PM, Francesc Alted wrote: > On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote: > > You need at least a slightly different Python API to get anywhere, so > > numexpr/Theano is the right place to work on an implementation of this > > idea. Of course it would be nice if numexpr/Theano offered something as > > convenient as > > > > with lazy: > > arr = A + B + C # with all of these NumPy arrays > > # compute upon exiting? > > Hmm, that would be cute indeed. Do you have an idea on how the code in > the with context could be passed to the Python AST compiler (? la > numexpr.evaluate("A + B + C"))? > > The biggest problem with the numexpr approach (e.g. evaluate("A + B + C")) whether the programmer has to type the quotes or not, is that the sub-program has to be completely expressed in the sub-language. If I write >>> def f(x): return x[:3] >>> numexpr.evaluate("A + B + f(C)") I would like that to be fast, but it's not obvious at all how that would work. We would be asking numexpr to introspect arbitrary callable python objects, and recompile arbitrary Python code, effectively setting up the expectation in the user's mind that numexpr is re-implementing an entire compiler. That can be fast obviously, but it seems to me to represent significant departure from numpy's focus, which I always thought was the data-container rather than the expression evaluation (though maybe this firestorm of discussion is aimed at changing this?) Theano went with another option which was to replace the A, B, and C variables with objects that have a modified __add__. 
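Roughly, the usage looks like this (a minimal sketch from memory, so treat the exact calls as illustrative rather than authoritative):

import numpy as np
import theano
import theano.tensor as T

# Symbolic variables: their __add__, __mul__, etc. build an expression
# graph instead of computing anything right away.
A = T.vector('A')
B = T.vector('B')
C = T.vector('C')

expr = A + B + C                      # still symbolic, nothing computed yet
f = theano.function([A, B, C], expr)  # compile the whole expression once

# Run the compiled function on ordinary NumPy arrays.
a, b, c = np.random.rand(3, 5)
print f(a, b, c)

The key point is that the whole graph is available before any numbers are crunched, so the back-end is free to fuse the expression or ship it off to a GPU.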
Theano's back-end can be slow at times and the codebase can feel like a heavy dependency, but my feeling is still that this is a great approach to getting really fast implementations of compound expressions. The context syntax you suggest using is a little ambiguous in that the indented block of a with statement block includes *statements* whereas what you mean to build in the indented block is a *single expression* graph. You could maybe get the right effect with something like A, B, C = np.random.rand(3, 5) expr = np.compound_expression() with np.expression_builder(expr) as foo: arr = A + B + C brr = A + B * C foo.return((arr, brr)) # compute arr and brr as quickly as possible a, b = expr.run() # modify one of the arrays that the expression was compiled to use A[:] += 1 # re-run the compiled expression on the new value a, b = expr.run() - JB -- James Bergstra, Ph.D. Research Scientist Rowland Institute, Harvard University -------------- next part -------------- An HTML attachment was scrubbed... URL: From bergstrj at iro.umontreal.ca Mon Feb 20 14:26:43 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Mon, 20 Feb 2012 14:26:43 -0500 Subject: [Numpy-discussion] ndarray and lazy evaluation Message-ID: On Mon, Feb 20, 2012 at 1:01 PM, James Bergstra wrote: > On Mon, Feb 20, 2012 at 12:28 PM, Francesc Alted wrote: > >> On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote: >> > You need at least a slightly different Python API to get anywhere, so >> > numexpr/Theano is the right place to work on an implementation of this >> > idea. Of course it would be nice if numexpr/Theano offered something as >> > convenient as >> > >> > with lazy: >> > arr = A + B + C # with all of these NumPy arrays >> > # compute upon exiting? >> >> Hmm, that would be cute indeed. Do you have an idea on how the code in >> the with context could be passed to the Python AST compiler (? la >> numexpr.evaluate("A + B + C"))? >> >> > The biggest problem with the numexpr approach (e.g. evaluate("A + B + C")) > whether the programmer has to type the quotes or not, is that the > sub-program has to be completely expressed in the sub-language. > > If I write > > >>> def f(x): return x[:3] > >>> numexpr.evaluate("A + B + f(C)") > > I would like that to be fast, but it's not obvious at all how that would > work. We would be asking numexpr to introspect arbitrary callable python > objects, and recompile arbitrary Python code, effectively setting up the > expectation in the user's mind that numexpr is re-implementing an entire > compiler. That can be fast obviously, but it seems to me to represent > significant departure from numpy's focus, which I always thought was the > data-container rather than the expression evaluation (though maybe this > firestorm of discussion is aimed at changing this?) > > Theano went with another option which was to replace the A, B, and C > variables with objects that have a modified __add__. Theano's back-end can > be slow at times and the codebase can feel like a heavy dependency, but my > feeling is still that this is a great approach to getting really fast > implementations of compound expressions. > > The context syntax you suggest using is a little ambiguous in that the > indented block of a with statement block includes *statements* whereas what > you mean to build in the indented block is a *single expression* graph. 
> You could maybe get the right effect with something like > > A, B, C = np.random.rand(3, 5) > > expr = np.compound_expression() > with np.expression_builder(expr) as foo: > arr = A + B + C > brr = A + B * C > foo.return((arr, brr)) > > # compute arr and brr as quickly as possible > a, b = expr.run() > > # modify one of the arrays that the expression was compiled to use > A[:] += 1 > > # re-run the compiled expression on the new value > a, b = expr.run() > > - JB > I should add that the biggest benefit of expressing things as compound expressions in this way is not in saving temporaries (though that is nice) it's being able to express enough computation work at a time that it offsets the time required to ship the arguments off to a GPU for evaluation! This has been a *huge* win reaped by the Theano approach, it works really well. The abstraction boundary offered by this sort of expression graph has been really effective. This speaks even more to the importance of distinguishing between the data container (e.g. numpy, Theano's internal ones, PyOpenCL's one, PyCUDA's one) and the expression compilation and evaluation infrastructures (e.g. Theano, numexpr, cython). The goal should be as much as possible to separate these two so that programs can be expressed in a natural way, and then evaluated using containers that are suited to the program. - JB -- James Bergstra, Ph.D. Research Scientist Rowland Institute, Harvard University -------------- next part -------------- An HTML attachment was scrubbed... URL: From xscript at gmx.net Mon Feb 20 14:28:01 2012 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Mon, 20 Feb 2012 20:28:01 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: (Francesc Alted's message of "Mon, 20 Feb 2012 18:28:59 +0100") References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> Message-ID: <874nulxq2m.fsf@ginnungagap.bsc.es> Francesc Alted writes: > On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote: >> You need at least a slightly different Python API to get anywhere, so >> numexpr/Theano is the right place to work on an implementation of this >> idea. Of course it would be nice if numexpr/Theano offered something as >> convenient as >> >> with lazy: >> arr = A + B + C # with all of these NumPy arrays >> # compute upon exiting? > Hmm, that would be cute indeed. Do you have an idea on how the code in the with > context could be passed to the Python AST compiler (? la numexpr.evaluate("A + B > + C"))? Well, I started writing some experiments to "almost transparently" translate regular ndarray operations to numexpr strings (or others) using only python code. The concept is very simple: # you only need the first one to start building the AST a = lazy(np.arange(16)) b = np.arange(16) res = a + b + 3 print evaluate(res) # the actual evaluation can be delayed to something like __repr__ or __str__ print repr(res) print res # you could also delay evaluation until someone uses res to create a new array My target was to use this to also generate optimized GPU kernels in-flight using pycuda, but I think some other relatively recent project already performed something similar (w.r.t. generating cuda kernels out of python expressions). 
The supporting code for numexpr was something like: import numexpr import numpy as np def build_arg_expr (arg, args): if isinstance(arg, Expr): # recursively build the expression arg_expr, arg_args = arg.build_expr() args.update(arg_args) return arg_expr else: # unique argument identifier arg_id = "arg_%d" % id(arg) args[arg_id] = arg return arg_id # generic expression builder class Expr: def evaluate(self): expr, args = self.build_expr() return numexpr.evaluate(expr, local_dict = args, global_dict = {}) def __repr__ (self): return self.evaluate().__repr__() def __str__ (self): return self.evaluate().__str__() def __add__ (self, other): return ExprAdd(self, other) # expression builder for adds class ExprAdd(Expr): def __init__(self, arg1, arg2): self.arg1 = arg1 self.arg2 = arg2 def build_expr(self): args = {} expr1 = build_arg_expr(self.arg1, args) expr2 = build_arg_expr(self.arg2, args) return "("+expr1+") + ("+expr2+")", args # ndarray-like class to generate expression builders class LazyNdArray(np.ndarray): def __add__ (self, other): return ExprAdd(self, other) # build a LazyNdArray def lazy (arg): return arg.view(LazyNdArray) # evaluate with numexpr an arbitrary expression builder def evaluate(arg): return arg.evaluate() The thing here is to always return to the user something that looks like an ndarray. As you can see the whole thing is not very complex, but some less funny code had to be written meanwhile for work and I just dropped this :) Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From ndbecker2 at gmail.com Mon Feb 20 14:41:04 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 20 Feb 2012 14:41:04 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> Message-ID: Charles R Harris wrote: > On Fri, Feb 17, 2012 at 12:09 PM, Benjamin Root wrote: > >> >> >> On Fri, Feb 17, 2012 at 1:00 PM, Christopher Jordan-Squire < >> cjordan1 at uw.edu> wrote: >> >>> On Fri, Feb 17, 2012 at 10:21 AM, Mark Wiebe wrote: >>> > On Fri, Feb 17, 2012 at 11:52 AM, Eric Firing >>> wrote: >>> >> >>> >> On 02/17/2012 05:39 AM, Charles R Harris wrote: >>> >> > >>> >> > >>> >> > On Fri, Feb 17, 2012 at 8:01 AM, David Cournapeau < >>> cournape at gmail.com >>> >> > > wrote: >>> >> > >>> >> > Hi Travis, >>> >> > >>> >> > On Thu, Feb 16, 2012 at 10:39 PM, Travis Oliphant >>> >> > > wrote: >>> >> > > Mark Wiebe and I have been discussing off and on (as well as >>> >> > talking with Charles) a good way forward to balance two competing >>> >> > desires: >>> >> > > >>> >> > > * addition of new features that are needed in NumPy >>> >> > > * improving the code-base generally and moving towards >>> a >>> >> > more maintainable NumPy >>> >> > > >>> >> > > I know there are load voices for just focusing on the second >>> of >>> >> > these and avoiding the first until we have finished that. I >>> >> > recognize the need to improve the code base, but I will also be >>> >> > pushing for improvements to the feature-set and user experience >>> in >>> >> > the process. >>> >> > > >>> >> > > As a result, I am proposing a rough outline for releases over >>> the >>> >> > next year: >>> >> > > >>> >> > > * NumPy 1.7 to come out as soon as the serious bugs >>> can be >>> >> > eliminated. 
Bryan, Francesc, Mark, and I are able to help triage >>> >> > some of those. >>> >> > > >>> >> > > * NumPy 1.8 to come out in July which will have as many >>> >> > ABI-compatible feature enhancements as we can add while improving >>> >> > test coverage and code cleanup. I will post to this list more >>> >> > details of what we plan to address with it later. Included for >>> >> > possible inclusion are: >>> >> > > * resolving the NA/missing-data issues >>> >> > > * finishing group-by >>> >> > > * incorporating the start of label arrays >>> >> > > * incorporating a meta-object >>> >> > > * a few new dtypes (variable-length string, >>> >> > varialbe-length unicode and an enum type) >>> >> > > * adding ufunc support for flexible dtypes and possibly >>> >> > structured arrays >>> >> > > * allowing generalized ufuncs to work on more kinds of >>> >> > arrays besides just contiguous >>> >> > > * improving the ability for NumPy to receive >>> JIT-generated >>> >> > function pointers for ufuncs and other calculation opportunities >>> >> > > * adding "filters" to Input and Output >>> >> > > * simple computed fields for dtypes >>> >> > > * accepting a Data-Type specification as a class or >>> JSON >>> >> > file >>> >> > > * work towards improving the dtype-addition mechanism >>> >> > > * re-factoring of code so that it can compile with a >>> C++ >>> >> > compiler and be minimally dependent on Python data-structures. >>> >> > >>> >> > This is a pretty exciting list of features. What is the rationale >>> >> > for >>> >> > code being compiled as C++ ? IMO, it will be difficult to do so >>> >> > without preventing useful C constructs, and without removing >>> some of >>> >> > the existing features (like our use of C99 complex). The subset >>> that >>> >> > is both C and C++ compatible is quite constraining. >>> >> > >>> >> > >>> >> > I'm in favor of this myself, C++ would allow a lot code cleanup and >>> make >>> >> > it easier to provide an extensible base, I think it would be a >>> natural >>> >> > fit with numpy. Of course, some C++ projects become tangled messes of >>> >> > inheritance, but I'd be very interested in seeing what a good C++ >>> >> > designer like Mark, intimately familiar with the numpy code base, >>> could >>> >> > do. This opportunity might not come by again anytime soon and I >>> think we >>> >> > should grab onto it. The initial step would be a release whose code >>> that >>> >> > would compile in both C/C++, which mostly comes down to removing C++ >>> >> > keywords like 'new'. >>> >> > >>> >> > I did suggest running it by you for build issues, so please raise any >>> >> > you can think of. Note that MatPlotLib is in C++, so I don't think >>> the >>> >> > problems are insurmountable. And choosing a set of compilers to >>> support >>> >> > is something that will need to be done. >>> >> >>> >> It's true that matplotlib relies heavily on C++, both via the Agg >>> >> library and in its own extension code. Personally, I don't like this; >>> I >>> >> think it raises the barrier to contributing. C++ is an order of >>> >> magnitude more complicated than C--harder to read, and much harder to >>> >> write, unless one is a true expert. In mpl it brings reliance on the >>> CXX >>> >> library, which Mike D. has had to help maintain. And if it does >>> >> increase compiler specificity, that's bad. >>> > >>> > >>> > This gets to the recruitment issue, which is one of the most important >>> > problems I see numpy facing. 
I personally have contributed a lot of >>> code to >>> > NumPy *in spite of* the fact it's in C. NumPy being in C instead of C++ >>> was >>> > the biggest negative point when I considered whether it was worth >>> > contributing to the project. I suspect there are many programmers out >>> there >>> > who are skilled in low-level, high-performance C++, who would be >>> willing to >>> > contribute, but don't want to code in C. >>> > >>> > I believe NumPy should be trying to find people who want to make high >>> > performance, close to the metal, libraries. This is a very different >>> type of >>> > programmer than one who wants to program in Python, but is willing to >>> dabble >>> > in a lower level language to make something run faster. High performance >>> > library development is one of the things the C++ developer community >>> does >>> > very well, and that community is where we have a good chance of finding >>> the >>> > programmers NumPy needs. >>> > >>> >> I would much rather see development in the direction of sticking with C >>> >> where direct low-level control and speed are needed, and using cython >>> to >>> >> gain higher level language benefits where appropriate. Of course, that >>> >> brings in the danger of reliance on another complex tool, cython. If >>> >> that danger is considered excessive, then just stick with C. >>> > >>> > >>> > There are many small benefits C++ can offer, even if numpy chooses only >>> to >>> > use a tiny subset of the C++ language. For example, RAII can be used to >>> > reliably eliminate PyObject reference leaks. >>> > >>> > Consider a regression like this: >>> > http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057831.html >>> > >>> > Fixing this in C would require switching all the relevant usages of >>> > NPY_MAXARGS to use a dynamic memory allocation. This brings with it the >>> > potential of easily introducing a memory leak, and is a lot of work to >>> do. >>> > In C++, this functionality could be placed inside a class, where the >>> > deterministic construction/destruction semantics eliminate the risk of >>> > memory leaks and make the code easier to read at the same time. There >>> are >>> > other examples like this where the C language has forced a suboptimal >>> design >>> > choice because of how hard it would be to do it better. >>> > >>> > Cheers, >>> > Mark >>> > >>> >>> In a similar vein, could incorporating C++ lead to a simpler low-level >>> API for numpy? I know Mark has talked before about--in the long-term, >>> as a dream project to scratch his own itch, and something the BDF12 >>> doesn't necessarily agree with--implementing the great ideas in numpy >>> as a layered C++ library. (Which would have the added benefit of >>> making numpy more of a general array library that could be exposed to >>> any language which can call C++ libraries.) >>> >>> I don't imagine that's on the table for anything near-term, but I >>> wonder if making more of the low-level stuff C++ would make it easier >>> for performance nuts to write their own code in C/C++ interfacing with >>> numpy, and then expose it to python. After playing around with ufuncs >>> at the C level for a little while last summer, I quickly realized any >>> simplifications would be greatly appreciated. >>> >>> -Chris >>> >>> >>> >> I am also in favor of moving towards a C++ oriented library. Personally, >> I find C++ easier to read and understand, most likely because I learned it >> first. I only learned C in the context of learning C++. 
>> >> Just a thought, with the upcoming revisions to the C++ standard, this does >> open up the possibility of some nice templating features that would make >> the library easier to use in native C++ programs. On a side note, does >> anybody use std::valarray? >> >> > My impression is that std::valarray didn't really solve the problems it was > intended to solve. IIRC, the valarray author himself said as much, but I > don't recall where. > > Chuck A related question is whether numpy core in c++ would be based on any existing c++ libs for HPC. There are quite a few efforts for 1 and 2 dimensions. Fewer for arbitrary (or arbitrary up to some reasonable limit) dimension. Or, would we be talking about purely custom c++ code for numpy? I suspect the latter. Although there are many promising c++ matrix/vector type libraries (too many), I suspect it would be too difficult to preserve all numpy semantics via this route. From xscript at gmx.net Mon Feb 20 14:41:38 2012 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Mon, 20 Feb 2012 20:41:38 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <874nulxq2m.fsf@ginnungagap.bsc.es> (=?utf-8?Q?=22Llu=C3=ADs?= =?utf-8?Q?=22's?= message of "Mon, 20 Feb 2012 20:28:01 +0100") References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> <874nulxq2m.fsf@ginnungagap.bsc.es> Message-ID: <87ty2lwavh.fsf@ginnungagap.bsc.es> Llu?s writes: > Francesc Alted writes: >> On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote: >>> You need at least a slightly different Python API to get anywhere, so >>> numexpr/Theano is the right place to work on an implementation of this >>> idea. Of course it would be nice if numexpr/Theano offered something as >>> convenient as >>> >>> with lazy: >>> arr = A + B + C # with all of these NumPy arrays >>> # compute upon exiting? >> Hmm, that would be cute indeed. Do you have an idea on how the code in the with >> context could be passed to the Python AST compiler (? la numexpr.evaluate("A + B >> + C"))? > Well, I started writing some experiments to "almost transparently" translate > regular ndarray operations to numexpr strings (or others) using only python > code. [...] > My target was to use this to also generate optimized GPU kernels in-flight using > pycuda, but I think some other relatively recent project already performed > something similar (w.r.t. generating cuda kernels out of python expressions). Aaahhh, I just had a quick look at Theano and it seems it's the project I was referring to. Good job! :) Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From paul.anton.letnes at gmail.com Mon Feb 20 14:55:26 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 20 Feb 2012 20:55:26 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F4266EA.702@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> <4F4266EA.702@molden .no> Message-ID: <89CB07B6-E529-4D84-8B1E-B0F61D6048BB@gmail.com> On 20. feb. 2012, at 16:29, Sturla Molden wrote: > Den 20.02.2012 08:35, skrev Paul Anton Letnes: >> In the language wars, I have one question. Why is Fortran not being considered? 
Fortran already implements many of the features that we want in NumPy: > > Yes ... but it does not make Fortran a systems programming language. > Making NumPy is different from using it. > >> - slicing and similar operations, at least some of the fancy indexing kind >> - element-wise array operations and function calls >> - array bounds-checking and other debugging aid (with debugging flags) > > That is nice for numerical computing, but not really needed to make NumPy. > > >> - arrays that mentally map very well onto numpy arrays. To me, this spells +1 to ease of contribution, over some abstract C/C++ template > > Mentally perhaps, but not binary. NumPy needs uniformly strided memory > on the binary level. Fortran just gives this at the mental level. E.g. > there is nothing that dictates a Fortran pointer has to be a view, the > compiler is free to employ copy-in copy-out. In Fortran, a function call > can invalidate a pointer. One would therefore have to store the array > in an array of integer*1, and use the intrinsic function transfer() to > parse the contents into NumPy dtypes. > >> - in newer standards it has some nontrivial mathematical functions: gamma, bessel, etc. that numpy lacks right now > > That belongs to SciPy. I don't see exactly why. Why should numpy have exponential but not gamma functions? The division seems kinda arbitrary. Not that I am arguing violently for bessel functions in numpy. >> - compilers that are good at optimizing for floating-point performance, because that's what Fortran is all about > > Insanely good, but not when we start to do the (binary, not mentally) > strided access that NumPy needs. (Not that C compilers would be any better.) > > > >> - not Fortran as such, but BLAS and LAPACK are easily accessed by Fortran >> - possibly other numerical libraries that can be helpful >> - Fortran has, in its newer standards, thought of C interoperability. We could still keep bits of the code in C (or even C++?) if we'd like to, or perhaps f2py/Cython could do the wrapping. > > Not f2py, as it depends on NumPy. > > - some programmers know Fortran better than C++. Fortran is at least used by many science guys, like me. > > > That is a valid arguments. Fortran is also much easier to read and debug. > > > Sturla Thanks for an excellent answer, Sturla - very informative indeed. Paul. From xscript at gmx.net Mon Feb 20 14:57:44 2012 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Mon, 20 Feb 2012 20:57:44 +0100 Subject: [Numpy-discussion] ndarray and lazy evaluation In-Reply-To: (James Bergstra's message of "Mon, 20 Feb 2012 14:26:43 -0500") References: Message-ID: <87linxuvk7.fsf@ginnungagap.bsc.es> James Bergstra writes: [...] > I should add that the biggest benefit of expressing things as compound > expressions in this way is not in saving temporaries (though that is nice) it's > being able to express enough computation work at a time that it offsets the time > required to ship the arguments off to a GPU for evaluation! Right, that's exacly what you need for an "external computation" to pay off. Just out of curiosity (feel free to respond with a RTFM or a RTFP :)), do you support any of these? 
(sorry for the made-up names) * automatic transfer double-buffering * automatic problem partitioning into domains (e.g., multiple GPUs; even better if also supports nodes - MPI -) * point-specific computations (e.g., code dependant on the thread id, although this can also be expressed in other ways, like index ranges) * point-relative computations (the most common would be a stencil) If you have all of them, then I'd say the project has a huge potential for total world dominance :) Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From bergstrj at iro.umontreal.ca Mon Feb 20 15:01:33 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Mon, 20 Feb 2012 15:01:33 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <87ty2lwavh.fsf@ginnungagap.bsc.es> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> <4F428083.50809@astro.uio.no> <874nulxq2m.fsf@ginnungagap.bsc.es> <87ty2lwavh.fsf@ginnungagap.bsc.es> Message-ID: Looks like Dag forked the discussion of lazy evaluation to a new thread ([Numpy-discussion] ndarray and lazy evaluation). There are actually several projects inspired by this sort of design: off the top of my head I can think of Theano, copperhead, numexpr, arguably sympy, and some non-public code by Nicolas Pinto. So I think the strengths of the approach in principle are established... the big question is how to make this approach easy to use in all the settings where it could be useful. I don't think any of these projects has gotten that totally right. -JB On Mon, Feb 20, 2012 at 2:41 PM, Llu?s wrote: > Llu?s writes: > > > Francesc Alted writes: > >> On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote: > >>> You need at least a slightly different Python API to get anywhere, so > >>> numexpr/Theano is the right place to work on an implementation of this > >>> idea. Of course it would be nice if numexpr/Theano offered something as > >>> convenient as > >>> > >>> with lazy: > >>> arr = A + B + C # with all of these NumPy arrays > >>> # compute upon exiting? > > >> Hmm, that would be cute indeed. Do you have an idea on how the code in > the with > >> context could be passed to the Python AST compiler (? la > numexpr.evaluate("A + B > >> + C"))? > > > Well, I started writing some experiments to "almost transparently" > translate > > regular ndarray operations to numexpr strings (or others) using only > python > > code. > [...] > > My target was to use this to also generate optimized GPU kernels > in-flight using > > pycuda, but I think some other relatively recent project already > performed > > something similar (w.r.t. generating cuda kernels out of python > expressions). > > Aaahhh, I just had a quick look at Theano and it seems it's the project I > was > referring to. > > Good job! :) > > > Lluis > > -- > "And it's much the same thing with knowledge, for whenever you learn > something new, the whole world becomes that much richer." 
> -- The Princess of Pure Reason, as told by Norton Juster in The Phantom > Tollbooth > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- http://www-etud.iro.umontreal.ca/~bergstrj -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Mon Feb 20 15:12:51 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 21:12:51 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F429B81.50304@grinta.net> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F429B81.50304@grinta.net> Message-ID: <4F42A943.70406@molden.no> Den 20.02.2012 20:14, skrev Daniele Nicolodi: > Hello Sturla, unrelated to the numpy tewrite debate, can you please > suggest some resources you think can be used to learn how to program > C++ "the proper way"? Thank you. Cheers, This is totally OT on this list, however ... Scott Meyer's books have been mentioned. Also look at some litterature on the STL (e.g. Josuittis). Getting the Boost library is essential as well. The Qt library have many examples of beautiful C++. But the most important part, in my opinion, is to put the "C with classes" mentality away. Look at it as compiled Python or Java. The STL (the standard C++ library) has classes that do the same as the types we use in Python --- there are parallels to tuple, dict, set, list, deque, etc. The STL is actually richer than Python. Just use them the way we use Python. With C++11 (the latest standard), even for loops can be like Python. There are lamdas and closures, to be used as in Python, and there is an 'auto' keyword for type inference; you don't have to declare the type of a variable, the compiler will figure it out. Don't use new[] just because you can, when there is std::vector that behaves lika Python list. If you need to allocate a resource, wrap it in a class. Allocate from the contructor and deallocate from the destructor. That way an exception cannot cause a resource leak, and the clean-up code will be called automatically when the object fall of the stack. If you need to control the lifetime of an object, make an inner block with curly brackets, and declare it on top of the block. Don't call new and delete to control where you want it to be allocated and deallocated. Nothing goes on the heap unless STL puts it there. Always put objects on the stack, never allocate to a pointer with new. Always use references, and forget about pointers. This has to do with putting the "C with classes" mentality away. Always implement a copy constructor so the classes work with the STL. std:: vector x(n); // ok void foobar(std:: vector& x); // ok double* x = new double [n]; // bad std:: vector *x = new std:: vector (n); // bad void foobar(std:: vector* x); // bad If you get any textbook on Windows programming from Microsoft Press, you have an excellent resource on what not to do. Verbose functions and field names, Hungarian notation, factories instead of constructors, etc. If you find yourself using macros or template magic to avoid the overhead of a virtual function (MFC, ATL, wxWidgets, FOX), for the expense of readability, you are probably doing something you shouldn't. COM is probably the worst example I know of, just compare the beautiful OpenGL to Direct3D. VTK is another example of what I consider ugly C++. 
But that's just my opinion. Sturla From bergstrj at iro.umontreal.ca Mon Feb 20 15:30:23 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Mon, 20 Feb 2012 15:30:23 -0500 Subject: [Numpy-discussion] ndarray and lazy evaluation In-Reply-To: <87linxuvk7.fsf@ginnungagap.bsc.es> References: <87linxuvk7.fsf@ginnungagap.bsc.es> Message-ID: On Mon, Feb 20, 2012 at 2:57 PM, Llu?s wrote: > James Bergstra writes: > [...] > > I should add that the biggest benefit of expressing things as compound > > expressions in this way is not in saving temporaries (though that is > nice) it's > > being able to express enough computation work at a time that it offsets > the time > > required to ship the arguments off to a GPU for evaluation! > > Right, that's exacly what you need for an "external computation" to pay > off. > > Just out of curiosity (feel free to respond with a RTFM or a RTFP :)), do > you > support any of these? (sorry for the made-up names) > > * automatic transfer double-buffering > Not currently, but it would be quite straightforward to do it. Email theano-dev and ask how if you really want to know. > > * automatic problem partitioning into domains (e.g., multiple GPUs; even > better > if also supports nodes - MPI -) > Not currently, and it would be hard. > > * point-specific computations (e.g., code dependant on the thread id, > although > this can also be expressed in other ways, like index ranges) > > No. > * point-relative computations (the most common would be a stencil) > > No, but I think theano provides a decent expression language to tackle this. The "Composite" element-wise code generator is an example of how I would think about this. It provides point-relative computations across several arguments. You might want something different that applies a stencil computation across one or several arguments... the "scan" operator was another foray into this territory, and it got tricky when the stencil operation could have side-effects (like random number generation) and could define it's own input domain (stencil shape), but the result is quite powerful. -- http://www-etud.iro.umontreal.ca/~bergstrj -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Mon Feb 20 15:33:36 2012 From: sturla at molden.no (Sturla Molden) Date: Mon, 20 Feb 2012 21:33:36 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F42A943.70406@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F429B81.50304@grinta.net> <4F42A943.70406@molden.no> Message-ID: <4F42AE20.9080607@molden.no> Den 20.02.2012 21:12, skrev Sturla Molden: > > If you need to control the lifetime of an object, make an inner block > with curly brackets, and declare it on top of the block. Don't call new > and delete to control where you want it to be allocated and deallocated. > Nothing goes on the heap unless STL puts it there. > Here is an example: // bad Foo *bar = new Foo(); delete Foo; // ok { Foo bar(); } Remember C++ does not allow a "finally" clause to exception handling. You cannot do this: try { Foo *bar = new Foo(); } finally { // syntax error delete Foo; } So... try { Foo *bar = new Foo(); } catch(...) { } // might not get here, possible // resource leak delete Foo; Which is why we should always do this: { Foo bar(); } This is perhaps the most common source of errors in C++ code. 
If we use C++ in the NumPy core, we need a Nazi regime against these type of obscure errors. Sturla From robert.kern at gmail.com Mon Feb 20 15:40:53 2012 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 20 Feb 2012 20:40:53 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <89CB07B6-E529-4D84-8B1E-B0F61D6048BB@gmail.com> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F419054.8040203@molden.no> <4F4266EA.702@molden.no> <89CB07B6-E529-4D84-8B1E-B0F61D6048BB@gmail.com> Message-ID: On Mon, Feb 20, 2012 at 19:55, Paul Anton Letnes wrote: > > On 20. feb. 2012, at 16:29, Sturla Molden wrote: >>> - in newer standards it has some nontrivial mathematical functions: gamma, bessel, etc. that numpy lacks right now >> >> That belongs to SciPy. > > I don't see exactly why. Why should numpy have exponential but not gamma functions? The division seems kinda arbitrary. Not that I am arguing violently for bessel functions in numpy. The semi-arbitrary dividing line that we have settled on is C99. If a special function is in the C99 standard, we'll accept an implementation for it in numpy. Part (well, most) of the rationale is just to have a clear dividing line even if it's fairly arbitrary. The other part is that if a decidedly non-mathematically-focused standard like C99 includes a special function in its standard library, then odds are good that it's something that is widely used enough as a building block for other things. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From travis at continuum.io Mon Feb 20 23:04:00 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 20 Feb 2012 22:04:00 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F427AED.3030903@molden.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> Message-ID: Interesting you bring this up. I actually have a working prototype of using Python to emit LLVM. I will be showing it at the HPC tutorial that I am giving at PyCon. I will be making this available after PyCon to a wider audience as open source. It uses llvm-py (modified to work with LLVM 3.0) and code I wrote to do the translation from Python byte-code to LLVM. This LLVM can then be "JIT"ed. I have several applications that I would like to use this for. It would be possible to write "more of NumPy" using this approach. Initially, it makes it *very* easy to create a machine-code ufunc from Python code. There are other use-cases of having loops written in Python and plugged in to a calculation, filtering, or indexing framework that this system will be useful for. There is still a need for a core data-type object, a core array object, and a core calculation object. Maybe some-day these cores can be shrunk to a smaller subset and more of something along the lines of LLVM generation from Python can be used. But, there is a lot of work to do before that is possible. But, a lot of the currently pre-compiled loops can be done on the fly instead using this approach. There are several things I'm working on in that direction. This is not PyPy. It certainly uses the same ideas that they are using, but instead it fits into the CPython run-time and doesn't require changing the whole ecosystem. 
If you are interested in this work let me know. I think I'm going to call the project numpy-llvm, or fast-py, or something like that. It is available on github and will be open source (but it's still under active development). Here is an example of the code to create a ufunc using the system (this is like vectorize, but it creates machine code and by-passes the interpreter and so is 100x faster). from math import sin, pi def sinc(x): if x==0: return 1.0 else: return sin(x*pi)/(pi*x) from translate import Translate t = Translate(sinc) t.translate() print t.mod res = t.make_ufunc('sinc') -Travis On Feb 20, 2012, at 10:55 AM, Sturla Molden wrote: > Den 20.02.2012 17:42, skrev Sturla Molden: >> There are still other options than C or C++ that are worth considering. >> One would be to write NumPy in Python. E.g. we could use LLVM as a >> JIT-compiler and produce the performance critical code we need on the fly. >> >> > > LLVM and its C/C++ frontend Clang are BSD licenced. It compiles faster > than GCC and often produces better machine code. They can therefore be > used inside an array library. It would give a faster NumPy, and we could > keep most of it in Python. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at onerussian.com Mon Feb 20 23:35:50 2012 From: lists at onerussian.com (Yaroslav Halchenko) Date: Mon, 20 Feb 2012 23:35:50 -0500 Subject: [Numpy-discussion] is there an efficient way to get a random set of subsets/combinations? Message-ID: <20120221043549.GC17082@onerussian.com> Hi to all Numeric Python experts, could not think of a mailing list with better fit to my question which might have an obvious answer: straightforward (naive) Python code to answer my question would be something like import random, itertools n,p,k=100,50,10 # don't try to run with this numbers! ;) print random.sample(list(itertools.combinations(range(n), p)), k) so the goal is to get k (non-repeating) p-subsets of n, where n and p prohibitively large to first populate the full set of combinations. Thank you in advance ;-) -- =------------------------------------------------------------------= Keep in touch www.onerussian.com Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic From cjordan1 at uw.edu Tue Feb 21 00:17:19 2012 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Mon, 20 Feb 2012 21:17:19 -0800 Subject: [Numpy-discussion] is there an efficient way to get a random set of subsets/combinations? In-Reply-To: <20120221043549.GC17082@onerussian.com> References: <20120221043549.GC17082@onerussian.com> Message-ID: If you're using numpy 2.0 (the development branch), the function numpy.random.choice might do what you're looking for. -Chris On Mon, Feb 20, 2012 at 8:35 PM, Yaroslav Halchenko wrote: > Hi to all Numeric ?Python experts, > > could not think of a mailing list with better fit to my question which might > have an obvious answer: > > straightforward (naive) Python code to answer my question would be > something like > > import random, itertools > n,p,k=100,50,10 ?# don't try to run with this numbers! ;) > print random.sample(list(itertools.combinations(range(n), p)), k) > > so the goal is to get k (non-repeating) p-subsets of n, where n and p > prohibitively large to first populate the full set of combinations. 
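For illustration, one way to draw k distinct p-subsets without first materializing all combinations is to keep taking sorted random picks until k different ones have been seen. This is only a sketch, not an existing numpy helper; the function name and the set-based uniqueness check are made up for the example:

import numpy as np

def random_subsets(n, p, k):
    # draw sorted p-subsets of range(n) until k distinct ones are collected
    seen = set()
    while len(seen) < k:
        pick = tuple(np.sort(np.random.permutation(n)[:p]))
        seen.add(pick)
    return sorted(seen)

print random_subsets(100, 50, 10)

For n=100, p=50 the chance of drawing the same subset twice is negligible, so the loop essentially never repeats work.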
> > Thank you in advance ;-) > -- > =------------------------------------------------------------------= > Keep in touch ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? www.onerussian.com > Yaroslav Halchenko ? ? ? ? ? ? ? ? www.ohloh.net/accounts/yarikoptic > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From kalatsky at gmail.com Tue Feb 21 01:57:11 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Tue, 21 Feb 2012 00:57:11 -0600 Subject: [Numpy-discussion] is there an efficient way to get a random set of subsets/combinations? In-Reply-To: <20120221043549.GC17082@onerussian.com> References: <20120221043549.GC17082@onerussian.com> Message-ID: Hi Slava, Since your k is only 10, here is a quickie: import numpy as np arr = np.arange(n) for i in range(k): np.random.shuffle(arr) print np.sort(arr[:p]) If your ever get non-unique entries in a set of k=10 for your n and p, consider yourself lucky:) Val On Mon, Feb 20, 2012 at 10:35 PM, Yaroslav Halchenko wrote: > Hi to all Numeric Python experts, > > could not think of a mailing list with better fit to my question which > might > have an obvious answer: > > straightforward (naive) Python code to answer my question would be > something like > > import random, itertools > n,p,k=100,50,10 # don't try to run with this numbers! ;) > print random.sample(list(itertools.combinations(range(n), p)), k) > > so the goal is to get k (non-repeating) p-subsets of n, where n and p > prohibitively large to first populate the full set of combinations. > > Thank you in advance ;-) > -- > =------------------------------------------------------------------= > Keep in touch www.onerussian.com > Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Feb 21 06:44:22 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 21 Feb 2012 11:44:22 +0000 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4277F7.4000401@molden.no> <4F427AED.3030903@molden.no> Message-ID: On Tue, Feb 21, 2012 at 4:04 AM, Travis Oliphant wrote: > It uses llvm-py (modified to work with LLVM 3.0) and code I wrote to do the > translation from Python byte-code to LLVM. ? This LLVM can then be "JIT"ed. > ? I have several applications that I would like to use this for. ? It would > be possible to write "more of NumPy" using this approach. ? ? Initially, it > makes it *very* easy to create a machine-code ufunc from Python code. > There are other use-cases of having loops written in Python and plugged in > to a calculation, filtering, or indexing framework that this system will be > useful for. Very neat! It's interesting that you decided to use Python bytecode as your source representation. I'm curious what your strategy is for overcoming all the challenges that have plagued previous attempts to efficiently compile "real Python"? (Unladen Swallow, PyPy, etc.) Just support some subset of the language that's easy to handle and do type inference over? Or do you plan to continue using Python as your input language? 
I guess the conventional wisdom would be that there's a lot of potential for using LLVM to generate efficient specialized loops for numpy on the fly (cf. llvm-pipe for a similar and successful project), but that the key would be to use a more specialized representation than Python bytecode -- one that left out hard/irrelevant parts of the language, that had richer type information, that didn't change around for different Python releases, etc. -- Nathaniel From lists at onerussian.com Tue Feb 21 09:34:18 2012 From: lists at onerussian.com (Yaroslav Halchenko) Date: Tue, 21 Feb 2012 09:34:18 -0500 Subject: [Numpy-discussion] is there an efficient way to get a random set of subsets/combinations? In-Reply-To: Message-ID: <20120221143418.GD17082@onerussian.com> Thank you guys for replies! On Mon, 20 Feb 2012, Christopher Jordan-Squire wrote: > If you're using numpy 2.0 (the development branch), the function > numpy.random.choice might do what you're looking for. yeap -- handy one, although would require manual control over repetitions lazy me was trying to avoid ;) On Tue, 21 Feb 2012, Val Kalatsky wrote: > Hi Slava, Mom, is that you? ;-) > Since your k is only 10, here is a?quickie: > import numpy as np > arr = np.arange(n) > for i in range(k): > ? ? np.random.shuffle(arr) > ? ? print np.sort(arr[:p]) > If your ever get non-unique entries in a set of k=10 for your n and p, > consider yourself lucky:) well -- I just thought that there might be an ideal function which in limit would return all combinations if given large enough k for reasonably small (n, p)... but indeed I should just put a logic in place to treat those cases separately. -- =------------------------------------------------------------------= Keep in touch www.onerussian.com Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic From ndbecker2 at gmail.com Tue Feb 21 13:26:17 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 21 Feb 2012 13:26:17 -0500 Subject: [Numpy-discussion] Where is arrayobject.h? Message-ID: What is the correct way to find the installed location of arrayobject.h? On fedora, I had been using: (via scons): import distutils.sysconfig PYTHONINC = distutils.sysconfig.get_python_inc() PYTHONLIB = distutils.sysconfig.get_python_lib(1) NUMPYINC = PYTHONLIB + '/numpy/core/include' But on ubuntu, this fails. It seems numpy was installed into /usr/local/lib/..., while PYTHONLIB expands to /usr/lib/python2.7/dist-packages. Is there a universal method? From sole at esrf.fr Tue Feb 21 13:31:55 2012 From: sole at esrf.fr (=?ISO-8859-1?Q?=22V=2E_Armando_Sol=E9=22?=) Date: Tue, 21 Feb 2012 19:31:55 +0100 Subject: [Numpy-discussion] Where is arrayobject.h? In-Reply-To: References: Message-ID: <4F43E31B.3000802@esrf.fr> On 21/02/2012 19:26, Neal Becker wrote: > What is the correct way to find the installed location of arrayobject.h? > > On fedora, I had been using: > (via scons): > > import distutils.sysconfig > PYTHONINC = distutils.sysconfig.get_python_inc() > PYTHONLIB = distutils.sysconfig.get_python_lib(1) > > NUMPYINC = PYTHONLIB + '/numpy/core/include' > > But on ubuntu, this fails. It seems numpy was installed into > /usr/local/lib/..., while PYTHONLIB expands to /usr/lib/python2.7/dist-packages. > > Is there a universal method? > > I use: import numpy numpy.get_include() If that is universal I cannot tell. 
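For what it's worth, the directory returned by numpy.get_include() can be passed straight to a build script, e.g. a distutils setup.py, instead of guessing the path from PYTHONLIB. A minimal sketch; the module and source names here are invented:

import numpy
from distutils.core import setup
from distutils.extension import Extension

ext = Extension('mymodule',
                sources=['mymodule.c'],
                include_dirs=[numpy.get_include()])  # directory containing numpy/arrayobject.h
setup(name='mymodule', ext_modules=[ext])

Run as usual with "python setup.py build_ext --inplace".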
Armando From gael.varoquaux at normalesup.org Tue Feb 21 17:18:53 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 21 Feb 2012 23:18:53 +0100 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <996B3429-7357-44D9-8CD5-45EB4D397B31@iro.umontreal.ca> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> <996B3429-7357-44D9-8CD5-45EB4D397B31@iro.umontreal.ca> Message-ID: <20120221221853.GB13483@phare.normalesup.org> On Sun, Feb 19, 2012 at 05:44:27AM -0500, David Warde-Farley wrote: > I think the comments about the developer audience NumPy will attract are important. There may be lots of C++ developers out there, but the intersection of (truly competent in C++) and (likely to involve oneself in NumPy development) may well be quite small. That's a very valid concern. It is reminiscent of a possible cause to our lack of contributors to Mayavi: contributing to Mayavi requires knowing VTK. One of the major benefits of Mayavi is that it makes it is to use the power of VTK without understanding it well. The intersection of the people interested in using Mayavi and able to contribute to it is almost empty. This is stricking to me, because I know a lot of who know VTK well. Most of them couldn't care less for Mayavi: they are happy coding directly in VTK in C++. This is also a reason why I don't code UIs any more: I simply cannot find the resource to maintain them in proportion with the number of users that they garner. A sad statement. Gael From alan at ajackson.org Tue Feb 21 21:31:27 2012 From: alan at ajackson.org (alan at ajackson.org) Date: Tue, 21 Feb 2012 20:31:27 -0600 Subject: [Numpy-discussion] Live coding demonstration Message-ID: <20120221203127.3b7d845d@ajackson.org> This is the sort of programming environment I would love to have in python. http://flowingdata.com/2012/02/20/live-coding-and-inventing-on-principle/ -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From fccoelho at gmail.com Wed Feb 22 06:34:27 2012 From: fccoelho at gmail.com (Flavio Coelho) Date: Wed, 22 Feb 2012 09:34:27 -0200 Subject: [Numpy-discussion] Live coding demonstration In-Reply-To: <20120221203127.3b7d845d@ajackson.org> References: <20120221203127.3b7d845d@ajackson.org> Message-ID: Shouldn't be hard to implement as a set of plugins to an editor. Hope someone starts such a project. On Wed, Feb 22, 2012 at 00:31, wrote: > This is the sort of programming environment I would love to have in > python. > > > http://flowingdata.com/2012/02/20/live-coding-and-inventing-on-principle/ > > -- > ----------------------------------------------------------------------- > | Alan K. Jackson | To see a World in a Grain of Sand | > | alan at ajackson.org | And a Heaven in a Wild Flower, | > | www.ajackson.org | Hold Infinity in the palm of your hand | > | Houston, Texas | And Eternity in an hour. 
- Blake | > ----------------------------------------------------------------------- > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Fl?vio Code?o Coelho ================ +55(21) 3799-5567 Professor Escola de Matem?tica Aplicada Funda??o Get?lio Vargas Rio de Janeiro - RJ Brasil -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Wed Feb 22 07:25:10 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 22 Feb 2012 07:25:10 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F429B81.50304@grinta.net> <4F42A943.70406@molden.no> Message-ID: It's great advice to say avoid using new instead rely on scope and classes such as std::vector. I just want to point out, that sometimes objects must outlive scope. For those cases, std::shared_ptr can be helpful. From perry at stsci.edu Wed Feb 22 08:44:27 2012 From: perry at stsci.edu (Perry Greenfield) Date: Wed, 22 Feb 2012 08:44:27 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <20120221221853.GB13483@phare.normalesup.org> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> <996B3429-7357-44D9-8CD5-45EB4D397B31@iro.umontreal.ca> <20120221221853.GB13483@phare.normalesup.org> Message-ID: <4F2D80E8-6534-4997-A216-3BB3C338F506@stsci.edu> I, like Travis, have my worries about C++. But if those actually doing the work (and particularly the subsequent support) feel it is the best language for implementation, I can live with that. I particularly like the incremental and conservative approach to introducing C++ that was proposed by Mark. What I would like to stress in doing this that all along that process, extensive testing is performed (preferably with some build-bot process) to ensure that whatever C++ features are being introduced are fully portable and don't present intractable distribution issues. Whatever we do, we don't want to go far down that road only to find out that there is no good solution in that regard with certain platforms. We are particularly sensitive to this issue since we distribute our software, and anything that makes installation of numpy problematic is a very serious issue for us. It has to be an easy install on all common platforms. That is one thing C allowed, despite all its flaws, which is near universal installation advantages over any other language available. If the appropriate subset of C++ can achieve that, great. But it has to be proved continuously as it is incrementally adopted. (I'm not much persuaded by comments like "my experience has shown it not to be a problem") Is there any disagreement with this? It's less clear to me what to do about more unusual platforms. It seems to me that some sort of testing against those that may prove important in the future (e.g., gpus?) will be needed, but how to do this is not clear to me. 
Perry From mangabasi at gmail.com Wed Feb 22 08:51:51 2012 From: mangabasi at gmail.com (=?UTF-8?Q?Fahredd=C4=B1n_Basegmez?=) Date: Wed, 22 Feb 2012 08:51:51 -0500 Subject: [Numpy-discussion] Live coding demonstration In-Reply-To: References: <20120221203127.3b7d845d@ajackson.org> Message-ID: I have been working on an application somehow similar to his approach. Instead of trying to explain what it is I will let you see it yourselves. http://www.youtube.com/watch?v=rQUW5BvdIkc&list=UUiomLkTUHKpZohYYfj1WsMg&index=7&feature=plcp http://www.youtube.com/watch?v=NjpUmSfo3mY&list=UUiomLkTUHKpZohYYfj1WsMg&index=6&feature=plcp http://www.youtube.com/watch?v=K1pdoLi6UPc&list=UUiomLkTUHKpZohYYfj1WsMg&index=9&feature=plcp http://www.youtube.com/watch?v=_y1nWiIoKk8&list=UUiomLkTUHKpZohYYfj1WsMg&index=3&feature=plcp Fahri On Wed, Feb 22, 2012 at 6:34 AM, Flavio Coelho wrote: > Shouldn't be hard to implement as a set of plugins to an editor. > Hope someone starts such a project. > > > > > On Wed, Feb 22, 2012 at 00:31, wrote: > >> This is the sort of programming environment I would love to have in >> python. >> >> >> http://flowingdata.com/2012/02/20/live-coding-and-inventing-on-principle/ >> >> -- >> ----------------------------------------------------------------------- >> | Alan K. Jackson | To see a World in a Grain of Sand | >> | alan at ajackson.org | And a Heaven in a Wild Flower, | >> | www.ajackson.org | Hold Infinity in the palm of your hand | >> | Houston, Texas | And Eternity in an hour. - Blake | >> ----------------------------------------------------------------------- >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > Fl?vio Code?o Coelho > ================ > +55(21) 3799-5567 > Professor > Escola de Matem?tica Aplicada > Funda??o Get?lio Vargas > Rio de Janeiro - RJ > Brasil > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjstarn at usgs.gov Wed Feb 22 08:56:04 2012 From: jjstarn at usgs.gov (Jeffrey Starn) Date: Wed, 22 Feb 2012 08:56:04 -0500 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 65, Issue 165 In-Reply-To: References: Message-ID: Good for us Feds! (But I'm sure some people will think it's a waste of money) Not sure how that impacts our building here, but actually the landscaping is not too bad considering where we are. I don't know what's native in our front planting. --------------------------------------------------------------------------------------------------------- Jeff Starn, Groundwater Specialist U.S. Geological Survey 101 Pitkin Street East Hartford, CT 06108 (860) 291-6746 jjstarn at usgs.gov "Our observation of the planets' regular motion was the first triumph of empirical science over irrational dogma. We named them after the gods just to be safe." 
--Jon Stewart
---------------------------------------------------------------------------------------------------------
- Blake | >> ----------------------------------------------------------------------- >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > Fl?vio Code?o Coelho > ================ > +55(21) 3799-5567 > Professor > Escola de Matem?tica Aplicada > Funda??o Get?lio Vargas > Rio de Janeiro - RJ > Brasil > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120222/619a2ee2/attachment.html ------------------------------ _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion End of NumPy-Discussion Digest, Vol 65, Issue 165 ************************************************* -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.verelst at gmail.com Wed Feb 22 12:46:28 2012 From: david.verelst at gmail.com (David Verelst) Date: Wed, 22 Feb 2012 18:46:28 +0100 Subject: [Numpy-discussion] Numpy 1.5.1/1.6.1 doesn't build doc with sphinx 1.1.2 In-Reply-To: References: Message-ID: <4F4529F4.4030303@gmail.com> Note sure if this already has been discussed, but it seems that latest Sphinx, 1.2predev-20120222 directly from their Hg repository, does not have this problem any more. While 1.2 failed to build the documentation on my end, 1.2predev delivered a result. Regards, David On 24/11/11 15:31, Pauli Virtanen wrote: > 24.11.2011 15:22, Sandro Tosi kirjoitti: > [clip] >> The full log of the debian package build is at: >> http://people.debian.org/~jwilk/tmp/python-numpy_1.6.1-3_i386.build >> >> attached is the file left in /tmp by sphinx for the error. >> >> Could you please give it a look? > Seems like some sort of a Sphinx bug (rather than having to do with the > Sphinx extensions Numpy uses). It appears it doesn't like having a > `glossary::` specified in a docstring used in `automodule::`. Or, > alternatively, there's something wrong with the formatting in the file > `numpy/doc/glossary.py`. > > Workarounds: copy-paste the glossary list from `glossary.py` to > `doc/source/glossary.txt` in place of `automodule:: numpy.doc.glossary` > > If that doesn't help, then something in the formatting of the glossary > list that makes Sphinx to choke (and it's certainly then a Sphinx bug). > From charlesr.harris at gmail.com Wed Feb 22 13:27:37 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 22 Feb 2012 11:27:37 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F2D80E8-6534-4997-A216-3BB3C338F506@stsci.edu> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> <996B3429-7357-44D9-8CD5-45EB4D397B31@iro.umontreal.ca> <20120221221853.GB13483@phare.normalesup.org> <4F2D80E8-6534-4997-A216-3BB3C338F506@stsci.edu> Message-ID: Hi Perry, On Wed, Feb 22, 2012 at 6:44 AM, Perry Greenfield wrote: > I, like Travis, have my worries about C++. 
But if those actually doing > the work (and particularly the subsequent support) feel it is the best > language for implementation, I can live with that. > > I particularly like the incremental and conservative approach to > introducing C++ that was proposed by Mark. What I would like to stress > in doing this that all along that process, extensive testing is > performed (preferably with some build-bot process) to ensure that > whatever C++ features are being introduced are fully portable and > don't present intractable distribution issues. Whatever we do, we > don't want to go far down that road only to find out that there is no > good solution in that regard with certain platforms. > > We are particularly sensitive to this issue since we distribute our > software, and anything that makes installation of numpy problematic is > a very serious issue for us. It has to be an easy install on all > common platforms. That is one thing C allowed, despite all its flaws, > which is near universal installation advantages over any other > language available. If the appropriate subset of C++ can achieve that, > great. But it has to be proved continuously as it is incrementally > adopted. (I'm not much persuaded by comments like "my experience has > shown it not to be a problem") > > Is there any disagreement with this? > > It's less clear to me what to do about more unusual platforms. It > seems to me that some sort of testing against those that may prove > important in the future (e.g., gpus?) will be needed, but how to do > this is not clear to me. > > Your group has been one of the best for testing numpy. What systems do you support at this time? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From morph at debian.org Wed Feb 22 13:32:43 2012 From: morph at debian.org (Sandro Tosi) Date: Wed, 22 Feb 2012 19:32:43 +0100 Subject: [Numpy-discussion] Numpy 1.5.1/1.6.1 doesn't build doc with sphinx 1.1.2 In-Reply-To: <4F4529F4.4030303@gmail.com> References: <4F4529F4.4030303@gmail.com> Message-ID: On Wed, Feb 22, 2012 at 18:46, David Verelst wrote: > Note sure if this already has been discussed, but it seems that latest > Sphinx, 1.2predev-20120222 directly from their Hg repository, does not > have this problem any more. While 1.2 failed to build the documentation > on my end, 1.2predev delivered a result. I'm sorry I didn't follow this up: the fix is already in the numpy repository, and i've applied in Debian with this patch (url to upstream tracker in the file): http://patch-tracker.debian.org/patch/series/view/python-numpy/1:1.6.1-5/20_sphinx_1.1.2.diff Regards, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From chaoyuejoy at gmail.com Wed Feb 22 16:45:40 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Wed, 22 Feb 2012 22:45:40 +0100 Subject: [Numpy-discussion] python geospatial package? Message-ID: Hi all, Is anyone using some python geospatial package that can do jobs like intersection, etc. the job is like you automatically extract a region on a global map etc. 
thanks and cheers, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 22 17:47:46 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 22 Feb 2012 14:47:46 -0800 Subject: [Numpy-discussion] np.longlong casts to int Message-ID: Hi, I was gaily using np.longlong for casting to the highest available float type when I noticed this: In [4]: np.array([2.1], dtype=np.longlong) Out[4]: array([2], dtype=int64) whereas: In [5]: np.array([2.1], dtype=np.float128) Out[5]: array([ 2.1], dtype=float128) This on OSX snow leopard numpies 1.2 .1 -> current devel and OSX tiger PPC recent devel. I had the impression that np.float128 and np.longlong would be identical in behavior - but I guess not? Best, Matthew From jniehof at lanl.gov Wed Feb 22 17:49:33 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Wed, 22 Feb 2012 15:49:33 -0700 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: Message-ID: <4F4570FD.6090608@lanl.gov> On 02/22/2012 03:47 PM, Matthew Brett wrote: > Hi, > > I was gaily using np.longlong for casting to the highest available > float type when I noticed this: > > In [4]: np.array([2.1], dtype=np.longlong) > Out[4]: array([2], dtype=int64) > > whereas: > > In [5]: np.array([2.1], dtype=np.float128) > Out[5]: array([ 2.1], dtype=float128) > > This on OSX snow leopard numpies 1.2 .1 -> current devel and OSX tiger > PPC recent devel. > > I had the impression that np.float128 and np.longlong would be > identical in behavior - but I guess not? A C long (and longlong) is an integer type. Were you expecting int128? -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From stefan at sun.ac.za Wed Feb 22 20:21:34 2012 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 22 Feb 2012 17:21:34 -0800 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: Message-ID: On Wed, Feb 22, 2012 at 2:47 PM, Matthew Brett wrote: > In [4]: np.array([2.1], dtype=np.longlong) > Out[4]: array([2], dtype=int64) Maybe just a typo: In [3]: np.array([2.1], dtype=np.longfloat) Out[3]: array([ 2.1], dtype=float128) St?fan From matthew.brett at gmail.com Wed Feb 22 20:24:39 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 22 Feb 2012 17:24:39 -0800 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: Message-ID: 2012/2/22 St?fan van der Walt : > On Wed, Feb 22, 2012 at 2:47 PM, Matthew Brett wrote: >> In [4]: np.array([2.1], dtype=np.longlong) >> Out[4]: array([2], dtype=int64) > > Maybe just a typo: > > In [3]: np.array([2.1], dtype=np.longfloat) > Out[3]: array([ 2.1], dtype=float128) A thinko maybe. Luckily I was in fact using longdouble in the live code, See you, Matthew From jrocher at enthought.com Wed Feb 22 20:25:13 2012 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 22 Feb 2012 19:25:13 -0600 Subject: [Numpy-discussion] python geospatial package? 
In-Reply-To: References: Message-ID: Hi Chao, What do you want to do exactly? Did look at GDAL http://www.gdal.org/ ? Jonathan On Wed, Feb 22, 2012 at 3:45 PM, Chao YUE wrote: > Hi all, > > Is anyone using some python geospatial package that can do jobs like > intersection, etc. the job is like you automatically extract a region on a > global map etc. > > thanks and cheers, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Wed Feb 22 21:37:48 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Wed, 22 Feb 2012 20:37:48 -0600 Subject: [Numpy-discussion] ABI status of Master Message-ID: Hey all, From what I can tell, the master branch is still ABI compatible with NumPy 1.7. Is that true? I'd like to relabel the version of the master branch to 1.8. Does anyone see any problems with that? Thanks, -Travis From ralf.gommers at googlemail.com Thu Feb 23 02:01:18 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 23 Feb 2012 08:01:18 +0100 Subject: [Numpy-discussion] ABI status of Master In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 3:37 AM, Travis Oliphant wrote: > Hey all, > > >From what I can tell, the master branch is still ABI compatible with > NumPy 1.7. Is that true? > > I'd like to relabel the version of the master branch to 1.8. Does > anyone see any problems with that? > Before we branch maintenance/1.7.x, you should relabel it to 1.7. That should have been done a while ago. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Feb 23 02:10:18 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 23 Feb 2012 01:10:18 -0600 Subject: [Numpy-discussion] ABI status of Master In-Reply-To: References: Message-ID: <2DAFC2F5-4B0B-4C94-A4EE-8C552538F6E3@continuum.io> Definitely! Thanks for the reminder. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 23, 2012, at 1:01 AM, Ralf Gommers wrote: > > > On Thu, Feb 23, 2012 at 3:37 AM, Travis Oliphant wrote: > Hey all, > > >From what I can tell, the master branch is still ABI compatible with NumPy 1.7. Is that true? > > I'd like to relabel the version of the master branch to 1.8. Does anyone see any problems with that? > > Before we branch maintenance/1.7.x, you should relabel it to 1.7. That should have been done a while ago. > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre.haessig at crans.org Thu Feb 23 04:06:10 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 23 Feb 2012 10:06:10 +0100 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: Message-ID: <4F460182.5070004@crans.org> Hi, Le 23/02/2012 02:24, Matthew Brett a ?crit : > Luckily I was in fact using longdouble in the live code, I had never "exotic" floating point precision, so thanks for your post which made me take a look at docstring and documentation. If I got it right from the docstring, 'np.longdouble', 'np.longfloat' are all in fact 'np.float128'. (numpy 1.5) However, I was surprised that float128 is not mentioned in the array of available types in the user guide. http://docs.scipy.org/doc/numpy/user/basics.types.html Is there a specific reason for this absence, or is just about visiting the documentation wiki ;-) ? Additionally, I don't know what are the writing guidelines of the user guide, but would it make sense to add some "new numpy 1.x" messages as in the Python doc. I'm thinking here of np.float16. I know it exists from messages on this mailing list but my 1.5 don't have it. Best, Pierre PS : I found float128 mentionned in the reference http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#built-in-scalar-types However, it is not as easily readable as the user guide (which makes sense !). Does the following statements mean that those types are not available on all platforms ? float96 96 bits, platform? float128 128 bits, platform? -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From schut at sarvision.nl Thu Feb 23 04:14:56 2012 From: schut at sarvision.nl (Vincent Schut) Date: Thu, 23 Feb 2012 10:14:56 +0100 Subject: [Numpy-discussion] python geospatial package? In-Reply-To: References: Message-ID: On 02/22/2012 10:45 PM, Chao YUE wrote: > Hi all, > > Is anyone using some python geospatial package that can do jobs like > intersection, etc. the job is like you automatically extract a region > on a global map etc. > > thanks and cheers, > > Chao Chao, shapely would do this, though I found it had a bit of a steep learning curve. Or you could go the gdal/ogr way, which uses the geos library under the hood (if present) to do geometrical operations like intersections etc. cheers, Vincent. 
> > -- > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > ************************************************************************************ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From francesc at continuum.io Thu Feb 23 06:40:10 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 23 Feb 2012 05:40:10 -0600 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: <4F460182.5070004@crans.org> References: <4F460182.5070004@crans.org> Message-ID: <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> On Feb 23, 2012, at 3:06 AM, Pierre Haessig wrote: > Hi, > Le 23/02/2012 02:24, Matthew Brett a ?crit : >> Luckily I was in fact using longdouble in the live code, > I had never "exotic" floating point precision, so thanks for your post > which made me take a look at docstring and documentation. > > If I got it right from the docstring, 'np.longdouble', 'np.longfloat' > are all in fact 'np.float128'. > (numpy 1.5) That in fact depends on the platform you are using. Typically, for 32-bit platforms, 'np.longfloat' and 'np.longdouble' are bound to 'np.float96', while in 64-bit are to 'np.float128'. > However, I was surprised that float128 is not mentioned in the array of > available types in the user guide. > http://docs.scipy.org/doc/numpy/user/basics.types.html > Is there a specific reason for this absence, or is just about visiting > the documentation wiki ;-) ? The reason is most probably that you cannot get a float96 or float128 whenever you want (depends on your architecture), so adding these types to the manual could be misleading. However, I'd advocate to document them while warning about platform portability issues. > Additionally, I don't know what are the writing guidelines of the user > guide, but would it make sense to add some "new numpy 1.x" messages as > in the Python doc. I'm thinking here of np.float16. I know it exists > from messages on this mailing list but my 1.5 don't have it. float16 was introduced in NumPy 1.6, IIRC. > PS : I found float128 mentionned in the reference > http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#built-in-scalar-types > However, it is not as easily readable as the user guide (which makes > sense !). > > Does the following statements mean that those types are not available on > all platforms ? > float96 96 bits, platform? > float128 128 bits, platform? Exactly. I'd update this to read: float96 96 bits. Only available on 32-bit (i386) platforms. float128 128 bits. Only available on 64-bit (AMD64) platforms. -- Francesc Alted From njs at pobox.com Thu Feb 23 06:43:51 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 23 Feb 2012 11:43:51 +0000 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> Message-ID: On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted wrote: > Exactly. ?I'd update this to read: > > float96 ? ?96 bits. ?Only available on 32-bit (i386) platforms. > float128 ?128 bits. ?Only available on 64-bit (AMD64) platforms. Except float96 is actually 80 bits. (Usually?) 
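To make the platform dependence concrete, portable code is probably better off spelling the type np.longdouble and letting numpy pick the storage width. A small check; the exact output is an assumption about the platform and numpy version:

import numpy as np

x = np.array([2.1], dtype=np.longdouble)
print x.dtype                  # float96 on a 32-bit Linux build, float128 on a 64-bit one
print hasattr(np, 'float16')   # False before numpy 1.6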
Plus some padding... -- Nathaniel From pierre.haessig at crans.org Thu Feb 23 07:06:03 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 23 Feb 2012 13:06:03 +0100 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> Message-ID: <4F462BAB.2070709@crans.org> Le 23/02/2012 12:40, Francesc Alted a ?crit : >> However, I was surprised that float128 is not mentioned in the array of >> > available types in the user guide. >> > http://docs.scipy.org/doc/numpy/user/basics.types.html >> > Is there a specific reason for this absence, or is just about visiting >> > the documentation wiki ;-) ? > The reason is most probably that you cannot get a float96 or float128 whenever you want (depends on your architecture), so adding these types to the manual could be misleading. However, I'd advocate to document them while warning about platform portability issues >> Does the following statements mean that those types are not available on >> > all platforms ? >> > float96 96 bits, platform? >> > float128 128 bits, platform? > Exactly. I'd update this to read: > > float96 96 bits. Only available on 32-bit (i386) platforms. > float128 128 bits. Only available on 64-bit (AMD64) platforms. > Thanks for the enlightenment ! I was not aware of this 96 bits <-> 128 bits relationship. -- Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From francesc at continuum.io Thu Feb 23 07:06:48 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 23 Feb 2012 06:06:48 -0600 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> Message-ID: <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote: > On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted wrote: >> Exactly. I'd update this to read: >> >> float96 96 bits. Only available on 32-bit (i386) platforms. >> float128 128 bits. Only available on 64-bit (AMD64) platforms. > > Except float96 is actually 80 bits. (Usually?) Plus some padding? Good point. The thing is that they actually use 96 bit for storage purposes (this is due to alignment requirements). Another quirk related with this is that MSVC automatically maps long double to 64-bit doubles: http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx Not sure on why they did that (portability issues?). -- Francesc Alted From francesc at continuum.io Thu Feb 23 07:23:29 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 23 Feb 2012 06:23:29 -0600 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> Message-ID: On Feb 23, 2012, at 6:06 AM, Francesc Alted wrote: > On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote: > >> On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted wrote: >>> Exactly. I'd update this to read: >>> >>> float96 96 bits. Only available on 32-bit (i386) platforms. >>> float128 128 bits. Only available on 64-bit (AMD64) platforms. >> >> Except float96 is actually 80 bits. (Usually?) Plus some padding? > > Good point. 
The thing is that they actually use 96 bit for storage purposes (this is due to alignment requirements). > > Another quirk related with this is that MSVC automatically maps long double to 64-bit doubles: > > http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx > > Not sure on why they did that (portability issues?). Hmm, yet another quirk (this time in NumPy itself). On 32-bit platforms: In [16]: np.longdouble Out[16]: numpy.float96 In [17]: np.finfo(np.longdouble).eps Out[17]: 1.084202172485504434e-19 while on 64-bit ones: In [8]: np.longdouble Out[8]: numpy.float128 In [9]: np.finfo(np.longdouble).eps Out[9]: 1.084202172485504434e-19 i.e. NumPy is saying that the eps (machine epsilon) is the same on both platforms, despite the fact that one uses 80-bit precision and the other 128-bit precision. For the 80-bit, the eps should be (): In [5]: 1 / 2**63. Out[5]: 1.0842021724855044e-19 [http://en.wikipedia.org/wiki/Extended_precision] which is correctly stated by NumPy, while for 128-bit (quad precision), eps should be: In [6]: 1 / 2**113. Out[6]: 9.62964972193618e-35 [http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format] If nobody objects, I'll file a bug about this. -- Francesc Alted From kikocorreoso at gmail.com Thu Feb 23 07:39:14 2012 From: kikocorreoso at gmail.com (Kiko) Date: Thu, 23 Feb 2012 13:39:14 +0100 Subject: [Numpy-discussion] python geospatial package? In-Reply-To: References: Message-ID: 2012/2/23 Vincent Schut > On 02/22/2012 10:45 PM, Chao YUE wrote: > > Hi all, > > > > Is anyone using some python geospatial package that can do jobs like > > intersection, etc. the job is like you automatically extract a region > > on a global map etc. > > > > thanks and cheers, > > > > Chao > > Depending what you want to do: Shapely, GDAL/OGR, pyproj, Mapnik, Basemap,... -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaakko.luttinen at aalto.fi Thu Feb 23 08:50:36 2012 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 23 Feb 2012 15:50:36 +0200 Subject: [Numpy-discussion] Special matrices with structure? Message-ID: <4F46442C.7090100@aalto.fi> Hi! I was wondering whether it would be easy/possible/reasonable to have classes for arrays that have special structure in order to use less memory and speed up some computations? For instance: - symmetric matrix could be stored in almost half the memory required by a non-symmetric matrix - diagonal matrix only needs to store the diagonal vector - Toeplitz matrix only needs to store one or two vectors - sparse matrix only needs to store non-zero elements (some implementations in scipy.sparse) - and so on If such classes were implemented, it would be nice if they worked with numpy functions (dot, diag, ...) and operations (+, *, +=, ...) easily. I believe this has been discussed before but google didn't help a lot.. Regards, Jaakko From matthew.brett at gmail.com Thu Feb 23 11:26:52 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 23 Feb 2012 08:26:52 -0800 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> Message-ID: Hi, On Thu, Feb 23, 2012 at 4:23 AM, Francesc Alted wrote: > On Feb 23, 2012, at 6:06 AM, Francesc Alted wrote: >> On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote: >> >>> On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted wrote: >>>> Exactly. ?I'd update this to read: >>>> >>>> float96 ? 
?96 bits. ?Only available on 32-bit (i386) platforms. >>>> float128 ?128 bits. ?Only available on 64-bit (AMD64) platforms. >>> >>> Except float96 is actually 80 bits. (Usually?) Plus some padding? >> >> Good point. ?The thing is that they actually use 96 bit for storage purposes (this is due to alignment requirements). >> >> Another quirk related with this is that MSVC automatically maps long double to 64-bit doubles: >> >> http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx >> >> Not sure on why they did that (portability issues?). > > Hmm, yet another quirk (this time in NumPy itself). ?On 32-bit platforms: > > In [16]: np.longdouble > Out[16]: numpy.float96 > > In [17]: np.finfo(np.longdouble).eps > Out[17]: 1.084202172485504434e-19 > > while on 64-bit ones: > > In [8]: np.longdouble > Out[8]: numpy.float128 > > In [9]: np.finfo(np.longdouble).eps > Out[9]: 1.084202172485504434e-19 > > i.e. NumPy is saying that the eps (machine epsilon) is the same on both platforms, despite the fact that one uses 80-bit precision and the other 128-bit precision. ?For the 80-bit, the eps should be (): > > In [5]: 1 / 2**63. > Out[5]: 1.0842021724855044e-19 > > [http://en.wikipedia.org/wiki/Extended_precision] > > which is correctly stated by NumPy, while for 128-bit (quad precision), eps should be: > > In [6]: 1 / 2**113. > Out[6]: 9.62964972193618e-35 > > [http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format] > > If nobody objects, I'll file a bug about this. There was half a proposal for renaming these guys in the interests of clarity: http://mail.scipy.org/pipermail/numpy-discussion/2011-October/058820.html I'd be happy to write this up as a NEP. Best, Matthew From charlesr.harris at gmail.com Thu Feb 23 11:28:12 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 23 Feb 2012 09:28:12 -0700 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> Message-ID: On Thu, Feb 23, 2012 at 5:23 AM, Francesc Alted wrote: > On Feb 23, 2012, at 6:06 AM, Francesc Alted wrote: > > On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote: > > > >> On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted > wrote: > >>> Exactly. I'd update this to read: > >>> > >>> float96 96 bits. Only available on 32-bit (i386) platforms. > >>> float128 128 bits. Only available on 64-bit (AMD64) platforms. > >> > >> Except float96 is actually 80 bits. (Usually?) Plus some padding? > > > > Good point. The thing is that they actually use 96 bit for storage > purposes (this is due to alignment requirements). > > > > Another quirk related with this is that MSVC automatically maps long > double to 64-bit doubles: > > > > http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx > > > > Not sure on why they did that (portability issues?). > > Hmm, yet another quirk (this time in NumPy itself). On 32-bit platforms: > > In [16]: np.longdouble > Out[16]: numpy.float96 > > In [17]: np.finfo(np.longdouble).eps > Out[17]: 1.084202172485504434e-19 > > while on 64-bit ones: > > In [8]: np.longdouble > Out[8]: numpy.float128 > > In [9]: np.finfo(np.longdouble).eps > Out[9]: 1.084202172485504434e-19 > > i.e. NumPy is saying that the eps (machine epsilon) is the same on both > platforms, despite the fact that one uses 80-bit precision and the other > 128-bit precision. For the 80-bit, the eps should be (): > > That's correct. 
They are both extended precision (80 bits), but aligned on 32bit/64bit boundaries respectively. Sun provides a true quad precision, also called float128, while on PPC long double is an odd combination of two doubles. Chuck > In [5]: 1 / 2**63. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Thu Feb 23 11:59:18 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 23 Feb 2012 10:59:18 -0600 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> Message-ID: On Feb 23, 2012, at 10:26 AM, Matthew Brett wrote: > Hi, > > On Thu, Feb 23, 2012 at 4:23 AM, Francesc Alted wrote: >> On Feb 23, 2012, at 6:06 AM, Francesc Alted wrote: >>> On Feb 23, 2012, at 5:43 AM, Nathaniel Smith wrote: >>> >>>> On Thu, Feb 23, 2012 at 11:40 AM, Francesc Alted wrote: >>>>> Exactly. I'd update this to read: >>>>> >>>>> float96 96 bits. Only available on 32-bit (i386) platforms. >>>>> float128 128 bits. Only available on 64-bit (AMD64) platforms. >>>> >>>> Except float96 is actually 80 bits. (Usually?) Plus some padding? >>> >>> Good point. The thing is that they actually use 96 bit for storage purposes (this is due to alignment requirements). >>> >>> Another quirk related with this is that MSVC automatically maps long double to 64-bit doubles: >>> >>> http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx >>> >>> Not sure on why they did that (portability issues?). >> >> Hmm, yet another quirk (this time in NumPy itself). On 32-bit platforms: >> >> In [16]: np.longdouble >> Out[16]: numpy.float96 >> >> In [17]: np.finfo(np.longdouble).eps >> Out[17]: 1.084202172485504434e-19 >> >> while on 64-bit ones: >> >> In [8]: np.longdouble >> Out[8]: numpy.float128 >> >> In [9]: np.finfo(np.longdouble).eps >> Out[9]: 1.084202172485504434e-19 >> >> i.e. NumPy is saying that the eps (machine epsilon) is the same on both platforms, despite the fact that one uses 80-bit precision and the other 128-bit precision. For the 80-bit, the eps should be (): >> >> In [5]: 1 / 2**63. >> Out[5]: 1.0842021724855044e-19 >> >> [http://en.wikipedia.org/wiki/Extended_precision] >> >> which is correctly stated by NumPy, while for 128-bit (quad precision), eps should be: >> >> In [6]: 1 / 2**113. >> Out[6]: 9.62964972193618e-35 >> >> [http://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format] >> >> If nobody objects, I'll file a bug about this. > > There was half a proposal for renaming these guys in the interests of clarity: > > http://mail.scipy.org/pipermail/numpy-discussion/2011-October/058820.html Oh, my bad. thanks for pointing this out! > I'd be happy to write this up as a NEP. Or even better, adapt the docs to say something like: float96 96 bits storage, 80-bit precision. Only available on 32-bit platforms. float128 128 bits storage, 80-bit precision. Only available on 64-bit platforms. -- Francesc Alted From d.s.seljebotn at astro.uio.no Thu Feb 23 12:47:58 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 23 Feb 2012 09:47:58 -0800 Subject: [Numpy-discussion] Special matrices with structure? In-Reply-To: <4F46442C.7090100@aalto.fi> References: <4F46442C.7090100@aalto.fi> Message-ID: <4F467BCE.10005@astro.uio.no> On 02/23/2012 05:50 AM, Jaakko Luttinen wrote: > Hi! 
> > I was wondering whether it would be easy/possible/reasonable to have > classes for arrays that have special structure in order to use less > memory and speed up some computations? > > For instance: > - symmetric matrix could be stored in almost half the memory required by > a non-symmetric matrix > - diagonal matrix only needs to store the diagonal vector > - Toeplitz matrix only needs to store one or two vectors > - sparse matrix only needs to store non-zero elements (some > implementations in scipy.sparse) > - and so on > > If such classes were implemented, it would be nice if they worked with > numpy functions (dot, diag, ...) and operations (+, *, +=, ...) easily. > > I believe this has been discussed before but google didn't help a lot.. I'm currently working on a library for this. The catch is that that I'm doing it as a work project, not a hobby project -- so only the features I strictly need for my PhD thesis really gets priority. That means that it will only really be developed for use on clusters/MPI, not so much for single-node LAPACK. I'd love to pair up with someone who could make sure the library is more generally useful, which is my real goal (if I ever get spare time again...). The general idea of my approach is to have lazily evaluated expressions: A = # ... diagonal matrix B = # ... dense matrix L = (give(A) + give(B)).cholesky() # only "symbolic"! # give means: overwrite if you want to explain(L) # prints what it will do if it computes L L = compute(L) # does the computation What the code above would do is: - First, determine that the fastest way of doing + is to take the elements in A and += them inplace to the diagonal in B - Then, do the Cholesky in Note that if you change the types of. The goal is to facilitate writing general code which doesn't know the types of the matrices, yet still string together the optimal chain of calls. This requires waiting with evaluation until an explicit compute call (which essentially does a "compilation"). Adding matrix types and operations is done through pattern matching. This one can provide code like this to provide optimized code for wierd special cases: @computation(RowMajorDense + ColMajorDense, RowMajorDense) def add(a, b): # provide an optimized case for row-major + col-major, resulting # in row-major @cost(add) def add_cost(a, b): # provide estimate for cost of the above routine The compiler looks at all the provided @computation and should determines the cheapest path. My code is at https://github.com/dagss/oomatrix, but I certainly haven't done anything yet to make the codebase useful to anyone but me, so you probably shouldn't look at it, but rather ask me here. Dag From d.s.seljebotn at astro.uio.no Thu Feb 23 12:49:06 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 23 Feb 2012 09:49:06 -0800 Subject: [Numpy-discussion] Special matrices with structure? In-Reply-To: <4F467BCE.10005@astro.uio.no> References: <4F46442C.7090100@aalto.fi> <4F467BCE.10005@astro.uio.no> Message-ID: <4F467C12.6070302@astro.uio.no> On 02/23/2012 09:47 AM, Dag Sverre Seljebotn wrote: > On 02/23/2012 05:50 AM, Jaakko Luttinen wrote: >> Hi! >> >> I was wondering whether it would be easy/possible/reasonable to have >> classes for arrays that have special structure in order to use less >> memory and speed up some computations? 
>> >> For instance: >> - symmetric matrix could be stored in almost half the memory required by >> a non-symmetric matrix >> - diagonal matrix only needs to store the diagonal vector >> - Toeplitz matrix only needs to store one or two vectors >> - sparse matrix only needs to store non-zero elements (some >> implementations in scipy.sparse) >> - and so on >> >> If such classes were implemented, it would be nice if they worked with >> numpy functions (dot, diag, ...) and operations (+, *, +=, ...) easily. >> >> I believe this has been discussed before but google didn't help a lot.. > > I'm currently working on a library for this. The catch is that that I'm > doing it as a work project, not a hobby project -- so only the features > I strictly need for my PhD thesis really gets priority. That means that > it will only really be developed for use on clusters/MPI, not so much > for single-node LAPACK. > > I'd love to pair up with someone who could make sure the library is more > generally useful, which is my real goal (if I ever get spare time > again...). > > The general idea of my approach is to have lazily evaluated expressions: > > A = # ... diagonal matrix > B = # ... dense matrix > > L = (give(A) + give(B)).cholesky() # only "symbolic"! > # give means: overwrite if you want to > > explain(L) # prints what it will do if it computes L > L = compute(L) # does the computation > > What the code above would do is: > > - First, determine that the fastest way of doing + is to take the > elements in A and += them inplace to the diagonal in B > - Then, do the Cholesky in Sorry: Then, do the Cholesky inplace in the buffer of B, and use that for L. Dag > > Note that if you change the types of. The goal is to facilitate writing > general code which doesn't know the types of the matrices, yet still > string together the optimal chain of calls. This requires waiting with > evaluation until an explicit compute call (which essentially does a > "compilation"). > > Adding matrix types and operations is done through pattern matching. > This one can provide code like this to provide optimized code for wierd > special cases: > > @computation(RowMajorDense + ColMajorDense, RowMajorDense) > def add(a, b): > # provide an optimized case for row-major + col-major, resulting > # in row-major > > @cost(add) > def add_cost(a, b): > # provide estimate for cost of the above routine > > The compiler looks at all the provided @computation and should > determines the cheapest path. > > My code is at https://github.com/dagss/oomatrix, but I certainly haven't > done anything yet to make the codebase useful to anyone but me, so you > probably shouldn't look at it, but rather ask me here. > > Dag From pierre.haessig at crans.org Thu Feb 23 13:11:53 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 23 Feb 2012 19:11:53 +0100 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> Message-ID: <4F468169.4050909@crans.org> Le 23/02/2012 17:28, Charles R Harris a ?crit : > That's correct. They are both extended precision (80 bits), but > aligned on 32bit/64bit boundaries respectively. Sun provides a true > quad precision, also called float128, while on PPC long double is an > odd combination of two doubles. This is insane ! ;-) -- Pierre -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From matthew.brett at gmail.com Thu Feb 23 13:42:38 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 23 Feb 2012 10:42:38 -0800 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: <4F468169.4050909@crans.org> References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> Message-ID: Hi, On Thu, Feb 23, 2012 at 10:11 AM, Pierre Haessig wrote: > Le 23/02/2012 17:28, Charles R Harris a ?crit : >> That's correct. They are both extended precision (80 bits), but >> aligned on 32bit/64bit boundaries respectively. Sun provides a true >> quad precision, also called float128, while on PPC long double is an >> odd combination of two doubles. > This is insane ! ;-) I don't know if it's insane, but it is certainly very confusing, as this thread the previous one show. The question is, what would be less confusing? Best, Matthew From mwwiebe at gmail.com Thu Feb 23 13:45:21 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 23 Feb 2012 10:45:21 -0800 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> Message-ID: On Thu, Feb 23, 2012 at 10:42 AM, Matthew Brett wrote: > Hi, > > On Thu, Feb 23, 2012 at 10:11 AM, Pierre Haessig > wrote: > > Le 23/02/2012 17:28, Charles R Harris a ?crit : > >> That's correct. They are both extended precision (80 bits), but > >> aligned on 32bit/64bit boundaries respectively. Sun provides a true > >> quad precision, also called float128, while on PPC long double is an > >> odd combination of two doubles. > > This is insane ! ;-) > > I don't know if it's insane, but it is certainly very confusing, as > this thread the previous one show. > > The question is, what would be less confusing? > One approach would be to never alias longdouble as float###. Especially float128 seems to imply that it's the IEEE standard binary128 float, which it is on some platforms, but not on most. Cheers, Mark > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Feb 23 13:55:08 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 23 Feb 2012 10:55:08 -0800 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> Message-ID: Hi, On Thu, Feb 23, 2012 at 10:45 AM, Mark Wiebe wrote: > On Thu, Feb 23, 2012 at 10:42 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Thu, Feb 23, 2012 at 10:11 AM, Pierre Haessig >> wrote: >> > Le 23/02/2012 17:28, Charles R Harris a ?crit : >> >> That's correct. They are both extended precision (80 bits), but >> >> aligned on 32bit/64bit boundaries respectively. Sun provides a true >> >> quad precision, also called float128, while on PPC long double is an >> >> odd combination of two doubles. >> > This is insane ! 
;-) >> >> I don't know if it's insane, but it is certainly very confusing, as >> this thread the previous one show. >> >> The question is, what would be less confusing? > > > One approach would be to never alias longdouble as float###. Especially > float128 seems to imply that it's the IEEE standard binary128 float, which > it is on some platforms, but not on most. It's virtually never IEEE binary128. Yarik Halchenko found a real one on an s/360 running Debian. Some docs seem to suggest there are Sun machines out there with binary128, as Chuck said. So the vast majority of numpy users with float128 have Intel 80-bit, and some have PPC twin-float. Do we all agree then that 'float128' is a bad name? In the last thread, I had the feeling there was some consensus on renaming Intel 80s to: float128 -> float80_128 float96 -> float80_96 For those platforms implementing it, maybe float128 -> float128_ieee Maybe for PPC: float128 -> float_pair_128 and, personally, I still think it would be preferable, and less confusing, to encourage use of 'longdouble' instead of the various platform specific aliases. What do you think? Best, Matthew From mwwiebe at gmail.com Thu Feb 23 14:08:48 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 23 Feb 2012 11:08:48 -0800 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> Message-ID: On Thu, Feb 23, 2012 at 10:55 AM, Matthew Brett wrote: > Hi, > > On Thu, Feb 23, 2012 at 10:45 AM, Mark Wiebe wrote: > > On Thu, Feb 23, 2012 at 10:42 AM, Matthew Brett > > > wrote: > >> > >> Hi, > >> > >> On Thu, Feb 23, 2012 at 10:11 AM, Pierre Haessig > >> wrote: > >> > Le 23/02/2012 17:28, Charles R Harris a ?crit : > >> >> That's correct. They are both extended precision (80 bits), but > >> >> aligned on 32bit/64bit boundaries respectively. Sun provides a true > >> >> quad precision, also called float128, while on PPC long double is an > >> >> odd combination of two doubles. > >> > This is insane ! ;-) > >> > >> I don't know if it's insane, but it is certainly very confusing, as > >> this thread the previous one show. > >> > >> The question is, what would be less confusing? > > > > > > One approach would be to never alias longdouble as float###. Especially > > float128 seems to imply that it's the IEEE standard binary128 float, > which > > it is on some platforms, but not on most. > > It's virtually never IEEE binary128. Yarik Halchenko found a real one > on an s/360 running Debian. Some docs seem to suggest there are Sun > machines out there with binary128, as Chuck said. So the vast > majority of numpy users with float128 have Intel 80-bit, and some have > PPC twin-float. > > Do we all agree then that 'float128' is a bad name? > > In the last thread, I had the feeling there was some consensus on > renaming Intel 80s to: > > float128 -> float80_128 > float96 -> float80_96 > > For those platforms implementing it, maybe > > float128 -> float128_ieee > > Maybe for PPC: > > float128 -> float_pair_128 > > and, personally, I still think it would be preferable, and less > confusing, to encourage use of 'longdouble' instead of the various > platform specific aliases. > +1, I think it's good for its name to correspond to the name in C/C++, so that when people search for information on it they will find the relevant information more easily. 
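For the record, the aliasing as it stands looks roughly like this (a sketch from a typical 64-bit Linux/x86 build; float128 may not exist at all on other platforms, which is part of the problem):

import numpy as np

same_object = np.longdouble is np.float128   # True here: float128 is only an alias for long double
dtype_name = np.dtype(np.longdouble).name    # 'float128' -- named after storage size, not precision
char_code = np.dtype(np.longdouble).char     # 'g', the C long double type code
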
With a bunch of NumPy-specific aliases, it just creates more hassle for everybody. -Mark > What do you think? > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Thu Feb 23 14:32:13 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 23 Feb 2012 14:32:13 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers Message-ID: dear all, I haven't read all 180 e-mails, but I didn't see this on Travis's initial list. All of the existing flat file reading solutions I have seen are not suitable for many applications, and they compare very unfavorably to tools present in other languages, like R. Here are some of the main issues I see: - Memory usage: creating millions of Python objects when reading a large file results in horrendously bad memory utilization, which the Python interpreter is loathe to return to the operating system. Any solution using the CSV module (like pandas's parsers-- which are a lot faster than anything else I know of in Python) suffers from this problem because the data come out boxed in tuples of PyObjects. Try loading a 1,000,000 x 20 CSV file into a structured array using np.genfromtxt or into a DataFrame using pandas.read_csv and you will immediately see the problem. R, by contrast, uses very little memory. - Performance: post-processing of Python objects results in poor performance. Also, for the actual parsing, anything regular expression based (like the loadtable effort over the summer, all apologies to those who worked on it), is doomed to failure. I think having a tool with a high degree of compatibility and intelligence for parsing unruly small files does make sense though, but it's not appropriate for large, well-behaved files. - Need to "factorize": as soon as there is an enum dtype in NumPy, we will want to enable the file parsers for structured arrays and DataFrame to be able to "factorize" / convert to enum certain columns (for example, all string columns) during the parsing process, and not afterward. This is very important for enabling fast groupby on large datasets and reducing unnecessary memory usage up front (imagine a column with a million values, with only 10 unique values occurring). This would be trivial to implement using a C hash table implementation like khash.h To be clear: I'm going to do this eventually whether or not it happens in NumPy because it's an existing problem for heavy pandas users. I see no reason why the code can't emit structured arrays, too, so we might as well have a common library component that I can use in pandas and specialize to the DataFrame internal structure. It seems clear to me that this work needs to be done at the lowest level possible, probably all in C (or C++?) or maybe Cython plus C utilities. If anyone wants to get involved in this particular problem right now, let me know! best, Wes From ndbecker2 at gmail.com Thu Feb 23 14:33:37 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 23 Feb 2012 14:33:37 -0500 Subject: [Numpy-discussion] mkl usage Message-ID: Is mkl only used for linear algebra? Will it speed up e.g., elementwise transendental functions? 
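To make the question concrete, this is the sort of purely elementwise transcendental workload meant here (plain numpy, nothing MKL-specific in the user code; whether MKL/VML helps depends entirely on what np.exp and np.sin dispatch to underneath):

import numpy as np

x = np.random.rand(10000000)
# one elementwise pass of exp and sin over 10 million doubles
y = np.exp(x) + np.sin(x) ** 2
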
From francesc at continuum.io Thu Feb 23 14:44:25 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 23 Feb 2012 13:44:25 -0600 Subject: [Numpy-discussion] mkl usage In-Reply-To: References: Message-ID: On Feb 23, 2012, at 1:33 PM, Neal Becker wrote: > Is mkl only used for linear algebra? Will it speed up e.g., elementwise > transendental functions? Yes, MKL comes with VML that has this type of optimizations: http://software.intel.com/sites/products/documentation/hpc/mkl/vml/vmldata.htm Also, see some speedups in a numexpr linked against MKL here: http://code.google.com/p/numexpr/wiki/NumexprVML See also how native multi-threading implementation in numexpr beats MKL's one (at least for this particular example). -- Francesc Alted From pav at iki.fi Thu Feb 23 14:47:53 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 23 Feb 2012 20:47:53 +0100 Subject: [Numpy-discussion] mkl usage In-Reply-To: References: Message-ID: 23.02.2012 20:44, Francesc Alted kirjoitti: > On Feb 23, 2012, at 1:33 PM, Neal Becker wrote: > >> Is mkl only used for linear algebra? Will it speed up e.g., elementwise >> transendental functions? > > Yes, MKL comes with VML that has this type of optimizations: And also no, in the sense that Numpy and Scipy don't use VML. -- Pauli Virtanen From pav at iki.fi Thu Feb 23 14:53:47 2012 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 23 Feb 2012 20:53:47 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: Hi, 23.02.2012 20:32, Wes McKinney kirjoitti: [clip] > To be clear: I'm going to do this eventually whether or not it > happens in NumPy because it's an existing problem for heavy > pandas users. I see no reason why the code can't emit structured > arrays, too, so we might as well have a common library component > that I can use in pandas and specialize to the DataFrame internal > structure. If you do this, one useful aim could be to design the code such that it can be used in loadtxt, at least as a fast path for common cases. I'd really like to avoid increasing the number of APIs for text file loading. -- Pauli Virtanen From pierre.haessig at crans.org Thu Feb 23 14:56:50 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 23 Feb 2012 20:56:50 +0100 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> Message-ID: <4F469A02.6060205@crans.org> Le 23/02/2012 20:08, Mark Wiebe a ?crit : > +1, I think it's good for its name to correspond to the name in C/C++, > so that when people search for information on it they will find the > relevant information more easily. With a bunch of NumPy-specific > aliases, it just creates more hassle for everybody. I don't fully agree. First, this assumes that people were "C-educated", at least a bit. I got some C education, but I spent most of my scientific programming time sitting in front of Python, Matlab, and a bit of R (in that order). In this context, double, floats, long and short are all esoteric incantation. Second the C/C++ names are very unprecise with regards to their memory content, and sometimes platform dependent. On the other "float64" is very informative. Also, how do these name scale with extended precision (where it's available... ;-) ) ? 
I wonder what may come after longdoulble/longfloat : what about hyperlongsuperfancyextendeddoublefloat ? I feel float1024 simpler ;-) Now, because of all the specifities you described, this seems to be a complex topic. I guess that good & documented aliases help people understand this very complexity. Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From travis at continuum.io Thu Feb 23 15:08:52 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 23 Feb 2012 14:08:52 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: This is actually on my short-list as well --- it just didn't make it to the list. In fact, we have someone starting work on it this week. It is his first project so it will take him a little time to get up to speed on it, but he will contact Wes and work with him and report progress to this list. Integration with np.loadtxt is a high-priority. I think loadtxt is now the 3rd or 4th "text-reading" interface I've seen in NumPy. I have no interest in making a new one if we can avoid it. But, we do need to make it faster with less memory overhead for simple cases like Wes describes. -Travis On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote: > Hi, > > 23.02.2012 20:32, Wes McKinney kirjoitti: > [clip] >> To be clear: I'm going to do this eventually whether or not it >> happens in NumPy because it's an existing problem for heavy >> pandas users. I see no reason why the code can't emit structured >> arrays, too, so we might as well have a common library component >> that I can use in pandas and specialize to the DataFrame internal >> structure. > > If you do this, one useful aim could be to design the code such that it > can be used in loadtxt, at least as a fast path for common cases. I'd > really like to avoid increasing the number of APIs for text file loading. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From wesmckinn at gmail.com Thu Feb 23 15:15:54 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 23 Feb 2012 15:15:54 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 3:08 PM, Travis Oliphant wrote: > This is actually on my short-list as well --- it just didn't make it to the list. > > In fact, we have someone starting work on it this week. ?It is his first project so it will take him a little time to get up to speed on it, but he will contact Wes and work with him and report progress to this list. > > Integration with np.loadtxt is a high-priority. ?I think loadtxt is now the 3rd or 4th "text-reading" interface I've seen in NumPy. ?I have no interest in making a new one if we can avoid it. ? But, we do need to make it faster with less memory overhead for simple cases like Wes describes. > > -Travis Yeah, what I envision is just an infrastructural "parsing engine" to replace the pure Python guts of np.loadtxt, np.genfromtxt, and the csv module + Cython guts of pandas.read_{csv, table, excel}. It needs to be somewhat adaptable to some of the domain specific decisions of structured arrays vs. 
DataFrames-- like I use Python objects for strings, but one consequence of this is that I can "intern" strings (only one PyObject per unique string value occurring) where structured arrays cannot, so you get much better performance and memory usage that way. That's soon to change, though, I gather, at which point I'll almost definitely (!) move to pointer arrays instead of dtype=object arrays. - Wes > > > On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote: > >> Hi, >> >> 23.02.2012 20:32, Wes McKinney kirjoitti: >> [clip] >>> To be clear: I'm going to do this eventually whether or not it >>> happens in NumPy because it's an existing problem for heavy >>> pandas users. I see no reason why the code can't emit structured >>> arrays, too, so we might as well have a common library component >>> that I can use in pandas and specialize to the DataFrame internal >>> structure. >> >> If you do this, one useful aim could be to design the code such that it >> can be used in loadtxt, at least as a fast path for common cases. I'd >> really like to avoid increasing the number of APIs for text file loading. >> >> -- >> Pauli Virtanen >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ndbecker2 at gmail.com Thu Feb 23 15:19:29 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 23 Feb 2012 15:19:29 -0500 Subject: [Numpy-discussion] mkl usage References: Message-ID: Pauli Virtanen wrote: > 23.02.2012 20:44, Francesc Alted kirjoitti: >> On Feb 23, 2012, at 1:33 PM, Neal Becker wrote: >> >>> Is mkl only used for linear algebra? Will it speed up e.g., elementwise >>> transendental functions? >> >> Yes, MKL comes with VML that has this type of optimizations: > > And also no, in the sense that Numpy and Scipy don't use VML. > My question is: "Should I purchase MKL?" To what extent will it speed up my existing python code, without my having to exert (much) effort? So that would be numpy/scipy. I'd entertain trying other things, if it wasn't much effort. From warren.weckesser at enthought.com Thu Feb 23 15:19:53 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 23 Feb 2012 14:19:53 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant wrote: > This is actually on my short-list as well --- it just didn't make it to > the list. > > In fact, we have someone starting work on it this week. It is his first > project so it will take him a little time to get up to speed on it, but he > will contact Wes and work with him and report progress to this list. > > Integration with np.loadtxt is a high-priority. I think loadtxt is now > the 3rd or 4th "text-reading" interface I've seen in NumPy. I have no > interest in making a new one if we can avoid it. But, we do need to make > it faster with less memory overhead for simple cases like Wes describes. > > -Travis > > I have a "proof of concept" CSV reader written in C (with a Cython wrapper). I'll put it on github this weekend. 
Warren > > On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote: > > > Hi, > > > > 23.02.2012 20:32, Wes McKinney kirjoitti: > > [clip] > >> To be clear: I'm going to do this eventually whether or not it > >> happens in NumPy because it's an existing problem for heavy > >> pandas users. I see no reason why the code can't emit structured > >> arrays, too, so we might as well have a common library component > >> that I can use in pandas and specialize to the DataFrame internal > >> structure. > > > > If you do this, one useful aim could be to design the code such that it > > can be used in loadtxt, at least as a fast path for common cases. I'd > > really like to avoid increasing the number of APIs for text file loading. > > > > -- > > Pauli Virtanen > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erin.sheldon at gmail.com Thu Feb 23 15:23:11 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Thu, 23 Feb 2012 15:23:11 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: <1330028193-sup-4426@rohan> Wes - I designed the recfile package to fill this need. It might be a start. Some features: - the ability to efficiently read any subset of the data without loading the whole file. - reads directly into a recarray, so no overheads. - object oriented interface, mimicking recarray slicing. - also supports writing Currently it is fixed-width fields only. It is C++, but wouldn't be hard to convert it C if that is a requirement. Also, it works for binary or ascii. http://code.google.com/p/recfile/ the trunk is pretty far past the most recent release. Erin Scott Sheldon Excerpts from Wes McKinney's message of Thu Feb 23 14:32:13 -0500 2012: > dear all, > > I haven't read all 180 e-mails, but I didn't see this on Travis's > initial list. > > All of the existing flat file reading solutions I have seen are > not suitable for many applications, and they compare very unfavorably > to tools present in other languages, like R. Here are some of the > main issues I see: > > - Memory usage: creating millions of Python objects when reading > a large file results in horrendously bad memory utilization, > which the Python interpreter is loathe to return to the > operating system. Any solution using the CSV module (like > pandas's parsers-- which are a lot faster than anything else I > know of in Python) suffers from this problem because the data > come out boxed in tuples of PyObjects. Try loading a 1,000,000 > x 20 CSV file into a structured array using np.genfromtxt or > into a DataFrame using pandas.read_csv and you will immediately > see the problem. R, by contrast, uses very little memory. > > - Performance: post-processing of Python objects results in poor > performance. Also, for the actual parsing, anything regular > expression based (like the loadtable effort over the summer, > all apologies to those who worked on it), is doomed to > failure. I think having a tool with a high degree of > compatibility and intelligence for parsing unruly small files > does make sense though, but it's not appropriate for large, > well-behaved files. 
> > - Need to "factorize": as soon as there is an enum dtype in > NumPy, we will want to enable the file parsers for structured > arrays and DataFrame to be able to "factorize" / convert to > enum certain columns (for example, all string columns) during > the parsing process, and not afterward. This is very important > for enabling fast groupby on large datasets and reducing > unnecessary memory usage up front (imagine a column with a > million values, with only 10 unique values occurring). This > would be trivial to implement using a C hash table > implementation like khash.h > > To be clear: I'm going to do this eventually whether or not it > happens in NumPy because it's an existing problem for heavy > pandas users. I see no reason why the code can't emit structured > arrays, too, so we might as well have a common library component > that I can use in pandas and specialize to the DataFrame internal > structure. > > It seems clear to me that this work needs to be done at the > lowest level possible, probably all in C (or C++?) or maybe > Cython plus C utilities. > > If anyone wants to get involved in this particular problem right > now, let me know! > > best, > Wes -- Erin Scott Sheldon Brookhaven National Laboratory From wesmckinn at gmail.com Thu Feb 23 15:24:28 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 23 Feb 2012 15:24:28 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 3:19 PM, Warren Weckesser wrote: > > On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant > wrote: >> >> This is actually on my short-list as well --- it just didn't make it to >> the list. >> >> In fact, we have someone starting work on it this week. ?It is his first >> project so it will take him a little time to get up to speed on it, but he >> will contact Wes and work with him and report progress to this list. >> >> Integration with np.loadtxt is a high-priority. ?I think loadtxt is now >> the 3rd or 4th "text-reading" interface I've seen in NumPy. ?I have no >> interest in making a new one if we can avoid it. ? But, we do need to make >> it faster with less memory overhead for simple cases like Wes describes. >> >> -Travis >> > > > I have a "proof of concept" CSV reader written in C (with a Cython > wrapper).? I'll put it on github this weekend. > > Warren Sweet, between this, Continuum folks, and me and my guys I think we can come up with something good and suits all our needs. We should set up some realistic performance test cases that we can monitor via vbench (wesm/vbench) while we're work on the project. - W > >> >> >> On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote: >> >> > Hi, >> > >> > 23.02.2012 20:32, Wes McKinney kirjoitti: >> > [clip] >> >> To be clear: I'm going to do this eventually whether or not it >> >> happens in NumPy because it's an existing problem for heavy >> >> pandas users. I see no reason why the code can't emit structured >> >> arrays, too, so we might as well have a common library component >> >> that I can use in pandas and specialize to the DataFrame internal >> >> structure. >> > >> > If you do this, one useful aim could be to design the code such that it >> > can be used in loadtxt, at least as a fast path for common cases. I'd >> > really like to avoid increasing the number of APIs for text file >> > loading. 
>> > >> > -- >> > Pauli Virtanen >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From wesmckinn at gmail.com Thu Feb 23 15:24:44 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 23 Feb 2012 15:24:44 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330028193-sup-4426@rohan> References: <1330028193-sup-4426@rohan> Message-ID: On Thu, Feb 23, 2012 at 3:23 PM, Erin Sheldon wrote: > Wes - > > I designed the recfile package to fill this need. ?It might be a start. > > Some features: > > ? ?- the ability to efficiently read any subset of the data without > ? ? ?loading the whole file. > ? ?- reads directly into a recarray, so no overheads. > ? ?- object oriented interface, mimicking recarray slicing. > ? ?- also supports writing > > Currently it is fixed-width fields only. ?It is C++, but wouldn't be > hard to convert it C if that is a requirement. ?Also, it works for > binary or ascii. > > ? ?http://code.google.com/p/recfile/ > > the trunk is pretty far past the most recent release. > > Erin Scott Sheldon Can you relicense as BSD-compatible? > Excerpts from Wes McKinney's message of Thu Feb 23 14:32:13 -0500 2012: >> dear all, >> >> I haven't read all 180 e-mails, but I didn't see this on Travis's >> initial list. >> >> All of the existing flat file reading solutions I have seen are >> not suitable for many applications, and they compare very unfavorably >> to tools present in other languages, like R. Here are some of the >> main issues I see: >> >> - Memory usage: creating millions of Python objects when reading >> ? a large file results in horrendously bad memory utilization, >> ? which the Python interpreter is loathe to return to the >> ? operating system. Any solution using the CSV module (like >> ? pandas's parsers-- which are a lot faster than anything else I >> ? know of in Python) suffers from this problem because the data >> ? come out boxed in tuples of PyObjects. Try loading a 1,000,000 >> ? x 20 CSV file into a structured array using np.genfromtxt or >> ? into a DataFrame using pandas.read_csv and you will immediately >> ? see the problem. R, by contrast, uses very little memory. >> >> - Performance: post-processing of Python objects results in poor >> ? performance. Also, for the actual parsing, anything regular >> ? expression based (like the loadtable effort over the summer, >> ? all apologies to those who worked on it), is doomed to >> ? failure. I think having a tool with a high degree of >> ? compatibility and intelligence for parsing unruly small files >> ? does make sense though, but it's not appropriate for large, >> ? well-behaved files. >> >> - Need to "factorize": as soon as there is an enum dtype in >> ? NumPy, we will want to enable the file parsers for structured >> ? arrays and DataFrame to be able to "factorize" / convert to >> ? enum certain columns (for example, all string columns) during >> ? the parsing process, and not afterward. This is very important >> ? 
for enabling fast groupby on large datasets and reducing >> ? unnecessary memory usage up front (imagine a column with a >> ? million values, with only 10 unique values occurring). This >> ? would be trivial to implement using a C hash table >> ? implementation like khash.h >> >> To be clear: I'm going to do this eventually whether or not it >> happens in NumPy because it's an existing problem for heavy >> pandas users. I see no reason why the code can't emit structured >> arrays, too, so we might as well have a common library component >> that I can use in pandas and specialize to the DataFrame internal >> structure. >> >> It seems clear to me that this work needs to be done at the >> lowest level possible, probably all in C (or C++?) or maybe >> Cython plus C utilities. >> >> If anyone wants to get involved in this particular problem right >> now, let me know! >> >> best, >> Wes > -- > Erin Scott Sheldon > Brookhaven National Laboratory From eric at depagne.org Thu Feb 23 15:31:09 2012 From: eric at depagne.org (=?iso-8859-1?q?=C9ric_Depagne?=) Date: Thu, 23 Feb 2012 21:31:09 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: <201202232131.10074.eric@depagne.org> Le jeudi 23 f?vrier 2012 21:24:28, Wes McKinney a ?crit : > That would indeed be great. Reading large files is a real pain whatever the python method used. BTW, could you tell us what you mean by large files? cheers, ?ric. > Sweet, between this, Continuum folks, and me and my guys I think we > can come up with something good and suits all our needs. We should set > up some realistic performance test cases that we can monitor via > vbench (wesm/vbench) while we're work on the project. > Un clavier azerty en vaut deux ---------------------------------------------------------- ?ric Depagne eric at depagne.org From erin.sheldon at gmail.com Thu Feb 23 15:33:54 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Thu, 23 Feb 2012 15:33:54 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330028193-sup-4426@rohan> Message-ID: <1330029145-sup-4796@rohan> Excerpts from Wes McKinney's message of Thu Feb 23 15:24:44 -0500 2012: > On Thu, Feb 23, 2012 at 3:23 PM, Erin Sheldon wrote: > > I designed the recfile package to fill this need. ?It might be a start. > Can you relicense as BSD-compatible? If required, that would be fine with me. -e > > > Excerpts from Wes McKinney's message of Thu Feb 23 14:32:13 -0500 2012: > >> dear all, > >> > >> I haven't read all 180 e-mails, but I didn't see this on Travis's > >> initial list. > >> > >> All of the existing flat file reading solutions I have seen are > >> not suitable for many applications, and they compare very unfavorably > >> to tools present in other languages, like R. Here are some of the > >> main issues I see: > >> > >> - Memory usage: creating millions of Python objects when reading > >> ? a large file results in horrendously bad memory utilization, > >> ? which the Python interpreter is loathe to return to the > >> ? operating system. Any solution using the CSV module (like > >> ? pandas's parsers-- which are a lot faster than anything else I > >> ? know of in Python) suffers from this problem because the data > >> ? come out boxed in tuples of PyObjects. Try loading a 1,000,000 > >> ? x 20 CSV file into a structured array using np.genfromtxt or > >> ? into a DataFrame using pandas.read_csv and you will immediately > >> ? 
see the problem. R, by contrast, uses very little memory. > >> > >> - Performance: post-processing of Python objects results in poor > >> ? performance. Also, for the actual parsing, anything regular > >> ? expression based (like the loadtable effort over the summer, > >> ? all apologies to those who worked on it), is doomed to > >> ? failure. I think having a tool with a high degree of > >> ? compatibility and intelligence for parsing unruly small files > >> ? does make sense though, but it's not appropriate for large, > >> ? well-behaved files. > >> > >> - Need to "factorize": as soon as there is an enum dtype in > >> ? NumPy, we will want to enable the file parsers for structured > >> ? arrays and DataFrame to be able to "factorize" / convert to > >> ? enum certain columns (for example, all string columns) during > >> ? the parsing process, and not afterward. This is very important > >> ? for enabling fast groupby on large datasets and reducing > >> ? unnecessary memory usage up front (imagine a column with a > >> ? million values, with only 10 unique values occurring). This > >> ? would be trivial to implement using a C hash table > >> ? implementation like khash.h > >> > >> To be clear: I'm going to do this eventually whether or not it > >> happens in NumPy because it's an existing problem for heavy > >> pandas users. I see no reason why the code can't emit structured > >> arrays, too, so we might as well have a common library component > >> that I can use in pandas and specialize to the DataFrame internal > >> structure. > >> > >> It seems clear to me that this work needs to be done at the > >> lowest level possible, probably all in C (or C++?) or maybe > >> Cython plus C utilities. > >> > >> If anyone wants to get involved in this particular problem right > >> now, let me know! > >> > >> best, > >> Wes > > -- > > Erin Scott Sheldon > > Brookhaven National Laboratory -- Erin Scott Sheldon Brookhaven National Laboratory From francesc at continuum.io Thu Feb 23 15:34:32 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 23 Feb 2012 14:34:32 -0600 Subject: [Numpy-discussion] mkl usage In-Reply-To: References: Message-ID: On Feb 23, 2012, at 2:19 PM, Neal Becker wrote: > Pauli Virtanen wrote: > >> 23.02.2012 20:44, Francesc Alted kirjoitti: >>> On Feb 23, 2012, at 1:33 PM, Neal Becker wrote: >>> >>>> Is mkl only used for linear algebra? Will it speed up e.g., elementwise >>>> transendental functions? >>> >>> Yes, MKL comes with VML that has this type of optimizations: >> >> And also no, in the sense that Numpy and Scipy don't use VML. >> > > My question is: > > "Should I purchase MKL?" > > To what extent will it speed up my existing python code, without my having to > exert (much) effort? > > So that would be numpy/scipy. Pauli already answered you. If you are restricted to use numpy/scipy and your aim is to accelerate the evaluation of transcendental functions, then there is no point in purchasing MKL. If you can open your spectrum and use numexpr, then I think you should ponder about it. -- Francesc Alted From pierre.haessig at crans.org Thu Feb 23 15:35:45 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 23 Feb 2012 21:35:45 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: <4F46A321.1020808@crans.org> Le 23/02/2012 20:32, Wes McKinney a ?crit : > If anyone wants to get involved in this particular problem right > now, let me know! 
Hi Wes, I'm totally out of the implementations issues you described, but I have some million-lines-long CSV files so that I experience "some slowdown" when loading those. I'll be very glad to use any upgraded loadfromtxt/genfromtxt/anyfunction once it's out ! Best, Pierre (and this reminds me shamefully that I still didn't take the time to give a serious try at your pandas...) -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From pierre.haessig at crans.org Thu Feb 23 15:37:09 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 23 Feb 2012 21:37:09 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: <4F46A375.4000302@crans.org> Le 23/02/2012 21:08, Travis Oliphant a ?crit : > I think loadtxt is now the 3rd or 4th "text-reading" interface I've seen in NumPy. Ok, now I understand why I got confused ;-) -- Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From wesmckinn at gmail.com Thu Feb 23 15:45:18 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 23 Feb 2012 15:45:18 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <201202232131.10074.eric@depagne.org> References: <201202232131.10074.eric@depagne.org> Message-ID: On Thu, Feb 23, 2012 at 3:31 PM, ?ric Depagne wrote: > Le jeudi 23 f?vrier 2012 21:24:28, Wes McKinney a ?crit : >> > That would indeed be great. Reading large files is a real pain whatever the > python method used. > > BTW, could you tell us what you mean by large files? > > cheers, > ?ric. Reasonably wide CSV files with hundreds of thousands to millions of rows. I have a separate interest in JSON handling but that is a different kind of problem, and probably just a matter of forking ultrajson and having it not produce Python-object-based data structures. - Wes From erin.sheldon at gmail.com Thu Feb 23 15:55:15 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Thu, 23 Feb 2012 15:55:15 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <201202232131.10074.eric@depagne.org> Message-ID: <1330030305-sup-9559@rohan> Excerpts from Wes McKinney's message of Thu Feb 23 15:45:18 -0500 2012: > Reasonably wide CSV files with hundreds of thousands to millions of > rows. I have a separate interest in JSON handling but that is a > different kind of problem, and probably just a matter of forking > ultrajson and having it not produce Python-object-based data > structures. As a benchmark, recfile can read an uncached file with 350,000 lines and 32 columns in about 5 seconds. 
File size ~220M -e -- Erin Scott Sheldon Brookhaven National Laboratory From wesmckinn at gmail.com Thu Feb 23 16:07:04 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 23 Feb 2012 16:07:04 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330030305-sup-9559@rohan> References: <201202232131.10074.eric@depagne.org> <1330030305-sup-9559@rohan> Message-ID: On Thu, Feb 23, 2012 at 3:55 PM, Erin Sheldon wrote: > Excerpts from Wes McKinney's message of Thu Feb 23 15:45:18 -0500 2012: >> Reasonably wide CSV files with hundreds of thousands to millions of >> rows. I have a separate interest in JSON handling but that is a >> different kind of problem, and probably just a matter of forking >> ultrajson and having it not produce Python-object-based data >> structures. > > As a benchmark, recfile can read an uncached file with 350,000 lines and > 32 columns in about 5 seconds. ?File size ~220M > > -e > -- > Erin Scott Sheldon > Brookhaven National Laboratory That's pretty good. That's faster than pandas's csv-module+Cython approach almost certainly (but I haven't run your code to get a read on how much my hardware makes a difference), but that's not shocking at all: In [1]: df = DataFrame(np.random.randn(350000, 32)) In [2]: df.to_csv('/home/wesm/tmp/foo.csv') In [3]: %time df2 = read_csv('/home/wesm/tmp/foo.csv') CPU times: user 6.62 s, sys: 0.40 s, total: 7.02 s Wall time: 7.04 s I must think that skipping the process of creating 11.2 mm Python string objects and then individually converting each of them to float. Note for reference (i'm skipping the first row which has the column labels from above): In [2]: %time arr = np.genfromtxt('/home/wesm/tmp/foo.csv', dtype=None, delimiter=',', skip_header=1)CPU times: user 24.17 s, sys: 0.48 s, total: 24.65 s Wall time: 24.67 s In [6]: %time arr = np.loadtxt('/home/wesm/tmp/foo.csv', delimiter=',', skiprows=1) CPU times: user 11.08 s, sys: 0.22 s, total: 11.30 s Wall time: 11.32 s In this last case for example, around 500 MB of RAM is taken up for an array that should only be about 80-90MB. If you're a data scientist working in Python, this is _not good_. -W From gael.varoquaux at normalesup.org Thu Feb 23 16:09:14 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 23 Feb 2012 22:09:14 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <201202232131.10074.eric@depagne.org> <1330030305-sup-9559@rohan> Message-ID: <20120223210914.GB24098@phare.normalesup.org> On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote: > In this last case for example, around 500 MB of RAM is taken up for an > array that should only be about 80-90MB. If you're a data scientist > working in Python, this is _not good_. But why, oh why, are people storing big data in CSV? G From eric at depagne.org Thu Feb 23 16:14:39 2012 From: eric at depagne.org (=?iso-8859-1?q?=C9ric_Depagne?=) Date: Thu, 23 Feb 2012 22:14:39 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <20120223210914.GB24098@phare.normalesup.org> References: <20120223210914.GB24098@phare.normalesup.org> Message-ID: <201202232214.39667.eric@depagne.org> > But why, oh why, are people storing big data in CSV? Well, that's what scientist do :-) ?ric. 
> > G > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Un clavier azerty en vaut deux ---------------------------------------------------------- ?ric Depagne eric at depagne.org From robert.kern at gmail.com Thu Feb 23 16:14:34 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 23 Feb 2012 21:14:34 +0000 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <20120223210914.GB24098@phare.normalesup.org> References: <201202232131.10074.eric@depagne.org> <1330030305-sup-9559@rohan> <20120223210914.GB24098@phare.normalesup.org> Message-ID: On Thu, Feb 23, 2012 at 21:09, Gael Varoquaux wrote: > On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote: >> In this last case for example, around 500 MB of RAM is taken up for an >> array that should only be about 80-90MB. If you're a data scientist >> working in Python, this is _not good_. > > But why, oh why, are people storing big data in CSV? Because everyone can read it. It's not so much "storage" as "transmission". -- Robert Kern From erin.sheldon at gmail.com Thu Feb 23 16:20:36 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Thu, 23 Feb 2012 16:20:36 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <201202232131.10074.eric@depagne.org> <1330030305-sup-9559@rohan> Message-ID: <1330031912-sup-1385@rohan> Excerpts from Wes McKinney's message of Thu Feb 23 16:07:04 -0500 2012: > That's pretty good. That's faster than pandas's csv-module+Cython > approach almost certainly (but I haven't run your code to get a read > on how much my hardware makes a difference), but that's not shocking > at all: > > In [1]: df = DataFrame(np.random.randn(350000, 32)) > > In [2]: df.to_csv('/home/wesm/tmp/foo.csv') > > In [3]: %time df2 = read_csv('/home/wesm/tmp/foo.csv') > CPU times: user 6.62 s, sys: 0.40 s, total: 7.02 s > Wall time: 7.04 s > > I must think that skipping the process of creating 11.2 mm Python > string objects and then individually converting each of them to float. > > Note for reference (i'm skipping the first row which has the column > labels from above): > > In [2]: %time arr = np.genfromtxt('/home/wesm/tmp/foo.csv', > dtype=None, delimiter=',', skip_header=1)CPU times: user 24.17 s, sys: > 0.48 s, total: 24.65 s > Wall time: 24.67 s > > In [6]: %time arr = np.loadtxt('/home/wesm/tmp/foo.csv', > delimiter=',', skiprows=1) > CPU times: user 11.08 s, sys: 0.22 s, total: 11.30 s > Wall time: 11.32 s > > In this last case for example, around 500 MB of RAM is taken up for an > array that should only be about 80-90MB. If you're a data scientist > working in Python, this is _not good_. It might be good to compare on recarrays, which are a bit more complex. Can you try one of these .dat files? 
http://www.cosmo.bnl.gov/www/esheldon/data/lensing/scat/05/ The dtype is [('ra', 'f8'), ('dec', 'f8'), ('g1', 'f8'), ('g2', 'f8'), ('err', 'f8'), ('scinv', 'f8', 27)] -- Erin Scott Sheldon Brookhaven National Laboratory From ben.root at ou.edu Thu Feb 23 16:38:22 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 23 Feb 2012 15:38:22 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <201202232131.10074.eric@depagne.org> <1330030305-sup-9559@rohan> <20120223210914.GB24098@phare.normalesup.org> Message-ID: On Thu, Feb 23, 2012 at 3:14 PM, Robert Kern wrote: > On Thu, Feb 23, 2012 at 21:09, Gael Varoquaux > wrote: > > On Thu, Feb 23, 2012 at 04:07:04PM -0500, Wes McKinney wrote: > >> In this last case for example, around 500 MB of RAM is taken up for an > >> array that should only be about 80-90MB. If you're a data scientist > >> working in Python, this is _not good_. > > > > But why, oh why, are people storing big data in CSV? > > Because everyone can read it. It's not so much "storage" as "transmission". > > Because their labmate/officemate/advisor is using Excel... Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Thu Feb 23 16:39:34 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 23 Feb 2012 16:39:34 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330031912-sup-1385@rohan> References: <201202232131.10074.eric@depagne.org> <1330030305-sup-9559@rohan> <1330031912-sup-1385@rohan> Message-ID: On Thu, Feb 23, 2012 at 4:20 PM, Erin Sheldon wrote: > Excerpts from Wes McKinney's message of Thu Feb 23 16:07:04 -0500 2012: >> That's pretty good. That's faster than pandas's csv-module+Cython >> approach almost certainly (but I haven't run your code to get a read >> on how much my hardware makes a difference), but that's not shocking >> at all: >> >> In [1]: df = DataFrame(np.random.randn(350000, 32)) >> >> In [2]: df.to_csv('/home/wesm/tmp/foo.csv') >> >> In [3]: %time df2 = read_csv('/home/wesm/tmp/foo.csv') >> CPU times: user 6.62 s, sys: 0.40 s, total: 7.02 s >> Wall time: 7.04 s >> >> I must think that skipping the process of creating 11.2 mm Python >> string objects and then individually converting each of them to float. >> >> Note for reference (i'm skipping the first row which has the column >> labels from above): >> >> In [2]: %time arr = np.genfromtxt('/home/wesm/tmp/foo.csv', >> dtype=None, delimiter=',', skip_header=1)CPU times: user 24.17 s, sys: >> 0.48 s, total: 24.65 s >> Wall time: 24.67 s >> >> In [6]: %time arr = np.loadtxt('/home/wesm/tmp/foo.csv', >> delimiter=',', skiprows=1) >> CPU times: user 11.08 s, sys: 0.22 s, total: 11.30 s >> Wall time: 11.32 s >> >> In this last case for example, around 500 MB of RAM is taken up for an >> array that should only be about 80-90MB. If you're a data scientist >> working in Python, this is _not good_. > > It might be good to compare on recarrays, which are a bit more complex. > Can you try one of these .dat files? > > ? 
?http://www.cosmo.bnl.gov/www/esheldon/data/lensing/scat/05/ > > The dtype is > > [('ra', 'f8'), > ?('dec', 'f8'), > ?('g1', 'f8'), > ?('g2', 'f8'), > ?('err', 'f8'), > ?('scinv', 'f8', 27)] > > -- > Erin Scott Sheldon > Brookhaven National Laboratory Forgot this one that is also widely used: In [28]: %time recs = matplotlib.mlab.csv2rec('/home/wesm/tmp/foo.csv', skiprows=1) CPU times: user 65.16 s, sys: 0.30 s, total: 65.46 s Wall time: 65.55 s ok with one of those dat files and the dtype I get: In [18]: %time arr = np.genfromtxt('/home/wesm/Downloads/scat-05-000.dat', dtype=dtype, skip_header=0, delimiter=' ') CPU times: user 17.52 s, sys: 0.14 s, total: 17.66 s Wall time: 17.67 s difference not so stark in this case. I don't produce structured arrays, though In [26]: %time arr = read_table('/home/wesm/Downloads/scat-05-000.dat', header=None, sep=' ') CPU times: user 10.15 s, sys: 0.10 s, total: 10.25 s Wall time: 10.26 s - Wes From pierre.haessig at crans.org Thu Feb 23 17:21:28 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 23 Feb 2012 23:21:28 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <201202232131.10074.eric@depagne.org> <1330030305-sup-9559@rohan> <20120223210914.GB24098@phare.normalesup.org> Message-ID: <4F46BBE8.1020100@crans.org> Le 23/02/2012 22:38, Benjamin Root a ?crit : > labmate/officemate/advisor is using Excel... ... or an industrial partner with its windows-based software that can export (when it works) some very nice field data from a proprietary Honeywell data logger. CSV data is better than no data ! (and better than XLS data !) About the *big* data aspect of Gael's question, this reminds me a software project saying [1] that I would distort the following way : '' Q : How does a CSV data file get to be a million line long ? A : One line at a time ! '' And my experience with some time series measurements was really about this : small changes in the data rate, a slightly longer acquisition period, and that's it ! Pierre (I shamefully confess I spent several hours writing *ad-hoc* Python scripts full of regexps and generators just to fix various tiny details of those CSV files... but in the end it worked !) [1] I just quickly googled "one day at a time" for a reference and ended up on http://en.wikipedia.org/wiki/The_Mythical_Man-Month -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From patricka at uvic.ca Thu Feb 23 17:28:00 2012 From: patricka at uvic.ca (Patrick Armstrong) Date: Thu, 23 Feb 2012 14:28:00 -0800 Subject: [Numpy-discussion] Problem Building Numpy with Python 2.7.1 and OS X 10.7.3 Message-ID: <440B19D3-1A26-4EE8-AC99-69DFCF93CBC9@uvic.ca> Hi there, I'm having a problem building NumPy on Python 2.7.1 and OS X 10.7.3. Here is my build log: https://gist.github.com/1895377 Does anyone have any idea what might be happening? I get a very similar error when compiling with clang. Installing a binary really isn't an option for me due to some specifics of my project. Does anyone have an idea what might be wrong? Thanks. 
--patrick From lamblinp at iro.umontreal.ca Thu Feb 23 17:41:13 2012 From: lamblinp at iro.umontreal.ca (Pascal Lamblin) Date: Thu, 23 Feb 2012 23:41:13 +0100 Subject: [Numpy-discussion] Announcing Theano 0.5 Message-ID: <20120223224113.GA26872@bob.blip.be> =========================== Announcing Theano 0.5 =========================== This is a major version, with lots of new features, bug fixes, and some interface changes (deprecated or potentially misleading features were removed). Upgrading to Theano 0.5 is recommended for everyone, but you should first make sure that your code does not raise deprecation warnings with Theano 0.4.1. Otherwise, in one case the results can change. In other cases, the warnings are turned into errors (see below for details). For those using the bleeding edge version in the git repository, we encourage you to update to the `0.5` tag. If you have updated to 0.5rc1 or 0.5rc2, you are highly encouraged to update to 0.5, as some bugs introduced in those versions have now been fixed, see items marked with '#' in the lists below. What's New ---------- Highlight: * Moved to github: http://github.com/Theano/Theano/ * Old trac ticket moved to assembla ticket: http://www.assembla.com/spaces/theano/tickets * Theano vision: http://deeplearning.net/software/theano/introduction.html#theano-vision (Many people) * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban) * Faster dot() call: New/Better direct call to cpu and gpu ger, gemv, gemm and dot(vector, vector). (James, Fr?d?ric, Pascal) * C implementation of Alloc. (James, Pascal) * theano.grad() now also work with sparse variable. (Arnaud) * Macro to implement the Jacobian/Hessian with theano.tensor.{jacobian,hessian} (Razvan) * See the Interface changes. Interface Behavior Changes: * The current default value of the parameter axis of theano.{max,min,argmax,argmin,max_and_argmax} is now the same as numpy: None. i.e. operate on all dimensions of the tensor. (Fr?d?ric Bastien, Olivier Delalleau) (was deprecated and generated a warning since Theano 0.3 released Nov. 23rd, 2010) * The current output dtype of sum with input dtype [u]int* is now always [u]int64. You can specify the output dtype with a new dtype parameter to sum. The output dtype is the one using for the summation. There is no warning in previous Theano version about this. The consequence is that the sum is done in a dtype with more precision than before. So the sum could be slower, but will be more resistent to overflow. This new behavior is the same as numpy. (Olivier, Pascal) # When using a GPU, detect faulty nvidia drivers. This was detected when running Theano tests. Now this is always tested. Faulty drivers results in in wrong results for reduce operations. (Frederic B.) Interface Features Removed (most were deprecated): * The string modes FAST_RUN_NOGC and STABILIZE are not accepted. They were accepted only by theano.function(). Use Mode(linker='c|py_nogc') or Mode(optimizer='stabilize') instead. * tensor.grad(cost, wrt) now always returns an object of the "same type" as wrt (list/tuple/TensorVariable). (Ian Goodfellow, Olivier) * A few tag.shape and Join.vec_length left have been removed. (Frederic) * The .value attribute of shared variables is removed, use shared.set_value() or shared.get_value() instead. (Frederic) * Theano config option "home" is not used anymore as it was redundant with "base_compiledir". If you use it, Theano will now raise an error. (Olivier D.) 
* scan interface changes: (Razvan Pascanu) * The use of `return_steps` for specifying how many entries of the output to return has been removed. Instead, apply a subtensor to the output returned by scan to select a certain slice. * The inner function (that scan receives) should return its outputs and updates following this order: [outputs], [updates], [condition]. One can skip any of the three if not used, but the order has to stay unchanged. Interface bug fix: * Rop in some case should have returned a list of one Theano variable, but returned the variable itself. (Razvan) New deprecation (will be removed in Theano 0.6, warning generated if you use them): * tensor.shared() renamed to tensor._shared(). You probably want to call theano.shared() instead! (Olivier D.) Bug fixes (incorrect results): * On CPU, if the convolution had received explicit shape information, they where not checked at runtime. This caused wrong result if the input shape was not the one expected. (Frederic, reported by Sander Dieleman) * Theoretical bug: in some case we could have GPUSum return bad value. We were not able to reproduce this problem * patterns affected ({0,1}*nb dim, 0 no reduction on this dim, 1 reduction on this dim): 01, 011, 0111, 010, 10, 001, 0011, 0101 (Frederic) * div by zero in verify_grad. This hid a bug in the grad of Images2Neibs. (James) * theano.sandbox.neighbors.Images2Neibs grad was returning a wrong value. The grad is now disabled and returns an error. (Frederic) * An expression of the form "1 / (exp(x) +- constant)" was systematically matched to "1 / (exp(x) + 1)" and turned into a sigmoid regardless of the value of the constant. A warning will be issued if your code was affected by this bug. (Olivier, reported by Sander Dieleman) * When indexing into a subtensor of negative stride (for instance, x[a:b:-1][c]), an optimization replacing it with a direct indexing (x[d]) used an incorrect formula, leading to incorrect results. (Pascal, reported by Razvan) * The tile() function is now stricter in what it accepts to allow for better error-checking/avoiding nonsensical situations. The gradient has been disabled for the time being as it only implemented (incorrectly) one special case. The `reps` argument must be a constant (not a tensor variable), and must have the same length as the number of dimensions in the `x` argument; this is now checked. (David) # Fix a bug with Gemv and Ger on CPU, when used on vectors with negative strides. Data was read from incorrect (and possibly uninitialized) memory space. This bug was probably introduced in 0.5rc1. (Pascal L.) # The Theano flag "nvcc.flags" is now included in the hard part of the key. This mean that now we recompile all modules for each value of "nvcc.flags". A change in "nvcc.flags" used to be ignored for module that were already compiled. (Frederic B.) Scan fixes: * computing grad of a function of grad of scan (reported by Justin Bayer, fix by Razvan) before : most of the time crash, but could be wrong value with bad number of dimensions (so a visible bug) now : do the right thing. * gradient with respect to outputs using multiple taps (reported by Timothy, fix by Razvan) before : it used to return wrong values now : do the right thing. Note: The reported case of this bug was happening in conjunction with the save optimization of scan that give run time errors. So if you didn't manually disable the same memory optimization (number in the list4), you are fine if you didn't manually request multiple taps. 
* Rop of gradient of scan (reported by Timothy and Justin Bayer, fix by Razvan) before : compilation error when computing R-op now : do the right thing. * save memory optimization of scan (reported by Timothy and Nicolas BL, fix by Razvan) before : for certain corner cases used to result in a runtime shape error now : do the right thing. * Scan grad when the input of scan has sequences of different lengths. (Razvan, reported by Michael Forbes) * Scan.infer_shape now works correctly when working with a condition for the number of loops. In the past, it returned n_steps as the length, which is not always true. (Razvan) * Scan.infer_shape crash fix. (Razvan) New features: * AdvancedIncSubtensor grad defined and tested (Justin Bayer) * Adding 1D advanced indexing support to inc_subtensor and set_subtensor (James Bergstra) * tensor.{zeros,ones}_like now support the dtype param as numpy (Frederic) * Added configuration flag "exception_verbosity" to control the verbosity of exceptions (Ian) * theano-cache list: list the content of the theano cache (Frederic) * theano-cache unlock: remove the Theano lock (Olivier) * tensor.ceil_int_div to compute ceil(a / float(b)) (Frederic) * MaxAndArgMax.grad now works with any axis (The op supports only 1 axis) (Frederic) * used by tensor.{max,min,max_and_argmax} * tensor.{all,any} (Razvan) * tensor.roll as numpy: (Matthew Rocklin, David Warde-Farley) * Theano with GPU works in some cases on Windows now. Still experimental. (Sebastian Urban) * IfElse now allows to have a list/tuple as the result of the if/else branches. * They must have the same length and corresponding type (Razvan) * Argmax output dtype is now int64 instead of int32. (Olivier) * Added the element-wise operation arccos. (Ian) * Added sparse dot with dense grad output. (Yann Dauphin) * Optimized to Usmm and UsmmCscDense in some case (Yann) * Note: theano.dot and theano.sparse.structured_dot() always had a gradient with the same sparsity pattern as the inputs. The new theano.sparse.dot() has a dense gradient for all inputs. * GpuAdvancedSubtensor1 supports broadcasted dimensions. (Frederic) * TensorVariable.zeros_like() and SparseVariable.zeros_like() * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.device_properties() (Frederic) * theano.sandbox.cuda.cuda_ndarray.cuda_ndarray.mem_info() return free and total gpu memory (Frederic) * Theano flags compiledir_format. Keep the same default as before: compiledir_%(platform)s-%(processor)s-%(python_version)s. (Josh Bleecher Snyder) * We also support the "theano_version" substitution. * IntDiv c code (faster and allow this elemwise to be fused with other elemwise) (Pascal) * Internal filter_variable mechanism in Type. (Pascal, Ian) * Ifelse works on sparse. * It makes use of gpu shared variable more transparent with theano.function updates and givens parameter. * Added a_tensor.transpose(axes) axes is optional (James) * theano.tensor.transpose(a_tensor, kwargs) We where ignoring kwargs, now it is used as the axes. * a_CudaNdarray_object[*] = int, now works (Frederic) * tensor_variable.size (as numpy) computes the product of the shape elements. (Olivier) * sparse_variable.size (as scipy) computes the number of stored values. (Olivier) * sparse_variable[N, N] now works (Li Yao, Frederic) * sparse_variable[M:N, O:P] now works (Li Yao, Frederic, Pascal) M, N, O, and P can be Python int or scalar tensor variables, None, or omitted (sparse_variable[:, :M] or sparse_variable[:M, N:] work). 
* tensor.tensordot can now be moved to GPU (Sander Dieleman, Pascal, based on code from Tijmen Tieleman's gnumpy, http://www.cs.toronto.edu/~tijmen/gnumpy.html) # Many infer_shape implemented on sparse matrices op. (David W.F.) # Added theano.sparse.verify_grad_sparse to easily allow testing grad of sparse op. It support testing the full and structured gradient. # The keys in our cache now store the hash of constants and not the constant values themselves. This is significantly more efficient for big constant arrays. (Frederic B.) # 'theano-cache list' lists key files bigger than 1M (Frederic B.) # 'theano-cache list' prints an histogram of the number of keys per compiled module (Frederic B.) # 'theano-cache list' prints the number of compiled modules per op class (Frederic B.) # The Theano flag "nvcc.fastmath" is now also used for the cuda_ndarray.cu file. # Add the header_dirs to the hard part of the compilation key. This is currently used only by cuda, but if we use library that are only headers, this can be useful. (Frederic B.) # Alloc, GpuAlloc are not always pre-computed (constant_folding optimization) at compile time if all their inputs are constant. (Frederic B., Pascal L., reported by Sander Dieleman) # New Op tensor.sort(), wrapping numpy.sort (Hani Almousli) New optimizations: * AdvancedSubtensor1 reuses preallocated memory if available (scan, c|py_nogc linker) (Frederic) * dot22, dot22scalar work with complex. (Frederic) * Generate Gemv/Gemm more often. (James) * Remove scan when all computations can be moved outside the loop. (Razvan) * scan optimization done earlier. This allows other optimizations to be applied. (Frederic, Guillaume, Razvan) * exp(x) * sigmoid(-x) is now correctly optimized to the more stable form sigmoid(x). (Olivier) * Added Subtensor(Rebroadcast(x)) => Rebroadcast(Subtensor(x)) optimization. (Guillaume) * Made the optimization process faster. (James) * Allow fusion of elemwise when the scalar op needs support code. (James) * Better opt that lifts transpose around dot. (James) Crashes fixed: * T.mean crash at graph building time. (Ian) * "Interactive debugger" crash fix. (Ian, Frederic) * Do not call gemm with strides 0, some blas refuse it. (Pascal Lamblin) * Optimization crash with gemm and complex. (Frederic) * GPU crash with elemwise. (Frederic, some reported by Chris Currivan) * Compilation crash with amdlibm and the GPU. (Frederic) * IfElse crash. (Frederic) * Execution crash fix in AdvancedSubtensor1 on 32 bit computers. (Pascal) * GPU compilation crash on MacOS X. (Olivier) * Support for OSX Enthought Python Distribution 7.x. (Graham Taylor, Olivier) * When the subtensor inputs had 0 dimensions and the outputs 0 dimensions. (Frederic) * Crash when the step to subtensor was not 1 in conjunction with some optimization. (Frederic, reported by Olivier Chapelle) * Runtime crash related to an optimization with subtensor of alloc (reported by Razvan, fixed by Frederic) * Fix dot22scalar cast of integer scalars (Justin Bayer, Fr?d?ric, Olivier) * Fix runtime crash in gemm, dot22. FB * Fix on 32bits computer: make sure all shape are int64.(Olivier) * Fix to deque on python 2.4 (Olivier) * Fix crash when not using c code (or using DebugMode) (not used by default) with numpy 1.6*. Numpy has a bug in the reduction code that made it crash. (Pascal) * Crashes of blas functions (Gemv on CPU; Ger, Gemv and Gemm on GPU) when matrices had non-unit stride in both dimensions (CPU and GPU), or when matrices had negative strides (GPU only). 
In those cases, we are now making copies. (Pascal) # More cases supported in AdvancedIncSubtensor1. (Olivier D.) # Fix crash when a broadcasted constant was used as input of an elemwise Op and needed to be upcasted to match the op's output. (Reported by John Salvatier, fixed by Pascal L.) # Fixed a memory leak with shared variable (we kept a pointer to the original value) (Ian G.) Known bugs: * CAReduce with nan in inputs don't return the good output (`Ticket `_). * This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements. Sandbox: * cvm interface more consistent with current linker. (James) * Now all tests pass with the linker=cvm flags. * vm linker has a callback parameter. (James) * review/finish/doc: diag/extract_diag. (Arnaud Bergeron, Frederic, Olivier) * review/finish/doc: AllocDiag/diag. (Arnaud, Frederic, Guillaume) * review/finish/doc: MatrixInverse, matrix_inverse. (Razvan) * review/finish/doc: matrix_dot. (Razvan) * review/finish/doc: det (determinent) op. (Philippe Hamel) * review/finish/doc: Cholesky determinent op. (David) * review/finish/doc: ensure_sorted_indices. (Li Yao) * review/finish/doc: spectral_radius_boud. (Xavier Glorot) * review/finish/doc: sparse sum. (Valentin Bisson) * review/finish/doc: Remove0 (Valentin) * review/finish/doc: SquareDiagonal (Eric) Sandbox New features (not enabled by default): * CURAND_RandomStreams for uniform and normal (not picklable, GPU only) (James) * New sandbox.linalg.ops.pinv(pseudo-inverse) op (Razvan) Documentation: * Many updates. (Many people) * Updates to install doc on MacOS. (Olivier) * Updates to install doc on Windows. (David, Olivier) * Doc on the Rop function (Ian) * Added how to use scan to loop with a condition as the number of iteration. (Razvan) * Added how to wrap in Theano an existing python function (in numpy, scipy, ...). (Frederic) * Refactored GPU installation of Theano. (Olivier) Others: * Better error messages in many places. (Many people) * PEP8 fixes. (Many people) * Add a warning about numpy bug when using advanced indexing on a tensor with more than 2**32 elements (the resulting array is not correctly filled and ends with zeros). (Pascal, reported by David WF) * Added Scalar.ndim=0 and ScalarSharedVariable.ndim=0 (simplify code) (Razvan) * New min_informative_str() function to print graph. (Ian) * Fix catching of exception. (Sometimes we used to catch interrupts) (Frederic, David, Ian, Olivier) * Better support for utf string. (David) * Fix pydotprint with a function compiled with a ProfileMode (Frederic) * Was broken with change to the profiler. * Warning when people have old cache entries. (Olivier) * More tests for join on the GPU and CPU. (Frederic) * Do not request to load the GPU module by default in scan module. (Razvan) * Fixed some import problems. (Frederic and others) * Filtering update. (James) * On Windows, the default compiledir changed to be local to the computer/user and not transferred with roaming profile. (Sebastian Urban) * New theano flag "on_shape_error". Defaults to "warn" (same as previous behavior): it prints a warning when an error occurs when inferring the shape of some apply node. The other accepted value is "raise" to raise an error when this happens. (Frederic) * The buidbot now raises optimization/shape errors instead of just printing a warning. 
(Frederic) * better pycuda tests (Frederic) * check_blas.py now accept the shape and the number of iteration as parameter (Frederic) * Fix opt warning when the opt ShapeOpt is disabled (enabled by default) (Frederic) * More internal verification on what each op.infer_shape return. (Frederic, James) * Argmax dtype to int64 (Olivier) * Improved docstring and basic tests for the Tile Op (David). Reviewers (alphabetical order): * David, Frederic, Ian, James, Olivier, Razvan Download and Install -------------------- You can download Theano from http://pypi.python.org/pypi/Theano Installation instructions are available at http://deeplearning.net/software/theano/install.html Description ----------- Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy. Theano features: * tight integration with NumPy: a similar interface to NumPy's. numpy.ndarrays are also used internally in Theano-compiled functions. * transparent use of a GPU: perform data-intensive computations up to 140x faster than on a CPU (support for float32 only). * efficient symbolic differentiation: Theano can compute derivatives for functions of one or many inputs. * speed and stability optimizations: avoid nasty bugs when computing expressions such as log(1+ exp(x)) for large values of x. * dynamic C code generation: evaluate expressions faster. * extensive unit-testing and self-verification: includes tools for detecting and diagnosing bugs and/or potential problems. Theano has been powering large-scale computationally intensive scientific research since 2007, but it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal). Resources --------- About Theano: http://deeplearning.net/software/theano/ Theano-related projects: http://github.com/Theano/Theano/wiki/Related-projects About NumPy: http://numpy.scipy.org/ About SciPy: http://www.scipy.org/ Machine Learning Tutorial with Theano on Deep Architectures: http://deeplearning.net/tutorial/ Acknowledgments --------------- I would like to thank all contributors of Theano. For this particular release, many people have helped, notably (in alphabetical order): Hani Almousli, Fr?d?ric Bastien, Justin Bayer, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Yann Dauphin, Olivier Delalleau, Guillaume Desjardins, Sander Dieleman, Xavier Glorot, Ian Goodfellow, Philippe Hamel, Pascal Lamblin, Eric Laufer, Gr?goire Mesnil, Razvan Pascanu, Matthew Rocklin, Graham Taylor, Sebastian Urban, David Warde-Farley, and Yao Li. I would also like to thank users who submitted bug reports, notably: Nicolas Boulanger-Lewandowski, Olivier Chapelle, Michael Forbes, Timothy Lillicrap, and John Salvatier. Also, thank you to all NumPy and Scipy developers as Theano builds on their strengths. 
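(A minimal sketch of the define/compile/evaluate workflow described above, using the theano.tensor.jacobian macro listed in the highlights; it assumes a working Theano 0.5 install, and the variable names and shapes are only illustrative.)

import theano
import theano.tensor as T

x = T.dvector('x')
y = x ** 2                      # a symbolic elementwise expression
J = T.jacobian(y, x)            # the Jacobian macro mentioned in the highlights
f = theano.function([x], J)     # compile to a callable function
print(f([1.0, 2.0, 3.0]))       # -> diag([2., 4., 6.])
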
-- Pascal From matthew.brett at gmail.com Thu Feb 23 19:00:18 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 23 Feb 2012 19:00:18 -0500 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: <4F469A02.6060205@crans.org> References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> <4F469A02.6060205@crans.org> Message-ID: Hi, On Thu, Feb 23, 2012 at 2:56 PM, Pierre Haessig wrote: > Le 23/02/2012 20:08, Mark Wiebe a ?crit : >> +1, I think it's good for its name to correspond to the name in C/C++, >> so that when people search for information on it they will find the >> relevant information more easily. With a bunch of NumPy-specific >> aliases, it just creates more hassle for everybody. > I don't fully agree. > > First, this assumes that people were "C-educated", at least a bit. I got > some C education, but I spent most of my scientific programming time > sitting in front of Python, Matlab, and a bit of R (in that order). In > this context, double, floats, long ?and short are all esoteric incantation. > Second the C/C++ names are very unprecise with regards to their memory > content, and sometimes platform dependent. On the other "float64" is > very informative. Right - no proposal to change float64 because it's not ambiguous - it is both binary64 IEEE floating point format and 64 bit width. The confusion here is for float128 - which is very occasionally IEEE binary128 and can be at least two other things (PPC twin double, and Intel 80 bit padded to 128 bits). Some of us were also surprised to find float96 is the same precision as float128 (being an 80 bit Intel padded to 96 bits). The renaming is an attempt to make it less confusing. Do you agree the renaming is less confusing? Do you have another proposal? Preferring 'longdouble' is precisely to flag up to people that they may need to do some more research to find out what exactly that is. Which is correct :) Best, Matthew From travis at continuum.io Thu Feb 23 19:20:07 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 23 Feb 2012 18:20:07 -0600 Subject: [Numpy-discussion] Test survey that I have been putting together Message-ID: Hey all, I would like to gather concrete information about NumPy users and have some data to look at regarding the user base and features that are of interest. We have been putting together a survey that I would love feedback on from members of this list. If you have time and are interested in helping us gather information for improving NumPy, could you please take and fill out information on the following survey: https://www.surveymonkey.com/s/numpy_list_survey After you complete the survey, I would really appreciate any feedback on questions that could be improved, removed, or added. Once we incoporate your feedback, we will distribute the survey more broadly and will report back the main results of the survey to this list. 
Thank you, -Travis From ajfrank at ics.uci.edu Thu Feb 23 19:36:52 2012 From: ajfrank at ics.uci.edu (Drew Frank) Date: Thu, 23 Feb 2012 16:36:52 -0800 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <4F46BBE8.1020100@crans.org> References: <201202232131.10074.eric@depagne.org> <1330030305-sup-9559@rohan> <20120223210914.GB24098@phare.normalesup.org> <4F46BBE8.1020100@crans.org> Message-ID: For convenience, here's a link to the mailing list thread on this topic from a couple months ago: http://thread.gmane.org/gmane.comp.python.numeric.general/47094 . Drew -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Fri Feb 24 00:46:00 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Fri, 24 Feb 2012 06:46:00 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: As others on this list, I've also been confused a bit by the prolific numpy interfaces to reading text. Would it be an idea to create some sort of object oriented solution for this purpose? reader = np.FileReader('my_file.txt') reader.loadtxt() # for backwards compat.; np.loadtxt could instantiate a reader and call this function if one wants to keep the interface reader.very_general_and_typically_slow_reading(missing_data=True) reader.my_files_look_like_this_plz_be_fast(fmt='%20.8e', separator=',', ncol=2) reader.cvs_read() # same as above, but with sensible defaults reader.lazy_read() # returns a generator/iterator, so you can slice out a small part of a huge array, for instance, even when working with text (yes, inefficient) reader.convert_line_by_line(myfunc) # line-by-line call myfunc, letting the user somehow convert easily to his/her format of choice: netcdf, hdf5, ... Not fast, but convenient Another option is to create a hierarchy of readers implemented as classes. Not sure if the benefits outweigh the disadvantages. Just a crazy idea - it would at least gather all the file reading interfaces into one place (or one object hierarchy) so folks know where to look. The whole numpy namespace is a bit cluttered, imho, and for newbies it would be beneficial to use submodules to a greater extent than today - but that's a more long-term discussion. Paul On 23. feb. 2012, at 21:08, Travis Oliphant wrote: > This is actually on my short-list as well --- it just didn't make it to the list. > > In fact, we have someone starting work on it this week. It is his first project so it will take him a little time to get up to speed on it, but he will contact Wes and work with him and report progress to this list. > > Integration with np.loadtxt is a high-priority. I think loadtxt is now the 3rd or 4th "text-reading" interface I've seen in NumPy. I have no interest in making a new one if we can avoid it. But, we do need to make it faster with less memory overhead for simple cases like Wes describes. > > -Travis > > > > On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote: > >> Hi, >> >> 23.02.2012 20:32, Wes McKinney kirjoitti: >> [clip] >>> To be clear: I'm going to do this eventually whether or not it >>> happens in NumPy because it's an existing problem for heavy >>> pandas users. I see no reason why the code can't emit structured >>> arrays, too, so we might as well have a common library component >>> that I can use in pandas and specialize to the DataFrame internal >>> structure. 
>> >> If you do this, one useful aim could be to design the code such that it >> can be used in loadtxt, at least as a fast path for common cases. I'd >> really like to avoid increasing the number of APIs for text file loading. >> >> -- >> Pauli Virtanen >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From rjd4+numpy at cam.ac.uk Fri Feb 24 07:11:39 2012 From: rjd4+numpy at cam.ac.uk (Bob Dowling) Date: Fri, 24 Feb 2012 12:11:39 +0000 Subject: [Numpy-discussion] Matrices and arrays of vectors Message-ID: <4F477E7B.5000400@cam.ac.uk> Conceptually, I have a 2-d grid of 2-d vectors. I am representing this as an ndarray of shape (2,M,N). I want to apply a 2x2 matrix individually to each vector in the grid. However, I can't work out the appropriate syntax for getting the matrix multiplication to broadcast over the grid. All attempts at multiplication seem to want to cast the 3-d array into a matricx with the inevitable error message "ValueError: shape too large to be a matrix." Does such a method exist? This is a dummy script to show the sort of thing I'm trying to do: import math import numpy import numpy.random # Rotation angle theta = math.pi/6.0 # Grid shape M = 10 N = 10 # Establish the rotation matrix c = math.cos(theta) s = math.sin(theta) rotation = numpy.matrix([[c, s], [-1.0*s, c]]) # Fudge some data to work with data = numpy.random.uniform(-1.0, 1.0, 2*M*N).reshape(2,M,N) # THIS DOES NOT WORK. # BUT WHAT DOES? rotated_data = rotation*data From pierre.haessig at crans.org Fri Feb 24 07:43:23 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Fri, 24 Feb 2012 13:43:23 +0100 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> <4F469A02.6060205@crans.org> Message-ID: <4F4785EB.3010605@crans.org> Hi, Le 24/02/2012 01:00, Matthew Brett a ?crit : > Right - no proposal to change float64 because it's not ambiguous - it > is both binary64 IEEE floating point format and 64 bit width. All right ! Focusing the renaming only on those "extended precision" float types makes sense. > The confusion here is for float128 - which is very occasionally IEEE > binary128 and can be at least two other things (PPC twin double, and > Intel 80 bit padded to 128 bits). Some of us were also surprised to > find float96 is the same precision as float128 (being an 80 bit Intel > padded to 96 bits). > > The renaming is an attempt to make it less confusing. Do you agree > the renaming is less confusing? Do you have another proposal? > > Preferring 'longdouble' is precisely to flag up to people that they > may need to do some more research to find out what exactly that is. > Which is correct :) The renaming scheme you mentionned (float80_96, float80_128, float128_ieee, float_pair_128 ) is very informative, maybe too much ! (In this list, I would shorten float128_ieee -> float128 though). So in the end, I may concur with you on "longdouble" as a good name for "extended precision" in the Intel 80 bits sense. (Should "longfloat" be deprecated ?). 
float128 may be kept for ieee definition only, since it looks like the natural extension of float64. Maybe one day it will be available on our "standard" machines ? Also I just browsed Wikipedia's page [1] to get a bit of background and I wonder what is the use case of these 80 bits numbers apart from what is described as "keeping intermediate results" when performing exponentiation on doubles ? Best, Pierre [1] http://en.wikipedia.org/wiki/Extended_precision -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From ndbecker2 at gmail.com Fri Feb 24 07:54:56 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 24 Feb 2012 07:54:56 -0500 Subject: [Numpy-discussion] mkl usage References: Message-ID: Francesc Alted wrote: > On Feb 23, 2012, at 2:19 PM, Neal Becker wrote: > >> Pauli Virtanen wrote: >> >>> 23.02.2012 20:44, Francesc Alted kirjoitti: >>>> On Feb 23, 2012, at 1:33 PM, Neal Becker wrote: >>>> >>>>> Is mkl only used for linear algebra? Will it speed up e.g., elementwise >>>>> transendental functions? >>>> >>>> Yes, MKL comes with VML that has this type of optimizations: >>> >>> And also no, in the sense that Numpy and Scipy don't use VML. >>> >> >> My question is: >> >> "Should I purchase MKL?" >> >> To what extent will it speed up my existing python code, without my having to >> exert (much) effort? >> >> So that would be numpy/scipy. > > Pauli already answered you. If you are restricted to use numpy/scipy and your > aim is to accelerate the evaluation of transcendental functions, then there is > no point in purchasing MKL. If you can open your spectrum and use numexpr, > then I think you should ponder about it. > > -- Francesc Alted Thanks. One more thing, on theano I'm guessing MKL is required to be installed onto each host that would use it at runtime? So I'd need a licensed copy for each host that will execute the code? I'm guessing that because theano needs to compile code at runtime. From Nicolas.Rougier at inria.fr Fri Feb 24 07:55:41 2012 From: Nicolas.Rougier at inria.fr (Nicolas Rougier) Date: Fri, 24 Feb 2012 13:55:41 +0100 Subject: [Numpy-discussion] Matrices and arrays of vectors In-Reply-To: <4F477E7B.5000400@cam.ac.uk> References: <4F477E7B.5000400@cam.ac.uk> Message-ID: You should use a (M,N,2) array to store your vectors: import math import numpy import numpy.random # Rotation angle theta = math.pi/6.0 # Grid shape M = 10 N = 10 # Establish the rotation matrix c = math.cos(theta) s = math.sin(theta) rotation = numpy.array([[c, s], [-1.0*s, c]]) # Fudge some data to work with data = numpy.random.uniform(-1.0, 1.0, (M,N,2)) numpy.dot(data,rotation) Nicolas On Feb 24, 2012, at 13:11 , Bob Dowling wrote: > import math > import numpy > import numpy.random > > # Rotation angle > theta = math.pi/6.0 > > # Grid shape > M = 10 > N = 10 > > # Establish the rotation matrix > c = math.cos(theta) > s = math.sin(theta) > rotation = numpy.matrix([[c, s], [-1.0*s, c]]) > > # Fudge some data to work with > data = numpy.random.uniform(-1.0, 1.0, 2*M*N).reshape(2,M,N) > > # THIS DOES NOT WORK. > # BUT WHAT DOES? 
> rotated_data = rotation*data From pierre.haessig at crans.org Fri Feb 24 08:06:53 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Fri, 24 Feb 2012 14:06:53 +0100 Subject: [Numpy-discussion] Matrices and arrays of vectors In-Reply-To: References: <4F477E7B.5000400@cam.ac.uk> Message-ID: <4F478B6D.2040407@crans.org> Hi, Le 24/02/2012 13:55, Nicolas Rougier a ?crit : > You should use a (M,N,2) array to store your vectors: > [...] > [...] > numpy.dot(data,rotation) looking at how numpy.dot generalizes the matrix product* to N-dim arrays, I came to the same conclusion. I just suspect that the 'rotation' array should be transposed. (or flip the sign of theta which is equivalent...) Best, Pierre * from numpy.dot docstring : """" dot(a, b) Dot product of two arrays. For N dimensions it is a sum product over the *last* axis of `a` and the *second-to-last* of `b`:: dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m]) """" -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From gruben at bigpond.net.au Fri Feb 24 08:11:59 2012 From: gruben at bigpond.net.au (gary ruben) Date: Sat, 25 Feb 2012 00:11:59 +1100 Subject: [Numpy-discussion] Matrices and arrays of vectors In-Reply-To: <4F477E7B.5000400@cam.ac.uk> References: <4F477E7B.5000400@cam.ac.uk> Message-ID: I haven't checked correctness, but how about np.tensordot(rotation, data, axes=1) Gary R On 24 February 2012 23:11, Bob Dowling wrote: > Conceptually, I have a 2-d grid of 2-d vectors. ?I am representing this > as an ndarray of shape (2,M,N). ?I want to apply a 2x2 matrix > individually to each vector in the grid. > > However, I can't work out the appropriate syntax for getting the matrix > multiplication to broadcast over the grid. ?All attempts at > multiplication seem to want to cast the 3-d array into a matricx with > the inevitable error message "ValueError: shape too large to be a matrix." > > Does such a method exist? > > This is a dummy script to show the sort of thing I'm trying to do: > > import math > import numpy > import numpy.random > > # Rotation angle > theta = math.pi/6.0 > > # Grid shape > M = 10 > N = 10 > > # Establish the rotation matrix > c = math.cos(theta) > s = math.sin(theta) > rotation = numpy.matrix([[c, s], [-1.0*s, c]]) > > # Fudge some data to work with > data = numpy.random.uniform(-1.0, 1.0, 2*M*N).reshape(2,M,N) > > # THIS DOES NOT WORK. > # BUT WHAT DOES? > rotated_data = rotation*data > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pierre.haessig at crans.org Fri Feb 24 08:29:17 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Fri, 24 Feb 2012 14:29:17 +0100 Subject: [Numpy-discussion] Test survey that I have been putting together In-Reply-To: References: Message-ID: <4F4790AD.8050608@crans.org> Hi, Great idea ! What's the plan to spread the word about this survey ? Is it about forwarding the link to friends and colleagues ? Le 24/02/2012 01:20, Travis Oliphant a ?crit : > After you complete the survey, I would really appreciate any feedback on questions that could be improved, removed, or added. Q15 "Do you feel NumPy's existing feature set meets your needs?" Yes/No binary answer -> I would like to answer something like "quite yes" ! 
In terms of features request, I saw the question about "which features I'd be ready to pay for", and all those seemed pretty fancy advanced functions that are out my personal simple use cases. (I guess this relates to Continuum, is that right ? ) However, there are "simpler" features that I'd like to have, some being under development now (like loadtxt & friends, datetime, NAs, ...) but which are not mentioned in any question. Are these feature absent from this survey on purpose ? Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From hmgaudecker at gmail.com Fri Feb 24 08:49:02 2012 From: hmgaudecker at gmail.com (Hans-Martin v. Gaudecker) Date: Fri, 24 Feb 2012 14:49:02 +0100 Subject: [Numpy-discussion] Test survey that I have been putting together In-Reply-To: References: Message-ID: <0941707A-DE58-4CD8-9029-CF7F5D2EE0CB@gmail.com> > From: Travis Oliphant > After you complete the survey, I would really appreciate any feedback on questions that could be improved, removed, or added. Hi Travis, I didn't really get whom you mean by "they" in: 5. What do they want you to be using (technologies, languages, libraries)? Is that supposed to be getting back to 2. -- colleagues, coauthors? Or the organisation in 4.? The cluster question is not really covering the use-case for many University users, I imagine. E.g. our cluster here has a large number of nodes, but it is shared among many users so I would hardly be able to use more than a few dozen nodes at a time. Maybe rephrase in terms of access? And although I am aware that it might stir up some emotions on the list, adding "missing value support" to the desired features would be useful IMHO. Best, Hans-Martin From rjd4+numpy at cam.ac.uk Fri Feb 24 08:49:37 2012 From: rjd4+numpy at cam.ac.uk (Bob Dowling) Date: Fri, 24 Feb 2012 13:49:37 +0000 Subject: [Numpy-discussion] Matrices and arrays of vectors In-Reply-To: References: <4F477E7B.5000400@cam.ac.uk> Message-ID: <4F479571.4000808@cam.ac.uk> numpy.dot and numpy.tensordot do exactly what I need. Thank you all (and especially the gentleman who spotted I was rotating in the wrong direction). From pierre.haessig at crans.org Fri Feb 24 08:59:18 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Fri, 24 Feb 2012 14:59:18 +0100 Subject: [Numpy-discussion] Matrices and arrays of vectors In-Reply-To: <4F479571.4000808@cam.ac.uk> References: <4F477E7B.5000400@cam.ac.uk> <4F479571.4000808@cam.ac.uk> Message-ID: <4F4797B6.6000907@crans.org> Le 24/02/2012 14:49, Bob Dowling a ?crit : > Thank you all (and especially the gentleman who spotted I was rotating > in the wrong direction). No I hadn't ! I had just mentioned the transpose issue for writing "numpy.dot(data,rotation)". So in the end the two sign flips cancel each other and Nicolas' code should give the right answer out of the box ! :-) -- Pierre -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From erin.sheldon at gmail.com Fri Feb 24 09:07:22 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Fri, 24 Feb 2012 09:07:22 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: <1330092347-sup-3918@rohan> Excerpts from Travis Oliphant's message of Thu Feb 23 15:08:52 -0500 2012: > This is actually on my short-list as well --- it just didn't make it to the list. > > In fact, we have someone starting work on it this week. It is his > first project so it will take him a little time to get up to speed on > it, but he will contact Wes and work with him and report progress to > this list. > > Integration with np.loadtxt is a high-priority. I think loadtxt is > now the 3rd or 4th "text-reading" interface I've seen in NumPy. I > have no interest in making a new one if we can avoid it. But, we do > need to make it faster with less memory overhead for simple cases like > Wes describes. I'm willing to adapt my code if it is wanted, but at the same time I don't want to step on this person's toes. Should I proceed? -e -- Erin Scott Sheldon Brookhaven National Laboratory From gregor.thalhammer at gmail.com Fri Feb 24 09:35:23 2012 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Fri, 24 Feb 2012 15:35:23 +0100 Subject: [Numpy-discussion] mkl usage In-Reply-To: References: Message-ID: Am 24.2.2012 um 13:54 schrieb Neal Becker: > Francesc Alted wrote: > >> On Feb 23, 2012, at 2:19 PM, Neal Becker wrote: >> >>> Pauli Virtanen wrote: >>> >>>> 23.02.2012 20:44, Francesc Alted kirjoitti: >>>>> On Feb 23, 2012, at 1:33 PM, Neal Becker wrote: >>>>> >>>>>> Is mkl only used for linear algebra? Will it speed up e.g., elementwise >>>>>> transendental functions? >>>>> >>>>> Yes, MKL comes with VML that has this type of optimizations: >>>> >>>> And also no, in the sense that Numpy and Scipy don't use VML. >>>> >>> >>> My question is: >>> >>> "Should I purchase MKL?" >>> >>> To what extent will it speed up my existing python code, without my having to >>> exert (much) effort? >>> >>> So that would be numpy/scipy. >> >> Pauli already answered you. If you are restricted to use numpy/scipy and your >> aim is to accelerate the evaluation of transcendental functions, then there is >> no point in purchasing MKL. If you can open your spectrum and use numexpr, >> then I think you should ponder about it. >> >> -- Francesc Alted > > Thanks. One more thing, on theano I'm guessing MKL is required to be installed > onto each host that would use it at runtime? So I'd need a licensed copy for > each host that will execute the code? I'm guessing that because theano needs to > compile code at runtime. No, this is not a must. Theano uses the MKL (BLAS) only for linear algebra (dot products). For linear algebra, Theano can also use numpy instead of directly linking to the MKL. So if you have numpy with an optimized BLAS library (MKL) available, you can indirectly use MKL without having it installed on each computer you use Theano. If you want to use fast transcendental functions, you should go for numexpr. Christoph Gohlke provides Windows binaries linked against MKL (for numpy and numexpr), so you don't need to purchase MKL to check how much performance increase you can gain. 
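(A minimal sketch of the numexpr route mentioned above; it assumes numexpr is installed, optionally one of the MKL-linked builds, and the array size is only illustrative. Real speed-ups depend on thread count and, as noted, on memory bandwidth.)

import numpy as np
import numexpr as ne

a = np.random.rand(10000000)

# plain numpy: one pass (and one temporary array) per ufunc call
y1 = np.log(1.0 + np.exp(a))

# numexpr: evaluates the whole expression in cache-sized blocks,
# multi-threaded, and via VML if numexpr was built against MKL
y2 = ne.evaluate("log(1 + exp(a))")

assert np.allclose(y1, y2)

Timing the two versions (for instance with IPython's %timeit) on the target machine is the quickest way to see whether the gain matters for a given workload.
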
Long ago I wrote a small package that injects the transcendental functions of MKL/VML into numpy, so you could get better performance without making any changes to your code. But my experience is that you gain only little because typically the performance is limited by memory bandwidth. Perhaps now on multi-core hosts you gain more due since the VML library uses multi-threading. Gregor From markbak at gmail.com Fri Feb 24 09:39:34 2012 From: markbak at gmail.com (Mark Bakker) Date: Fri, 24 Feb 2012 15:39:34 +0100 Subject: [Numpy-discussion] distributing pre-compiled f2py extensions on OSX Message-ID: Two short questions: 1. When I distribute pre-compiled f2py extensions for OSX, it seems that the users need gfortran installed, else it cannot find libgfortran.3.dylib. Is there a way to link that file with the extension? 2. Should extensions compiled on Snowleopard work on Lion? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpyle at post.harvard.edu Fri Feb 24 10:38:25 2012 From: rpyle at post.harvard.edu (Robert Pyle) Date: Fri, 24 Feb 2012 10:38:25 -0500 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: <4F4785EB.3010605@crans.org> References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> <4F469A02.6060205@crans.org> <4F4785EB.3010605@crans.org> Message-ID: <28FA596A-C373-42EB-A2B7-5B29D3154A91@post.harvard.edu> On Feb 24, 2012, at 7:43 AM, Pierre Haessig wrote: > Hi, > Le 24/02/2012 01:00, Matthew Brett a ?crit : >> Right - no proposal to change float64 because it's not ambiguous - it >> is both binary64 IEEE floating point format and 64 bit width. > All right ! Focusing the renaming only on those "extended precision" > float types makes sense. >> The confusion here is for float128 - which is very occasionally IEEE >> binary128 and can be at least two other things (PPC twin double, and >> Intel 80 bit padded to 128 bits). Some of us were also surprised to >> find float96 is the same precision as float128 (being an 80 bit Intel >> padded to 96 bits). >> >> The renaming is an attempt to make it less confusing. Do you agree >> the renaming is less confusing? Do you have another proposal? >> >> Preferring 'longdouble' is precisely to flag up to people that they >> may need to do some more research to find out what exactly that is. >> Which is correct :) > > The renaming scheme you mentionned (float80_96, float80_128, > float128_ieee, float_pair_128 ) is very informative, maybe too much ! > (In this list, I would shorten float128_ieee -> float128 though). > > So in the end, I may concur with you on "longdouble" as a good name for > "extended precision" in the Intel 80 bits sense. (Should "longfloat" be > deprecated ?). > float128 may be kept for ieee definition only, since it looks like the > natural extension of float64. Maybe one day it will be available on our > "standard" machines ? > > Also I just browsed Wikipedia's page [1] to get a bit of background and > I wonder what is the use case of these 80 bits numbers apart from what > is described as "keeping intermediate results" when performing > exponentiation on doubles ? In AIFF audio files, the sample rate is stored in the Common Chunk as an 80-bit "extended" floating-point number. The field allocated for this is exactly 80 bits wide (i.e., no padding to 96 or 128). 
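(A small sketch for checking what extended precision actually is on a given machine; the printed values are platform-dependent, and the comments describe the common x86 case where longdouble is the 80-bit Intel type.)

import numpy as np

info = np.finfo(np.longdouble)
print(np.dtype(np.longdouble).itemsize * 8)  # storage width: 96 or 128 bits when the 80-bit type is padded
print(info.nmant + 1)                        # significand bits: 64 for the Intel 80-bit type
print(info.eps)                              # ~1.08e-19 for the 80-bit type, vs ~2.2e-16 for float64
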
The 1989 Apple document defining AIFF can be found at I once wrote my own "save as AIFF" routine and I remember it was a pain to format the 80-bit extended float. Bob Pyle > > Best, > Pierre > > [1] http://en.wikipedia.org/wiki/Extended_precision > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From deniset at trdlnk.com Fri Feb 24 12:10:22 2012 From: deniset at trdlnk.com (Denise Thornton) Date: Fri, 24 Feb 2012 11:10:22 -0600 Subject: [Numpy-discussion] Quantitative Researcher (Chicago, IL) Message-ID: <00a001ccf317$32afc840$980f58c0$@trdlnk.com> Quantitative Researcher (Chicago, IL) Job Description TradeLink Securities is currently hiring a Quantitative Researcher to join our team. We are looking for a quantitative researcher to help develop and test investment and trading strategies. The ideal candidate will have experience analyzing, modeling and managing large scale real world data in programming languages such as Python, C++, R or Matlab. We are heavy users of python and associated scientific computing tools like numpy, scipy, matplotlib, cython, pandas and SQL. Familiarity with these tools is a big plus, but we would consider a candidate with sufficient experience in other environments. Unlike many quantitative shops, our focus is not on applying sophisticated mathematical models to high frequency derivative trading, but rather on using a rigorous approach to detecting, describing and capturing inefficiencies that can cause securities to depart significantly from fair value. You would join a small team of quantitative researchers, programmers, analysts and traders. We manage all aspects of the investment and trading process, from idea generation and testing, production trading implementation, and performance diagnostics and reporting. Responsibilities . Develop and test trading strategies . Conduct research, analyze, and present discoveries . Work closely with other quantitative researchers, programmers, analysts and traders . Optimize current trading strategies/systems and adjust parameters to adapt to changing market conditions . Develop and maintain proprietary databases, reports and monitors . Work closely with the traders to understand business requirements, design efficient and scalable solutions . Create and maintain up to date work plans, followed by communicating status to traders . Perform related duties as required Skills & Requirements . Master's degree, PHD or equivalent in Mathematics, Financial Mathematics, Statistics, Engineering, Econometrics, Physics, Computer Science or related advanced degree in quantitative fields preferred . Experience with C++, R or equivalent is a plus . Data modeling . Statistical skills . Interest in financial markets . Ability to respond rapidly . Strong attention to detail . Excellent written and verbal communication skills. . Ability to work in a high pressure, dynamic trading environment. . Ability to work independently Please apply online at: http://tradelinkllc.atsondemand.com//index.cfm?fuseaction=512172.viewjobdeta il&CID=512172&JID=413727&BUID=1960 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.isaac at gmail.com Fri Feb 24 16:45:55 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 24 Feb 2012 16:45:55 -0500 Subject: [Numpy-discussion] Test survey that I have been putting together In-Reply-To: References: Message-ID: <4F480513.7080603@gmail.com> On 2/23/2012 7:20 PM, Travis Oliphant wrote: > https://www.surveymonkey.com/s/numpy_list_survey > After you complete the survey, I would really appreciate any feedback on questions that could be improved, removed, or added. I felt the survey was targeting business users rather than academic users. fwiw, Alan Isaac From ralf.gommers at googlemail.com Sat Feb 25 05:16:50 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 25 Feb 2012 11:16:50 +0100 Subject: [Numpy-discussion] distributing pre-compiled f2py extensions on OSX In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 3:39 PM, Mark Bakker wrote: > Two short questions: > > 1. When I distribute pre-compiled f2py extensions for OSX, it seems that > the users need gfortran installed, else it cannot find libgfortran.3.dylib. > Is there a way to link that file with the extension? > You can look at how this is done in Scipy. > 2. Should extensions compiled on Snowleopard work on Lion? > Yes. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Feb 25 08:14:44 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 25 Feb 2012 14:14:44 +0100 Subject: [Numpy-discussion] Problem Building Numpy with Python 2.7.1 and OS X 10.7.3 In-Reply-To: <440B19D3-1A26-4EE8-AC99-69DFCF93CBC9@uvic.ca> References: <440B19D3-1A26-4EE8-AC99-69DFCF93CBC9@uvic.ca> Message-ID: On Thu, Feb 23, 2012 at 11:28 PM, Patrick Armstrong wrote: > Hi there, > > I'm having a problem building NumPy on Python 2.7.1 and OS X 10.7.3. Here > is my build log: > > https://gist.github.com/1895377 > > Does anyone have any idea what might be happening? I get a very similar > error when compiling with clang. > > Installing a binary really isn't an option for me due to some specifics of > my project. Does anyone have an idea what might be wrong? > > Since you're using pip, I assume that gcc-4.2 is llvm-gcc. As a first step, I suggest using plain gcc and not using pip (so just "python setup.py install"). Also make sure you follow the recommendations in "version specific notes" at http://scipy.org/Installing_SciPy/Mac_OS_X. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jwevandijk at xs4all.nl Sat Feb 25 14:50:12 2012 From: jwevandijk at xs4all.nl (Janwillem) Date: Sat, 25 Feb 2012 20:50:12 +0100 Subject: [Numpy-discussion] numpy uint16 array to unicode string Message-ID: <4F493B74.2040105@xs4all.nl> I have a buffer as a numpy array of uint16. Chunks of this buffer are unicode characters. How do I get these chunks, say data[start:end], in a string? Other chunks are 32bit integers how do I get these chunks into a numpy array of int32? Many thanks, Janwillem From robert.kern at gmail.com Sat Feb 25 14:56:38 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 25 Feb 2012 19:56:38 +0000 Subject: [Numpy-discussion] numpy uint16 array to unicode string In-Reply-To: <4F493B74.2040105@xs4all.nl> References: <4F493B74.2040105@xs4all.nl> Message-ID: On Sat, Feb 25, 2012 at 19:50, Janwillem wrote: > I have a buffer as a numpy array of uint16. Chunks of this buffer are > unicode characters. How do I get these chunks, say data[start:end], in a > string? 
data[sart:end].tostring().decode('UTF-16LE') or 'UTF-16BE' if they are big-endian. > Other chunks are 32bit integers how do I get these chunks into a numpy > array of int32? data[start:end].view(np.int32) -- Robert Kern From markbak at gmail.com Sat Feb 25 15:29:51 2012 From: markbak at gmail.com (Mark Bakker) Date: Sat, 25 Feb 2012 21:29:51 +0100 Subject: [Numpy-discussion] distributing pre-compiled f2py extensions on OSX Message-ID: Thanks for the reply, Ralf. Can you point me a bit in the right direction. Scipy is pretty big. Thanks, Mark ------------------------------ > > Message: 2 > Date: Sat, 25 Feb 2012 11:16:50 +0100 > From: Ralf Gommers > Subject: Re: [Numpy-discussion] distributing pre-compiled f2py > extensions on OSX > To: Discussion of Numerical Python > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > On Fri, Feb 24, 2012 at 3:39 PM, Mark Bakker wrote: > > > Two short questions: > > > > 1. When I distribute pre-compiled f2py extensions for OSX, it seems that > > the users need gfortran installed, else it cannot find > libgfortran.3.dylib. > > Is there a way to link that file with the extension? > > > > You can look at how this is done in Scipy. > > > > 2. Should extensions compiled on Snowleopard work on Lion? > > > > Yes. > > Ralf > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120225/9fa76d70/attachment-0001.html > > ------------------------------ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jwevandijk at xs4all.nl Sat Feb 25 15:32:47 2012 From: jwevandijk at xs4all.nl (Janwillem) Date: Sat, 25 Feb 2012 21:32:47 +0100 Subject: [Numpy-discussion] numpy uint16 array to unicode string In-Reply-To: References: <4F493B74.2040105@xs4all.nl> Message-ID: <4F49456F.40700@xs4all.nl> Thanks, works, should have found that myself. On 25-02-2012 20:56, Robert Kern wrote: > On Sat, Feb 25, 2012 at 19:50, Janwillem wrote: >> I have a buffer as a numpy array of uint16. Chunks of this buffer are >> unicode characters. How do I get these chunks, say data[start:end], in a >> string? > data[sart:end].tostring().decode('UTF-16LE') > > or 'UTF-16BE' if they are big-endian. > >> Other chunks are 32bit integers how do I get these chunks into a numpy >> array of int32? > data[start:end].view(np.int32) > From wesmckinn at gmail.com Sat Feb 25 15:49:37 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sat, 25 Feb 2012 15:49:37 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330092347-sup-3918@rohan> References: <1330092347-sup-3918@rohan> Message-ID: On Fri, Feb 24, 2012 at 9:07 AM, Erin Sheldon wrote: > Excerpts from Travis Oliphant's message of Thu Feb 23 15:08:52 -0500 2012: >> This is actually on my short-list as well --- it just didn't make it to the list. >> >> In fact, we have someone starting work on it this week. ?It is his >> first project so it will take him a little time to get up to speed on >> it, but he will contact Wes and work with him and report progress to >> this list. >> >> Integration with np.loadtxt is a high-priority. ?I think loadtxt is >> now the 3rd or 4th "text-reading" interface I've seen in NumPy. ?I >> have no interest in making a new one if we can avoid it. ? But, we do >> need to make it faster with less memory overhead for simple cases like >> Wes describes. 
> > I'm willing to adapt my code if it is wanted, but at the same time I > don't want to step on this person's toes. ?Should I proceed? > > -e > -- > Erin Scott Sheldon > Brookhaven National Laboratory > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion That may work-- I haven't taken a look at the code but it is probably a good starting point. We could create a new repo on the pydata GitHub org (http://github.com/pydata) and use that as our point of collaboration. I will hopefully be able to put some serious energy into this this spring. - Wes From bergstrj at iro.umontreal.ca Sat Feb 25 16:44:11 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Sat, 25 Feb 2012 16:44:11 -0500 Subject: [Numpy-discussion] bincount([], minlength=2) should work right? Message-ID: bincount([]) makes no sense, but if a minlength argument is provided, then the routine should succeed. It fails in 1.6.1, has it been fixed in master? - James -- http://www-etud.iro.umontreal.ca/~bergstrj From erin.sheldon at gmail.com Sat Feb 25 17:04:54 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Sat, 25 Feb 2012 17:04:54 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> Message-ID: <1330207186-sup-1957@rohan> Excerpts from Wes McKinney's message of Sat Feb 25 15:49:37 -0500 2012: > That may work-- I haven't taken a look at the code but it is probably > a good starting point. We could create a new repo on the pydata GitHub > org (http://github.com/pydata) and use that as our point of > collaboration. I will hopefully be able to put some serious energy > into this this spring. First I want to make sure that we are not duplicating effort of the person Travis mentioned. Logistically, I think it is probably easier to just fork numpy into my github account and then work it directly into the code base, and ask for a pull request when things are ready. I expect I could have something with all the required features ready in a week or so. It is mainly just porting the code from C++ to C, and writing the interfaces by hand instead of with swig; I've got plenty of experience with that, so it should be straightforward. -e -- Erin Scott Sheldon Brookhaven National Laboratory From alan.isaac at gmail.com Sat Feb 25 17:13:31 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 25 Feb 2012 17:13:31 -0500 Subject: [Numpy-discussion] bincount([], minlength=2) should work right? In-Reply-To: References: Message-ID: <4F495D0B.5010007@gmail.com> On 2/25/2012 4:44 PM, James Bergstra wrote: > bincount([]) makes no sense, I disagree: http://permalink.gmane.org/gmane.comp.python.numeric.general/42041 > but if a minlength argument is provided, > then the routine should succeed. Definitely! Alan Isaac From jsseabold at gmail.com Sat Feb 25 17:16:06 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Sat, 25 Feb 2012 17:16:06 -0500 Subject: [Numpy-discussion] bincount([], minlength=2) should work right? In-Reply-To: <4F495D0B.5010007@gmail.com> References: <4F495D0B.5010007@gmail.com> Message-ID: On Sat, Feb 25, 2012 at 5:13 PM, Alan G Isaac wrote: > On 2/25/2012 4:44 PM, James Bergstra wrote: >> bincount([]) makes no sense, > > I disagree: > http://permalink.gmane.org/gmane.comp.python.numeric.general/42041 > > >> but if a minlength argument is provided, >> then the routine should succeed. 
> > Definitely! > There's a PR to fix this here. https://github.com/numpy/numpy/pull/84 Skipper From ben.root at ou.edu Sat Feb 25 17:17:40 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 25 Feb 2012 16:17:40 -0600 Subject: [Numpy-discussion] bincount([], minlength=2) should work right? In-Reply-To: <4F495D0B.5010007@gmail.com> References: <4F495D0B.5010007@gmail.com> Message-ID: On Saturday, February 25, 2012, Alan G Isaac wrote: > On 2/25/2012 4:44 PM, James Bergstra wrote: > > bincount([]) makes no sense, > > I disagree: > http://permalink.gmane.org/gmane.comp.python.numeric.general/42041 > > > > but if a minlength argument is provided, > > then the routine should succeed. > > Definitely! > > Alan Isaac I thought we already fixed this? Or was that only for histogram? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From bergstrj at iro.umontreal.ca Sat Feb 25 17:34:32 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Sat, 25 Feb 2012 17:34:32 -0500 Subject: [Numpy-discussion] bincount([], minlength=2) should work right? In-Reply-To: <4F495D0B.5010007@gmail.com> References: <4F495D0B.5010007@gmail.com> Message-ID: On Sat, Feb 25, 2012 at 5:13 PM, Alan G Isaac wrote: > On 2/25/2012 4:44 PM, James Bergstra wrote: >> bincount([]) makes no sense, > > I disagree: > http://permalink.gmane.org/gmane.comp.python.numeric.general/42041 > gmane is down to me at the moment, but if this argues that the answer should be [], that would make sense to me too. I was too quick about undefining it in my post. -- http://www-etud.iro.umontreal.ca/~bergstrj From ralf.gommers at googlemail.com Sat Feb 25 17:38:47 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 25 Feb 2012 23:38:47 +0100 Subject: [Numpy-discussion] distributing pre-compiled f2py extensions on OSX In-Reply-To: References: Message-ID: On Sat, Feb 25, 2012 at 9:29 PM, Mark Bakker wrote: > Thanks for the reply, Ralf. > Can you point me a bit in the right direction. > Scipy is pretty big. > All Fortran sources in Scipy are wrapped with f2py, and can be compiled with gfortran the way you want. As a simple example, have a look at integrate/setup.py to see how the extension "dop" is compiled. You can probably just use a similar config.add_library() call. If you only need the exact compile arguments, just compile scipy and get them from the build log. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Sat Feb 25 18:26:02 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Sat, 25 Feb 2012 15:26:02 -0800 Subject: [Numpy-discussion] Creating a bool array with Cython Message-ID: Is this a reasonable (and fast) way to create a bool array in cython? def makebool(): cdef: int n = 2 np.npy_intp *dims = [n] np.ndarray[np.uint8_t, ndim=1] a a = PyArray_EMPTY(1, dims, NPY_UINT8, 0) a[0] = 1 a[1] = 0 a.dtype = np.bool return a Will a numpy bool array be np.uint8 on all platforms? How can I do a.dtype=np.bool using the numpy C api? Is there any point (speed) in doing so? From d.s.seljebotn at astro.uio.no Sat Feb 25 22:04:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 25 Feb 2012 19:04:07 -0800 Subject: [Numpy-discussion] Creating a bool array with Cython In-Reply-To: References: Message-ID: <4F49A127.1060202@astro.uio.no> On 02/25/2012 03:26 PM, Keith Goodman wrote: > Is this a reasonable (and fast) way to create a bool array in cython? 
> > def makebool(): > cdef: > int n = 2 > np.npy_intp *dims = [n] > np.ndarray[np.uint8_t, ndim=1] a > a = PyArray_EMPTY(1, dims, NPY_UINT8, 0) > a[0] = 1 > a[1] = 0 > a.dtype = np.bool > return a > > Will a numpy bool array be np.uint8 on all platforms? Someone else will have to answer that for sure, though I expect and always assumed so. > How can I do a.dtype=np.bool using the numpy C api? Is there any point > (speed) in doing so? Did you try np.ndarray[np.uint8_t, ndim=1, cast=True] a ? You should be able to do that and then pass NPY_BOOL to PyArray_EMPTY, to avoid having to change the dtype later. Dag From teoliphant at gmail.com Sun Feb 26 01:16:53 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Sun, 26 Feb 2012 00:16:53 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330092347-sup-3918@rohan> References: <1330092347-sup-3918@rohan> Message-ID: I will just let Jay know that he should coordinate with you. It would be helpful for him to have someone to collaborate with on this. I'm looking forward to seeing your code. Definitely don't hold back on our account. We will adapt to whatever you can offer. Best regards, -Travis On Feb 24, 2012, at 8:07 AM, Erin Sheldon wrote: > Excerpts from Travis Oliphant's message of Thu Feb 23 15:08:52 -0500 2012: >> This is actually on my short-list as well --- it just didn't make it to the list. >> >> In fact, we have someone starting work on it this week. It is his >> first project so it will take him a little time to get up to speed on >> it, but he will contact Wes and work with him and report progress to >> this list. >> >> Integration with np.loadtxt is a high-priority. I think loadtxt is >> now the 3rd or 4th "text-reading" interface I've seen in NumPy. I >> have no interest in making a new one if we can avoid it. But, we do >> need to make it faster with less memory overhead for simple cases like >> Wes describes. > > I'm willing to adapt my code if it is wanted, but at the same time I > don't want to step on this person's toes. Should I proceed? > > -e > -- > Erin Scott Sheldon > Brookhaven National Laboratory From ndbecker2 at gmail.com Sun Feb 26 09:52:16 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Sun, 26 Feb 2012 09:52:16 -0500 Subject: [Numpy-discussion] Creating a bool array with Cython References: Message-ID: Keith Goodman wrote: > Is this a reasonable (and fast) way to create a bool array in cython? > > def makebool(): > cdef: > int n = 2 > np.npy_intp *dims = [n] > np.ndarray[np.uint8_t, ndim=1] a > a = PyArray_EMPTY(1, dims, NPY_UINT8, 0) > a[0] = 1 > a[1] = 0 > a.dtype = np.bool > return a > > Will a numpy bool array be np.uint8 on all platforms? > > How can I do a.dtype=np.bool using the numpy C api? Is there any point > (speed) in doing so? Cython has it's own mail lists - maybe you can find more answers there. From scipy at samueljohn.de Sun Feb 26 10:38:41 2012 From: scipy at samueljohn.de (Samuel John) Date: Sun, 26 Feb 2012 16:38:41 +0100 Subject: [Numpy-discussion] Problem Building Numpy with Python 2.7.1 and OS X 10.7.3 In-Reply-To: References: <440B19D3-1A26-4EE8-AC99-69DFCF93CBC9@uvic.ca> Message-ID: <3037AFD6-FCBB-40E0-8AE1-051C9DDF27AE@samueljohn.de> Hi The plain gcc (non-llvm) is no longer there, if you install Lion and directly Xcode 4.3. Only, if you have the old Xcode 4.2 or lower, then you may have a non-llvm gcc. For Xcode 4.3, I recommend installing the "Command Line Tools for Xcode" from the preferences of Xcode. 
Then you'll have the unix tools and compilers for building software. The solution is to compile numpy and scipy with clang. I had no problems so far but I think few people actually compiled it with clang. The issue #1500 (scipy) may help here. http://projects.scipy.org/scipy/ticket/1500 On 25.02.2012, at 14:14, Ralf Gommers wrote: > Since you're using pip, I assume that gcc-4.2 is llvm-gcc. As a first step, I suggest using plain gcc and not using pip (so just "python setup.py install"). Also make sure you follow the recommendations in "version specific notes" at http://scipy.org/Installing_SciPy/Mac_OS_X. This website should be updated. cheers, Samuel From kwgoodman at gmail.com Sun Feb 26 11:08:16 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Sun, 26 Feb 2012 08:08:16 -0800 Subject: [Numpy-discussion] Creating a bool array with Cython In-Reply-To: <4F49A127.1060202@astro.uio.no> References: <4F49A127.1060202@astro.uio.no> Message-ID: On Sat, Feb 25, 2012 at 7:04 PM, Dag Sverre Seljebotn wrote: > On 02/25/2012 03:26 PM, Keith Goodman wrote: >> Is this a reasonable (and fast) way to create a bool array in cython? >> >> ? ? ?def makebool(): >> ? ? ? ? ?cdef: >> ? ? ? ? ? ? ?int n = 2 >> ? ? ? ? ? ? ?np.npy_intp *dims = [n] >> ? ? ? ? ? ? ?np.ndarray[np.uint8_t, ndim=1] a >> ? ? ? ? ?a = PyArray_EMPTY(1, dims, NPY_UINT8, 0) >> ? ? ? ? ?a[0] = 1 >> ? ? ? ? ?a[1] = 0 >> ? ? ? ? ?a.dtype = np.bool >> ? ? ? ? ?return a >> >> Will a numpy bool array be np.uint8 on all platforms? > > Someone else will have to answer that for sure, though I expect and > always assumed so. > >> How can I do a.dtype=np.bool using the numpy C api? Is there any point >> (speed) in doing so? > > Did you try > > np.ndarray[np.uint8_t, ndim=1, cast=True] a > > ? You should be able to do that and then pass NPY_BOOL to PyArray_EMPTY, > to avoid having to change the dtype later. That looks like the grown up way to write the code. (Plus saves 300 ns in overhead.) Thanks, Dag. (I don't write the code to create the empty output array by hand. I have a templating system that does that. But it does not know how to create bool arrays. Hence the a.dtype = np.bool workaround. Thanks to your advice, I'll try to teach the templating system to create bool arrays so I don't have to remember how to do it.) From ralf.gommers at googlemail.com Sun Feb 26 11:09:16 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 26 Feb 2012 17:09:16 +0100 Subject: [Numpy-discussion] Problem Building Numpy with Python 2.7.1 and OS X 10.7.3 In-Reply-To: <3037AFD6-FCBB-40E0-8AE1-051C9DDF27AE@samueljohn.de> References: <440B19D3-1A26-4EE8-AC99-69DFCF93CBC9@uvic.ca> <3037AFD6-FCBB-40E0-8AE1-051C9DDF27AE@samueljohn.de> Message-ID: On Sun, Feb 26, 2012 at 4:38 PM, Samuel John wrote: > Hi > > The plain gcc (non-llvm) is no longer there, if you install Lion and > directly Xcode 4.3. > Only, if you have the old Xcode 4.2 or lower, then you may have a non-llvm > gcc. > > For Xcode 4.3, I recommend installing the "Command Line Tools for Xcode" > from the preferences of Xcode. Then you'll have the unix tools and > compilers for building software. > > The solution is to compile numpy and scipy with clang. I had no problems > so far but I think few people actually compiled it with clang. > > The issue #1500 (scipy) may help here. > http://projects.scipy.org/scipy/ticket/1500 > > > On 25.02.2012, at 14:14, Ralf Gommers wrote: > > Since you're using pip, I assume that gcc-4.2 is llvm-gcc. 
As a first > step, I suggest using plain gcc and not using pip (so just "python setup.py > install"). Also make sure you follow the recommendations in "version > specific notes" at http://scipy.org/Installing_SciPy/Mac_OS_X. > > This website should be updated. > Samuel, do you want to take a shot at updating it? You've successfully compiled both numpy and scipy in a couple of different ways on Lion, so you'll probably do a much better job there than I could. It's a wiki page, so you should be able to edit it directly. Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Sun Feb 26 12:23:50 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 26 Feb 2012 11:23:50 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Thu, Feb 23, 2012 at 2:19 PM, Warren Weckesser < warren.weckesser at enthought.com> wrote: > > On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant wrote: > >> This is actually on my short-list as well --- it just didn't make it to >> the list. >> >> In fact, we have someone starting work on it this week. It is his first >> project so it will take him a little time to get up to speed on it, but he >> will contact Wes and work with him and report progress to this list. >> >> Integration with np.loadtxt is a high-priority. I think loadtxt is now >> the 3rd or 4th "text-reading" interface I've seen in NumPy. I have no >> interest in making a new one if we can avoid it. But, we do need to make >> it faster with less memory overhead for simple cases like Wes describes. >> >> -Travis >> >> > > I have a "proof of concept" CSV reader written in C (with a Cython > wrapper). I'll put it on github this weekend. > > Warren > > The text reader that I've been working on is now on github: https://github.com/WarrenWeckesser/textreader Currently it makes two passes through the file. The first pass just counts the number of rows. It then allocates the array and reads the file again to parse the data and fill in the array. Eventually the first pass wll be optional, and you'll be able to specify how many rows to read (and then continue reading another block if you haven't read the entire file). You currently have to give the dtype as a structured array. That would be nice to fix. Actually, there are quite a few "must have" features that it doesn't have yet. One issue that this code handles is newlines embedded in quoted fields. Excel can generate and read files like this: 1.0,2.0,"foo bar" That is one "row" with three fields. The third field contains "foo\nbar". I haven't pushed it to the extreme, but the "big" example (in the examples/ directory) is a 1 gig text file with 2 million rows and 50 fields in each row. This is read in less than 30 seconds (but that's with a solid state drive). Quoting the README file: "This is experimental, unreleased software. Use at your own risk." There are some hard-coded buffer sizes (that eventually should be dynamic), and the error checking is not complete, so mistakes or unanticipated cases can result in seg. faults. Warren > > >> >> On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote: >> >> > Hi, >> > >> > 23.02.2012 20:32, Wes McKinney kirjoitti: >> > [clip] >> >> To be clear: I'm going to do this eventually whether or not it >> >> happens in NumPy because it's an existing problem for heavy >> >> pandas users. 
I see no reason why the code can't emit structured >> >> arrays, too, so we might as well have a common library component >> >> that I can use in pandas and specialize to the DataFrame internal >> >> structure. >> > >> > If you do this, one useful aim could be to design the code such that it >> > can be used in loadtxt, at least as a fast path for common cases. I'd >> > really like to avoid increasing the number of APIs for text file >> loading. >> > >> > -- >> > Pauli Virtanen >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Feb 26 12:46:31 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 26 Feb 2012 12:46:31 -0500 Subject: [Numpy-discussion] log relative error Message-ID: (I got distracted by some numerical accuracy checks. np.polyfit looks good in NIST test.) Does numpy have something like this? def lre(actual, desired): '''calculate log relative error, number of correct significant digits not an informative function name Parameters ---------- actual : array_like actual values desired : array_like desired values Returns ------- lre : ndarray number of significant digits, uses relative absolute difference if desired is not zero, and absolute difference if desired is zero. References ---------- http://en.wikibooks.org/wiki/Statistics:Numerical_Methods/Numerical_Comparison_of_Statistical_Software#Measuring_Accuracy http://digilander.libero.it/foxes/StRD_Benchmarks/X_NIST_StRD_Benchmarks.htm ''' actual = np.atleast_1d(np.asarray(actual)) desired = np.atleast_1d(np.asarray(desired)) mask_zero = desired == 0 dig = np.zeros(desired.shape) dig[mask_zero] = -np.log10(np.abs(actual[mask_zero])) dig[~mask_zero] = -np.log10(np.abs(actual[~mask_zero] - desired[~mask_zero]) / np.abs(desired[~mask_zero])) if np.size(dig) == 1: dig = np.squeeze(dig)[()] return dig Josef From wesmckinn at gmail.com Sun Feb 26 13:37:56 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 26 Feb 2012 13:37:56 -0500 Subject: [Numpy-discussion] Migrating issues to GitHub In-Reply-To: References: Message-ID: On Thu, Feb 16, 2012 at 4:32 PM, Ralf Gommers wrote: > > > On Thu, Feb 16, 2012 at 10:20 PM, Thouis (Ray) Jones > wrote: >> >> On Thu, Feb 16, 2012 at 19:25, Ralf Gommers >> wrote: >> > In another thread Jira was proposed as an alternative to Trac. Can you >> > point >> > out some of its strengths and weaknesses, and tell us why you decided to >> > move away from it? >> >> .... > > > >> >> Jira reminded me of Java. > > > OK, you convinced me:) > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Having just created a NumPy ticket (http://projects.scipy.org/numpy/ticket/2065) I feel pretty strongly about moving the issue tracker to GitHub--the lack of attachments is easy to work around. I think it will help a lot for building more community engagement with the development process. 
My experience using it with pandas has been very positive-- I have churned through around 700 issues over the last 12 months and I've never felt like it's gotten in my way (except for the occasional CSS/JS bugs in the web UI). - Wes From njs at pobox.com Sun Feb 26 14:00:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 26 Feb 2012 19:00:38 +0000 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser wrote: > I haven't pushed it to the extreme, but the "big" example (in the examples/ > directory) is a 1 gig text file with 2 million rows and 50 fields in each > row.? This is read in less than 30 seconds (but that's with a solid state > drive). Obviously this was just a quick test, but FYI, a solid state drive shouldn't really make any difference here -- this is a pure sequential read, and for those, SSDs are if anything actually slower than traditional spinning-platter drives. For this kind of benchmarking, you'd really rather be measuring the CPU time, or reading byte streams that are already in memory. If you can process more MB/s than the drive can provide, then your code is effectively perfectly fast. Looking at this number has a few advantages: - You get more repeatable measurements (no disk buffers and stuff messing with you) - If your code can go faster than your drive, then the drive won't make your benchmark look bad - There are probably users out there that have faster drives than you (e.g., I just measured ~340 megabytes/s off our lab's main RAID array), so it's nice to be able to measure optimizations even after they stop mattering on your equipment. Cheers, -- Nathaniel From warren.weckesser at enthought.com Sun Feb 26 14:16:10 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 26 Feb 2012 13:16:10 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser > wrote: > > I haven't pushed it to the extreme, but the "big" example (in the > examples/ > > directory) is a 1 gig text file with 2 million rows and 50 fields in each > > row. This is read in less than 30 seconds (but that's with a solid state > > drive). > > Obviously this was just a quick test, but FYI, a solid state drive > shouldn't really make any difference here -- this is a pure sequential > read, and for those, SSDs are if anything actually slower than > traditional spinning-platter drives. > > Good point. > For this kind of benchmarking, you'd really rather be measuring the > CPU time, or reading byte streams that are already in memory. If you > can process more MB/s than the drive can provide, then your code is > effectively perfectly fast. Looking at this number has a few > advantages: > - You get more repeatable measurements (no disk buffers and stuff > messing with you) > - If your code can go faster than your drive, then the drive won't > make your benchmark look bad > - There are probably users out there that have faster drives than you > (e.g., I just measured ~340 megabytes/s off our lab's main RAID > array), so it's nice to be able to measure optimizations even after > they stop mattering on your equipment. > > For anyone benchmarking software like this, be sure to clear the disk cache before each run. 
In linux: $ sync $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" In Mac OSX: $ purge I'm not sure what the equivalent is in Windows. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Sun Feb 26 14:24:02 2012 From: francesc at continuum.io (Francesc Alted) Date: Sun, 26 Feb 2012 13:24:02 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: <0FA8739F-A026-404A-8A6B-DF92AC2FC89A@continuum.io> On Feb 26, 2012, at 1:16 PM, Warren Weckesser wrote: > For anyone benchmarking software like this, be sure to clear the disk cache before each run. In linux: > > $ sync > $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches" > It is also a good idea to run a disk-cache enabled test too, just to better see how things can be improved in your code. Disk subsystem is pretty slow, and during development you can get much better feedback by looking at load times from memory, not from disk (also, tests run much faster, so you can save a lot of devel time). > In Mac OSX: > > $ purge Now that I switched to a Mac, this is good to know. Thanks! -- Francesc Alted From njs at pobox.com Sun Feb 26 14:49:47 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 26 Feb 2012 19:49:47 +0000 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser wrote: > On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith wrote: >> For this kind of benchmarking, you'd really rather be measuring the >> CPU time, or reading byte streams that are already in memory. If you >> can process more MB/s than the drive can provide, then your code is >> effectively perfectly fast. Looking at this number has a few >> advantages: >> ?- You get more repeatable measurements (no disk buffers and stuff >> messing with you) >> ?- If your code can go faster than your drive, then the drive won't >> make your benchmark look bad >> ?- There are probably users out there that have faster drives than you >> (e.g., I just measured ~340 megabytes/s off our lab's main RAID >> array), so it's nice to be able to measure optimizations even after >> they stop mattering on your equipment. > > > For anyone benchmarking software like this, be sure to clear the disk cache > before each run.? In linux: Err, my argument was that you should do exactly the opposite, and just worry about hot-cache times (or time reading a big in-memory buffer, to avoid having to think about the OS's caching strategies). Clearing the disk cache is very important for getting meaningful, repeatable benchmarks in code where you know that the cache will usually be cold and where hitting the disk will have unpredictable effects (i.e., pretty much anything doing random access, like databases, which have complicated locality patterns, you may or may not trigger readahead, etc.). But here we're talking about pure sequential reads, where the disk just goes however fast it goes, and your code can either keep up or not. One minor point where the OS interface could matter: it's good to set up your code so it can use mmap() instead of read(), since this can reduce overhead. read() has to copy the data from the disk into OS memory, and then from OS memory into your process's memory; mmap() skips the second step. 
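A minimal Python-level sketch of the same idea (illustrative only -- the helper name and the comma-separated layout are made up here; Python's mmap module wraps the same system call):

import mmap

def sum_first_field(path):
    # Toy example: sum the first comma-separated field of every line.
    # The mapping lets the kernel page file data straight into our address
    # space; a read() loop would additionally copy each block into a new
    # Python string.
    total = 0.0
    f = open(path, 'rb')
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        pos, size = 0, m.size()
        while pos < size:
            end = m.find(b'\n', pos)
            if end == -1:
                end = size
            line = m[pos:end]
            if line:
                total += float(line.split(b',', 1)[0])
            pos = end + 1
    finally:
        m.close()
        f.close()
    return total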
-- Nathaniel From warren.weckesser at enthought.com Sun Feb 26 14:58:35 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 26 Feb 2012 13:58:35 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Sun, Feb 26, 2012 at 1:49 PM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser > wrote: > > On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith wrote: > >> For this kind of benchmarking, you'd really rather be measuring the > >> CPU time, or reading byte streams that are already in memory. If you > >> can process more MB/s than the drive can provide, then your code is > >> effectively perfectly fast. Looking at this number has a few > >> advantages: > >> - You get more repeatable measurements (no disk buffers and stuff > >> messing with you) > >> - If your code can go faster than your drive, then the drive won't > >> make your benchmark look bad > >> - There are probably users out there that have faster drives than you > >> (e.g., I just measured ~340 megabytes/s off our lab's main RAID > >> array), so it's nice to be able to measure optimizations even after > >> they stop mattering on your equipment. > > > > > > For anyone benchmarking software like this, be sure to clear the disk > cache > > before each run. In linux: > > Err, my argument was that you should do exactly the opposite, and just > worry about hot-cache times (or time reading a big in-memory buffer, > to avoid having to think about the OS's caching strategies). > > Right, I got that. Sorry if the placement of the notes about how to clear the cache seemed to imply otherwise. > Clearing the disk cache is very important for getting meaningful, > repeatable benchmarks in code where you know that the cache will > usually be cold and where hitting the disk will have unpredictable > effects (i.e., pretty much anything doing random access, like > databases, which have complicated locality patterns, you may or may > not trigger readahead, etc.). But here we're talking about pure > sequential reads, where the disk just goes however fast it goes, and > your code can either keep up or not. > > One minor point where the OS interface could matter: it's good to set > up your code so it can use mmap() instead of read(), since this can > reduce overhead. read() has to copy the data from the disk into OS > memory, and then from OS memory into your process's memory; mmap() > skips the second step. > > Thanks for the tip. Do you happen to have any sample code that demonstrates this? I'd like to explore this more. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Sun Feb 26 15:00:01 2012 From: francesc at continuum.io (Francesc Alted) Date: Sun, 26 Feb 2012 14:00:01 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Feb 26, 2012, at 1:49 PM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser > wrote: >> On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith wrote: >>> For this kind of benchmarking, you'd really rather be measuring the >>> CPU time, or reading byte streams that are already in memory. If you >>> can process more MB/s than the drive can provide, then your code is >>> effectively perfectly fast. 
Looking at this number has a few >>> advantages: >>> - You get more repeatable measurements (no disk buffers and stuff >>> messing with you) >>> - If your code can go faster than your drive, then the drive won't >>> make your benchmark look bad >>> - There are probably users out there that have faster drives than you >>> (e.g., I just measured ~340 megabytes/s off our lab's main RAID >>> array), so it's nice to be able to measure optimizations even after >>> they stop mattering on your equipment. >> >> >> For anyone benchmarking software like this, be sure to clear the disk cache >> before each run. In linux: > > Err, my argument was that you should do exactly the opposite, and just > worry about hot-cache times (or time reading a big in-memory buffer, > to avoid having to think about the OS's caching strategies). > > Clearing the disk cache is very important for getting meaningful, > repeatable benchmarks in code where you know that the cache will > usually be cold and where hitting the disk will have unpredictable > effects (i.e., pretty much anything doing random access, like > databases, which have complicated locality patterns, you may or may > not trigger readahead, etc.). But here we're talking about pure > sequential reads, where the disk just goes however fast it goes, and > your code can either keep up or not. Exactly. > One minor point where the OS interface could matter: it's good to set > up your code so it can use mmap() instead of read(), since this can > reduce overhead. read() has to copy the data from the disk into OS > memory, and then from OS memory into your process's memory; mmap() > skips the second step. Cool. Nice trick! -- Francesc Alted From njs at pobox.com Sun Feb 26 16:00:30 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 26 Feb 2012 21:00:30 +0000 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Sun, Feb 26, 2012 at 7:58 PM, Warren Weckesser wrote: > Right, I got that.? Sorry if the placement of the notes about how to clear > the cache seemed to imply otherwise. OK, cool, np. >> Clearing the disk cache is very important for getting meaningful, >> repeatable benchmarks in code where you know that the cache will >> usually be cold and where hitting the disk will have unpredictable >> effects (i.e., pretty much anything doing random access, like >> databases, which have complicated locality patterns, you may or may >> not trigger readahead, etc.). But here we're talking about pure >> sequential reads, where the disk just goes however fast it goes, and >> your code can either keep up or not. >> >> One minor point where the OS interface could matter: it's good to set >> up your code so it can use mmap() instead of read(), since this can >> reduce overhead. read() has to copy the data from the disk into OS >> memory, and then from OS memory into your process's memory; mmap() >> skips the second step. > > Thanks for the tip.? Do you happen to have any sample code that demonstrates > this?? I'd like to explore this more. No, I've never actually run into a situation where I needed it myself, but I learned the trick from Tridge so I tend to believe it :-). mmap() is actually a pretty simple interface -- the only thing I'd watch out for is that you want to mmap() the file in pieces (so as to avoid VM exhaustion on 32-bit systems), but you want to use pretty big pieces (because each call to mmap()/munmap() has overhead). So you might want to use chunks in the 32-128 MiB range. 
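A rough sketch of mapping a large file in fixed-size pieces (the helper name and chunk size are arbitrary; mmap offsets have to be multiples of mmap.ALLOCATIONGRANULARITY):

import mmap

def count_lines_chunked(path, chunk_bytes=64 * 1024 * 1024):
    # Walk the file through bounded mappings so the whole file never has
    # to fit into the address space at once (relevant on 32-bit systems).
    gran = mmap.ALLOCATIONGRANULARITY
    chunk_bytes = max(gran, (chunk_bytes // gran) * gran)
    lines = 0
    with open(path, 'rb') as f:
        f.seek(0, 2)               # seek to the end to get the file size
        size = f.tell()
        offset = 0
        while offset < size:
            length = min(chunk_bytes, size - offset)
            m = mmap.mmap(f.fileno(), length, offset=offset,
                          access=mmap.ACCESS_READ)
            try:
                pos = m.find(b'\n')
                while pos != -1:
                    lines += 1
                    pos = m.find(b'\n', pos + 1)
            finally:
                m.close()
            offset += length
    return lines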
Or since I guess you're probably developing on a 64-bit system you can just be lazy and mmap the whole file for initial testing. git uses mmap, but I'm not sure it's very useful example code. Also it's not going to do magic. Your code has to be fairly quick before avoiding a single memcpy() will be noticeable. HTH, -- Nathaniel From warren.weckesser at enthought.com Sun Feb 26 16:22:35 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sun, 26 Feb 2012 15:22:35 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: On Sun, Feb 26, 2012 at 3:00 PM, Nathaniel Smith wrote: > On Sun, Feb 26, 2012 at 7:58 PM, Warren Weckesser > wrote: > > Right, I got that. Sorry if the placement of the notes about how to > clear > > the cache seemed to imply otherwise. > > OK, cool, np. > > >> Clearing the disk cache is very important for getting meaningful, > >> repeatable benchmarks in code where you know that the cache will > >> usually be cold and where hitting the disk will have unpredictable > >> effects (i.e., pretty much anything doing random access, like > >> databases, which have complicated locality patterns, you may or may > >> not trigger readahead, etc.). But here we're talking about pure > >> sequential reads, where the disk just goes however fast it goes, and > >> your code can either keep up or not. > >> > >> One minor point where the OS interface could matter: it's good to set > >> up your code so it can use mmap() instead of read(), since this can > >> reduce overhead. read() has to copy the data from the disk into OS > >> memory, and then from OS memory into your process's memory; mmap() > >> skips the second step. > > > > Thanks for the tip. Do you happen to have any sample code that > demonstrates > > this? I'd like to explore this more. > > No, I've never actually run into a situation where I needed it myself, > but I learned the trick from Tridge so I tend to believe it :-). > mmap() is actually a pretty simple interface -- the only thing I'd > watch out for is that you want to mmap() the file in pieces (so as to > avoid VM exhaustion on 32-bit systems), but you want to use pretty big > pieces (because each call to mmap()/munmap() has overhead). So you > might want to use chunks in the 32-128 MiB range. Or since I guess > you're probably developing on a 64-bit system you can just be lazy and > mmap the whole file for initial testing. git uses mmap, but I'm not > sure it's very useful example code. > > Also it's not going to do magic. Your code has to be fairly quick > before avoiding a single memcpy() will be noticeable. > > HTH, > Yes, thanks! I'm working on a mmap version now. I'm very curious to see just how much of an improvement it can give. Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From erin.sheldon at gmail.com Sun Feb 26 17:35:00 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Sun, 26 Feb 2012 17:35:00 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: Message-ID: <1330295443-sup-5380@rohan> Excerpts from Warren Weckesser's message of Sun Feb 26 16:22:35 -0500 2012: > Yes, thanks! I'm working on a mmap version now. I'm very curious to see > just how much of an improvement it can give. FYI, memmap is generally an incomplete solution for numpy arrays; it only understands rows, not columns and rows. 
If you memmap a rec array on disk and try to load one full column, it still loads the whole file beforehand. This was why I essentially wrote my own memmap like interface with recfile, the code I'm converting. It allows working with columns and rows without loading large chunks of memory. BTW, I think we will definitely benefit from merging some of our codes. When I get my stuff fully converted we should discuss. -e -- Erin Scott Sheldon Brookhaven National Laboratory From erin.sheldon at gmail.com Sun Feb 26 19:26:03 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Sun, 26 Feb 2012 19:26:03 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330295443-sup-5380@rohan> References: <1330295443-sup-5380@rohan> Message-ID: <1330302294-sup-6649@rohan> Excerpts from Erin Sheldon's message of Sun Feb 26 17:35:00 -0500 2012: > Excerpts from Warren Weckesser's message of Sun Feb 26 16:22:35 -0500 2012: > > Yes, thanks! I'm working on a mmap version now. I'm very curious to see > > just how much of an improvement it can give. > > FYI, memmap is generally an incomplete solution for numpy arrays; it > only understands rows, not columns and rows. If you memmap a rec array > on disk and try to load one full column, it still loads the whole file > beforehand. I read your message out of context. I was referring to interfaces to binary files, but I forgot your only working on the text interface. Sorry for the noise, -e -- Erin Scott Sheldon Brookhaven National Laboratory From patricka at uvic.ca Sun Feb 26 19:47:42 2012 From: patricka at uvic.ca (Patrick Armstrong) Date: Sun, 26 Feb 2012 16:47:42 -0800 Subject: [Numpy-discussion] Problem Building Numpy with Python 2.7.1 and OS X 10.7.3 In-Reply-To: References: <440B19D3-1A26-4EE8-AC99-69DFCF93CBC9@uvic.ca> Message-ID: Hi, On 2012-02-25, at 5:14 AM, Ralf Gommers wrote: > Since you're using pip, I assume that gcc-4.2 is llvm-gcc. As a first step, I suggest using plain gcc and not using pip (so just "python setup.py install"). Also make sure you follow the recommendations in "version specific notes" at http://scipy.org/Installing_SciPy/Mac_OS_X. As is mentioned earlier in the thread, Xcode doesn't distribute plain gcc anymore. I've tried building with llvm-gcc, and with clang, and I get the same failed result. To be sure, I tried building with a plain "python setup.py install", and I get the same result. Here' my build log with clang: https://gist.github.com/1920128 --patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From tetsuro_kikuchi at jesc.or.jp Sun Feb 26 19:53:28 2012 From: tetsuro_kikuchi at jesc.or.jp (tetsuro_kikuchi at jesc.or.jp) Date: Mon, 27 Feb 2012 09:53:28 +0900 Subject: [Numpy-discussion] How to modify an array Message-ID: Dear sirs, Please allow me to ask you a beginner's question. I have an nparray whose shape is (144, 91, 1). The elements of this array are integer "0", "1" or "2", but I don't know which of the three integers is assigned to each element. I would like to make a copy of this array, and then replace only the elements whose value is "2" into "0". Could you teach how to make such a modification? 
Sincerely yours, Tetsuro Kikuchi From shish at keba.be Sun Feb 26 20:01:42 2012 From: shish at keba.be (Olivier Delalleau) Date: Sun, 26 Feb 2012 20:01:42 -0500 Subject: [Numpy-discussion] How to modify an array In-Reply-To: References: Message-ID: This should do what you want: array_copy = my_array.copy() array_copy[array_copy == 2] = 0 -=- Olivier Le 26 f?vrier 2012 19:53, a ?crit : > Dear sirs, > > > Please allow me to ask you a beginner's question. > > I have an nparray whose shape is (144, 91, 1). The elements of this array > are integer "0", "1" or "2", but I don't know which of the three integers > is assigned to each element. > I would like to make a copy of this array, and then replace only the > elements whose value is "2" into "0". Could you teach how to make such a > modification? > > > Sincerely yours, > > Tetsuro Kikuchi > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tetsuro_kikuchi at jesc.or.jp Sun Feb 26 21:52:49 2012 From: tetsuro_kikuchi at jesc.or.jp (tetsuro_kikuchi at jesc.or.jp) Date: Mon, 27 Feb 2012 11:52:49 +0900 Subject: [Numpy-discussion] How to modify an array Message-ID: Dear Olivier, Thank you very much for your help. It worked fine! Tetsuro Kikuchi Olivier Delalleau ??: Discussion of Numerical Python ???: cc: numpy-discussion-bounce ??: Re: [Numpy-discussion] How to modify an array s at scipy.org 2012/02/27 10:01 Discussion of Numerical Python ???????? ? This should do what you want: array_copy = my_array.copy() array_copy[array_copy == 2] = 0 -=- Olivier Le 26 f?vrier 2012 19:53, a ?crit : Dear sirs, Please allow me to ask you a beginner's question. I have an nparray whose shape is (144, 91, 1). The elements of this array are integer "0", "1" or "2", but I don't know which of the three integers is assigned to each element. I would like to make a copy of this array, and then replace only the elements whose value is "2" into "0". Could you teach how to make such a modification? Sincerely yours, Tetsuro Kikuchi _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From nathan.faggian at gmail.com Sun Feb 26 23:33:18 2012 From: nathan.faggian at gmail.com (Nathan Faggian) Date: Mon, 27 Feb 2012 15:33:18 +1100 Subject: [Numpy-discussion] numpy ndenumerate Message-ID: Hi, Is there a good reason for ndenumerate in numpy being slower than standard indexing? For example: --- import numpy as np def fast_itt(a): for index, value in np.ndenumerate(a): a[index] += 1 def slow_itt(a): for r in range(0, a.shape[0]): for c in range(0, a.shape[1]): a[r,c] += 1 a = np.zeros((100,100)) %timeit fast_itt(a) 10 loops, best of 3: 25.7 ms per loop %timeit slow_itt(a) 100 loops, best of 3: 13 ms per loop --- I appreciate that there are better ways of operating on arrays but there are many good reasons for permuting through indices and ndenumerate is a nice way of this... I am left wondering why it performs badly in this case. Cheers, Nathan. 
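For what it's worth, a few variants one might time side by side (a sketch only, not a diagnosis of where the ndenumerate overhead actually lies; numbers will vary by machine and numpy version):

import numpy as np
from timeit import timeit

a = np.zeros((100, 100))

def with_ndenumerate(a):
    for idx, _ in np.ndenumerate(a):
        a[idx] += 1

def with_ndindex(a):
    # same index tuples, but without also fetching each value
    for idx in np.ndindex(*a.shape):
        a[idx] += 1

def with_flat(a):
    # index the flat iterator directly; no index tuples are built at all
    for i in range(a.size):
        a.flat[i] += 1

def vectorized(a):
    # the idiomatic form when no per-element Python logic is needed
    a += 1

for fn in (with_ndenumerate, with_ndindex, with_flat, vectorized):
    print("%s: %.4f s" % (fn.__name__, timeit(lambda: fn(a), number=10)))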
From jayvius at gmail.com Mon Feb 27 00:24:25 2012 From: jayvius at gmail.com (Jay Bourque) Date: Mon, 27 Feb 2012 05:24:25 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Possible_roadmap_addendum=3A_buildin?= =?utf-8?q?g_better=09text_file_readers?= References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> Message-ID: Erin Sheldon gmail.com> writes: > > Excerpts from Wes McKinney's message of Sat Feb 25 15:49:37 -0500 2012: > > That may work-- I haven't taken a look at the code but it is probably > > a good starting point. We could create a new repo on the pydata GitHub > > org (http://github.com/pydata) and use that as our point of > > collaboration. I will hopefully be able to put some serious energy > > into this this spring. > > First I want to make sure that we are not duplicating effort of the > person Travis mentioned. > > Logistically, I think it is probably easier to just fork numpy into my > github account and then work it directly into the code base, and ask for > a pull request when things are ready. > > I expect I could have something with all the required features ready in > a week or so. It is mainly just porting the code from C++ to C, and > writing the interfaces by hand instead of with swig; I've got plenty of > experience with that, so it should be straightforward. > > -e Hi Erin, I'm the one Travis mentioned earlier about working on this. I was planning on diving into it this week, but it sounds like you may have some code already that fits the requirements? If so, I would be available to help you with porting/testing your code with numpy, or I can take what you have and build on it in my numpy fork on github. -Jay Bourque Continuum IO From pierre.haessig at crans.org Mon Feb 27 03:31:13 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Mon, 27 Feb 2012 09:31:13 +0100 Subject: [Numpy-discussion] np.longlong casts to int In-Reply-To: <28FA596A-C373-42EB-A2B7-5B29D3154A91@post.harvard.edu> References: <4F460182.5070004@crans.org> <4A41A25D-7CBF-4D5A-A8C6-B2C495F7D302@continuum.io> <69F4FC8C-896E-4401-BC11-374E49CEBF07@continuum.io> <4F468169.4050909@crans.org> <4F469A02.6060205@crans.org> <4F4785EB.3010605@crans.org> <28FA596A-C373-42EB-A2B7-5B29D3154A91@post.harvard.edu> Message-ID: <4F4B3F51.9090500@crans.org> Le 24/02/2012 16:38, Robert Pyle a ?crit : >> I wonder what is the use case of these 80 bits numbers apart from what >> > is described as "keeping intermediate results" when performing >> > exponentiation on doubles ? > In AIFF audio files, the sample rate is stored in the Common Chunk as an 80-bit "extended" floating-point number. The field allocated for this is exactly 80 bits wide (i.e., no padding to 96 or 128). The 1989 Apple document defining AIFF can be found at > > I once wrote my own "save as AIFF" routine and I remember it was a pain to format the 80-bit extended float. That's an interesting use case ! Thanks for sharing ! -- Pierre -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From xscript at gmx.net Mon Feb 27 08:51:48 2012 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Mon, 27 Feb 2012 14:51:48 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330295443-sup-5380@rohan> (Erin Sheldon's message of "Sun, 26 Feb 2012 17:35:00 -0500") References: <1330295443-sup-5380@rohan> Message-ID: <87k4384c5n.fsf@ginnungagap.bsc.es> Erin Sheldon writes: [...] > This was why I essentially wrote my own memmap like interface with > recfile, the code I'm converting. It allows working with columns and > rows without loading large chunks of memory. [...] This sounds like at any point in time you only have one part of the array mapped into the application. My question is then, why would you manually implement the buffering? The OS should already take care of that by unmapping pages when it's short on physical memory, and faulting pages in when you access them. This reminds me of some previous discussion about making the ndarray API more friendly to code that wants to manage the underlying storage, from mmap'ing it to handling compressed storage. Are there any news on that front? Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From jason-sage at creativetrax.com Mon Feb 27 09:19:26 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Mon, 27 Feb 2012 08:19:26 -0600 Subject: [Numpy-discussion] Nature Editorial: Science Software should be Open Source Message-ID: <4F4B90EE.6010306@creativetrax.com> Jan Groenewald posted this link to the Sage development list, and I thought people here would be interested (and I figured people on the matplotlib, scipy, and ipython lists would see it here too): http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html Thanks, Jason From erin.sheldon at gmail.com Mon Feb 27 09:44:52 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Mon, 27 Feb 2012 09:44:52 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> Message-ID: <1330351883-sup-9943@rohan> Excerpts from Jay Bourque's message of Mon Feb 27 00:24:25 -0500 2012: > Hi Erin, > > I'm the one Travis mentioned earlier about working on this. I was planning on > diving into it this week, but it sounds like you may have some code already that > fits the requirements? If so, I would be available to help you with > porting/testing your code with numpy, or I can take what you have and build on > it in my numpy fork on github. Hi Jay,all - What I've got is a solution for writing and reading structured arrays to and from files, both in text files and binary files. It is written in C and python. It allows reading arbitrary subsets of the data efficiently without reading in the whole file. It defines a class Recfile that exposes an array like interface for reading, e.g. x=rf[columns][rows]. Limitations: Because it was designed with arrays in mind, it doesn't deal with not fixed-width string fields. Also, it doesn't deal with quoted strings, as those are not necessary for writing or reading arrays with fixed length strings. Doesn't deal with missing data. This is where Wes' tokenizing-oriented code might be useful. 
So there is a fair amount of functionality to be added for edge cases, but it provides a framework. I think some of this can be written into the C code, others will have to be done at the python level. I've forked numpy on my github account, and should have the code added in a few days. I'll send mail when it is ready. Help will be greatly appreciated getting this to work with loadtxt, adding functionality from Wes' and others code, and testing. Also, because it works on binary files too, I think it might be worth it to make numpy.fromfile a python function, and to use a Recfile object when reading subsets of the data. For example numpy.fromfile(f, rows=rows, columns=columns, dtype=dtype) could instantiate a Recfile object to read the column and row subsets. We could rename the C fromfile to something appropriate, and call it when the whole file is being read (recfile uses it internally when reading ranges). thanks, -e -- Erin Scott Sheldon Brookhaven National Laboratory From pjabardo at yahoo.com.br Mon Feb 27 10:10:38 2012 From: pjabardo at yahoo.com.br (Paulo Jabardo) Date: Mon, 27 Feb 2012 07:10:38 -0800 (PST) Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> Message-ID: <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> I have a few features that I believe would make text file easier for many people. In some countries (most?) the decimal separator in real numbers is not a point but a comma. I think it would be very useful that the decimal separator be specified with a keyword argument (decimal = '.' for example) on the text reading function. There are workarounds such as previously replacing dots with commas, changing the locale (which is usually a messy solution) but it is always very annoying. I often use rpy to call R's functions read.table or scan to read text files. I have been meaning to write "improved" functions to read text files but lately I find it much simpler to use rpy.? Another thing that is very useful is the ability to read a predetermined number of lines from the file. As of right now loadtxt and genfromtxt both read the entire file AFAICT. Paulo ________________________________ De: Jay Bourque Para: numpy-discussion at scipy.org Enviadas: Segunda-feira, 27 de Fevereiro de 2012 2:24 Assunto: Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers Erin Sheldon gmail.com> writes: > > Excerpts from Wes McKinney's message of Sat Feb 25 15:49:37 -0500 2012: > > That may work-- I haven't taken a look at the code but it is probably > > a good starting point. We could create a new repo on the pydata GitHub > > org (http://github.com/pydata) and use that as our point of > > collaboration. I will hopefully be able to put some serious energy > > into this this spring. > > First I want to make sure that we are not duplicating effort of the > person Travis mentioned. > > Logistically, I think it is probably easier to just fork numpy into my > github account and then work it directly into the code base, and ask for > a pull request when things are ready. > > I expect I could have something with all the required features ready in > a week or so.? It is mainly just porting the code from C++ to C, and > writing the interfaces by hand instead of with swig; I've got plenty of > experience with that, so it should be straightforward. > > -e Hi Erin, I'm the one Travis mentioned earlier about working on this. 
I was planning on diving into it this week, but it sounds like you may have some code already that fits the requirements? If so, I would be available to help you with porting/testing your code with numpy, or I can take what you have and build on it in my numpy fork on github. -Jay Bourque Continuum IO _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Mon Feb 27 10:53:52 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 27 Feb 2012 10:53:52 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> Message-ID: <4F4BA710.8070200@gmail.com> On 2/27/2012 10:10 AM, Paulo Jabardo wrote: > I have a few features that I believe would make text file easier for many people. In some countries (most?) the decimal separator in real numbers is not a point but a comma. > I think it would be very useful that the decimal separator be specified with a keyword argument (decimal = '.' for example) on the text reading function. Down that path lies madness. For a fast reader, just document input format to use "international notation" (i.e., the decimal point) and give the user the responsibility to ensure the data are in the right format. The format translation utilities should be separate, and calling them should be optional. fwiw, Alan Isaac From jmccampbell at enthought.com Mon Feb 27 12:03:31 2012 From: jmccampbell at enthought.com (Jason McCampbell) Date: Mon, 27 Feb 2012 11:03:31 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <14348F26-27EB-4313-8F90-D9726DC2144D@continuum.io> <3E9F109C-241F-4B91-832D-CDB09941206B@continuum.io> <5FFF6A19-661E-445A-95F9-B0366C148C0E@continuum.io> Message-ID: > > Sure. This list actually deserves a long writeup about that. First, > there wasn't a "Cython-refactor" of NumPy. There was a Cython-refactor of > SciPy. I'm not sure of it's current status. I'm still very supportive > of that sort of thing. > > > I think I missed that - is it on git somewhere? > > > I thought so, but I can't find it either. We should ask Jason McCampbell > of Enthought where the code is located. Here are the distributed eggs: > http://www.enthought.com/repo/.iron/ > > -Travis > Hi Travis and everyone, just cleaning up email and saw this question. The trees had been in my personal GitHub account prior to Enthought switching over. I forked them now and the paths are: https://github.com/enthought/numpy-refactor https://github.com/enthought/scipy-refactor The numpy code is on the 'refactor' branch. The master branch is dated but consistent (correct commit IDs) with the master NumPy repository on GitHub so the refactor branch should be able to be pushed to the main numpy account if desired. The scipy code was cloned from the subversion repository and so would either need to be moved back to svn or sync'd with any git migration. Jason -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Mon Feb 27 12:07:11 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 27 Feb 2012 17:07:11 +0000 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330351883-sup-9943@rohan> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> Message-ID: On Mon, Feb 27, 2012 at 2:44 PM, Erin Sheldon wrote: > What I've got is a solution for writing and reading structured arrays to > and from files, both in text files and binary files. ?It is written in C > and python. ?It allows reading arbitrary subsets of the data efficiently > without reading in the whole file. ?It defines a class Recfile that > exposes an array like interface for reading, e.g. x=rf[columns][rows]. What format do you use for binary data? Something tiled? I don't understand how you can read in a single column of a standard text or mmap-style binary file any more efficiently than by reading the whole file. -- Nathaniel From pjabardo at yahoo.com.br Mon Feb 27 13:00:09 2012 From: pjabardo at yahoo.com.br (Paulo Jabardo) Date: Mon, 27 Feb 2012 10:00:09 -0800 (PST) Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> Message-ID: <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> I don't know what is the best solution but this certainly isn't madness. First of all '.' isn't international notation it is used in some countries. In most of Europe (and Latin America) the comma is used. Anyone in countries that use a comma as a separator will stumble upon text files with comma as decimal separators very often. Usually a simple search and replace is sufficient but if if the data has string fields, one might mess up the data. Is this the most important feature? Of course not but it helps a lot. As a matter of fact, one of the reasons I started to use R years ago was the flexibility of the function read.table: I don't have to worry about tabular data in text text files, I know I can read them (most of the time...). Now, I use rpy to call read.table. As for speed, right now read.table is faster than loadtxt. Of course numpy shouldn't simply reproduce any feature found in R (or matlab, scilab, etc) but reading data from external sources is a very important step in any data analysis (and often a difficult step). So while this feature is not a top priority it is important for anyone that has to deal with external data written by other programs that use the "correct" locale and it is certainly not in the path to madness. I have been thinking for a while about writing/porting a read.table equivalent but unfortunately I haven't had much time in the past few months and because of that I have kind of stopped my transition from R to python for a while. Paulo ________________________________ De: Alan G Isaac Para: Discussion of Numerical Python Enviadas: Segunda-feira, 27 de Fevereiro de 2012 12:53 Assunto: Re: [Numpy-discussion] Possible roadmap addendum: building better text file readers On 2/27/2012 10:10 AM, Paulo Jabardo wrote: > I have a few features that I believe would make text file easier for many people. In some countries (most?) the decimal separator in real numbers is not a point but a comma. > I think it would be very useful that the decimal separator be specified with a keyword argument (decimal = '.' 
for example) on the text reading function. Down that path lies madness. For a fast reader, just document input format to use "international notation" (i.e., the decimal point) and give the user the responsibility to ensure the data are in the right format. The format translation utilities should be separate, and calling them should be optional. fwiw, Alan Isaac _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From erin.sheldon at gmail.com Mon Feb 27 13:02:41 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Mon, 27 Feb 2012 13:02:41 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> Message-ID: <1330365437-sup-7898@rohan> Excerpts from Nathaniel Smith's message of Mon Feb 27 12:07:11 -0500 2012: > On Mon, Feb 27, 2012 at 2:44 PM, Erin Sheldon wrote: > > What I've got is a solution for writing and reading structured arrays to > > and from files, both in text files and binary files. ?It is written in C > > and python. ?It allows reading arbitrary subsets of the data efficiently > > without reading in the whole file. ?It defines a class Recfile that > > exposes an array like interface for reading, e.g. x=rf[columns][rows]. > > What format do you use for binary data? Something tiled? I don't > understand how you can read in a single column of a standard text or > mmap-style binary file any more efficiently than by reading the whole > file. For binary, I just seek to the appropriate bytes on disk and read them, no mmap. The user must have input an accurate dtype describing rows in the file of course. This saves a lot of memory and time on big files if you just need small subsets. For ascii, the approach is similar except care must be taken when skipping over unread fields and rows. For writing binary, I just tofile() so the bytes correspond directly between array and file. For ascii, I use the appropriate formats for each type. Does this answer your question? -e -- Erin Scott Sheldon Brookhaven National Laboratory From alan.isaac at gmail.com Mon Feb 27 13:07:08 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 27 Feb 2012 13:07:08 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> Message-ID: <4F4BC64C.6060702@gmail.com> On 2/27/2012 1:00 PM, Paulo Jabardo wrote: > First of all '.' isn't international notation That is in fact a standard designation. 
http://en.wikipedia.org/wiki/Decimal_mark#Influence_of_calculators_and_computers Alan Isaac From pav at iki.fi Mon Feb 27 14:28:06 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 27 Feb 2012 20:28:06 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <4F4BC64C.6060702@gmail.com> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> <4F4BC64C.6060702@gmail.com> Message-ID: 27.02.2012 19:07, Alan G Isaac kirjoitti: > On 2/27/2012 1:00 PM, Paulo Jabardo wrote: >> First of all '.' isn't international notation > > That is in fact a standard designation. > http://en.wikipedia.org/wiki/Decimal_mark#Influence_of_calculators_and_computers ISO specifies comma to be used in international standards (ISO/IEC Directives, part 2 / 6.6.8.1): http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download Not that it necessarily is important for this discussion. From alan.isaac at gmail.com Mon Feb 27 14:43:54 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 27 Feb 2012 14:43:54 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> <4F4BC64C.6060702@gmail.com> Message-ID: <4F4BDCFA.5020502@gmail.com> On 2/27/2012 2:28 PM, Pauli Virtanen wrote: > ISO specifies comma to be used in international standards > (ISO/IEC Directives, part 2 / 6.6.8.1): > > http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download I do not think you are right. I think that is a presentational requirement: rules of presentation for documents that are intended to become international standards. Note as well the requirement of spacing to separate digits. Clearly this cannot be a data storage specification. Naturally, the important thing is to agree on a standard data representation. Which one it is is less important, especially if conversion tools will be supplied. But it really is past time for the scientific community to insist on one international standard, and the decimal point has privilege of place because of computing language conventions. (Being the standard in the two largest economies in the world is a different kind of argument in favor of this choice.) Alan Isaac From matthew.brett at gmail.com Mon Feb 27 14:47:55 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 27 Feb 2012 14:47:55 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <4F4BDCFA.5020502@gmail.com> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> <4F4BC64C.6060702@gmail.com> <4F4BDCFA.5020502@gmail.com> Message-ID: Hi, On Mon, Feb 27, 2012 at 2:43 PM, Alan G Isaac wrote: > On 2/27/2012 2:28 PM, Pauli Virtanen wrote: >> ISO specifies comma to be used in international standards >> (ISO/IEC Directives, part 2 / 6.6.8.1): >> >> http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download > > > I do not think you are right. 
> I think that is a presentational requirement: > rules of presentation for documents that > are intended to become international standards. > Note as well the requirement of spacing to > separate digits. Clearly this cannot be a data > storage specification. > > Naturally, the important thing is to agree on a > standard data representation. ?Which one it is > is less important, especially if conversion tools > will be supplied. > > But it really is past time for the scientific community > to insist on one international standard, and the > decimal point has privilege of place because of > computing language conventions. (Being the standard > in the two largest economies in the world is a > different kind of argument in favor of this choice.) Maybe we can just agree it is an important option to have rather than an unimportant one, Best, Matthew From alan.isaac at gmail.com Mon Feb 27 14:56:06 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 27 Feb 2012 14:56:06 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> <4F4BC64C.6060702@gmail.com> <4F4BDCFA.5020502@gmail.com> Message-ID: <4F4BDFD6.6030005@gmail.com> On 2/27/2012 2:47 PM, Matthew Brett wrote: > Maybe we can just agree it is an important option to have rather than > an unimportant one, It depends on what you mean by "option". If you mean there should be conversion tools from other formats to a specified supported format, then I agree. If you mean that the core reader should be cluttered with attempts to handle various and ill-specified formats, so that we end up with the kind of mess that leads people to expect their "CSV file" to be correctly parsed when they are using a non-comma delimiter, then I disagree. Cheers, Alan Isaac From pav at iki.fi Mon Feb 27 14:58:30 2012 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 27 Feb 2012 20:58:30 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <4F4BDCFA.5020502@gmail.com> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> <4F4BC64C.6060702@gmail.com> <4F4BDCFA.5020502@gmail.com> Message-ID: Hi, 27.02.2012 20:43, Alan G Isaac kirjoitti: > On 2/27/2012 2:28 PM, Pauli Virtanen wrote: >> ISO specifies comma to be used in international standards >> (ISO/IEC Directives, part 2 / 6.6.8.1): >> >> http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download > > I do not think you are right. > I think that is a presentational requirement: > rules of presentation for documents that > are intended to become international standards. Yes, it's an requirement for the standard texts themselves, but not what the standard texts specify. Which is why I didn't think it was so relevant (but the wikipedia link just prompted an immediate [citation needed]). I agree that using something else than '.' does not make much sense. 
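For what it is worth, comma-decimal files can already be read with the existing converters argument of loadtxt/genfromtxt. A minimal sketch -- the sample data and the ';' delimiter below are made up:

import numpy as np
from StringIO import StringIO             # Python 2; io.StringIO on Python 3

data = StringIO("1,5;2,25\n3,75;4,0\n")   # ';' delimited, ',' as decimal mark
conv = lambda s: float(s.replace(',', '.'))
arr = np.genfromtxt(data, delimiter=';', converters={0: conv, 1: conv})
# arr -> [[ 1.5   2.25]
#         [ 3.75  4.  ]]
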
-- Pauli Virtanen From matthew.brett at gmail.com Mon Feb 27 15:44:08 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 27 Feb 2012 15:44:08 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> <4F4BC64C.6060702@gmail.com> <4F4BDCFA.5020502@gmail.com> Message-ID: Hi, On Mon, Feb 27, 2012 at 2:58 PM, Pauli Virtanen wrote: > Hi, > > 27.02.2012 20:43, Alan G Isaac kirjoitti: >> On 2/27/2012 2:28 PM, Pauli Virtanen wrote: >>> ISO specifies comma to be used in international standards >>> (ISO/IEC Directives, part 2 / 6.6.8.1): >>> >>> http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download >> >> I do not think you are right. >> I think that is a presentational requirement: >> rules of presentation for documents that >> are intended to become international standards. > > Yes, it's an requirement for the standard texts themselves, but not what > the standard texts specify. Which is why I didn't think it was so > relevant (but the wikipedia link just prompted an immediate [citation > needed]). I agree that using something else than '.' does not make much > sense. I suppose if anyone out there is from a country that uses commas for decimals in CSV files and does not want to have to convert them before reading them will be keen to volunteer to help with the coding. I am certainly glad it is not my own case, Best, Matthew From teoliphant at gmail.com Mon Feb 27 16:10:46 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Mon, 27 Feb 2012 15:10:46 -0600 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330355438.96238.YahooMailNeo@web160601.mail.bf1.yahoo.com> <4F4BA710.8070200@gmail.com> <1330365609.4893.YahooMailNeo@web160606.mail.bf1.yahoo.com> <4F4BC64C.6060702@gmail.com> <4F4BDCFA.5020502@gmail.com> Message-ID: The architecture of this system should separate the iteration across the I/O from the transformation *on* the data. It should also allow the ability to plug-in different transformations at a low-level --- some thought should go into the API of the low-level transformation. Being able to memory-map text files would also be a bonus (but this would require some kind of index to allow seeking through the file). I have some ideas in this direction, but don't have the time to write them up just yet. -Travis On Feb 27, 2012, at 2:44 PM, Matthew Brett wrote: > Hi, > > On Mon, Feb 27, 2012 at 2:58 PM, Pauli Virtanen wrote: >> Hi, >> >> 27.02.2012 20:43, Alan G Isaac kirjoitti: >>> On 2/27/2012 2:28 PM, Pauli Virtanen wrote: >>>> ISO specifies comma to be used in international standards >>>> (ISO/IEC Directives, part 2 / 6.6.8.1): >>>> >>>> http://isotc.iso.org/livelink/livelink?func=ll&objId=10562502&objAction=download >>> >>> I do not think you are right. >>> I think that is a presentational requirement: >>> rules of presentation for documents that >>> are intended to become international standards. >> >> Yes, it's an requirement for the standard texts themselves, but not what >> the standard texts specify. Which is why I didn't think it was so >> relevant (but the wikipedia link just prompted an immediate [citation >> needed]). 
I agree that using something else than '.' does not make much >> sense. > > I suppose if anyone out there is from a country that uses commas for > decimals in CSV files and does not want to have to convert them before > reading them will be keen to volunteer to help with the coding. I am > certainly glad it is not my own case, > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jsseabold at gmail.com Mon Feb 27 17:55:29 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 27 Feb 2012 17:55:29 -0500 Subject: [Numpy-discussion] speed of array creation from tuples Message-ID: I am surprised by this (though maybe I shouldn't be?) It's always faster to use list comprehension to unpack lists of tuples than np.array/asarray? [~/] [1]: X = [tuple(np.random.randint(10,size=2)) for _ in range(100)] [~/] [2]: timeit np.array([x1 for _,x1 in X]) 10000 loops, best of 3: 26.4 us per loop [~/] [3]: timeit np.array(X) 1000 loops, best of 3: 378 us per loop [~/] [4]: timeit x1, x2 = np.array([x for _,x in X]), np.array([y for y,_ in X]) 10000 loops, best of 3: 53.4 us per loop [~/] [5]: timeit x1, x2 = np.array(X).T 1000 loops, best of 3: 384 us per loop Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Mon Feb 27 18:12:45 2012 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 27 Feb 2012 15:12:45 -0800 Subject: [Numpy-discussion] speed of array creation from tuples In-Reply-To: References: Message-ID: On Mon, Feb 27, 2012 at 2:55 PM, Skipper Seabold wrote: > I am surprised by this (though maybe I shouldn't be?) It's always faster to > use list comprehension to unpack lists of tuples than np.array/asarray? > > [~/] > [1]: X = [tuple(np.random.randint(10,size=2)) for _ in > range(100)] > > [~/] > [2]: timeit np.array([x1 for _,x1 in > X]) > 10000 loops, best of 3: 26.4 us per loop > > [~/] > [3]: timeit > np.array(X) > 1000 loops, best of 3: 378 us per loop > > [~/] > [4]: timeit x1, x2 = np.array([x for _,x in X]), np.array([y for y,_ in > X]) > 10000 loops, best of 3: 53.4 us per loop > > [~/] > [5]: timeit x1, x2 = > np.array(X).T > 1000 loops, best of 3: 384 us per loop Here's a similar thread: http://mail.scipy.org/pipermail/numpy-discussion/2010-February/048304.html From ralf.gommers at googlemail.com Tue Feb 28 01:15:25 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 28 Feb 2012 07:15:25 +0100 Subject: [Numpy-discussion] ANN: SciPy 0.10.1 released Message-ID: Hi all, I am pleased to announce the availability of SciPy 0.10.1. This is a maintenance release, with no new features compared to 0.10.0. Sources and binaries can be found at http://sourceforge.net/projects/scipy/files/scipy/0.10.1/, release notes are copied below. Enjoy, The SciPy developers ========================== SciPy 0.10.1 Release Notes ========================== .. contents:: SciPy 0.10.1 is a bug-fix release with no new features compared to 0.10.0. Main changes ------------ The most important changes are:: 1. The single precision routines of ``eigs`` and ``eigsh`` in ``scipy.sparse.linalg`` have been disabled (they internally use double precision now). 2. A compatibility issue related to changes in NumPy macros has been fixed, in order to make scipy 0.10.1 compile with the upcoming numpy 1.7.0 release. 
Other issues fixed ------------------ - #835: stats: nan propagation in stats.distributions - #1202: io: netcdf segfault - #1531: optimize: make curve_fit work with method as callable. - #1560: linalg: fixed mistake in eig_banded documentation. - #1565: ndimage: bug in ndimage.variance - #1457: ndimage: standard_deviation does not work with sequence of indexes - #1562: cluster: segfault in linkage function - #1568: stats: One-sided fisher_exact() returns `p` < 1 for 0 successful attempts - #1575: stats: zscore and zmap handle the axis keyword incorrectly -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.p.krauss at gmail.com Tue Feb 28 09:45:55 2012 From: thomas.p.krauss at gmail.com (Tom K.) Date: Tue, 28 Feb 2012 14:45:55 +0000 (UTC) Subject: [Numpy-discussion] Numpy on App Engine Message-ID: Congrats, numpy is now available on the Google App Engine: http://googleappengine.blogspot.in/2012/02/announcing-general-availability-of.html From robert.kern at gmail.com Tue Feb 28 10:09:00 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 28 Feb 2012 15:09:00 +0000 Subject: [Numpy-discussion] [NumPy-Tickets] [NumPy] #2067: when x is an N-D array, list(x) produces surprising results In-Reply-To: References: <020.be3246a62ad863e76890a74d43a6bc30@scipy.org> <029.54094f42ff9be4619e30f2b31c0a2b9e@scipy.org> Message-ID: Off-Trac comments should probably go to numpy-discussion rather than back to numpy-tickets. I'm not sure why it's not read-only, but it should be. On Tue, Feb 28, 2012 at 01:15, Phillip Feldman wrote: > I'd appreciate a pointer to something in the NumPy reference material > that states that a N-dim ndarray is treated as a collection of > (N-1)-dim ndarrays for purposes of iteration. I'm not sure anything states that about iteration specifically, but it's implicit from the documented behavior when given a single integer index: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#single-element-indexing > Phillip > > On Mon, Feb 27, 2012 at 11:45 AM, ? wrote: >> #2067: when x is an N-D array, list(x) produces surprising results >> ------------------------+--------------------------------------------------- >> ?Reporter: ?pfeldman ? ?| ? ? ? Owner: ?somebody >> ? ? Type: ?defect ? ? ?| ? ? ?Status: ?new >> ?Priority: ?normal ? ? ?| ? Milestone: ?Unscheduled >> Component: ?numpy.core ?| ? ? Version: ?1.6.1 >> ?Keywords: ? ? ? ? ? ? ?| >> ------------------------+--------------------------------------------------- >> >> Comment(by pv): >> >> ?The behavior does make sense, and it is what would naturally be expected. >> ?From the point of view of iteration, a N-dim ndarray *is* a collection of >> ?(N-1)-dim sub-arrays. The list constructor merely retains the consistency >> ?between `list(a)[j]` and `list(a[j])`. I don't see any reason to change >> ?this behavior, as raveling is a separate operation. 
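A tiny illustration of the behaviour in question, for a plain 2-D array:

import numpy as np

a = np.arange(6).reshape(2, 3)
list(a)       # [array([0, 1, 2]), array([3, 4, 5])] -- the 1-D sub-arrays
list(a)[1]    # array([3, 4, 5]), the same as a[1]
a.ravel()     # array([0, 1, 2, 3, 4, 5]) -- flattening is a separate step
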
>> >> -- >> Ticket URL: >> NumPy >> My example project > _______________________________________________ > NumPy-Tickets mailing list > NumPy-Tickets at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-tickets -- Robert Kern From jdh2358 at gmail.com Tue Feb 28 14:05:49 2012 From: jdh2358 at gmail.com (John Hunter) Date: Tue, 28 Feb 2012 13:05:49 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau wrote: > > There are better languages than C++ that has most of the technical > benefits stated in this discussion (rust and D being the most > "obvious" ones), but whose usage is unrealistic today for various > reasons: knowledge, availability on "esoteric" platforms, etc? A new > language is completely ridiculous. I just saw this for the first time today: Linus Torvalds on C++ ( http://harmful.cat-v.org/software/c++/linus). The post is from 2007 so many of you may have seen it, but I thought it was entertainng enough and on-topic enough with this thread that I'd share it in case you haven't. The point he makes: In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C was interesting to me because the best C++ library I have ever worked with (agg) imports *nothing* except standard C libs (no standard template library). In fact, the only includes external to external to itself are math.h, stdlib.h, stdio.h, and string.h. To shoehorn Jamie Zawinski's famous regex quote ( http://regex.info/blog/2006-09-15/247). "Some people, when confronted with a problem, think ?I know, I'll use boost.? Now they have two problems." Here is the Linus post: From: Linus Torvalds linux-foundation.org> Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String Library. Newsgroups: gmane.comp.version-control.git Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36 minutes ago) On Wed, 5 Sep 2007, Dmitry Kakurin wrote: > > When I first looked at Git source code two things struck me as odd: > 1. Pure C as opposed to C++. No idea why. Please don't talk about portability, > it's BS. *YOU* are full of bullshit. C++ is a horrible language. It's made more horrible by the fact that a lot of substandard programmers use it, to the point where it's much much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do *nothing* but keep the C++ programmers out, that in itself would be a huge reason to use C. In other words: the choice of C is the only sane choice. I know Miles Bader jokingly said "to piss you off", but it's actually true. I've come to the conclusion that any programmer that would prefer the project to be in C++ over C is likely a programmer that I really *would* prefer to piss off, so that he doesn't come and screw up any project I'm involved with. C++ leads to really really bad design choices. 
You invariably start using the "nice" library features of the language like STL and Boost and other total and utter crap, that may "help" you program, but causes: - infinite amounts of pain when they don't work (and anybody who tells me that STL and especially Boost are stable and portable is just so full of BS that it's not even funny) - inefficient abstracted programming models where two years down the road you notice that some abstraction wasn't very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app. In other words, the only way to do good, efficient, and system-level and portable C++ ends up to limit yourself to all the things that are basically available in C. And limiting your project to C means that people don't screw that up, and also means that you get a lot of programmers that do actually understand low-level issues and don't screw things up with any idiotic "object model" crap. So I'm sorry, but for something like git, where efficiency was a primary objective, the "advantages" of C++ is just a huge mistake. The fact that we also piss off people who cannot see that is just a big additional advantage. If you want a VCS that is written in C++, go play with Monotone. Really. They use a "real database". They use "nice object-oriented libraries". They use "nice C++ abstractions". And quite frankly, as a result of all these design decisions that sound so appealing to some CS people, the end result is a horrible and unmaintainable mess. But I'm sure you'd like it more than git. Linus -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Tue Feb 28 14:12:31 2012 From: teoliphant at gmail.com (Travis Oliphant) Date: Tue, 28 Feb 2012 13:12:31 -0600 Subject: [Numpy-discussion] YouTrack License Message-ID: I just received word that NumPy has a license to use TeamCity and YouTrack for NumPy development. YouTrack is a really nice issue tracker: http://www.jetbrains.com/youtrack/ TeamCity is a really nice Continuous Integration system: http://www.jetbrains.com/teamcity/ I'm planning to set this up to try out. Is anyone interested in helping out? Thanks, -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Feb 28 15:13:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 28 Feb 2012 13:13:09 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: On Tue, Feb 28, 2012 at 12:05 PM, John Hunter wrote: > On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau wrote: > >> >> There are better languages than C++ that has most of the technical >> >> benefits stated in this discussion (rust and D being the most >> "obvious" ones), but whose usage is unrealistic today for various >> reasons: knowledge, availability on "esoteric" platforms, etc? A new >> language is completely ridiculous. >> > > > I just saw this for the first time today: Linus Torvalds on C++ ( > http://harmful.cat-v.org/software/c++/linus). The post is from 2007 so > many of you may have seen it, but I thought it was entertainng enough and > on-topic enough with this thread that I'd share it in case you haven't. 
> > > The point he makes: > > In other words, the only way to do good, efficient, and system-level and > portable C++ ends up to limit yourself to all the things that > are basically > available in C > > was interesting to me because the best C++ library I have ever worked with > (agg) imports *nothing* except standard C libs (no standard template > library). In fact, the only includes external to external to itself > are math.h, stdlib.h, stdio.h, and string.h. > > To shoehorn Jamie Zawinski's famous regex quote ( > http://regex.info/blog/2006-09-15/247). "Some people, when confronted > with a problem, think ?I know, I'll use boost.? Now they have two > problems." > > Here is the Linus post: > > From: Linus Torvalds linux-foundation.org> > Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String > Library. > Newsgroups: gmane.comp.version-control.git > Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36 minutes > ago) > > On Wed, 5 Sep 2007, Dmitry Kakurin wrote: > > > > When I first looked at Git source code two things struck me as odd: > > 1. Pure C as opposed to C++. No idea why. Please don't talk about > portability, > > it's BS. > > *YOU* are full of bullshit. > > C++ is a horrible language. It's made more horrible by the fact that a lot > of substandard programmers use it, to the point where it's much much > easier to generate total and utter crap with it. Quite frankly, even if > the choice of C were to do *nothing* but keep the C++ programmers out, > that in itself would be a huge reason to use C. > > In other words: the choice of C is the only sane choice. I know Miles > Bader jokingly said "to piss you off", but it's actually true. I've come > to the conclusion that any programmer that would prefer the project to be > in C++ over C is likely a programmer that I really *would* prefer to piss > off, so that he doesn't come and screw up any project I'm involved with. > > C++ leads to really really bad design choices. You invariably start using > the "nice" library features of the language like STL and Boost and other > total and utter crap, that may "help" you program, but causes: > > - infinite amounts of pain when they don't work (and anybody who tells me > that STL and especially Boost are stable and portable is just so full > of BS that it's not even funny) > > - inefficient abstracted programming models where two years down the road > you notice that some abstraction wasn't very efficient, but now all > your code depends on all the nice object models around it, and you > cannot fix it without rewriting your app. > > In other words, the only way to do good, efficient, and system-level and > portable C++ ends up to limit yourself to all the things that are > basically available in C. And limiting your project to C means that people > don't screw that up, and also means that you get a lot of programmers that > do actually understand low-level issues and don't screw things up with any > idiotic "object model" crap. > > So I'm sorry, but for something like git, where efficiency was a primary > objective, the "advantages" of C++ is just a huge mistake. The fact that > we also piss off people who cannot see that is just a big additional > advantage. > > If you want a VCS that is written in C++, go play with Monotone. Really. > They use a "real database". They use "nice object-oriented libraries". > They use "nice C++ abstractions". 
And quite frankly, as a result of all > these design decisions that sound so appealing to some CS people, the end > result is a horrible and unmaintainable mess. > > But I'm sure you'd like it more than git. > > Yeah, Linus doesn't like C++. No doubt that is in part because of the attempt to rewrite Linux in C++ back in the early 90's and the resulting compiler and portability problems. Linus also writes C like it was his native tongue, he likes to work close to the metal, and he'd probably prefer it over Python for most problems ;) Things have improved in the compiler department, and I think C++ really wasn't much of an improvement over C until templates and the STL came along. The boost smart pointers are also really nice. OTOH, it is really easy to write awful C++ because of the way inheritance and the other features were over-hyped and the 'everything and the kitchen sink' way it developed. Like any tool, familiarity and skill are essential to good results, but unlike some tools, one also needs to forgo some of the features to keep it under control. It's not a hammer, it is a three inch wide Swiss Army Knife. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ceball at gmail.com Tue Feb 28 15:56:33 2012 From: ceball at gmail.com (Chris Ball) Date: Tue, 28 Feb 2012 20:56:33 +0000 (UTC) Subject: [Numpy-discussion] YouTrack License References: Message-ID: Travis Oliphant gmail.com> writes: > > I just received word that NumPy has a ?license to use TeamCity and YouTrack for NumPy development.? > YouTrack is a really nice issue tracker: ??http://www.jetbrains.com/youtrack/ > > > TeamCity is a really nice Continuous Integration system: ?http:// www.jetbrains.com/teamcity/ ShiningPanda's hosted Jenkins service was mentioned in a previous thread ("Issue Tracking"; http://thread.gmane.org/ gmane.comp.python.numeric.general/47965), along with the limitation that it supports only Debian 6. I emailed ShiningPanda asking for some more information about using the service with a bigger range of hardware/platforms, in case it helps for comparison with TeamCity. From the ShiningPanda team (with permission): " It is possible to install a Jenkins server on external hardware, run the tests there, and bring back the results on the hosted instance of ShiningPanda. It is very easy to do using this plugin: https://wiki.jenkins-ci.org/display/JENKINS/ Build+Publisher+Plugin. [On the external hardware you] will need to install a Jenkins server, it's not much work. Besides Jenkins is quite self-contained so you could easily pre- configure one and distribute it. But it will need a JVM, which could be a problem. It is not in Shining Panda documentation yet. [...] The current gentleman agreement for this feature is: if you cannot build it on ShiningPanda, then you can build it outside. It will most certainly make it to our ToS at some point. We are adding support for Windows 7: it should be available next month. At first it will be a free beta, then it will become a paid option: around $0.60 ~ $0.70 per hour of build. MacOS X should follow later this year. " Chris From ralf.gommers at googlemail.com Tue Feb 28 16:11:50 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 28 Feb 2012 22:11:50 +0100 Subject: [Numpy-discussion] YouTrack License In-Reply-To: References: Message-ID: On Tue, Feb 28, 2012 at 8:12 PM, Travis Oliphant wrote: > I just received word that NumPy has a license to use TeamCity and > YouTrack for NumPy development. 
> > YouTrack is a really nice issue tracker: > http://www.jetbrains.com/youtrack/ > > TeamCity is a really nice Continuous Integration system: > http://www.jetbrains.com/teamcity/ > > I'm planning to set this up to try out. Is anyone interested in helping > out? > I'm certainly interested to try it out. I can help with the converter script too if needed. Looks pretty simple, there's something to start from at http://www.jetbrains.com/youtrack/features/import.html. How about just putting it in a new github repo so everyone can see/review the mapping between Trac and YouTrack fields? We should probably create a basic export from YouTrack script at the same time, to make sure there's no lock-in. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Tue Feb 28 16:34:08 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 28 Feb 2012 13:34:08 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> Message-ID: <4F4D4850.9030508@astro.uio.no> On 02/28/2012 11:05 AM, John Hunter wrote: > On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau > wrote: > > > There are better languages than C++ that has most of the technical > benefits stated in this discussion (rust and D being the most > "obvious" ones), but whose usage is unrealistic today for various > reasons: knowledge, availability on "esoteric" platforms, etc? A new > language is completely ridiculous. > > > > I just saw this for the first time today: Linus Torvalds on C++ > (http://harmful.cat-v.org/software/c++/linus). The post is from 2007 so > many of you may have seen it, but I thought it was entertainng enough > and on-topic enough with this thread that I'd share it in case you haven't. > > > The point he makes: > > In other words, the only way to do good, efficient, and system-level and > portable C++ ends up to limit yourself to all the things that > are basically > available in C > > was interesting to me because the best C++ library I have ever worked > with (agg) imports *nothing* except standard C libs (no standard > template library). In fact, the only includes external to external to > itself are math.h, stdlib.h, stdio.h, and string.h. > > To shoehorn Jamie Zawinski's famous regex quote > (http://regex.info/blog/2006-09-15/247). "Some people, when confronted > with a problem, think ?I know, I'll use boost.? Now they have two > problems." In the same vein, this one neatly sums up all the bad sides of C++. (I don't really want to enter the language discussion. But this list is a nice list of the cons, and perhaps that can save discussion time because people don't have to enumerate those reasons again on this list?) http://yosefk.com/c++fqa/defective.html Dag > > Here is the Linus post: > > From: Linus Torvalds linux-foundation.org > > > Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String > Library. > Newsgroups: gmane.comp.version-control.git > Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36 > minutes ago) > > On Wed, 5 Sep 2007, Dmitry Kakurin wrote: > > > > When I first looked at Git source code two things struck me as odd: > > 1. Pure C as opposed to C++. No idea why. Please don't talk about > portability, > > it's BS. > > *YOU* are full of bullshit. > > C++ is a horrible language. 
It's made more horrible by the fact that a lot > of substandard programmers use it, to the point where it's much much > easier to generate total and utter crap with it. Quite frankly, even if > the choice of C were to do *nothing* but keep the C++ programmers out, > that in itself would be a huge reason to use C. > > In other words: the choice of C is the only sane choice. I know Miles > Bader jokingly said "to piss you off", but it's actually true. I've come > to the conclusion that any programmer that would prefer the project to be > in C++ over C is likely a programmer that I really *would* prefer to piss > off, so that he doesn't come and screw up any project I'm involved with. > > C++ leads to really really bad design choices. You invariably start using > the "nice" library features of the language like STL and Boost and other > total and utter crap, that may "help" you program, but causes: > > - infinite amounts of pain when they don't work (and anybody who tells me > that STL and especially Boost are stable and portable is just so full > of BS that it's not even funny) > > - inefficient abstracted programming models where two years down the road > you notice that some abstraction wasn't very efficient, but now all > your code depends on all the nice object models around it, and you > cannot fix it without rewriting your app. > > In other words, the only way to do good, efficient, and system-level and > portable C++ ends up to limit yourself to all the things that are > basically available in C. And limiting your project to C means that people > don't screw that up, and also means that you get a lot of programmers that > do actually understand low-level issues and don't screw things up with any > idiotic "object model" crap. > > So I'm sorry, but for something like git, where efficiency was a primary > objective, the "advantages" of C++ is just a huge mistake. The fact that > we also piss off people who cannot see that is just a big additional > advantage. > > If you want a VCS that is written in C++, go play with Monotone. Really. > They use a "real database". They use "nice object-oriented libraries". > They use "nice C++ abstractions". And quite frankly, as a result of all > these design decisions that sound so appealing to some CS people, the end > result is a horrible and unmaintainable mess. > > But I'm sure you'd like it more than git. > > Linus > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue Feb 28 16:52:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 28 Feb 2012 14:52:56 -0700 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F4D4850.9030508@astro.uio.no> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F402B46.4020103@molden.no> <4F4D4850.9030508@astro.uio.no> Message-ID: On Tue, Feb 28, 2012 at 2:34 PM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > On 02/28/2012 11:05 AM, John Hunter wrote: > > On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau > > wrote: > > > > > > There are better languages than C++ that has most of the technical > > benefits stated in this discussion (rust and D being the most > > "obvious" ones), but whose usage is unrealistic today for various > > reasons: knowledge, availability on "esoteric" platforms, etc? A new > > language is completely ridiculous. 
> > > > > > > > I just saw this for the first time today: Linus Torvalds on C++ > > (http://harmful.cat-v.org/software/c++/linus). The post is from 2007 so > > many of you may have seen it, but I thought it was entertainng enough > > and on-topic enough with this thread that I'd share it in case you > haven't. > > > > > > The point he makes: > > > > In other words, the only way to do good, efficient, and system-level > and > > portable C++ ends up to limit yourself to all the things that > > are basically > > available in C > > > > was interesting to me because the best C++ library I have ever worked > > with (agg) imports *nothing* except standard C libs (no standard > > template library). In fact, the only includes external to external to > > itself are math.h, stdlib.h, stdio.h, and string.h. > > > > To shoehorn Jamie Zawinski's famous regex quote > > (http://regex.info/blog/2006-09-15/247). "Some people, when confronted > > with a problem, think ?I know, I'll use boost.? Now they have two > > problems." > > > In the same vein, this one neatly sums up all the bad sides of C++. > > (I don't really want to enter the language discussion. But this list is > a nice list of the cons, and perhaps that can save discussion time > because people don't have to enumerate those reasons again on this list?) > > http://yosefk.com/c++fqa/defective.html > > Heh, I was hoping for something good, but that was kinda unfair. OK, so C++ isn't JAVA or C# or Python, no garbage collection or introspection or whatever, but so what. Destructors are called as the exception unwinds up the call stack, etc. That list is sort of the opposite end of the critical spectrum from Linus (C++ does too much) and is more like a complaint that C++ doesn't walk the dog. Can't satisfy everyone ;) Chuck. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Feb 28 17:00:05 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 28 Feb 2012 23:00:05 +0100 Subject: [Numpy-discussion] Moving to gcc 4.* for win32 installers ? In-Reply-To: References: Message-ID: On Mon, Feb 6, 2012 at 10:54 PM, David Cournapeau wrote: > On Sat, Feb 4, 2012 at 3:55 PM, Ralf Gommers > wrote: > > > > > > On Wed, Dec 14, 2011 at 6:50 PM, Ralf Gommers < > ralf.gommers at googlemail.com> > > wrote: > >> > >> > >> > >> On Wed, Dec 14, 2011 at 3:04 PM, David Cournapeau > >> wrote: > >>> > >>> On Tue, Dec 13, 2011 at 3:43 PM, Ralf Gommers > >>> wrote: > >>> > On Sun, Oct 30, 2011 at 12:18 PM, David Cournapeau < > cournape at gmail.com> > >>> > wrote: > >>> >> > >>> >> On Thu, Oct 27, 2011 at 5:19 PM, Ralf Gommers > >>> >> wrote: > >>> >> > Hi David, > >>> >> > > >>> >> > On Thu, Oct 27, 2011 at 3:02 PM, David Cournapeau > >>> >> > > >>> >> > wrote: > >>> >> >> > >>> >> >> Hi, > >>> >> >> > >>> >> >> I was wondering if we could finally move to a more recent version > >>> >> >> of > >>> >> >> compilers for official win32 installers. This would of course > >>> >> >> concern > >>> >> >> the next release cycle, not the ones where beta/rc are already in > >>> >> >> progress. > >>> >> >> > >>> >> >> Basically, the pros: > >>> >> >> - we will have to move at some point > >>> >> >> - gcc 4.* seem less buggy, especially C++ and fortran. 
> >>> >> >> - no need to maintain msvcr90 vodoo > >>> >> >> The cons: > >>> >> >> - it will most likely break the ABI > >>> >> >> - we need to recompile atlas (but I can take care of it) > >>> >> >> - the biggest: it is difficult to combine gfortran with visual > >>> >> >> studio (more exactly you cannot link gfortran runtime to a visual > >>> >> >> studio executable). The only solution I could think of would be > to > >>> >> >> recompile the gfortran runtime with Visual Studio, which for some > >>> >> >> reason does not sound very appealing :) > >>> >> > > >>> >> > To get the datetime changes to work with MinGW, we already > concluded > >>> >> > that > >>> >> > building with 4.x is more or less required (without recognizing > some > >>> >> > of > >>> >> > the > >>> >> > points you list above). Changes to mingw32ccompiler to fix > >>> >> > compilation > >>> >> > with > >>> >> > 4.x went in in https://github.com/numpy/numpy/pull/156. It would > be > >>> >> > good > >>> >> > if > >>> >> > you could check those. > >>> >> > >>> >> I will look into it more carefully, but overall, it seems that > >>> >> building atlas 3.8.4, numpy and scipy with gcc 4.x works quite well. > >>> >> The main issue is that gcc 4.* adds some dependencies on mingw dlls. > >>> >> There are two options: > >>> >> - adding the dlls in the installers > >>> >> - statically linking those, which seems to be a bad idea > >>> >> (generalizing the dll boundaries problem to exception and things we > >>> >> would rather not care about: > >>> >> http://cygwin.com/ml/cygwin/2007-06/msg00332.html). > >>> >> > >>> >> > It probably makes sense make this move for numpy 1.7. If this > breaks > >>> >> > the > >>> >> > ABI > >>> >> > then it would be easiest to make numpy 1.7 the minimum required > >>> >> > version > >>> >> > for > >>> >> > scipy 0.11. > >>> >> > >>> >> My thinking as well. > >>> >> > >>> > > >>> > Hi David, what is the current status of this issue? I kind of forgot > >>> > this is > >>> > a prerequisite for the next release when starting the 1.7.0 release > >>> > thread. > >>> > >>> The only issue at this point is the distribution of mingw dlls. I have > >>> not found a way to do it nicely (where nicely means something that is > >>> distributed within numpy package). Given that those dlls are actually > >>> versioned and seem to have a strong versioning policy, maybe we can > >>> just install them inside the python installation ? > >>> > >> Although not ideal, I don't have a problem with that in principle. > >> However, wouldn't it break installing without admin rights if Python is > >> installed by the admin? > > > > > > David, do you have any more thoughts on this? Is there a final solution > in > > sight? Anything I or anyone else can do to help? > > I have not found a way to do it without installing the dll alongside > python libraries. That brings the problem of how to install libraries > there from bdist_wininst/bdist_msi installers, which I had not the > time to look at. > Could you push your changes to github? The 3.4.5 installers are becoming very hard to even locate, so I'd like to try them out. Han reported at https://github.com/numpy/numpy/pull/214 that current master crashes with MinGW 4.x. Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Tue Feb 28 17:16:42 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 28 Feb 2012 23:16:42 +0100 Subject: [Numpy-discussion] Problem Building Numpy with Python 2.7.1 and OS X 10.7.3 In-Reply-To: References: <440B19D3-1A26-4EE8-AC99-69DFCF93CBC9@uvic.ca> Message-ID: On Mon, Feb 27, 2012 at 1:47 AM, Patrick Armstrong wrote: > Hi, > > On 2012-02-25, at 5:14 AM, Ralf Gommers wrote: > > Since you're using pip, I assume that gcc-4.2 is llvm-gcc. As a first > step, I suggest using plain gcc and not using pip (so just "python setup.py > install"). Also make sure you follow the recommendations in "version > specific notes" at http://scipy.org/Installing_SciPy/Mac_OS_X. > > > As is mentioned earlier in the thread, Xcode doesn't distribute plain gcc > anymore. I've tried building with llvm-gcc, and with clang, and I get the > same failed result. > > To be sure, I tried building with a plain "python setup.py install", and I > get the same result. Here' my build log with clang: > https://gist.github.com/1920128 > Just tried on OS X 10.6 with clang, that works fine. No idea why 10.7 would be different. $ export CC=clang $ python setup.py build_ext -i Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Feb 28 17:22:16 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 Feb 2012 22:22:16 +0000 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330387831-sup-839@rohan> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> <1330365437-sup-7898@rohan> <1330387831-sup-839@rohan> Message-ID: [Re-adding the list to the To: field, after it got dropped accidentally] On Tue, Feb 28, 2012 at 12:28 AM, Erin Sheldon wrote: > Excerpts from Nathaniel Smith's message of Mon Feb 27 17:33:52 -0500 2012: >> On Mon, Feb 27, 2012 at 6:02 PM, Erin Sheldon wrote: >> > Excerpts from Nathaniel Smith's message of Mon Feb 27 12:07:11 -0500 2012: >> >> On Mon, Feb 27, 2012 at 2:44 PM, Erin Sheldon wrote: >> >> > What I've got is a solution for writing and reading structured arrays to >> >> > and from files, both in text files and binary files. ?It is written in C >> >> > and python. ?It allows reading arbitrary subsets of the data efficiently >> >> > without reading in the whole file. ?It defines a class Recfile that >> >> > exposes an array like interface for reading, e.g. x=rf[columns][rows]. >> >> >> >> What format do you use for binary data? Something tiled? I don't >> >> understand how you can read in a single column of a standard text or >> >> mmap-style binary file any more efficiently than by reading the whole >> >> file. >> > >> > For binary, I just seek to the appropriate bytes on disk and read them, >> > no mmap. ?The user must have input an accurate dtype describing rows in >> > the file of course. ?This saves a lot of memory and time on big files if >> > you just need small subsets. >> >> Have you quantified the time savings? I'd expect this to either be the >> same speed or slower than reading the entire file. > > Nathaniel - > > Yes I've verified it, but as you point out below there are pathological > cases. ? ?See below. > >> The reason is that usually the OS cannot read just a few bytes from a >> middle of a file -- if it is reading at all, it will read at least a >> page (4K on linux). 
If your rows are less than 4K in size, then >> reading a little bit out of each row means that you will be loading >> the entire file from disk regardless. You avoid any parsing overhead >> for the skipped columns, but for binary files that should be zero. >> (Even if you're doing endian conversion or something it should still >> be trivial compared to disk speed.) > > I'll say up front, the speed gains for binary data are often huge over > even numpy.memmap because memmap is not column aware. ?My code doesn't > have that limitation. Hi Erin, I don't doubt your observations, but... there must be something more going on! In a modern VM implementation, what happens when you request to read from an arbitrary offset in the file is: 1) The OS works out which disk page (or set of pages, for a longer read) contains the given offset 2) It reads those pages from the disk, and loads them into some OS owned buffers (the "page cache") 3) It copies the relevant bytes out of the page cache into the buffer passed to read() And when you mmap() and then attempt to access some memory at an arbitrary offset within the mmap region, what happens is: 1) The processor notices that it doesn't actually know how the memory address given maps to real memory (a tlb miss), so it asks the OS 2) The OS notices that this is a memory-mapped region, and works out which disk page maps to the given memory address 3) It reads that page from disk, and loads it into some OS owned buffers (the "page cache") 4) It tells the processor That is, reading at a bunch of fixed offsets inside a large memory mapped array (which is what numpy does when you request a single column of a recarray) should end up issuing *exactly the same read commands* as writing code that explicitly seeks to those addresses and reads them. But, I realized I've never actually tested this myself, so I wrote a little test (attached). It reads a bunch of uint32's at equally-spaced offsets from a large file, using either mmap, explicit seeks, or the naive read-everything approach. I'm finding it very hard to get precise results, because I don't have a spare drive and anything that touches the disk really disrupts the timing here (and apparently Ubuntu no longer has a real single-user mode :-(), but here are some examples on a 200,000,000 byte file with different simulated row sizes: 1024 byte rows: Mode: MMAP. Checksum: bdd205e9. Time: 3.44 s Mode: SEEK. Checksum: bdd205e9. Time: 3.34 s Mode: READALL. Checksum: bdd205e9. Time: 3.53 s Mode: MMAP. Checksum: bdd205e9. Time: 3.39 s Mode: SEEK. Checksum: bdd205e9. Time: 3.30 s Mode: READALL. Checksum: bdd205e9. Time: 3.17 s Mode: MMAP. Checksum: bdd205e9. Time: 3.16 s Mode: SEEK. Checksum: bdd205e9. Time: 3.41 s Mode: READALL. Checksum: bdd205e9. Time: 3.43 s 65536 byte rows (64 KiB): Mode: MMAP. Checksum: da4f9d8d. Time: 3.25 s Mode: SEEK. Checksum: da4f9d8d. Time: 3.27 s Mode: READALL. Checksum: da4f9d8d. Time: 3.16 s Mode: MMAP. Checksum: da4f9d8d. Time: 3.34 s Mode: SEEK. Checksum: da4f9d8d. Time: 3.36 s Mode: READALL. Checksum: da4f9d8d. Time: 3.44 s Mode: MMAP. Checksum: da4f9d8d. Time: 3.18 s Mode: SEEK. Checksum: da4f9d8d. Time: 3.19 s Mode: READALL. Checksum: da4f9d8d. Time: 3.16 s 1048576 byte rows (1 MiB): Mode: MMAP. Checksum: 22963df9. Time: 1.57 s Mode: SEEK. Checksum: 22963df9. Time: 1.44 s Mode: READALL. Checksum: 22963df9. Time: 3.13 s Mode: MMAP. Checksum: 22963df9. Time: 1.59 s Mode: SEEK. Checksum: 22963df9. Time: 1.43 s Mode: READALL. Checksum: 22963df9. Time: 3.16 s Mode: MMAP. Checksum: 22963df9. 
Time: 1.55 s Mode: SEEK. Checksum: 22963df9. Time: 1.66 s Mode: READALL. Checksum: 22963df9. Time: 3.15 s And for comparison: In [32]: a = np.memmap("src/bigfile", np.uint32, "r") In [33]: time hex(np.sum(a[::1048576//4][:-1], dtype=a.dtype)) CPU times: user 0.00 s, sys: 0.01 s, total: 0.01 s Wall time: 1.54 s Out[34]: '0x22963df9L' (Ubuntu Linux 2.6.38-13, traditional spinning-disk media) So, in this test: For small rows: seeking is irrelevant, reading everything is just as fast. (And the cutoff for "small" is not very small... I tried 512KiB too and it looked like 32KiB). For large rows: seeking is faster than reading everything. But mmap, explicit seeks, and np.memmap all act the same. I guess it's possible the difference you're seeing could just mean that, like, Windows has a terrible VM subsystem, but that would be weird. > In the ascii case the gains for speed are less > for the reasons you point out; you have to read through the data even to > skip rows and fields. ?Then it is really about memory. > > > Even for binary, there are pathological cases, e.g. 1) reading a random > subset of nearly all rows. ?2) reading a single column when rows are > small. ?In case 2 you will only go this route in the first place if you > need to save memory. ?The user should be aware of these issues. FWIW, this route actually doesn't save any memory as compared to np.memmap. > I wrote this code to deal with a typical use case of mine where small > subsets of binary or ascii data are begin read from a huge file. ?It > solves that problem. Cool. >> If your rows are greater than 4K in size, then seeking will allow you >> to avoid loading some parts of the file from disk... but it also might >> defeat the OS's readahead heuristics, which means that instead of >> doing a streaming read, you're might be doing a disk seek for every >> row. On an SSD, this is fine, and probably a nice win. On a >> traditional spinning-disk, losing readahead will cause a huge >> slowdown; you should only win if each row is like... a megabyte >> apiece. Seeks are much much slower than continuous reads. So I'm >> surprised if this actually helps! But that's just theory, so I am >> curious to see the actual numbers. >> >> Re: memory savings -- it's definitely a win to avoid allocating the >> whole array if you just want to read one column, but you can >> accomplish that without any particular cleverness in the low-level >> file reading code. You just read the first N rows into a fixed-size >> temp buffer, copy out the relevant column into the output array, >> repeat. > > Certainly this could also be done that way, and would be equally good > for some cases. > > I've already got all the C code to do this stuff, so it not much work > for me to hack it into my numpy fork. ?If it turns out there are cases > that are faster using another method, we should root them out during > testing and add the appropriate logic. Cool. I'm just a little concerned that, since we seem to have like... 5 different implementations of this stuff all being worked on at the same time, we need to get some consensus on which features actually matter, so they can be melded together into the Single Best File Reader Evar. An interface where indexing and file-reading are combined is significantly more complicated than one where the core file-reading inner-loop can ignore indexing. So far I'm not sure why this complexity would be worthwhile, so that's what I'm trying to understand. 
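(For concreteness, a rough sketch of the fixed-size temp buffer approach
described above -- read whole records a chunk at a time and copy out just
the wanted field. The function name and chunk size are made up for
illustration; this is not code from any of the readers under discussion:)

    import os
    import numpy as np

    def read_column(fname, dtype, colname, chunksize=100000):
        # Peak memory is bounded by the chunk, not by the file: each pass
        # reads 'chunksize' whole records and keeps only one field.
        dtype = np.dtype(dtype)
        nrows = os.path.getsize(fname) // dtype.itemsize
        out = np.empty(nrows, dtype=dtype[colname])
        with open(fname, 'rb') as f:
            start = 0
            while start < nrows:
                n = min(chunksize, nrows - start)
                chunk = np.fromfile(f, dtype=dtype, count=n)
                out[start:start + n] = chunk[colname]
                start += n
        return out

The disk still streams through every record, as discussed, but only one
column's worth of output ever gets allocated.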
Cheers, -- Nathaniel > Also, for some crazy ascii files we may want to revert to pure python > anyway, but I think these should be special cases that can be flagged > at runtime through keyword arguments to the python functions. > > BTW, did you mean to go off-list? > > cheers, > > -e > -- > Erin Scott Sheldon > Brookhaven National Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: read-by.c Type: text/x-csrc Size: 3526 bytes Desc: not available URL: From kwmsmith at gmail.com Tue Feb 28 18:01:12 2012 From: kwmsmith at gmail.com (Kurt Smith) Date: Tue, 28 Feb 2012 17:01:12 -0600 Subject: [Numpy-discussion] Determining the 'viewness' of an array, and flags.owndata confusion Message-ID: For an arbitrary numpy array 'a', what does 'a.flags.owndata' indicate? I originally thought that owndata is False iff 'a' is a view. But that is incorrect. Consider the following: In [119]: a = np.zeros((3,3)) In [120]: a.flags.owndata # should be True; zeros() creates and returns a non-view array. Out[120]: True In [121]: a_view1 = a[2:, :] In [122]: a_view1.flags.owndata # expected to be False Out[122]: False In [123]: a_fancy1 = a[[0,1], :] In [124]: a_fancy1.flags.owndata # expected to be True, a_fancy1 is a fancy-indexed array. Out[124]: True In [125]: a_fancy2 = a[:, [0,1]] In [126]: a_fancy2.flags.owndata # expected to be True, a_fancy2 is a fancy-indexed array. Out[126]: False So when I query an array's flags.owndata, what is it telling me? What I want to know is whether an array is a view or not. If flags.owndata has nothing to do with the 'viewness' of an array, how would I determine if an array is a view? In the previous example, a_fancy2 does not own its data, as indicated by 'owndata' being False. But when I modify a_fancy2, 'a' is not modified, as expected, but contrary to what 'owndata' would seem to indicate. The numpybook's documentation of owndata could perhaps be a bit clearer on the subject: """ OWNDATA (O) the array owns the memory it uses or if it borrows it from another object (if this is False, the base attribute retrieves a reference to the object this array obtained its data from) """ >From my reading, owndata has nothing to do with the 'viewness' of an array. Is this correct? What is the intent of this flag, then? Its wording could perhaps be improved: owndata is True iff (1) the array owns the memory it uses or (2) the array borrows it from another object. The second clause seems to indicate that the array **does not** own its data since it is borrowing it from another object. However, flags.owndata will be true in this case. If I cannot use flags.owndata, what is a reliable way to determine whether or not an array is a view? From balarsen at lanl.gov Tue Feb 28 18:12:48 2012 From: balarsen at lanl.gov (Larsen, Brian A) Date: Tue, 28 Feb 2012 23:12:48 +0000 Subject: [Numpy-discussion] Determining the 'viewness' of an array, and flags.owndata confusion In-Reply-To: References: Message-ID: <81ABBF06-EEBF-4EDC-8BE8-FD604F132A34@lanl.gov> Stack overflow post related to this: http://stackoverflow.com/questions/9164269/can-you-tell-if-an-array-is-a-view-of-another On Feb 28, 2012, at 4:01 PM, Kurt Smith wrote: For an arbitrary numpy array 'a', what does 'a.flags.owndata' indicate? I originally thought that owndata is False iff 'a' is a view. But that is incorrect. Consider the following: In [119]: a = np.zeros((3,3)) In [120]: a.flags.owndata # should be True; zeros() creates and returns a non-view array. 
Out[120]: True In [121]: a_view1 = a[2:, :] In [122]: a_view1.flags.owndata # expected to be False Out[122]: False In [123]: a_fancy1 = a[[0,1], :] In [124]: a_fancy1.flags.owndata # expected to be True, a_fancy1 is a fancy-indexed array. Out[124]: True In [125]: a_fancy2 = a[:, [0,1]] In [126]: a_fancy2.flags.owndata # expected to be True, a_fancy2 is a fancy-indexed array. Out[126]: False So when I query an array's flags.owndata, what is it telling me? What I want to know is whether an array is a view or not. If flags.owndata has nothing to do with the 'viewness' of an array, how would I determine if an array is a view? In the previous example, a_fancy2 does not own its data, as indicated by 'owndata' being False. But when I modify a_fancy2, 'a' is not modified, as expected, but contrary to what 'owndata' would seem to indicate. The numpybook's documentation of owndata could perhaps be a bit clearer on the subject: """ OWNDATA (O) the array owns the memory it uses or if it borrows it from another object (if this is False, the base attribute retrieves a reference to the object this array obtained its data from) """ >From my reading, owndata has nothing to do with the 'viewness' of an array. Is this correct? What is the intent of this flag, then? Its wording could perhaps be improved: owndata is True iff (1) the array owns the memory it uses or (2) the array borrows it from another object. The second clause seems to indicate that the array **does not** own its data since it is borrowing it from another object. However, flags.owndata will be true in this case. If I cannot use flags.owndata, what is a reliable way to determine whether or not an array is a view? _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Brian A. Larsen ISR-1 Space Science and Applications Los Alamos National Laboratory PO Box 1663, MS-D466 Los Alamos, NM 87545 USA (For overnight add: SM-30, Bikini Atoll Road) Phone: 505-665-7691 Fax: 505-665-7395 email: balarsen at lanl.gov Correspondence / Technical data or Software Publicly Available -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwmsmith at gmail.com Tue Feb 28 18:26:54 2012 From: kwmsmith at gmail.com (Kurt Smith) Date: Tue, 28 Feb 2012 17:26:54 -0600 Subject: [Numpy-discussion] Determining the 'viewness' of an array, and flags.owndata confusion In-Reply-To: <81ABBF06-EEBF-4EDC-8BE8-FD604F132A34@lanl.gov> References: <81ABBF06-EEBF-4EDC-8BE8-FD604F132A34@lanl.gov> Message-ID: On Tue, Feb 28, 2012 at 5:12 PM, Larsen, Brian A wrote: > Stack overflow post related to this: > http://stackoverflow.com/questions/9164269/can-you-tell-if-an-array-is-a-view-of-another They recommend testing "a.base". However, in the example I posted, "a.base" is not reliable either. For example: In [5]: a = zeros((3,3)) In [6]: a_fancy1 = a[[0,1], :] In [7]: a_fancy1.base # None In [8]: a_fancy2 = a[:, [0,1]] In [9]: a_fancy2.base # Not None. But a_fancy2 is not a view into 'a'. Out[9]: array([[ 0., 0., 0.], [ 0., 0., 0.]]) So if I test to see if 'arr.base' is not None, it may or or may not be a view. Is this correct? > > > > > On Feb 28, 2012, at 4:01 PM, Kurt Smith wrote: > > For an arbitrary numpy array 'a', what does 'a.flags.owndata' indicate? > > I originally thought that owndata is False iff 'a' is a view. ?But > that is incorrect. 
> > Consider the following: > > In [119]: a = np.zeros((3,3)) > > In [120]: a.flags.owndata ?# should be True; zeros() creates and > returns a non-view array. > Out[120]: True > > In [121]: a_view1 = a[2:, :] > > In [122]: a_view1.flags.owndata ??# expected to be False > Out[122]: False > > In [123]: a_fancy1 = a[[0,1], :] > > In [124]: a_fancy1.flags.owndata ?# expected to be True, a_fancy1 is a > fancy-indexed array. > Out[124]: True > > In [125]: a_fancy2 = a[:, [0,1]] > > In [126]: a_fancy2.flags.owndata # expected to be True, a_fancy2 is a > fancy-indexed array. > Out[126]: False > > So when I query an array's flags.owndata, what is it telling me? ?What > I want to know is whether an array is a view or not. ?If flags.owndata > has nothing to do with the 'viewness' of an array, how would I > determine if an array is a view? > > In the previous example, a_fancy2 does not own its data, as indicated > by 'owndata' being False. ?But when I modify a_fancy2, 'a' is not > modified, as expected, but contrary to what 'owndata' would seem to > indicate. > > The numpybook's documentation of owndata could perhaps be a bit > clearer on the subject: > > """ > OWNDATA (O) the array owns the memory it uses or if it borrows it from > another object (if this is False, the base attribute retrieves a > reference to the object this array obtained its data from) > """ > > From my reading, owndata has nothing to do with the 'viewness' of an > > array. ?Is this correct? ?What is the intent of this flag, then? ?Its > wording could perhaps be improved: owndata is True iff (1) the array > owns the memory it uses or (2) the array borrows it from another > object. ?The second clause seems to indicate that the array **does > not** own its data since it is borrowing it from another object. > However, flags.owndata will be true in this case. > > If I cannot use flags.owndata, what is a reliable way to determine > whether or not an array is a view? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > -- > > Brian A. Larsen > ISR-1 Space Science and Applications > Los Alamos National Laboratory > PO Box 1663,?MS-D466 > Los Alamos, NM 87545 > USA > > (For overnight add: > SM-30, Bikini Atoll Road) > > Phone:?505-665-7691 > Fax: ??505-665-7395 > email:?balarsen at lanl.gov > > Correspondence / > Technical data or Software Publicly Available > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Tue Feb 28 18:29:12 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 28 Feb 2012 23:29:12 +0000 Subject: [Numpy-discussion] Determining the 'viewness' of an array, and flags.owndata confusion In-Reply-To: References: Message-ID: On Tue, Feb 28, 2012 at 23:01, Kurt Smith wrote: > For an arbitrary numpy array 'a', what does 'a.flags.owndata' indicate? > > I originally thought that owndata is False iff 'a' is a view. ?But > that is incorrect. > > Consider the following: > > In [119]: a = np.zeros((3,3)) > > In [120]: a.flags.owndata ?# should be True; zeros() creates and > returns a non-view array. > Out[120]: True > > In [121]: a_view1 = a[2:, :] > > In [122]: a_view1.flags.owndata ? 
# expected to be False > Out[122]: False > > In [123]: a_fancy1 = a[[0,1], :] > > In [124]: a_fancy1.flags.owndata ?# expected to be True, a_fancy1 is a > fancy-indexed array. > Out[124]: True > > In [125]: a_fancy2 = a[:, [0,1]] > > In [126]: a_fancy2.flags.owndata # expected to be True, a_fancy2 is a > fancy-indexed array. > Out[126]: False > > So when I query an array's flags.owndata, what is it telling me? ?What > I want to know is whether an array is a view or not. ?If flags.owndata > has nothing to do with the 'viewness' of an array, how would I > determine if an array is a view? > > In the previous example, a_fancy2 does not own its data, as indicated > by 'owndata' being False. ?But when I modify a_fancy2, 'a' is not > modified, as expected, but contrary to what 'owndata' would seem to > indicate. > > The numpybook's documentation of owndata could perhaps be a bit > clearer on the subject: > > """ > OWNDATA (O) the array owns the memory it uses or if it borrows it from > another object (if this is False, the base attribute retrieves a > reference to the object this array obtained its data from) > """ > > >From my reading, owndata has nothing to do with the 'viewness' of an > array. ?Is this correct? ?What is the intent of this flag, then? ?Its > wording could perhaps be improved: owndata is True iff (1) the array > owns the memory it uses or (2) the array borrows it from another > object. ?The second clause seems to indicate that the array **does > not** own its data since it is borrowing it from another object. > However, flags.owndata will be true in this case. > > If I cannot use flags.owndata, what is a reliable way to determine > whether or not an array is a view? Your original intuition was correct. It's just that sometimes an operation will make a copy of the input data, but then make a view on that copy. The output object is a view, just not a view on the input object. -- Robert Kern From njs at pobox.com Tue Feb 28 18:31:08 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 Feb 2012 23:31:08 +0000 Subject: [Numpy-discussion] Determining the 'viewness' of an array, and flags.owndata confusion In-Reply-To: References: Message-ID: On Tue, Feb 28, 2012 at 11:01 PM, Kurt Smith wrote: > For an arbitrary numpy array 'a', what does 'a.flags.owndata' indicate? I think what it really indicates is whether a's destructor should call free() on a's data pointer. > I originally thought that owndata is False iff 'a' is a view. ?But > that is incorrect. > > Consider the following: > > In [119]: a = np.zeros((3,3)) > > In [120]: a.flags.owndata ?# should be True; zeros() creates and > returns a non-view array. > Out[120]: True > > In [121]: a_view1 = a[2:, :] > > In [122]: a_view1.flags.owndata ? # expected to be False > Out[122]: False > > In [123]: a_fancy1 = a[[0,1], :] > > In [124]: a_fancy1.flags.owndata ?# expected to be True, a_fancy1 is a > fancy-indexed array. > Out[124]: True > > In [125]: a_fancy2 = a[:, [0,1]] > > In [126]: a_fancy2.flags.owndata # expected to be True, a_fancy2 is a > fancy-indexed array. > Out[126]: False > > So when I query an array's flags.owndata, what is it telling me? ?What > I want to know is whether an array is a view or not. ?If flags.owndata > has nothing to do with the 'viewness' of an array, how would I > determine if an array is a view? > > In the previous example, a_fancy2 does not own its data, as indicated > by 'owndata' being False. 
?But when I modify a_fancy2, 'a' is not > modified, as expected, but contrary to what 'owndata' would seem to > indicate. It looks like the fancy indexing code in this case actually creates a new array, and then instead of returning it directly, returns a (reshaped) view on this new array. If you look at a_fancy2.base, you'll see this new array. So: a_fancy2 *is* a view... it's just not a view of 'a'. It's a view of this other array. > If I cannot use flags.owndata, what is a reliable way to determine > whether or not an array is a view? owndata *is* a reliable way to determine whether or not an array is a view; it just turns out that this is not a very useful question to ask. What are you actually trying to do? There's probably another way to accomplish it. -- Nathaniel From erin.sheldon at gmail.com Tue Feb 28 18:36:58 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Tue, 28 Feb 2012 18:36:58 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330351883-sup-9943@rohan> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> Message-ID: <1330471021-sup-7403@rohan> Hi All - I've added the relevant code to my numpy fork here https://github.com/esheldon/numpy The python module and c file are at /numpy/lib/recfile.py and /numpy/lib/src/_recfile.c Access from python is numpy.recfile See below for the doc string for the main class, Recfile. Some example usage is shown. As listed in the limitations section below, quoted strings are not yet supported for text files. This can be addressed by optionally using some smarter code when reading strings from these types of files. I'd greatly appreciate some help with that aspect. There is a test suite in numpy.recfile.test() A class for reading and writing structured arrays to and from files. Both binary and text files are supported. Any subset of the data can be read without loading the whole file. See the limitations section below for caveats. parameters ---------- fobj: file or string A string or file object. mode: string Mode for opening when fobj is a string dtype: A numpy dtype or descriptor describing each line of the file. The dtype must contain fields. This is a required parameter; it is a keyword only for clarity. Note for text files the dtype will be converted to native byte ordering. Any data written to the file must also be in the native byte ordering. nrows: int, optional Number of rows in the file. If not entered, the rows will be counted from the file itself. This is a simple calculation for binary files, but can be slow for text files. delim: string, optional The delimiter for text files. If None or "" the file is assumed to be binary. Should be a single character. skipheader: int, optional Skip this many lines in the header. offset: int, optional Move to this offset in the file. Reads will all be relative to this location. If not sent, it is taken from the current positioin in the input file object or 0 if a filename was entered. string_newlines: bool, optional If true, strings in text files may contain newlines. This is only relevant for text files when the nrows= keyword is not sent, because the number of lines must be counted. In this case the full text reading code is used to count rows instead of a simple newline count. Because the text is fully processed twice, this can double the time to read files. 
padnull: bool If True, nulls in strings are replaced with spaces when writing text ignorenull: bool If True, nulls in strings are not written when writing text. This results in string fields that are not fixed width, so cannot be read back in using recfile limitations ----------- Currently, only fixed width string fields are supported. String fields can contain any characters, including newlines, but for text files quoted strings are not currently supported: the quotes will be part of the result. For binary files, structured sub-arrays and complex can be writen and read, but this is not supported yet for text files. examples --------- # read from binary file dtype=[('id','i4'),('x','f8'),('y','f8'),('arr','f4',(2,2))] rec=numpy.recfile.Recfile(fname,dtype=dtype) # read all data using either slice or method notation data=rec[:] data=rec.read() # read row slices data=rec[8:55:3] # read subset of columns and possibly rows # can use either slice or method notation data=rec['x'][:] data=rec['id','x'][:] data=rec[col_list][row_list] data=rec.read(columns=col_list, rows=row_list) # for text files, just send the delimiter string # all the above calls will also work rec=numpy.recfile.Recfile(fname,dtype=dtype,delim=',') # save time for text files by sending row count rec=numpy.recfile.Recfile(fname,dtype=dtype,delim=',',nrows=10000) # write some data rec=numpy.recfile.Recfile(fname,mode='w',dtype=dtype,delim=',') rec.write(data) # append some data rec.write(more_data) # print metadata about the file print rec Recfile nrows: 345472 ncols: 6 mode: 'w' id Excerpts from Jay Bourque's message of Mon Feb 27 00:24:25 -0500 2012: > > Hi Erin, > > > > I'm the one Travis mentioned earlier about working on this. I was planning on > > diving into it this week, but it sounds like you may have some code already that > > fits the requirements? If so, I would be available to help you with > > porting/testing your code with numpy, or I can take what you have and build on > > it in my numpy fork on github. > > Hi Jay,all - > > What I've got is a solution for writing and reading structured arrays to > and from files, both in text files and binary files. It is written in C > and python. It allows reading arbitrary subsets of the data efficiently > without reading in the whole file. It defines a class Recfile that > exposes an array like interface for reading, e.g. x=rf[columns][rows]. > > Limitations: Because it was designed with arrays in mind, it doesn't > deal with not fixed-width string fields. Also, it doesn't deal with > quoted strings, as those are not necessary for writing or reading arrays > with fixed length strings. Doesn't deal with missing data. This is > where Wes' tokenizing-oriented code might be useful. So there is a fair > amount of functionality to be added for edge cases, but it provides a > framework. I think some of this can be written into the C code, others > will have to be done at the python level. > > I've forked numpy on my github account, and should have the code added > in a few days. I'll send mail when it is ready. Help will be greatly > appreciated getting this to work with loadtxt, adding functionality from > Wes' and others code, and testing. > > Also, because it works on binary files too, I think it might be worth it > to make numpy.fromfile a python function, and to use a Recfile object > when reading subsets of the data. For example numpy.fromfile(f, > rows=rows, columns=columns, dtype=dtype) could instantiate a Recfile > object to read the column and row subsets. 
We could rename the C > fromfile to something appropriate, and call it when the whole file is > being read (recfile uses it internally when reading ranges). > > thanks, > -e -- Erin Scott Sheldon Brookhaven National Laboratory From rowen at uw.edu Tue Feb 28 19:09:33 2012 From: rowen at uw.edu (Russell E. Owen) Date: Tue, 28 Feb 2012 16:09:33 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: In article , David Cournapeau wrote: > On Sat, Feb 18, 2012 at 10:50 PM, Sturla Molden wrote: > > > ??> In an ideal world, we would have a better language than C++ that can > > be spit out as > C for portability. > > > > What about a statically typed Python? (That is, not Cython.) We just > > need to make the compiler :-) > > There are better languages than C++ that has most of the technical > benefits stated in this discussion (rust and D being the most > "obvious" ones), but whose usage is unrealistic today for various > reasons: knowledge, availability on "esoteric" platforms, etc??? A new > language is completely ridiculous. I just want to say that C++ has come a long way. I used to hate it, but now that it has matured, and using some basic features of boost (especially shared_ptr) can turn it into a really nice language. The next version will be even better, but one can write nice C++ today. shared_ptr allows objects that easily manage their own memory (basic automatic garbage collection). Generic programming seems like a really good fit to numpy's array types. I am part of a large project that codes in C++ and Python and we find it works very well for us. I can't imagine working in C anymore and doing without exception handling and namespaces. So I'm sorry to hear that C++ is not being considered for a numpy rewrite. -- Russell From bryanv at continuum.io Tue Feb 28 19:49:39 2012 From: bryanv at continuum.io (Bryan Van de Ven) Date: Tue, 28 Feb 2012 16:49:39 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: <4F4D7623.5000801@continuum.io> On 2/28/12 4:09 PM, Russell E. Owen wrote: > I can't imagine working in C anymore and doing without exception > handling and namespaces. So I'm sorry to hear that C++ is not being > considered for a numpy rewrite. -- Russell AFAIK C++ is still being considered for numpy in the future, and I think it is safe to say that a concrete implementation will be put forward for consideration at some point. Just my own $0.02 regarding this issue: I am in favor of using C++ for numpy, I think it could confer various benefits. However, I am also in favor of explicitly deciding and documenting what subset of C++ features are acceptable for use within the numpy codebase. Bryan Van de Ven From jrocher at enthought.com Tue Feb 28 21:06:40 2012 From: jrocher at enthought.com (Jonathan Rocher) Date: Tue, 28 Feb 2012 20:06:40 -0600 Subject: [Numpy-discussion] Determining the 'viewness' of an array, and flags.owndata confusion In-Reply-To: References: Message-ID: Thank you all for your answers. Kurt and I were training new developers on numpy and telling them that - fancy indexing creates a copy - owndata was a good way to know if an array is a view or a copy. 
It turns out that these were both correct statements but we didn't think that the fancy indexing would sometimes return a view after making a copy (BTW is that necessary? It sounds to me like it is complicating things for no real benefit.) So a more precise question then is "is a_fancy a view on a's data or does it copy the data and its data is independent?" and the answer is probably: since "a_fancy1.base is a" is false then the 2 array's data are independent from each other. Do I have it right now? Jonathan On Tue, Feb 28, 2012 at 5:31 PM, Nathaniel Smith wrote: > On Tue, Feb 28, 2012 at 11:01 PM, Kurt Smith wrote: > > For an arbitrary numpy array 'a', what does 'a.flags.owndata' indicate? > > I think what it really indicates is whether a's destructor should call > free() on a's data pointer. > > > I originally thought that owndata is False iff 'a' is a view. But > > that is incorrect. > > > > Consider the following: > > > > In [119]: a = np.zeros((3,3)) > > > > In [120]: a.flags.owndata # should be True; zeros() creates and > > returns a non-view array. > > Out[120]: True > > > > In [121]: a_view1 = a[2:, :] > > > > In [122]: a_view1.flags.owndata # expected to be False > > Out[122]: False > > > > In [123]: a_fancy1 = a[[0,1], :] > > > > In [124]: a_fancy1.flags.owndata # expected to be True, a_fancy1 is a > > fancy-indexed array. > > Out[124]: True > > > > In [125]: a_fancy2 = a[:, [0,1]] > > > > In [126]: a_fancy2.flags.owndata # expected to be True, a_fancy2 is a > > fancy-indexed array. > > Out[126]: False > > > > So when I query an array's flags.owndata, what is it telling me? What > > I want to know is whether an array is a view or not. If flags.owndata > > has nothing to do with the 'viewness' of an array, how would I > > determine if an array is a view? > > > > In the previous example, a_fancy2 does not own its data, as indicated > > by 'owndata' being False. But when I modify a_fancy2, 'a' is not > > modified, as expected, but contrary to what 'owndata' would seem to > > indicate. > > It looks like the fancy indexing code in this case actually creates a > new array, and then instead of returning it directly, returns a > (reshaped) view on this new array. If you look at a_fancy2.base, > you'll see this new array. > > So: a_fancy2 *is* a view... it's just not a view of 'a'. It's a view > of this other array. > > > If I cannot use flags.owndata, what is a reliable way to determine > > whether or not an array is a view? > > owndata *is* a reliable way to determine whether or not an array is a > view; it just turns out that this is not a very useful question to > ask. > > What are you actually trying to do? There's probably another way to > accomplish it. > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fperez.net at gmail.com Wed Feb 29 00:51:48 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 28 Feb 2012 21:51:48 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <4F4D7623.5000801@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4D7623.5000801@continuum.io> Message-ID: On Tue, Feb 28, 2012 at 4:49 PM, Bryan Van de Ven wrote: > Just my own $0.02 regarding this issue: I am in favor of using C++ for > numpy, I think it could confer various benefits. However, I am also in > favor of explicitly deciding and documenting what subset of C++ features > are acceptable for use within the numpy codebase. I would *love* to see us adopt the NEP/PEP process for decisions as complex as this one. The PEP process serves the Python community very well, and I think it's an excellent balance of minimal overhead and maximum benefit for organizing the process of making complex/controversial decisions. PEP/NEPs serve a number of important purposes: - they encourage the proponent of the idea to organize the initial presentation in a concrete, easy to follow way that can be used for decision making. - they serve as a stable reference of the key points in a discussion, in contrast to the meandering that is normal of a mailing list thread. - they can be updated and evolve as the discussion happens, incorporating the distilled ideas that result. - if important new points are brought up in the discussion, the community can ensure that they are added to the NEP. - once a decision is reached, the NEP is updated with the rationale for the decision. Whether it's acceptance or rejection, this ensures that in the future, others can come back to this document to see the reasons, avoiding repetitive discussions. - the NEP can serve as documentation for a specific feature; we see this often in Python, where the standard docs refer to PEPs for details. - over time, these documents build a history of the key decisions in the design of a project, in a way that is much easier to read and reason about than a random splatter of long mailing list threads. I was offline when the long discussions on process happened a few weeks ago, and it's not my intent to dig into every point brought up there. I'm only proposing that we adopt the NEP process for complex decisions, of which the C++ shift is certainly one. In the end, I think the NEP process will actually *help* the discussion process. It helps keep the key points on focus even as the discussion may drift in the mailing list, which means ultimately everyone wastes less energy. I obviously can't force anyone to do this, but for what it's worth, I know that at least for IPython, I've had this in mind for a while. We haven't had any majorly contentious decisions that really need it yet, but for example I have in mind a redesign and extension of the magic system that I intend to write-up pep-style. While I suspect nobody would yell if I just went ahead and implemented it on a pull request, there are enough moving parts and new ideas that I want to gather feedback in an organized manner before proceeding with implementation. And I don't find that idea to be a burden, I actually do think it will make the whole thing go more smoothly even for me. Just a thought... 
f From kalatsky at gmail.com Wed Feb 29 01:38:38 2012 From: kalatsky at gmail.com (Val Kalatsky) Date: Wed, 29 Feb 2012 00:38:38 -0600 Subject: [Numpy-discussion] Determining the 'viewness' of an array, and flags.owndata confusion In-Reply-To: References: Message-ID: Viewness is in the eyes of the beholder. You have to use indirect methods to figure it out. Probably the most robust approach is to go up the base chain until you get None. In [71]: c1=np.arange(16) In [72]: c2=c1[::2] In [73]: c4=c2[::2] In [74]: c8=c4[::2] In [75]: id(c8.base)==id(c4) Out[75]: True In [76]: id(c8.base.base.base)==id(c1) Out[76]: True The graph of dependencies is only a tree, so sooner or later you'll get to the root. Val On Tue, Feb 28, 2012 at 8:06 PM, Jonathan Rocher wrote: > Thank you all for your answers. Kurt and I were training new developers on > numpy and telling them that > - fancy indexing creates a copy > - owndata was a good way to know if an array is a view or a copy. > It turns out that these were both correct statements but we didn't think > that the fancy indexing would sometimes return a view after making a copy > (BTW is that necessary? It sounds to me like it is complicating things for > no real benefit.) > > So a more precise question then is "is a_fancy a view on a's data or does > it copy the data and its data is independent?" and the answer is probably: > since "a_fancy1.base is a" is false then the 2 array's data are > independent from each other. > > Do I have it right now? > > Jonathan > > > On Tue, Feb 28, 2012 at 5:31 PM, Nathaniel Smith wrote: > >> On Tue, Feb 28, 2012 at 11:01 PM, Kurt Smith wrote: >> > For an arbitrary numpy array 'a', what does 'a.flags.owndata' indicate? >> >> I think what it really indicates is whether a's destructor should call >> free() on a's data pointer. >> >> > I originally thought that owndata is False iff 'a' is a view. But >> > that is incorrect. >> > >> > Consider the following: >> > >> > In [119]: a = np.zeros((3,3)) >> > >> > In [120]: a.flags.owndata # should be True; zeros() creates and >> > returns a non-view array. >> > Out[120]: True >> > >> > In [121]: a_view1 = a[2:, :] >> > >> > In [122]: a_view1.flags.owndata # expected to be False >> > Out[122]: False >> > >> > In [123]: a_fancy1 = a[[0,1], :] >> > >> > In [124]: a_fancy1.flags.owndata # expected to be True, a_fancy1 is a >> > fancy-indexed array. >> > Out[124]: True >> > >> > In [125]: a_fancy2 = a[:, [0,1]] >> > >> > In [126]: a_fancy2.flags.owndata # expected to be True, a_fancy2 is a >> > fancy-indexed array. >> > Out[126]: False >> > >> > So when I query an array's flags.owndata, what is it telling me? What >> > I want to know is whether an array is a view or not. If flags.owndata >> > has nothing to do with the 'viewness' of an array, how would I >> > determine if an array is a view? >> > >> > In the previous example, a_fancy2 does not own its data, as indicated >> > by 'owndata' being False. But when I modify a_fancy2, 'a' is not >> > modified, as expected, but contrary to what 'owndata' would seem to >> > indicate. >> >> It looks like the fancy indexing code in this case actually creates a >> new array, and then instead of returning it directly, returns a >> (reshaped) view on this new array. If you look at a_fancy2.base, >> you'll see this new array. >> >> So: a_fancy2 *is* a view... it's just not a view of 'a'. It's a view >> of this other array. >> >> > If I cannot use flags.owndata, what is a reliable way to determine >> > whether or not an array is a view? 
>> >> owndata *is* a reliable way to determine whether or not an array is a >> view; it just turns out that this is not a very useful question to >> ask. >> >> What are you actually trying to do? There's probably another way to >> accomplish it. >> >> -- Nathaniel >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > Jonathan Rocher, PhD > Scientific software developer > Enthought, Inc. > jrocher at enthought.com > 1-512-536-1057 > http://www.enthought.com > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Feb 29 01:46:59 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 28 Feb 2012 22:46:59 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4D7623.5000801@continuum.io> Message-ID: <3CB0878C-A0B3-4B51-84AC-8BF874752C13@continuum.io> We already use the NEP process for such decisions. This discussion came from simply from the *idea* of writing such a NEP. Nothing has been decided. Only opinions have been shared that might influence the NEP. This is all pretty premature, though --- migration to C++ features on a trial branch is some months away were it to happen. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Feb 28, 2012, at 9:51 PM, Fernando Perez wrote: > On Tue, Feb 28, 2012 at 4:49 PM, Bryan Van de Ven wrote: >> Just my own $0.02 regarding this issue: I am in favor of using C++ for >> numpy, I think it could confer various benefits. However, I am also in >> favor of explicitly deciding and documenting what subset of C++ features >> are acceptable for use within the numpy codebase. > > I would *love* to see us adopt the NEP/PEP process for decisions as > complex as this one. The PEP process serves the Python community very > well, and I think it's an excellent balance of minimal overhead and > maximum benefit for organizing the process of making > complex/controversial decisions. PEP/NEPs serve a number of important > purposes: > > - they encourage the proponent of the idea to organize the initial > presentation in a concrete, easy to follow way that can be used for > decision making. > > - they serve as a stable reference of the key points in a discussion, > in contrast to the meandering that is normal of a mailing list thread. > > - they can be updated and evolve as the discussion happens, > incorporating the distilled ideas that result. > > - if important new points are brought up in the discussion, the > community can ensure that they are added to the NEP. > > - once a decision is reached, the NEP is updated with the rationale > for the decision. Whether it's acceptance or rejection, this ensures > that in the future, others can come back to this document to see the > reasons, avoiding repetitive discussions. > > - the NEP can serve as documentation for a specific feature; we see > this often in Python, where the standard docs refer to PEPs for > details. 
> > - over time, these documents build a history of the key decisions in > the design of a project, in a way that is much easier to read and > reason about than a random splatter of long mailing list threads. > > > I was offline when the long discussions on process happened a few > weeks ago, and it's not my intent to dig into every point brought up > there. I'm only proposing that we adopt the NEP process for complex > decisions, of which the C++ shift is certainly one. > > In the end, I think the NEP process will actually *help* the > discussion process. It helps keep the key points on focus even as the > discussion may drift in the mailing list, which means ultimately > everyone wastes less energy. > > I obviously can't force anyone to do this, but for what it's worth, I > know that at least for IPython, I've had this in mind for a while. We > haven't had any majorly contentious decisions that really need it yet, > but for example I have in mind a redesign and extension of the magic > system that I intend to write-up pep-style. While I suspect nobody > would yell if I just went ahead and implemented it on a pull request, > there are enough moving parts and new ideas that I want to gather > feedback in an organized manner before proceeding with implementation. > And I don't find that idea to be a burden, I actually do think it > will make the whole thing go more smoothly even for me. > > Just a thought... > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Wed Feb 29 02:03:28 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 28 Feb 2012 23:03:28 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <3CB0878C-A0B3-4B51-84AC-8BF874752C13@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4D7623.5000801@continuum.io> <3CB0878C-A0B3-4B51-84AC-8BF874752C13@continuum.io> Message-ID: On Tue, Feb 28, 2012 at 10:46 PM, Travis Oliphant wrote: > We already use the NEP process for such decisions. ? This discussion came from simply from the *idea* of writing such a NEP. > > Nothing has been decided. ?Only opinions have been shared that might influence the NEP. ?This is all pretty premature, though --- ?migration to C++ features on a trial branch is some months away were it to happen. Sure, I know we do have neps, they live in the main numpy repo (which btw, I think they should be moved to a standalone repo to make their management independent of the core code, but that's an easy and minor point we can ignore for now). I was just thinking that this discussion is precisely the kind of thing that would be well served by being organized in a nep, before even jumping into implementation. A nep can precisely help organize a discussion where there's enough to think about and make decisions *before* effort has gone into implementing anything. It's important not to forget that once someone goes far enough down the road of implementing something, this adds pressure to turn the implementation into a fait accompli, simply out of not wanting to throw work away. 
For a decision as binary as 'rewrite the core in C++ or not', it would seem to me that organizing the problem in a NEP *before* starting to implement something in a trial branch would be precisely the way to go, and that it would actually make the decision process and discussion easier and more productive. Cheers, f From mwwiebe at gmail.com Wed Feb 29 02:28:01 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 28 Feb 2012 23:28:01 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4D7623.5000801@continuum.io> <3CB0878C-A0B3-4B51-84AC-8BF874752C13@continuum.io> Message-ID: On Tue, Feb 28, 2012 at 11:03 PM, Fernando Perez wrote: > On Tue, Feb 28, 2012 at 10:46 PM, Travis Oliphant > wrote: > > We already use the NEP process for such decisions. This discussion > came from simply from the *idea* of writing such a NEP. > > > > Nothing has been decided. Only opinions have been shared that might > influence the NEP. This is all pretty premature, though --- migration to > C++ features on a trial branch is some months away were it to happen. > > Sure, I know we do have neps, they live in the main numpy repo (which > btw, I think they should be moved to a standalone repo to make their > management independent of the core code, but that's an easy and minor > point we can ignore for now). I was just thinking that this discussion > is precisely the kind of thing that would be well served by being > organized in a nep, before even jumping into implementation. > > A nep can precisely help organize a discussion where there's enough to > think about and make decisions *before* effort has gone into > implementing anything. It's important not to forget that once someone > goes far enough down the road of implementing something, this adds > pressure to turn the implementation into a fait accompli, simply out > of not wanting to throw work away. > > For a decision as binary as 'rewrite the core in C++ or not', it would > seem to me that organizing the problem in a NEP *before* starting to > implement something in a trial branch would be precisely the way to > go, and that it would actually make the decision process and > discussion easier and more productive. > The development approach I really like is to start with a relatively rough NEP, then cycle through feedback, updating the NEP, and implementation. Organizing ones thoughts to describe them in a design document can often clarify things that are confusing when just looking at code. Feedback from the community, both developers and users, can help expose where your assumptions are and often lead to insights from subjects you didn't even know about. Implementation puts those ideas through the a cold, hard, reality check, and can provide a hands-on experience for later rounds of feedback. This iterative process is most important to emphasize, the design document and the code must both evolve together. Stamping a NEP as "final" before getting into code is just as bad as jumping into code without writing a preliminary design. For the decision about adopting C++, a NEP proposing how we would go about doing it, which evolves as the community gains experience with the idea, will be very helpful. I would emphasize that the adoption of C++ does not require a rewrite. 
The patch required to make NumPy build with a C++ compiler is very small, and individual features of C++ can be adopted slowly, in a piecemeal fashion. What I'm advocating for is this kind of gradual evolution, and my starting point for writing a NEP would be the email I wrote here: http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060778.html Github actually has a bug that the RST table of contents is stripped, and this makes reading longer NEPS right in the repository uncomfortable. Maybe alternatives to a git repository for NEPs should be considered. I reported the bug to github, but they told me that was just how they did things. Cheers, Mark > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From erin.sheldon at gmail.com Wed Feb 29 10:11:51 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Wed, 29 Feb 2012 10:11:51 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> <1330365437-sup-7898@rohan> <1330387831-sup-839@rohan> Message-ID: <1330526203-sup-2784@rohan> Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012: > > Even for binary, there are pathological cases, e.g. 1) reading a random > > subset of nearly all rows. ?2) reading a single column when rows are > > small. ?In case 2 you will only go this route in the first place if you > > need to save memory. ?The user should be aware of these issues. > > FWIW, this route actually doesn't save any memory as compared to np.memmap. Actually, for numpy.memmap you will read the whole file if you try to grab a single column and read a large fraction of the rows. Here is an example that will end up pulling the entire file into memory mm=numpy.memmap(fname, dtype=dtype) rows=numpy.arange(mm.size) x=mm['x'][rows] I just tested this on a 3G binary file and I'm sitting at 3G memory usage. I believe this is because numpy.memmap only understands rows. I don't fully understand the reason for that, but I suspect it is related to the fact that the ndarray really only has a concept of itemsize, and the fields are really just a reinterpretation of those bytes. It may be that one could tweak the ndarray code to get around this. But I would appreciate enlightenment on this subject. This fact was the original motivator for writing my code; the text reading ability came later. > Cool. I'm just a little concerned that, since we seem to have like... > 5 different implementations of this stuff all being worked on at the > same time, we need to get some consensus on which features actually > matter, so they can be melded together into the Single Best File > Reader Evar. An interface where indexing and file-reading are combined > is significantly more complicated than one where the core file-reading > inner-loop can ignore indexing. So far I'm not sure why this > complexity would be worthwhile, so that's what I'm trying to > understand. I think I've addressed the reason why the low level C code was written. And I think a unified, high level interface to binary and text files, which the Recfile class provides, is worthwhile. Can you please say more about "...one where the core file-reading inner-loop can ignore indexing"? I didn't catch the meaning. 
-e > > Cheers, > -- Nathaniel > > > Also, for some crazy ascii files we may want to revert to pure python > > anyway, but I think these should be special cases that can be flagged > > at runtime through keyword arguments to the python functions. > > > > BTW, did you mean to go off-list? > > > > cheers, > > > > -e > > -- > > Erin Scott Sheldon > > Brookhaven National Laboratory -- Erin Scott Sheldon Brookhaven National Laboratory From erin.sheldon at gmail.com Wed Feb 29 10:14:08 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Wed, 29 Feb 2012 10:14:08 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330526203-sup-2784@rohan> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> <1330365437-sup-7898@rohan> <1330387831-sup-839@rohan> <1330526203-sup-2784@rohan> Message-ID: <1330528396-sup-576@rohan> Excerpts from Erin Sheldon's message of Wed Feb 29 10:11:51 -0500 2012: > Actually, for numpy.memmap you will read the whole file if you try to > grab a single column and read a large fraction of the rows. Here is an That should have been: "...read *all* the rows". -e -- Erin Scott Sheldon Brookhaven National Laboratory From pwl_b at wp.pl Wed Feb 29 10:22:23 2012 From: pwl_b at wp.pl (=?utf-8?b?UGF3ZcWC?= Biernat) Date: Wed, 29 Feb 2012 15:22:23 +0000 (UTC) Subject: [Numpy-discussion] [Numpy] quadruple precision Message-ID: I am completely new to Numpy and I know only the basics of Python, to this point I was using Fortran 03/08 to write numerical code. However, I am starting a new large project of mine and I am looking forward to using Python to call some low level Fortran code responsible for most of the intensive number crunching. In this context I stumbled into f2py and it looks just like what I need, but before I start writing an app in mixture of Python and Fortran I have a question about numerical precision of variables used in numpy and f2py. Is there any way to interact with Fortran's real(16) (supported by gcc and Intel's ifort) data type from numpy? By real(16) I mean the binary128 type as in IEEE 754. (In C this data type is experimentally supported as __float128 (gcc) and _Quad (Intel's icc).) I have investigated the float128 data type, but it seems to work as binary64 or binary80 depending on the architecture. If there is currently no way to interact with binary128, how hard would it be to patch the sources of numpy to add such data type? I am interested only in basic stuff, comparable in functionality to libmath. As said before, I have little knowledge of Python, Numpy and f2py, I am however, interested in investing some time in learing it and implementing the mentioned features, but only if there is any hope of succeeding. From robert.kern at gmail.com Wed Feb 29 10:51:45 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 29 Feb 2012 15:51:45 +0000 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330526203-sup-2784@rohan> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> <1330365437-sup-7898@rohan> <1330387831-sup-839@rohan> <1330526203-sup-2784@rohan> Message-ID: On Wed, Feb 29, 2012 at 15:11, Erin Sheldon wrote: > Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012: >> > Even for binary, there are pathological cases, e.g. 1) reading a random >> > subset of nearly all rows. ?2) reading a single column when rows are >> > small. 
?In case 2 you will only go this route in the first place if you >> > need to save memory. ?The user should be aware of these issues. >> >> FWIW, this route actually doesn't save any memory as compared to np.memmap. > > Actually, for numpy.memmap you will read the whole file if you try to > grab a single column and read a large fraction of the rows. ?Here is an > example that will end up pulling the entire file into memory > > ? ?mm=numpy.memmap(fname, dtype=dtype) > ? ?rows=numpy.arange(mm.size) > ? ?x=mm['x'][rows] > > I just tested this on a 3G binary file and I'm sitting at 3G memory > usage. ?I believe this is because numpy.memmap only understands rows. ?I > don't fully understand the reason for that, but I suspect it is related > to the fact that the ndarray really only has a concept of itemsize, and > the fields are really just a reinterpretation of those bytes. ?It may be > that one could tweak the ndarray code to get around this. ?But I would > appreciate enlightenment on this subject. Each record (I would avoid the word "row" in this context) is contiguous in memory whether that memory is mapped to disk or not. Additionally, the way that virtual memory (i.e. mapped memory) works is that when you request the data at a given virtual address, the OS will go look up the page it resides in (typically 4-8k in size) and pull the whole page into main memory. Since you are requesting most of the records, you are probably pulling all of the file into main memory. Memory mapping works best when you pull out contiguous chunks at a time rather than pulling out stripes. numpy structured arrays do not rearrange your data to put all of the 'x' data contiguous with each other. You can arrange that yourself, if you like (use a structured scalar with a dtype such that each field is an array of the appropriate length and dtype). Then pulling out all of the 'x' field values will only touch a smaller fraction of the file. -- Robert Kern From fperez.net at gmail.com Wed Feb 29 11:11:16 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 29 Feb 2012 08:11:16 -0800 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4D7623.5000801@continuum.io> <3CB0878C-A0B3-4B51-84AC-8BF874752C13@continuum.io> Message-ID: On Tue, Feb 28, 2012 at 11:28 PM, Mark Wiebe wrote: > The development approach I really like is to start with a relatively rough > NEP, then cycle through feedback, updating the NEP, and implementation. > Organizing ones thoughts to describe them in a design document can often > clarify things that are confusing when just looking at code. Feedback from > the community, both developers and users, can help expose where your > assumptions are and often lead to insights from subjects you didn't even > know about. Implementation puts those ideas through the a cold, hard, > reality check, and can provide a hands-on experience for later rounds of > feedback. > This iterative process is most important to emphasize, the design document > and the code must both evolve together. Stamping a NEP as "final" before > getting into code is just as bad as jumping into code without writing a > preliminary design. Certainly! We're in complete agreement here. 
I didn't mean to suggest (though perhaps I phrased it poorly) that the nep discussion and implementation phases should be fully disjoint, since I do believe that implementation and discussion can and should inform each other. > Github actually has a bug that the RST table of contents is stripped, and > this makes reading longer NEPS right in the repository uncomfortable. Maybe > alternatives to a git repository for NEPs should be considered. I reported > the bug to github, but they told me that was just how they did things. That's easy to solve, and can be done with a minimum of work in a way that will make the nep-handling process far eaiser: - split the neps into their own repo, and make that a repo targeted for building a website, like we do with the ipython docs for example. - have a 'nep repo manager' who merges PRs from nep authors quickly. In practice, nep authors could even be given write access to the repo while they work on their own nep, I think we can trust people not to mess around outside their directory. - the nep repo is source-only, and we have a nep-web repo where the *built* neps are displayed using the gh-pages mechanism. With this, we achieve something like what python uses, with a separate and nicely formatted web version of the neps for easy reading, but in addition with the fluidity of the github workflow for source management. We already have all the pieces for this, so it would be a very easy job for someone to make it happen (~2 hours at most, would be my guess). Cheers, f From jrocher at enthought.com Wed Feb 29 12:13:35 2012 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 29 Feb 2012 11:13:35 -0600 Subject: [Numpy-discussion] [Numpy] quadruple precision In-Reply-To: References: Message-ID: Thanks to your question, I discovered that there is a float128 dtype in numpy In[5]: np.__version__ Out[5]: '1.6.1' In[6]: np.float128? Type: type Base Class: String Form: Namespace: Interactive File: /Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/numpy/__init__.py Docstring: 128-bit floating-point number. Character code: 'g'. C long float compatible. Based on some reported issues, it seems like there are issues though with this and its mapping to python long integer... http://mail.scipy.org/pipermail/numpy-discussion/2011-October/058784.html HTH, Jonathan On Wed, Feb 29, 2012 at 9:22 AM, Pawe? Biernat wrote: > I am completely new to Numpy and I know only the basics of Python, to > this point I was using Fortran 03/08 to write numerical code. However, > I am starting a new large project of mine and I am looking forward to > using Python to call some low level Fortran code responsible for most > of the intensive number crunching. In this context I stumbled into > f2py and it looks just like what I need, but before I start writing an > app in mixture of Python and Fortran I have a question about numerical > precision of variables used in numpy and f2py. > > Is there any way to interact with Fortran's real(16) (supported by gcc > and Intel's ifort) data type from numpy? By real(16) I mean the > binary128 type as in IEEE 754. (In C this data type is experimentally > supported as __float128 (gcc) and _Quad (Intel's icc).) I have > investigated the float128 data type, but it seems to work as binary64 > or binary80 depending on the architecture. If there is currently no > way to interact with binary128, how hard would it be to patch the > sources of numpy to add such data type? 
I am interested only in > basic stuff, comparable in functionality to libmath. > > As said before, I have little knowledge of Python, Numpy and f2py, I > am however, interested in investing some time in learing it and > implementing the mentioned features, but only if there is any hope of > succeeding. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Jonathan Rocher, PhD Scientific software developer Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 29 13:14:34 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 29 Feb 2012 13:14:34 -0500 Subject: [Numpy-discussion] [Numpy] quadruple precision In-Reply-To: References: Message-ID: Hi, On Wed, Feb 29, 2012 at 12:13 PM, Jonathan Rocher wrote: > Thanks to your question, I discovered that there is a float128 dtype in > numpy > > In[5]: np.__version__ > Out[5]: '1.6.1' > > In[6]: np.float128? > Type:?????? type > Base Class: > String Form: > Namespace:? Interactive > File: > /Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/numpy/__init__.py > Docstring: > 128-bit floating-point number. Character code: 'g'. C long float > compatible. Right - but remember that numpy float128 is different on different platforms. In particular, float128 is any C longdouble type that needs 128 bits of memory, regardless of precision or implementation. See [1] for background on C longdouble type. The numpy platforms I know about are: Intel : 80 bit float padded to 128 bits [2] PPC : pair of float64 values [3] Debian IBM s390 : real quadruple precision [4] [5] I see that some Sun machines implement real quadruple precision in software but I haven't run numpy on a Sun machine [6] [1] http://en.wikipedia.org/wiki/Long_double [2] http://en.wikipedia.org/wiki/Extended_precision#x86_Architecture_Extended_Precision_Format [3] http://en.wikipedia.org/wiki/Double-double_%28arithmetic%29#Double-double_arithmetic [4] http://en.wikipedia.org/wiki/Double-double_%28arithmetic%29#IEEE_754_quadruple-precision_binary_floating-point_format:_binary128 [5] https://github.com/nipy/nibabel/issues/76 [6] http://en.wikipedia.org/wiki/Double-double_%28arithmetic%29#Implementations > Based on some reported issues, it seems like there are issues though with > this and its mapping to python long integer... > http://mail.scipy.org/pipermail/numpy-discussion/2011-October/058784.html I tried to summarize the problems I knew about here: http://mail.scipy.org/pipermail/numpy-discussion/2011-November/059087.html There are some routines to deal with some of the problems here: https://github.com/nipy/nibabel/blob/master/nibabel/casting.py After spending some time with the various longdoubles in numpy, I have learned to stare at my code for a long time considering how it might run into the various problems above. 
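A quick way to check which of the layouts above a given platform is using is to ask np.finfo -- a small, untested sketch; the mantissa-bit counts in the comment are what I would expect to see, not guarantees:

import numpy as np

# Bytes of storage versus mantissa bits actually carried by longdouble.
info = np.finfo(np.longdouble)
print("storage: %d bytes" % np.dtype(np.longdouble).itemsize)
print("mantissa bits: %d" % info.nmant)
# Roughly: 52 -> plain float64, 63/64 -> Intel 80-bit extended,
# 105/106 -> PPC double-double, 112 -> real IEEE binary128.
print("eps: %s" % (info.eps,))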
Best, Matthew From njs at pobox.com Wed Feb 29 13:17:53 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 29 Feb 2012 18:17:53 +0000 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330526203-sup-2784@rohan> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> <1330365437-sup-7898@rohan> <1330387831-sup-839@rohan> <1330526203-sup-2784@rohan> Message-ID: On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon wrote: > Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012: >> > Even for binary, there are pathological cases, e.g. 1) reading a random >> > subset of nearly all rows. ?2) reading a single column when rows are >> > small. ?In case 2 you will only go this route in the first place if you >> > need to save memory. ?The user should be aware of these issues. >> >> FWIW, this route actually doesn't save any memory as compared to np.memmap. > > Actually, for numpy.memmap you will read the whole file if you try to > grab a single column and read a large fraction of the rows. ?Here is an > example that will end up pulling the entire file into memory > > ? ?mm=numpy.memmap(fname, dtype=dtype) > ? ?rows=numpy.arange(mm.size) > ? ?x=mm['x'][rows] > > I just tested this on a 3G binary file and I'm sitting at 3G memory > usage. ?I believe this is because numpy.memmap only understands rows. ?I > don't fully understand the reason for that, but I suspect it is related > to the fact that the ndarray really only has a concept of itemsize, and > the fields are really just a reinterpretation of those bytes. ?It may be > that one could tweak the ndarray code to get around this. ?But I would > appreciate enlightenment on this subject. Ahh, that makes sense. But, the tool you are using to measure memory usage is misleading you -- you haven't mentioned what platform you're on, but AFAICT none of them have very good tools for describing memory usage when mmap is in use. (There isn't a very good way to handle it.) What's happening is this: numpy read out just that column from the mmap'ed memory region. The OS saw this and decided to read the entire file, for reasons discussed previously. Then, since it had read the entire file, it decided to keep it around in memory for now, just in case some program wanted it again in the near future. Now, if you instead fetched just those bytes from the file using seek+read or whatever, the OS would treat that request in the exact same way: it'd still read the entire file, and it would still keep the whole thing around in memory. On Linux, you could test this by dropping caches (echo 1 > /proc/sys/vm/drop_caches), checking how much memory is listed as "free" in top, and then using your code to read the same file -- you'll see that the 'free' memory drops by 3 gigabytes, and the 'buffers' or 'cached' numbers will grow by 3 gigabytes. [Note: if you try this experiment, make sure that you don't have the same file opened with np.memmap -- for some reason Linux seems to ignore the request to drop_caches for files that are mmap'ed.] The difference between mmap and reading is that in the former case, then this cache memory will be "counted against" your process's resident set size. The same memory is used either way -- it's just that it gets reported differently by your tool. 
And in fact, this memory is not really "used" at all, in the way we usually mean that term -- it's just a cache that the OS keeps, and it will immediately throw it away if there's a better use for that memory. The only reason it's loading the whole 3 gigabytes into memory in the first place is that you have >3 gigabytes of memory to spare. You might even be able to tell the OS that you *won't* be reading that file again, so there's no point in keeping it all cached -- on Unix this is done via the madavise() or posix_fadvise() syscalls. (No guarantee the OS will actually listen, though.) > This fact was the original motivator for writing my code; the text > reading ability came later. > >> Cool. I'm just a little concerned that, since we seem to have like... >> 5 different implementations of this stuff all being worked on at the >> same time, we need to get some consensus on which features actually >> matter, so they can be melded together into the Single Best File >> Reader Evar. An interface where indexing and file-reading are combined >> is significantly more complicated than one where the core file-reading >> inner-loop can ignore indexing. So far I'm not sure why this >> complexity would be worthwhile, so that's what I'm trying to >> understand. > > I think I've addressed the reason why the low level C code was written. > And I think a unified, high level interface to binary and text files, > which the Recfile class provides, is worthwhile. > > Can you please say more about "...one where the core file-reading > inner-loop can ignore indexing"? ?I didn't catch the meaning. Sure, sorry. What I mean is just, it's easier to write code that only knows how to do a dumb sequential read, and doesn't know how to seek to particular places and pick out just the fields that are being requested. And it's easier to maintain, and optimize, and document, and add features, and so forth. (And we can still have a high-level interface on top of it, if that's useful.) So I'm trying to understand if there's really a compelling advantage that we get by build seeking smarts into our low-level C code, that we can't get otherwise. -- Nathaniel From matthew.brett at gmail.com Wed Feb 29 13:54:03 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 29 Feb 2012 13:54:03 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: <3CB0878C-A0B3-4B51-84AC-8BF874752C13@continuum.io> References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4D7623.5000801@continuum.io> <3CB0878C-A0B3-4B51-84AC-8BF874752C13@continuum.io> Message-ID: Hi, On Wed, Feb 29, 2012 at 1:46 AM, Travis Oliphant wrote: > We already use the NEP process for such decisions. ? This discussion came from simply from the *idea* of writing such a NEP. > > Nothing has been decided. ?Only opinions have been shared that might influence the NEP. ?This is all pretty premature, though --- ?migration to C++ features on a trial branch is some months away were it to happen. Fernando can correct me if I'm wrong, but I think he was asking a governance question. That is: would you (as BDF$N) consider the following guideline: "As a condition for accepting significant changes to Numpy, for each significant change, there will be a NEP. The NEP shall follow the same model as the Python PEPs - that is - there will be a summary of the changes, the issues arising, the for / against opinions and alternatives offered. There will usually be a draft implementation. 
The NEP will contain the resolution of the discussion as it relates to the code" For example, the masked array NEP, although very substantial, contains little discussion of the controversy arising, or the intended resolution of the controversy: https://github.com/numpy/numpy/blob/3f685a1a990f7b6e5149c80b52436fb4207e49f5/doc/neps/missing-data.rst I mean, although it is useful, it is not in the form of a PEP, as Fernando has described it. Would you accept extending the guidelines to the NEP format? Best, Matthew From erin.sheldon at gmail.com Wed Feb 29 13:57:05 2012 From: erin.sheldon at gmail.com (Erin Sheldon) Date: Wed, 29 Feb 2012 13:57:05 -0500 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> <1330365437-sup-7898@rohan> <1330387831-sup-839@rohan> <1330526203-sup-2784@rohan> Message-ID: <1330541308-sup-7553@rohan> Excerpts from Nathaniel Smith's message of Wed Feb 29 13:17:53 -0500 2012: > On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon wrote: > > Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 2012: > >> > Even for binary, there are pathological cases, e.g. 1) reading a random > >> > subset of nearly all rows. ?2) reading a single column when rows are > >> > small. ?In case 2 you will only go this route in the first place if you > >> > need to save memory. ?The user should be aware of these issues. > >> > >> FWIW, this route actually doesn't save any memory as compared to np.memmap. > > > > Actually, for numpy.memmap you will read the whole file if you try to > > grab a single column and read a large fraction of the rows. ?Here is an > > example that will end up pulling the entire file into memory > > > > ? ?mm=numpy.memmap(fname, dtype=dtype) > > ? ?rows=numpy.arange(mm.size) > > ? ?x=mm['x'][rows] > > > > I just tested this on a 3G binary file and I'm sitting at 3G memory > > usage. ?I believe this is because numpy.memmap only understands rows. ?I > > don't fully understand the reason for that, but I suspect it is related > > to the fact that the ndarray really only has a concept of itemsize, and > > the fields are really just a reinterpretation of those bytes. ?It may be > > that one could tweak the ndarray code to get around this. ?But I would > > appreciate enlightenment on this subject. > > Ahh, that makes sense. But, the tool you are using to measure memory > usage is misleading you -- you haven't mentioned what platform you're > on, but AFAICT none of them have very good tools for describing memory > usage when mmap is in use. (There isn't a very good way to handle it.) > > What's happening is this: numpy read out just that column from the > mmap'ed memory region. The OS saw this and decided to read the entire > file, for reasons discussed previously. Then, since it had read the > entire file, it decided to keep it around in memory for now, just in > case some program wanted it again in the near future. > > Now, if you instead fetched just those bytes from the file using > seek+read or whatever, the OS would treat that request in the exact > same way: it'd still read the entire file, and it would still keep the > whole thing around in memory. 
On Linux, you could test this by > dropping caches (echo 1 > /proc/sys/vm/drop_caches), checking how much > memory is listed as "free" in top, and then using your code to read > the same file -- you'll see that the 'free' memory drops by 3 > gigabytes, and the 'buffers' or 'cached' numbers will grow by 3 > gigabytes. > > [Note: if you try this experiment, make sure that you don't have the > same file opened with np.memmap -- for some reason Linux seems to > ignore the request to drop_caches for files that are mmap'ed.] > > The difference between mmap and reading is that in the former case, > then this cache memory will be "counted against" your process's > resident set size. The same memory is used either way -- it's just > that it gets reported differently by your tool. And in fact, this > memory is not really "used" at all, in the way we usually mean that > term -- it's just a cache that the OS keeps, and it will immediately > throw it away if there's a better use for that memory. The only reason > it's loading the whole 3 gigabytes into memory in the first place is > that you have >3 gigabytes of memory to spare. > > You might even be able to tell the OS that you *won't* be reading that > file again, so there's no point in keeping it all cached -- on Unix > this is done via the madavise() or posix_fadvise() syscalls. (No > guarantee the OS will actually listen, though.) This is interesting, and on my machine I think I've verified that what you say is actually true. This all makes theoretical sense, but goes against some experiments I and my colleagues have done. For example, a colleague of mine was able to read a couple of large files in using my code but not using memmap. The combined files were greater than memory size. With memmap the code started swapping. This was on 32-bit OSX. But as I said, I just tested this on my linux box and it works fine with numpy.memmap. I don't have an OSX box to test this. So if what you say holds up on non-linux systems, it is in fact an indicator that the section of my code dealing with binary could be dropped; that bit was trivial anyway. -e -- Erin Scott Sheldon Brookhaven National Laboratory From ndbecker2 at gmail.com Wed Feb 29 14:20:57 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 29 Feb 2012 14:20:57 -0500 Subject: [Numpy-discussion] Proposed Roadmap Overview References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: Charles R Harris wrote: > On Tue, Feb 28, 2012 at 12:05 PM, John Hunter wrote: > >> On Sat, Feb 18, 2012 at 5:09 PM, David Cournapeau wrote: >> >>> >>> There are better languages than C++ that has most of the technical >>> >>> benefits stated in this discussion (rust and D being the most >>> "obvious" ones), but whose usage is unrealistic today for various >>> reasons: knowledge, availability on "esoteric" platforms, etc? A new >>> language is completely ridiculous. >>> >> >> >> I just saw this for the first time today: Linus Torvalds on C++ ( >> http://harmful.cat-v.org/software/c++/linus). The post is from 2007 so >> many of you may have seen it, but I thought it was entertainng enough and >> on-topic enough with this thread that I'd share it in case you haven't. 
>> >> >> The point he makes: >> >> In other words, the only way to do good, efficient, and system-level and >> portable C++ ends up to limit yourself to all the things that >> are basically >> available in C >> >> was interesting to me because the best C++ library I have ever worked with >> (agg) imports *nothing* except standard C libs (no standard template >> library). In fact, the only includes external to external to itself >> are math.h, stdlib.h, stdio.h, and string.h. >> >> To shoehorn Jamie Zawinski's famous regex quote ( >> http://regex.info/blog/2006-09-15/247). "Some people, when confronted >> with a problem, think ?I know, I'll use boost.? Now they have two >> problems." >> >> Here is the Linus post: >> >> From: Linus Torvalds linux-foundation.org> >> Subject: Re: [RFC] Convert builin-mailinfo.c to use The Better String >> Library. >> Newsgroups: gmane.comp.version-control.git >> Date: 2007-09-06 17:50:28 GMT (2 years, 14 weeks, 16 hours and 36 minutes >> ago) >> >> On Wed, 5 Sep 2007, Dmitry Kakurin wrote: >> > >> > When I first looked at Git source code two things struck me as odd: >> > 1. Pure C as opposed to C++. No idea why. Please don't talk about >> portability, >> > it's BS. >> >> *YOU* are full of bullshit. >> >> C++ is a horrible language. It's made more horrible by the fact that a lot >> of substandard programmers use it, to the point where it's much much >> easier to generate total and utter crap with it. Quite frankly, even if >> the choice of C were to do *nothing* but keep the C++ programmers out, >> that in itself would be a huge reason to use C. >> >> In other words: the choice of C is the only sane choice. I know Miles >> Bader jokingly said "to piss you off", but it's actually true. I've come >> to the conclusion that any programmer that would prefer the project to be >> in C++ over C is likely a programmer that I really *would* prefer to piss >> off, so that he doesn't come and screw up any project I'm involved with. >> >> C++ leads to really really bad design choices. You invariably start using >> the "nice" library features of the language like STL and Boost and other >> total and utter crap, that may "help" you program, but causes: >> >> - infinite amounts of pain when they don't work (and anybody who tells me >> that STL and especially Boost are stable and portable is just so full >> of BS that it's not even funny) >> >> - inefficient abstracted programming models where two years down the road >> you notice that some abstraction wasn't very efficient, but now all >> your code depends on all the nice object models around it, and you >> cannot fix it without rewriting your app. >> >> In other words, the only way to do good, efficient, and system-level and >> portable C++ ends up to limit yourself to all the things that are >> basically available in C. And limiting your project to C means that people >> don't screw that up, and also means that you get a lot of programmers that >> do actually understand low-level issues and don't screw things up with any >> idiotic "object model" crap. >> >> So I'm sorry, but for something like git, where efficiency was a primary >> objective, the "advantages" of C++ is just a huge mistake. The fact that >> we also piss off people who cannot see that is just a big additional >> advantage. >> >> If you want a VCS that is written in C++, go play with Monotone. Really. >> They use a "real database". They use "nice object-oriented libraries". >> They use "nice C++ abstractions". 
And quite frankly, as a result of all >> these design decisions that sound so appealing to some CS people, the end >> result is a horrible and unmaintainable mess. >> >> But I'm sure you'd like it more than git. >> >> > Yeah, Linus doesn't like C++. No doubt that is in part because of the > attempt to rewrite Linux in C++ back in the early 90's and the resulting > compiler and portability problems. Linus also writes C like it was his > native tongue, he likes to work close to the metal, and he'd probably > prefer it over Python for most problems ;) Things have improved in the > compiler department, and I think C++ really wasn't much of an improvement > over C until templates and the STL came along. The boost smart pointers are > also really nice. OTOH, it is really easy to write awful C++ because of the > way inheritance and the other features were over-hyped and the 'everything > and the kitchen sink' way it developed. Like any tool, familiarity and > skill are essential to good results, but unlike some tools, one also needs > to forgo some of the features to keep it under control. It's not a hammer, > it is a three inch wide Swiss Army Knife. > > Chuck Much of Linus's complaints have to do with the use of c++ in the _kernel_. These objections are quite different for an _application_. For example, there are issues with the need for support libraries for exception handling. Not an issue for an application. From jdh2358 at gmail.com Wed Feb 29 14:25:26 2012 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 29 Feb 2012 13:25:26 -0600 Subject: [Numpy-discussion] Proposed Roadmap Overview In-Reply-To: References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3E93CA.8020703@hawaii.edu> <1CEB1B98-4DE0-4878-AC28-2D35B50593E2@molden.no> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> Message-ID: On Wed, Feb 29, 2012 at 1:20 PM, Neal Becker wrote: > > Much of Linus's complaints have to do with the use of c++ in the _kernel_. > These objections are quite different for an _application_. For example, > there > are issues with the need for support libraries for exception handling. > Not an > issue for an application. > > Actually, the thread was on the git mailing list, and many of his complaints were addressing the appropriateness of C++ for git development. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Feb 29 14:39:18 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 29 Feb 2012 20:39:18 +0100 Subject: [Numpy-discussion] Possible roadmap addendum: building better text file readers In-Reply-To: <1330541308-sup-7553@rohan> References: <1330092347-sup-3918@rohan> <1330207186-sup-1957@rohan> <1330351883-sup-9943@rohan> <1330365437-sup-7898@rohan> <1330387831-sup-839@rohan> <1330526203-sup-2784@rohan> <1330541308-sup-7553@rohan> Message-ID: On Wed, Feb 29, 2012 at 7:57 PM, Erin Sheldon wrote: > Excerpts from Nathaniel Smith's message of Wed Feb 29 13:17:53 -0500 2012: > > On Wed, Feb 29, 2012 at 3:11 PM, Erin Sheldon > wrote: > > > Excerpts from Nathaniel Smith's message of Tue Feb 28 17:22:16 -0500 > 2012: > > >> > Even for binary, there are pathological cases, e.g. 1) reading a > random > > >> > subset of nearly all rows. 2) reading a single column when rows are > > >> > small. In case 2 you will only go this route in the first place if > you > > >> > need to save memory. The user should be aware of these issues. 
> > >> > > >> FWIW, this route actually doesn't save any memory as compared to > np.memmap. > > > > > > Actually, for numpy.memmap you will read the whole file if you try to > > > grab a single column and read a large fraction of the rows. Here is an > > > example that will end up pulling the entire file into memory > > > > > > mm=numpy.memmap(fname, dtype=dtype) > > > rows=numpy.arange(mm.size) > > > x=mm['x'][rows] > > > > > > I just tested this on a 3G binary file and I'm sitting at 3G memory > > > usage. I believe this is because numpy.memmap only understands rows. > I > > > don't fully understand the reason for that, but I suspect it is related > > > to the fact that the ndarray really only has a concept of itemsize, and > > > the fields are really just a reinterpretation of those bytes. It may > be > > > that one could tweak the ndarray code to get around this. But I would > > > appreciate enlightenment on this subject. > > > > Ahh, that makes sense. But, the tool you are using to measure memory > > usage is misleading you -- you haven't mentioned what platform you're > > on, but AFAICT none of them have very good tools for describing memory > > usage when mmap is in use. (There isn't a very good way to handle it.) > > > > What's happening is this: numpy read out just that column from the > > mmap'ed memory region. The OS saw this and decided to read the entire > > file, for reasons discussed previously. Then, since it had read the > > entire file, it decided to keep it around in memory for now, just in > > case some program wanted it again in the near future. > > > > Now, if you instead fetched just those bytes from the file using > > seek+read or whatever, the OS would treat that request in the exact > > same way: it'd still read the entire file, and it would still keep the > > whole thing around in memory. On Linux, you could test this by > > dropping caches (echo 1 > /proc/sys/vm/drop_caches), checking how much > > memory is listed as "free" in top, and then using your code to read > > the same file -- you'll see that the 'free' memory drops by 3 > > gigabytes, and the 'buffers' or 'cached' numbers will grow by 3 > > gigabytes. > > > > [Note: if you try this experiment, make sure that you don't have the > > same file opened with np.memmap -- for some reason Linux seems to > > ignore the request to drop_caches for files that are mmap'ed.] > > > > The difference between mmap and reading is that in the former case, > > then this cache memory will be "counted against" your process's > > resident set size. The same memory is used either way -- it's just > > that it gets reported differently by your tool. And in fact, this > > memory is not really "used" at all, in the way we usually mean that > > term -- it's just a cache that the OS keeps, and it will immediately > > throw it away if there's a better use for that memory. The only reason > > it's loading the whole 3 gigabytes into memory in the first place is > > that you have >3 gigabytes of memory to spare. > > > > You might even be able to tell the OS that you *won't* be reading that > > file again, so there's no point in keeping it all cached -- on Unix > > this is done via the madavise() or posix_fadvise() syscalls. (No > > guarantee the OS will actually listen, though.) > > This is interesting, and on my machine I think I've verified that what > you say is actually true. > > This all makes theoretical sense, but goes against some experiments I > and my colleagues have done. 
For example, a colleague of mine was able > to read a couple of large files in using my code but not using memmap. > The combined files were greater than memory size. With memmap the code > started swapping. This was on 32-bit OSX. But as I said, I just tested > this on my linux box and it works fine with numpy.memmap. I don't have > an OSX box to test this. > I've seen this on OS X too. Here's another example on Linux: http://thread.gmane.org/gmane.comp.python.numeric.general/43965. Using tcmalloc was reported by a couple of people to solve that particular issue. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Wed Feb 29 14:39:56 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 29 Feb 2012 14:39:56 -0500 Subject: [Numpy-discussion] [Numpy] quadruple precision In-Reply-To: References: Message-ID: On Wed, Feb 29, 2012 at 10:22 AM, Paweł Biernat wrote: > I am completely new to Numpy and I know only the basics of Python, to > this point I was using Fortran 03/08 to write numerical code. However, > I am starting a new large project of mine and I am looking forward to > using Python to call some low level Fortran code responsible for most > of the intensive number crunching. In this context I stumbled into > f2py and it looks just like what I need, but before I start writing an > app in mixture of Python and Fortran I have a question about numerical > precision of variables used in numpy and f2py. > > Is there any way to interact with Fortran's real(16) (supported by gcc > and Intel's ifort) data type from numpy? By real(16) I mean the > binary128 type as in IEEE 754. (In C this data type is experimentally > supported as __float128 (gcc) and _Quad (Intel's icc).) I have > investigated the float128 data type, but it seems to work as binary64 > or binary80 depending on the architecture. If there is currently no > way to interact with binary128, how hard would it be to patch the > sources of numpy to add such data type? I am interested only in > basic stuff, comparable in functionality to libmath. > > As said before, I have little knowledge of Python, Numpy and f2py, I > am however, interested in investing some time in learing it and > implementing the mentioned features, but only if there is any hope of > succeeding. Numpy does not have proper support for quadruple precision float numbers, because very few implementations do (no common CPU handles it in hardware, for example). The float128 dtype is a bit confusingly named: the 128 refers to the padding in memory, but not its "real" precision. It often (but not always) refers to the long double in the underlying C implementation. The latter depends on the OS, CPU and compilers. cheers, David From pierre.haessig at crans.org Wed Feb 29 14:52:07 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Wed, 29 Feb 2012 20:52:07 +0100 Subject: [Numpy-discussion] [Numpy] quadruple precision In-Reply-To: References: Message-ID: <4F4E81E7.3040107@crans.org> Hi, Le 29/02/2012 16:22, Paweł Biernat a écrit : > Is there any way to interact with Fortran's real(16) (supported by gcc > and Intel's ifort) data type from numpy? By real(16) I mean the > binary128 type as in IEEE 754. (In C this data type is experimentally > supported as __float128 (gcc) and _Quad (Intel's icc).) I googled this "__float128" a bit. It seems a fairly new addition (GCC 4.6, released March 2011).
The related point in the changelog [1] is : "GCC now ships with the LGPL-licensed libquadmath library, which provides quad-precision mathematical functions for targets with a __float128 datatype. __float128 is available for targets on 32-bit x86, x86-64 and Itanium architectures. The libquadmath library is automatically built on such targets when building the Fortran compiler." It seems this __float128 is a newcomer in the "picture of data types" that Matthew just mentioned. As David says, arithmetic with such a 128-bit data type is probably not "hardwired" in most processors (I mean Intel & friends), which are limited to 80 bits ("long doubles"), so it may be a bit slow. However, this GCC implementation with libquadmath seems to create some level of abstraction. Maybe this is one acceptably good way for a real "IEEE float 128" dtype in numpy ? Best, Pierre [1] http://gcc.gnu.org/gcc-4.6/changes.html -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From francesc at continuum.io Wed Feb 29 15:09:10 2012 From: francesc at continuum.io (Francesc Alted) Date: Wed, 29 Feb 2012 12:09:10 -0800 Subject: [Numpy-discussion] [Numpy] quadruple precision In-Reply-To: <4F4E81E7.3040107@crans.org> References: <4F4E81E7.3040107@crans.org> Message-ID: <2078A981-04C4-4A4A-9BCA-B363AEC1D23F@continuum.io> On Feb 29, 2012, at 11:52 AM, Pierre Haessig wrote: > Hi, > > Le 29/02/2012 16:22, Paweł Biernat a écrit : >> Is there any way to interact with Fortran's real(16) (supported by gcc >> and Intel's ifort) data type from numpy? By real(16) I mean the >> binary128 type as in IEEE 754. (In C this data type is experimentally >> supported as __float128 (gcc) and _Quad (Intel's icc).) > I googled a bit this "__float128". It seems a fairly new addition (GCC > 4.6, released March 2011). > The related point in the changelog [1] is : > > "GCC now ships with the LGPL-licensed libquadmath library, which > provides quad-precision mathematical functions for targets with a > __float128 datatype. __float128 is available for targets on 32-bit x86, > x86-64 and Itanium architectures. The libquadmath library is > automatically built on such targets when building the Fortran compiler." Great find! > It seems this __float128 is newcomer in the "picture of data types" that > Matthew just mentioned. > As David says, arithmetic with such a 128 bits data type is probably not > "hardwired" in most processors (I mean Intel & friends) which are > limited to 80 bits ("long doubles") so it may be a bit slow. However, > this GCC implementation with libquadmath seems to create some level of > abstraction. Maybe this is one acceptably good way for a real "IEEE > float 128" dtype in numpy ? That would be really nice. The problem here is two-fold: * Backwards-compatibility. float128 should represent a different data-type than before, so we probably should find a new name (and charcode!) for quad-precision. Maybe quad128? * Compiler-dependency. The new type will only be available on platforms that have GCC 4.6 or above. Again, using the new name for this should be fine. On platforms/compilers not supporting the quad128 thing, it should not be defined. Uh, I foresee many portability problems for people using this, but perhaps it is worth the mess.
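To make the backwards-compatibility point concrete, a small sketch, assuming an x86-64 build where the long double is padded out to 16 bytes so that the float128 name and the 'g' charcode are already taken:

import numpy as np

# Today's "float128" is just the C long double in 16 bytes of storage,
# so the name and charcode a true quad type would want are already in use.
print(np.dtype(np.longdouble).char)                     # 'g'
print(np.dtype('float128') == np.dtype(np.longdouble))  # True on such builds
print(np.finfo(np.longdouble).precision)                # ~18 decimal digits, not the ~33 of binary128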
-- Francesc Alted From mattm184 at gmail.com Wed Feb 29 15:10:39 2012 From: mattm184 at gmail.com (Matt Miller) Date: Wed, 29 Feb 2012 12:10:39 -0800 Subject: [Numpy-discussion] Cygwin compile: fatal error: fenv/fenv.c: Message-ID: Hi all, I am getting the following error when running `python setup.py install` for Numpy in Cygwin. This error happens on the latest as well as the maintenance branched for 1.5 and 1.6. ... creating build/temp.cygwin-1.7.11-i686-2.6 creating build/temp.cygwin-1.7.11-i686-2.6/build creating build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6 creating build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy creating build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core creating build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src creating build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath creating build/temp.cygwin-1.7.11-i686-2.6/numpy creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath compile options: '-Inumpy/core/include -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.6 -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/multiarray -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/umath -c' gcc: build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/npy_math.c gcc: build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.c numpy/core/src/npymath/ieee754.c.src:590:25: fatal error: fenv/fenv.c: No such file or directory compilation terminated. numpy/core/src/npymath/ieee754.c.src:590:25: fatal error: fenv/fenv.c: No such file or directory compilation terminated. error: Command "gcc -fno-strict-aliasing -g -O2 -pipe -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Inumpy/core/include -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.6 -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/multiarray -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/umath -c build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.c -o build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.o" failed with exit status 1 Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwl_b at wp.pl Wed Feb 29 15:17:33 2012 From: pwl_b at wp.pl (=?utf-8?b?UGF3ZcWC?= Biernat) Date: Wed, 29 Feb 2012 20:17:33 +0000 (UTC) Subject: [Numpy-discussion] [Numpy] quadruple precision References: <4F4E81E7.3040107@crans.org> Message-ID: Pierre Haessig crans.org> writes: > > Hi, > > Le 29/02/2012 16:22, Pawe? Biernat a ?crit : > > Is there any way to interact with Fortran's real(16) (supported by gcc > > and Intel's ifort) data type from numpy? By real(16) I mean the > > binary128 type as in IEEE 754. (In C this data type is experimentally > > supported as __float128 (gcc) and _Quad (Intel's icc).) > I googled a bit this "__float128". It seems a fairly new addition (GCC > 4.6, released March 2011). 
> The related point in the changelog [1] is : > > "GCC now ships with the LGPL-licensed libquadmath library, which > provides quad-precision mathematical functions for targets with a > __float128 datatype. __float128 is available for targets on 32-bit x86, > x86-64 and Itanium architectures. The libquadmath library is > automatically built on such targets when building the Fortran compiler." > > It seems this __float128 is newcomer in the "picture of data types" that > Matthew just mentioned. > As David says, arithmetic with such a 128 bits data type is probably not > "hardwired" in most processors (I mean Intel & friends) which are > limited to 80 bits ("long doubles") so it may be a bit slow. However, > this GCC implementation with libquadmath seems to create some level of > abstraction. Maybe this is one acceptably good way for a real "IEEE > float 128" dtype in numpy ? > > Best, > Pierre > > [1] http://gcc.gnu.org/gcc-4.6/changes.html > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Intel also has its own implementation of binary128, although not well documented (as you said, it's software emulated, but still quite fast): http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/lin/compiler_f/fpops/fortran/fpops_flur_f.htm The documentation is for Fortran's real(16), but I belive the same holds for _Quad type in C. My naive question is, is there a way to recompile numpy with "long double" (or just "float128") replaced with "_Quad" or "__float128"? There are at least two compilers that support the respective data types, so this should be doable. I tested interoperability of binary128 with Fortran and C (using gcc and Intel's compilers) and it works like a charm. The only problem that comes to my mind is i/o, because there is no printf format for _Quad or __float128 and fortran routines have to be used to do all i/o. Pawe? From charlesr.harris at gmail.com Wed Feb 29 15:19:57 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 29 Feb 2012 13:19:57 -0700 Subject: [Numpy-discussion] [Numpy] quadruple precision In-Reply-To: <2078A981-04C4-4A4A-9BCA-B363AEC1D23F@continuum.io> References: <4F4E81E7.3040107@crans.org> <2078A981-04C4-4A4A-9BCA-B363AEC1D23F@continuum.io> Message-ID: On Wed, Feb 29, 2012 at 1:09 PM, Francesc Alted wrote: > On Feb 29, 2012, at 11:52 AM, Pierre Haessig wrote: > > > Hi, > > > > Le 29/02/2012 16:22, Pawe? Biernat a ?crit : > >> Is there any way to interact with Fortran's real(16) (supported by gcc > >> and Intel's ifort) data type from numpy? By real(16) I mean the > >> binary128 type as in IEEE 754. (In C this data type is experimentally > >> supported as __float128 (gcc) and _Quad (Intel's icc).) > > I googled a bit this "__float128". It seems a fairly new addition (GCC > > 4.6, released March 2011). > > The related point in the changelog [1] is : > > > > "GCC now ships with the LGPL-licensed libquadmath library, which > > provides quad-precision mathematical functions for targets with a > > __float128 datatype. __float128 is available for targets on 32-bit x86, > > x86-64 and Itanium architectures. The libquadmath library is > > automatically built on such targets when building the Fortran compiler." > > Great find! > > > It seems this __float128 is newcomer in the "picture of data types" that > > Matthew just mentioned. 
> > As David says, arithmetic with such a 128 bits data type is probably not > > "hardwired" in most processors (I mean Intel & friends) which are > > limited to 80 bits ("long doubles") so it may be a bit slow. However, > > this GCC implementation with libquadmath seems to create some level of > > abstraction. Maybe this is one acceptably good way for a real "IEEE > > float 128" dtype in numpy ? > > That would be really nice. The problem here is two-folded: > > * Backwards-compatibility. float128 should represent a different > data-type than before, so we probably should find a new name (and > charcode!) for quad-precision. Maybe quad128? > > * Compiler-dependency. The new type will be only available on platforms > that has GCC 4.6 or above. Again, using the new name for this should be > fine. On platforms/compilers not supporting the quad128 thing, it should > not be defined. > > Uh, I foresee many portability problems for people using this, but perhaps > it is worth the mess. > > The quad precision library has been there for a while, and quad precision is also supported by the Intel compiler. I don't know about MSVC. Intel has been working on adding quad precision to their hardware for several years and there is an IEEE spec for it, so some day it will be here, but it isn't here yet. It's a bit sad, I could use quad precision in FORTRAN on a VAX 25 years ago. Mind, I only needed it once ;) I suppose lack of pressing need accounts for the delay. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Feb 29 15:32:21 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 29 Feb 2012 21:32:21 +0100 Subject: [Numpy-discussion] Cygwin compile: fatal error: fenv/fenv.c: In-Reply-To: References: Message-ID: On Wed, Feb 29, 2012 at 9:10 PM, Matt Miller wrote: > Hi all, > > I am getting the following error when running `python setup.py install` > for Numpy in Cygwin. This error happens on the latest as well as > the maintenance branched for 1.5 and 1.6. > This should fix it: http://projects.scipy.org/numpy/ticket/1944. Can you confirm that that works? Then I'll make the change in master. Ralf > ... 
> creating build/temp.cygwin-1.7.11-i686-2.6 > creating build/temp.cygwin-1.7.11-i686-2.6/build > creating build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6 > creating > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy > creating > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core > creating > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src > creating > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath > creating build/temp.cygwin-1.7.11-i686-2.6/numpy > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath > compile options: '-Inumpy/core/include > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath > -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.6 > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/multiarray > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/umath -c' > gcc: build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/npy_math.c > gcc: build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.c > numpy/core/src/npymath/ieee754.c.src:590:25: fatal error: fenv/fenv.c: No > such file or directory > compilation terminated. > numpy/core/src/npymath/ieee754.c.src:590:25: fatal error: fenv/fenv.c: No > such file or directory > compilation terminated. > error: Command "gcc -fno-strict-aliasing -g -O2 -pipe -DNDEBUG -g -fwrapv > -O3 -Wall -Wstrict-prototypes -Inumpy/core/include > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath > -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.6 > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/multiarray > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/umath -c > build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.c -o > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.o" > failed with exit status 1 > > > Thanks > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mattm184 at gmail.com Wed Feb 29 16:41:43 2012 From: mattm184 at gmail.com (Matt Miller) Date: Wed, 29 Feb 2012 13:41:43 -0800 Subject: [Numpy-discussion] Cygwin compile: fatal error: fenv/fenv.c Message-ID: That fixed changed my error message to this: numpy/core/src/private/lowlevel_strided_loops.h:404:1: warning: ?PyArray_PrepareThreeRawArrayIter? declared ?static? but never defined numpy/core/src/private/lowlevel_strided_loops.h:430:1: warning: ?PyArray_PrepareFourRawArrayIter? declared ?static? 
but never defined gcc -shared -Wl,--enable-auto-image-base build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/umath/umathmodule_onefile.o -L/usr/lib/python2.6/config -Lbuild/temp.cygwin-1.7.11-i686-2.6 -lnpymath -lpython2.6 -o build/lib.cygwin-1.7.11-i686-2.6/numpy/core/umath.dll build/temp.cygwin-1.7.11-i686-2.6/libnpymath.a(ieee754.o):ieee754.c:(.rdata+0x0): multiple definition of `_npy__fe_dfl_env' build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/umath/umathmodule_onefile.o:umathmodule_onefile.c:(.rdata+0x13158): first defined here collect2: ld returned 1 exit status build/temp.cygwin-1.7.11-i686-2.6/libnpymath.a(ieee754.o):ieee754.c:(.rdata+0x0): multiple definition of `_npy__fe_dfl_env' build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/umath/umathmodule_onefile.o:umathmodule_onefile.c:(.rdata+0x13158): first defined here collect2: ld returned 1 exit status error: Command "gcc -shared -Wl,--enable-auto-image-base build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/umath/umathmodule_onefile.o -L/usr/lib/python2.6/config -Lbuild/temp.cygwin-1.7.11-i686-2.6 -lnpymath -lpython2.6 -o build/lib.cygwin-1.7.11-i686-2.6/numpy/core/umath.dll" failed with exit status 1 Thanks for the quick reply! > > > Hi all, > > > > I am getting the following error when running `python setup.py install` > > for Numpy in Cygwin. This error happens on the latest as well as > > the maintenance branched for 1.5 and 1.6. > > > > This should fix it: http://projects.scipy.org/numpy/ticket/1944. > > Can you confirm that that works? Then I'll make the change in master. > > Ralf > > > > ... > > creating build/temp.cygwin-1.7.11-i686-2.6 > > creating build/temp.cygwin-1.7.11-i686-2.6/build > > creating > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6 > > creating > > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy > > creating > > > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core > > creating > > > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src > > creating > > > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath > > creating build/temp.cygwin-1.7.11-i686-2.6/numpy > > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core > > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src > > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath > > compile options: '-Inumpy/core/include > > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/include/numpy > > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > -Inumpy/core/src/umath > > -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.6 > > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/multiarray > > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/umath -c' > > gcc: build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/npy_math.c > > gcc: build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.c > > numpy/core/src/npymath/ieee754.c.src:590:25: fatal error: fenv/fenv.c: No > > such file or directory > > compilation terminated. > > numpy/core/src/npymath/ieee754.c.src:590:25: fatal error: fenv/fenv.c: No > > such file or directory > > compilation terminated. 
> > error: Command "gcc -fno-strict-aliasing -g -O2 -pipe -DNDEBUG -g -fwrapv
> > -O3 -Wall -Wstrict-prototypes -Inumpy/core/include
> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/include/numpy
> > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core
> > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray
> -Inumpy/core/src/umath
> > -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.6
> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/multiarray
> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/umath -c
> > build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.c -o
> > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.o"
> > failed with exit status 1
> >
> >
> > Thanks
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >

From mattm184 at gmail.com Wed Feb 29 17:07:58 2012
From: mattm184 at gmail.com (Matt Miller)
Date: Wed, 29 Feb 2012 14:07:58 -0800
Subject: [Numpy-discussion] Cygwin compile: fatal error: fenv/fenv.c
In-Reply-To:
References:
Message-ID:

More reading of the linked thread solved the issue. To reiterate, add
numpy/ and change .c to .h in line 590 of ieee754.c.src.

Ex:

#elif defined(__CYGWIN__)
#include "numpy/fenv/fenv.h"
#endif

Thanks,

On Wed, Feb 29, 2012 at 1:41 PM, Matt Miller wrote:

> That fix changed my error message to this:
>
> numpy/core/src/private/lowlevel_strided_loops.h:404:1: warning:
> ‘PyArray_PrepareThreeRawArrayIter’ declared ‘static’ but never defined
> numpy/core/src/private/lowlevel_strided_loops.h:430:1: warning:
> ‘PyArray_PrepareFourRawArrayIter’ declared ‘static’ but never defined
> gcc -shared -Wl,--enable-auto-image-base
> build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/umath/umathmodule_onefile.o
> -L/usr/lib/python2.6/config -Lbuild/temp.cygwin-1.7.11-i686-2.6 -lnpymath
> -lpython2.6 -o build/lib.cygwin-1.7.11-i686-2.6/numpy/core/umath.dll
> build/temp.cygwin-1.7.11-i686-2.6/libnpymath.a(ieee754.o):ieee754.c:(.rdata+0x0):
> multiple definition of `_npy__fe_dfl_env'
> build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/umath/umathmodule_onefile.o:umathmodule_onefile.c:(.rdata+0x13158):
> first defined here
> collect2: ld returned 1 exit status
> build/temp.cygwin-1.7.11-i686-2.6/libnpymath.a(ieee754.o):ieee754.c:(.rdata+0x0):
> multiple definition of `_npy__fe_dfl_env'
> build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/umath/umathmodule_onefile.o:umathmodule_onefile.c:(.rdata+0x13158):
> first defined here
> collect2: ld returned 1 exit status
> error: Command "gcc -shared -Wl,--enable-auto-image-base
> build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/umath/umathmodule_onefile.o
> -L/usr/lib/python2.6/config -Lbuild/temp.cygwin-1.7.11-i686-2.6 -lnpymath
> -lpython2.6 -o build/lib.cygwin-1.7.11-i686-2.6/numpy/core/umath.dll"
> failed with exit status 1
>
>
> Thanks for the quick reply!
>
>
>>
>> > Hi all,
>> >
>> > I am getting the following error when running `python setup.py install`
>> > for Numpy in Cygwin. This error happens on the latest as well as
>> > the maintenance branches for 1.5 and 1.6.
>> >
>>
>> This should fix it: http://projects.scipy.org/numpy/ticket/1944.
>>
>> Can you confirm that that works? Then I'll make the change in master.
>>
>> Ralf
>>
>>
>> > ...
>> > creating build/temp.cygwin-1.7.11-i686-2.6
>> > creating build/temp.cygwin-1.7.11-i686-2.6/build
>> > creating
>> build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6
>> > creating
>> > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy
>> > creating
>> >
>> build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core
>> > creating
>> >
>> build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src
>> > creating
>> >
>> build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath
>> > creating build/temp.cygwin-1.7.11-i686-2.6/numpy
>> > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core
>> > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src
>> > creating build/temp.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath
>> > compile options: '-Inumpy/core/include
>> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/include/numpy
>> > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core
>> > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray
>> -Inumpy/core/src/umath
>> > -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.6
>> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/multiarray
>> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/umath -c'
>> > gcc: build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/npy_math.c
>> > gcc: build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.c
>> > numpy/core/src/npymath/ieee754.c.src:590:25: fatal error: fenv/fenv.c:
>> No
>> > such file or directory
>> > compilation terminated.
>> > numpy/core/src/npymath/ieee754.c.src:590:25: fatal error: fenv/fenv.c:
>> No
>> > such file or directory
>> > compilation terminated.
>> > error: Command "gcc -fno-strict-aliasing -g -O2 -pipe -DNDEBUG -g
>> -fwrapv
>> > -O3 -Wall -Wstrict-prototypes -Inumpy/core/include
>> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/include/numpy
>> > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core
>> > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray
>> -Inumpy/core/src/umath
>> > -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.6
>> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/multiarray
>> > -Ibuild/src.cygwin-1.7.11-i686-2.6/numpy/core/src/umath -c
>> > build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.c -o
>> > build/temp.cygwin-1.7.11-i686-2.6/build/src.cygwin-1.7.11-i686-2.6/numpy/core/src/npymath/ieee754.o"
>> > failed with exit status 1
>> >
>> >
>> > Thanks
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> >
>> >
>>
>>

From pav at iki.fi Wed Feb 29 17:26:00 2012
From: pav at iki.fi (Pauli Virtanen)
Date: Wed, 29 Feb 2012 23:26:00 +0100
Subject: [Numpy-discussion] YouTrack License
In-Reply-To:
References:
Message-ID:

28.02.2012 22:11, Ralf Gommers wrote:
[clip]
> How about just putting it in a new github repo so everyone can see/review
> the mapping between Trac and YouTrack fields?
>
> We should probably create a basic export from YouTrack script at the
> same time, to make sure there's no lock-in.

Keeping the conversion script public sounds good. We could perhaps also
consider applying for hosting for the Scipy tracker, too.
--
Pauli Virtanen


From ralf.gommers at googlemail.com Wed Feb 29 17:41:45 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Wed, 29 Feb 2012 23:41:45 +0100
Subject: [Numpy-discussion] Cygwin compile: fatal error: fenv/fenv.c
In-Reply-To:
References:
Message-ID:

On Wed, Feb 29, 2012 at 11:07 PM, Matt Miller wrote:

> More reading of the linked thread solved the issue. To reiterate, add
> numpy/ and change .c to .h in line 590 of ieee754.c.src.
>
> Ex:
>
> #elif defined(__CYGWIN__)
> #include "numpy/fenv/fenv.h"
> #endif
>

Thanks for confirming. Fixed in master and 1.6.x now.

Ralf

From travis at continuum.io Wed Feb 29 22:02:01 2012
From: travis at continuum.io (Travis Oliphant)
Date: Wed, 29 Feb 2012 19:02:01 -0800
Subject: [Numpy-discussion] Proposed Roadmap Overview
In-Reply-To:
References: <107C83AF-7566-4224-852A-DCF5F1A35511@continuum.io> <4F3F2297.7010703@creativetrax.com> <4F402B46.4020103@molden.no> <4F4D7623.5000801@continuum.io> <3CB0878C-A0B3-4B51-84AC-8BF874752C13@continuum.io>
Message-ID: <5592C463-6D95-4723-8E68-FA5E10920B90@continuum.io>

I would like to hear the opinions of others on that point, but yes, I think that is an appropriate procedure.

Travis

--
Travis Oliphant (on a mobile)
512-826-7480

On Feb 29, 2012, at 10:54 AM, Matthew Brett wrote:

> Hi,
>
> On Wed, Feb 29, 2012 at 1:46 AM, Travis Oliphant wrote:
>> We already use the NEP process for such decisions. This discussion came simply from the *idea* of writing such a NEP.
>>
>> Nothing has been decided. Only opinions have been shared that might influence the NEP. This is all pretty premature, though --- migration to C++ features on a trial branch is some months away were it to happen.
>
> Fernando can correct me if I'm wrong, but I think he was asking a
> governance question. That is: would you (as BDF$N) consider the
> following guideline:
>
> "As a condition for accepting significant changes to Numpy, for each
> significant change, there will be a NEP. The NEP shall follow the
> same model as the Python PEPs - that is - there will be a summary of
> the changes, the issues arising, the for / against opinions and
> alternatives offered. There will usually be a draft implementation.
> The NEP will contain the resolution of the discussion as it relates to
> the code"
>
> For example, the masked array NEP, although very substantial, contains
> little discussion of the controversy arising, or the intended
> resolution of the controversy:
>
> https://github.com/numpy/numpy/blob/3f685a1a990f7b6e5149c80b52436fb4207e49f5/doc/neps/missing-data.rst
>
> I mean, although it is useful, it is not in the form of a PEP, as
> Fernando has described it.
>
> Would you accept extending the guidelines to the NEP format?
>
> Best,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion