From chris at simplistix.co.uk  Fri Jun  1 05:46:07 2012
From: chris at simplistix.co.uk (Chris Withers)
Date: Fri, 01 Jun 2012 10:46:07 +0100
Subject: [Numpy-discussion] better error message possible?
Message-ID: <4FC88F5F.3060303@simplistix.co.uk>

Hi All,

Any reason why this:

 >>> import numpy
 >>> numpy.zeros(10)[-123]
Traceback (most recent call last):
  File "", line 1, in
IndexError: index out of bounds

...could say this:

 >>> numpy.zeros(10)[-123]
Traceback (most recent call last):
  File "", line 1, in
IndexError: -123 is out of bounds

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
            - http://www.simplistix.co.uk

From njs at pobox.com  Fri Jun  1 09:14:32 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 1 Jun 2012 14:14:32 +0100
Subject: [Numpy-discussion] better error message possible?
In-Reply-To: <4FC88F5F.3060303@simplistix.co.uk>
References: <4FC88F5F.3060303@simplistix.co.uk>
Message-ID:

On Fri, Jun 1, 2012 at 10:46 AM, Chris Withers wrote:
> Hi All,
>
> Any reason why this:
>
>  >>> import numpy
>  >>> numpy.zeros(10)[-123]
> Traceback (most recent call last):
>   File "", line 1, in
> IndexError: index out of bounds
>
> ...could say this:
>
>  >>> numpy.zeros(10)[-123]
> Traceback (most recent call last):
>   File "", line 1, in
> IndexError: -123 is out of bounds

Only that no-one has implemented it, I guess. If you want to then
that'd be cool :-).

To be generally useful for debugging, it would probably be good for
the error message to also mention which dimension is involved, and/or
the actual size of the array in that dimension. You can also get such
error messages from expressions like 'arr[i, j, k]', after all, where
it's even less obvious what went wrong.

-- Nathaniel

From ben.root at ou.edu  Fri Jun  1 11:39:34 2012
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 1 Jun 2012 11:39:34 -0400
Subject: [Numpy-discussion] better error message possible?
In-Reply-To:
References: <4FC88F5F.3060303@simplistix.co.uk>
Message-ID:

On Fri, Jun 1, 2012 at 9:14 AM, Nathaniel Smith wrote:

> On Fri, Jun 1, 2012 at 10:46 AM, Chris Withers
> wrote:
> > Hi All,
> >
> > Any reason why this:
> >
> > >>> import numpy
> > >>> numpy.zeros(10)[-123]
> > Traceback (most recent call last):
> >   File "", line 1, in
> > IndexError: index out of bounds
> >
> > ...could say this:
> >
> > >>> numpy.zeros(10)[-123]
> > Traceback (most recent call last):
> >   File "", line 1, in
> > IndexError: -123 is out of bounds
>
> Only that no-one has implemented it, I guess. If you want to then
> that'd be cool :-).
>
> To be generally useful for debugging, it would probably be good for
> the error message to also mention which dimension is involved, and/or
> the actual size of the array in that dimension. You can also get such
> error messages from expressions like 'arr[i, j, k]', after all, where
> it's even less obvious what went wrong.
>
> -- Nathaniel
>

+1, please!

Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.barker at noaa.gov  Fri Jun  1 12:34:36 2012
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 1 Jun 2012 09:34:36 -0700
Subject: [Numpy-discussion] better error message possible?
In-Reply-To:
References: <4FC88F5F.3060303@simplistix.co.uk>
Message-ID:

>> On Fri, Jun 1, 2012 at 10:46 AM, Chris Withers
>> > Any reason why this:
>> >
>> >  >>> import numpy
>> >  >>> numpy.zeros(10)[-123]
>> > Traceback (most recent call last):
>> >   File "", line 1, in
>> > IndexError: index out of bounds
>> >
>> > ...could say this:
>> >
>> >  >>> numpy.zeros(10)[-123]
>> > Traceback (most recent call last):
>> >   File "", line 1, in
>> > IndexError: -123 is out of bounds
>>
>> Only that no-one has implemented it, I guess. If you want to then
>> that'd be cool :-).

That would be nice, but to be fair, python itself doesn't do it either:

>>> l = range(10)
>>> l[12]
Traceback (most recent call last):
  File "", line 1, in
IndexError: list index out of range

Though Python's standard error messages are lacking in a lot of places...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov

From jeremy.lecoeur at uphs.upenn.edu  Fri Jun  1 14:45:04 2012
From: jeremy.lecoeur at uphs.upenn.edu (Jeremy Lecoeur)
Date: Fri, 01 Jun 2012 14:45:04 -0400
Subject: [Numpy-discussion] Error when multiplying large sparse matrices
Message-ID: <4FC90DB0.90503@uphs.upenn.edu>

Hi,

I have been using the sparse matrix tools for a while to do all sorts of
things and, using the same code that was working just fine, I now
encounter a problem. I do have very large sparse matrices, and when I
multiply them the number of non-zeros exceeds the max value of an intc,
which causes indptr to hold negative values. Hence in the multiplication
function of csr, when creating the resulting matrix, I get an error as it
is not possible to have a negative value for a matrix size.

Am I missing something that would allow me to do that computation?

Here is the code I am using:

def main():
    inCondMatFile = sys.argv[1]
    inNodeSize = sys.argv[2]
    outProfileFile = sys.argv[3]
    outNodeDistFile = sys.argv[4]
    outNodeDensityFile = sys.argv[5]

    Acsr = scipy.io.mmread(inCondMatFile).tocsr().sorted_indices()
    A = Acsr.tocoo()
    n = A.shape[0]
    nnz = A.nnz

    rows=numpy.zeros(3*nnz+n, dtype=numpy.int32)
    cols=numpy.zeros(3*nnz+n, dtype=numpy.int32)
    data=numpy.zeros(3*nnz+n, dtype=numpy.float64)

    #first n rows of constraint mat is A - I
    rows[0:nnz] = A.row
    cols[0:nnz] = A.col
    data[0:nnz] = A.data
    rows[nnz:nnz+n] = numpy.arange(n)
    cols[nnz:nnz+n] = numpy.arange(n)
    data[nnz:nnz+n] = -numpy.ones(n)

    #rows n to n+nnz are
    #A_{i,j} d_{j} - A_{j,i} d_{i} == 0
    rows[nnz+n:] = numpy.append(numpy.arange(n,n+nnz),numpy.arange(n,n+nnz))
    cols[nnz+n:] = numpy.append(A.col,A.row)
    data[nnz+n:] = numpy.append(A.data,-Acsr[A.col,A.row])

    tmpC = scipy.sparse.coo_matrix( (data, (rows,cols) ) )
    Ptmp = (tmpC.transpose().tocsr() * tmpC.tocsr()).tocoo()

And it fails for that last multiplication (I did not include the rest of
the code) because of an nnz way too big for an intc.

Jeremy

The information contained in this e-mail message is intended only for the
personal and confidential use of the recipient(s) named above. If the
reader of this message is not the intended recipient or an agent
responsible for delivering it to the intended recipient, you are hereby
notified that you have received this document in error and that any
review, dissemination, distribution, or copying of this message is
strictly prohibited. If you have received this communication in error,
please notify us immediately by e-mail, and delete the original message.
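A note on the failure above: the overflow can be diagnosed before the product
is formed, because an upper bound on the result's nnz can be computed from the
CSR index arrays alone. The sketch below is not from the thread and not scipy
API; the helper name is made up for illustration, and it only detects the
problem rather than working around the 32-bit index limit discussed in the
next message.

import numpy as np

def product_nnz_upper_bound(A, B):
    """Upper bound on nnz(A * B) for two scipy.sparse matrices."""
    A, B = A.tocsr(), B.tocsr()
    b_row_nnz = np.diff(B.indptr)          # nonzeros in each row of B
    # every stored entry A[i, k] can contribute at most nnz(B[k, :]) entries
    # to row i of the product, so summing those counts gives an upper bound
    return int(b_row_nnz[A.indices].sum(dtype=np.int64))

# usage sketch, just before the failing multiplication in the code above:
# bound = product_nnz_upper_bound(tmpC.transpose().tocsr(), tmpC.tocsr())
# if bound > np.iinfo(np.int32).max:
#     raise MemoryError("product likely too dense for 32-bit sparse indices")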
From pav at iki.fi Fri Jun 1 15:40:53 2012 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 01 Jun 2012 21:40:53 +0200 Subject: [Numpy-discussion] Error when multiplying large sparse matrices In-Reply-To: <4FC90DB0.90503@uphs.upenn.edu> References: <4FC90DB0.90503@uphs.upenn.edu> Message-ID: 01.06.2012 20:45, Jeremy Lecoeur kirjoitti: > I have been using the sparse matrix tools for a while to do all sort of > things and, using the same code that was working just fine, I now > encounter a problem when trying . I do have very large sparse matrices > and when i multiplying them the number of non zeros exceed the max value > of an intc, which cause indptr to hold negative values. Hence in the > multiplication function of csr, whenc reating the resulting matrix, i > get an error as it is not possible to have a negative value for a matrix > size. > Am I missing something that would allow me to do that computation ? Using a larger integer type for the indices is not supported in Scipy at the moment. It's possible to implement, but needs someone to still do a bit of work: http://projects.scipy.org/scipy/ticket/1307 From chris at simplistix.co.uk Fri Jun 1 12:56:34 2012 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 01 Jun 2012 17:56:34 +0100 Subject: [Numpy-discussion] better error message possible? In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> Message-ID: <4FC8F442.8040100@simplistix.co.uk> On 01/06/2012 16:39, Benjamin Root wrote: > > > > >>> import numpy > > >>> numpy.zeros(10)[-123] > > Traceback (most recent call last): > > File "", line 1, in > > IndexError: index out of bounds > > > > ...could say this: > > > > >>> numpy.zeros(10)[-123] > > Traceback (most recent call last): > > File "", line 1, in > > IndexError: -123 is out of bounds > > Only that no-one has implemented it, I guess. If you want to then > that'd be cool :-). > > To be generally useful for debugging, it would probably be good for > the error message to also mention which dimension is involved, and/or > the actual size of the array in that dimension. You can also get such > error messages from expressions like 'arr[i, j, k]', after all, where > it's even less obvious what went wrong. > > -- Nathaniel > > > +1, please! Indeed, sadly I'm not a C developer. It's a pet bugbear of mine that Python's built-in exceptions often tell you what went wrong but not what data caused the error, even when it's easily to hand when raising the exception. Where's the right place to raise an issue that a numpy developer can hopefully make the (I suspect) simple change to get this behaviour? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From ralf.gommers at googlemail.com Sun Jun 3 10:28:01 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 3 Jun 2012 16:28:01 +0200 Subject: [Numpy-discussion] some typestrings not recognized anymore Message-ID: Hi, Just ran into this: >>> np.__version__ '1.5.1' >>> np.empty((1,), dtype='>h2') # works in 1.6.2 too array([0], dtype=int16) >>> np.__version__ '1.7.0.dev-fd78546' >>> np.empty((1,), dtype='>h2') Traceback (most recent call last): File "", line 1, in TypeError: data type ">h2" not understood Seems like a quite serious issue. It breaks scipy.io.netcdf for example. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Sun Jun 3 10:49:58 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 3 Jun 2012 15:49:58 +0100 Subject: [Numpy-discussion] some typestrings not recognized anymore In-Reply-To: References: Message-ID: On Sun, Jun 3, 2012 at 3:28 PM, Ralf Gommers wrote: > Hi, > > Just ran into this: > >>>> np.__version__ > '1.5.1' >>>> np.empty((1,), dtype='>h2')? # works in 1.6.2 too > array([0], dtype=int16) > >>>> np.__version__ > '1.7.0.dev-fd78546' >>>> np.empty((1,), dtype='>h2') > Traceback (most recent call last): > ? File "", line 1, in > TypeError: data type ">h2" not understood For reference the problem seems to be that in 1.6 and earlier, "h" plus a number was allowed, and the number was ignored: >>> np.__version__ '1.5.1' >>> np.dtype("h2") dtype('int16') >>> np.dtype("h4") dtype('int16') >>> np.dtype("h100") dtype('int16') In current master, the number is disallowed -- all of those give TypeErrors. Presumably because "h" already means the same as "i2", so adding a second number on their is weird. Other typecodes with an "intrinsic size" seem to have the same problem -- "q", "l", etc. Obviously "h2" should be allowed in 1.7, seeing as disallowing it breaks scipy. And the behavior for "h100" is clearly broken and should be disallowed in the long run. So I guess we need to do two things: 1) Re-enable the use of typecode + size specifier even in cases where the typcode has an intrinsic size 2) Issue a deprecation warning for cases where the intrinsic size and the specified size don't match (like "h100"), and then turn that into an error in 1.8. Does that sound correct? I guess the other option would be to deprecate *all* use of size specifiers with these typecodes (i.e., deprecate "h2" as well, where the size specifier is merely redundant), but I'm not sure removing that feature is really worth it. -- Nathaniel From charlesr.harris at gmail.com Sun Jun 3 12:43:13 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 3 Jun 2012 10:43:13 -0600 Subject: [Numpy-discussion] commit rights for Nathaniel Message-ID: Hi All, Numpy is approaching a time of transition. Ralf will be concentrating his efforts on Scipy and I will be cutting back on my work on Numpy. The 1.7 release looks to be delayed and I suspect that the Continuum Analytics folks will become increasingly dedicated to the big data push. We need new people to carry things forward and I think Nathaniel can pick up part of the load. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jun 3 14:04:37 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 3 Jun 2012 20:04:37 +0200 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Sun, Jun 3, 2012 at 6:43 PM, Charles R Harris wrote: > Hi All, > > Numpy is approaching a time of transition. Ralf will be concentrating his > efforts on Scipy I'll write a separate post on that asap. > and I will be cutting back on my work on Numpy. I sincerely hope you don't cut back on your work too much Charles. You have done an excellent job as "chief maintainer" over the last years. The 1.7 release looks to be delayed and I suspect that the Continuum > Analytics folks will become increasingly dedicated to the big data push. We > need new people to carry things forward and I think Nathaniel can pick up > part of the load. > Assuming he wants them, I am definitely +1 on giving Nathaniel commit rights. 
His recent patches and debugging of issues were of high quality and very helpful. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jun 3 14:14:56 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 3 Jun 2012 20:14:56 +0200 Subject: [Numpy-discussion] NumPy release manager post Message-ID: Hi all, You probably remember that I said that after numpy 1.7.0 was out I wanted to step down as release manager for NumPy and focus more on SciPy. That was 4.5 months ago, and now that 1.7.0 keeps being postponed I'm actually planning to not wait for it. I have found that it's not possible for me to handle overlapping NumPy and SciPy release schedules well, therefore even if the NA stuff is sorted out soon I don't think I could do a 1.7.0 release within the next 6-8 weeks. A possible 1.6.3 release I could still do without much trouble though. Travis has already volunteered to manage the next release - thanks for that Travis. If anyone else would like to pitch in, I'm sure that's very welcome too. It doesn't require that you already know everything there is to know about NumPy, just a willingness to dig in. Feel free to ask me either on- or off-list for more details. Cheers, Ralf P.S. I'll certainly not completely stop working on numpy (docs, testing, distutils, ...) -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Jun 3 16:00:56 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 3 Jun 2012 14:00:56 -0600 Subject: [Numpy-discussion] better error message possible? In-Reply-To: <4FC8F442.8040100@simplistix.co.uk> References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> Message-ID: On Fri, Jun 1, 2012 at 10:56 AM, Chris Withers wrote: > On 01/06/2012 16:39, Benjamin Root wrote: > > > > > > > >>> import numpy > > > >>> numpy.zeros(10)[-123] > > > Traceback (most recent call last): > > > File "", line 1, in > > > IndexError: index out of bounds > > > > > > ...could say this: > > > > > > >>> numpy.zeros(10)[-123] > > > Traceback (most recent call last): > > > File "", line 1, in > > > IndexError: -123 is out of bounds > > > > Only that no-one has implemented it, I guess. If you want to then > > that'd be cool :-). > > > > To be generally useful for debugging, it would probably be good for > > the error message to also mention which dimension is involved, and/or > > the actual size of the array in that dimension. You can also get such > > error messages from expressions like 'arr[i, j, k]', after all, where > > it's even less obvious what went wrong. > > > > -- Nathaniel > > > > > > +1, please! > > Indeed, sadly I'm not a C developer. It's a pet bugbear of mine that > Python's built-in exceptions often tell you what went wrong but not what > data caused the error, even when it's easily to hand when raising the > exception. > > Where's the right place to raise an issue that a numpy developer can > hopefully make the (I suspect) simple change to get this behaviour? > > Hmm, how about I enable github issues for the numpy repository? It looks like we are headed that way and maybe now is a good time to get started. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Sun Jun 3 16:06:39 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 3 Jun 2012 14:06:39 -0600 Subject: [Numpy-discussion] Issue tracking Message-ID: Hi All, The issue tracking discussion seems to have died. Since github issues looks to be a viable alternative at this point, I propose to turn it on for the numpy repository and start directing people in that direction. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jun 3 16:20:08 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 3 Jun 2012 22:20:08 +0200 Subject: [Numpy-discussion] some typestrings not recognized anymore In-Reply-To: References: Message-ID: On Sun, Jun 3, 2012 at 4:49 PM, Nathaniel Smith wrote: > On Sun, Jun 3, 2012 at 3:28 PM, Ralf Gommers > wrote: > > Hi, > > > > Just ran into this: > > > >>>> np.__version__ > > '1.5.1' > >>>> np.empty((1,), dtype='>h2') # works in 1.6.2 too > > array([0], dtype=int16) > > > >>>> np.__version__ > > '1.7.0.dev-fd78546' > >>>> np.empty((1,), dtype='>h2') > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: data type ">h2" not understood > > For reference the problem seems to be that in 1.6 and earlier, "h" > plus a number was allowed, and the number was ignored: > > >>> np.__version__ > '1.5.1' > >>> np.dtype("h2") > dtype('int16') > >>> np.dtype("h4") > dtype('int16') > >>> np.dtype("h100") > dtype('int16') > > In current master, the number is disallowed -- all of those give > TypeErrors. Presumably because "h" already means the same as "i2", so > adding a second number on their is weird. > > Other typecodes with an "intrinsic size" seem to have the same problem > -- "q", "l", etc. > > Obviously "h2" should be allowed in 1.7, seeing as disallowing it > breaks scipy. And the behavior for "h100" is clearly broken and should > be disallowed in the long run. So I guess we need to do two things: > > 1) Re-enable the use of typecode + size specifier even in cases where > the typcode has an intrinsic size > 2) Issue a deprecation warning for cases where the intrinsic size and > the specified size don't match (like "h100"), and then turn that into > an error in 1.8. > > Does that sound correct? Seems correct as far as I can tell. Your approach to fixing the issue sounds good. > I guess the other option would be to > deprecate *all* use of size specifiers with these typecodes (i.e., > deprecate "h2" as well, where the size specifier is merely redundant), > but I'm not sure removing that feature is really worth it. > Either way would be OK I think. Using "h2" is redundant, but I can see how someone could prefer writing it like that for clarity. It's not like 'h' --> np.int16 is obvious. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Sun Jun 3 17:44:46 2012 From: srean.list at gmail.com (srean) Date: Sun, 3 Jun 2012 16:44:46 -0500 Subject: [Numpy-discussion] fast access and normalizing of ndarray slices In-Reply-To: <5B532960-20F1-41FA-A17D-10CC5BEB898F@gmail.com> References: <5B532960-20F1-41FA-A17D-10CC5BEB898F@gmail.com> Message-ID: Hi Wolfgang, I think you are looking for reduceat( ), in particular add.reduceat() -- srean On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf wrote: > Dear all, > > I have an ndarray which consists of many arrays stacked behind each other (only conceptually, in truth it's a normal 1d float64 array). 
> I have a second array which tells me the start of the individual data sets in the 1d float64 array and another one which tells me the length. > Example: > > data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality [1,2,1,2,3,4,1,2,3, dtype=float64] > start_pointer = [0, 2, 6] > length_data = [2, 4, 3] > > I now want to normalize each of the individual data sets. I wrote a simple for loop over the start_pointer and length data grabbed the data and normalized it and wrote it back to the big array. That's slow. Is there an elegant numpy way to do that? Do I have to go the cython way? From ben.root at ou.edu Sun Jun 3 19:45:13 2012 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 3 Jun 2012 19:45:13 -0400 Subject: [Numpy-discussion] some typestrings not recognized anymore In-Reply-To: References: Message-ID: On Sunday, June 3, 2012, Ralf Gommers wrote: > > > On Sun, Jun 3, 2012 at 4:49 PM, Nathaniel Smith > > wrote: > >> On Sun, Jun 3, 2012 at 3:28 PM, Ralf Gommers >> > 'ralf.gommers at googlemail.com');>> wrote: >> > Hi, >> > >> > Just ran into this: >> > >> >>>> np.__version__ >> > '1.5.1' >> >>>> np.empty((1,), dtype='>h2') # works in 1.6.2 too >> > array([0], dtype=int16) >> > >> >>>> np.__version__ >> > '1.7.0.dev-fd78546' >> >>>> np.empty((1,), dtype='>h2') >> > Traceback (most recent call last): >> > File "", line 1, in >> > TypeError: data type ">h2" not understood >> >> For reference the problem seems to be that in 1.6 and earlier, "h" >> plus a number was allowed, and the number was ignored: >> >> >>> np.__version__ >> '1.5.1' >> >>> np.dtype("h2") >> dtype('int16') >> >>> np.dtype("h4") >> dtype('int16') >> >>> np.dtype("h100") >> dtype('int16') >> >> In current master, the number is disallowed -- all of those give >> TypeErrors. Presumably because "h" already means the same as "i2", so >> adding a second number on their is weird. >> >> Other typecodes with an "intrinsic size" seem to have the same problem >> -- "q", "l", etc. >> >> Obviously "h2" should be allowed in 1.7, seeing as disallowing it >> breaks scipy. And the behavior for "h100" is clearly broken and should >> be disallowed in the long run. So I guess we need to do two things: >> >> 1) Re-enable the use of typecode + size specifier even in cases where >> the typcode has an intrinsic size >> 2) Issue a deprecation warning for cases where the intrinsic size and >> the specified size don't match (like "h100"), and then turn that into >> an error in 1.8. >> >> Does that sound correct? > > > Seems correct as far as I can tell. Your approach to fixing the issue > sounds good. > > >> I guess the other option would be to >> deprecate *all* use of size specifiers with these typecodes (i.e., >> deprecate "h2" as well, where the size specifier is merely redundant), >> but I'm not sure removing that feature is really worth it. >> > > Either way would be OK I think. Using "h2" is redundant, but I can see how > someone could prefer writing it like that for clarity. It's not like 'h' > --> np.int16 is obvious. > > Ralf > Also, we still need the number for some type codes such as 'a' to indicate the length of the string. I like the first solution much better. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From e.antero.tammi at gmail.com  Mon Jun  4 10:13:43 2012
From: e.antero.tammi at gmail.com (eat)
Date: Mon, 4 Jun 2012 17:13:43 +0300
Subject: [Numpy-discussion] fast access and normalizing of ndarray slices
In-Reply-To:
References: <5B532960-20F1-41FA-A17D-10CC5BEB898F@gmail.com>
Message-ID:

Hi,

On Mon, Jun 4, 2012 at 12:44 AM, srean wrote:

> Hi Wolfgang,
>
> I think you are looking for reduceat( ), in particular add.reduceat()
>

Indeed OP could utilize add.reduceat(...), like:

# tst.py
import numpy as np

def reduce(data, lengths):
    ind, ends= np.r_[lengths, lengths], lengths.cumsum()
    ind[::2], ind[1::2]= ends- lengths, ends
    return np.add.reduceat(np.r_[data, 0], ind)[::2]

def normalize(data, lengths):
    return data/ np.repeat(reduce(data, lengths), lengths)

def gen(par):
    lengths= np.random.randint(*par)
    return np.random.randn(lengths.sum()), lengths

if __name__ == '__main__':
    data= np.array([1, 2, 1, 2, 3, 4, 1, 2, 3], dtype= float)
    lengths= np.array([2, 4, 3])
    print reduce(data, lengths)
    print normalize(data, lengths).round(2)

Resulting:

In []: %run tst
[  3.  10.   6.]
[ 0.33  0.67  0.1   0.2   0.3   0.4   0.17  0.33  0.5 ]

Fast enough:

In []: data, lengths= gen([5, 15, 5e4])
In []: data.size
Out[]: 476028
In []: %timeit normalize(data, lengths)
10 loops, best of 3: 29.4 ms per loop

My 2 cents,
-eat

>
> -- srean
>
> On Thu, May 31, 2012 at 12:36 AM, Wolfgang Kerzendorf
> wrote:
> > Dear all,
> >
> > I have an ndarray which consists of many arrays stacked behind each
> other (only conceptually, in truth it's a normal 1d float64 array).
> > I have a second array which tells me the start of the individual data
> sets in the 1d float64 array and another one which tells me the length.
> > Example:
> >
> > data_array = (conceptually) [[1,2], [1,2,3,4], [1,2,3]] = in reality
> [1,2,1,2,3,4,1,2,3, dtype=float64]
> > start_pointer = [0, 2, 6]
> > length_data = [2, 4, 3]
> >
> > I now want to normalize each of the individual data sets. I wrote a
> simple for loop over the start_pointer and length data grabbed the data and
> normalized it and wrote it back to the big array. That's slow. Is there an
> elegant numpy way to do that? Do I have to go the cython way?
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From thouis at gmail.com  Mon Jun  4 10:27:29 2012
From: thouis at gmail.com (Thouis (Ray) Jones)
Date: Mon, 4 Jun 2012 16:27:29 +0200
Subject: [Numpy-discussion] better error message possible?
In-Reply-To: <4FC8F442.8040100@simplistix.co.uk>
References: <4FC88F5F.3060303@simplistix.co.uk>
	<4FC8F442.8040100@simplistix.co.uk>
Message-ID:

On Fri, Jun 1, 2012 at 6:56 PM, Chris Withers wrote:
> On 01/06/2012 16:39, Benjamin Root wrote:
>>
>>
>>      > >>> import numpy
>>      > >>> numpy.zeros(10)[-123]
>>      > Traceback (most recent call last):
>>      >   File "", line 1, in
>>      > IndexError: index out of bounds
>>      >
>>      > ...could say this:
>>      >
>>      > >>> numpy.zeros(10)[-123]
>>      > Traceback (most recent call last):
>>      >   File "", line 1, in
>>      > IndexError: -123 is out of bounds
>>
>>     Only that no-one has implemented it, I guess. If you want to then
>>     that'd be cool :-).
>>
>>     To be generally useful for debugging, it would probably be good for
>>     the error message to also mention which dimension is involved, and/or
>>
the actual size of the array in that dimension. You can also get such >> ? ? error messages from expressions like 'arr[i, j, k]', after all, where >> ? ? it's even less obvious what went wrong. >> >> ? ? -- Nathaniel >> >> >> +1, please! > > Indeed, sadly I'm not a C developer. It's a pet bugbear of mine that > Python's built-in exceptions often tell you what went wrong but not what > data caused the error, even when it's easily to hand when raising the > exception. I could look into this. There are only ~10 places the code generates this error, so it should be a pretty minor change. Ray Jones From paul.anton.letnes at gmail.com Mon Jun 4 10:28:55 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 4 Jun 2012 16:28:55 +0200 Subject: [Numpy-discussion] better error message possible? In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> Message-ID: On 4. juni 2012, at 16:27, Thouis (Ray) Jones wrote: > On Fri, Jun 1, 2012 at 6:56 PM, Chris Withers wrote: >> On 01/06/2012 16:39, Benjamin Root wrote: >>> >>> >>> > >>> import numpy >>> > >>> numpy.zeros(10)[-123] >>> > Traceback (most recent call last): >>> > File "", line 1, in >>> > IndexError: index out of bounds >>> > >>> > ...could say this: >>> > >>> > >>> numpy.zeros(10)[-123] >>> > Traceback (most recent call last): >>> > File "", line 1, in >>> > IndexError: -123 is out of bounds >>> >>> Only that no-one has implemented it, I guess. If you want to then >>> that'd be cool :-). >>> >>> To be generally useful for debugging, it would probably be good for >>> the error message to also mention which dimension is involved, and/or >>> the actual size of the array in that dimension. You can also get such >>> error messages from expressions like 'arr[i, j, k]', after all, where >>> it's even less obvious what went wrong. >>> >>> -- Nathaniel >>> >>> >>> +1, please! >> >> Indeed, sadly I'm not a C developer. It's a pet bugbear of mine that >> Python's built-in exceptions often tell you what went wrong but not what >> data caused the error, even when it's easily to hand when raising the >> exception. > > I could look into this. There are only ~10 places the code generates > this error, so it should be a pretty minor change. > > Ray Jones Isn't it useful even if you change it in just one of those locations? Better to have the information available when you can, than to never have it. Paul From ralf.gommers at googlemail.com Mon Jun 4 11:34:09 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 4 Jun 2012 17:34:09 +0200 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: Message-ID: On Sun, Jun 3, 2012 at 10:06 PM, Charles R Harris wrote: > Hi All, > > The issue tracking discussion seems to have died. Since github issues > looks to be a viable alternative at this point, I propose to turn it on for > the numpy repository and start directing people in that direction. > > Thoughts? > Sounds good, as long as we don't create duplicates or do something to make the conversion from Trac harder. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Jun 4 12:05:47 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Jun 2012 10:05:47 -0600 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: Message-ID: On Mon, Jun 4, 2012 at 9:34 AM, Ralf Gommers wrote: > > > On Sun, Jun 3, 2012 at 10:06 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> The issue tracking discussion seems to have died. Since github issues >> looks to be a viable alternative at this point, I propose to turn it on for >> the numpy repository and start directing people in that direction. >> >> Thoughts? >> > > Sounds good, as long as we don't create duplicates or do something to make > the conversion from Trac harder. > > I looked into this a bit, and it looks like the first task is to set up labels. They should probably track what we currently have for trac in order to make moving some (all?) of the tickets over. I'm thinking component, priority, type, milestone, and version, omitting the keywords. I'm not sure how we should handle attachments. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bobtnur78 at gmail.com Mon Jun 4 12:21:02 2012 From: bobtnur78 at gmail.com (bob tnur) Date: Mon, 4 Jun 2012 12:21:02 -0400 Subject: [Numpy-discussion] How to remove any row or column of a numpy matrix whose sum is 3? Message-ID: Hello every body. I am new to python. How to remove any row or column of a numpy matrix whose sum is 3. To obtain and save new matrix P with (sum(anyrow)!=3 and sum(anycolumn)!=3 elements. I tried like this: P = M[np.logical_not( (M[n,:].sum()==3) & (M[:,n].sum()==3))] or P = M[np.logical_not( (np.sum(M[n,:])==3) & (np.sum(M[:,n])==3))] M is the nxn numpy matrix. But I got indexerror. So can anyone correct this or any other elegant way of doing this? Thanks for your help -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Jun 4 12:38:05 2012 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 4 Jun 2012 17:38:05 +0100 Subject: [Numpy-discussion] How to remove any row or column of a numpy matrix whose sum is 3? In-Reply-To: References: Message-ID: On Mon, Jun 4, 2012 at 5:21 PM, bob tnur wrote: > Hello every body. I am new to python. > How to remove any row or column of a numpy matrix whose sum is 3. > To obtain and save new matrix P with (sum(anyrow)!=3 and sum(anycolumn)!=3 > elements. > > I tried like this: > > P = M[np.logical_not( (M[n,:].sum()==3) & (M[:,n].sum()==3))] > or > P = M[np.logical_not( (np.sum(M[n,:])==3) & (np.sum(M[:,n])==3))] > > > M is the nxn numpy matrix. > But I got indexerror. So can anyone correct this or any other elegant way of > doing this? If M is 5x5 matrix, then M[5,:] and M[:,5] don't work. You can't index past the last element. Python sequences in general and numpy arrays in particular use 0-based indexing. I'm not entirely sure what you intended with those expressions anyways. Here is how I would do it. # Get the integer indices of the rows that sum up to 3 # and the columns that sum up to 3. bad_rows = np.nonzero(M.sum(axis=1) == 3) bad_cols = np.nonzero(M.sum(axis=0) == 3) # Now use the numpy.delete() function to get the matrix # with those rows and columns removed from the original matrix. 
P = np.delete(M, bad_rows, axis=0) P = np.delete(P, bad_cols, axis=1) -- Robert Kern From bobtnur78 at gmail.com Mon Jun 4 12:39:42 2012 From: bobtnur78 at gmail.com (bob tnur) Date: Mon, 4 Jun 2012 12:39:42 -0400 Subject: [Numpy-discussion] How to remove any row or column of a numpy matrix whose sum is 3? Message-ID: Hello every body. I am new to python. How to remove any row or column of a numpy matrix whose sum is 3. To obtain and save new matrix P with (sum(anyrow)!=3 and sum(anycolumn)!=3 elements. I tried like this: P = M[np.logical_not( (M[n,:].sum()==3) & (M[:,n].sum()==3))] or P = M[np.logical_not( (np.sum(M[n,:])==3) & (np.sum(M[:,n])==3))] M is the nxn numpy matrix. But I got indexerror. So can anyone correct this or any other elegant way of doing this? Thanks for your help -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Jun 4 12:40:12 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 4 Jun 2012 18:40:12 +0200 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: Message-ID: On Mon, Jun 4, 2012 at 6:05 PM, Charles R Harris wrote: > > > On Mon, Jun 4, 2012 at 9:34 AM, Ralf Gommers wrote: > >> >> >> On Sun, Jun 3, 2012 at 10:06 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> The issue tracking discussion seems to have died. Since github issues >>> looks to be a viable alternative at this point, I propose to turn it on for >>> the numpy repository and start directing people in that direction. >>> >>> Thoughts? >>> >> >> Sounds good, as long as we don't create duplicates or do something to >> make the conversion from Trac harder. >> >> > I looked into this a bit, and it looks like the first task is to set up > labels. They should probably track what we currently have for trac in order > to make moving some (all?) of the tickets over. I'm thinking component, > priority, type, milestone, and version, omitting the keywords. I'm not sure > how we should handle attachments. > Version can be left out I think, unless someone finds it useful. We can think about extra labels too. I'd like "easy-fix", as a guide for new contributors to issues to get started on. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Jun 4 12:51:10 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 4 Jun 2012 09:51:10 -0700 Subject: [Numpy-discussion] How to remove any row or column of a numpy matrix whose sum is 3? In-Reply-To: References: Message-ID: On Mon, Jun 4, 2012 at 9:21 AM, bob tnur wrote: > Hello every body. I am new to python. > How to remove any row or column of a numpy matrix whose sum is 3. > To obtain and save new matrix P with (sum(anyrow)!=3 and sum(anycolumn)!=3 > elements. well, one question is -- do you want to remove the particular rows first, then remove the particular columns, or compute the sums of both with all in place, then remove rows and columns -- but some ideas: In [357]: m Out[357]: array([[ 1., 1., 1., 1., 0.], [ 1., 0., 1., 1., 1.], [ 1., 1., 1., 0., 0.], [ 1., 1., 1., 1., 1.]]) # which rows, columns sum to 3? 
In [363]: rows_3 = np.argwhere(m.sum(axis=1) == 3) In [364]: rows_3 Out[364]: array([[2]]) In [365]: cols_3 = np.argwhere(m.sum(axis=0) == 3) In [366]: cols_3 Out[366]: array([[1], [3]]) # but it's probably easier to know which do not sum to 3: In [367]: rows_3 = np.argwhere(m.sum(axis=1) != 3) In [368]: rows_3 Out[368]: array([[0], [1], [3]]) In [371]: cols_3 = np.argwhere(m.sum(axis=0) != 3) In [372]: cols_3 Out[372]: array([[0], [2], [4]]) now build the new array: m2 = In [415]: m2 = m[rows_3[:,0]][:, cols_3[:,0]] In [416]: m2 Out[416]: array([[ 1., 1., 0.], [ 1., 1., 1.], [ 1., 1., 1.]]) some trickery there: the [:,2] index is because argwhere() returns a 2-d (nXm)array of indexes -- where m is the rank of the input array -- in this case one, so we want the single column (it's done this way so you can do: arr[ argwhere(arr == something) ] for n-dim arrays I also found I need to pull out the rows first, then the columns, because: arr[a_1, a_2] is interpreted as two arrays of individual indexes, not as indexes to the roes an columsn, i.e.: In [428]: arr Out[428]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) In [430]: arr[(1,2), (2,3)] Out[430]: array([ 6, 11]) you got the elements at: (1,2) and (2,3) There may be a way to do that a bit cleaner -- it escapes me at the moment. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From chris.barker at noaa.gov Mon Jun 4 12:56:38 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 4 Jun 2012 09:56:38 -0700 Subject: [Numpy-discussion] How to remove any row or column of a numpy matrix whose sum is 3? In-Reply-To: References: Message-ID: On Mon, Jun 4, 2012 at 9:38 AM, Robert Kern wrote: > ?# Now use the numpy.delete() function to get the matrix > ?# with those rows and columns removed from the original matrix. > ?P = np.delete(M, bad_rows, axis=0) > ?P = np.delete(P, bad_cols, axis=1) ah yes, forgot about np.delete -- that is a bit cleaner -- also, nonzero() rather than argwhere() is also cleaner. i.e. listen to Robert, not me! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From plredmond at gmail.com Mon Jun 4 13:31:45 2012 From: plredmond at gmail.com (Patrick Redmond) Date: Mon, 4 Jun 2012 13:31:45 -0400 Subject: [Numpy-discussion] 1D array sorting ascending and descending by fields Message-ID: Hi! I have a one-dimensional ndarray with two fields. I'd like to sort in descending order by field 'a', breaking ties by sorting in ascending order by field 'b'. I've found combinations of sorting and reversing followed by stable sorting that work, but there must be a straightforward way to do it. Your help is appreciated! Thank you, Patrick From travis at continuum.io Mon Jun 4 13:43:36 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 4 Jun 2012 12:43:36 -0500 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: Message-ID: <3C64F2BE-50E5-403C-9022-71233A6E3449@continuum.io> There is an interesting project called http://huboard.com/ The projects suggests using a few Column Labels that provides a nice card-based window onto the Github issues. I have turned on issue tracking and started a few labels. 
Feel free to add more / adjust the names as appropriate. I am trying to find someone who can help manage the migration from Trac. I have two people but they are both quite inexperienced, and it will take them some time to learn the process. If anyone out there is in a position to spend a month, there are resources available to do the migration. I think this is ideal for someone just getting started in the NumPy community who knows something about web-apis and data-bases (or is eager to learn). Best, -Travis On Jun 4, 2012, at 11:40 AM, Ralf Gommers wrote: > > > On Mon, Jun 4, 2012 at 6:05 PM, Charles R Harris wrote: > > > On Mon, Jun 4, 2012 at 9:34 AM, Ralf Gommers wrote: > > > On Sun, Jun 3, 2012 at 10:06 PM, Charles R Harris wrote: > Hi All, > > The issue tracking discussion seems to have died. Since github issues looks to be a viable alternative at this point, I propose to turn it on for the numpy repository and start directing people in that direction. > > Thoughts? > > Sounds good, as long as we don't create duplicates or do something to make the conversion from Trac harder. > > > I looked into this a bit, and it looks like the first task is to set up labels. They should probably track what we currently have for trac in order to make moving some (all?) of the tickets over. I'm thinking component, priority, type, milestone, and version, omitting the keywords. I'm not sure how we should handle attachments. > > Version can be left out I think, unless someone finds it useful. > > We can think about extra labels too. I'd like "easy-fix", as a guide for new contributors to issues to get started on. > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From plredmond at gmail.com Mon Jun 4 14:10:44 2012 From: plredmond at gmail.com (Patrick Redmond) Date: Mon, 4 Jun 2012 14:10:44 -0400 Subject: [Numpy-discussion] 1D array sorting ascending and descending by fields In-Reply-To: References: Message-ID: Here's how I sorted primarily by field 'a' descending and secondarily by field 'b' ascending: (Note that 'a' is the second column, 'b' is the first) >>> data array([('b', 0.03), ('c', 0.03), ('f', 0.03), ('e', 0.01), ('d', 0.04), ('a', 0.04)], dtype=[('b', '|S32'), ('a', '>> data.sort(order='b') # sort by b >>> data = data[::-1] # reverse >>> data[numpy.argsort(data['a'])][::-1] # sort by a and reverse array([('a', 0.04), ('d', 0.04), ('b', 0.03), ('c', 0.03), ('f', 0.03), ('e', 0.01)], dtype=[('b', '|S32'), ('a', '>> data.sort(order=('-a', 'b')) ...indicating that the order of 'a' is descending, but this isn't part of NumPy's sort behavior. Your help is appreciated! Thank you, Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhansen at gmail.com Mon Jun 4 15:06:04 2012 From: mhansen at gmail.com (Mike Hansen) Date: Mon, 4 Jun 2012 12:06:04 -0700 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: Message-ID: On Mon, May 28, 2012 at 3:15 AM, Mike Hansen wrote: > In trying to upgrade NumPy within Sage, we notices some differences in > behavior between 1.5 and 1.6. 
?In particular, in 1.5, we have > > sage: f = 0.5 > sage: f.__array_interface__ > {'typestr': '=f8'} > sage: numpy.array(f) > array(0.5) > sage: numpy.array(float(f)) > array(0.5) > > In 1.6, we get the following, > > sage: f = 0.5 > sage: f.__array_interface__ > {'typestr': '=f8'} > sage: numpy.array(f) > array(0.500000000000000, dtype=object) > > This seems to be do to the changes in PyArray_FromAny introduced in > https://github.com/mwhansen/numpy/commit/2635398db3f26529ce2aaea4028a8118844f3c48 > . ?In particular, _array_find_type used to be used to query our > __array_interface__ attribute, and it no longer seems to work. ?Is > there a way to get the old behavior with the current code? Any ideas? Thanks, --Mike From ralf.gommers at googlemail.com Mon Jun 4 15:19:06 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 4 Jun 2012 21:19:06 +0200 Subject: [Numpy-discussion] Debian/Ubuntu patch help In-Reply-To: References: <4FB2BE06.3030204@googlemail.com> <4FB406B0.7050808@googlemail.com> Message-ID: On Wed, May 16, 2012 at 10:50 PM, Ralf Gommers wrote: > > > On Wed, May 16, 2012 at 9:57 PM, Julian Taylor < > jtaylor.debian at googlemail.com> wrote: > >> On 05/16/2012 09:01 PM, Ralf Gommers wrote: >> > >> > >> > On Tue, May 15, 2012 at 10:35 PM, Julian Taylor >> > > >> > wrote: >> > >> > > Hi, if there's anyone wants to have a look at the above issue this >> > > week, >> > >that would be great. >> > >> > > If there's a patch by this weekend I can create a second RC, so >> we can >> > > still have the final release before the end of this month (needed >> for >> > > Debian freeze). Otherwise a second RC won't be needed. >> > >> > bugfixes are still allowed during the debian freeze, so that should >> not >> > be an issue for the release timing. >> > >> > OK, that's good to know. So what's the hard deadline then? >> >> the release team aims for a freeze in the second half of june, but the >> number of release critical bugs is still huge so it could still change >> [0]. >> The freeze is will probably be 3-6 month long. >> >> > >> > >> > >> > I don't see the issue with the gcc --print-multiarch patch besides >> maybe >> > some cleanup. >> > --print-multiarch is a debian specific gcc patch, but multiarch is >> > debian specific for now. >> > >> > It doesn't work in 11.04, but who cares, that will be end of life >> in 5 >> > month anyway. >> > >> > >> > Eh, we (the numpy maintainers) should care. If we would not care about >> > an OS released only 13 months ago, we're not doing our job right. >> >> I scanned the list of classes in system_info, the only libraries >> multiarched in 11.04 on the list are: >> x11_info >> xft_info >> freetype2_info >> >> the first one is handled by the existing glob method, the latter two are >> handled correctly via pkg-config >> So I don't think there is anything to do for 11.04. 11.10 and 12.04 >> should also be fine. >> Wheezy will have multiarched fftw but probably not much more. >> >> Though one must also account for backports and future releases will >> likely have more multiarch ready numerical stuff to allow partial >> architectures like i386+sse2, x86_64+avx or completely new ones like x32. >> >> > >> > >> > Besides x11 almost nothing is multiarched in 11.04 anyway >> > and that can still be covered by the currently existing method. >> > >> > gcc should be available for pretty much anything requiring >> > numpy.distutils anyway so that should be not be an issue. 
>> > On systems without --print-multiarch or gcc you just ignore the >> failing, >> > there will be no repercussions as there will also not be any >> multiarched >> > libraries. >> > >> > If it's really that simple, such a patch may go into numpy master. But >> > history has shown that patches to a central part of numpy.distutils are >> > rarely issue-free (more due to the limitations/complexity of distutils >> > than anything else). Therefore making such a change right before a >> > release is simply a bad idea. >> >> I agree its probably a bit late to add it to 1.6.2. >> There is also no real need to have multiarch handled in this version. >> The Debian can add the patch to its 1.6.2 package >> >> It would be good to have the patch or something equivalent in the next >> version so upgrading from the package to 1.6.3 or 1.7 will not cause a >> regression in this respect. >> > > Yes, and better sooner than later. If you or someone else can provide this > as a pull request on Github, that would be helpful. As would a check that > the patch doesn't fail on Windows or OS X. > I opened a ticket for this so it doesn't get forgotten for 1.7.0: http://projects.scipy.org/numpy/ticket/2150 Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From gideon.simpson at gmail.com Mon Jun 4 17:00:29 2012 From: gideon.simpson at gmail.com (Gideon Simpson) Date: Mon, 4 Jun 2012 16:00:29 -0500 Subject: [Numpy-discussion] Numpy + SWIG Message-ID: <193A261E-E635-4EF9-8DA9-8C2084C687BB@gmail.com> There are two types of swig problems that I was hoping to get some help with. First, suppose I have some C function void f(double *x, int nx, double *y, int ny); where we input one array, and we output another array, both of which should be the same size. I have used in my .i file: %apply(double *IN_ARRAY1, int DIM1){(double *x, int nx)} %apply(double *ARGOUT_ARRAY1, int DIM1){(double *y, int ny)} and this produces a workable function. However, it expects, as the functions second argument, the length of the array x. Now, it's easy enough to call: module.f(x, x.shape[0]) but is there a way to automatically get it to use the length of the array? The second problem I have is for a function of the fomr void g(double *x, int nx, double *y, int ny, double *z, int nz); which evaluates some function g at all (x,y) pairs. The the thing is that nx and ny need not be the same size, but nz should be nx * ny. I'd like to wrap this too, and ideally it would also automatically handle the array lengths, but I'd be happy to have anything right now. I'm also quite comfortable with the idea of packing z as a column array and reshaping it as necessary. -gideon From thouis at gmail.com Mon Jun 4 17:00:28 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Mon, 4 Jun 2012 23:00:28 +0200 Subject: [Numpy-discussion] better error message possible? In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> Message-ID: On Mon, Jun 4, 2012 at 4:27 PM, Thouis (Ray) Jones wrote: > On Fri, Jun 1, 2012 at 6:56 PM, Chris Withers wrote: >> On 01/06/2012 16:39, Benjamin Root wrote: >>> >>> >>> ? ? ?> >>> import numpy >>> ? ? ?> >>> numpy.zeros(10)[-123] >>> ? ? ?> Traceback (most recent call last): >>> ? ? ?> ? File "", line 1, in >>> ? ? ?> IndexError: index out of bounds >>> ? ? ?> >>> ? ? ?> ...could say this: >>> ? ? ?> >>> ? ? ?> >>> numpy.zeros(10)[-123] >>> ? ? ?> Traceback (most recent call last): >>> ? ? ?> ? File "", line 1, in >>> ? ? 
?> IndexError: -123 is out of bounds >>> >>> ? ? Only that no-one has implemented it, I guess. If you want to then >>> ? ? that'd be cool :-). >>> >>> ? ? To be generally useful for debugging, it would probably be good for >>> ? ? the error message to also mention which dimension is involved, and/or >>> ? ? the actual size of the array in that dimension. You can also get such >>> ? ? error messages from expressions like 'arr[i, j, k]', after all, where >>> ? ? it's even less obvious what went wrong. >>> >>> ? ? -- Nathaniel >>> >>> >>> +1, please! >> >> Indeed, sadly I'm not a C developer. It's a pet bugbear of mine that >> Python's built-in exceptions often tell you what went wrong but not what >> data caused the error, even when it's easily to hand when raising the >> exception. > > I could look into this. ?There are only ~10 places the code generates > this error, so it should be a pretty minor change. My initial estimate was low, but not overly so. An initial pass at adding index/dimension information to IndexErrors is here: https://github.com/thouis/numpy/tree/index_error_info A typical result: >>> numpy.zeros(3)[5] Traceback (most recent call last): File "", line 1, in IndexError: index 5 out of bounds in dimension 0 I thought it best to have erroring indices report their initial value: >>> numpy.zeros(3)[-15] Traceback (most recent call last): File "", line 1, in IndexError: index -15 out of bounds in dimension 0 This is different from some places in the code where IndexErrors already had bad index and dimension information (including the maximum value possible for an index in that dimension). I left these alone, though most of them would report that the bad index was -12 instead of -15. For instance: https://github.com/thouis/numpy/blob/index_error_info/numpy/core/src/multiarray/mapping.c#L1640 Also there were a few indexing errors that were throwing ValueErrors. I changed these to IndexErrors. If someone could give this a cursory review before I issue a PR, I'd appreciate it. I don't expect that most of these code paths are heavily exercised in the tests (but I could be wrong). Ray Jones From d.s.seljebotn at astro.uio.no Mon Jun 4 17:12:44 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 04 Jun 2012 23:12:44 +0200 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: Message-ID: <4FCD24CC.7090305@astro.uio.no> On 06/04/2012 09:06 PM, Mike Hansen wrote: > On Mon, May 28, 2012 at 3:15 AM, Mike Hansen wrote: >> In trying to upgrade NumPy within Sage, we notices some differences in >> behavior between 1.5 and 1.6. In particular, in 1.5, we have >> >> sage: f = 0.5 >> sage: f.__array_interface__ >> {'typestr': '=f8'} >> sage: numpy.array(f) >> array(0.5) >> sage: numpy.array(float(f)) >> array(0.5) >> >> In 1.6, we get the following, >> >> sage: f = 0.5 >> sage: f.__array_interface__ >> {'typestr': '=f8'} >> sage: numpy.array(f) >> array(0.500000000000000, dtype=object) >> >> This seems to be do to the changes in PyArray_FromAny introduced in >> https://github.com/mwhansen/numpy/commit/2635398db3f26529ce2aaea4028a8118844f3c48 >> . In particular, _array_find_type used to be used to query our >> __array_interface__ attribute, and it no longer seems to work. Is >> there a way to get the old behavior with the current code? No idea. 
If you want to spend the time to fix this properly, you could implement PEP 3118 and use that instead to export your array data (which can be done from Cython using __getbuffer__ on a Cython class). Dag From njs at pobox.com Mon Jun 4 17:49:45 2012 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 4 Jun 2012 22:49:45 +0100 Subject: [Numpy-discussion] better error message possible? In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> Message-ID: On Mon, Jun 4, 2012 at 10:00 PM, Thouis (Ray) Jones wrote: > On Mon, Jun 4, 2012 at 4:27 PM, Thouis (Ray) Jones wrote: >> I could look into this. ?There are only ~10 places the code generates >> this error, so it should be a pretty minor change. > > My initial estimate was low, but not overly so. ?An initial pass at > adding index/dimension information to IndexErrors is here: > https://github.com/thouis/numpy/tree/index_error_info Fabulous! I made a few comments there, but also: > A typical result: > >>>> numpy.zeros(3)[5] > Traceback (most recent call last): > ?File "", line 1, in > IndexError: index 5 out of bounds in dimension 0 I would say "for", not "in". "index 5" is a bit ambiguous too... people might mis-read it as the dimension, like, "the 5th index value I gave"? Not sure how to make it unambiguous. Maybe: "IndexError: dimension 0 index out of bounds: got 5, size is 3" ? > I thought it best to have erroring indices report their initial value: > >>>> numpy.zeros(3)[-15] > Traceback (most recent call last): > ?File "", line 1, in > IndexError: index -15 out of bounds in dimension 0 > > This is different from some places in the code where IndexErrors > already had bad index and dimension information (including the maximum > value possible for an index in that dimension). ?I left these alone, > though most of them would report that the bad index was -12 instead of > -15. ?For instance: > https://github.com/thouis/numpy/blob/index_error_info/numpy/core/src/multiarray/mapping.c#L1640 I think this code you link to is actually correct, but yeah, it should definitely report whatever the user passed in, or it will be a debugging hindrance rather than a debugging help! > Also there were a few indexing errors that were throwing ValueErrors. > I changed these to IndexErrors. > > If someone could give this a cursory review before I issue a PR, I'd > appreciate it. ?I don't expect that most of these code paths are > heavily exercised in the tests (but I could be wrong). Perhaps the easiest thing would be to just add a test? It should be about 1 line each per code path... or 2 if you check both the negative and positive versions. def test_index_bound_checking(): assert_raises(IndexError, my_array.__getitem__, (0, 100)) assert_raises(IndexError, my_array.__getitem__, (0, -101)) # etc. - N From chris.barker at noaa.gov Mon Jun 4 18:01:51 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 4 Jun 2012 15:01:51 -0700 Subject: [Numpy-discussion] Numpy + SWIG In-Reply-To: <193A261E-E635-4EF9-8DA9-8C2084C687BB@gmail.com> References: <193A261E-E635-4EF9-8DA9-8C2084C687BB@gmail.com> Message-ID: HAVe you discovered the numpy.i interface files? I haven't done SWIG in a while, but they should take care of at least some of this for you. They used to be distributed with numpy (in docs?), but some googling should find then in any case. -Chris On Mon, Jun 4, 2012 at 2:00 PM, Gideon Simpson wrote: > There are two types of swig problems that I was hoping to get some help with. 
First, suppose I have some C function > > void f(double *x, int nx, double *y, int ny); > > where we input one array, and we output another array, both of which should be the same size. > > I have used in my .i file: > %apply(double *IN_ARRAY1, int DIM1){(double *x, int nx)} > %apply(double *ARGOUT_ARRAY1, int DIM1){(double *y, int ny)} > > and this produces a workable function. However, it expects, as the function's second argument, the length of the array x. Now, it's easy enough to call: > module.f(x, x.shape[0]) > > but is there a way to automatically get it to use the length of the array? > > The second problem I have is for a function of the form > > void g(double *x, int nx, double *y, int ny, double *z, int nz); > > which evaluates some function g at all (x,y) pairs. The thing is that nx and ny need not be the same size, but nz should be nx * ny. I'd like to wrap this too, and ideally it would also automatically handle the array lengths, but I'd be happy to have anything right now. I'm also quite comfortable with the idea of packing z as a column array and reshaping it as necessary. > > > -gideon > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From chris.barker at noaa.gov Mon Jun 4 18:08:35 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 4 Jun 2012 15:08:35 -0700 Subject: [Numpy-discussion] 1D array sorting ascending and descending by fields In-Reply-To: References: Message-ID: On Mon, Jun 4, 2012 at 11:10 AM, Patrick Redmond wrote: > Here's how I sorted primarily by field 'a' descending and secondarily by > field 'b' ascending: could you multiply the numeric field by -1, sort, then put it back -- something like: data *= -1 data_sorted = np.sort(data, order=['a','b']) data_sorted *= -1 (reverse if necessary -- I lost track...) -Chris > (Note that 'a' is the second column, 'b' is the first) > >>>> data > array([('b', 0.03), > ('c', 0.03), > ('f', 0.03), > ('e', 0.01), > ('d', 0.04), > ('a', 0.04)], > dtype=[('b', '|S32'), ('a', '<f8')]) >>>> data.sort(order='b') # sort by b >>>> data = data[::-1] # reverse >>>> data[numpy.argsort(data['a'])][::-1] # sort by a and reverse > array([('a', 0.04), > ('d', 0.04), > ('b', 0.03), > ('c', 0.03), > ('f', 0.03), > ('e', 0.01)], > dtype=[('b', '|S32'), ('a', '<f8')]) > > My question is whether there's an easier way to do this. could you multiply the numeric field by -1, sort, then multiply it again? Originally I > thought it would be possible to just do: > >>>> data.sort(order=('-a', 'b')) > > ...indicating that the order of 'a' is descending, but this isn't part of > NumPy's sort behavior. > > Your help is appreciated! > > Thank you, > Patrick > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317
main reception Chris.Barker at noaa.gov From wfspotz at sandia.gov Mon Jun 4 18:17:31 2012 From: wfspotz at sandia.gov (Bill Spotz) Date: Mon, 4 Jun 2012 16:17:31 -0600 Subject: [Numpy-discussion] [EXTERNAL] Numpy + SWIG In-Reply-To: <193A261E-E635-4EF9-8DA9-8C2084C687BB@gmail.com> References: <193A261E-E635-4EF9-8DA9-8C2084C687BB@gmail.com> Message-ID: <4A5A0F11-30C0-49C8-BC89-D25F7035A5B0@sandia.gov> Gideon, For these use cases, you will need to write short wrapper functions yourself. In the online docs, http://docs.scipy.org/doc/numpy/reference/swig.interface-file.html in the section entitled "Beyond the Provided Typemaps", subsection "A Common Example", there is an example of how to do this for a similar, but subtly different use case. This example looks more like your second problem than your first, but you tackle the first problem using the same technique. If you have trouble getting something to work, feel free to contact me off-list. -Bill On Jun 4, 2012, at 3:00 PM, Gideon Simpson wrote: > There are two types of swig problems that I was hoping to get some help with. First, suppose I have some C function > > void f(double *x, int nx, double *y, int ny); > > where we input one array, and we output another array, both of which should be the same size. > > I have used in my .i file: > %apply(double *IN_ARRAY1, int DIM1){(double *x, int nx)} > %apply(double *ARGOUT_ARRAY1, int DIM1){(double *y, int ny)} > > and this produces a workable function. However, it expects, as the functions second argument, the length of the array x. Now, it's easy enough to call: > module.f(x, x.shape[0]) > > but is there a way to automatically get it to use the length of the array? > > The second problem I have is for a function of the fomr > > void g(double *x, int nx, double *y, int ny, double *z, int nz); > > which evaluates some function g at all (x,y) pairs. The the thing is that nx and ny need not be the same size, but nz should be nx * ny. I'd like to wrap this too, and ideally it would also automatically handle the array lengths, but I'd be happy to have anything right now. I'm also quite comfortable with the idea of packing z as a column array and reshaping it as necessary. > > > -gideon > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** From ben.root at ou.edu Mon Jun 4 20:17:13 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 4 Jun 2012 20:17:13 -0400 Subject: [Numpy-discussion] 1D array sorting ascending and descending by fields In-Reply-To: References: Message-ID: On Monday, June 4, 2012, Chris Barker wrote: > On Mon, Jun 4, 2012 at 11:10 AM, Patrick Redmond > > wrote: > > Here's how I sorted primarily by field 'a' descending and secondarily by > > field 'b' ascending: > > could you multiply the numeric field by -1, sort, then put it back -- > somethign like: > > data *- -1 > data_sorted = np.sort(data, order=['a','b']) > data_sorted *= -1 > > (reverse if necessary -- I lost track...) > > -Chris While that may work for this users case, that would not work for all dtypes. Some, such as timedelta, datetime and strings would not be able to be multiplied by a number. Would be an interesting feature to add, but I am not certain if the negative sign notation would be best. 
Is it possible for a named field to start with a negative sign? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Jun 5 00:30:37 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 4 Jun 2012 23:30:37 -0500 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: <4FCD24CC.7090305@astro.uio.no> References: <4FCD24CC.7090305@astro.uio.no> Message-ID: <3B22F9A0-BA05-4585-A992-89EBD590BB41@continuum.io> Can you raise an issue on the Github issue tracker for NumPy? These issues will be looked at more closely. This kind of change should not have made it in to the release. Given the lack of availability of time from enough experts in NumPy, this is the sort of thing that can happen. I was not able to guide development of NumPy appropriately at my old job. That's a big reason I left. I still have more to do than just guide NumPy now, but making sure NumPy is maintained is a big part of what I am doing and why both NumFOCUS and Continuum Analytics exist. I am very hopeful that we can avoid this sort of regression in the future. More tests will help. I think it's important to note that there are many people who will be in the same boat of upgrading to 1.6 over the coming year and there are going to be other little issues like this we will need to address. -Travis On Jun 4, 2012, at 4:12 PM, Dag Sverre Seljebotn wrote: > On 06/04/2012 09:06 PM, Mike Hansen wrote: >> On Mon, May 28, 2012 at 3:15 AM, Mike Hansen wrote: >>> In trying to upgrade NumPy within Sage, we notices some differences in >>> behavior between 1.5 and 1.6. In particular, in 1.5, we have >>> >>> sage: f = 0.5 >>> sage: f.__array_interface__ >>> {'typestr': '=f8'} >>> sage: numpy.array(f) >>> array(0.5) >>> sage: numpy.array(float(f)) >>> array(0.5) >>> >>> In 1.6, we get the following, >>> >>> sage: f = 0.5 >>> sage: f.__array_interface__ >>> {'typestr': '=f8'} >>> sage: numpy.array(f) >>> array(0.500000000000000, dtype=object) >>> >>> This seems to be do to the changes in PyArray_FromAny introduced in >>> https://github.com/mwhansen/numpy/commit/2635398db3f26529ce2aaea4028a8118844f3c48 >>> . In particular, _array_find_type used to be used to query our >>> __array_interface__ attribute, and it no longer seems to work. Is >>> there a way to get the old behavior with the current code? > > No idea. If you want to spend the time to fix this properly, you could > implement PEP 3118 and use that instead to export your array data (which > can be done from Cython using __getbuffer__ on a Cython class). > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue Jun 5 00:39:40 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 4 Jun 2012 23:39:40 -0500 Subject: [Numpy-discussion] some typestrings not recognized anymore In-Reply-To: References: Message-ID: Using the 'h2' is redundant, but it should not have been changed so quickly. I could see raising a deprecation warning and communicating the correct spelling ('i2'). 
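In rough Python form the idea would be something like this -- purely illustrative, since the real check sits in the C dtype-parsing code and the helper name here is made up:

import warnings
import numpy as np

def parse_sized_typecode(typecode, size):
    # 'h2' keeps working for now, but users get pointed at the canonical 'i2'
    if typecode == 'h' and size == 2:
        warnings.warn("'h2' is redundant; use 'i2' (or plain 'h') instead",
                      DeprecationWarning, stacklevel=2)
        return np.dtype(np.int16)
    raise TypeError("data type '%s%d' not understood" % (typecode, size))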
-Travis On Jun 3, 2012, at 6:45 PM, Benjamin Root wrote: > > > On Sunday, June 3, 2012, Ralf Gommers wrote: > > > On Sun, Jun 3, 2012 at 4:49 PM, Nathaniel Smith wrote: > On Sun, Jun 3, 2012 at 3:28 PM, Ralf Gommers > wrote: > > Hi, > > > > Just ran into this: > > > >>>> np.__version__ > > '1.5.1' > >>>> np.empty((1,), dtype='>h2') # works in 1.6.2 too > > array([0], dtype=int16) > > > >>>> np.__version__ > > '1.7.0.dev-fd78546' > >>>> np.empty((1,), dtype='>h2') > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: data type ">h2" not understood > > For reference the problem seems to be that in 1.6 and earlier, "h" > plus a number was allowed, and the number was ignored: > > >>> np.__version__ > '1.5.1' > >>> np.dtype("h2") > dtype('int16') > >>> np.dtype("h4") > dtype('int16') > >>> np.dtype("h100") > dtype('int16') > > In current master, the number is disallowed -- all of those give > TypeErrors. Presumably because "h" already means the same as "i2", so > adding a second number on their is weird. > > Other typecodes with an "intrinsic size" seem to have the same problem > -- "q", "l", etc. > > Obviously "h2" should be allowed in 1.7, seeing as disallowing it > breaks scipy. And the behavior for "h100" is clearly broken and should > be disallowed in the long run. So I guess we need to do two things: > > 1) Re-enable the use of typecode + size specifier even in cases where > the typcode has an intrinsic size > 2) Issue a deprecation warning for cases where the intrinsic size and > the specified size don't match (like "h100"), and then turn that into > an error in 1.8. > > Does that sound correct? > > Seems correct as far as I can tell. Your approach to fixing the issue sounds good. > > I guess the other option would be to > deprecate *all* use of size specifiers with these typecodes (i.e., > deprecate "h2" as well, where the size specifier is merely redundant), > but I'm not sure removing that feature is really worth it. > > Either way would be OK I think. Using "h2" is redundant, but I can see how someone could prefer writing it like that for clarity. It's not like 'h' --> np.int16 is obvious. > > Ralf > > > Also, we still need the number for some type codes such as 'a' to indicate the length of the string. I like the first solution much better. > > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From mhansen at gmail.com Tue Jun 5 00:41:15 2012 From: mhansen at gmail.com (Mike Hansen) Date: Mon, 4 Jun 2012 21:41:15 -0700 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: <3B22F9A0-BA05-4585-A992-89EBD590BB41@continuum.io> References: <4FCD24CC.7090305@astro.uio.no> <3B22F9A0-BA05-4585-A992-89EBD590BB41@continuum.io> Message-ID: On Mon, Jun 4, 2012 at 9:30 PM, Travis Oliphant wrote: > Can you raise an issue on the Github issue tracker for NumPy? ? These issues will be looked at more closely. ? This kind of change should not have made it in to the release. Thanks Travis! I've made this https://github.com/numpy/numpy/issues/291 --Mike From thouis.jones at curie.fr Tue Jun 5 06:15:56 2012 From: thouis.jones at curie.fr (Thouis Jones) Date: Tue, 5 Jun 2012 12:15:56 +0200 Subject: [Numpy-discussion] better error message possible? 
In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> Message-ID: On Mon, Jun 4, 2012 at 11:49 PM, Nathaniel Smith wrote: > On Mon, Jun 4, 2012 at 10:00 PM, Thouis (Ray) Jones wrote: >> On Mon, Jun 4, 2012 at 4:27 PM, Thouis (Ray) Jones wrote: >>> I could look into this. ?There are only ~10 places the code generates >>> this error, so it should be a pretty minor change. >> >> My initial estimate was low, but not overly so. ?An initial pass at >> adding index/dimension information to IndexErrors is here: >> https://github.com/thouis/numpy/tree/index_error_info > > Fabulous! I made a few comments there, but also: > >> A typical result: >> >>>>> numpy.zeros(3)[5] >> Traceback (most recent call last): >> ?File "", line 1, in >> IndexError: index 5 out of bounds in dimension 0 > > I would say "for", not "in". > > "index 5" is a bit ambiguous too... people might mis-read it as the > dimension, like, "the 5th index value I gave"? Not sure how to make it > unambiguous. Maybe: > > "IndexError: dimension 0 index out of bounds: got 5, size is 3" > > ? How about: IndexError: 5 is out of bounds for dimension 0: must be in [-3, 3). to be maximally explicit about what values are allowed, and avoid the "index" confusion. Ray Jones From njs at pobox.com Tue Jun 5 06:40:44 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 11:40:44 +0100 Subject: [Numpy-discussion] nditer_buffer_flag branch (was: Add data memory allocation tracing facilities. (#284)) Message-ID: On Tue, Jun 5, 2012 at 11:06 AM, Thouis (Ray) Jones wrote: > All of the failing tests seem to have been caused by the buffer copy bug, fixed in ?https://github.com/mwiebe/numpy/tree/nditer_buffer_flag (but not yet pulled into numpy). > > I also have a version that implements tracing, with pure C in the allocation functions writing to a dynamically allocated buffer, which must then be fetched proactively by Python. ?However, I think this version is a little nicer to use from the Python perspective. > > --- > Reply to this email directly or view it on GitHub: > https://github.com/numpy/numpy/pull/284#issuecomment-6121817 Speaking of which, Mark - what's the status of that nditer_buffer_flag branch? Should there be a pull request? -N From markflorisson88 at gmail.com Tue Jun 5 07:55:13 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 12:55:13 +0100 Subject: [Numpy-discussion] lazy evaluation Message-ID: Hey, Another discussion on lazy evaluation, given the recent activity here: https://github.com/ContinuumIO/numba/pull/6#issuecomment-6117091 A somewhat recent previous thread can be found here: http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060862.html , and a NEP here: https://github.com/numpy/numpy/blob/master/doc/neps/deferred-ufunc-evaluation.rst I think trying to parse bytecode and build an expression graph for array expressions from that has disadvantages and is harder in general. For instance it won't be able to deal with branching at execution time, and things like inter-procedural analysis will be harder (not to mention you'd have to parse dtype creation). Instead, what you really want to do is hook into a lazy evaluating version of numpy, and generate your own code from the operations it records. It would be great if we implement the NEP listed above, but with a few extensions. I think Numpy should handle the lazy evaluation part, and determine when expressions should be evaluated, etc. 
However, for each user operation, Numpy will call back a user-installed hook implementing some interface, to allow various packages to provide their own hooks to evaluate vector operations however they want. This will include packages such as Theano, which could run things on the GPU, Numexpr, and in the future https://github.com/markflorisson88/minivect (which will likely have an LLVM backend in the future, and possibly integrated with Numba to allow inlining of numba ufuncs). The project above tries to bring together all the different array expression compilers together in a single framework, to provide efficient array expressions specialized for any data layout (nditer on steroids if you will, with SIMD, threaded and inlining capabilities). We could allow each hook to specify which dtypes it supports, and a minimal data size needed before it should be invoked (to avoid overhead for small arrays, like the openmp 'if' clause). If an operation is not supported, it will simply raise NotImplementedError, which means Numpy will evaluate the expression built so far and run its own implementation, resulting in a non-lazy array. E.g. if a library supports adding things together, but doesn't support the 'sin' function, np.sin(a + b) will result in the library executing a + b, and numpy evaluating sin on the result. So the idea is that the numpy lazy array will wrap an expression graph, which is built when the user performs operations and evaluated when needed (when a result is required or when someone tells numpy to evaluate all lazy arrays). Numpy will simply use the first hook willing to operate on data of the specified size and dtype, and will keep using that hook to build the expression until evaluated. Anyway, this is somewhat of a high-level overview. If there is any interest, we can flesh out the details and extend the NEP. Mark From thouis.jones at curie.fr Tue Jun 5 08:41:09 2012 From: thouis.jones at curie.fr (Thouis Jones) Date: Tue, 5 Jun 2012 14:41:09 +0200 Subject: [Numpy-discussion] better error message possible? In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> Message-ID: On Tue, Jun 5, 2012 at 12:15 PM, Thouis Jones wrote: > On Mon, Jun 4, 2012 at 11:49 PM, Nathaniel Smith wrote: >> On Mon, Jun 4, 2012 at 10:00 PM, Thouis (Ray) Jones wrote: >>> On Mon, Jun 4, 2012 at 4:27 PM, Thouis (Ray) Jones wrote: >>>> I could look into this. ?There are only ~10 places the code generates >>>> this error, so it should be a pretty minor change. >>> >>> My initial estimate was low, but not overly so. ?An initial pass at >>> adding index/dimension information to IndexErrors is here: >>> https://github.com/thouis/numpy/tree/index_error_info >> >> Fabulous! I made a few comments there, but also: >> >>> A typical result: >>> >>>>>> numpy.zeros(3)[5] >>> Traceback (most recent call last): >>> ?File "", line 1, in >>> IndexError: index 5 out of bounds in dimension 0 >> >> I would say "for", not "in". >> >> "index 5" is a bit ambiguous too... people might mis-read it as the >> dimension, like, "the 5th index value I gave"? Not sure how to make it >> unambiguous. Maybe: >> >> "IndexError: dimension 0 index out of bounds: got 5, size is 3" >> >> ? > > How about: > IndexError: 5 is out of bounds for dimension 0: must be in [-3, 3). > > to be maximally explicit about what values are allowed, and avoid the > "index" confusion. Or perhaps "axis" instead of "dimension", since this is how they are referred to in most numpy argument lists. 
From ndbecker2 at gmail.com Tue Jun 5 09:54:22 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 05 Jun 2012 09:54:22 -0400 Subject: [Numpy-discussion] varargs for logical_or, etc Message-ID: I think it's unfortunate that functions like logical_or are limited to binary. As a workaround, I've been using this: def apply_binary (func, *args): if len (args) == 1: return args[0] elif len (args) == 2: return func (*args) else: return func ( apply_binary (func, *args[:len(args)/2]), apply_binary (func, *args[(len(args))/2:])) Then for example: punc2 = np.logical_and (u % 5 == 4, apply_binary (np.logical_or, u/5 == 3, u/5 == 8, u/5 == 13)) From njs at pobox.com Tue Jun 5 09:58:18 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 14:58:18 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 12:55 PM, mark florisson wrote: > It would be great if we implement the NEP listed above, but with a few > extensions. I think Numpy should handle the lazy evaluation part, and > determine when expressions should be evaluated, etc. However, for each > user operation, Numpy will call back a user-installed hook > implementing some interface, to allow various packages to provide > their own hooks to evaluate vector operations however they want. This > will include packages such as Theano, which could run things on the > GPU, Numexpr, and in the future > https://github.com/markflorisson88/minivect (which will likely have an > LLVM backend in the future, and possibly integrated with Numba to > allow inlining of numba ufuncs). The project above tries to bring > together all the different array expression compilers together in a > single framework, to provide efficient array expressions specialized > for any data layout (nditer on steroids if you will, with SIMD, > threaded and inlining capabilities). A global hook sounds ugly and hard to control -- it's hard to tell which operations should be deferred and which should be forced, etc. While it would be less magical, I think a more explicit API would in the end be easier to use... something like a, b, c, d = deferred([a, b, c, d]) e = a + b * c # 'e' is a deferred object too f = np.dot(e, d) # so is 'f' g = force(f) # 'g' is an ndarray # or force(f, out=g) But at that point, this could easily be an external library, right? All we'd need from numpy would be some way for external types to override the evaluation of ufuncs, np.dot, etc.? We've recently seen several reasons to want that functionality, and it seems like developing these "improved numexpr" ideas would be much easier if they didn't require doing deep surgery to numpy itself... -N From plredmond at gmail.com Tue Jun 5 10:26:37 2012 From: plredmond at gmail.com (Patrick Redmond) Date: Tue, 5 Jun 2012 10:26:37 -0400 Subject: [Numpy-discussion] 1D array sorting ascending and descending by fields In-Reply-To: References: Message-ID: On Mon, Jun 4, 2012 at 6:08 PM, Chris Barker wrote: > could you multiply the numeric field by -1, sort, then put it back > Yeah, that works great for my situation. Thanks Chris! On Mon, Jun 4, 2012 at 8:17 PM, Benjamin Root wrote: > While that may work for this users case, that would not work for all dtypes. > Some, such as timedelta, datetime and strings would not be able to be > multiplied by a number. > This is the reason why I thought there might be such a feature. > Would be an interesting feature to add, but I am not certain if the negative > sign notation would be best. 
Is it possible for a named field to start with > a negative sign? > I'm not sure about what is allowable in names, but I would be interested in getting involved with the NumPy project by helping to add this feature. I'll check out the contributing doc. From robert.kern at gmail.com Tue Jun 5 10:37:56 2012 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 5 Jun 2012 15:37:56 +0100 Subject: [Numpy-discussion] varargs for logical_or, etc In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 2:54 PM, Neal Becker wrote: > I think it's unfortunate that functions like logical_or are limited to binary. > > As a workaround, I've been using this: > > def apply_binary (func, *args): > ? ?if len (args) == 1: > ? ? ? ?return args[0] > ? ?elif len (args) == 2: > ? ? ? ?return func (*args) > ? ?else: > ? ? ? ?return func ( > ? ? ? ? ? ?apply_binary (func, *args[:len(args)/2]), > ? ? ? ? ? ?apply_binary (func, *args[(len(args))/2:])) > > Then for example: > > punc2 = np.logical_and (u % 5 == 4, > ? ? ? ? ? ? ? ? ? ? ? apply_binary (np.logical_or, u/5 == 3, u/5 == 8, u/5 == > 13)) reduce(np.logical_and, args) -- Robert Kern From njs at pobox.com Tue Jun 5 10:49:44 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 15:49:44 +0100 Subject: [Numpy-discussion] 1D array sorting ascending and descending by fields In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 1:17 AM, Benjamin Root wrote: > > > On Monday, June 4, 2012, Chris Barker wrote: >> >> On Mon, Jun 4, 2012 at 11:10 AM, Patrick Redmond >> wrote: >> > Here's how I sorted primarily by field 'a' descending and secondarily by >> > field 'b' ascending: >> >> could you multiply the numeric field by -1, sort, then put it back -- >> somethign like: >> >> data *- -1 >> data_sorted = np.sort(data, order=['a','b']) >> data_sorted *= -1 >> >> (reverse if necessary -- I lost track...) >> >> -Chris > > > > While that may work for this users case, that would not work for all dtypes. > Some, such as timedelta, datetime and strings would not be able to be > multiplied by a number. > > Would be an interesting feature to add, but I am not certain if the negative > sign notation would be best. Is it possible for a named field to start with > a negative sign? Maybe add a reverse= argument (named after the corresponding argument to list.sort and __builtins__.sorted). # sorts in descending order, no fields required np.sort([10, 20, 0], reverse=True) # sorts in descending order np.sort(rec_array, order=("a", "b"), reverse=True) # ascending by "a" then descending by "b" np.sort(rec_array, order=("a", "b"), reverse=(False, True)) ? -n From mwwiebe at gmail.com Tue Jun 5 11:01:01 2012 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 5 Jun 2012 10:01:01 -0500 Subject: [Numpy-discussion] nditer_buffer_flag branch (was: Add data memory allocation tracing facilities. (#284)) In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 5:40 AM, Nathaniel Smith wrote: > On Tue, Jun 5, 2012 at 11:06 AM, Thouis (Ray) Jones wrote: > > All of the failing tests seem to have been caused by the buffer copy > bug, fixed in https://github.com/mwiebe/numpy/tree/nditer_buffer_flag(but not yet pulled into numpy). > > > > I also have a version that implements tracing, with pure C in the > allocation functions writing to a dynamically allocated buffer, which must > then be fetched proactively by Python. However, I think this version is a > little nicer to use from the Python perspective. 
> > > > --- > > Reply to this email directly or view it on GitHub: > > https://github.com/numpy/numpy/pull/284#issuecomment-6121817 > > Speaking of which, Mark - what's the status of that nditer_buffer_flag > branch? Should there be a pull request? > Thanks for the nudge, I've made a PR. -Mark > > -N > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From markflorisson88 at gmail.com Tue Jun 5 11:12:46 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 16:12:46 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On 5 June 2012 14:58, Nathaniel Smith wrote: > On Tue, Jun 5, 2012 at 12:55 PM, mark florisson > wrote: >> It would be great if we implement the NEP listed above, but with a few >> extensions. I think Numpy should handle the lazy evaluation part, and >> determine when expressions should be evaluated, etc. However, for each >> user operation, Numpy will call back a user-installed hook >> implementing some interface, to allow various packages to provide >> their own hooks to evaluate vector operations however they want. This >> will include packages such as Theano, which could run things on the >> GPU, Numexpr, and in the future >> https://github.com/markflorisson88/minivect (which will likely have an >> LLVM backend in the future, and possibly integrated with Numba to >> allow inlining of numba ufuncs). The project above tries to bring >> together all the different array expression compilers together in a >> single framework, to provide efficient array expressions specialized >> for any data layout (nditer on steroids if you will, with SIMD, >> threaded and inlining capabilities). > > A global hook sounds ugly and hard to control -- it's hard to tell > which operations should be deferred and which should be forced, etc. Yes, but for the user the difference should not be visible (unless operations can raise exceptions, in which case you choose the safe path, or let the user configure what to do). > While it would be less magical, I think a more explicit API would in > the end be easier to use... something like > > ?a, b, c, d = deferred([a, b, c, d]) > ?e = a + b * c ?# 'e' is a deferred object too > ?f = np.dot(e, d) ?# so is 'f' > ?g = force(f) ?# 'g' is an ndarray > ?# or > ?force(f, out=g) > > But at that point, this could easily be an external library, right? > All we'd need from numpy would be some way for external types to > override the evaluation of ufuncs, np.dot, etc.? We've recently seen > several reasons to want that functionality, and it seems like > developing these "improved numexpr" ideas would be much easier if they > didn't require doing deep surgery to numpy itself... Definitely, but besides monkey-patch-chaining I think some modifications would be required, but they would be reasonably simple. Most of the functionality would be handled in one function, which most ufuncs (the ones you care about, as well as ufunc (methods) like add) call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; , which is inserted after argument unpacking and sanity checking. You could also do a per-module hook, and have the function look at sys._getframe(1).f_globals, but that is fragile and won't work from C or Cython code. How did you have overrides in mind? 
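To be a bit more concrete, the per-backend hook I have in mind would be something like this (untested sketch, all names hypothetical):

import numpy as np

class LazyEvalBackend(object):
    # dtypes this backend can evaluate, plus a minimum number of elements
    # before deferring is worth the overhead (like an OpenMP 'if' clause)
    supported_dtypes = (np.float32, np.float64)
    minimum_size = 1024

    def handle(self, opname, operands, out=None):
        # build and return a node in the backend's expression graph, or
        # raise NotImplementedError so numpy evaluates the expression built
        # so far and falls back to its own loops for this operation
        raise NotImplementedError

Numpy would try the registered backends in order and keep using the first one that accepts the operation until the result actually has to be evaluated.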
I also found this thread: http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html , but I think you want more than just to override ufuncs, you want numpy to govern when stuff is allowed to be lazy and when stuff should be evaluated (e.g. when it is indexed, slice assigned (although that itself may also be lazy), etc). You don't want some funny object back that doesn't work with things which are not overridden in numpy. > -N > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue Jun 5 11:19:35 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jun 2012 09:19:35 -0600 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Sun, Jun 3, 2012 at 12:04 PM, Ralf Gommers wrote: > > > On Sun, Jun 3, 2012 at 6:43 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> Numpy is approaching a time of transition. Ralf will be concentrating his >> efforts on Scipy > > > I'll write a separate post on that asap. > > >> and I will be cutting back on my work on Numpy. > > > I sincerely hope you don't cut back on your work too much Charles. You > have done an excellent job as "chief maintainer" over the last years. > > The 1.7 release looks to be delayed and I suspect that the Continuum >> Analytics folks will become increasingly dedicated to the big data push. We >> need new people to carry things forward and I think Nathaniel can pick up >> part of the load. >> > > Assuming he wants them, I am definitely +1 on giving Nathaniel commit > rights. His recent patches and debugging of issues were of high quality and > very helpful. > > OK, I went ahead and added him whether he wants it or not ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Jun 5 11:34:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 16:34:41 +0100 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: <4FCD24CC.7090305@astro.uio.no> References: <4FCD24CC.7090305@astro.uio.no> Message-ID: On Mon, Jun 4, 2012 at 10:12 PM, Dag Sverre Seljebotn wrote: > On 06/04/2012 09:06 PM, Mike Hansen wrote: >> On Mon, May 28, 2012 at 3:15 AM, Mike Hansen ?wrote: >>> In trying to upgrade NumPy within Sage, we notices some differences in >>> behavior between 1.5 and 1.6. ?In particular, in 1.5, we have >>> >>> sage: f = 0.5 >>> sage: f.__array_interface__ >>> {'typestr': '=f8'} >>> sage: numpy.array(f) >>> array(0.5) >>> sage: numpy.array(float(f)) >>> array(0.5) >>> >>> In 1.6, we get the following, >>> >>> sage: f = 0.5 >>> sage: f.__array_interface__ >>> {'typestr': '=f8'} >>> sage: numpy.array(f) >>> array(0.500000000000000, dtype=object) >>> >>> This seems to be do to the changes in PyArray_FromAny introduced in >>> https://github.com/mwhansen/numpy/commit/2635398db3f26529ce2aaea4028a8118844f3c48 >>> . ?In particular, _array_find_type used to be used to query our >>> __array_interface__ attribute, and it no longer seems to work. ?Is >>> there a way to get the old behavior with the current code? > > No idea. If you want to spend the time to fix this properly, you could > implement PEP 3118 and use that instead to export your array data (which > can be done from Cython using __getbuffer__ on a Cython class). 
I don't think that would work, because looking more closely, I don't think they're actually doing anything like what __array_interface__/PEP3118 are designed for. They just have some custom class ("sage.rings.real_mpfr.RealLiteral", I guess an arbitrary precision floating point of some sort?), and they want instances that are passed to np.array() to be automatically coerced to another type (float64) by default. But there's no buffer sharing or anything like that going on at all. Mike, does that sound right? This automagic coercion seems... in very dubious taste to me. (Why does creating an array object imply that you want to throw away precision? You can already throw away precision explicitly by doing np.array(f, dtype=float).) But if this automatic coercion feature is useful, then wouldn't it be better to have a different interface instead of kluging it into __array_interface__, like we should check for an attribute called __numpy_preferred_dtype__ or something? -n From njs at pobox.com Tue Jun 5 12:25:00 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 17:25:00 +0100 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 4:19 PM, Charles R Harris wrote: > > > On Sun, Jun 3, 2012 at 12:04 PM, Ralf Gommers > wrote: >> >> >> >> On Sun, Jun 3, 2012 at 6:43 PM, Charles R Harris >> wrote: >>> >>> Hi All, >>> >>> Numpy is approaching a time of transition. Ralf will be concentrating his >>> efforts on Scipy >> >> >> I'll write a separate post on that asap. >> >>> >>> and I will be cutting back on my work on Numpy. >> >> >> I sincerely hope you don't cut back on your work too much Charles. You >> have done an excellent job as "chief maintainer" over the last years. >> >>> The 1.7 release looks to be delayed and I suspect that the Continuum >>> Analytics folks will become increasingly dedicated to the big data push. We >>> need new people to carry things forward and I think Nathaniel can pick up >>> part of the load. >> >> >> Assuming he wants them, I am definitely +1 on giving Nathaniel commit >> rights. His recent patches and debugging of issues were of high quality and >> very helpful. >> > > OK, I went ahead and added him whether he wants it or not ;) Hah. Thanks! Is there a "committers guide" anywhere? By default I would assume that the rules are pretty much -- continue sending pull requests for my own changes (unless a trivial typo fix in a comment or something), go ahead and merge anyone else's pull request where things seem okay and my best judgement is we have consensus, fix things if my judgement was wrong? But I don't want to step on any toes... -n From njs at pobox.com Tue Jun 5 12:38:20 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 17:38:20 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 4:12 PM, mark florisson wrote: > On 5 June 2012 14:58, Nathaniel Smith wrote: >> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >> wrote: >>> It would be great if we implement the NEP listed above, but with a few >>> extensions. I think Numpy should handle the lazy evaluation part, and >>> determine when expressions should be evaluated, etc. However, for each >>> user operation, Numpy will call back a user-installed hook >>> implementing some interface, to allow various packages to provide >>> their own hooks to evaluate vector operations however they want. 
This >>> will include packages such as Theano, which could run things on the >>> GPU, Numexpr, and in the future >>> https://github.com/markflorisson88/minivect (which will likely have an >>> LLVM backend in the future, and possibly integrated with Numba to >>> allow inlining of numba ufuncs). The project above tries to bring >>> together all the different array expression compilers together in a >>> single framework, to provide efficient array expressions specialized >>> for any data layout (nditer on steroids if you will, with SIMD, >>> threaded and inlining capabilities). >> >> A global hook sounds ugly and hard to control -- it's hard to tell >> which operations should be deferred and which should be forced, etc. > > Yes, but for the user the difference should not be visible (unless > operations can raise exceptions, in which case you choose the safe > path, or let the user configure what to do). > >> While it would be less magical, I think a more explicit API would in >> the end be easier to use... something like >> >> ?a, b, c, d = deferred([a, b, c, d]) >> ?e = a + b * c ?# 'e' is a deferred object too >> ?f = np.dot(e, d) ?# so is 'f' >> ?g = force(f) ?# 'g' is an ndarray >> ?# or >> ?force(f, out=g) >> >> But at that point, this could easily be an external library, right? >> All we'd need from numpy would be some way for external types to >> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >> several reasons to want that functionality, and it seems like >> developing these "improved numexpr" ideas would be much easier if they >> didn't require doing deep surgery to numpy itself... > > Definitely, but besides monkey-patch-chaining I think some > modifications would be required, but they would be reasonably simple. > Most of the functionality would be handled in one function, which most > ufuncs (the ones you care about, as well as ufunc (methods) like add) > call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; > , which is inserted after argument unpacking and sanity checking. You > could also do a per-module hook, and have the function look at > sys._getframe(1).f_globals, but that is fragile and won't work from C > or Cython code. > > How did you have overrides in mind? My vague idea is that core numpy operations are about as fundamental for scientific users as the Python builtin operations are, so they should probably be overrideable in a similar way. So we'd teach numpy functions to check for methods named like "__numpy_ufunc__" or "__numpy_dot__" and let themselves be overridden if found. Like how __gt__ and __add__ and stuff work. Or something along those lines. > I also found this thread: > http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html > , but I think you want more than just to override ufuncs, you want > numpy to govern when stuff is allowed to be lazy and when stuff should > be evaluated (e.g. when it is indexed, slice assigned (although that > itself may also be lazy), etc). You don't want some funny object back > that doesn't work with things which are not overridden in numpy. My point is that probably numpy should *not* govern the decision about what stuff should be lazy and what should be evaluated; that should be governed by some combination of the user and Numba/Theano/minivect/whatever. The toy API I sketched out would make those decisions obvious and explicit. 
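In code the check could be as small as something like this (rough and untested, with the method name invented just for illustration):

def maybe_delegate(ufunc, *args):
    # mirror how Python's binary operators defer to __add__ and friends:
    # any argument that knows about deferred evaluation gets first shot
    for arg in args:
        handler = getattr(arg, "__numpy_ufunc__", None)
        if handler is not None:
            return handler(ufunc, *args)
    return NotImplemented  # fall through to the normal C loops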
(And if the funny objects had an __array_interface__ attribute that automatically forced evaluation when accessed, then they'd work fine with code that was expecting an array, or if they were assigned to a "real" ndarray, etc.) -n From ben.root at ou.edu Tue Jun 5 12:43:28 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 5 Jun 2012 12:43:28 -0400 Subject: [Numpy-discussion] 1D array sorting ascending and descending by fields In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 10:49 AM, Nathaniel Smith wrote: > On Tue, Jun 5, 2012 at 1:17 AM, Benjamin Root wrote: > > > > > > On Monday, June 4, 2012, Chris Barker wrote: > >> > >> On Mon, Jun 4, 2012 at 11:10 AM, Patrick Redmond > >> wrote: > >> > Here's how I sorted primarily by field 'a' descending and secondarily > by > >> > field 'b' ascending: > >> > >> could you multiply the numeric field by -1, sort, then put it back -- > >> somethign like: > >> > >> data *- -1 > >> data_sorted = np.sort(data, order=['a','b']) > >> data_sorted *= -1 > >> > >> (reverse if necessary -- I lost track...) > >> > >> -Chris > > > > > > > > While that may work for this users case, that would not work for all > dtypes. > > Some, such as timedelta, datetime and strings would not be able to be > > multiplied by a number. > > > > Would be an interesting feature to add, but I am not certain if the > negative > > sign notation would be best. Is it possible for a named field to start > with > > a negative sign? > > Maybe add a reverse= argument (named after the corresponding argument > to list.sort and __builtins__.sorted). > > # sorts in descending order, no fields required > np.sort([10, 20, 0], reverse=True) > # sorts in descending order > np.sort(rec_array, order=("a", "b"), reverse=True) > # ascending by "a" then descending by "b" > np.sort(rec_array, order=("a", "b"), reverse=(False, True)) > > ? > > -n > Clear, unambiguous, and works with the existing framework. +1 Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Jun 5 12:59:10 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 5 Jun 2012 12:59:10 -0400 Subject: [Numpy-discussion] varargs for logical_or, etc In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 10:37 AM, Robert Kern wrote: > On Tue, Jun 5, 2012 at 2:54 PM, Neal Becker wrote: > > I think it's unfortunate that functions like logical_or are limited to > binary. > > > > As a workaround, I've been using this: > > > > def apply_binary (func, *args): > > if len (args) == 1: > > return args[0] > > elif len (args) == 2: > > return func (*args) > > else: > > return func ( > > apply_binary (func, *args[:len(args)/2]), > > apply_binary (func, *args[(len(args))/2:])) > > > > Then for example: > > > > punc2 = np.logical_and (u % 5 == 4, > > apply_binary (np.logical_or, u/5 == 3, u/5 == 8, > u/5 == > > 13)) > > > reduce(np.logical_and, args) > > I would love it if we could add something like that to the doc-string of those functions because I don't think it is immediately obvious. How do we do that for ufuncs? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Jun 5 13:21:50 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 05 Jun 2012 13:21:50 -0400 Subject: [Numpy-discussion] lazy evaluation References: Message-ID: Would lazy eval be able to eliminate temps in doing operations such as: np.sum (u != 23)? 
That is, now ops involving selecting elements of matrixes are often performed by first constructing temp matrixes, and the operating on them. From mhansen at gmail.com Tue Jun 5 13:42:01 2012 From: mhansen at gmail.com (Mike Hansen) Date: Tue, 5 Jun 2012 10:42:01 -0700 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: <4FCD24CC.7090305@astro.uio.no> Message-ID: On Tue, Jun 5, 2012 at 8:34 AM, Nathaniel Smith wrote: > I don't think that would work, because looking more closely, I don't > think they're actually doing anything like what > __array_interface__/PEP3118 are designed for. They just have some > custom class ("sage.rings.real_mpfr.RealLiteral", I guess an arbitrary > precision floating point of some sort?), and they want instances that > are passed to np.array() to be automatically coerced to another type > (float64) by default. But there's no buffer sharing or anything like > that going on at all. Mike, does that sound right? Yes, there's no buffer sharing going on at all. > This automagic coercion seems... in very dubious taste to me. (Why > does creating an array object imply that you want to throw away > precision? The __array_interface__ attribute is a property which depends on the precision of the ring. If it floats have enough precision, you just get floats; otherwise you get objects. > You can already throw away precision explicitly by doing > np.array(f, dtype=float).) But if this automatic coercion feature is > useful, then wouldn't it be better to have a different interface > instead of kluging it into __array_interface__, like we should check > for an attribute called __numpy_preferred_dtype__ or something? It isn't just the array() calls which end up getting problems. For example, in 1.5.x sage: f = 10; type(f) sage: numpy.arange(f) array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) #int64 while in 1.6.x sage: numpy.arange(f) array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=object) We also see problems with calls like sage: scipy.stats.uniform(0,15).ppf([0.5,0.7]) array([ 7.5, 10.5]) which work in 1.5.x, but fail with a traceback "TypeError: array cannot be safely cast to required type" in 1.6.x. --Mike From zachary.pincus at yale.edu Tue Jun 5 13:51:50 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 5 Jun 2012 13:51:50 -0400 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: <4FCD24CC.7090305@astro.uio.no> Message-ID: <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> > It isn't just the array() calls which end up getting problems. For > example, in 1.5.x > > sage: f = 10; type(f) > > sage: numpy.arange(f) > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) #int64 > > while in 1.6.x > > sage: numpy.arange(f) > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=object) > > We also see problems with calls like > > sage: scipy.stats.uniform(0,15).ppf([0.5,0.7]) > array([ 7.5, 10.5]) > > which work in 1.5.x, but fail with a traceback "TypeError: array > cannot be safely cast to required type" in 1.6.x. I'm getting problems like this after a 1.6 upgrade as well. Lots of object arrays being created when previously there would either be an error, or an array of floats. Also, lots of the "TypeError: array cannot be safely cast to required type" are cropping up. Honestly, most of these are in places where my code was lax and so I just cleaned things up to use the right dtypes etc. But still a bit unexpected in terms of having more code to fix than I was used to for 0.X numpy revisions. 
Just another data-point, though. Not really a complaint. Zach From charlesr.harris at gmail.com Tue Jun 5 13:52:40 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jun 2012 11:52:40 -0600 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 10:25 AM, Nathaniel Smith wrote: > On Tue, Jun 5, 2012 at 4:19 PM, Charles R Harris > wrote: > > > > > > On Sun, Jun 3, 2012 at 12:04 PM, Ralf Gommers < > ralf.gommers at googlemail.com> > > wrote: > >> > >> > >> > >> On Sun, Jun 3, 2012 at 6:43 PM, Charles R Harris > >> wrote: > >>> > >>> Hi All, > >>> > >>> Numpy is approaching a time of transition. Ralf will be concentrating > his > >>> efforts on Scipy > >> > >> > >> I'll write a separate post on that asap. > >> > >>> > >>> and I will be cutting back on my work on Numpy. > >> > >> > >> I sincerely hope you don't cut back on your work too much Charles. You > >> have done an excellent job as "chief maintainer" over the last years. > >> > >>> The 1.7 release looks to be delayed and I suspect that the Continuum > >>> Analytics folks will become increasingly dedicated to the big data > push. We > >>> need new people to carry things forward and I think Nathaniel can pick > up > >>> part of the load. > >> > >> > >> Assuming he wants them, I am definitely +1 on giving Nathaniel commit > >> rights. His recent patches and debugging of issues were of high quality > and > >> very helpful. > >> > > > > OK, I went ahead and added him whether he wants it or not ;) > > Hah. Thanks! > > Is there a "committers guide" anywhere? By default I would assume that > the rules are pretty much -- continue sending pull requests for my own > changes (unless a trivial typo fix in a comment or something), go > ahead and merge anyone else's pull request where things seem okay and > my best judgement is we have consensus, fix things if my judgement was > wrong? But I don't want to step on any toes... > > You can commit your own stuff also if someone signs off on it or it seems uncontroversial and has sat there for a while. It's mostly a judgement call. For the commits themselves, the github button doesn't do fast forward or whitespace cleanup, so I have the following alias in .git/config getpatch = !sh -c 'git co -b pull-$1 master &&\ curl https://github.com/numpy/nump/pull/$1.patch|\ git am -3 --whitespace=strip' - which opens a new branch pull-nnn and is useful for the bigger commits so they can be tested and then merged with master before pushing. The non-trivial commits should be tested with at least Python 2.4, 2.7, and 3.2. I also suggest running the one-file build for changes in core since most developers do the separate file thing and sometimes fail to catch single file build problems. Keep an eye on coding style, otherwise it will drift. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From markflorisson88 at gmail.com Tue Jun 5 14:02:51 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 19:02:51 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On 5 June 2012 18:21, Neal Becker wrote: > Would lazy eval be able to eliminate temps in doing operations such as: > > np.sum (u != 23)? > > That is, now ops involving selecting elements of matrixes are often performed by > first constructing temp matrixes, and the operating on them. 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Sure, yeah, it's pretty easy to generate a loop with an if statement and a reduction. From charlesr.harris at gmail.com Tue Jun 5 14:03:48 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jun 2012 12:03:48 -0600 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 11:52 AM, Charles R Harris wrote: > > > On Tue, Jun 5, 2012 at 10:25 AM, Nathaniel Smith wrote: > >> On Tue, Jun 5, 2012 at 4:19 PM, Charles R Harris >> wrote: >> > >> > >> > On Sun, Jun 3, 2012 at 12:04 PM, Ralf Gommers < >> ralf.gommers at googlemail.com> >> > wrote: >> >> >> >> >> >> >> >> On Sun, Jun 3, 2012 at 6:43 PM, Charles R Harris >> >> wrote: >> >>> >> >>> Hi All, >> >>> >> >>> Numpy is approaching a time of transition. Ralf will be concentrating >> his >> >>> efforts on Scipy >> >> >> >> >> >> I'll write a separate post on that asap. >> >> >> >>> >> >>> and I will be cutting back on my work on Numpy. >> >> >> >> >> >> I sincerely hope you don't cut back on your work too much Charles. You >> >> have done an excellent job as "chief maintainer" over the last years. >> >> >> >>> The 1.7 release looks to be delayed and I suspect that the Continuum >> >>> Analytics folks will become increasingly dedicated to the big data >> push. We >> >>> need new people to carry things forward and I think Nathaniel can >> pick up >> >>> part of the load. >> >> >> >> >> >> Assuming he wants them, I am definitely +1 on giving Nathaniel commit >> >> rights. His recent patches and debugging of issues were of high >> quality and >> >> very helpful. >> >> >> > >> > OK, I went ahead and added him whether he wants it or not ;) >> >> Hah. Thanks! >> >> Is there a "committers guide" anywhere? By default I would assume that >> the rules are pretty much -- continue sending pull requests for my own >> changes (unless a trivial typo fix in a comment or something), go >> ahead and merge anyone else's pull request where things seem okay and >> my best judgement is we have consensus, fix things if my judgement was >> wrong? But I don't want to step on any toes... >> >> > You can commit your own stuff also if someone signs off on it or it seems > uncontroversial and has sat there for a while. It's mostly a judgement call. > > For the commits themselves, the github button doesn't do fast forward or > whitespace cleanup, so I have the following alias in .git/config > > getpatch = !sh -c 'git co -b pull-$1 master &&\ > curl https://github.com/numpy/nump/pull/$1.patch|\ > git am -3 --whitespace=strip' - > > which opens a new branch pull-nnn and is useful for the bigger commits so > they can be tested and then merged with master before pushing. The > non-trivial commits should be tested with at least Python 2.4, 2.7, and > 3.2. I also suggest running the one-file build for changes in core since > most developers do the separate file thing and sometimes fail to catch > single file build problems. > > Keep an eye on coding style, otherwise it will drift. > > And keep in mind that part of your job is to train new committers and help bring them up to speed. See yourself as a recruiter as well as a reviewer. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From markflorisson88 at gmail.com Tue Jun 5 14:08:38 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 19:08:38 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On 5 June 2012 17:38, Nathaniel Smith wrote: > On Tue, Jun 5, 2012 at 4:12 PM, mark florisson > wrote: >> On 5 June 2012 14:58, Nathaniel Smith wrote: >>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>> wrote: >>>> It would be great if we implement the NEP listed above, but with a few >>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>> determine when expressions should be evaluated, etc. However, for each >>>> user operation, Numpy will call back a user-installed hook >>>> implementing some interface, to allow various packages to provide >>>> their own hooks to evaluate vector operations however they want. This >>>> will include packages such as Theano, which could run things on the >>>> GPU, Numexpr, and in the future >>>> https://github.com/markflorisson88/minivect (which will likely have an >>>> LLVM backend in the future, and possibly integrated with Numba to >>>> allow inlining of numba ufuncs). The project above tries to bring >>>> together all the different array expression compilers together in a >>>> single framework, to provide efficient array expressions specialized >>>> for any data layout (nditer on steroids if you will, with SIMD, >>>> threaded and inlining capabilities). >>> >>> A global hook sounds ugly and hard to control -- it's hard to tell >>> which operations should be deferred and which should be forced, etc. >> >> Yes, but for the user the difference should not be visible (unless >> operations can raise exceptions, in which case you choose the safe >> path, or let the user configure what to do). >> >>> While it would be less magical, I think a more explicit API would in >>> the end be easier to use... something like >>> >>> ?a, b, c, d = deferred([a, b, c, d]) >>> ?e = a + b * c ?# 'e' is a deferred object too >>> ?f = np.dot(e, d) ?# so is 'f' >>> ?g = force(f) ?# 'g' is an ndarray >>> ?# or >>> ?force(f, out=g) >>> >>> But at that point, this could easily be an external library, right? >>> All we'd need from numpy would be some way for external types to >>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>> several reasons to want that functionality, and it seems like >>> developing these "improved numexpr" ideas would be much easier if they >>> didn't require doing deep surgery to numpy itself... >> >> Definitely, but besides monkey-patch-chaining I think some >> modifications would be required, but they would be reasonably simple. >> Most of the functionality would be handled in one function, which most >> ufuncs (the ones you care about, as well as ufunc (methods) like add) >> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >> , which is inserted after argument unpacking and sanity checking. You >> could also do a per-module hook, and have the function look at >> sys._getframe(1).f_globals, but that is fragile and won't work from C >> or Cython code. >> >> How did you have overrides in mind? > > My vague idea is that core numpy operations are about as fundamental > for scientific users as the Python builtin operations are, so they > should probably be overrideable in a similar way. So we'd teach numpy > functions to check for methods named like "__numpy_ufunc__" or > "__numpy_dot__" and let themselves be overridden if found. 
Like how > __gt__ and __add__ and stuff work. Or something along those lines. > >> I also found this thread: >> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >> , but I think you want more than just to override ufuncs, you want >> numpy to govern when stuff is allowed to be lazy and when stuff should >> be evaluated (e.g. when it is indexed, slice assigned (although that >> itself may also be lazy), etc). You don't want some funny object back >> that doesn't work with things which are not overridden in numpy. > > My point is that probably numpy should *not* govern the decision about > what stuff should be lazy and what should be evaluated; that should be > governed by some combination of the user and > Numba/Theano/minivect/whatever. The toy API I sketched out would make > those decisions obvious and explicit. (And if the funny objects had an > __array_interface__ attribute that automatically forced evaluation > when accessed, then they'd work fine with code that was expecting an > array, or if they were assigned to a "real" ndarray, etc.) That's disappointing though, since the performance drawbacks can severely limit the usefulness for people with big data sets. Ideally, you would take your intuitive numpy code, and make it go fast, without jumping through hoops. Numpypy has lazy evaluation, I don't know how good a job it does, but it does mean you can finally get fast numpy code in an intuitive way (and even run it on a GPU if that is possible and beneficial). > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue Jun 5 14:13:45 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jun 2012 12:13:45 -0600 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> References: <4FCD24CC.7090305@astro.uio.no> <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> Message-ID: On Tue, Jun 5, 2012 at 11:51 AM, Zachary Pincus wrote: > > It isn't just the array() calls which end up getting problems. For > > example, in 1.5.x > > > > sage: f = 10; type(f) > > > > sage: numpy.arange(f) > > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) #int64 > > > > while in 1.6.x > > > > sage: numpy.arange(f) > > array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=object) > > > > We also see problems with calls like > > > > sage: scipy.stats.uniform(0,15).ppf([0.5,0.7]) > > array([ 7.5, 10.5]) > > > > which work in 1.5.x, but fail with a traceback "TypeError: array > > cannot be safely cast to required type" in 1.6.x. > > I'm getting problems like this after a 1.6 upgrade as well. Lots of object > arrays being created when previously there would either be an error, or an > array of floats. > > Also, lots of the "TypeError: array cannot be safely cast to required > type" are cropping up. > > Honestly, most of these are in places where my code was lax and so I just > cleaned things up to use the right dtypes etc. But still a bit unexpected > in terms of having more code to fix than I was used to for 0.X numpy > revisions. > There is a fine line here. We do need to make people clean up lax code in order to improve numpy, but hopefully we can keep the cleanups reasonable. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
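For anyone bitten by the object-dtype examples above, the usual workaround (just a sketch of the idea, not an official fix) is to coerce a third-party integer-like object to a plain Python int, or to spell out the dtype, before handing it to numpy; 'f' below is an ordinary int standing in for something like a Sage Integer:

import numpy

f = 10                                        # stand-in for a third-party integer-like object
a = numpy.arange(int(f))                      # coerce first: native integer dtype, as under 1.5.x
b = numpy.arange(int(f), dtype=numpy.intp)    # or be explicit about the dtype as well
print(a.dtype, b.dtype)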
URL: From zachary.pincus at yale.edu Tue Jun 5 14:41:28 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 5 Jun 2012 14:41:28 -0400 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: <4FCD24CC.7090305@astro.uio.no> <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> Message-ID: <5C74C855-3949-4FDF-99DF-1411CC0330F3@yale.edu> > There is a fine line here. We do need to make people clean up lax code in order to improve numpy, but hopefully we can keep the cleanups reasonable. Oh agreed. Somehow, though, I was surprised by this, even though I keep tabs on the numpy lists -- at no point did it become clear that "big changes in how arrays get constructed and typecast are ahead that may require code fixes". That was my main point, but probably a PEBCAK issue more than anything. Zach From ralf.gommers at googlemail.com Tue Jun 5 14:47:53 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 5 Jun 2012 20:47:53 +0200 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: <5C74C855-3949-4FDF-99DF-1411CC0330F3@yale.edu> References: <4FCD24CC.7090305@astro.uio.no> <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> <5C74C855-3949-4FDF-99DF-1411CC0330F3@yale.edu> Message-ID: On Tue, Jun 5, 2012 at 8:41 PM, Zachary Pincus wrote: > > There is a fine line here. We do need to make people clean up lax code > in order to improve numpy, but hopefully we can keep the cleanups > reasonable. > > Oh agreed. Somehow, though, I was surprised by this, even though I keep > tabs on the numpy lists -- at no point did it become clear that "big > changes in how arrays get constructed and typecast are ahead that may > require code fixes". That was my main point, but probably a PEBCAK issue > more than anything. > It was fairly extensively discussed when introduced, http://thread.gmane.org/gmane.comp.python.numeric.general/44206, and again at some later point. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Jun 5 14:51:26 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 5 Jun 2012 20:51:26 +0200 Subject: [Numpy-discussion] varargs for logical_or, etc In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 6:59 PM, Benjamin Root wrote: > > > On Tue, Jun 5, 2012 at 10:37 AM, Robert Kern wrote: > >> On Tue, Jun 5, 2012 at 2:54 PM, Neal Becker wrote: >> > I think it's unfortunate that functions like logical_or are limited to >> binary. >> > >> > As a workaround, I've been using this: >> > >> > def apply_binary (func, *args): >> > if len (args) == 1: >> > return args[0] >> > elif len (args) == 2: >> > return func (*args) >> > else: >> > return func ( >> > apply_binary (func, *args[:len(args)/2]), >> > apply_binary (func, *args[(len(args))/2:])) >> > >> > Then for example: >> > >> > punc2 = np.logical_and (u % 5 == 4, >> > apply_binary (np.logical_or, u/5 == 3, u/5 == 8, >> u/5 == >> > 13)) >> >> >> reduce(np.logical_and, args) >> >> > I would love it if we could add something like that to the doc-string of > those functions because I don't think it is immediately obvious. How do we > do that for ufuncs? > Edit numpy/core/code_generators/ufunc_docstrings.py Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
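Written out, Robert's reduce suggestion applied to Neal's example looks like the following; the ufunc's own reduce method gives the same answer by stacking the conditions and reducing along axis 0 (variable names are only illustrative):

import functools
import numpy as np

u = np.arange(75)
conds = [u // 5 == 3, u // 5 == 8, u // 5 == 13]

mask1 = functools.reduce(np.logical_or, conds)   # fold the binary ufunc over the list
mask2 = np.logical_or.reduce(conds)              # or let the ufunc reduce along axis 0

assert (mask1 == mask2).all()
punc2 = np.logical_and(u % 5 == 4, mask1)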
URL: From njs at pobox.com Tue Jun 5 14:55:42 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 19:55:42 +0100 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: <4FCD24CC.7090305@astro.uio.no> <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> <5C74C855-3949-4FDF-99DF-1411CC0330F3@yale.edu> Message-ID: On Tue, Jun 5, 2012 at 7:47 PM, Ralf Gommers wrote: > > > On Tue, Jun 5, 2012 at 8:41 PM, Zachary Pincus > wrote: >> >> > There is a fine line here. We do need to make people clean up lax code >> > in order to improve numpy, but hopefully we can keep the cleanups >> > reasonable. >> >> Oh agreed. Somehow, though, I was surprised by this, even though I keep >> tabs on the numpy lists -- at no point did it become clear that "big changes >> in how arrays get constructed and typecast are ahead that may require code >> fixes". That was my main point, but probably a PEBCAK issue more than >> anything. > > > It was fairly extensively discussed when introduced, > http://thread.gmane.org/gmane.comp.python.numeric.general/44206, and again > at some later point. Those are the not-yet-finalized changes in 1.7; Zachary (I think) is talking about problems upgrading from ~1.5 to 1.6. -n From zachary.pincus at yale.edu Tue Jun 5 15:01:22 2012 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Tue, 5 Jun 2012 15:01:22 -0400 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: <4FCD24CC.7090305@astro.uio.no> <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> <5C74C855-3949-4FDF-99DF-1411CC0330F3@yale.edu> Message-ID: <78B85B5D-DB70-45E1-9A4F-C6DA47A2530F@yale.edu> >> On Tue, Jun 5, 2012 at 8:41 PM, Zachary Pincus >> wrote: >>> >>>> There is a fine line here. We do need to make people clean up lax code >>>> in order to improve numpy, but hopefully we can keep the cleanups >>>> reasonable. >>> >>> Oh agreed. Somehow, though, I was surprised by this, even though I keep >>> tabs on the numpy lists -- at no point did it become clear that "big changes >>> in how arrays get constructed and typecast are ahead that may require code >>> fixes". That was my main point, but probably a PEBCAK issue more than >>> anything. >> >> >> It was fairly extensively discussed when introduced, >> http://thread.gmane.org/gmane.comp.python.numeric.general/44206, and again >> at some later point. > > Those are the not-yet-finalized changes in 1.7; Zachary (I think) is > talking about problems upgrading from ~1.5 to 1.6. Yes, unless I'm wrong I experienced these problems from 1.5.something to 1.6.1. I didn't take notes as it was in the middle of a deadline-crunch so I just fixed the code and moved on (long, stupid story about why the upgrade before a deadline...). It's just that the issues mentioned above seem to have hit me too and I wanted to mention that. But unhelpfully, I think, without code, and now I've hijacked this thread! Sorry. 
Zach From njs at pobox.com Tue Jun 5 15:14:35 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 20:14:35 +0100 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 6:52 PM, Charles R Harris wrote: > > > On Tue, Jun 5, 2012 at 10:25 AM, Nathaniel Smith wrote: >> >> On Tue, Jun 5, 2012 at 4:19 PM, Charles R Harris >> wrote: >> > >> > >> > On Sun, Jun 3, 2012 at 12:04 PM, Ralf Gommers >> > >> > wrote: >> >> >> >> >> >> >> >> On Sun, Jun 3, 2012 at 6:43 PM, Charles R Harris >> >> wrote: >> >>> >> >>> Hi All, >> >>> >> >>> Numpy is approaching a time of transition. Ralf will be concentrating >> >>> his >> >>> efforts on Scipy >> >> >> >> >> >> I'll write a separate post on that asap. >> >> >> >>> >> >>> and I will be cutting back on my work on Numpy. >> >> >> >> >> >> I sincerely hope you don't cut back on your work too much Charles. You >> >> have done an excellent job as "chief maintainer" over the last years. >> >> >> >>> The 1.7 release looks to be delayed and I suspect that the Continuum >> >>> Analytics folks will become increasingly dedicated to the big data >> >>> push. We >> >>> need new people to carry things forward and I think Nathaniel can pick >> >>> up >> >>> part of the load. >> >> >> >> >> >> Assuming he wants them, I am definitely +1 on giving Nathaniel commit >> >> rights. His recent patches and debugging of issues were of high quality >> >> and >> >> very helpful. >> >> >> > >> > OK, I went ahead and added him whether he wants it or not ;) >> >> Hah. Thanks! >> >> Is there a "committers guide" anywhere? By default I would assume that >> the rules are pretty much -- continue sending pull requests for my own >> changes (unless a trivial typo fix in a comment or something), go >> ahead and merge anyone else's pull request where things seem okay and >> my best judgement is we have consensus, fix things if my judgement was >> wrong? But I don't want to step on any toes... >> > > You can commit your own stuff also if someone signs off on it or it seems > uncontroversial and has sat there for a while. It's mostly a judgement call. Speaking of which, this pull request has been sitting for a bit, waiting for your input :-) https://github.com/numpy/numpy/pull/280 > For the commits themselves, the github button doesn't do fast forward or > whitespace cleanup, so I have the following alias in .git/config > > getpatch = !sh -c 'git co -b pull-$1 master &&\ > ?????????? curl https://github.com/numpy/nump/pull/$1.patch|\ > ?????????? git am -3 --whitespace=strip' - > > which opens a new branch pull-nnn and is useful for the bigger commits so > they can be tested and then merged with master before pushing. The > non-trivial commits should be tested with at least Python 2.4, 2.7, and 3.2. > I also suggest running the one-file build for changes in core since most > developers do the separate file thing and sometimes fail to catch single > file build problems. Oops, heh. I don't know how to do the separate file thing, I've just been running single-file builds :-). > Keep an eye on coding style, otherwise it will drift. Thanks! 
-n From njs at pobox.com Tue Jun 5 15:17:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 20:17:38 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 7:08 PM, mark florisson wrote: > On 5 June 2012 17:38, Nathaniel Smith wrote: >> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson >> wrote: >>> On 5 June 2012 14:58, Nathaniel Smith wrote: >>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>>> wrote: >>>>> It would be great if we implement the NEP listed above, but with a few >>>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>>> determine when expressions should be evaluated, etc. However, for each >>>>> user operation, Numpy will call back a user-installed hook >>>>> implementing some interface, to allow various packages to provide >>>>> their own hooks to evaluate vector operations however they want. This >>>>> will include packages such as Theano, which could run things on the >>>>> GPU, Numexpr, and in the future >>>>> https://github.com/markflorisson88/minivect (which will likely have an >>>>> LLVM backend in the future, and possibly integrated with Numba to >>>>> allow inlining of numba ufuncs). The project above tries to bring >>>>> together all the different array expression compilers together in a >>>>> single framework, to provide efficient array expressions specialized >>>>> for any data layout (nditer on steroids if you will, with SIMD, >>>>> threaded and inlining capabilities). >>>> >>>> A global hook sounds ugly and hard to control -- it's hard to tell >>>> which operations should be deferred and which should be forced, etc. >>> >>> Yes, but for the user the difference should not be visible (unless >>> operations can raise exceptions, in which case you choose the safe >>> path, or let the user configure what to do). >>> >>>> While it would be less magical, I think a more explicit API would in >>>> the end be easier to use... something like >>>> >>>> ?a, b, c, d = deferred([a, b, c, d]) >>>> ?e = a + b * c ?# 'e' is a deferred object too >>>> ?f = np.dot(e, d) ?# so is 'f' >>>> ?g = force(f) ?# 'g' is an ndarray >>>> ?# or >>>> ?force(f, out=g) >>>> >>>> But at that point, this could easily be an external library, right? >>>> All we'd need from numpy would be some way for external types to >>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>>> several reasons to want that functionality, and it seems like >>>> developing these "improved numexpr" ideas would be much easier if they >>>> didn't require doing deep surgery to numpy itself... >>> >>> Definitely, but besides monkey-patch-chaining I think some >>> modifications would be required, but they would be reasonably simple. >>> Most of the functionality would be handled in one function, which most >>> ufuncs (the ones you care about, as well as ufunc (methods) like add) >>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >>> , which is inserted after argument unpacking and sanity checking. You >>> could also do a per-module hook, and have the function look at >>> sys._getframe(1).f_globals, but that is fragile and won't work from C >>> or Cython code. >>> >>> How did you have overrides in mind? >> >> My vague idea is that core numpy operations are about as fundamental >> for scientific users as the Python builtin operations are, so they >> should probably be overrideable in a similar way. 
So we'd teach numpy >> functions to check for methods named like "__numpy_ufunc__" or >> "__numpy_dot__" and let themselves be overridden if found. Like how >> __gt__ and __add__ and stuff work. Or something along those lines. >> >>> I also found this thread: >>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >>> , but I think you want more than just to override ufuncs, you want >>> numpy to govern when stuff is allowed to be lazy and when stuff should >>> be evaluated (e.g. when it is indexed, slice assigned (although that >>> itself may also be lazy), etc). You don't want some funny object back >>> that doesn't work with things which are not overridden in numpy. >> >> My point is that probably numpy should *not* govern the decision about >> what stuff should be lazy and what should be evaluated; that should be >> governed by some combination of the user and >> Numba/Theano/minivect/whatever. The toy API I sketched out would make >> those decisions obvious and explicit. (And if the funny objects had an >> __array_interface__ attribute that automatically forced evaluation >> when accessed, then they'd work fine with code that was expecting an >> array, or if they were assigned to a "real" ndarray, etc.) > > That's disappointing though, since the performance drawbacks can > severely limit the usefulness for people with big data sets. Ideally, > you would take your intuitive numpy code, and make it go fast, without > jumping through hoops. Numpypy has lazy evaluation, ?I don't know how > good a job it does, but it does mean you can finally get fast numpy > code in an intuitive way (and even run it on a GPU if that is possible > and beneficial). All of these proposals require the user to jump through hoops -- the deferred-ufunc NEP has the extra 'with deferredstate' thing, and more importantly, a set of rules that people have to learn and keep in mind for which numpy operations are affected, which ones aren't, which operations can't be performed while deferredstate is True, etc. So this has two problems: (1) these rules are opaque, (2) it's far from clear what the rules should be. If we (initially) implement "deferredness" in a third-party library with an explicit "deferredarray" type, then that works around both of them: it makes the rules transparent (operations using that type are deferred, operations using ndarray aren't), and gives you room to experiment with different approaches without having to first accomplish some major change in the numpy code base (and maybe get it wrong and have to change it again later). That's what I meant when I said in my first message that the more explicit API actually seemed like it would be easier for people to use in the long run. -n From charlesr.harris at gmail.com Tue Jun 5 15:35:16 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jun 2012 13:35:16 -0600 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 1:14 PM, Nathaniel Smith wrote: > On Tue, Jun 5, 2012 at 6:52 PM, Charles R Harris > wrote: > > > > > > On Tue, Jun 5, 2012 at 10:25 AM, Nathaniel Smith wrote: > >> > >> On Tue, Jun 5, 2012 at 4:19 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Sun, Jun 3, 2012 at 12:04 PM, Ralf Gommers > >> > > >> > wrote: > >> >> > >> >> > >> >> > >> >> On Sun, Jun 3, 2012 at 6:43 PM, Charles R Harris > >> >> wrote: > >> >>> > >> >>> Hi All, > >> >>> > >> >>> Numpy is approaching a time of transition. 
Ralf will be > concentrating > >> >>> his > >> >>> efforts on Scipy > >> >> > >> >> > >> >> I'll write a separate post on that asap. > >> >> > >> >>> > >> >>> and I will be cutting back on my work on Numpy. > >> >> > >> >> > >> >> I sincerely hope you don't cut back on your work too much Charles. > You > >> >> have done an excellent job as "chief maintainer" over the last years. > >> >> > >> >>> The 1.7 release looks to be delayed and I suspect that the Continuum > >> >>> Analytics folks will become increasingly dedicated to the big data > >> >>> push. We > >> >>> need new people to carry things forward and I think Nathaniel can > pick > >> >>> up > >> >>> part of the load. > >> >> > >> >> > >> >> Assuming he wants them, I am definitely +1 on giving Nathaniel commit > >> >> rights. His recent patches and debugging of issues were of high > quality > >> >> and > >> >> very helpful. > >> >> > >> > > >> > OK, I went ahead and added him whether he wants it or not ;) > >> > >> Hah. Thanks! > >> > >> Is there a "committers guide" anywhere? By default I would assume that > >> the rules are pretty much -- continue sending pull requests for my own > >> changes (unless a trivial typo fix in a comment or something), go > >> ahead and merge anyone else's pull request where things seem okay and > >> my best judgement is we have consensus, fix things if my judgement was > >> wrong? But I don't want to step on any toes... > >> > > > > You can commit your own stuff also if someone signs off on it or it seems > > uncontroversial and has sat there for a while. It's mostly a judgement > call. > > Speaking of which, this pull request has been sitting for a bit, > waiting for your input :-) > https://github.com/numpy/numpy/pull/280 > Mark and Travis made most of the comments so I figured it was up to them to sign off. If you think it is ready, go ahead and commit it, it's one of the reasons I gave you the premissions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From markflorisson88 at gmail.com Tue Jun 5 16:47:08 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 21:47:08 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On 5 June 2012 20:17, Nathaniel Smith wrote: > On Tue, Jun 5, 2012 at 7:08 PM, mark florisson > wrote: >> On 5 June 2012 17:38, Nathaniel Smith wrote: >>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson >>> wrote: >>>> On 5 June 2012 14:58, Nathaniel Smith wrote: >>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>>>> wrote: >>>>>> It would be great if we implement the NEP listed above, but with a few >>>>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>>>> determine when expressions should be evaluated, etc. However, for each >>>>>> user operation, Numpy will call back a user-installed hook >>>>>> implementing some interface, to allow various packages to provide >>>>>> their own hooks to evaluate vector operations however they want. This >>>>>> will include packages such as Theano, which could run things on the >>>>>> GPU, Numexpr, and in the future >>>>>> https://github.com/markflorisson88/minivect (which will likely have an >>>>>> LLVM backend in the future, and possibly integrated with Numba to >>>>>> allow inlining of numba ufuncs). 
The project above tries to bring >>>>>> together all the different array expression compilers together in a >>>>>> single framework, to provide efficient array expressions specialized >>>>>> for any data layout (nditer on steroids if you will, with SIMD, >>>>>> threaded and inlining capabilities). >>>>> >>>>> A global hook sounds ugly and hard to control -- it's hard to tell >>>>> which operations should be deferred and which should be forced, etc. >>>> >>>> Yes, but for the user the difference should not be visible (unless >>>> operations can raise exceptions, in which case you choose the safe >>>> path, or let the user configure what to do). >>>> >>>>> While it would be less magical, I think a more explicit API would in >>>>> the end be easier to use... something like >>>>> >>>>> ?a, b, c, d = deferred([a, b, c, d]) >>>>> ?e = a + b * c ?# 'e' is a deferred object too >>>>> ?f = np.dot(e, d) ?# so is 'f' >>>>> ?g = force(f) ?# 'g' is an ndarray >>>>> ?# or >>>>> ?force(f, out=g) >>>>> >>>>> But at that point, this could easily be an external library, right? >>>>> All we'd need from numpy would be some way for external types to >>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>>>> several reasons to want that functionality, and it seems like >>>>> developing these "improved numexpr" ideas would be much easier if they >>>>> didn't require doing deep surgery to numpy itself... >>>> >>>> Definitely, but besides monkey-patch-chaining I think some >>>> modifications would be required, but they would be reasonably simple. >>>> Most of the functionality would be handled in one function, which most >>>> ufuncs (the ones you care about, as well as ufunc (methods) like add) >>>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >>>> , which is inserted after argument unpacking and sanity checking. You >>>> could also do a per-module hook, and have the function look at >>>> sys._getframe(1).f_globals, but that is fragile and won't work from C >>>> or Cython code. >>>> >>>> How did you have overrides in mind? >>> >>> My vague idea is that core numpy operations are about as fundamental >>> for scientific users as the Python builtin operations are, so they >>> should probably be overrideable in a similar way. So we'd teach numpy >>> functions to check for methods named like "__numpy_ufunc__" or >>> "__numpy_dot__" and let themselves be overridden if found. Like how >>> __gt__ and __add__ and stuff work. Or something along those lines. >>> >>>> I also found this thread: >>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >>>> , but I think you want more than just to override ufuncs, you want >>>> numpy to govern when stuff is allowed to be lazy and when stuff should >>>> be evaluated (e.g. when it is indexed, slice assigned (although that >>>> itself may also be lazy), etc). You don't want some funny object back >>>> that doesn't work with things which are not overridden in numpy. >>> >>> My point is that probably numpy should *not* govern the decision about >>> what stuff should be lazy and what should be evaluated; that should be >>> governed by some combination of the user and >>> Numba/Theano/minivect/whatever. The toy API I sketched out would make >>> those decisions obvious and explicit. (And if the funny objects had an >>> __array_interface__ attribute that automatically forced evaluation >>> when accessed, then they'd work fine with code that was expecting an >>> array, or if they were assigned to a "real" ndarray, etc.) 
>> >> That's disappointing though, since the performance drawbacks can >> severely limit the usefulness for people with big data sets. Ideally, >> you would take your intuitive numpy code, and make it go fast, without >> jumping through hoops. Numpypy has lazy evaluation, ?I don't know how >> good a job it does, but it does mean you can finally get fast numpy >> code in an intuitive way (and even run it on a GPU if that is possible >> and beneficial). > > All of these proposals require the user to jump through hoops -- the > deferred-ufunc NEP has the extra 'with deferredstate' thing, and more > importantly, a set of rules that people have to learn and keep in mind > for which numpy operations are affected, which ones aren't, which > operations can't be performed while deferredstate is True, etc. So > this has two problems: (1) these rules are opaque, (2) it's far from > clear what the rules should be. Right, I guess I should have commented on that. I don't think the deferredstate stuff is needed at all, execution can always be deferred as long as it does not affect semantics. So if something is marked readonly because it is used in an expression and then written to, you evaluate the expression and then perform the write. The only way to break stuff, I think, would be to use pointers through the buffer interface or PyArray_DATA and not respect the sudden readonly property. A deferred expression is only evaluated once in any valid GIL-holding context (so it shouldn't break threads either). > If we (initially) implement > "deferredness" in a third-party library with an explicit > "deferredarray" type, then that works around both of them: it makes > the rules transparent (operations using that type are deferred, > operations using ndarray aren't), and gives you room to experiment > with different approaches without having to first accomplish some > major change in the numpy code base (and maybe get it wrong and have > to change it again later). That's what I meant when I said in my first > message that the more explicit API actually seemed like it would be > easier for people to use in the long run. Right, ok, that makes sense. I'm not sure how much more experimentation is needed though. Theano, for instance, is something that already does this stuff, it's just not as convenient to use as regular numpy code (for the cases that work in both, Theano does other stuff as well). > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Tue Jun 5 17:29:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jun 2012 22:29:41 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 9:47 PM, mark florisson wrote: > On 5 June 2012 20:17, Nathaniel Smith wrote: >> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson >> wrote: >>> On 5 June 2012 17:38, Nathaniel Smith wrote: >>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson >>>> wrote: >>>>> On 5 June 2012 14:58, Nathaniel Smith wrote: >>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>>>>> wrote: >>>>>>> It would be great if we implement the NEP listed above, but with a few >>>>>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>>>>> determine when expressions should be evaluated, etc. 
However, for each >>>>>>> user operation, Numpy will call back a user-installed hook >>>>>>> implementing some interface, to allow various packages to provide >>>>>>> their own hooks to evaluate vector operations however they want. This >>>>>>> will include packages such as Theano, which could run things on the >>>>>>> GPU, Numexpr, and in the future >>>>>>> https://github.com/markflorisson88/minivect (which will likely have an >>>>>>> LLVM backend in the future, and possibly integrated with Numba to >>>>>>> allow inlining of numba ufuncs). The project above tries to bring >>>>>>> together all the different array expression compilers together in a >>>>>>> single framework, to provide efficient array expressions specialized >>>>>>> for any data layout (nditer on steroids if you will, with SIMD, >>>>>>> threaded and inlining capabilities). >>>>>> >>>>>> A global hook sounds ugly and hard to control -- it's hard to tell >>>>>> which operations should be deferred and which should be forced, etc. >>>>> >>>>> Yes, but for the user the difference should not be visible (unless >>>>> operations can raise exceptions, in which case you choose the safe >>>>> path, or let the user configure what to do). >>>>> >>>>>> While it would be less magical, I think a more explicit API would in >>>>>> the end be easier to use... something like >>>>>> >>>>>> ?a, b, c, d = deferred([a, b, c, d]) >>>>>> ?e = a + b * c ?# 'e' is a deferred object too >>>>>> ?f = np.dot(e, d) ?# so is 'f' >>>>>> ?g = force(f) ?# 'g' is an ndarray >>>>>> ?# or >>>>>> ?force(f, out=g) >>>>>> >>>>>> But at that point, this could easily be an external library, right? >>>>>> All we'd need from numpy would be some way for external types to >>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>>>>> several reasons to want that functionality, and it seems like >>>>>> developing these "improved numexpr" ideas would be much easier if they >>>>>> didn't require doing deep surgery to numpy itself... >>>>> >>>>> Definitely, but besides monkey-patch-chaining I think some >>>>> modifications would be required, but they would be reasonably simple. >>>>> Most of the functionality would be handled in one function, which most >>>>> ufuncs (the ones you care about, as well as ufunc (methods) like add) >>>>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >>>>> , which is inserted after argument unpacking and sanity checking. You >>>>> could also do a per-module hook, and have the function look at >>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C >>>>> or Cython code. >>>>> >>>>> How did you have overrides in mind? >>>> >>>> My vague idea is that core numpy operations are about as fundamental >>>> for scientific users as the Python builtin operations are, so they >>>> should probably be overrideable in a similar way. So we'd teach numpy >>>> functions to check for methods named like "__numpy_ufunc__" or >>>> "__numpy_dot__" and let themselves be overridden if found. Like how >>>> __gt__ and __add__ and stuff work. Or something along those lines. >>>> >>>>> I also found this thread: >>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >>>>> , but I think you want more than just to override ufuncs, you want >>>>> numpy to govern when stuff is allowed to be lazy and when stuff should >>>>> be evaluated (e.g. when it is indexed, slice assigned (although that >>>>> itself may also be lazy), etc). 
You don't want some funny object back >>>>> that doesn't work with things which are not overridden in numpy. >>>> >>>> My point is that probably numpy should *not* govern the decision about >>>> what stuff should be lazy and what should be evaluated; that should be >>>> governed by some combination of the user and >>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make >>>> those decisions obvious and explicit. (And if the funny objects had an >>>> __array_interface__ attribute that automatically forced evaluation >>>> when accessed, then they'd work fine with code that was expecting an >>>> array, or if they were assigned to a "real" ndarray, etc.) >>> >>> That's disappointing though, since the performance drawbacks can >>> severely limit the usefulness for people with big data sets. Ideally, >>> you would take your intuitive numpy code, and make it go fast, without >>> jumping through hoops. Numpypy has lazy evaluation, ?I don't know how >>> good a job it does, but it does mean you can finally get fast numpy >>> code in an intuitive way (and even run it on a GPU if that is possible >>> and beneficial). >> >> All of these proposals require the user to jump through hoops -- the >> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more >> importantly, a set of rules that people have to learn and keep in mind >> for which numpy operations are affected, which ones aren't, which >> operations can't be performed while deferredstate is True, etc. So >> this has two problems: (1) these rules are opaque, (2) it's far from >> clear what the rules should be. > > Right, I guess I should have commented on that. I don't think the > deferredstate stuff is needed at all, execution can always be deferred > as long as it does not affect semantics. So if something is marked > readonly because it is used in an expression and then written to, you > evaluate the expression and then perform the write. The only way to > break stuff, I think, would be to use pointers through the buffer > interface or PyArray_DATA and not respect the sudden readonly > property. A deferred expression is only evaluated once in any valid > GIL-holding context (so it shouldn't break threads either). I don't think you can get away with switching numpy to defer all operations by default. I just don't see how you could make it transparent. One obvious abstraction leak is that the readonly flag is never a reliable way to detect or prevent writes -- a = np.arange(10) b = a.view() a.flags.readonly = True assert a[0] == 0 b[0] = 1 assert a[0] == 1 Another would be that memory and CPU usage suddenly become very unpredicable -- def f(): a = np.zeros((2, 1000000)) return np.sum(a, axis=1) s = f() # 'a' has left scope, but is still pinned in memory # This operation allows the memory to be freed, but takes a ton of CPU time: print s[0] Another standard problem with such schemes is making sure that exceptions are raised at the correct place. (You can only defer operations if you can guarantee that they cannot fail.) The PyPy approach is the Right Thing, but it's very very difficult. I would either help them, or else try to find another approach that gives 90% of the benefit for 10% of the effort. That's just me though :-). 
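One nit on that first snippet, for anyone who pastes it into a session: the flag is spelled writeable rather than readonly. A runnable version of the same abstraction leak:

import numpy as np

a = np.arange(10)
b = a.view()               # b shares a's buffer and keeps its own flags
a.flags.writeable = False  # mark 'a' read-only
assert a[0] == 0
b[0] = 1                   # the pre-existing view can still write to the shared buffer
assert a[0] == 1           # so the "read-only" array changed underneath us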
>> If we (initially) implement >> "deferredness" in a third-party library with an explicit >> "deferredarray" type, then that works around both of them: it makes >> the rules transparent (operations using that type are deferred, >> operations using ndarray aren't), and gives you room to experiment >> with different approaches without having to first accomplish some >> major change in the numpy code base (and maybe get it wrong and have >> to change it again later). That's what I meant when I said in my first >> message that the more explicit API actually seemed like it would be >> easier for people to use in the long run. > > Right, ok, that makes sense. I'm not sure how much more > experimentation is needed though. Theano, for instance, is something > that already does this stuff, it's just not as convenient to use as > regular numpy code (for the cases that work in both, Theano does other > stuff as well). Does Theano have the same rules for what is deferred and what isn't that Numba and minivect do? Are you sure that the same hook interface will work for generating all of their internal representations? -n From d.s.seljebotn at astro.uio.no Tue Jun 5 17:36:11 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 05 Jun 2012 23:36:11 +0200 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: <4FCE7BCB.1060900@astro.uio.no> On 06/05/2012 10:47 PM, mark florisson wrote: > On 5 June 2012 20:17, Nathaniel Smith wrote: >> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson >> wrote: >>> On 5 June 2012 17:38, Nathaniel Smith wrote: >>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson >>>> wrote: >>>>> On 5 June 2012 14:58, Nathaniel Smith wrote: >>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>>>>> wrote: >>>>>>> It would be great if we implement the NEP listed above, but with a few >>>>>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>>>>> determine when expressions should be evaluated, etc. However, for each >>>>>>> user operation, Numpy will call back a user-installed hook >>>>>>> implementing some interface, to allow various packages to provide >>>>>>> their own hooks to evaluate vector operations however they want. This >>>>>>> will include packages such as Theano, which could run things on the >>>>>>> GPU, Numexpr, and in the future >>>>>>> https://github.com/markflorisson88/minivect (which will likely have an >>>>>>> LLVM backend in the future, and possibly integrated with Numba to >>>>>>> allow inlining of numba ufuncs). The project above tries to bring >>>>>>> together all the different array expression compilers together in a >>>>>>> single framework, to provide efficient array expressions specialized >>>>>>> for any data layout (nditer on steroids if you will, with SIMD, >>>>>>> threaded and inlining capabilities). >>>>>> >>>>>> A global hook sounds ugly and hard to control -- it's hard to tell >>>>>> which operations should be deferred and which should be forced, etc. >>>>> >>>>> Yes, but for the user the difference should not be visible (unless >>>>> operations can raise exceptions, in which case you choose the safe >>>>> path, or let the user configure what to do). >>>>> >>>>>> While it would be less magical, I think a more explicit API would in >>>>>> the end be easier to use... 
something like >>>>>> >>>>>> a, b, c, d = deferred([a, b, c, d]) >>>>>> e = a + b * c # 'e' is a deferred object too >>>>>> f = np.dot(e, d) # so is 'f' >>>>>> g = force(f) # 'g' is an ndarray >>>>>> # or >>>>>> force(f, out=g) >>>>>> >>>>>> But at that point, this could easily be an external library, right? >>>>>> All we'd need from numpy would be some way for external types to >>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>>>>> several reasons to want that functionality, and it seems like >>>>>> developing these "improved numexpr" ideas would be much easier if they >>>>>> didn't require doing deep surgery to numpy itself... >>>>> >>>>> Definitely, but besides monkey-patch-chaining I think some >>>>> modifications would be required, but they would be reasonably simple. >>>>> Most of the functionality would be handled in one function, which most >>>>> ufuncs (the ones you care about, as well as ufunc (methods) like add) >>>>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >>>>> , which is inserted after argument unpacking and sanity checking. You >>>>> could also do a per-module hook, and have the function look at >>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C >>>>> or Cython code. >>>>> >>>>> How did you have overrides in mind? >>>> >>>> My vague idea is that core numpy operations are about as fundamental >>>> for scientific users as the Python builtin operations are, so they >>>> should probably be overrideable in a similar way. So we'd teach numpy >>>> functions to check for methods named like "__numpy_ufunc__" or >>>> "__numpy_dot__" and let themselves be overridden if found. Like how >>>> __gt__ and __add__ and stuff work. Or something along those lines. >>>> >>>>> I also found this thread: >>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >>>>> , but I think you want more than just to override ufuncs, you want >>>>> numpy to govern when stuff is allowed to be lazy and when stuff should >>>>> be evaluated (e.g. when it is indexed, slice assigned (although that >>>>> itself may also be lazy), etc). You don't want some funny object back >>>>> that doesn't work with things which are not overridden in numpy. >>>> >>>> My point is that probably numpy should *not* govern the decision about >>>> what stuff should be lazy and what should be evaluated; that should be >>>> governed by some combination of the user and >>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make >>>> those decisions obvious and explicit. (And if the funny objects had an >>>> __array_interface__ attribute that automatically forced evaluation >>>> when accessed, then they'd work fine with code that was expecting an >>>> array, or if they were assigned to a "real" ndarray, etc.) >>> >>> That's disappointing though, since the performance drawbacks can >>> severely limit the usefulness for people with big data sets. Ideally, >>> you would take your intuitive numpy code, and make it go fast, without >>> jumping through hoops. Numpypy has lazy evaluation, I don't know how >>> good a job it does, but it does mean you can finally get fast numpy >>> code in an intuitive way (and even run it on a GPU if that is possible >>> and beneficial). 
>> >> All of these proposals require the user to jump through hoops -- the >> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more >> importantly, a set of rules that people have to learn and keep in mind >> for which numpy operations are affected, which ones aren't, which >> operations can't be performed while deferredstate is True, etc. So >> this has two problems: (1) these rules are opaque, (2) it's far from >> clear what the rules should be. > > Right, I guess I should have commented on that. I don't think the > deferredstate stuff is needed at all, execution can always be deferred > as long as it does not affect semantics. So if something is marked > readonly because it is used in an expression and then written to, you > evaluate the expression and then perform the write. The only way to > break stuff, I think, would be to use pointers through the buffer > interface or PyArray_DATA and not respect the sudden readonly > property. A deferred expression is only evaluated once in any valid > GIL-holding context (so it shouldn't break threads either). I think Nathaniel's point is that the point where you get a 10-second pause to wait for computation is part of the semantics of current NumPy: print 'Starting computation' z = (x + y).sum() print 'Computation done' print 'Result was', z I think that if this wasn't the case, newbies would be be tripped up a lot and things would feel a lot less intuitive. Certainly when working from the IPython command line. Also, to remain sane in IPython (or when using a debugger, etc.), I'd want "print z" to print something like "unevaluated array", not to trigger a computation. Same with str(z) and so on. I don't think a context manager modifying thread-local global state like with np.lazy: ... would be horribly intrusive. But I also think it'd be good to start with being very explicit (x = np.lazy_multiply(a, b); compute(x)) -- such an API should be available anyway -- and then have the discussion once that works. Dag From markflorisson88 at gmail.com Tue Jun 5 18:02:54 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 23:02:54 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: Message-ID: On 5 June 2012 22:29, Nathaniel Smith wrote: > On Tue, Jun 5, 2012 at 9:47 PM, mark florisson > wrote: >> On 5 June 2012 20:17, Nathaniel Smith wrote: >>> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson >>> wrote: >>>> On 5 June 2012 17:38, Nathaniel Smith wrote: >>>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson >>>>> wrote: >>>>>> On 5 June 2012 14:58, Nathaniel Smith wrote: >>>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>>>>>> wrote: >>>>>>>> It would be great if we implement the NEP listed above, but with a few >>>>>>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>>>>>> determine when expressions should be evaluated, etc. However, for each >>>>>>>> user operation, Numpy will call back a user-installed hook >>>>>>>> implementing some interface, to allow various packages to provide >>>>>>>> their own hooks to evaluate vector operations however they want. This >>>>>>>> will include packages such as Theano, which could run things on the >>>>>>>> GPU, Numexpr, and in the future >>>>>>>> https://github.com/markflorisson88/minivect (which will likely have an >>>>>>>> LLVM backend in the future, and possibly integrated with Numba to >>>>>>>> allow inlining of numba ufuncs). 
The project above tries to bring >>>>>>>> together all the different array expression compilers together in a >>>>>>>> single framework, to provide efficient array expressions specialized >>>>>>>> for any data layout (nditer on steroids if you will, with SIMD, >>>>>>>> threaded and inlining capabilities). >>>>>>> >>>>>>> A global hook sounds ugly and hard to control -- it's hard to tell >>>>>>> which operations should be deferred and which should be forced, etc. >>>>>> >>>>>> Yes, but for the user the difference should not be visible (unless >>>>>> operations can raise exceptions, in which case you choose the safe >>>>>> path, or let the user configure what to do). >>>>>> >>>>>>> While it would be less magical, I think a more explicit API would in >>>>>>> the end be easier to use... something like >>>>>>> >>>>>>> ?a, b, c, d = deferred([a, b, c, d]) >>>>>>> ?e = a + b * c ?# 'e' is a deferred object too >>>>>>> ?f = np.dot(e, d) ?# so is 'f' >>>>>>> ?g = force(f) ?# 'g' is an ndarray >>>>>>> ?# or >>>>>>> ?force(f, out=g) >>>>>>> >>>>>>> But at that point, this could easily be an external library, right? >>>>>>> All we'd need from numpy would be some way for external types to >>>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>>>>>> several reasons to want that functionality, and it seems like >>>>>>> developing these "improved numexpr" ideas would be much easier if they >>>>>>> didn't require doing deep surgery to numpy itself... >>>>>> >>>>>> Definitely, but besides monkey-patch-chaining I think some >>>>>> modifications would be required, but they would be reasonably simple. >>>>>> Most of the functionality would be handled in one function, which most >>>>>> ufuncs (the ones you care about, as well as ufunc (methods) like add) >>>>>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >>>>>> , which is inserted after argument unpacking and sanity checking. You >>>>>> could also do a per-module hook, and have the function look at >>>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C >>>>>> or Cython code. >>>>>> >>>>>> How did you have overrides in mind? >>>>> >>>>> My vague idea is that core numpy operations are about as fundamental >>>>> for scientific users as the Python builtin operations are, so they >>>>> should probably be overrideable in a similar way. So we'd teach numpy >>>>> functions to check for methods named like "__numpy_ufunc__" or >>>>> "__numpy_dot__" and let themselves be overridden if found. Like how >>>>> __gt__ and __add__ and stuff work. Or something along those lines. >>>>> >>>>>> I also found this thread: >>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >>>>>> , but I think you want more than just to override ufuncs, you want >>>>>> numpy to govern when stuff is allowed to be lazy and when stuff should >>>>>> be evaluated (e.g. when it is indexed, slice assigned (although that >>>>>> itself may also be lazy), etc). You don't want some funny object back >>>>>> that doesn't work with things which are not overridden in numpy. >>>>> >>>>> My point is that probably numpy should *not* govern the decision about >>>>> what stuff should be lazy and what should be evaluated; that should be >>>>> governed by some combination of the user and >>>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make >>>>> those decisions obvious and explicit. 
(And if the funny objects had an >>>>> __array_interface__ attribute that automatically forced evaluation >>>>> when accessed, then they'd work fine with code that was expecting an >>>>> array, or if they were assigned to a "real" ndarray, etc.) >>>> >>>> That's disappointing though, since the performance drawbacks can >>>> severely limit the usefulness for people with big data sets. Ideally, >>>> you would take your intuitive numpy code, and make it go fast, without >>>> jumping through hoops. Numpypy has lazy evaluation, ?I don't know how >>>> good a job it does, but it does mean you can finally get fast numpy >>>> code in an intuitive way (and even run it on a GPU if that is possible >>>> and beneficial). >>> >>> All of these proposals require the user to jump through hoops -- the >>> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more >>> importantly, a set of rules that people have to learn and keep in mind >>> for which numpy operations are affected, which ones aren't, which >>> operations can't be performed while deferredstate is True, etc. So >>> this has two problems: (1) these rules are opaque, (2) it's far from >>> clear what the rules should be. >> >> Right, I guess I should have commented on that. I don't think the >> deferredstate stuff is needed at all, execution can always be deferred >> as long as it does not affect semantics. So if something is marked >> readonly because it is used in an expression and then written to, you >> evaluate the expression and then perform the write. The only way to >> break stuff, I think, would be to use pointers through the buffer >> interface or PyArray_DATA and not respect the sudden readonly >> property. A deferred expression is only evaluated once in any valid >> GIL-holding context (so it shouldn't break threads either). > > I don't think you can get away with switching numpy to defer all > operations by default. I just don't see how you could make it > transparent. > > One obvious abstraction leak is that the readonly flag is never a > reliable way to detect or prevent writes -- > > ?a = np.arange(10) > ?b = a.view() > ?a.flags.readonly = True > ?assert a[0] == 0 > ?b[0] = 1 > ?assert a[0] == 1 Right, that's a good point, although arguably each view should share a data structure with its owner, which means you would mark a memory region as readonly instead of a view. > Another would be that memory and CPU usage suddenly become very unpredicable -- > > ?def f(): > ? ?a = np.zeros((2, 1000000)) > ? ?return np.sum(a, axis=1) > ?s = f() > ?# 'a' has left scope, but is still pinned in memory > ?# This operation allows the memory to be freed, but takes a ton of CPU time: > ?print s[0] Another valid point. I guess the expression evaluating library can govern the evaluation here, since it can see the shape of the operands and lazy result, and decide it's not worth to have the result be lazy. But your point stands, we cannot predict use cases that might break in some (subtle) way, so we can't switch to entirely lazy. On the other hand, if it works fine for numpypy, assuming the numpy C api is not an obstruction, we should also be able to make it work. > Another standard problem with such schemes is making sure that > exceptions are raised at the correct place. (You can only defer > operations if you can guarantee that they cannot fail.) > > The PyPy approach is the Right Thing, but it's very very difficult. I > would either help them, or else try to find another approach that > gives 90% of the benefit for 10% of the effort. 
That's just me though > :-). I don't know, it also binds you to that specific platform. What I'd like is to experiment with various approaches and choose the one that is the best or suits a set of use cases best. >>> If we (initially) implement >>> "deferredness" in a third-party library with an explicit >>> "deferredarray" type, then that works around both of them: it makes >>> the rules transparent (operations using that type are deferred, >>> operations using ndarray aren't), and gives you room to experiment >>> with different approaches without having to first accomplish some >>> major change in the numpy code base (and maybe get it wrong and have >>> to change it again later). That's what I meant when I said in my first >>> message that the more explicit API actually seemed like it would be >>> easier for people to use in the long run. >> >> Right, ok, that makes sense. I'm not sure how much more >> experimentation is needed though. Theano, for instance, is something >> that already does this stuff, it's just not as convenient to use as >> regular numpy code (for the cases that work in both, Theano does other >> stuff as well). > > Does Theano have the same rules for what is deferred and what isn't > that Numba and minivect do? Are you sure that the same hook interface > will work for generating all of their internal representations? Well, they are all different projects with different goals. Numba is to create ufuncs which numpy can then evaluate (at this point), Theano is always lazy until you say "build me a callable function" (which is not entirely just-in-time, but almost), and minivect is supposed to be used as a specializer and code generator for Cython and hopefully other projects such as numba and theano. But yes, it is possible to generate all these representations (at least numexpr, theano and minivect), since unsupported operations simply mean numpy evaluation, which is already supported. I believe numba goes straight from bytecode to llvm (correct me if I'm wrong), without an intermediate AST. > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From markflorisson88 at gmail.com Tue Jun 5 18:06:46 2012 From: markflorisson88 at gmail.com (mark florisson) Date: Tue, 5 Jun 2012 23:06:46 +0100 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: <4FCE7BCB.1060900@astro.uio.no> References: <4FCE7BCB.1060900@astro.uio.no> Message-ID: On 5 June 2012 22:36, Dag Sverre Seljebotn wrote: > On 06/05/2012 10:47 PM, mark florisson wrote: >> On 5 June 2012 20:17, Nathaniel Smith ?wrote: >>> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson >>> ?wrote: >>>> On 5 June 2012 17:38, Nathaniel Smith ?wrote: >>>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson >>>>> ?wrote: >>>>>> On 5 June 2012 14:58, Nathaniel Smith ?wrote: >>>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>>>>>> ?wrote: >>>>>>>> It would be great if we implement the NEP listed above, but with a few >>>>>>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>>>>>> determine when expressions should be evaluated, etc. However, for each >>>>>>>> user operation, Numpy will call back a user-installed hook >>>>>>>> implementing some interface, to allow various packages to provide >>>>>>>> their own hooks to evaluate vector operations however they want. 
This >>>>>>>> will include packages such as Theano, which could run things on the >>>>>>>> GPU, Numexpr, and in the future >>>>>>>> https://github.com/markflorisson88/minivect (which will likely have an >>>>>>>> LLVM backend in the future, and possibly integrated with Numba to >>>>>>>> allow inlining of numba ufuncs). The project above tries to bring >>>>>>>> together all the different array expression compilers together in a >>>>>>>> single framework, to provide efficient array expressions specialized >>>>>>>> for any data layout (nditer on steroids if you will, with SIMD, >>>>>>>> threaded and inlining capabilities). >>>>>>> >>>>>>> A global hook sounds ugly and hard to control -- it's hard to tell >>>>>>> which operations should be deferred and which should be forced, etc. >>>>>> >>>>>> Yes, but for the user the difference should not be visible (unless >>>>>> operations can raise exceptions, in which case you choose the safe >>>>>> path, or let the user configure what to do). >>>>>> >>>>>>> While it would be less magical, I think a more explicit API would in >>>>>>> the end be easier to use... something like >>>>>>> >>>>>>> ? a, b, c, d = deferred([a, b, c, d]) >>>>>>> ? e = a + b * c ?# 'e' is a deferred object too >>>>>>> ? f = np.dot(e, d) ?# so is 'f' >>>>>>> ? g = force(f) ?# 'g' is an ndarray >>>>>>> ? # or >>>>>>> ? force(f, out=g) >>>>>>> >>>>>>> But at that point, this could easily be an external library, right? >>>>>>> All we'd need from numpy would be some way for external types to >>>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>>>>>> several reasons to want that functionality, and it seems like >>>>>>> developing these "improved numexpr" ideas would be much easier if they >>>>>>> didn't require doing deep surgery to numpy itself... >>>>>> >>>>>> Definitely, but besides monkey-patch-chaining I think some >>>>>> modifications would be required, but they would be reasonably simple. >>>>>> Most of the functionality would be handled in one function, which most >>>>>> ufuncs (the ones you care about, as well as ufunc (methods) like add) >>>>>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >>>>>> , which is inserted after argument unpacking and sanity checking. You >>>>>> could also do a per-module hook, and have the function look at >>>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C >>>>>> or Cython code. >>>>>> >>>>>> How did you have overrides in mind? >>>>> >>>>> My vague idea is that core numpy operations are about as fundamental >>>>> for scientific users as the Python builtin operations are, so they >>>>> should probably be overrideable in a similar way. So we'd teach numpy >>>>> functions to check for methods named like "__numpy_ufunc__" or >>>>> "__numpy_dot__" and let themselves be overridden if found. Like how >>>>> __gt__ and __add__ and stuff work. Or something along those lines. >>>>> >>>>>> I also found this thread: >>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >>>>>> , but I think you want more than just to override ufuncs, you want >>>>>> numpy to govern when stuff is allowed to be lazy and when stuff should >>>>>> be evaluated (e.g. when it is indexed, slice assigned (although that >>>>>> itself may also be lazy), etc). You don't want some funny object back >>>>>> that doesn't work with things which are not overridden in numpy. 
>>>>> >>>>> My point is that probably numpy should *not* govern the decision about >>>>> what stuff should be lazy and what should be evaluated; that should be >>>>> governed by some combination of the user and >>>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make >>>>> those decisions obvious and explicit. (And if the funny objects had an >>>>> __array_interface__ attribute that automatically forced evaluation >>>>> when accessed, then they'd work fine with code that was expecting an >>>>> array, or if they were assigned to a "real" ndarray, etc.) >>>> >>>> That's disappointing though, since the performance drawbacks can >>>> severely limit the usefulness for people with big data sets. Ideally, >>>> you would take your intuitive numpy code, and make it go fast, without >>>> jumping through hoops. Numpypy has lazy evaluation, ?I don't know how >>>> good a job it does, but it does mean you can finally get fast numpy >>>> code in an intuitive way (and even run it on a GPU if that is possible >>>> and beneficial). >>> >>> All of these proposals require the user to jump through hoops -- the >>> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more >>> importantly, a set of rules that people have to learn and keep in mind >>> for which numpy operations are affected, which ones aren't, which >>> operations can't be performed while deferredstate is True, etc. So >>> this has two problems: (1) these rules are opaque, (2) it's far from >>> clear what the rules should be. >> >> Right, I guess I should have commented on that. I don't think the >> deferredstate stuff is needed at all, execution can always be deferred >> as long as it does not affect semantics. So if something is marked >> readonly because it is used in an expression and then written to, you >> evaluate the expression and then perform the write. The only way to >> break stuff, I think, would be to use pointers through the buffer >> interface or PyArray_DATA and not respect the sudden readonly >> property. A deferred expression is only evaluated once in any valid >> GIL-holding context (so it shouldn't break threads either). > > I think Nathaniel's point is that the point where you get a 10-second > pause to wait for computation is part of the semantics of current NumPy: > > print 'Starting computation' > z = (x + y).sum() > print 'Computation done' > print 'Result was', z > > I think that if this wasn't the case, newbies would be be tripped up a > lot and things would feel a lot less intuitive. Certainly when working > from the IPython command line. > > Also, to remain sane in IPython (or when using a debugger, etc.), I'd want > > "print z" > > to print something like "unevaluated array", not to trigger a > computation. Same with str(z) and so on. I guess you could detect that at runtime, or just make it configurable. As for triggering computation somewhere else, I guess I find it preferable to horrible performance :) > I don't think a context manager modifying thread-local global state like > > with np.lazy: > ? ? ... > > would be horribly intrusive. > > But I also think it'd be good to start with being very explicit (x = > np.lazy_multiply(a, b); compute(x)) -- such an API should be available > anyway -- and then have the discussion once that works. Maybe that's the best way forward. I guess I'd prefer an import numpy.lazy_numpy as numpy in that case. 
I don't really like the with statement here, since ideally you'd just experiment with swapping in another module and see if your code still runs fine. > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Tue Jun 5 18:59:30 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 5 Jun 2012 15:59:30 -0700 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: A couple of notes from the IPython workflow in case it's of use to you guys: On Tue, Jun 5, 2012 at 10:52 AM, Charles R Harris wrote: > > For the commits themselves, the github button doesn't do fast forward or > whitespace cleanup, so I have the following alias in .git/config > > getpatch = !sh -c 'git co -b pull-$1 master &&\ > ?????????? curl https://github.com/numpy/nump/pull/$1.patch|\ > ?????????? git am -3 --whitespace=strip' - > > which opens a new branch pull-nnn and is useful for the bigger commits so > they can be tested and then merged with master before pushing. The > non-trivial commits should be tested with at least Python 2.4, 2.7, and 3.2. > I also suggest running the one-file build for changes in core since most > developers do the separate file thing and sometimes fail to catch single > file build problems. 1) We've settled on using the green button rather than something like the above, because we decided that having the no-ff was actually a *good* thing (and yes, this reverses my initial opinion on the matter). The reasoning that convinced me was that the merge commit in itself is signal, not noise: - it indicates who did the final reviewing and merging (which doesn't happen in a ff merge b/c there's no separate merge commit) - it serves as a good place to cleanly summarize the PR itself, which could possibly contain many commits. It's the job and responsibility of the person doing the merge to understand the PR enough to explain it succinctly, so that one can read just that message and get a realistic idea of what the say 100 commits that went in were meant to do. These merge commits are the right thing to read when building release notes, instead of having to slog through the individual commits. - this way, the DAG's topology immediately shows what went in with review and what was committed without review (hopefully only small/trivial/emergency fixes). - even if the PR has a single commit, it's still OK to do this, as it marks the reviewer (and credits the reviewer as well, which is actual work). For all these reasons, I'm very happy that we reversed our policy and now *only* use the green button to merge, and *never* do a FF merge. We only commit directly to master in the case of absolutely trivial typo fixes or emergency 'my god master is borked' scenarios. 2) I'd encourage you to steal/improve our 'test_pr / post_pr_test' as well as git-mrb tools: https://github.com/ipython/ipython/blob/master/tools/test_pr.py https://github.com/ipython/ipython/blob/master/tools/post_pr_test.py https://github.com/ipython/ipython/blob/master/tools/git-mrb In particular test_pr is a *huge* help. We now almost never merge something that doesn't have a test_pr report. Here's an example where test_pr revealed initially problems, later fixed: https://github.com/ipython/ipython/pull/1847 Once the fix was confirmed, it was easy to merge. It routinely catches python3 errors we put in because most of the core devs don't use python3 regularly. 
But now I'm not worried about it anymore, as I know the problems will be caught before merging (I used to feel guilty for constantly breaking py3 and having poor Thomas Kluyver have to clean up my messes). Cheers, f From charlesr.harris at gmail.com Tue Jun 5 19:15:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jun 2012 17:15:53 -0600 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 4:59 PM, Fernando Perez wrote: > A couple of notes from the IPython workflow in case it's of use to you > guys: > > On Tue, Jun 5, 2012 at 10:52 AM, Charles R Harris > wrote: > > > > For the commits themselves, the github button doesn't do fast forward or > > whitespace cleanup, so I have the following alias in .git/config > > > > getpatch = !sh -c 'git co -b pull-$1 master &&\ > > curl https://github.com/numpy/nump/pull/$1.patch|\ > > git am -3 --whitespace=strip' - > > > > which opens a new branch pull-nnn and is useful for the bigger commits so > > they can be tested and then merged with master before pushing. The > > non-trivial commits should be tested with at least Python 2.4, 2.7, and > 3.2. > > I also suggest running the one-file build for changes in core since most > > developers do the separate file thing and sometimes fail to catch single > > file build problems. > > 1) We've settled on using the green button rather than something like > the above, because we decided that having the no-ff was actually a > *good* thing (and yes, this reverses my initial opinion on the > matter). The reasoning that convinced me was that the merge commit in > itself is signal, not noise: > > - it indicates who did the final reviewing and merging (which doesn't > happen in a ff merge b/c there's no separate merge commit) > > - it serves as a good place to cleanly summarize the PR itself, which > could possibly contain many commits. It's the job and responsibility > of the person doing the merge to understand the PR enough to explain > it succinctly, so that one can read just that message and get a > realistic idea of what the say 100 commits that went in were meant to > do. These merge commits are the right thing to read when building > release notes, instead of having to slog through the individual > commits. > > - this way, the DAG's topology immediately shows what went in with > review and what was committed without review (hopefully only > small/trivial/emergency fixes). > > - even if the PR has a single commit, it's still OK to do this, as it > marks the reviewer (and credits the reviewer as well, which is actual > work). > > For all these reasons, I'm very happy that we reversed our policy and > now *only* use the green button to merge, and *never* do a FF merge. > We only commit directly to master in the case of absolutely trivial > typo fixes or emergency 'my god master is borked' scenarios. > > 2) I'd encourage you to steal/improve our 'test_pr / post_pr_test' > as well as git-mrb tools: > > https://github.com/ipython/ipython/blob/master/tools/test_pr.py > https://github.com/ipython/ipython/blob/master/tools/post_pr_test.py > https://github.com/ipython/ipython/blob/master/tools/git-mrb > > In particular test_pr is a *huge* help. We now almost never merge > something that doesn't have a test_pr report. Here's an example where > test_pr revealed initially problems, later fixed: > > https://github.com/ipython/ipython/pull/1847 > > Once the fix was confirmed, it was easy to merge. 
It routinely > catches python3 errors we put in because most of the core devs don't > use python3 regularly. But now I'm not worried about it anymore, as I > know the problems will be caught before merging (I used to feel guilty > for constantly breaking py3 and having poor Thomas Kluyver have to > clean up my messes). > > There are other advantages to pulling down the patch. Fixups can be merged together, commit comments enhanced, whitespace removed, style cleanups can be added, tests can be run, and the PR is automatically rebased. I still like fast forward for single commit merges, for larger merges I specify no-ff so that things come in as a well defined chunk. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue Jun 5 19:22:20 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 5 Jun 2012 16:22:20 -0700 Subject: [Numpy-discussion] commit rights for Nathaniel In-Reply-To: References: Message-ID: On Tue, Jun 5, 2012 at 4:15 PM, Charles R Harris wrote: > There are other advantages to pulling down the patch. Fixups can be merged > together, commit comments enhanced, whitespace removed, style cleanups can > be added, tests can be run, and the PR is automatically rebased. I still > like fast forward for single commit merges, for larger merges I specify > no-ff so that things come in as a well defined chunk. Sure, that's a decision each project can take as it prefers: we've taken the approach that the person doing the merge does *not* massage the history as presented in the PR; instead we have submitters fix things up when deemed necessary (and we help them out a bit with git-fu if needed). And for single commit merges, we use the merge commit as topological evidence that there was review, which is very useful when looking retrospectively at the project. But each project must find how it best wants to proceed, I'm only offering our perspective in case any of it is useful for numpy. You guys will cherrypick the pieces that merge cleanly for numpy ;) Cheers, f From edcjones at comcast.net Tue Jun 5 19:37:51 2012 From: edcjones at comcast.net (Edward C. Jones) Date: Tue, 05 Jun 2012 19:37:51 -0400 Subject: [Numpy-discussion] numpy.clip behavior at max and min of dtypes Message-ID: <4FCE984F.50307@comcast.net> Can the following function be written using numpy.clip? In some other way? Does numpy.clip satisfy condition 4 below? Does numpy.clip satisfy some closely related condition? Define a function clipcast: output = clipcast(arr, dtype=None, out=None) 1. All arrays have int or float dtypes. 2. Exactly one of the keyword arguments "dtype" and "out" must be used. If "dtype" is given, then output has that dtype. 3. "output" has the same shape as "arr". 4. Let ER be the set of all the real numbers that can be exactly represented by the output dtype. ER is finite and bounded. Let themin = min(ER) and themax = max(ER). For any real number x, define a function f(x) by If x is in ER, define f(x) = x. If x is between two consecutive numbers, u and v, in ER, then define f(x) = u or f(x) = v. Probably the choice would be made using a C cast. If x < themin, define f(x) = themin. If x > themax, define f(x) = themax. If x is an element of arr, say Arr[I], then output[I] == f(x) where I is any index that defines a single element of arr. 
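For what it's worth, one rough way to write the clipcast described above in terms of np.clip is sketched below. This is not an existing numpy function, and it ignores the corner case where the integer extremes are not exactly representable in the input float type (e.g. an int64 target filled from float64); condition 4's choice between neighbouring representable values is left to the C-style cast done by the assignment into "out":

import numpy as np

def clipcast(arr, dtype=None, out=None):
    # Sketch only: clip in the input precision, then let assignment cast.
    arr = np.asarray(arr)
    if (dtype is None) == (out is None):
        raise ValueError("give exactly one of 'dtype' or 'out'")
    if out is None:
        out = np.empty(arr.shape, dtype=dtype)
    if np.issubdtype(out.dtype, np.integer):
        info = np.iinfo(out.dtype)
    else:
        info = np.finfo(out.dtype)
    out[...] = np.clip(arr, info.min, info.max)
    return out

# clipcast(np.array([1e6, -3.7, 200.2]), dtype=np.int8) gives [127, -3, 127] as int8.
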
From travis at continuum.io Tue Jun 5 23:11:43 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 5 Jun 2012 22:11:43 -0500 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: <78B85B5D-DB70-45E1-9A4F-C6DA47A2530F@yale.edu> References: <4FCD24CC.7090305@astro.uio.no> <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> <5C74C855-3949-4FDF-99DF-1411CC0330F3@yale.edu> <78B85B5D-DB70-45E1-9A4F-C6DA47A2530F@yale.edu> Message-ID: During the original discussion, Gael pointed out that the changes would probably break some code (which might need to be cleaned up but still). I think it was underestimated how quickly people would upgrade and see the changes and therefore be able to report problems. We are talking about a 1.7 release, but there are still people who have not upgraded their code to use 1.6 (when some of the big changes occurred). This should probably guide our view of how long it takes to migrate behavior in NumPy and minimize migration difficulties for users. -Travis On Jun 5, 2012, at 2:01 PM, Zachary Pincus wrote: >>> On Tue, Jun 5, 2012 at 8:41 PM, Zachary Pincus >>> wrote: >>>> >>>>> There is a fine line here. We do need to make people clean up lax code >>>>> in order to improve numpy, but hopefully we can keep the cleanups >>>>> reasonable. >>>> >>>> Oh agreed. Somehow, though, I was surprised by this, even though I keep >>>> tabs on the numpy lists -- at no point did it become clear that "big changes >>>> in how arrays get constructed and typecast are ahead that may require code >>>> fixes". That was my main point, but probably a PEBCAK issue more than >>>> anything. >>> >>> >>> It was fairly extensively discussed when introduced, >>> http://thread.gmane.org/gmane.comp.python.numeric.general/44206, and again >>> at some later point. >> >> Those are the not-yet-finalized changes in 1.7; Zachary (I think) is >> talking about problems upgrading from ~1.5 to 1.6. > > Yes, unless I'm wrong I experienced these problems from 1.5.something to 1.6.1. I didn't take notes as it was in the middle of a deadline-crunch so I just fixed the code and moved on (long, stupid story about why the upgrade before a deadline...). It's just that the issues mentioned above seem to have hit me too and I wanted to mention that. But unhelpfully, I think, without code, and now I've hijacked this thread! Sorry. > > Zach > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue Jun 5 23:15:28 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 5 Jun 2012 22:15:28 -0500 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: <4FCD24CC.7090305@astro.uio.no> Message-ID: > > I don't think that would work, because looking more closely, I don't > think they're actually doing anything like what > __array_interface__/PEP3118 are designed for. They just have some > custom class ("sage.rings.real_mpfr.RealLiteral", I guess an arbitrary > precision floating point of some sort?), and they want instances that > are passed to np.array() to be automatically coerced to another type > (float64) by default. But there's no buffer sharing or anything like > that going on at all. Mike, does that sound right? > > This automagic coercion seems... in very dubious taste to me. (Why > does creating an array object imply that you want to throw away > precision? 
You can already throw away precision explicitly by doing > np.array(f, dtype=float).) But if this automatic coercion feature is > useful, then wouldn't it be better to have a different interface > instead of kluging it into __array_interface__, like we should check > for an attribute called __numpy_preferred_dtype__ or something? Interesting. It does look like off-label use of the __array_interface__ attribute. Given that "array" used to query the __array_interface__ attribute for type discovery, I still wonder why it was disabled in 1.6? -Travis From ralf.gommers at googlemail.com Wed Jun 6 02:21:34 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 6 Jun 2012 08:21:34 +0200 Subject: [Numpy-discussion] Changes in PyArray_FromAny between 1.5.x and 1.6.x In-Reply-To: References: <4FCD24CC.7090305@astro.uio.no> <99C221FE-6285-485E-9298-831E675C5A8B@yale.edu> <5C74C855-3949-4FDF-99DF-1411CC0330F3@yale.edu> <78B85B5D-DB70-45E1-9A4F-C6DA47A2530F@yale.edu> Message-ID: On Wed, Jun 6, 2012 at 5:11 AM, Travis Oliphant wrote: > During the original discussion, Gael pointed out that the changes would > probably break some code (which might need to be cleaned up but still). I > think it was underestimated how quickly people would upgrade and see the > changes and therefore be able to report problems. > > You're making the same mistake I made above. This error occurs in 1.6.x, so before the proposed change to casting='same_kind'. That's not actually the default right now by the way, in both 1.6.2 and current master the default is 'safe'. In [3]: np.__version__ Out[3]: '1.7.0.dev-fd78546' In [4]: print np.can_cast.__doc__ can_cast(from, totype, casting = 'safe') Ralf We are talking about a 1.7 release, but there are still people who have not > upgraded their code to use 1.6 (when some of the big changes occurred). > > This should probably guide our view of how long it takes to migrate > behavior in NumPy and minimize migration difficulties for users. > > > -Travis > > > > On Jun 5, 2012, at 2:01 PM, Zachary Pincus wrote: > > >>> On Tue, Jun 5, 2012 at 8:41 PM, Zachary Pincus < > zachary.pincus at yale.edu> > >>> wrote: > >>>> > >>>>> There is a fine line here. We do need to make people clean up lax > code > >>>>> in order to improve numpy, but hopefully we can keep the cleanups > >>>>> reasonable. > >>>> > >>>> Oh agreed. Somehow, though, I was surprised by this, even though I > keep > >>>> tabs on the numpy lists -- at no point did it become clear that "big > changes > >>>> in how arrays get constructed and typecast are ahead that may require > code > >>>> fixes". That was my main point, but probably a PEBCAK issue more than > >>>> anything. > >>> > >>> > >>> It was fairly extensively discussed when introduced, > >>> http://thread.gmane.org/gmane.comp.python.numeric.general/44206, and > again > >>> at some later point. > >> > >> Those are the not-yet-finalized changes in 1.7; Zachary (I think) is > >> talking about problems upgrading from ~1.5 to 1.6. > > > > Yes, unless I'm wrong I experienced these problems from 1.5.something to > 1.6.1. I didn't take notes as it was in the middle of a deadline-crunch so > I just fixed the code and moved on (long, stupid story about why the > upgrade before a deadline...). It's just that the issues mentioned above > seem to have hit me too and I wanted to mention that. But unhelpfully, I > think, without code, and now I've hijacked this thread! Sorry. 
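As a concrete illustration of the 'safe' versus 'same_kind' casting rules mentioned above (this assumes a numpy new enough to accept the casting keyword, i.e. 1.6 or later):

import numpy as np

print(np.can_cast(np.float64, np.float32, casting='safe'))       # False: would lose precision
print(np.can_cast(np.float64, np.float32, casting='same_kind'))  # True: both are floats
print(np.can_cast(np.float64, np.int32, casting='same_kind'))    # False: float -> int is 'unsafe'
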
> > > > Zach > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Wed Jun 6 04:48:18 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 6 Jun 2012 01:48:18 -0700 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? Message-ID: Hello, I've noticed that If you try to increment elements of an array with advanced indexing, repeated indexes don't get repeatedly incremented. For example: In [30]: x = zeros(5) In [31]: idx = array([1,1,1,3,4]) In [32]: x[idx] += [2,4,8,10,30] In [33]: x Out[33]: array([ 0., 8., 0., 10., 30.]) I would intuitively expect the output to be array([0,14, 0,10,30]) since index 1 is incremented by 2+4+8=14, but instead it seems to only increment by 8. What is numpy actually doing here? The authors of Theano noticed this behavior a while ago so they python loop through the values in idx (this kind of calculation is necessary for calculating gradients), but this is a bit slow for my purposes, so I'd like to figure out how to get the behavior I expected, but faster. I'm also not sure how to navigate the numpy codebase, where would I look for the code responsible for this behavior? -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Wed Jun 6 05:22:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jun 2012 11:22:28 +0200 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: <4FCE7BCB.1060900@astro.uio.no> Message-ID: <4FCF2154.7020803@astro.uio.no> On 06/06/2012 12:06 AM, mark florisson wrote: > On 5 June 2012 22:36, Dag Sverre Seljebotn wrote: >> On 06/05/2012 10:47 PM, mark florisson wrote: >>> On 5 June 2012 20:17, Nathaniel Smith wrote: >>>> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson >>>> wrote: >>>>> On 5 June 2012 17:38, Nathaniel Smith wrote: >>>>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson >>>>>> wrote: >>>>>>> On 5 June 2012 14:58, Nathaniel Smith wrote: >>>>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>>>>>>> wrote: >>>>>>>>> It would be great if we implement the NEP listed above, but with a few >>>>>>>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>>>>>>> determine when expressions should be evaluated, etc. However, for each >>>>>>>>> user operation, Numpy will call back a user-installed hook >>>>>>>>> implementing some interface, to allow various packages to provide >>>>>>>>> their own hooks to evaluate vector operations however they want. This >>>>>>>>> will include packages such as Theano, which could run things on the >>>>>>>>> GPU, Numexpr, and in the future >>>>>>>>> https://github.com/markflorisson88/minivect (which will likely have an >>>>>>>>> LLVM backend in the future, and possibly integrated with Numba to >>>>>>>>> allow inlining of numba ufuncs). 
The project above tries to bring >>>>>>>>> together all the different array expression compilers together in a >>>>>>>>> single framework, to provide efficient array expressions specialized >>>>>>>>> for any data layout (nditer on steroids if you will, with SIMD, >>>>>>>>> threaded and inlining capabilities). >>>>>>>> >>>>>>>> A global hook sounds ugly and hard to control -- it's hard to tell >>>>>>>> which operations should be deferred and which should be forced, etc. >>>>>>> >>>>>>> Yes, but for the user the difference should not be visible (unless >>>>>>> operations can raise exceptions, in which case you choose the safe >>>>>>> path, or let the user configure what to do). >>>>>>> >>>>>>>> While it would be less magical, I think a more explicit API would in >>>>>>>> the end be easier to use... something like >>>>>>>> >>>>>>>> a, b, c, d = deferred([a, b, c, d]) >>>>>>>> e = a + b * c # 'e' is a deferred object too >>>>>>>> f = np.dot(e, d) # so is 'f' >>>>>>>> g = force(f) # 'g' is an ndarray >>>>>>>> # or >>>>>>>> force(f, out=g) >>>>>>>> >>>>>>>> But at that point, this could easily be an external library, right? >>>>>>>> All we'd need from numpy would be some way for external types to >>>>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>>>>>>> several reasons to want that functionality, and it seems like >>>>>>>> developing these "improved numexpr" ideas would be much easier if they >>>>>>>> didn't require doing deep surgery to numpy itself... >>>>>>> >>>>>>> Definitely, but besides monkey-patch-chaining I think some >>>>>>> modifications would be required, but they would be reasonably simple. >>>>>>> Most of the functionality would be handled in one function, which most >>>>>>> ufuncs (the ones you care about, as well as ufunc (methods) like add) >>>>>>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >>>>>>> , which is inserted after argument unpacking and sanity checking. You >>>>>>> could also do a per-module hook, and have the function look at >>>>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C >>>>>>> or Cython code. >>>>>>> >>>>>>> How did you have overrides in mind? >>>>>> >>>>>> My vague idea is that core numpy operations are about as fundamental >>>>>> for scientific users as the Python builtin operations are, so they >>>>>> should probably be overrideable in a similar way. So we'd teach numpy >>>>>> functions to check for methods named like "__numpy_ufunc__" or >>>>>> "__numpy_dot__" and let themselves be overridden if found. Like how >>>>>> __gt__ and __add__ and stuff work. Or something along those lines. >>>>>> >>>>>>> I also found this thread: >>>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >>>>>>> , but I think you want more than just to override ufuncs, you want >>>>>>> numpy to govern when stuff is allowed to be lazy and when stuff should >>>>>>> be evaluated (e.g. when it is indexed, slice assigned (although that >>>>>>> itself may also be lazy), etc). You don't want some funny object back >>>>>>> that doesn't work with things which are not overridden in numpy. >>>>>> >>>>>> My point is that probably numpy should *not* govern the decision about >>>>>> what stuff should be lazy and what should be evaluated; that should be >>>>>> governed by some combination of the user and >>>>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make >>>>>> those decisions obvious and explicit. 
(And if the funny objects had an >>>>>> __array_interface__ attribute that automatically forced evaluation >>>>>> when accessed, then they'd work fine with code that was expecting an >>>>>> array, or if they were assigned to a "real" ndarray, etc.) >>>>> >>>>> That's disappointing though, since the performance drawbacks can >>>>> severely limit the usefulness for people with big data sets. Ideally, >>>>> you would take your intuitive numpy code, and make it go fast, without >>>>> jumping through hoops. Numpypy has lazy evaluation, I don't know how >>>>> good a job it does, but it does mean you can finally get fast numpy >>>>> code in an intuitive way (and even run it on a GPU if that is possible >>>>> and beneficial). >>>> >>>> All of these proposals require the user to jump through hoops -- the >>>> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more >>>> importantly, a set of rules that people have to learn and keep in mind >>>> for which numpy operations are affected, which ones aren't, which >>>> operations can't be performed while deferredstate is True, etc. So >>>> this has two problems: (1) these rules are opaque, (2) it's far from >>>> clear what the rules should be. >>> >>> Right, I guess I should have commented on that. I don't think the >>> deferredstate stuff is needed at all, execution can always be deferred >>> as long as it does not affect semantics. So if something is marked >>> readonly because it is used in an expression and then written to, you >>> evaluate the expression and then perform the write. The only way to >>> break stuff, I think, would be to use pointers through the buffer >>> interface or PyArray_DATA and not respect the sudden readonly >>> property. A deferred expression is only evaluated once in any valid >>> GIL-holding context (so it shouldn't break threads either). >> >> I think Nathaniel's point is that the point where you get a 10-second >> pause to wait for computation is part of the semantics of current NumPy: >> >> print 'Starting computation' >> z = (x + y).sum() >> print 'Computation done' >> print 'Result was', z >> >> I think that if this wasn't the case, newbies would be be tripped up a >> lot and things would feel a lot less intuitive. Certainly when working >> from the IPython command line. >> >> Also, to remain sane in IPython (or when using a debugger, etc.), I'd want >> >> "print z" >> >> to print something like "unevaluated array", not to trigger a >> computation. Same with str(z) and so on. > > I guess you could detect that at runtime, or just make it > configurable. As for triggering computation somewhere else, I guess I > find it preferable to horrible performance :) My problem might be that I don't use NumPy wherever I need performance (except as a glorified double*, i.e. I don't use it for computation). NumPy is for interactive "play with a reduced dataset" work. > >> I don't think a context manager modifying thread-local global state like >> >> with np.lazy: >> ... >> >> would be horribly intrusive. >> >> But I also think it'd be good to start with being very explicit (x = >> np.lazy_multiply(a, b); compute(x)) -- such an API should be available >> anyway -- and then have the discussion once that works. > > Maybe that's the best way forward. I guess I'd prefer an import > numpy.lazy_numpy as numpy in that case. I don't really like the with > statement here, since ideally you'd just experiment with swapping in > another module and see if your code still runs fine. Or just "import lazyarray as np". 
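To make the explicit-wrapper idea concrete, here is a deliberately minimal sketch of what such a lazyarray-style module could expose. LazyArray, deferred and force are made-up names for illustration, not an existing API, and a real implementation would build an expression graph to hand to numexpr/Theano/minivect rather than chaining Python lambdas:

import operator
import numpy as np

class LazyArray(object):
    # Records elementwise expressions and only evaluates when forced.
    def __init__(self, thunk):
        self._thunk = thunk  # zero-argument callable returning an ndarray

    def _binop(self, other, op):
        other_thunk = other._thunk if isinstance(other, LazyArray) else (lambda: other)
        return LazyArray(lambda: op(self._thunk(), other_thunk()))

    def __add__(self, other):
        return self._binop(other, operator.add)

    def __mul__(self, other):
        return self._binop(other, operator.mul)

    def __array__(self):
        # np.asarray() (or assignment into a real ndarray) forces evaluation.
        return self._thunk()

def deferred(arrays):
    return [LazyArray(lambda a=a: np.asarray(a)) for a in arrays]

def force(lazy, out=None):
    result = np.asarray(lazy)
    if out is None:
        return result
    out[...] = result
    return out

# a, b, c = deferred([np.ones(3), np.arange(3), 2.0 * np.ones(3)])
# e = a + b * c      # still a LazyArray, nothing evaluated yet
# force(e)           # -> array([ 1.,  3.,  5.])
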
As I said, I think it's important to refactor NumPy so that things can happen outside of the NumPy project. NumPy needs to be very conservative. You've seen the recent NA semantics debate. If NumPy was to decide on *the* final blessed semantics for lazy evaluation, even as an experimental sub-module, you'd never see the end of it. One part of this is a polymorphic C API targeted for lazy evaluation and get current NumPy to support that. Another part is, as Nathaniel has commented, making things like "np.dot" have some kind of polymorphic dispatch-on-the-objects behaviour. (I'd like something based on multiple dispatch rather than just calling something on the left operand. Then use that multiple dispatch for implementing +, - and so on as well when you want anything to interact with NumPy arrays.) Dag From nouiz at nouiz.org Wed Jun 6 09:07:21 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Wed, 6 Jun 2012 09:07:21 -0400 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? In-Reply-To: References: Message-ID: Hi, I get across the numpy.put[1] function. I'm not sure, but maybe it do what you want. My memory are fuzy about this and they don't tell about this in the doc of this function. Fred [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.put.html On Wed, Jun 6, 2012 at 4:48 AM, John Salvatier wrote: > Hello, > > I've noticed that If you try to increment elements of an array with advanced > indexing, repeated indexes don't get repeatedly incremented. For example: > > In [30]: x = zeros(5) > > In [31]: idx = array([1,1,1,3,4]) > > In [32]: x[idx] += [2,4,8,10,30] > > In [33]: x > Out[33]: array([ ?0., ? 8., ? 0., ?10., ?30.]) > > I would intuitively expect the output to be array([0,14, 0,10,30]) since > index 1 is incremented by 2+4+8=14, but instead it seems to only increment > by 8. What is numpy actually doing here? > > The authors of Theano noticed this behavior a while ago so they python loop > through the values in idx (this kind of calculation is necessary for > calculating gradients), but this is a bit slow for my purposes, so I'd like > to figure out how to get the behavior I expected, but faster. > > I'm also not sure how to navigate the numpy codebase, where would I look for > the code responsible for this behavior? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ben.root at ou.edu Wed Jun 6 09:08:43 2012 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 6 Jun 2012 09:08:43 -0400 Subject: [Numpy-discussion] boolean indexing of structured arrays Message-ID: Not sure if this is a bug or not. I am using a fairly recent master branch. >>> # Setting up... 
>>> import numpy as np >>> a = np.zeros((10, 1), dtype=[('foo', 'f4'), ('bar', 'f4'), ('spam', 'f4')]) >>> a['foo'] = np.random.random((10, 1)) >>> a['bar'] = np.random.random((10, 1)) >>> a['spam'] = np.random.random((10, 1)) >>> a array([[(0.8748096823692322, 0.08278043568134308, 0.2463584989309311)], [(0.27129432559013367, 0.9645473957061768, 0.41787904500961304)], [(0.4902191460132599, 0.6772263646125793, 0.07460898905992508)], [(0.13542482256889343, 0.8646988868713379, 0.98673015832901)], [(0.6527929902076721, 0.7392181754112244, 0.5919206738471985)], [(0.11248272657394409, 0.5818713903427124, 0.9287213087081909)], [(0.47561103105545044, 0.48848700523376465, 0.7108170390129089)], [(0.47087424993515015, 0.6080209016799927, 0.6583810448646545)], [(0.08447299897670746, 0.39479559659957886, 0.13520188629627228)], [(0.7074970006942749, 0.8426893353462219, 0.19329732656478882)]], dtype=[('foo', '>> b = (a['bar'] > 0.4) >>> b array([[False], [ True], [ True], [ True], [ True], [ True], [ True], [ True], [False], [ True]], dtype=bool) >>> # ---- Boolean indexing of structured array with a (10,1) boolean array ---- >>> a[b]['foo'] array([ 0.27129433, 0.49021915, 0.13542482, 0.65279299, 0.11248273, 0.47561103, 0.47087425, 0.707497 ], dtype=float32) >>> # ---- Boolean indexing of structured array with a (10,) boolean array ---- >>> a[b[:,0]]['foo'] array([[(0.27129432559013367, 0.9645473957061768, 0.41787904500961304)], [(0.4902191460132599, 0.6772263646125793, 0.07460898905992508)], [(0.13542482256889343, 0.8646988868713379, 0.98673015832901)], [(0.6527929902076721, 0.7392181754112244, 0.5919206738471985)], [(0.11248272657394409, 0.5818713903427124, 0.9287213087081909)], [(0.47561103105545044, 0.48848700523376465, 0.7108170390129089)], [(0.47087424993515015, 0.6080209016799927, 0.6583810448646545)], [(0.7074970006942749, 0.8426893353462219, 0.19329732656478882)]], dtype=[('foo', ' From jsalvati at u.washington.edu Wed Jun 6 10:45:58 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 6 Jun 2012 07:45:58 -0700 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? In-Reply-To: References: Message-ID: Thank you for the suggestion, but it looks like that has the same behavior too: In [43]: x = zeros(5) In [44]: idx = array([1,1,1,3,4]) In [45]: put(x,idx, [2,4,8,10,30]) In [46]: x Out[46]: array([ 0., 8., 0., 10., 30.]) On Wed, Jun 6, 2012 at 6:07 AM, Fr?d?ric Bastien wrote: > Hi, > > I get across the numpy.put[1] function. I'm not sure, but maybe it do > what you want. My memory are fuzy about this and they don't tell about > this in the doc of this function. > > Fred > > > [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.put.html > > On Wed, Jun 6, 2012 at 4:48 AM, John Salvatier > wrote: > > Hello, > > > > I've noticed that If you try to increment elements of an array with > advanced > > indexing, repeated indexes don't get repeatedly incremented. For example: > > > > In [30]: x = zeros(5) > > > > In [31]: idx = array([1,1,1,3,4]) > > > > In [32]: x[idx] += [2,4,8,10,30] > > > > In [33]: x > > Out[33]: array([ 0., 8., 0., 10., 30.]) > > > > I would intuitively expect the output to be array([0,14, 0,10,30]) since > > index 1 is incremented by 2+4+8=14, but instead it seems to only > increment > > by 8. What is numpy actually doing here? 
> > > > The authors of Theano noticed this behavior a while ago so they python > loop > > through the values in idx (this kind of calculation is necessary for > > calculating gradients), but this is a bit slow for my purposes, so I'd > like > > to figure out how to get the behavior I expected, but faster. > > > > I'm also not sure how to navigate the numpy codebase, where would I look > for > > the code responsible for this behavior? > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jun 6 11:06:28 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jun 2012 16:06:28 +0100 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? In-Reply-To: References: Message-ID: On Wed, Jun 6, 2012 at 9:48 AM, John Salvatier wrote: > Hello, > > I've noticed that If you try to increment elements of an array with advanced > indexing, repeated indexes don't get repeatedly incremented. For example: > > In [30]: x = zeros(5) > > In [31]: idx = array([1,1,1,3,4]) > > In [32]: x[idx] += [2,4,8,10,30] > > In [33]: x > Out[33]: array([ ?0., ? 8., ? 0., ?10., ?30.]) > > I would intuitively expect the output to be array([0,14, 0,10,30]) since > index 1 is incremented by 2+4+8=14, but instead it seems to only increment > by 8. What is numpy actually doing here? > > The authors of Theano noticed this behavior a while ago so they python loop > through the values in idx (this kind of calculation is necessary for > calculating gradients), but this is a bit slow for my purposes, so I'd like > to figure out how to get the behavior I expected, but faster. > > I'm also not sure how to navigate the numpy codebase, where would I look for > the code responsible for this behavior? Strictly speaking, it isn't actually in the numpy codebase at all -- what's happening is that the Python interpreter sees this code: x[idx] += vals and then it translates it into this code before running it: tmp = x.__getitem__(idx) tmp = tmp.__iadd__(vals) x.__setitem__(idx, tmp) So you can find the implementations of the ndarray methods __getitem__, __iadd__, __setitem__ (they're called array_subscript_nice, array_inplace_add, and array_ass_sub in the C code), but there's no way to fix them so that this works the way you want it to, because there's no way for __iadd__ to know that the temporary values that it's working with are really duplicate copies of "the same" value in the original array. It would be nice if numpy had some sort of standard API for doing what you want, but not sure what a good API would look like, and someone would have to implement it. -n From cimrman3 at ntc.zcu.cz Wed Jun 6 11:30:26 2012 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 06 Jun 2012 17:30:26 +0200 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? 
In-Reply-To: References: Message-ID: <4FCF7792.7050706@ntc.zcu.cz> On 06/06/2012 05:06 PM, Nathaniel Smith wrote: > On Wed, Jun 6, 2012 at 9:48 AM, John Salvatier > wrote: >> Hello, >> >> I've noticed that If you try to increment elements of an array with advanced >> indexing, repeated indexes don't get repeatedly incremented. For example: >> >> In [30]: x = zeros(5) >> >> In [31]: idx = array([1,1,1,3,4]) >> >> In [32]: x[idx] += [2,4,8,10,30] >> >> In [33]: x >> Out[33]: array([ 0., 8., 0., 10., 30.]) >> >> I would intuitively expect the output to be array([0,14, 0,10,30]) since >> index 1 is incremented by 2+4+8=14, but instead it seems to only increment >> by 8. What is numpy actually doing here? >> >> The authors of Theano noticed this behavior a while ago so they python loop >> through the values in idx (this kind of calculation is necessary for >> calculating gradients), but this is a bit slow for my purposes, so I'd like >> to figure out how to get the behavior I expected, but faster. >> >> I'm also not sure how to navigate the numpy codebase, where would I look for >> the code responsible for this behavior? > > Strictly speaking, it isn't actually in the numpy codebase at all -- > what's happening is that the Python interpreter sees this code: > > x[idx] += vals > > and then it translates it into this code before running it: > > tmp = x.__getitem__(idx) > tmp = tmp.__iadd__(vals) > x.__setitem__(idx, tmp) > > So you can find the implementations of the ndarray methods > __getitem__, __iadd__, __setitem__ (they're called > array_subscript_nice, array_inplace_add, and array_ass_sub in the C > code), but there's no way to fix them so that this works the way you > want it to, because there's no way for __iadd__ to know that the > temporary values that it's working with are really duplicate copies of > "the same" value in the original array. > > It would be nice if numpy had some sort of standard API for doing what > you want, but not sure what a good API would look like, and someone > would have to implement it. This operation is also heavily used for the finite element assembling, and a similar question has been raised already several times (e.g. http://old.nabble.com/How-to-assemble-large-sparse-matrices-effectively-td33833855.html). So why not adding a function np.assemble()? r. From njs at pobox.com Wed Jun 6 11:34:45 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jun 2012 16:34:45 +0100 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? In-Reply-To: <4FCF7792.7050706@ntc.zcu.cz> References: <4FCF7792.7050706@ntc.zcu.cz> Message-ID: On Wed, Jun 6, 2012 at 4:30 PM, Robert Cimrman wrote: > On 06/06/2012 05:06 PM, Nathaniel Smith wrote: >> On Wed, Jun 6, 2012 at 9:48 AM, John Salvatier >> ?wrote: >>> Hello, >>> >>> I've noticed that If you try to increment elements of an array with advanced >>> indexing, repeated indexes don't get repeatedly incremented. For example: >>> >>> In [30]: x = zeros(5) >>> >>> In [31]: idx = array([1,1,1,3,4]) >>> >>> In [32]: x[idx] += [2,4,8,10,30] >>> >>> In [33]: x >>> Out[33]: array([ ?0., ? 8., ? 0., ?10., ?30.]) >>> >>> I would intuitively expect the output to be array([0,14, 0,10,30]) since >>> index 1 is incremented by 2+4+8=14, but instead it seems to only increment >>> by 8. What is numpy actually doing here? 
>>> >>> The authors of Theano noticed this behavior a while ago so they python loop >>> through the values in idx (this kind of calculation is necessary for >>> calculating gradients), but this is a bit slow for my purposes, so I'd like >>> to figure out how to get the behavior I expected, but faster. >>> >>> I'm also not sure how to navigate the numpy codebase, where would I look for >>> the code responsible for this behavior? >> >> Strictly speaking, it isn't actually in the numpy codebase at all -- >> what's happening is that the Python interpreter sees this code: >> >> ? ?x[idx] += vals >> >> and then it translates it into this code before running it: >> >> ? ?tmp = x.__getitem__(idx) >> ? ?tmp = tmp.__iadd__(vals) >> ? ?x.__setitem__(idx, tmp) >> >> So you can find the implementations of the ndarray methods >> __getitem__, __iadd__, __setitem__ (they're called >> array_subscript_nice, array_inplace_add, and array_ass_sub in the C >> code), but there's no way to fix them so that this works the way you >> want it to, because there's no way for __iadd__ to know that the >> temporary values that it's working with are really duplicate copies of >> "the same" value in the original array. >> >> It would be nice if numpy had some sort of standard API for doing what >> you want, but not sure what a good API would look like, and someone >> would have to implement it. > > This operation is also heavily used for the finite element assembling, and a > similar question has been raised already several times (e.g. > http://old.nabble.com/How-to-assemble-large-sparse-matrices-effectively-td33833855.html). > So why not adding a function np.assemble()? I read that message, but I don't see what it has to do with this discussion? It seemed to be about fast ways to assign dense matrices into sparse matrices, not fast ways of applying in-place arithmetic to specific spots in a dense matrix. -n From cimrman3 at ntc.zcu.cz Wed Jun 6 11:52:13 2012 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 06 Jun 2012 17:52:13 +0200 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? In-Reply-To: References: <4FCF7792.7050706@ntc.zcu.cz> Message-ID: <4FCF7CAD.10807@ntc.zcu.cz> On 06/06/2012 05:34 PM, Nathaniel Smith wrote: > On Wed, Jun 6, 2012 at 4:30 PM, Robert Cimrman wrote: >> On 06/06/2012 05:06 PM, Nathaniel Smith wrote: >>> On Wed, Jun 6, 2012 at 9:48 AM, John Salvatier >>> wrote: >>>> Hello, >>>> >>>> I've noticed that If you try to increment elements of an array with advanced >>>> indexing, repeated indexes don't get repeatedly incremented. For example: >>>> >>>> In [30]: x = zeros(5) >>>> >>>> In [31]: idx = array([1,1,1,3,4]) >>>> >>>> In [32]: x[idx] += [2,4,8,10,30] >>>> >>>> In [33]: x >>>> Out[33]: array([ 0., 8., 0., 10., 30.]) >>>> >>>> I would intuitively expect the output to be array([0,14, 0,10,30]) since >>>> index 1 is incremented by 2+4+8=14, but instead it seems to only increment >>>> by 8. What is numpy actually doing here? >>>> >>>> The authors of Theano noticed this behavior a while ago so they python loop >>>> through the values in idx (this kind of calculation is necessary for >>>> calculating gradients), but this is a bit slow for my purposes, so I'd like >>>> to figure out how to get the behavior I expected, but faster. >>>> >>>> I'm also not sure how to navigate the numpy codebase, where would I look for >>>> the code responsible for this behavior? 
>>> >>> Strictly speaking, it isn't actually in the numpy codebase at all -- >>> what's happening is that the Python interpreter sees this code: >>> >>> x[idx] += vals >>> >>> and then it translates it into this code before running it: >>> >>> tmp = x.__getitem__(idx) >>> tmp = tmp.__iadd__(vals) >>> x.__setitem__(idx, tmp) >>> >>> So you can find the implementations of the ndarray methods >>> __getitem__, __iadd__, __setitem__ (they're called >>> array_subscript_nice, array_inplace_add, and array_ass_sub in the C >>> code), but there's no way to fix them so that this works the way you >>> want it to, because there's no way for __iadd__ to know that the >>> temporary values that it's working with are really duplicate copies of >>> "the same" value in the original array. >>> >>> It would be nice if numpy had some sort of standard API for doing what >>> you want, but not sure what a good API would look like, and someone >>> would have to implement it. >> >> This operation is also heavily used for the finite element assembling, and a >> similar question has been raised already several times (e.g. >> http://old.nabble.com/How-to-assemble-large-sparse-matrices-effectively-td33833855.html). >> So why not adding a function np.assemble()? > > I read that message, but I don't see what it has to do with this > discussion? It seemed to be about fast ways to assign dense matrices > into sparse matrices, not fast ways of applying in-place arithmetic to > specific spots in a dense matrix. Yes (in that thread), but it applies also adding/assembling vectors into a global vector - this is just x[idx] += vals. I linked that discussion as that was recent enough for me to recall it, but there were other. Anyway, my point was that a having a function with the "adding" semantics in NumPy would be handy. r. From robert.kern at gmail.com Wed Jun 6 12:35:28 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 6 Jun 2012 17:35:28 +0100 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? In-Reply-To: <4FCF7CAD.10807@ntc.zcu.cz> References: <4FCF7792.7050706@ntc.zcu.cz> <4FCF7CAD.10807@ntc.zcu.cz> Message-ID: On Wed, Jun 6, 2012 at 4:52 PM, Robert Cimrman wrote: > Yes (in that thread), but it applies also adding/assembling vectors into a > global vector - this is just x[idx] += vals. I linked that discussion as that > was recent enough for me to recall it, but there were other. > > Anyway, my point was that a having a function with the "adding" semantics in > NumPy would be handy. x += numpy.bincount(idx, vals, minlength=len(x)) -- Robert Kern From jsalvati at u.washington.edu Wed Jun 6 12:48:57 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 6 Jun 2012 09:48:57 -0700 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? In-Reply-To: References: <4FCF7792.7050706@ntc.zcu.cz> <4FCF7CAD.10807@ntc.zcu.cz> Message-ID: That does seem like it should work well if len(unique(idx)) is close to len(x). Thanks! On Wed, Jun 6, 2012 at 9:35 AM, Robert Kern wrote: > On Wed, Jun 6, 2012 at 4:52 PM, Robert Cimrman > wrote: > > > Yes (in that thread), but it applies also adding/assembling vectors into > a > > global vector - this is just x[idx] += vals. I linked that discussion as > that > > was recent enough for me to recall it, but there were other. > > > > Anyway, my point was that a having a function with the "adding" > semantics in > > NumPy would be handy. 
> > x += numpy.bincount(idx, vals, minlength=len(x)) > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cimrman3 at ntc.zcu.cz Wed Jun 6 13:14:32 2012 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 06 Jun 2012 19:14:32 +0200 Subject: [Numpy-discussion] Incrementing with advanced indexing: why don't repeated indexes repeatedly increment? In-Reply-To: References: <4FCF7792.7050706@ntc.zcu.cz> <4FCF7CAD.10807@ntc.zcu.cz> Message-ID: <4FCF8FF8.2010002@ntc.zcu.cz> On 06/06/2012 06:35 PM, Robert Kern wrote: > On Wed, Jun 6, 2012 at 4:52 PM, Robert Cimrman wrote: > >> Yes (in that thread), but it applies also adding/assembling vectors into a >> global vector - this is just x[idx] += vals. I linked that discussion as that >> was recent enough for me to recall it, but there were other. >> >> Anyway, my point was that a having a function with the "adding" semantics in >> NumPy would be handy. > > x += numpy.bincount(idx, vals, minlength=len(x)) > Nice! Looking at the C source, it seems it should be pretty efficient for this task. r. From njs at pobox.com Wed Jun 6 18:08:38 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jun 2012 23:08:38 +0100 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch Message-ID: Just submitted this pull request for discussion: https://github.com/numpy/numpy/pull/297 As per earlier discussion on the list, this PR attempts to remove exactly and only the maskna-related code from numpy mainline: http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html The suggestion is that we merge this to master for the 1.7 release, and immediately "git revert" it on a branch so that it can be modified further without blocking the release. The first patch does the actual maskna removal; the second and third rearrange things so that PyArray_ReduceWrapper does not end up in the public API, for reasons described therein. All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit Ubuntu. The docs also appear to build. Before I re-based this I also tested against Scipy, matplotlib, and pandas, and all were fine. -- Nathaniel From edcjones at comcast.net Wed Jun 6 22:17:32 2012 From: edcjones at comcast.net (Edward C. Jones) Date: Wed, 06 Jun 2012 22:17:32 -0400 Subject: [Numpy-discussion] Are "min", "max" documented for scalars? Message-ID: <4FD00F3C.9040102@comcast.net> Python "max" and "min" have an interesting and _useful_ behavior when applied to numpy scalars and Python numbers. Here is a piece of pseudo-code: def max(a, b): if int(b) > int(a): return b else: return a The larger object is returned unchanged. If the two objects are equal, return the first unchanged. Is the behavior of "max", "min", "<", "<=", etc. for numpy scalar objects documented somewhere? keywords: greater than less than From thouis at gmail.com Thu Jun 7 04:30:33 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Thu, 7 Jun 2012 10:30:33 +0200 Subject: [Numpy-discussion] better error message possible? In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> Message-ID: I've opened a PR at https://github.com/numpy/numpy/pull/296 for discussion. 
A typical result >>> np.zeros((3,3))[[1,2,3]] Traceback (most recent call last): File "", line 1, in IndexError: index 3 is out of bounds for axis 0: [-3,3) Ray Jones From lists at hilboll.de Thu Jun 7 05:24:42 2012 From: lists at hilboll.de (Andreas Hilboll) Date: Thu, 7 Jun 2012 11:24:42 +0200 Subject: [Numpy-discussion] Ubuntu PPA for NumPy / SciPy / ... Message-ID: Hi, I just noticed that there's a PPA for NumPy/SciPy on Launchpad: https://launchpad.net/~scipy/+archive/ppa However, it's painfully outdated. Does anyone know of its status? Is it 'official'? Are there any plans in revitalizing it, possibly with adding other projects from the "scipy universe"? Is there help needed? Many questions, but possibly quite easy to answer ... Cheers, Andreas. From paul.anton.letnes at gmail.com Thu Jun 7 05:32:53 2012 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 7 Jun 2012 11:32:53 +0200 Subject: [Numpy-discussion] better error message possible? In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> Message-ID: <3EC9C3D6-BC30-45C9-9DC9-49818A95C2CE@gmail.com> On 7. juni 2012, at 10:30, Thouis (Ray) Jones wrote: > I've opened a PR at https://github.com/numpy/numpy/pull/296 for discussion. > > A typical result > >>>> np.zeros((3,3))[[1,2,3]] > Traceback (most recent call last): > File "", line 1, in > IndexError: index 3 is out of bounds for axis 0: [-3,3) > > Ray Jones > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I would prefer: IndexError: index 3 is out of bounds for axis 0: [-3,2] as I find the 3) notation a bit weird - after all, indices are not floats, so 2.999 or 2.3 doesn't make sense as an index. An alternative is to not refer to negative indices and say IndexError: index 3 is out of bounds for axis 0, shape was: (3,) (print more axes when the number of axes is higher.) BTW, I'm really glad someone is taking an interest in these error messages, it's a great idea! Paul From dave.hirschfeld at gmail.com Thu Jun 7 05:44:24 2012 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Thu, 7 Jun 2012 09:44:24 +0000 (UTC) Subject: [Numpy-discussion] better error message possible? References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> <3EC9C3D6-BC30-45C9-9DC9-49818A95C2CE@gmail.com> Message-ID: Paul Anton Letnes gmail.com> writes: > I would prefer: > IndexError: index 3 is out of bounds for axis 0: [-3,2] > as I find the 3) notation a bit weird - after all, indices are not floats, so 2.999 or 2.3 doesn't make sense as > an index. > > An alternative is to not refer to negative indices and say > IndexError: index 3 is out of bounds for axis 0, shape was: (3,) > (print more axes when the number of axes is higher.) > +1 for the latter suggestion - if the array shape is available it's a great help in debugging the error. An alternative wording could be: IndexError: Index 3 is out of bounds for axis 0 of an array of shape (3,2,4) -Dave From thouis at gmail.com Thu Jun 7 06:36:12 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Thu, 7 Jun 2012 12:36:12 +0200 Subject: [Numpy-discussion] better error message possible? 
In-Reply-To: References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> <3EC9C3D6-BC30-45C9-9DC9-49818A95C2CE@gmail.com> Message-ID: On Thu, Jun 7, 2012 at 11:44 AM, Dave Hirschfeld wrote: > Paul Anton Letnes gmail.com> writes: > >> I would prefer: >> IndexError: index 3 is out of bounds for axis 0: [-3,2] >> as I find the 3) notation a bit weird - after all, indices are not floats, so > 2.999 or 2.3 doesn't make sense as >> an index. >> >> An alternative is to not refer to negative indices and say >> IndexError: index 3 is out of bounds for axis 0, shape was: (3,) >> (print more axes when the number of axes is higher.) >> > > +1 for the latter suggestion - if the array shape is available it's a great > help in debugging the error. I agree that reporting the shape would be good, but it's usually not available at the point that the indices are found to be out-of-bounds, due to (often implicit) flattening. I think it might be possible to track and report that the array was flattened, which might help avoid some confusion when the maximum index reported in the Exception doesn't match any of the dimensions of the array being indexed due to flattening. Another possibility I entertained was to split the too-high vs. too-low cases: IndexError: index 3 is out of bounds for axis 0: must be < 3 IndexError: index -4 is out of bounds for axis 0: must be >= -3 Ray Jones From chaoyuejoy at gmail.com Thu Jun 7 06:43:52 2012 From: chaoyuejoy at gmail.com (Chao YUE) Date: Thu, 7 Jun 2012 12:43:52 +0200 Subject: [Numpy-discussion] Ubuntu PPA for NumPy / SciPy / ... In-Reply-To: References: Message-ID: Hi, do you try to install from this PPA? I am using ubuntu 11.04, it's not difficult possibly to install by pip. but this is the only thing I know. chao 2012/6/7 Andreas Hilboll > Hi, > > I just noticed that there's a PPA for NumPy/SciPy on Launchpad: > > https://launchpad.net/~scipy/+archive/ppa > > However, it's painfully outdated. Does anyone know of its status? Is it > 'official'? Are there any plans in revitalizing it, possibly with adding > other projects from the "scipy universe"? Is there help needed? > > Many questions, but possibly quite easy to answer ... > > Cheers, > Andreas. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From lists at hilboll.de Thu Jun 7 07:33:27 2012 From: lists at hilboll.de (Andreas Hilboll) Date: Thu, 7 Jun 2012 13:33:27 +0200 Subject: [Numpy-discussion] Ubuntu PPA for NumPy / SciPy / ... In-Reply-To: References: Message-ID: <3bb46c7aa3d5927c4cd765fb80fc7d39.squirrel@srv2.s4y.tournesol-consulting.eu> >> Hi, >> >> I just noticed that there's a PPA for NumPy/SciPy on Launchpad: >> >> https://launchpad.net/~scipy/+archive/ppa >> >> However, it's painfully outdated. Does anyone know of its status? Is it >> 'official'? Are there any plans in revitalizing it, possibly with adding >> other projects from the "scipy universe"? Is there help needed? 
>> >> Many questions, but possibly quite easy to answer ... >> >> Cheers, >> Andreas. > Hi, > > do you try to install from this PPA? I am using ubuntu 11.04, it's not > difficult possibly to install by pip. but this is the only thing I know. > > chao Hi, I know it's easy to install by pip. But I know too many people who don't want to care about installing software 'by hand'. For those, it would be good to have a PPA with the current stable releases of numpy, scipy, matplotlib, ipython, pandas, statsmodels, you name them. Many desktops in research environments might be running Ubuntu LTS, and in two years, noone will want to be stuck with numpy 1.6 without the chance to easily (and for a whole network) be up-to-date. To summarize: I think it would be a valuable addition to the numpy community if there were such an up-to-date PPA. Andreas. From barthpi at gmail.com Thu Jun 7 09:16:42 2012 From: barthpi at gmail.com (Pierre Barthelemy) Date: Thu, 7 Jun 2012 15:16:42 +0200 Subject: [Numpy-discussion] reshape/resize and array extension Message-ID: Hi everyone, I am making a program to realize some "live" data analysis. I progressively take the data, and write them in a file as a single column. If i take 2D data, this would give: data= X Y Z 0 0 1 1 0 2 2 0 3 0 1 1 1 1 2 2 1 3 0 2 1 1 2 1 2 2 1 To plot these data, i need first to reshape the Z column as a 2D array: z=data(:,2) z.resize(len(x),len(y)) The issue is that in most of the case, are the data is built live, the last line of the 2D array is almost never completed. To solve that issue i complete it with 'nan' My code is then z=append(z,zeros(len(x)*len(y)-len(z))*nan) z.resize((len(y),len(x)) Is there a simpler/faster way to do that completing-reshaping step ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From edcjones at comcast.net Thu Jun 7 09:25:13 2012 From: edcjones at comcast.net (Edward C. Jones) Date: Thu, 07 Jun 2012 09:25:13 -0400 Subject: [Numpy-discussion] Are "min", "max" documented for scalars? Message-ID: <4FD0ABB9.1040606@comcast.net> Silly mistakes. If a and b are Python ints, Python floats, or non-complex numpy.number's, "max" returns, unchanged, the largrt of the two objects. There is no coercion to a common type. This useful behavior needs to be documented. From travis at vaught.net Thu Jun 7 11:02:31 2012 From: travis at vaught.net (Travis Vaught) Date: Thu, 7 Jun 2012 10:02:31 -0500 Subject: [Numpy-discussion] better error message possible? In-Reply-To: <3EC9C3D6-BC30-45C9-9DC9-49818A95C2CE@gmail.com> References: <4FC88F5F.3060303@simplistix.co.uk> <4FC8F442.8040100@simplistix.co.uk> <3EC9C3D6-BC30-45C9-9DC9-49818A95C2CE@gmail.com> Message-ID: On Jun 7, 2012, at 4:32 AM, Paul Anton Letnes wrote: > > On 7. juni 2012, at 10:30, Thouis (Ray) Jones wrote: > >> I've opened a PR at https://github.com/numpy/numpy/pull/296 for discussion. >> >> A typical result >> >>>>> np.zeros((3,3))[[1,2,3]] >> Traceback (most recent call last): >> File "", line 1, in >> IndexError: index 3 is out of bounds for axis 0: [-3,3) >> >> Ray Jones >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > I would prefer: > IndexError: index 3 is out of bounds for axis 0: [-3,2] > as I find the 3) notation a bit weird - after all, indices are not floats, so 2.999 or 2.3 doesn't make sense as an index. 
Actually, with slicing, you can attempt indexing using floats (as referred to here: http://stackoverflow.com/questions/8514547/why-ndarray-allow-floating-point-index), so I disagree with the integer notation [-3,2] I actually prefer the subsequent suggestion of reporting the shape, since that's what I always check when debugging index errors anyway -- having it in the error message would save me a ton of time. > > An alternative is to not refer to negative indices and say > IndexError: index 3 is out of bounds for axis 0, shape was: (3,) > (print more axes when the number of axes is higher.) > > BTW, I'm really glad someone is taking an interest in these error messages, it's a great idea! > ? an enthusiastic +1! > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Jun 7 11:22:25 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 7 Jun 2012 16:22:25 +0100 Subject: [Numpy-discussion] Are "min", "max" documented for scalars? In-Reply-To: <4FD0ABB9.1040606@comcast.net> References: <4FD0ABB9.1040606@comcast.net> Message-ID: On Thu, Jun 7, 2012 at 2:25 PM, Edward C. Jones wrote: > Silly mistakes. > > If a and b are Python ints, Python floats, or non-complex > numpy.number's, ?"max" returns, unchanged, the largrt of the two > objects. ?There is no coercion to a common type. ?This useful behavior > needs to be documented. Suggestions for improving the standard Python documentation (which documents these functions) can be sent here: http://docs.python.org/bugs.html#documentation-bugs -- Robert Kern From ndbecker2 at gmail.com Thu Jun 7 14:55:42 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 07 Jun 2012 14:55:42 -0400 Subject: [Numpy-discussion] possible enhancement to getitem? Message-ID: In [3]: u = np.arange(10) In [4]: u Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [5]: u[-2:] Out[5]: array([8, 9]) In [6]: u[-2:2] Out[6]: array([], dtype=int64) I would argue for consistency it would be desirable for this to return [8, 9, 0, 1] From robert.kern at gmail.com Thu Jun 7 15:00:40 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 7 Jun 2012 20:00:40 +0100 Subject: [Numpy-discussion] possible enhancement to getitem? In-Reply-To: References: Message-ID: On Thu, Jun 7, 2012 at 7:55 PM, Neal Becker wrote: > In [3]: u = np.arange(10) > > In [4]: u > Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [5]: u[-2:] > Out[5]: array([8, 9]) > > In [6]: u[-2:2] > Out[6]: array([], dtype=int64) > > I would argue for consistency it would be desirable for this to return > > [8, 9, 0, 1] Unfortunately, this would be inconsistent with Python semantics: [~] |1> u = range(10) [~] |2> u[-2:2] [] -- Robert Kern From ralf.gommers at googlemail.com Thu Jun 7 17:14:01 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 7 Jun 2012 23:14:01 +0200 Subject: [Numpy-discussion] Ubuntu PPA for NumPy / SciPy / ... In-Reply-To: <3bb46c7aa3d5927c4cd765fb80fc7d39.squirrel@srv2.s4y.tournesol-consulting.eu> References: <3bb46c7aa3d5927c4cd765fb80fc7d39.squirrel@srv2.s4y.tournesol-consulting.eu> Message-ID: On Thu, Jun 7, 2012 at 1:33 PM, Andreas Hilboll wrote: > >> Hi, > >> > >> I just noticed that there's a PPA for NumPy/SciPy on Launchpad: > >> > >> https://launchpad.net/~scipy/+archive/ppa > > >> > >> However, it's painfully outdated. Does anyone know of its status? Is it > >> 'official'? > No, it's not. 
All kinds of distribution mechanisms are dependent on the initiatives of community members; the only things that can be considered official (in the sense that they're the main repo or part of a release) is the Github repo and the pypi and SourceForge tarballs/binaries. > Are there any plans in revitalizing it, possibly with adding > >> other projects from the "scipy universe"? Is there help needed? > >> > >> Many questions, but possibly quite easy to answer ... > >> > >> Cheers, > >> Andreas. > > Hi, > > > > do you try to install from this PPA? I am using ubuntu 11.04, it's not > > difficult possibly to install by pip. but this is the only thing I know. > > > > chao > > Hi, I know it's easy to install by pip. But I know too many people who > don't want to care about installing software 'by hand'. For those, it > would be good to have a PPA with the current stable releases of numpy, > scipy, matplotlib, ipython, pandas, statsmodels, you name them. Many > desktops in research environments might be running Ubuntu LTS, and in two > years, noone will want to be stuck with numpy 1.6 without the chance to > easily (and for a whole network) be up-to-date. > > To summarize: I think it would be a valuable addition to the numpy > community if there were such an up-to-date PPA. > Sounds good. Anyone could maintain such a PPA. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Jun 8 02:07:37 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 8 Jun 2012 14:07:37 +0800 Subject: [Numpy-discussion] [SciPy-Dev] Ubuntu PPA for NumPy / SciPy / ... In-Reply-To: References: Message-ID: On Thu, Jun 7, 2012 at 5:24 PM, Andreas Hilboll wrote: > Hi, > > I just noticed that there's a PPA for NumPy/SciPy on Launchpad: > > https://launchpad.net/~scipy/+archive/ppa > > However, it's painfully outdated. Does anyone know of its status? Is it > 'official'? Are there any plans in revitalizing it, possibly with adding > other projects from the "scipy universe"? Is there help needed? > > Many questions, but possibly quite easy to answer ... > > I set up this PPA a long time ago. I just don't have time to maintain it at that point, but would be happy to give someone the keys to make it up to date. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From bob at bobcowdery.plus.com Fri Jun 8 06:31:19 2012 From: bob at bobcowdery.plus.com (Bob Cowdery) Date: Fri, 08 Jun 2012 11:31:19 +0100 Subject: [Numpy-discussion] Numpy structures Message-ID: <4FD1D477.7040309@bobcowdery.plus.com> Hi all, I am reading a datagram which contains within it a type. The type dictates the structure of the datagram. I want to put this into a numpy structure, one of which is: np.zeros(1,dtype=('2uint8,uint8,uint8,uint32,8uint8,504uint8,8uint8,504uint8')) As I don't know what I'm getting until I've read it (it seems I have to read the whole datagram in one read) I don't know the shape of the structure to use. I have tried reading into: np.zeros(1032, dtype='uint8') and then attempting to copy to the correct structure. How should I do this copy as numpy.copy() does not seem to work as if I then try to read some fields in the structure it complains the fields don't exist. Is there a better way to do this kind of thing, preferably without causing a data copy? 
Thasnks Bob From ndbecker2 at gmail.com Fri Jun 8 09:14:55 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 08 Jun 2012 09:14:55 -0400 Subject: [Numpy-discussion] possible enhancement to getitem? References: Message-ID: Robert Kern wrote: > On Thu, Jun 7, 2012 at 7:55 PM, Neal Becker wrote: >> In [3]: u = np.arange(10) >> >> In [4]: u >> Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [5]: u[-2:] >> Out[5]: array([8, 9]) >> >> In [6]: u[-2:2] >> Out[6]: array([], dtype=int64) >> >> I would argue for consistency it would be desirable for this to return >> >> [8, 9, 0, 1] > > Unfortunately, this would be inconsistent with Python semantics: > > [~] > |1> u = range(10) > > [~] > |2> u[-2:2] > [] > The fact that this proposed numpy behavior would not match python list behavior holds little weight for me. I would still favor this change, unless it added significant overhead. My opinion, of course. From alan.isaac at gmail.com Fri Jun 8 09:42:34 2012 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 08 Jun 2012 09:42:34 -0400 Subject: [Numpy-discussion] possible enhancement to getitem? In-Reply-To: References: Message-ID: <4FD2014A.2090702@gmail.com> On 6/8/2012 9:14 AM, Neal Becker wrote: > The fact that this proposed numpy behavior would not match python list behavior > holds little weight for me. It is not just Python behavior for lists. It is the semantics for all sequence types. Breaking this would be appalling. Alan Isaac From jniehof at lanl.gov Fri Jun 8 09:51:53 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Fri, 08 Jun 2012 07:51:53 -0600 Subject: [Numpy-discussion] possible enhancement to getitem? In-Reply-To: References: Message-ID: <4FD20379.1080706@lanl.gov> On 06/07/2012 12:55 PM, Neal Becker wrote: > In [3]: u = np.arange(10) > > In [4]: u > Out[4]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [5]: u[-2:] > Out[5]: array([8, 9]) > > In [6]: u[-2:2] > Out[6]: array([], dtype=int64) > > I would argue for consistency it would be desirable for this to return > > [8, 9, 0, 1] Should u[8:2] also return [8, 9, 0, 1], for consistency? That would be the concatenation of u[8:] and u[:2], which seems to be your argument. I concur with Alan that a numpy array should, to the extent possible, remain a sequence type. It's a pretty good duck without a peacock tail ;) -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From rjd4+numpy at cam.ac.uk Fri Jun 8 10:04:33 2012 From: rjd4+numpy at cam.ac.uk (Bob Dowling) Date: Fri, 08 Jun 2012 15:04:33 +0100 Subject: [Numpy-discussion] possible enhancement to getitem? In-Reply-To: References: Message-ID: <4FD20671.5090906@cam.ac.uk> On 08/06/12 14:14, Neal Becker wrote: > The fact that this proposed numpy behavior would not match python list behavior > holds little weight for me. I would still favor this change, unless it added > significant overhead. My opinion, of course. It holds enormous weight for me. My opinion is that NumPy arrays should be consistent with Python sequences as much as possible. From bobtnur78 at gmail.com Fri Jun 8 11:04:29 2012 From: bobtnur78 at gmail.com (bob tnur) Date: Fri, 8 Jun 2012 11:04:29 -0400 Subject: [Numpy-discussion] numpy arrays Message-ID: Hi every body! I have a &b numpy arrays a=np.loadtxt('?m1.txt', dtype=np.float, skiprows=2,usecols=[1]) b=np.loadtxt('?m1.txt', dtype=('x', np.float64), skiprows=2,usecols=[2]) 1. 
I want to save or write these two arrays and able to see the output as output file, say cm1.out. what about if I have multiple files like cm1.txt,cm2.txt,cm3.txt etc and to produce their corresponding outputs cm1.out,cm2.out,cm3.out etc. I appreciate your help -------------- next part -------------- An HTML attachment was scrubbed... URL: From mike.ressler at alum.mit.edu Fri Jun 8 11:52:43 2012 From: mike.ressler at alum.mit.edu (Mike Ressler) Date: Fri, 8 Jun 2012 08:52:43 -0700 Subject: [Numpy-discussion] possible enhancement to getitem? In-Reply-To: <4FD20671.5090906@cam.ac.uk> References: <4FD20671.5090906@cam.ac.uk> Message-ID: > On 08/06/12 14:14, Neal Becker wrote: >> The fact that this proposed numpy behavior would not match python list behavior >> holds little weight for me. ?I would still favor this change, unless it added >> significant overhead. ?My opinion, of course. As a "Joe User", I think using the [-2:2] syntax would be a mistake, though I agree it would be a useful shortcut (e.g. FFTs or power series where you need to wrap around). My concern is this: in a multiple dimension case, which are the -2 and -1 that you really want? Example, in a 2-D image that is constructed by reshaping a series of samples ordered in time, there are occasions where I would want the first two and last two samples of the same row (looking at spatial effects), and other times where I want the right two samples of the preceding row and the left two samples of the next row (looking at time effects). Which is the correct interpretation? It may be obvious to most people that it is "from the same row", but that isn't so obvious to me. Actually, my own preference would be that this construction throws an exception as a malformed slice, and I really wonder why python itself doesn't do this. Mike -- mike.ressler at alum.mit.edu From njs at pobox.com Fri Jun 8 15:20:33 2012 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 8 Jun 2012 20:20:33 +0100 Subject: [Numpy-discussion] Numpy structures In-Reply-To: <4FD1D477.7040309@bobcowdery.plus.com> References: <4FD1D477.7040309@bobcowdery.plus.com> Message-ID: On Fri, Jun 8, 2012 at 11:31 AM, Bob Cowdery wrote: > Hi all, > > I am reading a datagram which contains within it a type. The type > dictates the structure of the datagram. I want to put this into a numpy > structure, one of which is: > np.zeros(1,dtype=('2uint8,uint8,uint8,uint32,8uint8,504uint8,8uint8,504uint8')) > > As I don't know what I'm getting until I've read it (it seems I have to > read the whole datagram in one read) I don't know the shape of the > structure to use. > > I have tried reading into: > np.zeros(1032, dtype='uint8') > > and then attempting to copy to the correct structure. How should I do > this copy as numpy.copy() does not seem to work as if I then try to read > some fields in the structure it complains the fields don't exist. > > Is there a better way to do this kind of thing, preferably without > causing a data copy? 
I'm not sure I followed what exactly is going on here, but in general if you want to take a chunk of memory that's inside a numpy array and re-interpret it in-place as being of a new dtype, then the way you do that is my_array.view(dtype=) HTH, -n From nouiz at nouiz.org Fri Jun 8 20:45:17 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 8 Jun 2012 20:45:17 -0400 Subject: [Numpy-discussion] not expected output of fill_diagonal Message-ID: Hi, While reviewing the Theano op that wrap numpy.fill_diagonal, we found an unexpected behavior of it: # as expected for square matrix >>> a=numpy.zeros((5,5)) >>> numpy.fill_diagonal(a, 10) >>> print a # as expected long rectangular matrix >>> a=numpy.zeros((3,5)) >>> numpy.fill_diagonal(a, 10) >>> print a [[ 10. 0. 0. 0. 0.] [ 0. 10. 0. 0. 0.] [ 0. 0. 10. 0. 0.]] # Not as expected >>> a=numpy.zeros((5,3)) >>> numpy.fill_diagonal(a, 10) >>> print a [[ 10. 0. 0.] [ 0. 10. 0.] [ 0. 0. 10.] [ 0. 0. 0.] [ 10. 0. 0.]] I can make a PR that will add a parameter wrap that allow to control if it return the old behavior or what I would expect in the last case: [[ 10. 0. 0.] [ 0. 10. 0.] [ 0. 0. 10.] [ 0. 0. 0.] [ 0. 0. 0.]] My questions is, do someone else expect the current behavior? Should we change the default to be what I expect? Do you want that we warn if the user didn't specify witch behavior and in the future we change it? Anything else I didn't think? thanks Fred From warren.weckesser at enthought.com Sat Jun 9 08:44:06 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Sat, 9 Jun 2012 07:44:06 -0500 Subject: [Numpy-discussion] not expected output of fill_diagonal In-Reply-To: References: Message-ID: On Fri, Jun 8, 2012 at 7:45 PM, Fr?d?ric Bastien wrote: > Hi, > > While reviewing the Theano op that wrap numpy.fill_diagonal, we found > an unexpected behavior of it: > > # as expected for square matrix > >>> a=numpy.zeros((5,5)) > >>> numpy.fill_diagonal(a, 10) > >>> print a > > # as expected long rectangular matrix > >>> a=numpy.zeros((3,5)) > >>> numpy.fill_diagonal(a, 10) > >>> print a > [[ 10. 0. 0. 0. 0.] > [ 0. 10. 0. 0. 0.] > [ 0. 0. 10. 0. 0.]] > > # Not as expected > >>> a=numpy.zeros((5,3)) > >>> numpy.fill_diagonal(a, 10) > >>> print a > [[ 10. 0. 0.] > [ 0. 10. 0.] > [ 0. 0. 10.] > [ 0. 0. 0.] > [ 10. 0. 0.]] > > > I can make a PR that will add a parameter wrap that allow to control > if it return the old behavior or what I would expect in the last case: > [[ 10. 0. 0.] > [ 0. 10. 0.] > [ 0. 0. 10.] > [ 0. 0. 0.] > [ 0. 0. 0.]] > > My questions is, do someone else expect the current behavior? Should > we change the default to be what I expect? Do you want that we warn if > the user didn't specify witch behavior and in the future we change it? > > There is a ticket for this: http://projects.scipy.org/numpy/ticket/1953 I agree that the behavior is unexpected and should be fixed. Warren > Anything else I didn't think? > > thanks > > Fred > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From plredmond at gmail.com Sat Jun 9 13:42:07 2012 From: plredmond at gmail.com (Patrick Redmond) Date: Sat, 9 Jun 2012 13:42:07 -0400 Subject: [Numpy-discussion] numpy arrays In-Reply-To: References: Message-ID: How do you want the output files to be formatted? Binary data? Textual representation? 
This function can do both: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tofile.html And numpy supports a variety of methods for outputting to files (and reading data back in): http://docs.scipy.org/doc/numpy/reference/routines.io.html --PLR On Fri, Jun 8, 2012 at 11:04 AM, bob tnur wrote: > Hi every body! > I have a &b numpy arrays > a=np.loadtxt('?m1.txt',? dtype=np.float, skiprows=2,usecols=[1]) > b=np.loadtxt('?m1.txt',? dtype=('x', np.float64),? skiprows=2,usecols=[2]) > > 1. I want to save or write these two arrays and able to see the output as > output file, say cm1.out. what about if I have multiple files like > cm1.txt,cm2.txt,cm3.txt etc and to produce their corresponding outputs > cm1.out,cm2.out,cm3.out etc. > > I appreciate your help > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From bobtnur78 at gmail.com Sat Jun 9 14:12:40 2012 From: bobtnur78 at gmail.com (bob tnur) Date: Sat, 9 Jun 2012 14:12:40 -0400 Subject: [Numpy-discussion] numpy arrays In-Reply-To: References: Message-ID: > > Hi every body! > I have a &b numpy arrays > a=np.loadtxt('?m1.txt', dtype=np.float, skiprows=2,usecols=[1]) > b=np.loadtxt('?m1.txt', dtype=('x', np.float64), skiprows=2,usecols=[2]) > > how to save multiple files like cm1.txt,cm2.txt,cm3.txt etc and to > produce their corresponding outputs cm1.out,cm2.out,cm3.out etc. > or how to modify this: np.savetxt(*fname*, *(a*,b), *fmt='%4*.8f*'*) > > I appreciate your help > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From plredmond at gmail.com Sat Jun 9 15:09:25 2012 From: plredmond at gmail.com (Patrick Redmond) Date: Sat, 9 Jun 2012 15:09:25 -0400 Subject: [Numpy-discussion] numpy arrays In-Reply-To: References: Message-ID: On Sat, Jun 9, 2012 at 2:12 PM, bob tnur wrote: >> >> how to save? multiple files like cm1.txt,cm2.txt,cm3.txt etc and to >> produce their corresponding outputs cm1.out,cm2.out,cm3.out etc. > > ?? or how to modify this: > ?? np.savetxt(fname, (a,b), fmt='%4.8f') > You can save them to separate files with a for loop. for i, arr in enumerate([a, b]): fname = 'cm{}.out'.format(i + 1) np.savetxt(fname, arr, fmt='%4.8f') From josef.pktd at gmail.com Sat Jun 9 17:45:41 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 9 Jun 2012 17:45:41 -0400 Subject: [Numpy-discussion] convert to string - astype(str) Message-ID: Is there a way to convert an array to string elements in numpy, without knowing the string length? 
>>> arr2 = np.arange(8, 13) >>> arr2.astype(str) # bad array(['8', '9', '1', '1', '1'], dtype='|S1') >>> arr2.astype('S2') array(['8', '9', '10', '11', '12'], dtype='|S2') >>> map(str, arr2) ['8', '9', '10', '11', '12'] >>> arr3 = np.round(np.random.rand(5), 2) >>> arr3 array([ 0.51, 0.86, 0.15, 0.68, 0.59]) >>> arr3.astype(str) # bad array(['0', '0', '0', '0', '0'], dtype='|S1') >>> arr3.astype('S4') array(['0.51', '0.86', '0.15', '0.68', '0.59'], dtype='|S4') >>> map(str, arr3) ['0.51', '0.86', '0.15', '0.68', '0.59'] >>> np.__version__ '1.5.1' (from an issue in statsmodels) Thanks, Josef From njs at pobox.com Sat Jun 9 17:47:45 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 9 Jun 2012 22:47:45 +0100 Subject: [Numpy-discussion] Fwd: [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303) In-Reply-To: References: Message-ID: [Manual PR notification] ---------- Forwarded message ---------- From: timcera Date: Sat, Jun 9, 2012 at 10:13 PM Subject: [numpy] ENH: Initial implementation of a 'neighbor' calculation (#303) To: njsmith Each element is assigned the result of a function based on it's neighbors. Neighbors are selected based on a weight array. It uses the new pad routines to pad arrays if neighboring values are required that would be off the edge of the input array. Will be great to have the masked array settled because right now you can only sort of exclude from the neighborhood using a zero in the weight array. ?Zero or np.IGNORE don't affect np.sum, but functions like np.mean and np.std would give different answers. ?Because of this my early implementations of neighbor included an optional mask array along with the weight array, but I decided would be best to wait for the new masked arrays. This in some ways could be considered a generalization of a convolution, and comparison with existing numpy/scipy convolution results are included in the tests. ?The advantage to neighbor is that any function that accepts a 1-d array, and returns a single result, can be used instead of convolution only using summation. ?The convolution functions require the weight array to be flipped to get the same answer as neighbor. You can merge this Pull Request by running: ?git pull https://github.com/timcera/numpy neighbor Or you can view, comment on it, or merge it online at: ?https://github.com/numpy/numpy/pull/303 -- Commit Summary -- * ENH: Initial implementation of a 'neighbor' calculation where the each -- File Changes -- M numpy/lib/__init__.py (2) A numpy/lib/neighbor.py (305) A numpy/lib/tests/test_neighbor.py (278) -- Patch Links -- ?https://github.com/numpy/numpy/pull/303.patch ?https://github.com/numpy/numpy/pull/303.diff --- Reply to this email directly or view it on GitHub: https://github.com/numpy/numpy/pull/303 From edcjones at comcast.net Sat Jun 9 22:58:29 2012 From: edcjones at comcast.net (Edward C. Jones) Date: Sat, 09 Jun 2012 22:58:29 -0400 Subject: [Numpy-discussion] Incorrect overflow warning message for float128 Message-ID: <4FD40D55.6060204@comcast.net> I use up-to-date Debian testing (wheezy), amd64 architecture. From the docs for numpy.MachAr: maxexp int Smallest (positive) power of ibeta that causes overflow. On my machine, ibeta = 2 and maxexp = 16384. For float64, float32, and float16 things behave as expected. For float128, I get the message about overflow for exponent 8192 (and greater) but the correct answer is printed. What is the problem? #! 
/usr/bin/env python3.2 import numpy print('float128') fi = numpy.finfo(numpy.float128) print('ibeta:', fi.machar.ibeta) # 2 print('maxexp:', fi.machar.maxexp) # 16384 print('xmax:', fi.machar.xmax) # 1.18973149536e+4932 two = numpy.float128(2) big = numpy.float128(8191) x = numpy.power(two, big) print('2**8191:', x) # 5.4537406781e+2465 big = numpy.float128(8192) # RuntimeWarning: overflow encountered in power x = numpy.power(two, big) print('2**8192:', x) # 1.09074813562e+2466 From travis at continuum.io Sun Jun 10 02:17:25 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 10 Jun 2012 01:17:25 -0500 Subject: [Numpy-discussion] convert to string - astype(str) In-Reply-To: References: Message-ID: On Jun 9, 2012, at 4:45 PM, josef.pktd at gmail.com wrote: > Is there a way to convert an array to string elements in numpy, > without knowing the string length? Not really. In the next release of NumPy you should be able to do. result = array(arr2, str) and it will determine the length of the string for you. For now, I would compute the max length via int(math.ceil(math.log10(np.max(arr2)))) -Travis > > >>>> arr2 = np.arange(8, 13) > >>>> arr2.astype(str) # bad > array(['8', '9', '1', '1', '1'], > dtype='|S1') > >>>> arr2.astype('S2') > array(['8', '9', '10', '11', '12'], > dtype='|S2') > >>>> map(str, arr2) > ['8', '9', '10', '11', '12'] > > >>>> arr3 = np.round(np.random.rand(5), 2) >>>> arr3 > array([ 0.51, 0.86, 0.15, 0.68, 0.59]) > >>>> arr3.astype(str) # bad > array(['0', '0', '0', '0', '0'], > dtype='|S1') > >>>> arr3.astype('S4') > array(['0.51', '0.86', '0.15', '0.68', '0.59'], > dtype='|S4') > >>>> map(str, arr3) > ['0.51', '0.86', '0.15', '0.68', '0.59'] > >>>> np.__version__ > '1.5.1' > > (from an issue in statsmodels) > > Thanks, > > Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Sun Jun 10 07:58:38 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 10 Jun 2012 07:58:38 -0400 Subject: [Numpy-discussion] convert to string - astype(str) In-Reply-To: References: Message-ID: On Sun, Jun 10, 2012 at 2:17 AM, Travis Oliphant wrote: > > On Jun 9, 2012, at 4:45 PM, josef.pktd at gmail.com wrote: > >> Is there a way to convert an array to string elements in numpy, >> without knowing the string length? > > Not really. ? In the next release of NumPy you should be able to do. > > result = array(arr2, str) > > and it will determine the length of the string for you. > > For now, I would compute the max length via > > int(math.ceil(math.log10(np.max(arr2)))) Thanks Travis, It's good to know when I can stop looking. Josef > > > -Travis > > > >> >> >>>>> arr2 = np.arange(8, 13) >> >>>>> arr2.astype(str) ? # bad >> array(['8', '9', '1', '1', '1'], >> ? ? ?dtype='|S1') >> >>>>> arr2.astype('S2') >> array(['8', '9', '10', '11', '12'], >> ? ? ?dtype='|S2') >> >>>>> map(str, arr2) >> ['8', '9', '10', '11', '12'] >> >> >>>>> arr3 = np.round(np.random.rand(5), 2) >>>>> arr3 >> array([ 0.51, ?0.86, ?0.15, ?0.68, ?0.59]) >> >>>>> arr3.astype(str) ? # bad >> array(['0', '0', '0', '0', '0'], >> ? ? ?dtype='|S1') >> >>>>> arr3.astype('S4') >> array(['0.51', '0.86', '0.15', '0.68', '0.59'], >> ? ? 
?dtype='|S4') >> >>>>> map(str, arr3) >> ['0.51', '0.86', '0.15', '0.68', '0.59'] >> >>>>> np.__version__ >> '1.5.1' >> >> (from an issue in statsmodels) >> >> Thanks, >> >> Josef >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at googlemail.com Sun Jun 10 11:13:28 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 10 Jun 2012 17:13:28 +0200 Subject: [Numpy-discussion] boolean indexing change Message-ID: Something that we just ran into trying to merge a scipy PR: With 1.5.1: >>> np.arange(10)[np.array([0,1,0,1,2,3]) > 0] array([1, 3, 4, 5]) With current master: In [1]: np.arange(10)[np.array([0,1,0,1,2,3]) > 0] --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /Users/rgommers/ in () ----> 1 np.arange(10)[np.array([0,1,0,1,2,3]) > 0] ValueError: operands could not be broadcast together with shapes (10) (6) The reason for this is noted in the 2.0.0 release notes: "Full-array boolean indexing used to allow boolean arrays with a size non-broadcastable to the array size. Now it forces this to be broadcastable. Since this affects some legacy code, this change will require discussion during alpha or early beta testing, and a decision to either keep the stricter behavior, or add in a hack to allow the previous behavior to work. " I'm not opposed to the change in principle, but just wanted to note it can lead to code breaking and puzzled users. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From edcjones at comcast.net Sun Jun 10 20:21:29 2012 From: edcjones at comcast.net (Edward C. Jones) Date: Sun, 10 Jun 2012 20:21:29 -0400 Subject: [Numpy-discussion] Errors with "**" for numpy.float16 Message-ID: <4FD53A09.9010301@comcast.net> #! /usr/bin/env python3.2 import numpy for t in (numpy.float16, numpy.float32, numpy.float64, numpy.float128): two = t(2) print(t, two, two ** two, numpy.power(two, two)) """ I use up-to-date debian testing (wheezy), amd64 architecture. The python package is python3, version 3.2.3~rc1-2. The numpy package is python3-numpy, version 1:1.6.2~rc1-1. "**" has problems for float16: 2.0 1.1921e-07 4.0 2.0 4.0 4.0 2.0 4.0 4.0 2.0 4.0 4.0 """ From travis at continuum.io Sun Jun 10 20:31:36 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 10 Jun 2012 19:31:36 -0500 Subject: [Numpy-discussion] boolean indexing change In-Reply-To: References: Message-ID: It is unfortunate that this was committed to master. This should be backed out and is a blocker for 1.7. Can someone help me identify which commit made the change? This is a rather significant change and changes the documented behavior of NumPy substantially. This should definitely not occur in 1.7 The documented behavior (Guide to NumPy, pg. 84) of boolean indexing is that x[obj] is equivalent to x[obj.nonzero()] The shape of advanced indexing is not restricted to the shape of of x. I suspect this change was made when it was presumed the next release would be 2.0 and such behavior could presumably be changed somewhat? But, was there a discussion about this? 
-Travis On Jun 10, 2012, at 10:13 AM, Ralf Gommers wrote: > Something that we just ran into trying to merge a scipy PR: > > With 1.5.1: > >>> np.arange(10)[np.array([0,1,0,1,2,3]) > 0] > array([1, 3, 4, 5]) > > With current master: > In [1]: np.arange(10)[np.array([0,1,0,1,2,3]) > 0] > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > /Users/rgommers/ in () > ----> 1 np.arange(10)[np.array([0,1,0,1,2,3]) > 0] > > ValueError: operands could not be broadcast together with shapes (10) (6) > > > The reason for this is noted in the 2.0.0 release notes: > "Full-array boolean indexing used to allow boolean arrays with a size > non-broadcastable to the array size. Now it forces this to be broadcastable. > Since this affects some legacy code, this change will require discussion > during alpha or early beta testing, and a decision to either keep the > stricter behavior, or add in a hack to allow the previous behavior to > work. > " > > I'm not opposed to the change in principle, but just wanted to note it can lead to code breaking and puzzled users. > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bergstrj at iro.umontreal.ca Mon Jun 11 00:03:40 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Mon, 11 Jun 2012 00:03:40 -0400 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: <4FCF2154.7020803@astro.uio.no> References: <4FCE7BCB.1060900@astro.uio.no> <4FCF2154.7020803@astro.uio.no> Message-ID: Hi all, (sorry for missing the debate, I don't often check my numpy-list folder.) I agree that an "official" numpy solution to this problem is premature, but at the same time I think the failure to approach anything remotely resembling a consensus on how to deal with lazy evaluation is really gumming up the works across the numpy community. With apologies in advance to projects I don't cite (sorry!) it is currently the case that many high-level libraries (e.g. pacal, pymc, theano, sympy, pylearn2) offer more or less symbolic math features for scientific applications, but each one defines it's own lazy-numpy AST thing to track functional relationships between inputs and/or random variables. At the same time, none of these ASTs (except arguably Theano's) is handled natively by the many competing lazy-evaluation compiler runtimes (e.g. cython, numba, theano, numexpr). So consequently, the feature-specific ASTs often become more of a performance *problem* than a part of the optimizing-compiler pathway and libraries that provide end-user APIs (in my work I think of sklearn and skimage) continue to "wait and see" and don't commit to *any* of the options (except labour-intensive cython), so we all lose. The interesting development/insight I got from numba's byte-code parsing technique is the illustration that *Python byte code* is: a) a standard data structure that all Python code is already using b) editable (see e.g. http://code.google.com/p/byteplay) c) in pretty direct correspondance with high level (e.g. Theano's) "abstract" syntax graphs d) an unambiguous and obvious program specification for optimization (e.g. numba) After a little proof of concept work, I think that many high-level semantic features (e.g. 
turning a stochastic function into a sampler via PyMC, tracking uncertainty through computations, or minimizing a numpy function directly by automatic differentiation) can and should be done as bytecode -> bytecode transforms. An implementation of e.g. auto-diff will have to recognize when it can (and cannot) make sense of a code object... so functions with lots of control flow, yield statements, exception handling and the like may just be rejected. That's OK because mathematical code does often not require complex (often even *any*) control flow constructs. With regards to users being surprised by strange resource usage levels... this surprise can be avoided because a user who is applying such transforms will be well aware that he/she has transformed the original function into a new function. That transformation would be explicit, so there will be little suggestion from the program syntax that the new function has any statements in common with the original. The new function will have different statements, different resource usage profile, etc. I think APIs for this sort of bytecode->bytecode transformation can avoid surprising users if they are done right. If anyone is interested in my ongoing API & bytecode adventure in why / how lazy computing could be useful, I've put together a few tiny hypothetically-runnable examples here: https://github.com/jaberg/numba/tree/master/examples https://github.com/jaberg/numba/blob/master/examples/linear_svm.py https://github.com/jaberg/numba/blob/master/examples/mcmc.py The purpose of the examples is to show how the features of e.g. Theano and PyMC could be expressed as operators on raw Python code. Perhaps most importantly of all, these transforms would work together: a PaCal transform could automatically generate a likelihood function from a model and data, and then a Theano transform could provide the parameter gradients required to fit the likelihood. This natural chaining is a complete PITA when every project uses its own AST. That numba fork also includes very sketchy pseudocode of the main work routines in the numba/ad.py and numba/rv.py files. The linear_svm example was recently using Theano as a backend. I don't think it works right now but FWIW it is still close to running. Sorry for the long post, - James On Wed, Jun 6, 2012 at 5:22 AM, Dag Sverre Seljebotn wrote: > On 06/06/2012 12:06 AM, mark florisson wrote: >> On 5 June 2012 22:36, Dag Sverre Seljebotn ?wrote: >>> On 06/05/2012 10:47 PM, mark florisson wrote: >>>> On 5 June 2012 20:17, Nathaniel Smith ? ?wrote: >>>>> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson >>>>> ? ?wrote: >>>>>> On 5 June 2012 17:38, Nathaniel Smith ? ?wrote: >>>>>>> On Tue, Jun 5, 2012 at 4:12 PM, mark florisson >>>>>>> ? ?wrote: >>>>>>>> On 5 June 2012 14:58, Nathaniel Smith ? ?wrote: >>>>>>>>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson >>>>>>>>> ? ?wrote: >>>>>>>>>> It would be great if we implement the NEP listed above, but with a few >>>>>>>>>> extensions. I think Numpy should handle the lazy evaluation part, and >>>>>>>>>> determine when expressions should be evaluated, etc. However, for each >>>>>>>>>> user operation, Numpy will call back a user-installed hook >>>>>>>>>> implementing some interface, to allow various packages to provide >>>>>>>>>> their own hooks to evaluate vector operations however they want. 
This >>>>>>>>>> will include packages such as Theano, which could run things on the >>>>>>>>>> GPU, Numexpr, and in the future >>>>>>>>>> https://github.com/markflorisson88/minivect (which will likely have an >>>>>>>>>> LLVM backend in the future, and possibly integrated with Numba to >>>>>>>>>> allow inlining of numba ufuncs). The project above tries to bring >>>>>>>>>> together all the different array expression compilers together in a >>>>>>>>>> single framework, to provide efficient array expressions specialized >>>>>>>>>> for any data layout (nditer on steroids if you will, with SIMD, >>>>>>>>>> threaded and inlining capabilities). >>>>>>>>> >>>>>>>>> A global hook sounds ugly and hard to control -- it's hard to tell >>>>>>>>> which operations should be deferred and which should be forced, etc. >>>>>>>> >>>>>>>> Yes, but for the user the difference should not be visible (unless >>>>>>>> operations can raise exceptions, in which case you choose the safe >>>>>>>> path, or let the user configure what to do). >>>>>>>> >>>>>>>>> While it would be less magical, I think a more explicit API would in >>>>>>>>> the end be easier to use... something like >>>>>>>>> >>>>>>>>> ? ?a, b, c, d = deferred([a, b, c, d]) >>>>>>>>> ? ?e = a + b * c ?# 'e' is a deferred object too >>>>>>>>> ? ?f = np.dot(e, d) ?# so is 'f' >>>>>>>>> ? ?g = force(f) ?# 'g' is an ndarray >>>>>>>>> ? ?# or >>>>>>>>> ? ?force(f, out=g) >>>>>>>>> >>>>>>>>> But at that point, this could easily be an external library, right? >>>>>>>>> All we'd need from numpy would be some way for external types to >>>>>>>>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen >>>>>>>>> several reasons to want that functionality, and it seems like >>>>>>>>> developing these "improved numexpr" ideas would be much easier if they >>>>>>>>> didn't require doing deep surgery to numpy itself... >>>>>>>> >>>>>>>> Definitely, but besides monkey-patch-chaining I think some >>>>>>>> modifications would be required, but they would be reasonably simple. >>>>>>>> Most of the functionality would be handled in one function, which most >>>>>>>> ufuncs (the ones you care about, as well as ufunc (methods) like add) >>>>>>>> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result; >>>>>>>> , which is inserted after argument unpacking and sanity checking. You >>>>>>>> could also do a per-module hook, and have the function look at >>>>>>>> sys._getframe(1).f_globals, but that is fragile and won't work from C >>>>>>>> or Cython code. >>>>>>>> >>>>>>>> How did you have overrides in mind? >>>>>>> >>>>>>> My vague idea is that core numpy operations are about as fundamental >>>>>>> for scientific users as the Python builtin operations are, so they >>>>>>> should probably be overrideable in a similar way. So we'd teach numpy >>>>>>> functions to check for methods named like "__numpy_ufunc__" or >>>>>>> "__numpy_dot__" and let themselves be overridden if found. Like how >>>>>>> __gt__ and __add__ and stuff work. Or something along those lines. >>>>>>> >>>>>>>> I also found this thread: >>>>>>>> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html >>>>>>>> , but I think you want more than just to override ufuncs, you want >>>>>>>> numpy to govern when stuff is allowed to be lazy and when stuff should >>>>>>>> be evaluated (e.g. when it is indexed, slice assigned (although that >>>>>>>> itself may also be lazy), etc). You don't want some funny object back >>>>>>>> that doesn't work with things which are not overridden in numpy. 
>>>>>>> >>>>>>> My point is that probably numpy should *not* govern the decision about >>>>>>> what stuff should be lazy and what should be evaluated; that should be >>>>>>> governed by some combination of the user and >>>>>>> Numba/Theano/minivect/whatever. The toy API I sketched out would make >>>>>>> those decisions obvious and explicit. (And if the funny objects had an >>>>>>> __array_interface__ attribute that automatically forced evaluation >>>>>>> when accessed, then they'd work fine with code that was expecting an >>>>>>> array, or if they were assigned to a "real" ndarray, etc.) >>>>>> >>>>>> That's disappointing though, since the performance drawbacks can >>>>>> severely limit the usefulness for people with big data sets. Ideally, >>>>>> you would take your intuitive numpy code, and make it go fast, without >>>>>> jumping through hoops. Numpypy has lazy evaluation, ?I don't know how >>>>>> good a job it does, but it does mean you can finally get fast numpy >>>>>> code in an intuitive way (and even run it on a GPU if that is possible >>>>>> and beneficial). >>>>> >>>>> All of these proposals require the user to jump through hoops -- the >>>>> deferred-ufunc NEP has the extra 'with deferredstate' thing, and more >>>>> importantly, a set of rules that people have to learn and keep in mind >>>>> for which numpy operations are affected, which ones aren't, which >>>>> operations can't be performed while deferredstate is True, etc. So >>>>> this has two problems: (1) these rules are opaque, (2) it's far from >>>>> clear what the rules should be. >>>> >>>> Right, I guess I should have commented on that. I don't think the >>>> deferredstate stuff is needed at all, execution can always be deferred >>>> as long as it does not affect semantics. So if something is marked >>>> readonly because it is used in an expression and then written to, you >>>> evaluate the expression and then perform the write. The only way to >>>> break stuff, I think, would be to use pointers through the buffer >>>> interface or PyArray_DATA and not respect the sudden readonly >>>> property. A deferred expression is only evaluated once in any valid >>>> GIL-holding context (so it shouldn't break threads either). >>> >>> I think Nathaniel's point is that the point where you get a 10-second >>> pause to wait for computation is part of the semantics of current NumPy: >>> >>> print 'Starting computation' >>> z = (x + y).sum() >>> print 'Computation done' >>> print 'Result was', z >>> >>> I think that if this wasn't the case, newbies would be be tripped up a >>> lot and things would feel a lot less intuitive. Certainly when working >>> from the IPython command line. >>> >>> Also, to remain sane in IPython (or when using a debugger, etc.), I'd want >>> >>> "print z" >>> >>> to print something like "unevaluated array", not to trigger a >>> computation. Same with str(z) and so on. >> >> I guess you could detect that at runtime, or just make it >> configurable. As for triggering computation somewhere else, I guess I >> find it preferable to horrible performance :) > > My problem might be that I don't use NumPy wherever I need performance > (except as a glorified double*, i.e. I don't use it for computation). > NumPy is for interactive "play with a reduced dataset" work. > >> >>> I don't think a context manager modifying thread-local global state like >>> >>> with np.lazy: >>> ? ? ?... >>> >>> would be horribly intrusive. 
>>> >>> But I also think it'd be good to start with being very explicit (x = >>> np.lazy_multiply(a, b); compute(x)) -- such an API should be available >>> anyway -- and then have the discussion once that works. >> >> Maybe that's the best way forward. I guess I'd prefer an import >> numpy.lazy_numpy as numpy in that case. I don't really like the with >> statement here, since ideally you'd just experiment with swapping in >> another module and see if your code still runs fine. > > Or just "import lazyarray as np". As I said, I think it's important to > refactor NumPy so that things can happen outside of the NumPy project. > > NumPy needs to be very conservative. You've seen the recent NA semantics > debate. If NumPy was to decide on *the* final blessed semantics for lazy > evaluation, even as an experimental sub-module, you'd never see the end > of it. > > One part of this is a polymorphic C API targeted for lazy evaluation and > get current NumPy to support that. Another part is, as Nathaniel has > commented, making things like "np.dot" have some kind of polymorphic > dispatch-on-the-objects behaviour. > > (I'd like something based on multiple dispatch rather than just calling > something on the left operand. Then use that multiple dispatch for > implementing +, - and so on as well when you want anything to interact > with NumPy arrays.) > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- http://www-etud.iro.umontreal.ca/~bergstrj From thouis at gmail.com Mon Jun 11 10:11:53 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Mon, 11 Jun 2012 16:11:53 +0200 Subject: [Numpy-discussion] Migrating numpy Trac to github issues Message-ID: I've volunteered to help manage the migration of numpy tickets from Trac to github issues. The first part of this process is to decide which tickets to migrate, and how to map Trac ticket data to github issue data. Question 1: Which tickets should be migrated? Open? Open and recently closed? All tickets? Question 2: Which parts of the tickets should be included, and in what way? For reference, these are the parts of a ticket (from http://trac.edgewall.org/wiki/TracTickets) Reporter Type (e.g., defect, enhancementt). Component Version Keywords Priority Milestone Assigned to/Owner Cc Resolution (fixed, invalid, wontfix, duplicate, worksforme, others?) Status (e.g., new, needs_review, etc.) Summary (brief description) Description (full description) Comments Attachments (not listed on the link above, but enabled for numpy Trac) ??? (Other) ? I don't have access to the Trac configuration, so it's not clear if there are other fields I may be missing. As an initial suggestion, I propose: Reporter - yes, mapping Trac users to github users where possible. 
Type - yes, using github issues labels Component - yes, using labels Version - probably yes, using labels Keywords - Yes, as a line in the github issue comments Priority - yes, using labels Milestone - yes, using github milestones Assigned to/Owner - yes, mapping to github users Cc - Yes, for github users we can map to, using @username Resolution - Only if tickets other than "open" are migrated, and as part of the issue comments Status - Yes, using labels Summary - Yes, as the github issue title Description - Yes, as the github issue body Comments - Yes, as github issue comments Attachments - Yes, linked in an issue comment Note that the Trac database will be available for some time, and every migrated ticket will include a link to the original. If the Trac database is ever disabled, these links could be migrated to a static copy of the site. At some point, there may be a new location for ticket attachments, requiring changing links in some github issues. That's a separate issue from this discussion. Trac allows WikiFormatting, while github has a more limited markup language. As much translation as is possible will be performed within comments (code blocks, at an absolute minimum). Tickets will be migrated to a test repository before being moved to the numpy github repo, and there will be ample time to review the result before proceeding beyond that step. Ray Jones From bergstrj at iro.umontreal.ca Mon Jun 11 11:41:05 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Mon, 11 Jun 2012 11:41:05 -0400 Subject: [Numpy-discussion] lazy evaluation In-Reply-To: References: <4FCE7BCB.1060900@astro.uio.no> <4FCF2154.7020803@astro.uio.no> Message-ID: On Mon, Jun 11, 2012 at 12:03 AM, James Bergstra wrote: > If anyone is interested in my ongoing API & bytecode adventure in why > / how lazy computing could be useful, I've put together a few tiny > hypothetically-runnable examples here: > > https://github.com/jaberg/numba/tree/master/examples > https://github.com/jaberg/numba/blob/master/examples/linear_svm.py > https://github.com/jaberg/numba/blob/master/examples/mcmc.py > > The purpose of the examples is to show how the features of e.g. Theano > and PyMC could be expressed as operators on raw Python code. Perhaps > most importantly of all, these transforms would work together: a PaCal > transform could automatically generate a likelihood function from a > model and data, and then a Theano transform could provide the > parameter gradients required to fit the likelihood. This natural > chaining is a complete PITA when every project uses its own AST. > > That numba fork also includes very sketchy pseudocode of the main work > routines in the numba/ad.py and numba/rv.py files. The linear_svm > example was recently using Theano as a backend. I don't think it works > right now but FWIW it is still close to running. > For those interested, the linear_svm example works again. -- http://www-etud.iro.umontreal.ca/~bergstrj From nouiz at nouiz.org Mon Jun 11 16:45:54 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Mon, 11 Jun 2012 16:45:54 -0400 Subject: [Numpy-discussion] not expected output of fill_diagonal In-Reply-To: References: Message-ID: Thanks, I made a PR to fix it. I changed the current behavior to don't "wrap", but there is an option to change it. If someone think that we should not change the current default behavior. tell me. 
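To be concrete, the non-wrapping behavior I expect for the tall case is
what this small pure-Python sketch does (just a sketch of the semantics I
have in mind, not the code in the PR):

import numpy as np

def fill_diagonal_nowrap(a, val):
    # touch only the main diagonal; never wrap past the shorter dimension
    n = min(a.shape)
    a[np.arange(n), np.arange(n)] = val

a = np.zeros((5, 3))
fill_diagonal_nowrap(a, 10)
# the last two rows stay all zero, matching the expected output from my
# first message
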
https://github.com/numpy/numpy/pull/306

Fred

On Sat, Jun 9, 2012 at 8:44 AM, Warren Weckesser wrote:
>
>
> On Fri, Jun 8, 2012 at 7:45 PM, Frédéric Bastien wrote:
>>
>> Hi,
>>
>> While reviewing the Theano op that wrap numpy.fill_diagonal, we found
>> an unexpected behavior of it:
>>
>> # as expected for square matrix
>> >>> a=numpy.zeros((5,5))
>> >>> numpy.fill_diagonal(a, 10)
>> >>> print a
>>
>> # as expected long rectangular matrix
>> >>> a=numpy.zeros((3,5))
>> >>> numpy.fill_diagonal(a, 10)
>> >>> print a
>> [[ 10.   0.   0.   0.   0.]
>>  [  0.  10.   0.   0.   0.]
>>  [  0.   0.  10.   0.   0.]]
>>
>> # Not as expected
>> >>> a=numpy.zeros((5,3))
>> >>> numpy.fill_diagonal(a, 10)
>> >>> print a
>> [[ 10.   0.   0.]
>>  [  0.  10.   0.]
>>  [  0.   0.  10.]
>>  [  0.   0.   0.]
>>  [ 10.   0.   0.]]
>>
>> I can make a PR that will add a parameter wrap that allow to control
>> if it return the old behavior or what I would expect in the last case:
>> [[ 10.   0.   0.]
>>  [  0.  10.   0.]
>>  [  0.   0.  10.]
>>  [  0.   0.   0.]
>>  [  0.   0.   0.]]
>>
>> My question is, do someone else expect the current behavior? Should
>> we change the default to be what I expect? Do you want that we warn if
>> the user didn't specify which behavior and in the future we change it?
>>
>
> There is a ticket for this:
>
>     http://projects.scipy.org/numpy/ticket/1953
>
> I agree that the behavior is unexpected and should be fixed.
>
> Warren
>
>
>> Anything else I didn't think?
>>
>> thanks
>>
>> Fred
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From ralf.gommers at googlemail.com Tue Jun 12 11:55:46 2012
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Tue, 12 Jun 2012 17:55:46 +0200
Subject: [Numpy-discussion] Migrating numpy Trac to github issues
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jun 11, 2012 at 4:11 PM, Thouis (Ray) Jones wrote:
> I've volunteered to help manage the migration of numpy tickets from
> Trac to github issues.

Awesome, thanks!

> The first part of this process is to decide
> which tickets to migrate, and how to map Trac ticket data to github
> issue data.
>
> Question 1: Which tickets should be migrated?  Open?  Open and
> recently closed?  All tickets?

If Trac is going to disappear eventually and it's not too much trouble, all would be good.

> Question 2: Which parts of the tickets should be included, and in what way?
>
> For reference, these are the parts of a ticket (from
> http://trac.edgewall.org/wiki/TracTickets)
>
> Reporter
> Type (e.g., defect, enhancement)
> Component
> Version
> Keywords
> Priority
> Milestone
> Assigned to/Owner
> Cc
> Resolution (fixed, invalid, wontfix, duplicate, worksforme, others?)
> Status (e.g., new, needs_review, etc.)
> Summary (brief description)
> Description (full description)
> Comments
> Attachments (not listed on the link above, but enabled for numpy Trac)
> ??? (Other) - I don't have access to the Trac configuration, so it's
> not clear if there are other fields I may be missing.
>
> As an initial suggestion, I propose:
>
> Reporter - yes, mapping Trac users to github users where possible.
>

And if not possible, include username in comment on Github?
Also, consider sending Trac users an email directly about the moved ticket?

> Type - yes, using github issues labels
> Component - yes, using labels
> Version - probably yes, using labels

I don't find version useful at all, so I'd prefer to leave it out. Alternatively, keep it as a comment on the issue. It's not important enough for a label. Also, because users can't assign labels, it will be extra work (which lands in comments) to attach this label to new issues after they're reported.

> Keywords - Yes, as a line in the github issue comments
> Priority - yes, using labels
> Milestone - yes, using github milestones
> Assigned to/Owner - yes, mapping to github users
> Cc - Yes, for github users we can map to, using @username
> Resolution - Only if tickets other than "open" are migrated, and as
> part of the issue comments
> Status - Yes, using labels
> Summary - Yes, as the github issue title
> Description - Yes, as the github issue body
> Comments - Yes, as github issue comments
> Attachments - Yes, linked in an issue comment

All agreed.

> Note that the Trac database will be available for some time, and every
> migrated ticket will include a link to the original. If the Trac
> database is ever disabled, these links could be migrated to a static
> copy of the site.
>
> At some point, there may be a new location for ticket attachments,
> requiring changing links in some github issues. That's a separate
> issue from this discussion.
>
> Trac allows WikiFormatting, while github has a more limited markup
> language. As much translation as is possible will be performed within
> comments (code blocks, at an absolute minimum).

In practice, code blocks are probably enough.

> Tickets will be migrated to a test repository before being moved to
> the numpy github repo, and there will be ample time to review the
> result before proceeding beyond that step.

Sounds good.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From bobtnur78 at gmail.com Tue Jun 12 14:49:12 2012
From: bobtnur78 at gmail.com (bob tnur)
Date: Tue, 12 Jun 2012 14:49:12 -0400
Subject: [Numpy-discussion] numpy array in networkx graph?
Message-ID: 

Can anyone give me a hint on the following code?

import network as nx
import pylab as plt

G=nx.Graph(M)  # M is numpy matrix, i.e. type(M)=numpy.ndarray
for i in xrange(len(M)):
    tt=P[i,:].sum()
    if tt==1:
        G.add_node(i,color='blue')
    elif tt==2:
        G.add_node(i,color='red')
    elif tt==3:
        G.add_node(i,color='white')
    else:
        tt==4
        G.add_node(i,color='green')
G.nodes(data=True)
T=nx.draw(G)
plt.axis('off')
plt.savefig("test.png")

I didn't get a color change; the default color is still used. Did I miss something?

My aim is to obtain something like:
find total number of w-red-red-z path
     number of w-red-red-red-z path
     number of w-red-red-red-red-z path
where w (left side of some cyclic polygon (can also be a conjugated ring)) and z (right side of it) are any of the colors except red.

Any comment is appreciated?
Thanks
Bob
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From brett.olsen at gmail.com Tue Jun 12 15:32:19 2012
From: brett.olsen at gmail.com (Brett Olsen)
Date: Tue, 12 Jun 2012 14:32:19 -0500
Subject: [Numpy-discussion] numpy array in networkx graph?
In-Reply-To: 
References: 
Message-ID: 

This seems to work:

import networkx as nx
import pylab
import numpy as N

M = N.random.random((10, 10))
G = nx.Graph(M)
node_colors = []
for i in xrange(len(M)):
    if M[i,0] < 0.5:
        node_colors.append('white')
    else:
        node_colors.append('blue')
nx.draw(G, node_color=node_colors)
pylab.show()

~Brett

On Tue, Jun 12, 2012 at 1:49 PM, bob tnur wrote:
> Can anyone give me a hint on the following code?
>
> import network as nx
> import pylab as plt
>
> G=nx.Graph(M)  # M is numpy matrix, i.e. type(M)=numpy.ndarray
> for i in xrange(len(M)):
>     tt=P[i,:].sum()
>     if tt==1:
>         G.add_node(i,color='blue')
>     elif tt==2:
>         G.add_node(i,color='red')
>     elif tt==3:
>         G.add_node(i,color='white')
>     else:
>         tt==4
>         G.add_node(i,color='green')
> G.nodes(data=True)
> T=nx.draw(G)
> plt.axis('off')
> plt.savefig("test.png")
>
> I didn't get a color change; the default color is still used. Did I miss
> something?
>
> My aim is to obtain something like:
> find total number of w-red-red-z path
>      number of w-red-red-red-z path
>      number of w-red-red-red-red-z path
> where w (left side of some cyclic polygon (can also be a conjugated ring))
> and z (right side of it) are any of the colors except red.
>
> Any comment is appreciated?
> Thanks
> Bob
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From bryanv at continuum.io Tue Jun 12 17:27:28 2012
From: bryanv at continuum.io (Bryan Van de Ven)
Date: Tue, 12 Jun 2012 16:27:28 -0500
Subject: [Numpy-discussion] Enum/Factor NEP (now with code)
Message-ID: <4FD7B440.8030500@continuum.io>

Hi all,

It has been some time, but I do have an update regarding this proposed feature. I thought it would be helpful to flesh out some parts of a possible implementation to learn what can be spelled reasonably in NumPy. Mark Wiebe helped out greatly in navigating the NumPy codebase. Here is a link to my branch with this code:

    https://github.com/bryevdv/numpy/tree/enum

and the updated NEP:

    https://github.com/bryevdv/numpy/blob/enum/doc/neps/enum.rst

Not everything in the NEP is implemented (integral levels and natural naming in particular) and some parts definitely need more fleshing out. However, things currently work basically as described in the NEP, and there is also a small set of tests that demonstrate current usage. A few things will crash python (astype especially). More tests are needed. I would appreciate as much feedback and discussion as you can provide!

Thanks,

Bryan Van de Ven

From stefan at sun.ac.za Tue Jun 12 21:48:16 2012
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 12 Jun 2012 18:48:16 -0700
Subject: [Numpy-discussion] SciPy2012 conference: Last week for early birds, poster submissions
In-Reply-To: 
References: 
Message-ID: 

Hi everyone

We're rapidly approaching SciPy2012, which takes place in Austin, Texas from July 16th to 21st. This is a reminder that the *discounted early bird registration* closes on the 18th of this month.

Also, we decided to keep the queue for *poster submissions* open until all slots are filled. So, whether you have a neat side project, a lightning talk gone rogue, or simply want to get the community talking about your latest and greatest idea--send in a poster abstract to 2012submissions at scipy.org.

See you in Austin!
St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jun 13 09:33:41 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Jun 2012 14:33:41 +0100 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: <4FD7B440.8030500@continuum.io> References: <4FD7B440.8030500@continuum.io> Message-ID: On Tue, Jun 12, 2012 at 10:27 PM, Bryan Van de Ven wrote: > Hi all, > > It has been some time, but I do have an update regarding this proposed > feature. I thought it would be helpful to flesh out some parts of a > possible implementation to learn what can be spelled reasonably in > NumPy. Mark Wiebe helped out greatly in navigating the NumPy code > codebase. Here is a link to my branch with this code; > > ? ? https://github.com/bryevdv/numpy/tree/enum > > and the updated NEP: > > ? ? https://github.com/bryevdv/numpy/blob/enum/doc/neps/enum.rst > > Not everything in the NEP is implemented (integral levels and natural > naming in particular) and some parts definitely need more fleshing out. > However, things currently work basically as described in the NEP, and > there is also a small set of tests that demonstrate current usage. A few > things will crash python (astype especially). More tests are needed. I > would appreciate as much feedback and discussion as you can provide! Hi Bryan, I skimmed over the diff: https://github.com/bryevdv/numpy/compare/master...enum It was a bit hard to read since it seems like about half the changes in that branch are datatime cleanups or something? I hope you'll separate those out -- it's much easier to review self-contained changes, and the more changes you roll together into a big lump, the more risk there is that they'll get lost all together. >From the updated NEP I actually understand the use case for "open types" now, so that's good :-). But I don't think they're actually workable, so that's bad :-(. The use case, as I understand it, is for when you want to extend the levels set on the fly as you read through a file. The problem with this is that it produces a non-deterministic level ordering, where level 0 is whatever was seen first in the file, level 1 is whatever was seen second, etc. E.g., say I have a CSV file I read in: subject,initial_skill,skill_after_training 1,LOW,HIGH 2,LOW,LOW 3,HIGH,HIGH ... With the scheme described in the NEP, my initial_skill dtype will have levels ["LOW", "HIGH"], and by skill_after_training dtype will have levels ["HIGH","LOW"], which means that their storage will be incompatible, comparisons won't work (or will have to go through some nasty convert-to-string-and-back path), etc. Another situation where this will occur is if you have multiple data files in the same format; whether or not you're able to compare the data from them will depend on the order the data happens to occur in in each file. The solution is that whenever we automagically create a set of levels from some data, and the user hasn't specified any order, we should pick an order deterministically by sorting the levels. (This is also what R does. levels(factor(c("a", "b"))) -> "a", "b". levels(factor(c("b", "a"))) -> "a", "b".) I'm inclined to say therefore that we should just drop the "open type" idea, since it adds complexity but doesn't seem to actually solve the problem it's designed for. Can you explain why you're using khash instead of PyDict? 
It seems to add a *lot* of complexity -- like it seems like you're using about as many lines of code just marshalling data into and out of the khash as I used for my old npenum.pyx prototype (not even counting all the extra work required to , and AFAICT my prototype has about the same amount of functionality as this. (Of course that's not entirely fair, because I was working in Cython... but why not work in Cython?) And you'll need to expose a Python dict interface sooner or later anyway, I'd think? I can't tell if it's worth having categorical scalar types. What value do they provide over just using scalars of the level type? Terminology: I'd like to suggest we prefer the term "categorical" for this data, rather than "factor" or "enum". Partly this is because it makes my life easier ;-): https://groups.google.com/forum/#!msg/pystatsmodels/wLX1-a5Y9fg/04HFKEu45W4J and partly because numpy has a very diverse set of users and I suspect that "categorical" will just be a more transparent name to those who aren't already familiar with the particular statistical and programming traditions that "factor" and "enum" come from. I'm disturbed to see you adding special cases to the core ufunc dispatch machinery for these things. I'm -1 on that. We should clean up the generic ufunc machinery so that it doesn't need special cases to handle adding a simple type like this. I'm also worried that I still don't see any signs that you're working with the downstream libraries that this functionality is intended to be useful for, like the various HDF5 libraries and pandas. I really don't think this functionality can be merged to numpy until we have affirmative statements from those developers that they are excited about it and will use it, and since they're busy people, it's pretty much your job to track them down and make sure that your code will solve their problems. Hope that helps -- it's exciting to see someone working on this, and you seem to be off to a good start! -N From bobtnur78 at gmail.com Wed Jun 13 11:04:23 2012 From: bobtnur78 at gmail.com (bob tnur) Date: Wed, 13 Jun 2012 11:04:23 -0400 Subject: [Numpy-discussion] numpy array in networkx graph? In-Reply-To: References: Message-ID: I have M is numpy matrix with 0's& 1's. I want to color the nodes with different colors. can anyone give me a hint on the following code? import network as nx import pylab as plt G=nx.Graph(M) # M is numpy matrix ,i.e:type(M)=numpy.ndarray for i in xrange(len(M)): tt=P[i,:].sum() if tt==1: G.add_node(i,color='blue') elif tt==2: G.add_node(i,color='red') elif tt==3: G.add_node(i,color='white') else: tt==4 G.add_node(i,color='green') G.nodes(data=True) T=nx.draw(G) plt.axis('off') plt.savefig("test.png") I didn't get color change, still the defualt color is used.Did I miss something? my aim is to obtain: something like: find total number of w-red-red-z path number of w-red-red-red-z path number of w-red-red-red-red-z path where w (left side of some cyclic polygon(can also be conjugated ring)) and z(right-side of it)are any of the colors except red. any comment is appreciated? Thanks Bob -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d.s.seljebotn at astro.uio.no Wed Jun 13 12:04:53 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 13 Jun 2012 18:04:53 +0200 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> Message-ID: <4FD8BA25.3060404@astro.uio.no> On 06/13/2012 03:33 PM, Nathaniel Smith wrote: > On Tue, Jun 12, 2012 at 10:27 PM, Bryan Van de Ven wrote: >> Hi all, >> >> It has been some time, but I do have an update regarding this proposed >> feature. I thought it would be helpful to flesh out some parts of a >> possible implementation to learn what can be spelled reasonably in >> NumPy. Mark Wiebe helped out greatly in navigating the NumPy code >> codebase. Here is a link to my branch with this code; >> >> https://github.com/bryevdv/numpy/tree/enum >> >> and the updated NEP: >> >> https://github.com/bryevdv/numpy/blob/enum/doc/neps/enum.rst >> >> Not everything in the NEP is implemented (integral levels and natural >> naming in particular) and some parts definitely need more fleshing out. >> However, things currently work basically as described in the NEP, and >> there is also a small set of tests that demonstrate current usage. A few >> things will crash python (astype especially). More tests are needed. I >> would appreciate as much feedback and discussion as you can provide! > > Hi Bryan, > > I skimmed over the diff: > https://github.com/bryevdv/numpy/compare/master...enum > It was a bit hard to read since it seems like about half the changes > in that branch are datatime cleanups or something? I hope you'll > separate those out -- it's much easier to review self-contained > changes, and the more changes you roll together into a big lump, the > more risk there is that they'll get lost all together. > > From the updated NEP I actually understand the use case for "open > types" now, so that's good :-). But I don't think they're actually > workable, so that's bad :-(. The use case, as I understand it, is for > when you want to extend the levels set on the fly as you read through > a file. The problem with this is that it produces a non-deterministic > level ordering, where level 0 is whatever was seen first in the file, > level 1 is whatever was seen second, etc. E.g., say I have a CSV file > I read in: > > subject,initial_skill,skill_after_training > 1,LOW,HIGH > 2,LOW,LOW > 3,HIGH,HIGH > ... > > With the scheme described in the NEP, my initial_skill dtype will have > levels ["LOW", "HIGH"], and by skill_after_training dtype will have > levels ["HIGH","LOW"], which means that their storage will be > incompatible, comparisons won't work (or will have to go through some > nasty convert-to-string-and-back path), etc. Another situation where > this will occur is if you have multiple data files in the same format; > whether or not you're able to compare the data from them will depend > on the order the data happens to occur in in each file. The solution > is that whenever we automagically create a set of levels from some > data, and the user hasn't specified any order, we should pick an order > deterministically by sorting the levels. (This is also what R does. > levels(factor(c("a", "b"))) -> "a", "b". levels(factor(c("b", "a"))) > -> "a", "b".) > > I'm inclined to say therefore that we should just drop the "open type" > idea, since it adds complexity but doesn't seem to actually solve the > problem it's designed for. 
If one wants to have an "open", hassle-free enum, an alternative would be to cryptographically hash the enum string. I'd trust 64 bits of hash for this purpose. The obvious disadvantage is the extra space used, but it'd be a bit more hassle-free compared to regular enums; you'd never have to fix the set of enum strings and they'd always be directly comparable across different arrays. HDF libraries etc. could compress it at the storage layer, storing the enum mapping in the metadata. Just a thought. Dag > > Can you explain why you're using khash instead of PyDict? It seems to > add a *lot* of complexity -- like it seems like you're using about as > many lines of code just marshalling data into and out of the khash as > I used for my old npenum.pyx prototype (not even counting all the > extra work required to , and AFAICT my prototype has about the same > amount of functionality as this. (Of course that's not entirely fair, > because I was working in Cython... but why not work in Cython?) And > you'll need to expose a Python dict interface sooner or later anyway, > I'd think? > > I can't tell if it's worth having categorical scalar types. What value > do they provide over just using scalars of the level type? > > Terminology: I'd like to suggest we prefer the term "categorical" for > this data, rather than "factor" or "enum". Partly this is because it > makes my life easier ;-): > https://groups.google.com/forum/#!msg/pystatsmodels/wLX1-a5Y9fg/04HFKEu45W4J > and partly because numpy has a very diverse set of users and I suspect > that "categorical" will just be a more transparent name to those who > aren't already familiar with the particular statistical and > programming traditions that "factor" and "enum" come from. > > I'm disturbed to see you adding special cases to the core ufunc > dispatch machinery for these things. I'm -1 on that. We should clean > up the generic ufunc machinery so that it doesn't need special cases > to handle adding a simple type like this. > > I'm also worried that I still don't see any signs that you're working > with the downstream libraries that this functionality is intended to > be useful for, like the various HDF5 libraries and pandas. I really > don't think this functionality can be merged to numpy until we have > affirmative statements from those developers that they are excited > about it and will use it, and since they're busy people, it's pretty > much your job to track them down and make sure that your code will > solve their problems. > > Hope that helps -- it's exciting to see someone working on this, and > you seem to be off to a good start! > > -N > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Jun 13 12:23:11 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Jun 2012 17:23:11 +0100 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: <4FD8BA25.3060404@astro.uio.no> References: <4FD7B440.8030500@continuum.io> <4FD8BA25.3060404@astro.uio.no> Message-ID: On Wed, Jun 13, 2012 at 5:04 PM, Dag Sverre Seljebotn wrote: > On 06/13/2012 03:33 PM, Nathaniel Smith wrote: >> I'm inclined to say therefore that we should just drop the "open type" >> idea, since it adds complexity but doesn't seem to actually solve the >> problem it's designed for. > > If one wants to have an "open", hassle-free enum, an alternative would > be to cryptographically hash the enum string. 
I'd trust 64 bits of hash > for this purpose. > > The obvious disadvantage is the extra space used, but it'd be a bit more > hassle-free compared to regular enums; you'd never have to fix the set > of enum strings and they'd always be directly comparable across > different arrays. HDF libraries etc. could compress it at the storage > layer, storing the enum mapping in the metadata. You'd trust 64 bits to be collision-free for all strings ever stored in numpy, eternally? I wouldn't. Anyway, if the goal is to store an arbitrary set of strings in 64 bits apiece, then there is no downside to just using an object array + interning (like pandas does now), and this *is* guaranteed to be collision free. Maybe it would be useful to have a "heap string" dtype, but that'd be something different. AFAIK all the cases where an explicit categorical type adds value over this are the ones where having an explicit set of levels is useful. Representing HDF5 enums or R factors requires a way to specify arbitrary string<->integer mappings, and there are algorithms (e.g. in charlton) that are much more efficient if they can figure out what the set of possible levels is directly without scanning the whole array. -N From bryanv at continuum.io Wed Jun 13 12:44:14 2012 From: bryanv at continuum.io (Bryan Van de Ven) Date: Wed, 13 Jun 2012 11:44:14 -0500 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> Message-ID: <4FD8C35E.1080901@continuum.io> On 6/13/12 8:33 AM, Nathaniel Smith wrote: > Hi Bryan, > > I skimmed over the diff: > https://github.com/bryevdv/numpy/compare/master...enum > It was a bit hard to read since it seems like about half the changes > in that branch are datatime cleanups or something? I hope you'll > separate those out -- it's much easier to review self-contained > changes, and the more changes you roll together into a big lump, the > more risk there is that they'll get lost all together. I'm not quite sure what happened there, my git skills are not advanced by any measure. I think the datetime changes are a much smaller fraction than fifty percent, but I will see what I can do to separate them out in the near future. > From the updated NEP I actually understand the use case for "open > types" now, so that's good :-). But I don't think they're actually > workable, so that's bad :-(. The use case, as I understand it, is for > when you want to extend the levels set on the fly as you read through > a file. The problem with this is that it produces a non-deterministic > level ordering, where level 0 is whatever was seen first in the file, > level 1 is whatever was seen second, etc. E.g., say I have a CSV file > I read in: > > subject,initial_skill,skill_after_training > 1,LOW,HIGH > 2,LOW,LOW > 3,HIGH,HIGH > ... > > With the scheme described in the NEP, my initial_skill dtype will have > levels ["LOW", "HIGH"], and by skill_after_training dtype will have > levels ["HIGH","LOW"], which means that their storage will be > incompatible, comparisons won't work (or will have to go through some I imagine users using the same open dtype object in both fields of the structure dtype used to read in the file, if both fields of the file contain the same categories. If they don't contain the same categories, they are incomparable in any case. I believe many users have this simpler use case where each field is a separate category, and they want to read them all individually, separately on the fly. For these simple cases, it would "just work". 
For your case example there would definitely be a documentation, examples, tutorials, education issue, to avoid the "gotcha" you describe. > nasty convert-to-string-and-back path), etc. Another situation where > this will occur is if you have multiple data files in the same format; > whether or not you're able to compare the data from them will depend > on the order the data happens to occur in in each file. The solution > is that whenever we automagically create a set of levels from some > data, and the user hasn't specified any order, we should pick an order > deterministically by sorting the levels. (This is also what R does. > levels(factor(c("a", "b"))) -> "a", "b". levels(factor(c("b", "a"))) > -> "a", "b".) A solution is to create the dtype object when reading in the first file, and to reuse that same dtype object when reading in subsequent files. Perhaps it's not ideal, but it does enable the work to be done. > Can you explain why you're using khash instead of PyDict? It seems to > add a *lot* of complexity -- like it seems like you're using about as > many lines of code just marshalling data into and out of the khash as > I used for my old npenum.pyx prototype (not even counting all the > extra work required to , and AFAICT my prototype has about the same > amount of functionality as this. (Of course that's not entirely fair, > because I was working in Cython... but why not work in Cython?) And > you'll need to expose a Python dict interface sooner or later anyway, > I'd think? I suppose I agree with the sentiment that the core of NumPy really ought to be less dependent on the Python C API, not more. I also think the khash API is pretty dead simple and straightforward, and the fact that it is contained in a singe header is attractive. It's also quite performant in time and space. But if others disagree strongly, all of it's uses are hidden behind the interface in leveled_dtypes.c, it could be replaced with some other mechanism easily enough. > I can't tell if it's worth having categorical scalar types. What value > do they provide over just using scalars of the level type? I'm not certain they are worthwhile either, which is why I did not spend any time on them yet. Wes has expressed a desire for very broad categorical types (even more than just scalar categories), hopefully he can chime in with his motivations. > Terminology: I'd like to suggest we prefer the term "categorical" for > this data, rather than "factor" or "enum". Partly this is because it > makes my life easier ;-): > https://groups.google.com/forum/#!msg/pystatsmodels/wLX1-a5Y9fg/04HFKEu45W4J > and partly because numpy has a very diverse set of users and I suspect > that "categorical" will just be a more transparent name to those who > aren't already familiar with the particular statistical and > programming traditions that "factor" and "enum" come from. I think I like "categorical" over "factor" but I am not sure we should ditch "enum". There are two different use cases here: I have a pile of strings (or scalars) that I want to treat as discrete things (categories), and: I have a pile of numbers that I want to give convenient or meaningful names to (enums). This latter case was the motivation for possibly adding "Natural Naming". > I'm disturbed to see you adding special cases to the core ufunc > dispatch machinery for these things. I'm -1 on that. We should clean > up the generic ufunc machinery so that it doesn't need special cases > to handle adding a simple type like this. 
This could certainly be improved, I agree. > I'm also worried that I still don't see any signs that you're working > with the downstream libraries that this functionality is intended to > be useful for, like the various HDF5 libraries and pandas. I really > don't think this functionality can be merged to numpy until we have > affirmative statements from those developers that they are excited > about it and will use it, and since they're busy people, it's pretty > much your job to track them down and make sure that your code will > solve their problems. Francesc is certainly aware of this work, and I emailed Wes earlier this week, I probably should have mentioned that, though. Hopefully they will have time to contribute their thoughts. I also imagine Travis can speak on behalf of the users he has interacted with over the last several years that have requested this feature that don't happen to follow mailing lists. Thanks, Bryan From d.s.seljebotn at astro.uio.no Wed Jun 13 12:48:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 13 Jun 2012 18:48:28 +0200 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8BA25.3060404@astro.uio.no> Message-ID: <673db4b2-c11f-4d63-8b77-e72d7eba32e7@email.android.com> Nathaniel Smith wrote: >On Wed, Jun 13, 2012 at 5:04 PM, Dag Sverre Seljebotn > wrote: >> On 06/13/2012 03:33 PM, Nathaniel Smith wrote: >>> I'm inclined to say therefore that we should just drop the "open >type" >>> idea, since it adds complexity but doesn't seem to actually solve >the >>> problem it's designed for. >> >> If one wants to have an "open", hassle-free enum, an alternative >would >> be to cryptographically hash the enum string. I'd trust 64 bits of >hash >> for this purpose. >> >> The obvious disadvantage is the extra space used, but it'd be a bit >more >> hassle-free compared to regular enums; you'd never have to fix the >set >> of enum strings and they'd always be directly comparable across >> different arrays. HDF libraries etc. could compress it at the storage >> layer, storing the enum mapping in the metadata. > >You'd trust 64 bits to be collision-free for all strings ever stored >in numpy, eternally? I wouldn't. Anyway, if the goal is to store an >arbitrary set of strings in 64 bits apiece, then there is no downside >to just using an object array + interning (like pandas does now), and >this *is* guaranteed to be collision free. Maybe it would be useful to >have a "heap string" dtype, but that'd be something different. Heh, we've been having this discussion before :-) The 'interned heap string dtype' may be something different, but it could be something that could meet the 'open enum' usecases (assuming they exist) in a better way than making enums complicated. Consider it a backup strategy if one can't put the open enum idea dead otherwise.. > >AFAIK all the cases where an explicit categorical type adds value over >this are the ones where having an explicit set of levels is useful. >Representing HDF5 enums or R factors requires a way to specify >arbitrary string<->integer mappings, and there are algorithms (e.g. in >charlton) that are much more efficient if they can figure out what the >set of possible levels is directly without scanning the whole array. For interned strings, the set of strings present could be stored in the array in principle (though I guess it would be very difficult to implement in current numpy). 
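Just to pin down what I mean by interning, a pure-Python sketch of the semantics (illustration only, nothing to do with how it would actually be laid out in an array):

_pool = {}

def intern_level(s):
    # map each distinct string to one canonical object; the pool itself
    # is then the set of levels actually observed, with no fixed ordering
    return _pool.setdefault(s, s)

a = [intern_level(s) for s in ["LOW", "HIGH", "LOW"]]
b = [intern_level(s) for s in ["HIGH", "LOW"]]

assert a[0] is b[1]            # comparisons can be identity comparisons
observed_levels = set(_pool)   # set(['HIGH', 'LOW'])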
The perfect hash schemes we've explored on the Cython list lately uses around 10-20 microseconds on my 1.8 GHz for 64-element table rehashing (worst case insertion, happens more often than with insertion in regular hash tables) and 0.5-2 nanoseconds for a lookup in L1 (which always hits on first try if the entry is in the table). Dag >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. From thouis at gmail.com Wed Jun 13 12:57:01 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Wed, 13 Jun 2012 18:57:01 +0200 Subject: [Numpy-discussion] Neighborhood iterator: way to easily check which elements have already been visited in parent iterator? Message-ID: Hello, I'm rewriting scipy.ndimage.label() using numpy's iterator API, and would like to add the ability for it to operate in-place. However, to do so, I need to limit the neighbors consulted to those that have already been processed in the parent iterator over the input and output arrays. Is there a general way to do this? If the parent iterator is constructed with the default flags (order='K' and readonly), can I assume that it will always move forward in memory? Thanks for any advice, Ray Jones From bergstrj at iro.umontreal.ca Wed Jun 13 13:42:22 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Wed, 13 Jun 2012 13:42:22 -0400 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff Message-ID: Further to the recent discussion on lazy evaluation & numba, I moved what I was doing into a new project: PyAutoDiff: https://github.com/jaberg/pyautodiff It currently works by executing CPython bytecode with a numpy-aware engine that builds a symbolic expression graph with Theano... so you can do for example: >>> import autodiff, numpy as np >>> autodiff.fmin_l_bfgs_b(lambda x: (x + 1) ** 2, [np.zeros(())]) ... and you'll see `[array(-1.0)]` printed out. In the future, I think it should be able to export the gradient-computing function as bytecode, which could then be optimized by e.g. numba or a theano bytecode front-end. For now it just compiles and runs the Theano graph that it built. It's still pretty rough (you'll see if you look at the code!) but I'm excited about it. - James From tim at cerazone.net Wed Jun 13 13:47:42 2012 From: tim at cerazone.net (Tim Cera) Date: Wed, 13 Jun 2012 13:47:42 -0400 Subject: [Numpy-discussion] Neighborhood iterator: way to easily check which elements have already been visited in parent iterator? In-Reply-To: References: Message-ID: Tried to figure out in-place calculation for the neighbor routine that I recently submitted to numpy, but got nowhere. See https://github.com/numpy/numpy/pull/303 for what I came up with. It currently makes a new array to hold the calculations. If someone does come up with something - I would be very interested. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Jun 13 13:58:50 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 13 Jun 2012 19:58:50 +0200 Subject: [Numpy-discussion] Neighborhood iterator: way to easily check which elements have already been visited in parent iterator? 
In-Reply-To: References: Message-ID: On Wed, Jun 13, 2012 at 6:57 PM, Thouis (Ray) Jones wrote: > Hello, > > I'm rewriting scipy.ndimage.label() using numpy's iterator API, I think there were some changes to the iterator API recently, so please keep in mind that scipy has to still be compatible with numpy 1.5.1 (at least for now). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jun 13 14:12:05 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 Jun 2012 19:12:05 +0100 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: <4FD8C35E.1080901@continuum.io> References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> Message-ID: On Wed, Jun 13, 2012 at 5:44 PM, Bryan Van de Ven wrote: > On 6/13/12 8:33 AM, Nathaniel Smith wrote: >> Hi Bryan, >> >> I skimmed over the diff: >> ? ? https://github.com/bryevdv/numpy/compare/master...enum >> It was a bit hard to read since it seems like about half the changes >> in that branch are datatime cleanups or something? I hope you'll >> separate those out -- it's much easier to review self-contained >> changes, and the more changes you roll together into a big lump, the >> more risk there is that they'll get lost all together. > > I'm not quite sure what happened there, my git skills are not advanced > by any measure. I think the datetime changes are a much smaller fraction > than fifty percent, but I will see what I can do to separate them out in > the near future. Looking again, it looks like a lot of it is actually because when I asked github to show me the diff between your branch and master, it showed me the diff between your branch and *your repository's* version of master. And your branch is actually based off a newer version of 'master' than you have in your repository. So, as far as git and github are concerned, all those changes that are included in your-branch's-base-master but not in your-repo's-master are new stuff that you did on your branch. Solution is just to do git push master >> ?From the updated NEP I actually understand the use case for "open >> types" now, so that's good :-). But I don't think they're actually >> workable, so that's bad :-(. The use case, as I understand it, is for >> when you want to extend the levels set on the fly as you read through >> a file. The problem with this is that it produces a non-deterministic >> level ordering, where level 0 is whatever was seen first in the file, >> level 1 is whatever was seen second, etc. E.g., say I have a CSV file >> I read in: >> >> ? ? ?subject,initial_skill,skill_after_training >> ? ? ?1,LOW,HIGH >> ? ? ?2,LOW,LOW >> ? ? ?3,HIGH,HIGH >> ? ? ?... >> >> With the scheme described in the NEP, my initial_skill dtype will have >> levels ["LOW", "HIGH"], and by skill_after_training dtype will have >> levels ["HIGH","LOW"], which means that their storage will be >> incompatible, comparisons won't work (or will have to go through some > > I imagine users using the same open dtype object in both fields of the > structure dtype used to read in the file, if both fields of the file > contain the same categories. If they don't contain the same categories, > they are incomparable in any case. I believe many users have this > simpler use case where each field is a separate category, and they want > to read them all individually, separately on the fly. ?For these simple > cases, it would "just work". 
For your case example there would > definitely be a documentation, examples, tutorials, education issue, to > avoid the "gotcha" you describe. Yes, of course we *could* write the code to implement these "open" dtypes, and then write the documentation, examples, tutorials, etc. to help people work around their limitations. Or, we could just implement np.fromfile properly, which would require no workarounds and take less code to boot. >> nasty convert-to-string-and-back path), etc. Another situation where >> this will occur is if you have multiple data files in the same format; >> whether or not you're able to compare the data from them will depend >> on the order the data happens to occur in in each file. The solution >> is that whenever we automagically create a set of levels from some >> data, and the user hasn't specified any order, we should pick an order >> deterministically by sorting the levels. (This is also what R does. >> levels(factor(c("a", "b"))) -> ?"a", "b". levels(factor(c("b", "a"))) >> -> ?"a", "b".) > > A solution is to create the dtype object when reading in the first file, > and to reuse that same dtype object when reading in subsequent files. > Perhaps it's not ideal, but it does enable the work to be done. So would a proper implementation of np.fromfile that normalized the level ordering. >> Can you explain why you're using khash instead of PyDict? It seems to >> add a *lot* of complexity -- like it seems like you're using about as >> many lines of code just marshalling data into and out of the khash as >> I used for my old npenum.pyx prototype (not even counting all the >> extra work required to , and AFAICT my prototype has about the same >> amount of functionality as this. (Of course that's not entirely fair, >> because I was working in Cython... but why not work in Cython?) And >> you'll need to expose a Python dict interface sooner or later anyway, >> I'd think? > > I suppose I agree with the sentiment that the core of NumPy really ought > to be less dependent on the Python C API, not more. I also think the > khash API is pretty dead simple and straightforward, and the fact that > it is contained in a singe header is attractive. It's also quite > performant in time and space. But if others disagree strongly, all of > it's uses are hidden behind the interface in leveled_dtypes.c, it could > be replaced with some other mechanism easily enough. I'm not at all convinced by the argument that throwing in random redundant data types into NumPy will somehow reduce our dependence on the Python C API. If you have a plan to replace *all* use of dicts in numpy with khash, then we can talk about that, I guess. But that would be a separate patch, and I don't think using PyDict in this patch would really have any effect on how difficult that separate patch was to do. PyDict also has a very simple API -- and in fact, the comparison is between the PyDict API+the khash API, versus just the PyDict API alone, since everyone working with the Python C API already has to know how that works. It's also contained in effectively zero header files, which is even more attractive than one header file. And that interface in leveled_dtypes.c is the one that I was talking about being larger than my entire categorical dtype implementation. None of this means that using it is a bad idea, of course! Maybe it has some key advantage over PyDict in terms of memory use or something, for those people who have hundreds of thousands of distinct categories in their data, I don't know. 
But all your arguments here seem to be of the form "hey, it's not *that* bad", and it seems like there must be some actual affirmative advantages it has over PyDict if it's going to be worth using. >> I can't tell if it's worth having categorical scalar types. What value >> do they provide over just using scalars of the level type? > > I'm not certain they are worthwhile either, which is why I did not spend > any time on them yet. Wes has expressed a desire for very broad > categorical types (even more than just scalar categories), hopefully he > can chime in with his motivations. > >> Terminology: I'd like to suggest we prefer the term "categorical" for >> this data, rather than "factor" or "enum". Partly this is because it >> makes my life easier ;-): >> ? ?https://groups.google.com/forum/#!msg/pystatsmodels/wLX1-a5Y9fg/04HFKEu45W4J >> and partly because numpy has a very diverse set of users and I suspect >> that "categorical" will just be a more transparent name to those who >> aren't already familiar with the particular statistical and >> programming traditions that "factor" and "enum" come from. > > I think I like "categorical" over "factor" but I am not sure we should > ditch "enum". There are two different use cases here: I have a pile of > strings (or scalars) that I want to treat as discrete things > (categories), and: I have a pile of numbers that I want to give > convenient or meaningful names to (enums). This latter case was the > motivation for possibly adding "Natural Naming". So mention the word "enum" in the documentation, so people looking for that will find the categorical data support? :-) >> I'm disturbed to see you adding special cases to the core ufunc >> dispatch machinery for these things. I'm -1 on that. We should clean >> up the generic ufunc machinery so that it doesn't need special cases >> to handle adding a simple type like this. > > This could certainly be improved, I agree. I don't want to be Mr. Grumpypants here, but I do want to make sure we're speaking the same language: what "-1" means is "I consider this a show-stopper and will oppose merging any code that does not improve on this". (Of course you also always have the option of trying to change my mind. Even Mr. Grumpypants can be swayed by logic!) >> I'm also worried that I still don't see any signs that you're working >> with the downstream libraries that this functionality is intended to >> be useful for, like the various HDF5 libraries and pandas. I really >> don't think this functionality can be merged to numpy until we have >> affirmative statements from those developers that they are excited >> about it and will use it, and since they're busy people, it's pretty >> much your job to track them down and make sure that your code will >> solve their problems. > > Francesc is certainly aware of this work, and I emailed Wes earlier this > week, I probably should have mentioned that, though. Hopefully they will > have time to contribute their thoughts. I also imagine Travis can speak > on behalf of the users he has interacted with over the last several > years that have requested this feature that don't happen to follow > mailing lists. I'm glad Francesc and Wes are aware of the work, but my point was that that isn't enough. So if I were in your position and hoping to get this code merged, I'd be trying to figure out how to get them more actively on board? 
-N From wesmckinn at gmail.com Wed Jun 13 14:54:44 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 13 Jun 2012 14:54:44 -0400 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> Message-ID: On Wed, Jun 13, 2012 at 2:12 PM, Nathaniel Smith wrote: > On Wed, Jun 13, 2012 at 5:44 PM, Bryan Van de Ven wrote: >> On 6/13/12 8:33 AM, Nathaniel Smith wrote: >>> Hi Bryan, >>> >>> I skimmed over the diff: >>> ? ? https://github.com/bryevdv/numpy/compare/master...enum >>> It was a bit hard to read since it seems like about half the changes >>> in that branch are datatime cleanups or something? I hope you'll >>> separate those out -- it's much easier to review self-contained >>> changes, and the more changes you roll together into a big lump, the >>> more risk there is that they'll get lost all together. >> >> I'm not quite sure what happened there, my git skills are not advanced >> by any measure. I think the datetime changes are a much smaller fraction >> than fifty percent, but I will see what I can do to separate them out in >> the near future. > > Looking again, it looks like a lot of it is actually because when I > asked github to show me the diff between your branch and master, it > showed me the diff between your branch and *your repository's* version > of master. And your branch is actually based off a newer version of > 'master' than you have in your repository. So, as far as git and > github are concerned, all those changes that are included in > your-branch's-base-master but not in your-repo's-master are new stuff > that you did on your branch. Solution is just to do > ?git push master > >>> ?From the updated NEP I actually understand the use case for "open >>> types" now, so that's good :-). But I don't think they're actually >>> workable, so that's bad :-(. The use case, as I understand it, is for >>> when you want to extend the levels set on the fly as you read through >>> a file. The problem with this is that it produces a non-deterministic >>> level ordering, where level 0 is whatever was seen first in the file, >>> level 1 is whatever was seen second, etc. E.g., say I have a CSV file >>> I read in: >>> >>> ? ? ?subject,initial_skill,skill_after_training >>> ? ? ?1,LOW,HIGH >>> ? ? ?2,LOW,LOW >>> ? ? ?3,HIGH,HIGH >>> ? ? ?... >>> >>> With the scheme described in the NEP, my initial_skill dtype will have >>> levels ["LOW", "HIGH"], and by skill_after_training dtype will have >>> levels ["HIGH","LOW"], which means that their storage will be >>> incompatible, comparisons won't work (or will have to go through some >> >> I imagine users using the same open dtype object in both fields of the >> structure dtype used to read in the file, if both fields of the file >> contain the same categories. If they don't contain the same categories, >> they are incomparable in any case. I believe many users have this >> simpler use case where each field is a separate category, and they want >> to read them all individually, separately on the fly. ?For these simple >> cases, it would "just work". For your case example there would >> definitely be a documentation, examples, tutorials, education issue, to >> avoid the "gotcha" you describe. > > Yes, of course we *could* write the code to implement these "open" > dtypes, and then write the documentation, examples, tutorials, etc. to > help people work around their limitations. 
Or, we could just implement > np.fromfile properly, which would require no workarounds and take less > code to boot. > >>> nasty convert-to-string-and-back path), etc. Another situation where >>> this will occur is if you have multiple data files in the same format; >>> whether or not you're able to compare the data from them will depend >>> on the order the data happens to occur in in each file. The solution >>> is that whenever we automagically create a set of levels from some >>> data, and the user hasn't specified any order, we should pick an order >>> deterministically by sorting the levels. (This is also what R does. >>> levels(factor(c("a", "b"))) -> ?"a", "b". levels(factor(c("b", "a"))) >>> -> ?"a", "b".) >> >> A solution is to create the dtype object when reading in the first file, >> and to reuse that same dtype object when reading in subsequent files. >> Perhaps it's not ideal, but it does enable the work to be done. > > So would a proper implementation of np.fromfile that normalized the > level ordering. > >>> Can you explain why you're using khash instead of PyDict? It seems to >>> add a *lot* of complexity -- like it seems like you're using about as >>> many lines of code just marshalling data into and out of the khash as >>> I used for my old npenum.pyx prototype (not even counting all the >>> extra work required to , and AFAICT my prototype has about the same >>> amount of functionality as this. (Of course that's not entirely fair, >>> because I was working in Cython... but why not work in Cython?) And >>> you'll need to expose a Python dict interface sooner or later anyway, >>> I'd think? >> >> I suppose I agree with the sentiment that the core of NumPy really ought >> to be less dependent on the Python C API, not more. I also think the >> khash API is pretty dead simple and straightforward, and the fact that >> it is contained in a singe header is attractive. ?It's also quite >> performant in time and space. But if others disagree strongly, all of >> it's uses are hidden behind the interface in leveled_dtypes.c, it could >> be replaced with some other mechanism easily enough. > > I'm not at all convinced by the argument that throwing in random > redundant data types into NumPy will somehow reduce our dependence on > the Python C API. If you have a plan to replace *all* use of dicts in > numpy with khash, then we can talk about that, I guess. But that would > be a separate patch, and I don't think using PyDict in this patch > would really have any effect on how difficult that separate patch was > to do. > > PyDict also has a very simple API -- and in fact, the comparison is > between the PyDict API+the khash API, versus just the PyDict API > alone, since everyone working with the Python C API already has to > know how that works. It's also contained in effectively zero header > files, which is even more attractive than one header file. And that > interface in leveled_dtypes.c is the one that I was talking about > being larger than my entire categorical dtype implementation. > > None of this means that using it is a bad idea, of course! Maybe it > has some key advantage over PyDict in terms of memory use or > something, for those people who have hundreds of thousands of distinct > categories in their data, I don't know. But all your arguments here > seem to be of the form "hey, it's not *that* bad", and it seems like > there must be some actual affirmative advantages it has over PyDict if > it's going to be worth using. 
> >>> I can't tell if it's worth having categorical scalar types. What value >>> do they provide over just using scalars of the level type? >> >> I'm not certain they are worthwhile either, which is why I did not spend >> any time on them yet. Wes has expressed a desire for very broad >> categorical types (even more than just scalar categories), hopefully he >> can chime in with his motivations. >> >>> Terminology: I'd like to suggest we prefer the term "categorical" for >>> this data, rather than "factor" or "enum". Partly this is because it >>> makes my life easier ;-): >>> ? ?https://groups.google.com/forum/#!msg/pystatsmodels/wLX1-a5Y9fg/04HFKEu45W4J >>> and partly because numpy has a very diverse set of users and I suspect >>> that "categorical" will just be a more transparent name to those who >>> aren't already familiar with the particular statistical and >>> programming traditions that "factor" and "enum" come from. >> >> I think I like "categorical" over "factor" but I am not sure we should >> ditch "enum". There are two different use cases here: I have a pile of >> strings (or scalars) that I want to treat as discrete things >> (categories), and: I have a pile of numbers that I want to give >> convenient or meaningful names to (enums). This latter case was the >> motivation for possibly adding "Natural Naming". > > So mention the word "enum" in the documentation, so people looking for > that will find the categorical data support? :-) > >>> I'm disturbed to see you adding special cases to the core ufunc >>> dispatch machinery for these things. I'm -1 on that. We should clean >>> up the generic ufunc machinery so that it doesn't need special cases >>> to handle adding a simple type like this. >> >> This could certainly be improved, I agree. > > I don't want to be Mr. Grumpypants here, but I do want to make sure > we're speaking the same language: what "-1" means is "I consider this > a show-stopper and will oppose merging any code that does not improve > on this". (Of course you also always have the option of trying to > change my mind. Even Mr. Grumpypants can be swayed by logic!) > >>> I'm also worried that I still don't see any signs that you're working >>> with the downstream libraries that this functionality is intended to >>> be useful for, like the various HDF5 libraries and pandas. I really >>> don't think this functionality can be merged to numpy until we have >>> affirmative statements from those developers that they are excited >>> about it and will use it, and since they're busy people, it's pretty >>> much your job to track them down and make sure that your code will >>> solve their problems. >> >> Francesc is certainly aware of this work, and I emailed Wes earlier this >> week, I probably should have mentioned that, though. Hopefully they will >> have time to contribute their thoughts. I also imagine Travis can speak >> on behalf of the users he has interacted with over the last several >> years that have requested this feature that don't happen to follow >> mailing lists. > > I'm glad Francesc and Wes are aware of the work, but my point was that > that isn't enough. So if I were in your position and hoping to get > this code merged, I'd be trying to figure out how to get them more > actively on board? > > -N > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion OK, I need to spend some time on this as it will directly impact me. Random thoughts here. 
It looks like the levels can only be strings. This is too limited for my needs. Why not support all possible NumPy dtypes? In pandas world, the levels can be any unique Index object (note, I'm going to change the name of the Factor class to Categorical before 0.8.0 final per discussion with Nathaniel): In [2]: Factor.from_array(np.random.randint(0, 10, 100)) Out[2]: Factor: array([6, 6, 4, 2, 1, 2, 3, 5, 1, 5, 2, 9, 2, 8, 8, 1, 5, 2, 6, 9, 2, 1, 3, 6, 4, 4, 8, 1, 3, 1, 7, 9, 6, 4, 8, 0, 2, 9, 6, 2, 0, 6, 7, 5, 1, 7, 8, 2, 7, 9, 7, 6, 5, 8, 3, 9, 4, 5, 0, 1, 4, 1, 8, 8, 6, 8, 0, 2, 2, 7, 0, 9, 9, 9, 4, 6, 4, 1, 8, 6, 3, 3, 2, 5, 3, 9, 9, 0, 0, 7, 2, 1, 6, 0, 7, 6, 6, 0, 7, 5]) Levels (10): array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [3]: Factor.from_array(np.random.randint(0, 10, 100)).levels Out[3]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [4]: Factor.from_array(np.random.randint(0, 10, 100)).labels Out[4]: array([0, 4, 3, 6, 6, 0, 6, 2, 2, 6, 2, 4, 7, 4, 1, 8, 1, 4, 8, 6, 4, 5, 6, 4, 8, 3, 9, 5, 3, 0, 4, 2, 7, 0, 1, 8, 0, 7, 8, 6, 5, 6, 1, 6, 2, 7, 8, 5, 7, 5, 1, 5, 0, 5, 6, 5, 5, 4, 0, 3, 3, 8, 5, 1, 1, 2, 6, 7, 7, 1, 6, 6, 4, 4, 8, 2, 1, 7, 8, 3, 7, 8, 1, 5, 0, 6, 9, 9, 9, 5, 7, 3, 1, 2, 0, 1, 5, 6, 4, 5]) The API for constructing an enum/factor/categorical array from fixed levels and an array of labels seems somewhat weak to me. A very common scenario is to need to construct a factor from an array of integers with an associated array of levels: In [13]: labels Out[13]: array([6, 7, 3, 8, 8, 6, 7, 4, 8, 4, 2, 8, 8, 4, 8, 8, 1, 9, 5, 9, 6, 5, 7, 1, 6, 5, 2, 0, 4, 4, 1, 8, 6, 0, 1, 5, 9, 6, 0, 2, 1, 5, 8, 9, 6, 8, 0, 1, 9, 5, 8, 6, 3, 4, 3, 3, 8, 7, 8, 2, 9, 8, 9, 9, 5, 0, 5, 2, 1, 0, 2, 2, 0, 5, 4, 7, 6, 5, 0, 7, 3, 5, 6, 0, 6, 2, 5, 1, 5, 6, 3, 8, 7, 9, 7, 3, 3, 0, 4, 4]) In [14]: levels Out[14]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [15]: Factor(labels, levels) Out[15]: Factor: array([6, 7, 3, 8, 8, 6, 7, 4, 8, 4, 2, 8, 8, 4, 8, 8, 1, 9, 5, 9, 6, 5, 7, 1, 6, 5, 2, 0, 4, 4, 1, 8, 6, 0, 1, 5, 9, 6, 0, 2, 1, 5, 8, 9, 6, 8, 0, 1, 9, 5, 8, 6, 3, 4, 3, 3, 8, 7, 8, 2, 9, 8, 9, 9, 5, 0, 5, 2, 1, 0, 2, 2, 0, 5, 4, 7, 6, 5, 0, 7, 3, 5, 6, 0, 6, 2, 5, 1, 5, 6, 3, 8, 7, 9, 7, 3, 3, 0, 4, 4]) Levels (10): array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) What is the story for NA values (NaL?) in a factor array? I code them as -1 in the labels, though you could use INT32_MAX or something. This is very important in the context of groupby operations. Are the levels ordered (Nathaniel brought this up already looks like)? It doesn't look like it. That is also necessary. You also need to be able to sort the levels (which is a relabeling, I have lots of code in use for this). In the context of groupby in pandas, when processing a key (array of values) to a factor to be used for aggregating some data, you have the option of returning an object that has the levels as observed in the data or sorting. Sorting can obviously be very expensive depending on the number of groups in the data (http://wesmckinney.com/blog/?p=437). 
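For concreteness, "sorting the levels is a relabeling" can be sketched in a few lines of plain NumPy; the names below are purely illustrative and are not pandas internals or the proposed NumPy API:

```
import numpy as np

# Values as observed; levels collected in order of first appearance (unsorted).
values = np.array(['pear', 'apple', 'pear', 'kiwi', 'apple'])
levels = np.array(['pear', 'apple', 'kiwi'])
lookup = dict((v, i) for i, v in enumerate(levels))
codes = np.array([lookup[v] for v in values])         # [0, 1, 0, 2, 1]

# Sorting the levels is a relabeling of the integer codes, not of the data:
order = np.argsort(levels)                            # positions of levels in sorted order
remap = np.empty(len(order), dtype=int)
remap[order] = np.arange(len(order))                  # old code -> new code
sorted_levels = levels[order]                         # ['apple', 'kiwi', 'pear']
sorted_codes = remap[codes]                           # [2, 0, 2, 1, 0]

assert (sorted_levels[sorted_codes] == values).all()  # same data, new labeling
```

The remapping itself is cheap; what can be expensive is discovering and sorting a large number of unique group keys in the first place, which is what the timings below illustrate.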
Example: from pandas import DataFrame from pandas.util.testing import rands import numpy as np df = DataFrame({'key' : [rands(10) for _ in xrange(100000)] * 10, 'data' : np.random.randn(1000000)}) In [32]: timeit df.groupby('key').sum() 1 loops, best of 3: 374 ms per loop In [33]: timeit df.groupby('key', sort=False).sum() 10 loops, best of 3: 185 ms per loop The "factorization time" for the `key` column dominates the runtime; the factor is computed once then reused if you keep the GroupBy object around: In [36]: timeit grouped.sum() 100 loops, best of 3: 6.05 ms per loop As another example of why ordered factors matter, consider a quantile cut (google for the "cut" function in R) function I wrote recently: In [40]: arr = Series(np.random.randn(1000000)) In [41]: cats = qcut(arr, [0, 0.25, 0.5, 0.75, 1]) In [43]: arr.groupby(cats).describe().unstack(0) Out[43]: (-4.85, -0.673] (-0.673, 0.00199] (0.00199, 0.677] (0.677, 4.914] count 250000.000000 250000.000000 250000.000000 250000.000000 mean -1.270623 -0.323092 0.326325 1.271519 std 0.491317 0.193254 0.193044 0.490611 min -4.839798 -0.673224 0.001992 0.677177 25% -1.533021 -0.487450 0.158736 0.888502 50% -1.150136 -0.317501 0.320352 1.150480 75% -0.887974 -0.155197 0.490456 1.534709 max -0.673224 0.001990 0.677176 4.913536 If you don't have ordered levels, then the quantiles might come out in the wrong order depending on how the strings sort or fall out of the hash table. Nathaniel: my experience (see blog posting above for a bit more) is that khash really crushes PyDict for two reasons: you can use it with primitive types and avoid boxing, and secondly you can preallocate. Its memory footprint with large hashtables is also a fraction of PyDict. The Python memory allocator is not problematic-- if you create millions of Python objects expect the RAM usage of the Python process to balloon absurdly. Anyway, this is exciting work assuming we get the API right and hitting all the use cases. On top of all this I am _very_ performance sensitive so you'll have to be pretty aggressive with benchmarking things. I have concerns about ceding control over critical functionality that I need for pandas (which has become a large and very important library these days for a lot of people), but as long as the pieces in NumPy are suitably mature and robust for me to switch to them eventually that would be great. I'll do my best to stay involved in the discussion, though I'm juggling a lot of things these days (e.g. I have the PyData book deadline approaching like a freight train). - Wes From thouis at gmail.com Wed Jun 13 15:10:37 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Wed, 13 Jun 2012 21:10:37 +0200 Subject: [Numpy-discussion] Neighborhood iterator: way to easily check which elements have already been visited in parent iterator? In-Reply-To: References: Message-ID: On Wed, Jun 13, 2012 at 7:58 PM, Ralf Gommers wrote: > I think there were some changes to the iterator API recently, so please keep > in mind that scipy has to still be compatible with numpy 1.5.1 (at least for > now). Noted. I'll rewrite using the 1.5 API, and save this for when scipy moves to a newer version of numpy. Ray Jones From cournape at gmail.com Wed Jun 13 16:12:47 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 13 Jun 2012 21:12:47 +0100 Subject: [Numpy-discussion] Neighborhood iterator: way to easily check which elements have already been visited in parent iterator? In-Reply-To: References: Message-ID: Not the neighborhood one, though. 
It would be good if this iterator had a cython wrapper, and ndimage used that, though. Le 13 juin 2012 18:59, "Ralf Gommers" a ?crit : > > > On Wed, Jun 13, 2012 at 6:57 PM, Thouis (Ray) Jones wrote: > >> Hello, >> >> I'm rewriting scipy.ndimage.label() using numpy's iterator API, > > > I think there were some changes to the iterator API recently, so please > keep in mind that scipy has to still be compatible with numpy 1.5.1 (at > least for now). > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bryanv at continuum.io Wed Jun 13 17:19:57 2012 From: bryanv at continuum.io (Bryan Van de Ven) Date: Wed, 13 Jun 2012 16:19:57 -0500 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> Message-ID: <4FD903FD.7000701@continuum.io> On 6/13/12 1:54 PM, Wes McKinney wrote: > OK, I need to spend some time on this as it will directly impact me. > Random thoughts here. > > It looks like the levels can only be strings. This is too limited for > my needs. Why not support all possible NumPy dtypes? In pandas world, > the levels can be any unique Index object (note, I'm going to change > the name of the Factor class to Categorical before 0.8.0 final per > discussion with Nathaniel): The current for-discussion prototype currently only supports strings. I had mentioned integral levels in the NEP but wanted to get more feedback first. It looks like you are using intervals as levels in things like qcut? This would add some complexity. I can think of a couple of possible approaches I will have to try a few of them out to see what would make the most sense. > The API for constructing an enum/factor/categorical array from fixed > levels and an array of labels seems somewhat weak to me. A very common > scenario is to need to construct a factor from an array of integers > with an associated array of levels: > > > In [13]: labels > Out[13]: > array([6, 7, 3, 8, 8, 6, 7, 4, 8, 4, 2, 8, 8, 4, 8, 8, 1, 9, 5, 9, 6, 5, 7, > 1, 6, 5, 2, 0, 4, 4, 1, 8, 6, 0, 1, 5, 9, 6, 0, 2, 1, 5, 8, 9, 6, 8, > 0, 1, 9, 5, 8, 6, 3, 4, 3, 3, 8, 7, 8, 2, 9, 8, 9, 9, 5, 0, 5, 2, 1, > 0, 2, 2, 0, 5, 4, 7, 6, 5, 0, 7, 3, 5, 6, 0, 6, 2, 5, 1, 5, 6, 3, 8, > 7, 9, 7, 3, 3, 0, 4, 4]) > > In [14]: levels > Out[14]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > In [15]: Factor(labels, levels) > Out[15]: > Factor: > array([6, 7, 3, 8, 8, 6, 7, 4, 8, 4, 2, 8, 8, 4, 8, 8, 1, 9, 5, 9, 6, 5, 7, > 1, 6, 5, 2, 0, 4, 4, 1, 8, 6, 0, 1, 5, 9, 6, 0, 2, 1, 5, 8, 9, 6, 8, > 0, 1, 9, 5, 8, 6, 3, 4, 3, 3, 8, 7, 8, 2, 9, 8, 9, 9, 5, 0, 5, 2, 1, > 0, 2, 2, 0, 5, 4, 7, 6, 5, 0, 7, 3, 5, 6, 0, 6, 2, 5, 1, 5, 6, 3, 8, > 7, 9, 7, 3, 3, 0, 4, 4]) > Levels (10): array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) I originally had a very similar interface in the NEP. I was persuaded by Mark that this would be redundant: In [10]: levels = np.factor(['a', 'b', 'c']) # or levels = np.factor_array(['a', 'b', 'c', 'a', 'b']).dtype In [11]: np.array(['b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'b'], levels) Out[11]: array(['b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'b'], dtype='factor({'c': 2, 'a': 0, 'b': 1})') This should also spell even more closely to your example as: labels.astype(levels) but I have not done much with casting yet, so this currently complains. 
However, would this satisfy your needs (modulo the separate question about more general integral or object levels)? > What is the story for NA values (NaL?) in a factor array? I code them > as -1 in the labels, though you could use INT32_MAX or something. This > is very important in the context of groupby operations. I am just using INT32_MIN at the moment. > Are the levels ordered (Nathaniel brought this up already looks like)? > It doesn't look like it. That is also necessary. You also need to be They currently compare based on their value: In [20]: arr = np.array(['b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'b'], np.factor({'c':0, 'b':1, 'a':2})) In [21]: arr Out[21]: array(['b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'b'], dtype='factor({'c': 0, 'a': 2, 'b': 1})') In [22]: arr.sort() In [23]: arr Out[23]: array(['c', 'c', 'b', 'b', 'b', 'a', 'a', 'a', 'a'], dtype='factor({'c': 0, 'a': 2, 'b': 1})') > able to sort the levels (which is a relabeling, I have lots of code in > use for this). In the context of groupby in pandas, when processing a > key (array of values) to a factor to be used for aggregating some > data, you have the option of returning an object that has the levels > as observed in the data or sorting. Sorting can obviously be very > expensive depending on the number of groups in the data > (http://wesmckinney.com/blog/?p=437). Example: > > from pandas import DataFrame > from pandas.util.testing import rands > import numpy as np > > df = DataFrame({'key' : [rands(10) for _ in xrange(100000)] * 10, > 'data' : np.random.randn(1000000)}) > > In [32]: timeit df.groupby('key').sum() > 1 loops, best of 3: 374 ms per loop > > In [33]: timeit df.groupby('key', sort=False).sum() > 10 loops, best of 3: 185 ms per loop > > The "factorization time" for the `key` column dominates the runtime; > the factor is computed once then reused if you keep the GroupBy object > around: > > In [36]: timeit grouped.sum() > 100 loops, best of 3: 6.05 ms per loop Just some numbers for comparison. Factorization times: In [41]: lets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] In [42]: levels = np.factor(lets) In [43]: data = [lets[int(x)] for x in np.random.randn(1000000)] In [44]: %timeit np.array(data, levels) 10 loops, best of 3: 137 ms per loop And retrieving group indicies/summing: In [8]: %timeit arr=='a' 1000 loops, best of 3: 1.52 ms per loop In [10]: vals = np.random.randn(1000000) In [20]: inds = [arr==x for x in lets] In [23]: %timeit for ind in inds: vals[ind].sum() 10 loops, best of 3: 48.3 ms per loop On my laptop your grouped.sum() took 22ms, so this is roughly off by about a factor of two. But we should compare it on the same hardware, and with the same level data types. There is no doubt room for improvement, though. It would not be too bad to add some groupby functionality on top of this. I still need to add a mechanism for accessing and iterating over the levels. 
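As a rough sketch of how little machinery a basic group-sum needs once the data are reduced to integer codes (this is plain NumPy on a hypothetical codes array, not the factor dtype API under discussion):

```
import numpy as np

lets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
codes = np.random.randint(0, len(lets), 1000000)      # integer codes into lets
vals = np.random.randn(1000000)

# One O(N) pass over the data, instead of one boolean mask per level:
sums = np.bincount(codes, weights=vals, minlength=len(lets))
counts = np.bincount(codes, minlength=len(lets))
means = sums / counts
```

A scan like this touches the data once regardless of the number of levels, which is where the gap with the per-level `arr == x` masks above comes from.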
> As another example of why ordered factors matter, consider a quantile > cut (google for the "cut" function in R) function I wrote recently: > > > In [40]: arr = Series(np.random.randn(1000000)) > > In [41]: cats = qcut(arr, [0, 0.25, 0.5, 0.75, 1]) > > In [43]: arr.groupby(cats).describe().unstack(0) > Out[43]: > (-4.85, -0.673] (-0.673, 0.00199] (0.00199, 0.677] (0.677, 4.914] > count 250000.000000 250000.000000 250000.000000 250000.000000 > mean -1.270623 -0.323092 0.326325 1.271519 > std 0.491317 0.193254 0.193044 0.490611 > min -4.839798 -0.673224 0.001992 0.677177 > 25% -1.533021 -0.487450 0.158736 0.888502 > 50% -1.150136 -0.317501 0.320352 1.150480 > 75% -0.887974 -0.155197 0.490456 1.534709 > max -0.673224 0.001990 0.677176 4.913536 > > If you don't have ordered levels, then the quantiles might come out in > the wrong order depending on how the strings sort or fall out of the > hash table. We do have ordered levels. :) Now, there's currently no way to get a list of the levels, in order, but that should be trivial to add. > Nathaniel: my experience (see blog posting above for a bit more) is > that khash really crushes PyDict for two reasons: you can use it with > primitive types and avoid boxing, and secondly you can preallocate. > Its memory footprint with large hashtables is also a fraction of > PyDict. The Python memory allocator is not problematic-- if you create > millions of Python objects expect the RAM usage of the Python process > to balloon absurdly. > > Anyway, this is exciting work assuming we get the API right and > hitting all the use cases. On top of all this I am _very_ performance > sensitive so you'll have to be pretty aggressive with benchmarking > things. I have concerns about ceding control over critical > functionality that I need for pandas (which has become a large and > very important library these days for a lot of people), but as long as > the pieces in NumPy are suitably mature and robust for me to switch to > them eventually that would be great. > > I'll do my best to stay involved in the discussion, though I'm > juggling a lot of things these days (e.g. I have the PyData book > deadline approaching like a freight train). > > - Wes > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From wesmckinn at gmail.com Wed Jun 13 18:11:03 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 13 Jun 2012 18:11:03 -0400 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: <4FD903FD.7000701@continuum.io> References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> <4FD903FD.7000701@continuum.io> Message-ID: On Wed, Jun 13, 2012 at 5:19 PM, Bryan Van de Ven wrote: > On 6/13/12 1:54 PM, Wes McKinney wrote: >> OK, I need to spend some time on this as it will directly impact me. >> Random thoughts here. >> >> It looks like the levels can only be strings. This is too limited for >> my needs. Why not support all possible NumPy dtypes? In pandas world, >> the levels can be any unique Index object (note, I'm going to change >> the name of the Factor class to Categorical before 0.8.0 final per >> discussion with Nathaniel): > > The current for-discussion prototype currently only supports strings. I > had mentioned integral levels in the NEP but wanted to get more feedback > first. It looks like you are using intervals as levels in things like > qcut? This would add some complexity. 
I can think of a couple of > possible approaches I will have to try a few of them out to see what > would make the most sense. > >> The API for constructing an enum/factor/categorical array from fixed >> levels and an array of labels seems somewhat weak to me. A very common >> scenario is to need to construct a factor from an array of integers >> with an associated array of levels: >> >> >> In [13]: labels >> Out[13]: >> array([6, 7, 3, 8, 8, 6, 7, 4, 8, 4, 2, 8, 8, 4, 8, 8, 1, 9, 5, 9, 6, 5, 7, >> ? ? ? ? 1, 6, 5, 2, 0, 4, 4, 1, 8, 6, 0, 1, 5, 9, 6, 0, 2, 1, 5, 8, 9, 6, 8, >> ? ? ? ? 0, 1, 9, 5, 8, 6, 3, 4, 3, 3, 8, 7, 8, 2, 9, 8, 9, 9, 5, 0, 5, 2, 1, >> ? ? ? ? 0, 2, 2, 0, 5, 4, 7, 6, 5, 0, 7, 3, 5, 6, 0, 6, 2, 5, 1, 5, 6, 3, 8, >> ? ? ? ? 7, 9, 7, 3, 3, 0, 4, 4]) >> >> In [14]: levels >> Out[14]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) >> >> In [15]: Factor(labels, levels) >> Out[15]: >> Factor: >> array([6, 7, 3, 8, 8, 6, 7, 4, 8, 4, 2, 8, 8, 4, 8, 8, 1, 9, 5, 9, 6, 5, 7, >> ? ? ? ? 1, 6, 5, 2, 0, 4, 4, 1, 8, 6, 0, 1, 5, 9, 6, 0, 2, 1, 5, 8, 9, 6, 8, >> ? ? ? ? 0, 1, 9, 5, 8, 6, 3, 4, 3, 3, 8, 7, 8, 2, 9, 8, 9, 9, 5, 0, 5, 2, 1, >> ? ? ? ? 0, 2, 2, 0, 5, 4, 7, 6, 5, 0, 7, 3, 5, 6, 0, 6, 2, 5, 1, 5, 6, 3, 8, >> ? ? ? ? 7, 9, 7, 3, 3, 0, 4, 4]) >> Levels (10): array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) > > I originally had a very similar interface in the NEP. I was persuaded by > Mark that this would be redundant: > > In [10]: levels = np.factor(['a', 'b', 'c']) ? # or levels = > np.factor_array(['a', 'b', 'c', 'a', 'b']).dtype > In [11]: np.array(['b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'b'], levels) > Out[11]: array(['b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'b'], > dtype='factor({'c': 2, 'a': 0, 'b': 1})') > > This should also spell even more closely to your example as: > > labels.astype(levels) > > but I have not done much with casting yet, so this currently complains. > However, would this satisfy your needs (modulo the separate question > about more general integral or object levels)? > >> What is the story for NA values (NaL?) in a factor array? I code them >> as -1 in the labels, though you could use INT32_MAX or something. This >> is very important in the context of groupby operations. > I am just using INT32_MIN at the moment. >> Are the levels ordered (Nathaniel brought this up already looks like)? >> It doesn't look like it. That is also necessary. You also need to be > > They currently compare based on their value: > > In [20]: arr = np.array(['b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'b'], > np.factor({'c':0, 'b':1, 'a':2})) > In [21]: arr > Out[21]: array(['b', 'b', 'a', 'c', 'c', 'a', 'a', 'a', 'b'], > dtype='factor({'c': 0, 'a': 2, 'b': 1})') > In [22]: arr.sort() > In [23]: arr > Out[23]: array(['c', 'c', 'b', 'b', 'b', 'a', 'a', 'a', 'a'], > dtype='factor({'c': 0, 'a': 2, 'b': 1})') > > >> able to sort the levels (which is a relabeling, I have lots of code in >> use for this). In the context of groupby in pandas, when processing a >> key (array of values) to a factor to be used for aggregating some >> data, you have the option of returning an object that has the levels >> as observed in the data or sorting. Sorting can obviously be very >> expensive depending on the number of groups in the data >> (http://wesmckinney.com/blog/?p=437). Example: >> >> from pandas import DataFrame >> from pandas.util.testing import rands >> import numpy as np >> >> df = DataFrame({'key' : [rands(10) for _ in xrange(100000)] * 10, >> ? ? ? ? ? ? 
'data' : np.random.randn(1000000)}) >> >> In [32]: timeit df.groupby('key').sum() >> 1 loops, best of 3: 374 ms per loop >> >> In [33]: timeit df.groupby('key', sort=False).sum() >> 10 loops, best of 3: 185 ms per loop >> >> The "factorization time" for the `key` column dominates the runtime; >> the factor is computed once then reused if you keep the GroupBy object >> around: >> >> In [36]: timeit grouped.sum() >> 100 loops, best of 3: 6.05 ms per loop > Just some numbers for comparison. Factorization times: > > In [41]: lets = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j'] > In [42]: levels = np.factor(lets) > In [43]: data = [lets[int(x)] for x in np.random.randn(1000000)] > In [44]: %timeit np.array(data, levels) > 10 loops, best of 3: 137 ms per loop > > And retrieving group indicies/summing: > > In [8]: %timeit arr=='a' > 1000 loops, best of 3: 1.52 ms per loop > In [10]: vals = np.random.randn(1000000) > In [20]: inds = [arr==x for x in lets] > In [23]: %timeit for ind in inds: vals[ind].sum() > 10 loops, best of 3: 48.3 ms per loop (FYI you're comparing an O(NK) algorithm with an O(N) algorithm for small K) > On my laptop your grouped.sum() took 22ms, so this is roughly off by > about a factor of two. But we should compare it on the same hardware, > and with the same level data types. There is no doubt room for > improvement, though. > > It would not be too bad to add some groupby functionality on top of > this. I still need to add a mechanism for accessing and iterating over > the levels. > >> As another example of why ordered factors matter, consider a quantile >> cut (google for the "cut" function in R) function I wrote recently: >> >> >> In [40]: arr = Series(np.random.randn(1000000)) >> >> In [41]: cats = qcut(arr, [0, 0.25, 0.5, 0.75, 1]) >> >> In [43]: arr.groupby(cats).describe().unstack(0) >> Out[43]: >> ? ? ? ? (-4.85, -0.673] ?(-0.673, 0.00199] ?(0.00199, 0.677] ?(0.677, 4.914] >> count ? ?250000.000000 ? ? ?250000.000000 ? ? 250000.000000 ? 250000.000000 >> mean ? ? ? ? -1.270623 ? ? ? ? ?-0.323092 ? ? ? ? ?0.326325 ? ? ? ?1.271519 >> std ? ? ? ? ? 0.491317 ? ? ? ? ? 0.193254 ? ? ? ? ?0.193044 ? ? ? ?0.490611 >> min ? ? ? ? ?-4.839798 ? ? ? ? ?-0.673224 ? ? ? ? ?0.001992 ? ? ? ?0.677177 >> 25% ? ? ? ? ?-1.533021 ? ? ? ? ?-0.487450 ? ? ? ? ?0.158736 ? ? ? ?0.888502 >> 50% ? ? ? ? ?-1.150136 ? ? ? ? ?-0.317501 ? ? ? ? ?0.320352 ? ? ? ?1.150480 >> 75% ? ? ? ? ?-0.887974 ? ? ? ? ?-0.155197 ? ? ? ? ?0.490456 ? ? ? ?1.534709 >> max ? ? ? ? ?-0.673224 ? ? ? ? ? 0.001990 ? ? ? ? ?0.677176 ? ? ? ?4.913536 >> >> If you don't have ordered levels, then the quantiles might come out in >> the wrong order depending on how the strings sort or fall out of the >> hash table. > We do have ordered levels. :) Now, there's currently no way to get a > list of the levels, in order, but that should be trivial to add. > >> Nathaniel: my experience (see blog posting above for a bit more) is >> that khash really crushes PyDict for two reasons: you can use it with >> primitive types and avoid boxing, and secondly you can preallocate. >> Its memory footprint with large hashtables is also a fraction of >> PyDict. The Python memory allocator is not problematic-- if you create >> millions of Python objects expect the RAM usage of the Python process >> to balloon absurdly. >> >> Anyway, this is exciting work assuming we get the API right and >> hitting all the use cases. On top of all this I am _very_ performance >> sensitive so you'll have to be pretty aggressive with benchmarking >> things. 
I have concerns about ceding control over critical >> functionality that I need for pandas (which has become a large and >> very important library these days for a lot of people), but as long as >> the pieces in NumPy are suitably mature and robust for me to switch to >> them eventually that would be great. >> >> I'll do my best to stay involved in the discussion, though I'm >> juggling a lot of things these days (e.g. I have the PyData book >> deadline approaching like a freight train). >> >> - Wes >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bryanv at continuum.io Wed Jun 13 18:06:29 2012 From: bryanv at continuum.io (Bryan Van de Ven) Date: Wed, 13 Jun 2012 17:06:29 -0500 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> Message-ID: <4FD90EE5.7030601@continuum.io> On 6/13/12 1:12 PM, Nathaniel Smith wrote: > your-branch's-base-master but not in your-repo's-master are new stuff > that you did on your branch. Solution is just to do > git push master Fixed, thanks. > Yes, of course we *could* write the code to implement these "open" > dtypes, and then write the documentation, examples, tutorials, etc. to > help people work around their limitations. Or, we could just implement > np.fromfile properly, which would require no workarounds and take less > code to boot. > > [snip] > So would a proper implementation of np.fromfile that normalized the > level ordering. My understanding of the impetus for the open type was sensitivity to the performance of having to make two passes over large text datasets. We'll have to get more feedback from users here and input from Travis, I think. > categories in their data, I don't know. But all your arguments here > seem to be of the form "hey, it's not *that* bad", and it seems like > there must be some actual affirmative advantages it has over PyDict if > it's going to be worth using. I should have been more specific about the performance concerns. Wes summed them up, though: better space efficiency, and not having to box/unbox native types. >> I think I like "categorical" over "factor" but I am not sure we should >> ditch "enum". There are two different use cases here: I have a pile of >> strings (or scalars) that I want to treat as discrete things >> (categories), and: I have a pile of numbers that I want to give >> convenient or meaningful names to (enums). This latter case was the >> motivation for possibly adding "Natural Naming". > So mention the word "enum" in the documentation, so people looking for > that will find the categorical data support? :-) I'm not sure I follow. Natural Naming seems like a great idea for people that want something like an actual enum (i.e., a way to avoid magic numbers). We could even imagine some nice with-hacks:

    colors = enum(['red', 'green', 'blue'])

    with colors:
        foo.fill(red)
        bar.fill(blue)

But natural naming will not work with many category names ("VERY HIGH") if they have spaces, etc. So, we could add a parameter to factor(...)
that turns on and off natural naming for a dtype object when it is created:

    colors = factor(['red', 'green', 'blue'], closed=True, natural_naming=False)

vs

    colors = enum(['red', 'green', 'blue'])

I think the latter is better, not only because it is more parsimonious, but because it also expresses intent better. Or we can just not have natural naming at all, if no one wants it. It hasn't been implemented yet, so that would be a snap. :) Hopefully we'll get more feedback from the list. >>> I'm disturbed to see you adding special cases to the core ufunc >>> dispatch machinery for these things. I'm -1 on that. We should clean >>> up the generic ufunc machinery so that it doesn't need special cases >>> to handle adding a simple type like this. >> This could certainly be improved, I agree. > I don't want to be Mr. Grumpypants here, but I do want to make sure > we're speaking the same language: what "-1" means is "I consider this > a show-stopper and will oppose merging any code that does not improve > on this". (Of course you also always have the option of trying to > change my mind. Even Mr. Grumpypants can be swayed by logic!) Well, a few comments. The special case in array_richcompare is due to the lack of string ufuncs. I think it would be great to have string ufuncs, but I also think it is a separate concern and outside the scope of this proposal. The special case in arraydescr_typename_get is for the same reason as the datetime special case: the need to access dtype metadata. I don't think you are really concerned about these two, though? That leaves the special case in PyUFunc_SimpleBinaryComparisonTypeResolver. As I said, I chafed a bit when I put that in. On the other hand, having dtypes with this extent of attached metadata, and potentially dynamic metadata, is unique in NumPy. It was simple and straightforward to add those few lines of code, and they do not affect performance. How invasive will the changes to the core ufunc machinery be to accommodate a type like this more generally? I took the easy way because I was new to the numpy codebase and did not feel confident mucking with the central ufunc code. However, maybe the dispatch can be accomplished easily with the casting machinery. I am not so sure, I will have to investigate. Of course, I welcome input, suggestions, and proposals on the best way to improve this. >> I'm glad Francesc and Wes are aware of the work, but my point was that >> that isn't enough. So if I were in your position and hoping to get >> this code merged, I'd be trying to figure out how to get them more >> actively on board? Is there some other way besides responding to and attempting to accommodate technical needs?
Bryan From bryanv at continuum.io Wed Jun 13 18:20:17 2012 From: bryanv at continuum.io (Bryan Van de Ven) Date: Wed, 13 Jun 2012 17:20:17 -0500 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> <4FD903FD.7000701@continuum.io> Message-ID: <4FD91221.1040602@continuum.io> On 6/13/12 5:11 PM, Wes McKinney wrote: > And retrieving group indicies/summing: > > In [8]: %timeit arr=='a' > 1000 loops, best of 3: 1.52 ms per loop > In [10]: vals = np.random.randn(1000000) > In [20]: inds = [arr==x for x in lets] > In [23]: %timeit for ind in inds: vals[ind].sum() > 10 loops, best of 3: 48.3 ms per loop > (FYI you're comparing an O(NK) algorithm with an O(N) algorithm for small K) I am not familiar with the details of your groupby implementation (evidently!), consider me appropriately chastised. Bryan From nic399c at sbcglobal.net Wed Jun 13 19:02:06 2012 From: nic399c at sbcglobal.net (Nicholas Carl) Date: Wed, 13 Jun 2012 16:02:06 -0700 (PDT) Subject: [Numpy-discussion] multiarray.so: undefined symbol: _Py_ascii_whitespace Message-ID: <1339628526.618.YahooMailClassic@web81708.mail.mud.yahoo.com> Hello, I installed Numpy 1.5.1 with Python 2.7.2 and when I try to test my QIIME install it gives me the following error:

/apps/qiime-1.5/dependencies/lib/python2.7/site-packages/numpy/core/multiarray.so: undefined symbol: _Py_ascii_whitespace

ldd for that file gives:

        libm.so.6 => /lib64/libm.so.6 (0x00002b664c3a4000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b664c627000)
        libc.so.6 => /lib64/libc.so.6 (0x00002b664c842000)
        /lib64/ld-linux-x86-64.so.2 (0x0000003bae400000)

I saw something similar here with no reply: http://article.gmane.org/gmane.comp.python.matplotlib.general/26025/match=multiarray+so+_py_ascii_whitespace Any ideas? Thanks! Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From bobtnur78 at gmail.com Wed Jun 13 21:55:42 2012 From: bobtnur78 at gmail.com (bob tnur) Date: Wed, 13 Jun 2012 21:55:42 -0400 Subject: [Numpy-discussion] consecutive node sequence and pathlength of graph Message-ID: Let's say I have a conjugated cyclic polygon whose nodes are given by the list: list_p = [a,b,c,d,e,f,g,a,a,b,d,d,d,d,d,c,c,e,e,a,d,d,g]. Let X and Y be any elements of list_p except d, and let Z be an element of list_p whose value is only d, i.e. Z = d. Now, I want to compute the number of paths with the consecutive node sequence X-Z-Y, for example: a-d-d-e, a-d-d-a, a-d-d-d-g, etc. The consecutive path length of Z can be 1, 2, 3, 4 or 5. To generalize: find the total number of paths with a consecutive X-Z-Y sequence for a path length of Z within range(1, 6), i.e.

(i) X-Z-Y for len(Z)=1
(ii) X-Z-Y for len(Z)=2
(iii) X-Z-Y for len(Z)=3
(iv) X-Z-Y for len(Z)=4
(v) X-Z-Y for len(Z)=5

Is there any easy way to program this using Python networkx or igraph? Or does numpy have some module that could simplify this problem? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olivier.grisel at ensta.org Thu Jun 14 04:00:07 2012 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 14 Jun 2012 10:00:07 +0200 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: 2012/6/13 James Bergstra : > Further to the recent discussion on lazy evaluation & numba, I moved > what I was doing into a new project: > > PyAutoDiff: > https://github.com/jaberg/pyautodiff > > It currently works by executing CPython bytecode with a numpy-aware > engine that builds a symbolic expression graph with Theano... so you > can do for example: > >>>> import autodiff, numpy as np >>>> autodiff.fmin_l_bfgs_b(lambda x: (x + 1) ** 2, [np.zeros(())]) > > ... and you'll see `[array(-1.0)]` printed out. > > In the future, I think it should be able to export the > gradient-computing function as bytecode, which could then be optimized > by e.g. numba or a theano bytecode front-end. For now it just compiles > and runs the Theano graph that it built. > > It's still pretty rough (you'll see if you look at the code!) but I'm > excited about it. Very interesting. Would it be possible to use bytecode introspection to printout the compute and display a symbolic representation of an arbitrary python + numpy expression? E.g. something along the lines of: >>> g = autodiff.gradient(lambda x: (x + 1) ** 2, [np.zeros(())]) >>> print g f(x) = 2 * x + 2 >>> g(np.arrange(3)) array[2, 4, 6] -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From d.s.seljebotn at astro.uio.no Thu Jun 14 06:12:53 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 14 Jun 2012 12:12:53 +0200 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: <4FD90EE5.7030601@continuum.io> References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> <4FD90EE5.7030601@continuum.io> Message-ID: <4FD9B925.9000302@astro.uio.no> On 06/14/2012 12:06 AM, Bryan Van de Ven wrote: > On 6/13/12 1:12 PM, Nathaniel Smith wrote: >> your-branch's-base-master but not in your-repo's-master are new stuff >> that you did on your branch. Solution is just to do >> git push master > > Fixed, thanks. > >> Yes, of course we *could* write the code to implement these "open" >> dtypes, and then write the documentation, examples, tutorials, etc. to >> help people work around their limitations. Or, we could just implement >> np.fromfile properly, which would require no workarounds and take less >> code to boot. >> >> [snip] >> So would a proper implementation of np.fromfile that normalized the >> level ordering. > > My understanding of the impetus for the open type was sensitivity to the > performance of having to make two passes over large text datasets. We'll > have to get more feedback from users here and input from Travis, I think. Can't you just build up the file using uint8, collecting enum values in a separate dict, and then recast the array with the final enum in the end? Or, recast the array with a new enum type every time one wants to add an enum value? (Similar to how you append to a tuple...) (Yes, normalizing level ordering requires another pass through the parsed data array, but that's unavoidable and rather orthogonal to whether one has an open enum dtype API or not.) A mutable dtype gives me the creeps. dtypes currently implements __hash__ and __eq__ and can be used as dict keys, which I think is very valuable. Making them sometimes mutable would cause a confusing situations. 
There are cases for mutable objects that become immutable, but it should be very well motivated as it makes for a much more confusing API... Dag From francesc at continuum.io Thu Jun 14 06:48:36 2012 From: francesc at continuum.io (Francesc Alted) Date: Thu, 14 Jun 2012 12:48:36 +0200 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> Message-ID: <4FD9C184.60809@continuum.io> On 6/13/12 8:12 PM, Nathaniel Smith wrote: >>> I'm also worried that I still don't see any signs that you're working >>> with the downstream libraries that this functionality is intended to >>> be useful for, like the various HDF5 libraries and pandas. I really >>> don't think this functionality can be merged to numpy until we have >>> affirmative statements from those developers that they are excited >>> about it and will use it, and since they're busy people, it's pretty >>> much your job to track them down and make sure that your code will >>> solve their problems. >> Francesc is certainly aware of this work, and I emailed Wes earlier this >> week, I probably should have mentioned that, though. Hopefully they will >> have time to contribute their thoughts. I also imagine Travis can speak >> on behalf of the users he has interacted with over the last several >> years that have requested this feature that don't happen to follow >> mailing lists. > I'm glad Francesc and Wes are aware of the work, but my point was that > that isn't enough. So if I were in your position and hoping to get > this code merged, I'd be trying to figure out how to get them more > actively on board? Sorry to chime in late. Yes, I am aware of the improvements that Bryan (and Mark) are proposing. My position here is that I'm very open to this (at least from a functional point of view; I have to recognize that I have not had a look into the code). The current situation for the HDF5 wrappers (at least PyTables ones) is that, due to the lack of support of enums in NumPy itself, we had to come with a specific solution for this. Our approach was pretty simple: basically providing an exhaustive set or list of possible, named values for different integers. And although I'm not familiar with the implementation details (it was Ivan Vilata who implemented this part), I think we used an internal dictionary for doing the translation while PyTables is presenting the enums to the user. Bryan is implementing a much more complete (and probably more efficient) support for enums in NumPy. As this is new functionality, and PyTables does not trust on it, there is not an immediate danger (i.e. a backward incompatibility) on introducing the new enums in NumPy. But they could be used for future PyTables versions (and other HDF5 wrappers), which is a good thing indeed. My 2 cents, -- Francesc Alted From thouis at gmail.com Thu Jun 14 07:57:41 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Thu, 14 Jun 2012 13:57:41 +0200 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> Message-ID: On Wed, Jun 13, 2012 at 8:54 PM, Wes McKinney wrote: > Nathaniel: my experience (see blog posting above for a bit more) is > that khash really crushes PyDict for two reasons: you can use it with > primitive types and avoid boxing, and secondly you can preallocate. > Its memory footprint with large hashtables is also a fraction of > PyDict. 
The Python memory allocator is not problematic-- if you create > millions of Python objects expect the RAM usage of the Python process > to balloon absurdly. The other big reason to consider allowing khash (or some other hash implementation) within numpy is that you can use it without the GIL. From bergstrj at iro.umontreal.ca Thu Jun 14 09:43:53 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Thu, 14 Jun 2012 09:43:53 -0400 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 4:00 AM, Olivier Grisel wrote: > 2012/6/13 James Bergstra : >> Further to the recent discussion on lazy evaluation & numba, I moved >> what I was doing into a new project: >> >> PyAutoDiff: >> https://github.com/jaberg/pyautodiff >> >> It currently works by executing CPython bytecode with a numpy-aware >> engine that builds a symbolic expression graph with Theano... so you >> can do for example: >> >>>>> import autodiff, numpy as np >>>>> autodiff.fmin_l_bfgs_b(lambda x: (x + 1) ** 2, [np.zeros(())]) >> >> ... and you'll see `[array(-1.0)]` printed out. >> >> In the future, I think it should be able to export the >> gradient-computing function as bytecode, which could then be optimized >> by e.g. numba or a theano bytecode front-end. For now it just compiles >> and runs the Theano graph that it built. >> >> It's still pretty rough (you'll see if you look at the code!) but I'm >> excited about it. > > Very interesting. Would it be possible to use bytecode introspection > to printout the compute and display a symbolic representation of an > arbitrary python + numpy expression? > > E.g. something along the lines of: > >>>> g = autodiff.gradient(lambda x: (x + 1) ** 2, [np.zeros(())]) >>>> print g > f(x) = 2 * x + 2 >>>> g(np.arrange(3)) > array[2, 4, 6] > > -- > Olivier > http://twitter.com/ogrisel - http://github.com/ogrisel So... almost? I just hacked this gradient function to see what theano could print out, and the first thing that happened (after my own mistakes were sorted out) was an error because the lambda expression was defined to work on a 0-d array, but then you evaluated g on a vector. Was this part of the test? If so, I'm not sure I think it's a good idea, I'm assuming it was a cut-and-paste oversight and moving on.... I settled on (https://github.com/jaberg/pyautodiff/blob/master/autodiff/tests/test_gradient.py) ``` import numpy as np from autodiff import Gradient def test_basic(): g = Gradient(lambda x: ((x + 1) ** 2).sum(), [np.zeros(3)]) print g print g(np.arange(3)) ``` The output is ... well... ugly but correct: Elemwise{Composite{[mul(i0, add(i1, i2))]}}(TensorConstant{(1,) of 2.0}, TensorConstant{(1,) of 1.0}, ) [array([ 2., 4., 6.])] So with some effort on pretty-printing I'm pretty confident that this could work, at least for simple examples. Pretty-printing is always a challenge for non-trivial examples. One option might be to convert the internal symbolic graph to sympy? - James -- http://www-etud.iro.umontreal.ca/~bergstrj From cournape at gmail.com Thu Jun 14 09:58:51 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 14 Jun 2012 14:58:51 +0100 Subject: [Numpy-discussion] [ANN] Bento 0.1.0 Message-ID: Hi, I am pleased to announce a new release of bento, a packaging solution for python which aims at reproducibility, extensibility and simplicity. The main features of this 0.1.0 release are: - new commands register_pypi and upload_pypi to register a package to pypi and upload tarballs to it. 
- add sphinx command to build a package documentation if it uses sphinx. - add tweak_library/tweak_extension functions to build contexts to simplify simple builder customization (e.g. include_dirs, defines, etc...) - waf backend: cython tool automatically loaded if cython files are detected in sources - UseBackends feature: allows to declare which build backend to use when building C extensions in the bento.info file directly - add --use-distutils-flags configure option to force using flags from distutils (disabled by default). - add --disable-autoconfigure build option to bypass configure for fast partial rebuilds. This is not reliable depending on how the environment is changed, so one should only use this during development. - add register_metadata API to register new metadata to be filled in MetaTemplateFile Bento source code can be found on github: https://github.com/cournape/Bento Bento documentation is there as well: https://cournape.github.com/Bento regards, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Thu Jun 14 10:42:53 2012 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 14 Jun 2012 16:42:53 +0200 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: 2012/6/14 James Bergstra : > On Thu, Jun 14, 2012 at 4:00 AM, Olivier Grisel > wrote: >> 2012/6/13 James Bergstra : >>> Further to the recent discussion on lazy evaluation & numba, I moved >>> what I was doing into a new project: >>> >>> PyAutoDiff: >>> https://github.com/jaberg/pyautodiff >>> >>> It currently works by executing CPython bytecode with a numpy-aware >>> engine that builds a symbolic expression graph with Theano... so you >>> can do for example: >>> >>>>>> import autodiff, numpy as np >>>>>> autodiff.fmin_l_bfgs_b(lambda x: (x + 1) ** 2, [np.zeros(())]) >>> >>> ... and you'll see `[array(-1.0)]` printed out. >>> >>> In the future, I think it should be able to export the >>> gradient-computing function as bytecode, which could then be optimized >>> by e.g. numba or a theano bytecode front-end. For now it just compiles >>> and runs the Theano graph that it built. >>> >>> It's still pretty rough (you'll see if you look at the code!) but I'm >>> excited about it. >> >> Very interesting. Would it be possible to use bytecode introspection >> to printout the compute and display a symbolic representation of an >> arbitrary python + numpy expression? >> >> E.g. something along the lines of: >> >>>>> g = autodiff.gradient(lambda x: (x + 1) ** 2, [np.zeros(())]) >>>>> print g >> f(x) = 2 * x + 2 >>>>> g(np.arrange(3)) >> array[2, 4, 6] >> >> -- >> Olivier >> http://twitter.com/ogrisel - http://github.com/ogrisel > > So... almost? > > I just hacked this gradient function to see what theano could print > out, and the first thing that happened (after my own mistakes were > sorted out) was an error because the lambda expression was defined to > work on a 0-d array, but then you evaluated g on a vector. Was this > part of the test? If so, I'm not sure I think it's a good idea, I'm > assuming it was a cut-and-paste oversight and moving on.... Indeed, my bad. I wrote that email in a hurry while waiting in line for boarding in a plane while still using the airport wifi... > I settled on (https://github.com/jaberg/pyautodiff/blob/master/autodiff/tests/test_gradient.py) > ``` > import numpy as np > from autodiff import Gradient > > def test_basic(): > ? 
?g = Gradient(lambda x: ((x + 1) ** 2).sum(), [np.zeros(3)]) > ? ?print g > ? ?print g(np.arange(3)) > ``` > > The output is ... well... ugly but correct: > Elemwise{Composite{[mul(i0, add(i1, i2))]}}(TensorConstant{(1,) of > 2.0}, TensorConstant{(1,) of 1.0}, ) > [array([ 2., ?4., ?6.])] Indeed it's a bit hard to parse by a human :) > So with some effort on pretty-printing I'm pretty confident that this > could work, at least for simple examples. Pretty-printing is always a > challenge for non-trivial examples. ?One option might be to convert > the internal symbolic graph to sympy? Indeed that would be great as sympy already has already excellent math expression rendering. An alternative would be to output mathml or something similar that could be understood by the mathjax rendering module of the IPython notebook. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From njs at pobox.com Thu Jun 14 11:01:53 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 14 Jun 2012 16:01:53 +0100 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 3:42 PM, Olivier Grisel wrote: > 2012/6/14 James Bergstra : >> On Thu, Jun 14, 2012 at 4:00 AM, Olivier Grisel >> wrote: >>> 2012/6/13 James Bergstra : >>>> Further to the recent discussion on lazy evaluation & numba, I moved >>>> what I was doing into a new project: >>>> >>>> PyAutoDiff: >>>> https://github.com/jaberg/pyautodiff >>>> >>>> It currently works by executing CPython bytecode with a numpy-aware >>>> engine that builds a symbolic expression graph with Theano... so you >>>> can do for example: >>>> >>>>>>> import autodiff, numpy as np >>>>>>> autodiff.fmin_l_bfgs_b(lambda x: (x + 1) ** 2, [np.zeros(())]) >>>> >>>> ... and you'll see `[array(-1.0)]` printed out. >>>> >>>> In the future, I think it should be able to export the >>>> gradient-computing function as bytecode, which could then be optimized >>>> by e.g. numba or a theano bytecode front-end. For now it just compiles >>>> and runs the Theano graph that it built. >>>> >>>> It's still pretty rough (you'll see if you look at the code!) but I'm >>>> excited about it. >>> >>> Very interesting. Would it be possible to use bytecode introspection >>> to printout the compute and display a symbolic representation of an >>> arbitrary python + numpy expression? >>> >>> E.g. something along the lines of: >>> >>>>>> g = autodiff.gradient(lambda x: (x + 1) ** 2, [np.zeros(())]) >>>>>> print g >>> f(x) = 2 * x + 2 >>>>>> g(np.arrange(3)) >>> array[2, 4, 6] >>> >>> -- >>> Olivier >>> http://twitter.com/ogrisel - http://github.com/ogrisel >> >> So... almost? >> >> I just hacked this gradient function to see what theano could print >> out, and the first thing that happened (after my own mistakes were >> sorted out) was an error because the lambda expression was defined to >> work on a 0-d array, but then you evaluated g on a vector. Was this >> part of the test? If so, I'm not sure I think it's a good idea, I'm >> assuming it was a cut-and-paste oversight and moving on.... > > Indeed, my bad. I wrote that email in a hurry while waiting in line > for boarding in a plane while still using the airport wifi... > >> I settled on (https://github.com/jaberg/pyautodiff/blob/master/autodiff/tests/test_gradient.py) >> ``` >> import numpy as np >> from autodiff import Gradient >> >> def test_basic(): >> ? ?g = Gradient(lambda x: ((x + 1) ** 2).sum(), [np.zeros(3)]) >> ? ?print g >> ? 
?print g(np.arange(3)) >> ``` >> >> The output is ... well... ugly but correct: >> Elemwise{Composite{[mul(i0, add(i1, i2))]}}(TensorConstant{(1,) of >> 2.0}, TensorConstant{(1,) of 1.0}, ) >> [array([ 2., ?4., ?6.])] > > Indeed it's a bit hard to parse by a human :) > >> So with some effort on pretty-printing I'm pretty confident that this >> could work, at least for simple examples. Pretty-printing is always a >> challenge for non-trivial examples. ?One option might be to convert >> the internal symbolic graph to sympy? > > Indeed that would be great as sympy already has already excellent math > expression rendering. > > An alternative would be to output mathml or something similar that > could be understood by the mathjax rendering module of the IPython > notebook. I'd find it quite useful if it could spit out the derivative as Python code that I could check and integrate into my source. I often have a particular function that I need to optimize in many different situations, but would rather not pull in a whole (complex and perhaps fragile) bytecode introspection library just to repeatedly recompute the same function on every run... -N From njs at pobox.com Thu Jun 14 12:17:30 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 14 Jun 2012 17:17:30 +0100 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch In-Reply-To: References: Message-ID: On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith wrote: > Just submitted this pull request for discussion: > ?https://github.com/numpy/numpy/pull/297 > > As per earlier discussion on the list, this PR attempts to remove > exactly and only the maskna-related code from numpy mainline: > ?http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html > > The suggestion is that we merge this to master for the 1.7 release, > and immediately "git revert" it on a branch so that it can be modified > further without blocking the release. > > The first patch does the actual maskna removal; the second and third > rearrange things so that PyArray_ReduceWrapper does not end up in the > public API, for reasons described therein. > > All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit > Ubuntu. The docs also appear to build. Before I re-based this I also > tested against Scipy, matplotlib, and pandas, and all were fine. While it's tempting to think that the lack of response to this email/PR indicates that everyone now agrees with me about how to proceed with the NA work, I'm for some reason unconvinced... Any objections to merging this? -N From cournape at gmail.com Thu Jun 14 12:20:19 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 14 Jun 2012 17:20:19 +0100 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith wrote: > On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith wrote: > > Just submitted this pull request for discussion: > > https://github.com/numpy/numpy/pull/297 > > > > As per earlier discussion on the list, this PR attempts to remove > > exactly and only the maskna-related code from numpy mainline: > > http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html > > > > The suggestion is that we merge this to master for the 1.7 release, > > and immediately "git revert" it on a branch so that it can be modified > > further without blocking the release. 
> > > > The first patch does the actual maskna removal; the second and third > > rearrange things so that PyArray_ReduceWrapper does not end up in the > > public API, for reasons described therein. > > > > All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit > > Ubuntu. The docs also appear to build. Before I re-based this I also > > tested against Scipy, matplotlib, and pandas, and all were fine. > > While it's tempting to think that the lack of response to this > email/PR indicates that everyone now agrees with me about how to > proceed with the NA work, I'm for some reason unconvinced... > > Any objections to merging this? > No objection, but could you wait for this WE ? I am in the middle of setting up a buildbot for windows for numpy (for both mingw and MSVC compilers), and that would be a good way to test it. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jun 14 14:21:06 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 14 Jun 2012 12:21:06 -0600 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 10:17 AM, Nathaniel Smith wrote: > On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith wrote: > > Just submitted this pull request for discussion: > > https://github.com/numpy/numpy/pull/297 > > > > As per earlier discussion on the list, this PR attempts to remove > > exactly and only the maskna-related code from numpy mainline: > > http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html > > > > The suggestion is that we merge this to master for the 1.7 release, > > and immediately "git revert" it on a branch so that it can be modified > > further without blocking the release. > > > > The first patch does the actual maskna removal; the second and third > > rearrange things so that PyArray_ReduceWrapper does not end up in the > > public API, for reasons described therein. > > > > All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit > > Ubuntu. The docs also appear to build. Before I re-based this I also > > tested against Scipy, matplotlib, and pandas, and all were fine. > > While it's tempting to think that the lack of response to this > email/PR indicates that everyone now agrees with me about how to > proceed with the NA work, I'm for some reason unconvinced... > > Any objections to merging this? > > Travis needs to sign off on this one. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Thu Jun 14 14:32:09 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 14 Jun 2012 19:32:09 +0100 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 5:20 PM, David Cournapeau wrote: > > > > On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith wrote: >> >> On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith wrote: >> > Just submitted this pull request for discussion: >> > ?https://github.com/numpy/numpy/pull/297 >> > >> > As per earlier discussion on the list, this PR attempts to remove >> > exactly and only the maskna-related code from numpy mainline: >> > ?http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html >> > >> > The suggestion is that we merge this to master for the 1.7 release, >> > and immediately "git revert" it on a branch so that it can be modified >> > further without blocking the release. >> > >> > The first patch does the actual maskna removal; the second and third >> > rearrange things so that PyArray_ReduceWrapper does not end up in the >> > public API, for reasons described therein. >> > >> > All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit >> > Ubuntu. The docs also appear to build. Before I re-based this I also >> > tested against Scipy, matplotlib, and pandas, and all were fine. >> >> While it's tempting to think that the lack of response to this >> email/PR indicates that everyone now agrees with me about how to >> proceed with the NA work, I'm for some reason unconvinced... >> >> Any objections to merging this? > > > No objection, but could you wait for this WE ? I am in the middle of > setting up a buildbot for windows for numpy (for both mingw and MSVC > compilers), and that would be a good way to test it. Sure, I doubt it would go in before then anyway. IME you'll need more than one test commit to get a buildbot going... a useful trick I learned here: https://github.com/numpy/numpy/pull/292 is to point the buildbot at your private clone, and then you can do test commits to your heart's content. -N From bergstrj at iro.umontreal.ca Thu Jun 14 14:53:43 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Thu, 14 Jun 2012 14:53:43 -0400 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 11:01 AM, Nathaniel Smith wrote: >> Indeed that would be great as sympy already has already excellent math >> expression rendering. >> >> An alternative would be to output mathml or something similar that >> could be understood by the mathjax rendering module of the IPython >> notebook. > > I'd find it quite useful if it could spit out the derivative as Python > code that I could check and integrate into my source. I often have a > particular function that I need to optimize in many different > situations, but would rather not pull in a whole (complex and perhaps > fragile) bytecode introspection library just to repeatedly recompute > the same function on every run... > > -N I was hoping to get by with bytecode-> bytecode interface, are there bytecode -> source tools that could help here? Otherwise it might be possible to appeal to the symbolic intermediate representation to produce more legible source. With regards to "pulling in a whole bytecode introspection library" I don't really see what you mean. 
If the issue is that you want some way to verify that the output function is actually computing the right thing, then I hear you - that's an issue. If the issue that autodiff itself is slow, then I'd like to hear more about the application, because in minimization you usually have to call the function many times (hundreds) so the autodiff overhead should be relatively small (I'm not counting Theano's function compilation time here, which still can be significant... but that's a separate concern.) - James -- http://www-etud.iro.umontreal.ca/~bergstrj From njs at pobox.com Thu Jun 14 15:38:30 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 14 Jun 2012 20:38:30 +0100 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 7:53 PM, James Bergstra wrote: > On Thu, Jun 14, 2012 at 11:01 AM, Nathaniel Smith wrote: > >>> Indeed that would be great as sympy already has already excellent math >>> expression rendering. >>> >>> An alternative would be to output mathml or something similar that >>> could be understood by the mathjax rendering module of the IPython >>> notebook. >> >> I'd find it quite useful if it could spit out the derivative as Python >> code that I could check and integrate into my source. I often have a >> particular function that I need to optimize in many different >> situations, but would rather not pull in a whole (complex and perhaps >> fragile) bytecode introspection library just to repeatedly recompute >> the same function on every run... >> >> -N > > I was hoping to get by with bytecode-> bytecode interface, are there > bytecode -> source tools that could help here? Not that I know of -- you might try googling "python reverse engineer" or similar. Mostly people treat bytecode as the internal intermediate format it is. I'm sort of confused at why people are suddenly excited about using (some particular CPython release's version of) bytecode as an input format when both the ast module and Cython are perfectly capable of parsing real Python source into a nice abstract format, but you all seem to be having fun so hey. > Otherwise it might be possible to appeal to the symbolic intermediate > representation to produce more legible source. > > With regards to "pulling in a whole bytecode introspection library" I > don't really see what you mean. If the issue is that you want some way > to verify that the output function is actually computing the right > thing, then I hear you - that's an issue. If the issue that autodiff > itself is slow, then I'd like to hear more about the application, > because in minimization you usually have to call the function many > times (hundreds) so the autodiff overhead should be relatively small > (I'm not counting Theano's function compilation time here, which still > can be significant... but that's a separate concern.) For example, I wrote a library routine for doing log-linear regression. Doing this required computing the derivative of the likelihood function, which was a huge nitpicky hassle; took me a few hours to work out and debug. But it's still just 10 lines of Python code that I needed to figure out once and they're done forever, now. I'd have been perfectly happy if I could have gotten those ten lines by asking a random unreleased library I pulled off github, which depended on heavy libraries like Theano and relied on a mostly untested emulator for some particular version of the CPython VM. 
But I'd be less happy to ask everyone who uses my code to install that library as well, just so I could avoid having to spend a few hours doing math. This isn't a criticism or your library or anything, it's just that I'm always going to be reluctant to rely on an automatic differentiation tool that takes arbitrary code as input, because it almost certainly cannot be made fully robust. So it'd be nice to have the option to stick a human in the loop. -N From srean.list at gmail.com Thu Jun 14 16:22:40 2012 From: srean.list at gmail.com (srean) Date: Thu, 14 Jun 2012 15:22:40 -0500 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: > > For example, I wrote a library routine for doing log-linear > regression. Doing this required computing the derivative of the > likelihood function, which was a huge nitpicky hassle; took me a few > hours to work out and debug. But it's still just 10 lines of Python > code that I needed to figure out once and they're done forever, now. > I'd have been perfectly happy if I could have gotten those ten lines > by asking a random unreleased library I pulled off github, which > depended on heavy libraries like Theano and relied on a mostly > untested emulator for some particular version of the CPython VM. But > I'd be less happy to ask everyone who uses my code to install that > library as well, just so I could avoid having to spend a few hours > doing math. This isn't a criticism or your library or anything, it's > just that I'm always going to be reluctant to rely on an automatic > differentiation tool that takes arbitrary code as input, because it > almost certainly cannot be made fully robust. So it'd be nice to have > the option to stick a human in the loop. Log-linears are by definition too simple a model to appreciate auto-differentiation. Try computing the Hessian by hand on a modestly sized multilayer neural network and you will start seeing the advantages. Or say computing the Hessian of a large graphical model. But I do have my own reservations about auto-diff. Until we have the smart enough compiler that does common subexpression elimination, and in fact even then, hand written differentiation code will often turn out to be more efficient. Terms cancel out (subtraction or division), terms factorize, terms can be arranged into an efficient Horner's scheme. It will take a very smart symbolic manipulation of the parse tree to get all that. So places where I really need to optimize the derivative code, I would still do it by hand and delegate it to an AD system when the size gets unwieldy. In theory a good compromise is to let the AD churn out the code and then hand optimize it. But here readable output indeed does help. As far as correctness of the computed derivative is concerned, computing the dot product between the gradient of a function and the secant computed numerically from the function does guard against gross errors. If i remember correctly the scipy module on optimization already has a function to do such sanity checks. Of course it cannot guarantee correctness, but usually goes a long way. 
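For concreteness, a minimal sketch of that kind of check, assuming the scipy.optimize helpers check_grad and approx_fprime are the ones meant here (the objective f and the candidate gradient grad_f below are only placeholders):

import numpy as np
from scipy.optimize import check_grad, approx_fprime

def f(x):
    # placeholder objective, f(x) = sum_i x_i**2
    return np.dot(x, x)

def grad_f(x):
    # hand-written (or autodiff-generated) gradient to be validated
    return 2.0 * x

x0 = np.random.normal(size=5)
# 2-norm of the difference between grad_f and a finite-difference gradient
err = check_grad(f, grad_f, x0)
assert err < 1e-5
# the same idea by hand, with an explicit forward-difference step
fd_grad = approx_fprime(x0, f, 1.5e-8)
assert np.allclose(fd_grad, grad_f(x0), atol=1e-5)

It cannot prove the gradient correct, but it catches sign errors and dropped terms almost immediately.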
-- srean From bergstrj at iro.umontreal.ca Thu Jun 14 16:31:20 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Thu, 14 Jun 2012 16:31:20 -0400 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 3:38 PM, Nathaniel Smith wrote: > On Thu, Jun 14, 2012 at 7:53 PM, James Bergstra > wrote: >> On Thu, Jun 14, 2012 at 11:01 AM, Nathaniel Smith wrote: >> >>>> Indeed that would be great as sympy already has already excellent math >>>> expression rendering. >>>> >>>> An alternative would be to output mathml or something similar that >>>> could be understood by the mathjax rendering module of the IPython >>>> notebook. >>> >>> I'd find it quite useful if it could spit out the derivative as Python >>> code that I could check and integrate into my source. I often have a >>> particular function that I need to optimize in many different >>> situations, but would rather not pull in a whole (complex and perhaps >>> fragile) bytecode introspection library just to repeatedly recompute >>> the same function on every run... >>> >>> -N >> >> I was hoping to get by with bytecode-> bytecode interface, are there >> bytecode -> source tools that could help here? > > Not that I know of -- you might try googling "python reverse engineer" > or similar. Mostly people treat bytecode as the internal intermediate > format it is. I'm sort of confused at why people are suddenly excited > about using (some particular CPython release's version of) bytecode as > an input format when both the ast module and Cython are perfectly > capable of parsing real Python source into a nice abstract format, but > you all seem to be having fun so hey. > Heh, yeah I'm sure the bytecode high will wear off soon enough. For now though, the key advantage over ast manipulation is that bytecode seems easier to *run* than an ast. Actually running a non-trivial Python code fragment is the only reliable way of determining what it will do. Python is so weakly typed that static analysis is bound to be even more fragile (imagine trying to do static analysis of a function that takes variables from globals or a closure!?) In general, the drawback of running the bytecode is that the trace depend on control flow. You obviously don't get to follow both an `if` and `else` branch if you are trying to emulate the real interpreter. Automatic differentiation is a good fit here because as a general rule an epsilon change to parameters does not change the control flow path. I'm hoping that violations of this general rule are both (a) easy for autodiff to detect and (b) easy for programmers to rewrite... but time will tell. As for the CPython-version-specificity of bytecode I haven't found it changes much from 2.5 to 2.7... there are new instructions for conditional branching, but those are trivial to accommodate. Again, maybe there are some surprises on the way, but so far smooth sailing. >> Otherwise it might be possible to appeal to the symbolic intermediate >> representation to produce more legible source. >> >> With regards to "pulling in a whole bytecode introspection library" I >> don't really see what you mean. If the issue is that you want some way >> to verify that the output function is actually computing the right >> thing, then I hear you - that's an issue. 
If the issue that autodiff >> itself is slow, then I'd like to hear more about the application, >> because in minimization you usually have to call the function many >> times (hundreds) so the autodiff overhead should be relatively small >> (I'm not counting Theano's function compilation time here, which still >> can be significant... but that's a separate concern.) > > For example, I wrote a library routine for doing log-linear > regression. Doing this required computing the derivative of the > likelihood function, which was a huge nitpicky hassle; took me a few > hours to work out and debug. But it's still just 10 lines of Python > code that I needed to figure out once and they're done forever, now. > I'd have been perfectly happy if I could have gotten those ten lines > by asking a random unreleased library I pulled off github, which > depended on heavy libraries like Theano and relied on a mostly > untested emulator for some particular version of the CPython VM. But > I'd be less happy to ask everyone who uses my code to install that > library as well, just so I could avoid having to spend a few hours > doing math. This isn't a criticism or your library or anything, it's > just that I'm always going to be reluctant to rely on an automatic > differentiation tool that takes arbitrary code as input, because it > almost certainly cannot be made fully robust. So it'd be nice to have > the option to stick a human in the loop. > > -N > Thanks, that makes sense. Sounds like that print function that Olivier was asking about would be just the thing. I'm reading this mainly as a plea for e.g. Theano to provide better human-readable output, or for sympy to provide better support for tensor expressions. Let me know if that's not fair. My understanding is that your wish would be textual output as close to runnable numpy code as possible. - James -- http://www-etud.iro.umontreal.ca/~bergstrj From bergstrj at iro.umontreal.ca Thu Jun 14 16:49:25 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Thu, 14 Jun 2012 16:49:25 -0400 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 4:22 PM, srean wrote: >> >> For example, I wrote a library routine for doing log-linear >> regression. Doing this required computing the derivative of the >> likelihood function, which was a huge nitpicky hassle; took me a few >> hours to work out and debug. But it's still just 10 lines of Python >> code that I needed to figure out once and they're done forever, now. >> I'd have been perfectly happy if I could have gotten those ten lines >> by asking a random unreleased library I pulled off github, which >> depended on heavy libraries like Theano and relied on a mostly >> untested emulator for some particular version of the CPython VM. But >> I'd be less happy to ask everyone who uses my code to install that >> library as well, just so I could avoid having to spend a few hours >> doing math. This isn't a criticism or your library or anything, it's >> just that I'm always going to be reluctant to rely on an automatic >> differentiation tool that takes arbitrary code as input, because it >> almost certainly cannot be made fully robust. So it'd be nice to have >> the option to stick a human in the loop. > > Log-linears are by definition too simple a model to appreciate > auto-differentiation. Try computing the Hessian by hand ?on a modestly > sized multilayer neural network and you will start seeing the > advantages. 
Or say computing the Hessian of a large graphical model. > But I do have my own reservations about auto-diff. Until we have the > smart enough compiler that does common subexpression elimination, and > in fact even then, hand written differentiation code will often turn > out to be more efficient. Terms cancel out (subtraction or division), > terms factorize, terms can be arranged into an efficient Horner's > scheme. It will take a very smart symbolic manipulation of the parse > tree to get all that. > You're right - there is definitely a difference between a correct gradient and a gradient is both correct and fast to compute. The current quick implementation of pyautodiff is naive in this regard. However, it is delegating the heavy lifting to Theano. Theano performs the sort of optimization-oriented tree manipulations you're talking about, and Theano contributors have been tweaking them for a few years now. They aren't always perfect but they are often pretty good. In fact, I would go as far as saying that they are often *better* than what you might do by hand if you don't sit down for a long time and tune the hell out of your computation. > ?So places where I really need to optimize the derivative code, I > would still do it by hand and delegate it to an AD system when the > size gets unwieldy. In theory a good compromise is to let the AD churn > out the code and then hand optimize it. But here readable output > indeed does help. > > > As far as correctness of the computed derivative is concerned, > computing the dot product between the gradient of a function and the > secant computed numerically from the function does guard against gross > errors. If i remember correctly the scipy module on optimization > already has a function to do such sanity checks. Of course it cannot > guarantee correctness, but usually goes a long way. > > -- srean True, even approximating a gradient by finite differences is a subtle thing if you want to get the most precision per time spent. Another thing I was wondering about was periodically re-running the original bytecode on inputs to make sure that the derived bytecode produces the same answer (!). Those two sanity checks would detect the two most scary errors to my mind as a user: a) that autodiff got the original function wrong b) that autodiff is mis-computing a gradient. - James -- http://www-etud.iro.umontreal.ca/~bergstrj From srean.list at gmail.com Thu Jun 14 17:06:49 2012 From: srean.list at gmail.com (srean) Date: Thu, 14 Jun 2012 16:06:49 -0500 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: > > You're right - there is definitely a difference between a correct > gradient and a gradient is both correct and fast to compute. > > The current quick implementation of pyautodiff is naive in this > regard. Oh and by no means was I criticizing your implementation. It is a very hard problem to solve and as you indicate takes several man years to deal with. And compared to having no gradient at all, a gradient but possibly slower to compute is a big improvement :) > True, even approximating a gradient by finite differences is a subtle > thing if you want to get the most precision per time spent. Another > thing I was wondering about was periodically re-running the original > bytecode on inputs to make sure that the derived bytecode produces the > same answer (!). 
Those two sanity checks would detect the two most > scary errors to my mind as a user: > a) that autodiff got the original function wrong > b) that autodiff is mis-computing a gradient. Was suggesting finite difference just for sanity check, not as an actual substitute for the gradient. You wont believe how many times the finite difference check has saved me from going in the exact opposite direction ! From travis at continuum.io Thu Jun 14 17:12:49 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 14 Jun 2012 16:12:49 -0500 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch In-Reply-To: References: Message-ID: <3D0F9AF7-D8E0-4525-A5F8-D9B9BB7E8B6C@continuum.io> I think we should go ahead and merge this PR. It would be ideal to make a branch with the current code and then merge this into master. I haven't had the time to do this. If you can do this Nathaniel, then it will really help with 1.7 release. Thanks, -Travis On Jun 14, 2012, at 11:17 AM, Nathaniel Smith wrote: > On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith wrote: >> Just submitted this pull request for discussion: >> https://github.com/numpy/numpy/pull/297 >> >> As per earlier discussion on the list, this PR attempts to remove >> exactly and only the maskna-related code from numpy mainline: >> http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html >> >> The suggestion is that we merge this to master for the 1.7 release, >> and immediately "git revert" it on a branch so that it can be modified >> further without blocking the release. >> >> The first patch does the actual maskna removal; the second and third >> rearrange things so that PyArray_ReduceWrapper does not end up in the >> public API, for reasons described therein. >> >> All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit >> Ubuntu. The docs also appear to build. Before I re-based this I also >> tested against Scipy, matplotlib, and pandas, and all were fine. > > While it's tempting to think that the lack of response to this > email/PR indicates that everyone now agrees with me about how to > proceed with the NA work, I'm for some reason unconvinced... > > Any objections to merging this? > > -N > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Thu Jun 14 17:37:38 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 14 Jun 2012 17:37:38 -0400 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: Hi, On Thu, Jun 14, 2012 at 4:49 PM, James Bergstra wrote: > You're right - there is definitely a difference between a correct > gradient and a gradient is both correct and fast to compute. > > The current quick implementation of pyautodiff is naive in this > regard. ?However, it is delegating the heavy lifting to Theano. Theano > performs the sort of optimization-oriented tree manipulations you're > talking about, and Theano contributors have been tweaking them for a > few years now. They aren't always perfect but they are often pretty > good. In fact, I would go as far as saying that they are often > *better* than what you might do by hand if you don't sit down for a > long time and tune the hell out of your computation. Hi, I second James here, Theano do many of those optimizations. 
Only advanced coder can do better then Theano in most case, but that will take them much more time. If you find some optimization that you do and Theano don't, tell us. We want to add them :) Fred From njs at pobox.com Thu Jun 14 17:38:26 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 14 Jun 2012 22:38:26 +0100 Subject: [Numpy-discussion] boolean indexing change In-Reply-To: References: Message-ID: On Mon, Jun 11, 2012 at 1:31 AM, Travis Oliphant wrote: > It is unfortunate that this was committed to master. ?This should be backed out and is a blocker for 1.7. ? Can someone help me identify which commit made the change? > > This is a rather significant change and changes the documented behavior of NumPy substantially. ? This should definitely not occur in 1.7 > > The documented behavior (Guide to NumPy, pg. ?84) of boolean indexing is that > > ? x[obj] is equivalent to x[obj.nonzero()] > > The shape of advanced indexing is not restricted to the shape of of x. ? ? ?I suspect this change was made when it was presumed the next release would be 2.0 and such behavior could presumably be changed somewhat? ?But, was there a discussion about this? I don't see the commit where the change was made (the release notes were updated at some other time), but that error message seems to come from nditer. So I suspect that this change was part of the rewrite of indexing, and might not be fixable with a small local patch -- I don't think nditer has a way to iterate over a boolean array while filling in False once you go past the end of the array... If we're willing to deprecate this behavior then a temporary workaround measure might be enough. Like checking for such arrays and copying them into larger arrays before using them in indexing. Or maybe that would be fine in general, I dunno. Mark, any thoughts? -N From travis at continuum.io Thu Jun 14 17:42:28 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 14 Jun 2012 16:42:28 -0500 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Jun 14, 2012, at 1:53 PM, James Bergstra wrote: > On Thu, Jun 14, 2012 at 11:01 AM, Nathaniel Smith wrote: > >>> Indeed that would be great as sympy already has already excellent math >>> expression rendering. >>> >>> An alternative would be to output mathml or something similar that >>> could be understood by the mathjax rendering module of the IPython >>> notebook. >> >> I'd find it quite useful if it could spit out the derivative as Python >> code that I could check and integrate into my source. I often have a >> particular function that I need to optimize in many different >> situations, but would rather not pull in a whole (complex and perhaps >> fragile) bytecode introspection library just to repeatedly recompute >> the same function on every run... >> >> -N > > I was hoping to get by with bytecode-> bytecode interface, are there > bytecode -> source tools that could help here? > There have been some attempts in the past. The most advanced tool I've seen here is by Sean Ross-Ross: https://github.com/srossross/meta Here's an example: import meta # get some code object (i.e. from compile or from func.func_code --- with no return statement) mod2 = meta.decompile(code) meta.dump_python_source(mod2) -Travis > Otherwise it might be possible to appeal to the symbolic intermediate > representation to produce more legible source. > > With regards to "pulling in a whole bytecode introspection library" I > don't really see what you mean. 
If the issue is that you want some way > to verify that the output function is actually computing the right > thing, then I hear you - that's an issue. If the issue that autodiff > itself is slow, then I'd like to hear more about the application, > because in minimization you usually have to call the function many > times (hundreds) so the autodiff overhead should be relatively small > (I'm not counting Theano's function compilation time here, which still > can be significant... but that's a separate concern.) > > - James > -- > http://www-etud.iro.umontreal.ca/~bergstrj > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Jun 14 17:53:16 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 14 Jun 2012 22:53:16 +0100 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 9:22 PM, srean wrote: >> >> For example, I wrote a library routine for doing log-linear >> regression. Doing this required computing the derivative of the >> likelihood function, which was a huge nitpicky hassle; took me a few >> hours to work out and debug. But it's still just 10 lines of Python >> code that I needed to figure out once and they're done forever, now. >> I'd have been perfectly happy if I could have gotten those ten lines >> by asking a random unreleased library I pulled off github, which >> depended on heavy libraries like Theano and relied on a mostly >> untested emulator for some particular version of the CPython VM. But >> I'd be less happy to ask everyone who uses my code to install that >> library as well, just so I could avoid having to spend a few hours >> doing math. This isn't a criticism or your library or anything, it's >> just that I'm always going to be reluctant to rely on an automatic >> differentiation tool that takes arbitrary code as input, because it >> almost certainly cannot be made fully robust. So it'd be nice to have >> the option to stick a human in the loop. > > Log-linears are by definition too simple a model to appreciate > auto-differentiation. Try computing the Hessian by hand ?on a modestly > sized multilayer neural network and you will start seeing the > advantages. No, I'm saying I totally see the advantages. Here's the code I'm talking about: def _loglik(self, params): alpha, beta = self.used_alpha_beta(params) if np.any(alpha < 0): return 1e20 total = 0 for group in self._model._groups.itervalues(): alpha_part = np.dot(group["alpha_matrix"], alpha) eff_beta_matrix = group["beta_matrix"].copy() nab = self._model._num_alpha_betas eff_beta_matrix[:, :nab] *= np.log(alpha_part[:, np.newaxis]) exponent = np.dot(eff_beta_matrix, beta) Z = np.exp(exponent).sum() total += (group["counts"] * exponent).sum() total += group["counts"].sum() * -np.log(Z) return total It's not complex, but it's complicated. Enough that propagating all those multidimensional chain rules through it was a pain in the butt. But not so complicated that an automatic tool couldn't have worked it out, especially with some hand-holding (e.g. extracting the inner loop, sticking the dict lookups into local variables, getting rid of the .copy()). 
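To make that hand-holding concrete, here is one way the per-group body could be pulled out into a plain function of arrays -- dict lookups hoisted into arguments, and the .copy() plus slice-assignment replaced by a broadcast multiply -- which is the sort of shape a tracing or symbolic tool copes with best. This is only a sketch; the argument names are made up to mirror the code above:

import numpy as np

def group_loglik(alpha, beta, alpha_matrix, beta_matrix, counts, nab):
    # contribution of a single group to the log-likelihood above
    alpha_part = np.dot(alpha_matrix, alpha)
    # scale only the first `nab` columns by log(alpha_part), without
    # copying the beta matrix or assigning into a slice
    col_is_scaled = np.arange(beta_matrix.shape[1]) < nab
    scale = np.where(col_is_scaled, np.log(alpha_part)[:, np.newaxis], 1.0)
    exponent = np.dot(beta_matrix * scale, beta)
    Z = np.exp(exponent).sum()
    return (counts * exponent).sum() - counts.sum() * np.log(Z)

The loop over groups and the alpha < 0 guard stay in ordinary Python; only this function ever needs to be differentiated.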
Of course, maybe you were pointing out that if your derivative calculation depends in some intrinsic way on the topology of some graph, then your best bet is to have an automatic way to recompute it from scratch for each new graph you see. In that case, fair enough! > Or say computing the Hessian of a large graphical model. > But I do have my own reservations about auto-diff. Until we have the > smart enough compiler that does common subexpression elimination, and > in fact even then, hand written differentiation code will often turn > out to be more efficient. Terms cancel out (subtraction or division), > terms factorize, terms can be arranged into an efficient Horner's > scheme. It will take a very smart symbolic manipulation of the parse > tree to get all that. > > ?So places where I really need to optimize the derivative code, I > would still do it by hand and delegate it to an AD system when the > size gets unwieldy. In theory a good compromise is to let the AD churn > out the code and then hand optimize it. But here readable output > indeed does help. > > As far as correctness of the computed derivative is concerned, > computing the dot product between the gradient of a function and the > secant computed numerically from the function does guard against gross > errors. If i remember correctly the scipy module on optimization > already has a function to do such sanity checks. Of course it cannot > guarantee correctness, but usually goes a long way. Right, and what I want is to do those correctness checks once, and then save the validated derivative function somewhere and know that it won't break the next time I upgrade some library or make some seemingly-irrelevant change to the original code. -N From bergstrj at iro.umontreal.ca Thu Jun 14 18:52:58 2012 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Thu, 14 Jun 2012 18:52:58 -0400 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 5:53 PM, Nathaniel Smith wrote: > On Thu, Jun 14, 2012 at 9:22 PM, srean wrote: > No, I'm saying I totally see the advantages. Here's the code I'm talking about: > > ? ?def _loglik(self, params): > ? ? ? ?alpha, beta = self.used_alpha_beta(params) > ? ? ? ?if np.any(alpha < 0): > ? ? ? ? ? ?return 1e20 > ? ? ? ?total = 0 > ? ? ? ?for group in self._model._groups.itervalues(): > ? ? ? ? ? ?alpha_part = np.dot(group["alpha_matrix"], alpha) > ? ? ? ? ? ?eff_beta_matrix = group["beta_matrix"].copy() > ? ? ? ? ? ?nab = self._model._num_alpha_betas > ? ? ? ? ? ?eff_beta_matrix[:, :nab] *= np.log(alpha_part[:, np.newaxis]) > ? ? ? ? ? ?exponent = np.dot(eff_beta_matrix, beta) > ? ? ? ? ? ?Z = np.exp(exponent).sum() > ? ? ? ? ? ?total += (group["counts"] * exponent).sum() > ? ? ? ? ? ?total += group["counts"].sum() * -np.log(Z) > ? ? ? ?return total > You're right, this is totally the kind of code that autodiff can/should be able to help with. I just pushed a first draft at support for np.any, log, exp, inplace operators, and inplace array assignment... so there's a chance that your example might currently run. (You might even see a speedup if Theano graph optimizations work their magic). It's not clear from the code fragment what the various types in play are (see previous rant on static analysis!), but an autodiff PR with a test case would help sort out any remaining problems if you want to follow up on this. 
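Since the heavy lifting ends up in Theano anyway, the equivalent hand-written symbolic version looks roughly like the following. This is a sketch against Theano's public API, meant only to show the kind of graph pyautodiff builds under the hood; it is not pyautodiff's own interface, and the toy objective is made up:

import numpy as np
import theano
import theano.tensor as T

x = T.dvector('x')
# toy objective: phi(x) = sum_i (x_i * log(x_i) - x_i)
cost = T.sum(x * T.log(x) - x)
grad = T.grad(cost, x)          # symbolic gradient, simplified by Theano
f = theano.function([x], [cost, grad])

c, g = f(np.array([0.2, 0.3, 0.5]))
# g should come back as log(x), since d/dx_i (x_i log x_i - x_i) = log x_i

Theano's graph optimizations run on that gradient expression before compilation, which is where the common-subexpression elimination and algebraic simplification srean was asking about happen.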
- James From srean.list at gmail.com Thu Jun 14 18:59:50 2012 From: srean.list at gmail.com (srean) Date: Thu, 14 Jun 2012 17:59:50 -0500 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: > Of course, maybe you were pointing out that if your derivative > calculation depends in some intrinsic way on the topology of some > graph, then your best bet is to have an automatic way to recompute it > from scratch for each new graph you see. In that case, fair enough! That is indeed what I had in mind. In neural networks, Markov random fields, Bayesian networks, graph regularization etc it is something that has to be dealt with all the time. > Right, and what I want is to do those correctness checks once, and > then save the validated derivative function somewhere and know that it > won't break the next time I upgrade some library or make some > seemingly-irrelevant change to the original code. Exactly. What I was getting at is: even if it is not feasible to get a pretty printed python output, the byte code can still be validated (somewhat) with a few numeric sanity checks. So, yes the derivatives needn't/shouldn't be re-computed in runtime all the time and an API that that returns even some opaque but computable representation of the derivative that can be validated and then "frozen" would be helpful. I think one can go further and formally prove the correctness of the derivative computing engine. I dont know if anyone has done it. Maybe Theano does it. Should be possible for a statically typed sublanguage. From srean.list at gmail.com Thu Jun 14 19:18:16 2012 From: srean.list at gmail.com (srean) Date: Thu, 14 Jun 2012 18:18:16 -0500 Subject: [Numpy-discussion] automatic differentiation with PyAutoDiff In-Reply-To: References: Message-ID: > Hi, > > I second James here, Theano do many of those optimizations. Only > advanced coder can do better then Theano in most case, but that will > take them much more time. If you find some optimization that you do > and Theano don't, tell us. We want to add them :) > > Fred I am sure Theano does an excellent job of expressions that matter. But I think to get the best symbolic reduction of an expression is a hard, as in, an AI hard problem. Correct me if I am wrong though. One can come up with perverse corner cases using algebraic or trigonometric identities, expressions that are hundreds of terms long but whose derivatives are simple, perhaps even a constant. But all that matters is how well it does for the common cases and am hearing that it does extremely well. I will be happy if it can reduce simple things like the following (a very common form in Theano's domain) \phi(x) - \phi(y) - dot( x-y, \grad_phi(y)) evaluated for \phi(x) = \sum_i (x_i log x_i) - x_i to \sum_i x_i log(x_i / y_i) on the set sum(x) = sum(y) = 1 In anycase I think this is a digression and rather not pollute this thread with peripheral (nonethless very interesting) issues. From matthew.brett at gmail.com Thu Jun 14 22:06:44 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 14 Jun 2012 19:06:44 -0700 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, I noticed that numpy.linalg.matrix_rank sometimes gives full rank for matrices that are numerically rank deficient: If I repeatedly make random matrices, then set the first column to be equal to the sum of the second and third columns: def make_deficient(): ? ?X = np.random.normal(size=(40, 10)) ? 
?deficient_X = X.copy() ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] ? ?return deficient_X then the current numpy.linalg.matrix_rank algorithm returns full rank (10) in about 8 percent of cases (see appended script). I think this is a tolerance problem. ?The ``matrix_rank`` algorithm does this by default: S = spl.svd(M, compute_uv=False) tol = S.max() * np.finfo(S.dtype).eps return np.sum(S > tol) I guess we'd we want the lowest tolerance that nearly always or always identifies numerically rank deficient matrices. I suppose one way of looking at whether the tolerance is in the right range is to compare the calculated tolerance (``tol``) to the minimum singular value (``S.min()``) because S.min() in our case should be very small and indicate the rank deficiency. The mean value of tol / S.min() for the current algorithm, across many iterations, is about 2.8. ?We might hope this value would be higher than 1, but not much higher, otherwise we might be rejecting too many columns. Our current algorithm for tolerance is the same as the 2-norm of M * eps. ?We're citing Golub and Van Loan for this, but now I look at our copy (p 261, last para) - they seem to be suggesting using u * |M| where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub and Van Loan suggestion corresponds to: tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 This tolerance gives full rank for these rank-deficient matrices in about 39 percent of cases (tol / S.min() ratio of 1.7) We see on p 56 (section 2.3.2) that: m, n = M.shape 1 / sqrt(n) . |M|_{inf} <= |M|_2 So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) gives about 0.5 percent error (tol / S.min() of 4.4) Using the Mathworks threshold [2]: tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) There are no false negatives (0 percent rank 10), but tol / S.min() is around 110 - so conservative, in this case. So - summary - I'm worrying our current threshold is too small, letting through many rank-deficient matrices without detection. ?I may have misread Golub and Van Loan, but maybe we aren't doing what they suggest. ?Maybe what we could use is either the MATLAB threshold or something like: tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This gives 0 percent misses and tol / S.min() of 8.7. What do y'all think? Best, Matthew [1] http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon [2] http://www.mathworks.com/help/techdoc/ref/rank.html Output from script: Percent undetected current: 9.8, tol / S.min(): 2.762 Percent undetected inf norm: 39.1, tol / S.min(): 1.667 Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 From charlesr.harris at gmail.com Thu Jun 14 23:10:52 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 14 Jun 2012 21:10:52 -0600 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? 
In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett wrote: > Hi, > > I noticed that numpy.linalg.matrix_rank sometimes gives full rank for > matrices that are numerically rank deficient: > > If I repeatedly make random matrices, then set the first column to be > equal to the sum of the second and third columns: > > def make_deficient(): > X = np.random.normal(size=(40, 10)) > deficient_X = X.copy() > deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] > return deficient_X > > then the current numpy.linalg.matrix_rank algorithm returns full rank > (10) in about 8 percent of cases (see appended script). > > I think this is a tolerance problem. The ``matrix_rank`` algorithm > does this by default: > > S = spl.svd(M, compute_uv=False) > tol = S.max() * np.finfo(S.dtype).eps > return np.sum(S > tol) > > I guess we'd we want the lowest tolerance that nearly always or always > identifies numerically rank deficient matrices. I suppose one way of > looking at whether the tolerance is in the right range is to compare > the calculated tolerance (``tol``) to the minimum singular value > (``S.min()``) because S.min() in our case should be very small and > indicate the rank deficiency. The mean value of tol / S.min() for the > current algorithm, across many iterations, is about 2.8. We might > hope this value would be higher than 1, but not much higher, otherwise > we might be rejecting too many columns. > > Our current algorithm for tolerance is the same as the 2-norm of M * > eps. We're citing Golub and Van Loan for this, but now I look at our > copy (p 261, last para) - they seem to be suggesting using u * |M| > where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think the Golub > and Van Loan suggestion corresponds to: > > tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 > > This tolerance gives full rank for these rank-deficient matrices in > about 39 percent of cases (tol / S.min() ratio of 1.7) > > We see on p 56 (section 2.3.2) that: > > m, n = M.shape > 1 / sqrt(n) . |M|_{inf} <= |M|_2 > > So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). Setting: > > tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) > > gives about 0.5 percent error (tol / S.min() of 4.4) > > Using the Mathworks threshold [2]: > > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > > There are no false negatives (0 percent rank 10), but tol / S.min() is > around 110 - so conservative, in this case. > > So - summary - I'm worrying our current threshold is too small, > letting through many rank-deficient matrices without detection. I may > have misread Golub and Van Loan, but maybe we aren't doing what they > suggest. Maybe what we could use is either the MATLAB threshold or > something like: > > tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) > > - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This > gives 0 percent misses and tol / S.min() of 8.7. > > What do y'all think? 
> > Best, > > Matthew > > [1] > http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon > [2] http://www.mathworks.com/help/techdoc/ref/rank.html > > Output from script: > > Percent undetected current: 9.8, tol / S.min(): 2.762 > Percent undetected inf norm: 39.1, tol / S.min(): 1.667 > Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 > Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 > Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 > > > The polynomial fitting uses eps times the largest array dimension for the relative condition number. IIRC, that choice traces back to numerical recipes. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Fri Jun 15 05:52:58 2012 From: tmp50 at ukr.net (Dmitrey) Date: Fri, 15 Jun 2012 12:52:58 +0300 Subject: [Numpy-discussion] [ANN} OpenOpt / FuncDesigner release 0.39 Message-ID: <92779.1339753978.799619080370585600@ffe16.ukr.net> Hi all, I'm glad to inform you about new OpenOpt Suite release 0.39 (2012-June-15): interalg: add categorical variables and general logical constraints, many other improvements Some improvements for automatic differentiation DerApproximator and some OpenOpt / FuncDesigner functionality now works with PyPy New solver lsmr for dense / sparse LLSP oovar constructors now can handle parameters lb and ub, e.g. a = oovar('a', lb=-1, ub=[1,2,3]) (this oovar should have size 3) or x = oovars(10, lb=-1, ub=1) New FuncDesigner function hstack, similar syntax to numpy.hstack, e.g. f = hstack((a,b,c,d)) Some bugfixes I have some progress toward solving in FuncDesigner linear DAE (differential algebraic equations, example) and Stochastic Opimization (example), but this is too premature yet to be released, there is 60-70% probability it will be properly implemented in next OpenOpt release. In our website you could vote for most required OpenOpt Suite development direction(s). -------------------- Regards, D. http://openopt.org/Dmitrey -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jun 15 20:39:54 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 15 Jun 2012 17:39:54 -0700 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, On Thu, Jun 14, 2012 at 8:10 PM, Charles R Harris wrote: > > > On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett > wrote: >> >> Hi, >> >> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >> matrices that are numerically rank deficient: >> >> If I repeatedly make random matrices, then set the first column to be >> equal to the sum of the second and third columns: >> >> def make_deficient(): >> ? ?X = np.random.normal(size=(40, 10)) >> ? ?deficient_X = X.copy() >> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >> ? ?return deficient_X >> >> then the current numpy.linalg.matrix_rank algorithm returns full rank >> (10) in about 8 percent of cases (see appended script). >> >> I think this is a tolerance problem. ?The ``matrix_rank`` algorithm >> does this by default: >> >> S = spl.svd(M, compute_uv=False) >> tol = S.max() * np.finfo(S.dtype).eps >> return np.sum(S > tol) >> >> I guess we'd we want the lowest tolerance that nearly always or always >> identifies numerically rank deficient matrices. 
?I suppose one way of >> looking at whether the tolerance is in the right range is to compare >> the calculated tolerance (``tol``) to the minimum singular value >> (``S.min()``) because S.min() in our case should be very small and >> indicate the rank deficiency. The mean value of tol / S.min() for the >> current algorithm, across many iterations, is about 2.8. ?We might >> hope this value would be higher than 1, but not much higher, otherwise >> we might be rejecting too many columns. >> >> Our current algorithm for tolerance is the same as the 2-norm of M * >> eps. ?We're citing Golub and Van Loan for this, but now I look at our >> copy (p 261, last para) - they seem to be suggesting using u * |M| >> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub >> and Van Loan suggestion corresponds to: >> >> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >> >> This tolerance gives full rank for these rank-deficient matrices in >> about 39 percent of cases (tol / S.min() ratio of 1.7) >> >> We see on p 56 (section 2.3.2) that: >> >> m, n = M.shape >> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >> >> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: >> >> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >> >> gives about 0.5 percent error (tol / S.min() of 4.4) >> >> Using the Mathworks threshold [2]: >> >> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >> >> There are no false negatives (0 percent rank 10), but tol / S.min() is >> around 110 - so conservative, in this case. >> >> So - summary - I'm worrying our current threshold is too small, >> letting through many rank-deficient matrices without detection. ?I may >> have misread Golub and Van Loan, but maybe we aren't doing what they >> suggest. ?Maybe what we could use is either the MATLAB threshold or >> something like: >> >> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >> >> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This >> gives 0 percent misses and tol / S.min() of 8.7. >> >> What do y'all think? >> >> Best, >> >> Matthew >> >> [1] >> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >> >> Output from script: >> >> Percent undetected current: 9.8, tol / S.min(): 2.762 >> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 >> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >> >> > > > The polynomial fitting uses eps times the largest array dimension for the > relative condition number. IIRC, that choice traces back to numerical > recipes. The problem for that as a general metric would be that larger values in the data would tend to lead to larger numerical error, but this won't be reflected in the tolerance. For example, running my script with your metric finds 0 errors for the normal distribution var = 1, but 100 percent error for a variance of 1000. Percent undetected current: 0.0, tol / S.min(): 3.545 Percent undetected inf norm: 0.0, tol / S.min(): 2.139 Percent undetected upper bound inf norm: 0.0, tol / S.min(): 5.605 Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 11.209 Percent undetected MATLAB: 0.0, tol / S.min(): 141.785 Percent undetected Chuck: 100.0, tol / S.min(): 0.013 My worry is that I haven't sampled the space of all possible matrix sizes and scalings. 
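For reference, the tolerance candidates behind those labels, written out side by side (a sketch re-stating the formulas from the first message in this thread, not the actual test script):

import numpy as np

def candidate_tols(M):
    # tolerance candidates discussed above, keyed by the labels in the output
    m, n = M.shape
    S = np.linalg.svd(M, compute_uv=False)
    eps = np.finfo(S.dtype).eps
    return {'current': S.max() * eps,
            'inf norm': np.linalg.norm(M, np.inf) * eps / 2,
            'upper bound inf norm': S.max() * eps / 2 * np.sqrt(n),
            'upper bound inf norm * 2': S.max() * eps * np.sqrt(n),
            'MATLAB': S.max() * eps * max(m, n)}

def rank_with_tol(M, tol):
    # rank as counted by the matrix_rank algorithm, with a given cutoff
    S = np.linalg.svd(M, compute_uv=False)
    return np.sum(S > tol)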
First pass suggests that the values will be different on different plaforms (at least, between a PPC 32 bit and an Intel 64 bit). I think the tolerance is wrong at the moment, and it looks like the Golub and Van Loan suggestion will not work as written. The MATLAB algorithm is some kind of standard and has been battle tested. If we are going to change, it seems tempting to use that. What do you think?, Matthew From charlesr.harris at gmail.com Fri Jun 15 22:51:13 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 15 Jun 2012 20:51:13 -0600 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: On Fri, Jun 15, 2012 at 6:39 PM, Matthew Brett wrote: > Hi, > > On Thu, Jun 14, 2012 at 8:10 PM, Charles R Harris > wrote: > > > > > > On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for > >> matrices that are numerically rank deficient: > >> > >> If I repeatedly make random matrices, then set the first column to be > >> equal to the sum of the second and third columns: > >> > >> def make_deficient(): > >> X = np.random.normal(size=(40, 10)) > >> deficient_X = X.copy() > >> deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] > >> return deficient_X > >> > >> then the current numpy.linalg.matrix_rank algorithm returns full rank > >> (10) in about 8 percent of cases (see appended script). > >> > >> I think this is a tolerance problem. The ``matrix_rank`` algorithm > >> does this by default: > >> > >> S = spl.svd(M, compute_uv=False) > >> tol = S.max() * np.finfo(S.dtype).eps > >> return np.sum(S > tol) > >> > >> I guess we'd we want the lowest tolerance that nearly always or always > >> identifies numerically rank deficient matrices. I suppose one way of > >> looking at whether the tolerance is in the right range is to compare > >> the calculated tolerance (``tol``) to the minimum singular value > >> (``S.min()``) because S.min() in our case should be very small and > >> indicate the rank deficiency. The mean value of tol / S.min() for the > >> current algorithm, across many iterations, is about 2.8. We might > >> hope this value would be higher than 1, but not much higher, otherwise > >> we might be rejecting too many columns. > >> > >> Our current algorithm for tolerance is the same as the 2-norm of M * > >> eps. We're citing Golub and Van Loan for this, but now I look at our > >> copy (p 261, last para) - they seem to be suggesting using u * |M| > >> where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think the Golub > >> and Van Loan suggestion corresponds to: > >> > >> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 > >> > >> This tolerance gives full rank for these rank-deficient matrices in > >> about 39 percent of cases (tol / S.min() ratio of 1.7) > >> > >> We see on p 56 (section 2.3.2) that: > >> > >> m, n = M.shape > >> 1 / sqrt(n) . |M|_{inf} <= |M|_2 > >> > >> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). > Setting: > >> > >> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) > >> > >> gives about 0.5 percent error (tol / S.min() of 4.4) > >> > >> Using the Mathworks threshold [2]: > >> > >> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > >> > >> There are no false negatives (0 percent rank 10), but tol / S.min() is > >> around 110 - so conservative, in this case. 
> >> > >> So - summary - I'm worrying our current threshold is too small, > >> letting through many rank-deficient matrices without detection. I may > >> have misread Golub and Van Loan, but maybe we aren't doing what they > >> suggest. Maybe what we could use is either the MATLAB threshold or > >> something like: > >> > >> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) > >> > >> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This > >> gives 0 percent misses and tol / S.min() of 8.7. > >> > >> What do y'all think? > >> > >> Best, > >> > >> Matthew > >> > >> [1] > >> > http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon > >> [2] http://www.mathworks.com/help/techdoc/ref/rank.html > >> > >> Output from script: > >> > >> Percent undetected current: 9.8, tol / S.min(): 2.762 > >> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 > >> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 > >> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 > >> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 > >> > >> > > > > > > The polynomial fitting uses eps times the largest array dimension for the > > relative condition number. IIRC, that choice traces back to numerical > > recipes. > > The problem for that as a general metric would be that larger values > in the data would tend to lead to larger numerical error, but this > won't be reflected in the tolerance. For example, running my script > with your metric finds 0 errors for the normal distribution var = 1, > but 100 percent error for a variance of 1000. > > It's a *relative* condition number. You need to multiply it times the largest singular value to find the cutoff. > Percent undetected current: 0.0, tol / S.min(): 3.545 > Percent undetected inf norm: 0.0, tol / S.min(): 2.139 > Percent undetected upper bound inf norm: 0.0, tol / S.min(): 5.605 > Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 11.209 > Percent undetected MATLAB: 0.0, tol / S.min(): 141.785 > Percent undetected Chuck: 100.0, tol / S.min(): 0.013 > > My worry is that I haven't sampled the space of all possible matrix > sizes and scalings. First pass suggests that the values will be > different on different plaforms (at least, between a PPC 32 bit and an > Intel 64 bit). I think the tolerance is wrong at the moment, and it > looks like the Golub and Van Loan suggestion will not work as written. > The MATLAB algorithm is some kind of standard and has been battle > tested. If we are going to change, it seems tempting to use that. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jun 16 06:40:20 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Jun 2012 11:40:20 +0100 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris wrote: > > > On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett > wrote: >> >> Hi, >> >> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >> matrices that are numerically rank deficient: >> >> If I repeatedly make random matrices, then set the first column to be >> equal to the sum of the second and third columns: >> >> def make_deficient(): >> ? ?X = np.random.normal(size=(40, 10)) >> ? ?deficient_X = X.copy() >> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >> ? 
?return deficient_X >> >> then the current numpy.linalg.matrix_rank algorithm returns full rank >> (10) in about 8 percent of cases (see appended script). >> >> I think this is a tolerance problem. ?The ``matrix_rank`` algorithm >> does this by default: >> >> S = spl.svd(M, compute_uv=False) >> tol = S.max() * np.finfo(S.dtype).eps >> return np.sum(S > tol) >> >> I guess we'd we want the lowest tolerance that nearly always or always >> identifies numerically rank deficient matrices. ?I suppose one way of >> looking at whether the tolerance is in the right range is to compare >> the calculated tolerance (``tol``) to the minimum singular value >> (``S.min()``) because S.min() in our case should be very small and >> indicate the rank deficiency. The mean value of tol / S.min() for the >> current algorithm, across many iterations, is about 2.8. ?We might >> hope this value would be higher than 1, but not much higher, otherwise >> we might be rejecting too many columns. >> >> Our current algorithm for tolerance is the same as the 2-norm of M * >> eps. ?We're citing Golub and Van Loan for this, but now I look at our >> copy (p 261, last para) - they seem to be suggesting using u * |M| >> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub >> and Van Loan suggestion corresponds to: >> >> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >> >> This tolerance gives full rank for these rank-deficient matrices in >> about 39 percent of cases (tol / S.min() ratio of 1.7) >> >> We see on p 56 (section 2.3.2) that: >> >> m, n = M.shape >> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >> >> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: >> >> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >> >> gives about 0.5 percent error (tol / S.min() of 4.4) >> >> Using the Mathworks threshold [2]: >> >> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >> >> There are no false negatives (0 percent rank 10), but tol / S.min() is >> around 110 - so conservative, in this case. >> >> So - summary - I'm worrying our current threshold is too small, >> letting through many rank-deficient matrices without detection. ?I may >> have misread Golub and Van Loan, but maybe we aren't doing what they >> suggest. ?Maybe what we could use is either the MATLAB threshold or >> something like: >> >> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >> >> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This >> gives 0 percent misses and tol / S.min() of 8.7. >> >> What do y'all think? >> >> Best, >> >> Matthew >> >> [1] >> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >> >> Output from script: >> >> Percent undetected current: 9.8, tol / S.min(): 2.762 >> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 >> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >> >> > > > The polynomial fitting uses eps times the largest array dimension for the > relative condition number. IIRC, that choice traces back to numerical > recipes. This is the same as Matlab, right? 
If the Matlab condition is the most conservative, then it seems like a reasonable choice -- conservative is good so long as your false positive rate doesn't become to high, and presumably Matlab has enough user experience to know whether the false positive rate is too high. -N From matthew.brett at gmail.com Sat Jun 16 16:03:19 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 16 Jun 2012 20:03:19 +0000 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: > On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris > wrote: >> >> >> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >>> matrices that are numerically rank deficient: >>> >>> If I repeatedly make random matrices, then set the first column to be >>> equal to the sum of the second and third columns: >>> >>> def make_deficient(): >>> ? ?X = np.random.normal(size=(40, 10)) >>> ? ?deficient_X = X.copy() >>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >>> ? ?return deficient_X >>> >>> then the current numpy.linalg.matrix_rank algorithm returns full rank >>> (10) in about 8 percent of cases (see appended script). >>> >>> I think this is a tolerance problem. ?The ``matrix_rank`` algorithm >>> does this by default: >>> >>> S = spl.svd(M, compute_uv=False) >>> tol = S.max() * np.finfo(S.dtype).eps >>> return np.sum(S > tol) >>> >>> I guess we'd we want the lowest tolerance that nearly always or always >>> identifies numerically rank deficient matrices. ?I suppose one way of >>> looking at whether the tolerance is in the right range is to compare >>> the calculated tolerance (``tol``) to the minimum singular value >>> (``S.min()``) because S.min() in our case should be very small and >>> indicate the rank deficiency. The mean value of tol / S.min() for the >>> current algorithm, across many iterations, is about 2.8. ?We might >>> hope this value would be higher than 1, but not much higher, otherwise >>> we might be rejecting too many columns. >>> >>> Our current algorithm for tolerance is the same as the 2-norm of M * >>> eps. ?We're citing Golub and Van Loan for this, but now I look at our >>> copy (p 261, last para) - they seem to be suggesting using u * |M| >>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub >>> and Van Loan suggestion corresponds to: >>> >>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >>> >>> This tolerance gives full rank for these rank-deficient matrices in >>> about 39 percent of cases (tol / S.min() ratio of 1.7) >>> >>> We see on p 56 (section 2.3.2) that: >>> >>> m, n = M.shape >>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >>> >>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: >>> >>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >>> >>> gives about 0.5 percent error (tol / S.min() of 4.4) >>> >>> Using the Mathworks threshold [2]: >>> >>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>> >>> There are no false negatives (0 percent rank 10), but tol / S.min() is >>> around 110 - so conservative, in this case. >>> >>> So - summary - I'm worrying our current threshold is too small, >>> letting through many rank-deficient matrices without detection. ?I may >>> have misread Golub and Van Loan, but maybe we aren't doing what they >>> suggest. 
?Maybe what we could use is either the MATLAB threshold or >>> something like: >>> >>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >>> >>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This >>> gives 0 percent misses and tol / S.min() of 8.7. >>> >>> What do y'all think? >>> >>> Best, >>> >>> Matthew >>> >>> [1] >>> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >>> >>> Output from script: >>> >>> Percent undetected current: 9.8, tol / S.min(): 2.762 >>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 >>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >>> >>> >> >> >> The polynomial fitting uses eps times the largest array dimension for the >> relative condition number. IIRC, that choice traces back to numerical >> recipes. Chuck - sorry - I didn't understand what you were saying, and now I think you were proposing the MATLAB algorithm. I can't find that in Numerical Recipes - can you? It would be helpful as a reference. > This is the same as Matlab, right? Yes, I believe so, i.e: tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) from my original email. > If the Matlab condition is the most conservative, then it seems like a > reasonable choice -- conservative is good so long as your false > positive rate doesn't become to high, and presumably Matlab has enough > user experience to know whether the false positive rate is too high. Are we agreeing to go for the Matlab algorithm? If so, how should this be managed? Just changing it may change the output of code using numpy >= 1.5.0, but then again, the threshold is probably incorrect. Fix and break or FutureWarning with something like: def matrix_rank(M, tol=None): where ``tol`` can be a string like ``maxdim``? Best, Matthew From njs at pobox.com Sat Jun 16 16:16:01 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Jun 2012 21:16:01 +0100 Subject: [Numpy-discussion] travis-ci support for numpy Message-ID: Thanks to Marc Abramowitz[1], Numpy commits are now being tested by Travis-CI: http://travis-ci.org/#!/numpy/numpy As discussed on the numfocus list[2], this isn't really a complete CI solution, because it only gives test coverage on 64-bit Ubuntu. But, it does cover all supported versions of Python, which should at least catch the very common Python 2.4 compatibility bugs, so that's useful until someone gets a proper buildbot set up. And what will be useful even then is that travis-ci can automatically test all pull requests and report the results in the PR discussion directly[3], sort of like Fernando's script, but without human intervention. To get this turned on, someone needs to: -- donate some arbitrarily small (or large) amount: https://love.travis-ci.org/ -- email support at travis-ci.org and say "Hi, I donated and here is my order # to confirm; could you please turn on PR testing for the numpy/numpy repository, which was connected to travis by user 'njsmith'" I nominate Travis for the "someone", both on grounds of name collision amusingness and on grounds of him having a budget for such things :-). (Though literally, a $1 donation would seem to suffice from what the developers have said.) 
-n [1] https://github.com/numpy/numpy/pull/292 [2] https://groups.google.com/d/msg/numfocus/I_kmL4FUGaY/f2tO7IEU-l4J [3] http://about.travis-ci.org/blog/announcing-pull-request-support/ From matthew.brett at gmail.com Sat Jun 16 16:33:30 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 16 Jun 2012 20:33:30 +0000 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett wrote: > Hi, > > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >> wrote: >>> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >>> wrote: >>>> >>>> Hi, >>>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >>>> matrices that are numerically rank deficient: >>>> >>>> If I repeatedly make random matrices, then set the first column to be >>>> equal to the sum of the second and third columns: >>>> >>>> def make_deficient(): >>>> ? ?X = np.random.normal(size=(40, 10)) >>>> ? ?deficient_X = X.copy() >>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >>>> ? ?return deficient_X >>>> >>>> then the current numpy.linalg.matrix_rank algorithm returns full rank >>>> (10) in about 8 percent of cases (see appended script). >>>> >>>> I think this is a tolerance problem. ?The ``matrix_rank`` algorithm >>>> does this by default: >>>> >>>> S = spl.svd(M, compute_uv=False) >>>> tol = S.max() * np.finfo(S.dtype).eps >>>> return np.sum(S > tol) >>>> >>>> I guess we'd we want the lowest tolerance that nearly always or always >>>> identifies numerically rank deficient matrices. ?I suppose one way of >>>> looking at whether the tolerance is in the right range is to compare >>>> the calculated tolerance (``tol``) to the minimum singular value >>>> (``S.min()``) because S.min() in our case should be very small and >>>> indicate the rank deficiency. The mean value of tol / S.min() for the >>>> current algorithm, across many iterations, is about 2.8. ?We might >>>> hope this value would be higher than 1, but not much higher, otherwise >>>> we might be rejecting too many columns. >>>> >>>> Our current algorithm for tolerance is the same as the 2-norm of M * >>>> eps. ?We're citing Golub and Van Loan for this, but now I look at our >>>> copy (p 261, last para) - they seem to be suggesting using u * |M| >>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub >>>> and Van Loan suggestion corresponds to: >>>> >>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >>>> >>>> This tolerance gives full rank for these rank-deficient matrices in >>>> about 39 percent of cases (tol / S.min() ratio of 1.7) >>>> >>>> We see on p 56 (section 2.3.2) that: >>>> >>>> m, n = M.shape >>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >>>> >>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: >>>> >>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >>>> >>>> gives about 0.5 percent error (tol / S.min() of 4.4) >>>> >>>> Using the Mathworks threshold [2]: >>>> >>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>>> >>>> There are no false negatives (0 percent rank 10), but tol / S.min() is >>>> around 110 - so conservative, in this case. >>>> >>>> So - summary - I'm worrying our current threshold is too small, >>>> letting through many rank-deficient matrices without detection. ?I may >>>> have misread Golub and Van Loan, but maybe we aren't doing what they >>>> suggest. 
?Maybe what we could use is either the MATLAB threshold or >>>> something like: >>>> >>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >>>> >>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This >>>> gives 0 percent misses and tol / S.min() of 8.7. >>>> >>>> What do y'all think? >>>> >>>> Best, >>>> >>>> Matthew >>>> >>>> [1] >>>> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >>>> >>>> Output from script: >>>> >>>> Percent undetected current: 9.8, tol / S.min(): 2.762 >>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 >>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >>>> >>>> >>> >>> >>> The polynomial fitting uses eps times the largest array dimension for the >>> relative condition number. IIRC, that choice traces back to numerical >>> recipes. > > Chuck - sorry - I didn't understand what you were saying, and now I > think you were proposing the MATLAB algorithm. ? I can't find that in > Numerical Recipes - can you? ?It would be helpful as a reference. > >> This is the same as Matlab, right? > > Yes, I believe so, i.e: > > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > > from my original email. > >> If the Matlab condition is the most conservative, then it seems like a >> reasonable choice -- conservative is good so long as your false >> positive rate doesn't become to high, and presumably Matlab has enough >> user experience to know whether the false positive rate is too high. > > Are we agreeing to go for the Matlab algorithm? As extra data, current Numerical Recipes (2007, p 67) appears to prefer: tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) There's a discussion of algorithms in: @article{konstantinides1988statistical, title={Statistical analysis of effective singular values in matrix rank determination}, author={Konstantinides, K. and Yao, K.}, journal={Acoustics, Speech and Signal Processing, IEEE Transactions on}, volume={36}, number={5}, pages={757--763}, year={1988}, publisher={IEEE} } Yes, restricted access: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1585&tag=1 Cheers, Matthew From njs at pobox.com Sat Jun 16 16:39:05 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Jun 2012 21:39:05 +0100 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: On Sat, Jun 16, 2012 at 9:03 PM, Matthew Brett wrote: > Hi, > > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >> wrote: >>> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >>> wrote: >>>> >>>> Hi, >>>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >>>> matrices that are numerically rank deficient: >>>> >>>> If I repeatedly make random matrices, then set the first column to be >>>> equal to the sum of the second and third columns: >>>> >>>> def make_deficient(): >>>> ? ?X = np.random.normal(size=(40, 10)) >>>> ? ?deficient_X = X.copy() >>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >>>> ? ?return deficient_X >>>> >>>> then the current numpy.linalg.matrix_rank algorithm returns full rank >>>> (10) in about 8 percent of cases (see appended script). >>>> >>>> I think this is a tolerance problem. 
?The ``matrix_rank`` algorithm >>>> does this by default: >>>> >>>> S = spl.svd(M, compute_uv=False) >>>> tol = S.max() * np.finfo(S.dtype).eps >>>> return np.sum(S > tol) >>>> >>>> I guess we'd we want the lowest tolerance that nearly always or always >>>> identifies numerically rank deficient matrices. ?I suppose one way of >>>> looking at whether the tolerance is in the right range is to compare >>>> the calculated tolerance (``tol``) to the minimum singular value >>>> (``S.min()``) because S.min() in our case should be very small and >>>> indicate the rank deficiency. The mean value of tol / S.min() for the >>>> current algorithm, across many iterations, is about 2.8. ?We might >>>> hope this value would be higher than 1, but not much higher, otherwise >>>> we might be rejecting too many columns. >>>> >>>> Our current algorithm for tolerance is the same as the 2-norm of M * >>>> eps. ?We're citing Golub and Van Loan for this, but now I look at our >>>> copy (p 261, last para) - they seem to be suggesting using u * |M| >>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub >>>> and Van Loan suggestion corresponds to: >>>> >>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >>>> >>>> This tolerance gives full rank for these rank-deficient matrices in >>>> about 39 percent of cases (tol / S.min() ratio of 1.7) >>>> >>>> We see on p 56 (section 2.3.2) that: >>>> >>>> m, n = M.shape >>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >>>> >>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: >>>> >>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >>>> >>>> gives about 0.5 percent error (tol / S.min() of 4.4) >>>> >>>> Using the Mathworks threshold [2]: >>>> >>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>>> >>>> There are no false negatives (0 percent rank 10), but tol / S.min() is >>>> around 110 - so conservative, in this case. >>>> >>>> So - summary - I'm worrying our current threshold is too small, >>>> letting through many rank-deficient matrices without detection. ?I may >>>> have misread Golub and Van Loan, but maybe we aren't doing what they >>>> suggest. ?Maybe what we could use is either the MATLAB threshold or >>>> something like: >>>> >>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >>>> >>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This >>>> gives 0 percent misses and tol / S.min() of 8.7. >>>> >>>> What do y'all think? >>>> >>>> Best, >>>> >>>> Matthew >>>> >>>> [1] >>>> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >>>> >>>> Output from script: >>>> >>>> Percent undetected current: 9.8, tol / S.min(): 2.762 >>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 >>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >>>> >>>> >>> >>> >>> The polynomial fitting uses eps times the largest array dimension for the >>> relative condition number. IIRC, that choice traces back to numerical >>> recipes. > > Chuck - sorry - I didn't understand what you were saying, and now I > think you were proposing the MATLAB algorithm. ? I can't find that in > Numerical Recipes - can you? ?It would be helpful as a reference. > >> This is the same as Matlab, right? 
> > Yes, I believe so, i.e: > > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > > from my original email. > >> If the Matlab condition is the most conservative, then it seems like a >> reasonable choice -- conservative is good so long as your false >> positive rate doesn't become to high, and presumably Matlab has enough >> user experience to know whether the false positive rate is too high. > > Are we agreeing to go for the Matlab algorithm? > > If so, how should this be managed? ?Just changing it may change the > output of code using numpy >= 1.5.0, but then again, the threshold is > probably incorrect. > > Fix and break or FutureWarning with something like: > > def matrix_rank(M, tol=None): > > where ``tol`` can be a string like ``maxdim``? I dunno, I don't think we should do a big deprecation dance for every bug fix. Is this a bug fix, so numpy will simply start producing more accurate results on a given problem? I guess there isn't really a right answer here (though claiming that [a, b, a+b] is full-rank is clearly broken, and the matlab algorithm seems reasonable for answering the specific question of whether a matrix is full rank), so we'll have to hope some users speak up... -N From njs at pobox.com Sat Jun 16 16:39:47 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 16 Jun 2012 21:39:47 +0100 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch In-Reply-To: References: Message-ID: On Thu, Jun 14, 2012 at 5:20 PM, David Cournapeau wrote: > > > On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith wrote: >> >> On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith wrote: >> > Just submitted this pull request for discussion: >> > ?https://github.com/numpy/numpy/pull/297 >> > >> > As per earlier discussion on the list, this PR attempts to remove >> > exactly and only the maskna-related code from numpy mainline: >> > ?http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html >> > >> > The suggestion is that we merge this to master for the 1.7 release, >> > and immediately "git revert" it on a branch so that it can be modified >> > further without blocking the release. >> > >> > The first patch does the actual maskna removal; the second and third >> > rearrange things so that PyArray_ReduceWrapper does not end up in the >> > public API, for reasons described therein. >> > >> > All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit >> > Ubuntu. The docs also appear to build. Before I re-based this I also >> > tested against Scipy, matplotlib, and pandas, and all were fine. >> >> While it's tempting to think that the lack of response to this >> email/PR indicates that everyone now agrees with me about how to >> proceed with the NA work, I'm for some reason unconvinced... >> >> Any objections to merging this? > > > No objection, but could you wait for this WE ? I am in the middle of setting > up a buildbot for windows for numpy (for both mingw and MSVC compilers), and > that would be a good way to test it. Sounds like we have consensus and the patch is good to go, so let me know when you're ready... From travis at continuum.io Sat Jun 16 19:39:15 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 16 Jun 2012 18:39:15 -0500 Subject: [Numpy-discussion] travis-ci support for numpy In-Reply-To: References: Message-ID: <61C7543F-84D5-46FD-9DF1-9BDC7DBDC4E6@continuum.io> This is definitely in-line with the purpose of the foundation and so we'll make the donation. Thanks, for letting us know. 
Best, -Travis On Jun 16, 2012, at 3:16 PM, Nathaniel Smith wrote: > Thanks to Marc Abramowitz[1], Numpy commits are now being tested by Travis-CI: > http://travis-ci.org/#!/numpy/numpy > As discussed on the numfocus list[2], this isn't really a complete CI > solution, because it only gives test coverage on 64-bit Ubuntu. But, > it does cover all supported versions of Python, which should at least > catch the very common Python 2.4 compatibility bugs, so that's useful > until someone gets a proper buildbot set up. And what will be useful > even then is that travis-ci can automatically test all pull requests > and report the results in the PR discussion directly[3], sort of like > Fernando's script, but without human intervention. > > To get this turned on, someone needs to: > -- donate some arbitrarily small (or large) amount: > https://love.travis-ci.org/ > -- email support at travis-ci.org and say "Hi, I donated and here is my > order # to confirm; could you please turn on PR testing for the > numpy/numpy repository, which was connected to travis by user > 'njsmith'" > > I nominate Travis for the "someone", both on grounds of name collision > amusingness and on grounds of him having a budget for such things :-). > (Though literally, a $1 donation would seem to suffice from what the > developers have said.) > > -n > > [1] https://github.com/numpy/numpy/pull/292 > [2] https://groups.google.com/d/msg/numfocus/I_kmL4FUGaY/f2tO7IEU-l4J > [3] http://about.travis-ci.org/blog/announcing-pull-request-support/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Sun Jun 17 01:06:44 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 17 Jun 2012 01:06:44 -0400 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: On Sat, Jun 16, 2012 at 4:39 PM, Nathaniel Smith wrote: > On Sat, Jun 16, 2012 at 9:03 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: >>> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >>>>> matrices that are numerically rank deficient: >>>>> >>>>> If I repeatedly make random matrices, then set the first column to be >>>>> equal to the sum of the second and third columns: >>>>> >>>>> def make_deficient(): >>>>> ? ?X = np.random.normal(size=(40, 10)) >>>>> ? ?deficient_X = X.copy() >>>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >>>>> ? ?return deficient_X >>>>> >>>>> then the current numpy.linalg.matrix_rank algorithm returns full rank >>>>> (10) in about 8 percent of cases (see appended script). >>>>> >>>>> I think this is a tolerance problem. ?The ``matrix_rank`` algorithm >>>>> does this by default: >>>>> >>>>> S = spl.svd(M, compute_uv=False) >>>>> tol = S.max() * np.finfo(S.dtype).eps >>>>> return np.sum(S > tol) >>>>> >>>>> I guess we'd we want the lowest tolerance that nearly always or always >>>>> identifies numerically rank deficient matrices. 
?I suppose one way of >>>>> looking at whether the tolerance is in the right range is to compare >>>>> the calculated tolerance (``tol``) to the minimum singular value >>>>> (``S.min()``) because S.min() in our case should be very small and >>>>> indicate the rank deficiency. The mean value of tol / S.min() for the >>>>> current algorithm, across many iterations, is about 2.8. ?We might >>>>> hope this value would be higher than 1, but not much higher, otherwise >>>>> we might be rejecting too many columns. >>>>> >>>>> Our current algorithm for tolerance is the same as the 2-norm of M * >>>>> eps. ?We're citing Golub and Van Loan for this, but now I look at our >>>>> copy (p 261, last para) - they seem to be suggesting using u * |M| >>>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub >>>>> and Van Loan suggestion corresponds to: >>>>> >>>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >>>>> >>>>> This tolerance gives full rank for these rank-deficient matrices in >>>>> about 39 percent of cases (tol / S.min() ratio of 1.7) >>>>> >>>>> We see on p 56 (section 2.3.2) that: >>>>> >>>>> m, n = M.shape >>>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >>>>> >>>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: >>>>> >>>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >>>>> >>>>> gives about 0.5 percent error (tol / S.min() of 4.4) >>>>> >>>>> Using the Mathworks threshold [2]: >>>>> >>>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>>>> >>>>> There are no false negatives (0 percent rank 10), but tol / S.min() is >>>>> around 110 - so conservative, in this case. >>>>> >>>>> So - summary - I'm worrying our current threshold is too small, >>>>> letting through many rank-deficient matrices without detection. ?I may >>>>> have misread Golub and Van Loan, but maybe we aren't doing what they >>>>> suggest. ?Maybe what we could use is either the MATLAB threshold or >>>>> something like: >>>>> >>>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >>>>> >>>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This >>>>> gives 0 percent misses and tol / S.min() of 8.7. >>>>> >>>>> What do y'all think? >>>>> >>>>> Best, >>>>> >>>>> Matthew >>>>> >>>>> [1] >>>>> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >>>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >>>>> >>>>> Output from script: >>>>> >>>>> Percent undetected current: 9.8, tol / S.min(): 2.762 >>>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >>>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >>>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 >>>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >>>>> >>>>> >>>> >>>> >>>> The polynomial fitting uses eps times the largest array dimension for the >>>> relative condition number. IIRC, that choice traces back to numerical >>>> recipes. >> >> Chuck - sorry - I didn't understand what you were saying, and now I >> think you were proposing the MATLAB algorithm. ? I can't find that in >> Numerical Recipes - can you? ?It would be helpful as a reference. >> >>> This is the same as Matlab, right? >> >> Yes, I believe so, i.e: >> >> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >> >> from my original email. 
>> >>> If the Matlab condition is the most conservative, then it seems like a >>> reasonable choice -- conservative is good so long as your false >>> positive rate doesn't become to high, and presumably Matlab has enough >>> user experience to know whether the false positive rate is too high. >> >> Are we agreeing to go for the Matlab algorithm? >> >> If so, how should this be managed? ?Just changing it may change the >> output of code using numpy >= 1.5.0, but then again, the threshold is >> probably incorrect. >> >> Fix and break or FutureWarning with something like: >> >> def matrix_rank(M, tol=None): >> >> where ``tol`` can be a string like ``maxdim``? > > I dunno, I don't think we should do a big deprecation dance for every > bug fix. Is this a bug fix, so numpy will simply start producing more > accurate results on a given problem? I guess there isn't really a > right answer here (though claiming that [a, b, a+b] is full-rank is > clearly broken, and the matlab algorithm seems reasonable for > answering the specific question of whether a matrix is full rank), so > we'll have to hope some users speak up... I don't see a problem changing this as a bugfix. statsmodels still has, I think, the original scipy.stats.models version for rank which is still much higher for any non-huge array and float, cond=1.0e-12. Josef > > -N > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sun Jun 17 03:49:02 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 17 Jun 2012 00:49:02 -0700 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, On Sat, Jun 16, 2012 at 1:33 PM, Matthew Brett wrote: > Hi, > > On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: >>> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >>>>> matrices that are numerically rank deficient: >>>>> >>>>> If I repeatedly make random matrices, then set the first column to be >>>>> equal to the sum of the second and third columns: >>>>> >>>>> def make_deficient(): >>>>> ? ?X = np.random.normal(size=(40, 10)) >>>>> ? ?deficient_X = X.copy() >>>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >>>>> ? ?return deficient_X >>>>> >>>>> then the current numpy.linalg.matrix_rank algorithm returns full rank >>>>> (10) in about 8 percent of cases (see appended script). >>>>> >>>>> I think this is a tolerance problem. ?The ``matrix_rank`` algorithm >>>>> does this by default: >>>>> >>>>> S = spl.svd(M, compute_uv=False) >>>>> tol = S.max() * np.finfo(S.dtype).eps >>>>> return np.sum(S > tol) >>>>> >>>>> I guess we'd we want the lowest tolerance that nearly always or always >>>>> identifies numerically rank deficient matrices. ?I suppose one way of >>>>> looking at whether the tolerance is in the right range is to compare >>>>> the calculated tolerance (``tol``) to the minimum singular value >>>>> (``S.min()``) because S.min() in our case should be very small and >>>>> indicate the rank deficiency. The mean value of tol / S.min() for the >>>>> current algorithm, across many iterations, is about 2.8. 
?We might >>>>> hope this value would be higher than 1, but not much higher, otherwise >>>>> we might be rejecting too many columns. >>>>> >>>>> Our current algorithm for tolerance is the same as the 2-norm of M * >>>>> eps. ?We're citing Golub and Van Loan for this, but now I look at our >>>>> copy (p 261, last para) - they seem to be suggesting using u * |M| >>>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub >>>>> and Van Loan suggestion corresponds to: >>>>> >>>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >>>>> >>>>> This tolerance gives full rank for these rank-deficient matrices in >>>>> about 39 percent of cases (tol / S.min() ratio of 1.7) >>>>> >>>>> We see on p 56 (section 2.3.2) that: >>>>> >>>>> m, n = M.shape >>>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >>>>> >>>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: >>>>> >>>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >>>>> >>>>> gives about 0.5 percent error (tol / S.min() of 4.4) >>>>> >>>>> Using the Mathworks threshold [2]: >>>>> >>>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>>>> >>>>> There are no false negatives (0 percent rank 10), but tol / S.min() is >>>>> around 110 - so conservative, in this case. >>>>> >>>>> So - summary - I'm worrying our current threshold is too small, >>>>> letting through many rank-deficient matrices without detection. ?I may >>>>> have misread Golub and Van Loan, but maybe we aren't doing what they >>>>> suggest. ?Maybe what we could use is either the MATLAB threshold or >>>>> something like: >>>>> >>>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >>>>> >>>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This >>>>> gives 0 percent misses and tol / S.min() of 8.7. >>>>> >>>>> What do y'all think? >>>>> >>>>> Best, >>>>> >>>>> Matthew >>>>> >>>>> [1] >>>>> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >>>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >>>>> >>>>> Output from script: >>>>> >>>>> Percent undetected current: 9.8, tol / S.min(): 2.762 >>>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >>>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >>>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 >>>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >>>>> >>>>> >>>> >>>> >>>> The polynomial fitting uses eps times the largest array dimension for the >>>> relative condition number. IIRC, that choice traces back to numerical >>>> recipes. >> >> Chuck - sorry - I didn't understand what you were saying, and now I >> think you were proposing the MATLAB algorithm. ? I can't find that in >> Numerical Recipes - can you? ?It would be helpful as a reference. >> >>> This is the same as Matlab, right? >> >> Yes, I believe so, i.e: >> >> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >> >> from my original email. >> >>> If the Matlab condition is the most conservative, then it seems like a >>> reasonable choice -- conservative is good so long as your false >>> positive rate doesn't become to high, and presumably Matlab has enough >>> user experience to know whether the false positive rate is too high. >> >> Are we agreeing to go for the Matlab algorithm? > > As extra data, current Numerical Recipes (2007, p 67) appears to prefer: > > tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) 
To add extra confusing flames to this fire, a survey of random matrices of sizes M = (3, 5, 10, 50, 100, 500), N = (3, 5, 10, 50, 100, 500) in all combinations suggests that this last Numerical Recipes algorithm does not give false negatives, and the threshold is generally considerably lower than the MATLAB algorithm's. At least for this platform (linux 64 bit), and with only one rank deficient column / row. My feeling is still that the risk of using the MATLAB version is less, and the risk of too many false positives is relatively small. If anyone disagrees, it might be worth running the test rig for other parameters and platforms,

Best,

Matthew

import numpy as np
import numpy.linalg as npl

algs = (('current', lambda M, S, eps2: S.max() * eps2),
        ('inf norm', lambda M, S, eps2: npl.norm(M, np.inf) * eps2 / 2),
        ('ub inf norm', lambda M, S, eps2:
         S.max() * eps2 / 2 * np.sqrt(M.shape[1])),
        ('ub inf norm * 2', lambda M, S, eps2:
         S.max() * eps2 * np.sqrt(M.shape[1])),
        ('NR', lambda M, S, eps2:
         S.max() * eps2 / 2 * np.sqrt(sum(M.shape + (1,)))),
        ('MATLAB', lambda M, S, eps2: S.max() * eps2 * max(M.shape)),
       )

def make_deficient(M, N, loc=0, scale=1):
    # Draw from the requested normal distribution (loc and scale were
    # previously ignored here)
    X = np.random.normal(loc, scale, size=(M, N))
    deficient_X = X.copy()
    if M > N: # Make a column deficient
        deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2]
    else: # Make a row deficient
        deficient_X[0] = deficient_X[1] + deficient_X[2]
    return deficient_X

def doit(algs=algs):
    results = {}
    n_iters = 1000
    n_algs = len(algs)
    pcnt_div = n_iters / 100. # divisor that gives percentages, not fractions
    tols = np.zeros((n_iters, n_algs))
    ranks = np.zeros((n_iters, n_algs))
    eps2 = np.finfo(float).eps
    for M in (3, 5, 10, 50, 100, 500):
        for N in (3, 5, 10, 50, 100, 500):
            max_rank = min(M, N)
            svs = np.zeros((n_iters, max_rank))
            for loc in (0, 100, 1000):
                for scale in (1, 100, 1000, 10000):
                    for i in range(n_iters):
                        m = make_deficient(M, N, loc, scale)
                        # The SVD tolerances
                        S = npl.svd(m, compute_uv=False)
                        svs[i] = np.sort(S)
                        for j, alg in enumerate(algs):
                            name, func = alg
                            tols[i, j] = func(m, S, eps2)
                            ranks[i, j] = np.sum(S > tols[i, j])
                        del m, S
                    rel_tols = tols / svs[:, 0][:, None]
                    key = (M, N, loc, scale)
                    print key
                    # Percent of iterations reported as full rank, and mean
                    # tol / S.min()
                    pcnts = np.sum(ranks == max_rank, axis=0) / pcnt_div
                    mrtols = np.mean(rel_tols, axis=0)
                    results[key] = (pcnts, mrtols)
    return results

From njs at pobox.com  Sun Jun 17 06:10:10 2012
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 17 Jun 2012 11:10:10 +0100
Subject: [Numpy-discussion] Enum/Factor NEP (now with code)
In-Reply-To: 
References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> 
Message-ID: 

On Wed, Jun 13, 2012 at 7:54 PM, Wes McKinney  wrote:
> It looks like the levels can only be strings. This is too limited for
> my needs. Why not support all possible NumPy dtypes? In pandas world,
> the levels can be any unique Index object

It seems like there are three obvious options, from most to least general:

1) Allow levels to be an arbitrary collection of hashable Python objects
2) Allow levels to be a homogenous collection of objects of any arbitrary numpy dtype
3) Allow levels to be chosen from a few fixed types (strings and ints, I guess)

I agree that (3) is a bit limiting. (1) is probably easier to implement than (2). (2) is the most general, since of course "arbitrary Python object" is a dtype. Is it useful to be able to restrict levels to be of homogenous type? The main difference between dtypes and python types is that (most) dtype scalars can be unboxed -- is that substantively useful for levels?

> What is the story for NA values (NaL?) in a factor array?
I code them > as -1 in the labels, though you could use INT32_MAX or something. This > is very important in the context of groupby operations. If we have a type restriction on levels (options (2) or (3) above), then how to handle out-of-bounds values is quite a problem, yeah. Once we have NA dtypes then I suppose we could use those, but we don't yet. It's tempting to just error out of any operation that encounters such values. > Nathaniel: my experience (see blog posting above for a bit more) is > that khash really crushes PyDict for two reasons: you can use it with > primitive types and avoid boxing, and secondly you can preallocate. > Its memory footprint with large hashtables is also a fraction of > PyDict. The Python memory allocator is not problematic-- if you create > millions of Python objects expect the RAM usage of the Python process > to balloon absurdly. Right, I saw that posting -- it's clear that khash has a lot of advantages as internal temporary storage for a specific operation like groupby on unboxed types. But I can't tell whether those arguments still apply now that we're talking about a long-term storage representation for data that has to support a variety of operations (many of which would require boxing/unboxing, since the API is in Python), might or might not use boxed types, etc. Obviously this also depends on which of the three options above we go with -- unboxing doesn't even make sense for option (1). -n From wesmckinn at gmail.com Sun Jun 17 16:04:17 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 17 Jun 2012 16:04:17 -0400 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> Message-ID: On Sun, Jun 17, 2012 at 6:10 AM, Nathaniel Smith wrote: > On Wed, Jun 13, 2012 at 7:54 PM, Wes McKinney wrote: >> It looks like the levels can only be strings. This is too limited for >> my needs. Why not support all possible NumPy dtypes? In pandas world, >> the levels can be any unique Index object > > It seems like there are three obvious options, from most to least general: > > 1) Allow levels to be an arbitrary collection of hashable Python objects > 2) Allow levels to be a homogenous collection of objects of any > arbitrary numpy dtype > 3) Allow levels to be chosen a few fixed types (strings and ints, I guess) > > I agree that (3) is a bit limiting. (1) is probably easier to > implement than (2). (2) is the most general, since of course > "arbitrary Python object" is a dtype. Is it useful to be able to > restrict levels to be of homogenous type? The main difference between > dtypes and python types is that (most) dtype scalars can be unboxed -- > is that substantively useful for levels? > >> What is the story for NA values (NaL?) in a factor array? I code them >> as -1 in the labels, though you could use INT32_MAX or something. This >> is very important in the context of groupby operations. > > If we have a type restriction on levels (options (2) or (3) above), > then how to handle out-of-bounds values is quite a problem, yeah. Once > we have NA dtypes then I suppose we could use those, but we don't yet. > It's tempting to just error out of any operation that encounters such > values. > >> Nathaniel: my experience (see blog posting above for a bit more) is >> that khash really crushes PyDict for two reasons: you can use it with >> primitive types and avoid boxing, and secondly you can preallocate. 
>> Its memory footprint with large hashtables is also a fraction of >> PyDict. The Python memory allocator is not problematic-- if you create >> millions of Python objects expect the RAM usage of the Python process >> to balloon absurdly. > > Right, I saw that posting -- it's clear that khash has a lot of > advantages as internal temporary storage for a specific operation like > groupby on unboxed types. But I can't tell whether those arguments > still apply now that we're talking about a long-term storage > representation for data that has to support a variety of operations > (many of which would require boxing/unboxing, since the API is in > Python), might or might not use boxed types, etc. Obviously this also > depends on which of the three options above we go with -- unboxing > doesn't even make sense for option (1). > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I'm in favor of option #2 (a lite version of what I'm doing currently-- I handle a few dtypes (PyObject, int64, datetime64, float64), though you'd have to go the code-generation route for all the dtypes to keep yourself sane if you do that. - Wes From cournape at gmail.com Sun Jun 17 16:37:30 2012 From: cournape at gmail.com (David Cournapeau) Date: Sun, 17 Jun 2012 21:37:30 +0100 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch In-Reply-To: References: Message-ID: On Sat, Jun 16, 2012 at 9:39 PM, Nathaniel Smith wrote: > On Thu, Jun 14, 2012 at 5:20 PM, David Cournapeau > wrote: > > > > > > On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith wrote: > >> > >> On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith wrote: > >> > Just submitted this pull request for discussion: > >> > https://github.com/numpy/numpy/pull/297 > >> > > >> > As per earlier discussion on the list, this PR attempts to remove > >> > exactly and only the maskna-related code from numpy mainline: > >> > > http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html > >> > > >> > The suggestion is that we merge this to master for the 1.7 release, > >> > and immediately "git revert" it on a branch so that it can be modified > >> > further without blocking the release. > >> > > >> > The first patch does the actual maskna removal; the second and third > >> > rearrange things so that PyArray_ReduceWrapper does not end up in the > >> > public API, for reasons described therein. > >> > > >> > All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit > >> > Ubuntu. The docs also appear to build. Before I re-based this I also > >> > tested against Scipy, matplotlib, and pandas, and all were fine. > >> > >> While it's tempting to think that the lack of response to this > >> email/PR indicates that everyone now agrees with me about how to > >> proceed with the NA work, I'm for some reason unconvinced... > >> > >> Any objections to merging this? > > > > > > No objection, but could you wait for this WE ? I am in the middle of > setting > > up a buildbot for windows for numpy (for both mingw and MSVC compilers), > and > > that would be a good way to test it. > > Sounds like we have consensus and the patch is good to go, so let me > know when you're ready... 
> Setting up the windows builbot is even more of a pain than I expected :( In the end, I just tested your branch with MSVC for python 2.7 (32 bits), and got the following errors related to NA: ====================================================================== ERROR: test_numeric.TestIsclose.test_masked_arrays ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Python27\lib\site-packages\nose-1.1.2-py2.7.egg\nose\case.py", line 197, in runTest self.test(*self.arg) File "C:\Users\david\tmp\numpy-git\numpy\core\tests\test_numeric.py", line 1274, in test_masked_arrays assert_(type(x) == type(isclose(inf, x))) File "C:\Users\david\tmp\numpy-git\numpy\core\numeric.py", line 2073, in isclose cond[~finite] = (x[~finite] == y[~finite]) File "C:\Users\david\tmp\numpy-git\numpy\ma\core.py", line 3579, in __eq__ check = ndarray.__eq__(self.filled(0), odata).view(type(self)) AttributeError: 'NotImplementedType' object has no attribute 'view' ====================================================================== ERROR: Test a special case for var ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Users\david\tmp\numpy-git\numpy\ma\tests\test_core.py", line 2735, in test_varstd_specialcases _ = method(out=nout) File "C:\Users\david\tmp\numpy-git\numpy\ma\core.py", line 4778, in std dvar = sqrt(dvar) File "C:\Users\david\tmp\numpy-git\numpy\ma\core.py", line 849, in __call__ m |= self.domain(d) File "C:\Users\david\tmp\numpy-git\numpy\ma\core.py", line 801, in __call__ return umath.less(x, self.critical_value) RuntimeWarning: invalid value encountered in less David -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Jun 17 17:19:59 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 17 Jun 2012 22:19:59 +0100 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> Message-ID: On Sun, Jun 17, 2012 at 9:04 PM, Wes McKinney wrote: > On Sun, Jun 17, 2012 at 6:10 AM, Nathaniel Smith wrote: >> On Wed, Jun 13, 2012 at 7:54 PM, Wes McKinney wrote: >>> It looks like the levels can only be strings. This is too limited for >>> my needs. Why not support all possible NumPy dtypes? In pandas world, >>> the levels can be any unique Index object >> >> It seems like there are three obvious options, from most to least general: >> >> 1) Allow levels to be an arbitrary collection of hashable Python objects >> 2) Allow levels to be a homogenous collection of objects of any >> arbitrary numpy dtype >> 3) Allow levels to be chosen a few fixed types (strings and ints, I guess) >> >> I agree that (3) is a bit limiting. (1) is probably easier to >> implement than (2). (2) is the most general, since of course >> "arbitrary Python object" is a dtype. Is it useful to be able to >> restrict levels to be of homogenous type? The main difference between >> dtypes and python types is that (most) dtype scalars can be unboxed -- >> is that substantively useful for levels? [...] > I'm in favor of option #2 (a lite version of what I'm doing > currently-- I handle a few dtypes (PyObject, int64, datetime64, > float64), though you'd have to go the code-generation route for all > the dtypes to keep yourself sane if you do that. Why would you do code generation? dtype's already expose a generic API for doing boxing/unboxing/etc. 
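At the Python level, at any rate, option (2) can be handled generically without per-dtype code -- something along these lines, where encode_with_levels is only an illustrative sketch and not part of any proposal:

import numpy as np

def encode_with_levels(values, levels):
    # levels: unique values of any one numpy dtype; sorted so the integer
    # codes do not depend on the order the levels were supplied in
    levels = np.sort(np.asarray(levels))
    values = np.asarray(values, dtype=levels.dtype)
    codes = np.searchsorted(levels, values)
    codes = np.clip(codes, 0, len(levels) - 1)
    # -1 marks values outside the level set (cf. the NA discussion above)
    codes[levels[codes] != values] = -1
    return codes, levels

codes, levels = encode_with_levels(['red', 'green', 'red', 'blue'],
                                   ['blue', 'green', 'red'])
# codes == [2, 1, 2, 0]

The same few lines work unchanged for ints, floats, datetimes, or strings, which is rather the point.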
Are you thinking this would just be too slow or...? -N From njs at pobox.com Sun Jun 17 17:52:55 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 17 Jun 2012 22:52:55 +0100 Subject: [Numpy-discussion] Pull request: Split maskna support out of mainline into a branch In-Reply-To: References: Message-ID: On Jun 17, 2012 9:37 PM, "David Cournapeau" wrote: > > > > On Sat, Jun 16, 2012 at 9:39 PM, Nathaniel Smith wrote: >> >> On Thu, Jun 14, 2012 at 5:20 PM, David Cournapeau wrote: >> > >> > >> > On Thu, Jun 14, 2012 at 5:17 PM, Nathaniel Smith wrote: >> >> >> >> On Wed, Jun 6, 2012 at 11:08 PM, Nathaniel Smith wrote: >> >> > Just submitted this pull request for discussion: >> >> > https://github.com/numpy/numpy/pull/297 >> >> > >> >> > As per earlier discussion on the list, this PR attempts to remove >> >> > exactly and only the maskna-related code from numpy mainline: >> >> > http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062417.html >> >> > >> >> > The suggestion is that we merge this to master for the 1.7 release, >> >> > and immediately "git revert" it on a branch so that it can be modified >> >> > further without blocking the release. >> >> > >> >> > The first patch does the actual maskna removal; the second and third >> >> > rearrange things so that PyArray_ReduceWrapper does not end up in the >> >> > public API, for reasons described therein. >> >> > >> >> > All tests pass with Python 2.4, 2.5, 2.6, 2.7, 3.1, 3.2 on 64-bit >> >> > Ubuntu. The docs also appear to build. Before I re-based this I also >> >> > tested against Scipy, matplotlib, and pandas, and all were fine. >> >> >> >> While it's tempting to think that the lack of response to this >> >> email/PR indicates that everyone now agrees with me about how to >> >> proceed with the NA work, I'm for some reason unconvinced... >> >> >> >> Any objections to merging this? >> > >> > >> > No objection, but could you wait for this WE ? I am in the middle of setting >> > up a buildbot for windows for numpy (for both mingw and MSVC compilers), and >> > that would be a good way to test it. >> >> Sounds like we have consensus and the patch is good to go, so let me >> know when you're ready... 
> > > Setting up the windows builbot is even more of a pain than I expected :( > > In the end, I just tested your branch with MSVC for python 2.7 (32 bits), and got the following errors related to NA: > > ====================================================================== > ERROR: test_numeric.TestIsclose.test_masked_arrays > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python27\lib\site-packages\nose-1.1.2-py2.7.egg\nose\case.py", line 197, in runTest > self.test(*self.arg) > File "C:\Users\david\tmp\numpy-git\numpy\core\tests\test_numeric.py", line 1274, in test_masked_arrays > assert_(type(x) == type(isclose(inf, x))) > File "C:\Users\david\tmp\numpy-git\numpy\core\numeric.py", line 2073, in isclose > cond[~finite] = (x[~finite] == y[~finite]) > File "C:\Users\david\tmp\numpy-git\numpy\ma\core.py", line 3579, in __eq__ > check = ndarray.__eq__(self.filled(0), odata).view(type(self)) > AttributeError: 'NotImplementedType' object has no attribute 'view' > > ====================================================================== > ERROR: Test a special case for var > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Users\david\tmp\numpy-git\numpy\ma\tests\test_core.py", line 2735, in test_varstd_specialcases > _ = method(out=nout) > File "C:\Users\david\tmp\numpy-git\numpy\ma\core.py", line 4778, in std > dvar = sqrt(dvar) > File "C:\Users\david\tmp\numpy-git\numpy\ma\core.py", line 849, in __call__ > m |= self.domain(d) > File "C:\Users\david\tmp\numpy-git\numpy\ma\core.py", line 801, in __call__ > return umath.less(x, self.critical_value) > RuntimeWarning: invalid value encountered in less > Oh man wtf. Before I start trying to debug these with my mind, could you confirm real quick that you don't see these with master? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Jun 17 18:06:14 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 17 Jun 2012 23:06:14 +0100 Subject: [Numpy-discussion] Enum/Factor NEP (now with code) In-Reply-To: <4FD90EE5.7030601@continuum.io> References: <4FD7B440.8030500@continuum.io> <4FD8C35E.1080901@continuum.io> <4FD90EE5.7030601@continuum.io> Message-ID: On Wed, Jun 13, 2012 at 11:06 PM, Bryan Van de Ven wrote: > On 6/13/12 1:12 PM, Nathaniel Smith wrote: >> Yes, of course we *could* write the code to implement these "open" >> dtypes, and then write the documentation, examples, tutorials, etc. to >> help people work around their limitations. Or, we could just implement >> np.fromfile properly, which would require no workarounds and take less >> code to boot. >> >> [snip] >> So would a proper implementation of np.fromfile that normalized the >> level ordering. > > My understanding of the impetus for the open type was sensitivity to the > performance of having to make two passes over large text datasets. We'll > have to get more feedback from users here and input from Travis, I think. You definitely don't want to make two passes over large text datasets, but that's not required. While reading through the data, you keep a dict mapping levels to integer values, which you assign arbitrarily as new levels are encountered, and an integer array holding the integer value for each line of the file. 
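In code, the whole thing might look like this minimal sketch -- assuming, to keep it short, one level per line, the lines already read into a list, and an invented helper name:

import numpy as np

def read_levels(lines):
    seen = {}                                  # level -> provisional code
    codes = np.empty(len(lines), dtype=np.intp)
    for i, line in enumerate(lines):
        level = line.strip()
        codes[i] = seen.setdefault(level, len(seen))
    # Sort the levels, then remap provisional codes to final ones with a
    # single array lookup
    levels = sorted(seen)
    remap = np.empty(len(levels), dtype=np.intp)
    for new_code, level in enumerate(levels):
        remap[seen[level]] = new_code
    return remap[codes], levels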
Then at the end of the file, you sort the levels, figure out what the proper integer value for each level is, and do a single in-memory pass through your array, swapping each integer value for the new correct integer value. Since your original integer values are assigned densely, you can map the old integers to the new integers using a single array lookup. This is going to be much faster than any text file reader. There may be some rare people who have huge data files, fast storage, a very large number of distinct levels, and don't care about normalizing level order. But I really think the default should be to normalize level ordering, and then once you can do that, it's trivial to add a "don't normalize please" option for anyone who wants it. >>> I think I like "categorical" over "factor" but I am not sure we should >>> ditch "enum". There are two different use cases here: I have a pile of >>> strings (or scalars) that I want to treat as discrete things >>> (categories), and: I have a pile of numbers that I want to give >>> convenient or meaningful names to (enums). This latter case was the >>> motivation for possibly adding "Natural Naming". >> So mention the word "enum" in the documentation, so people looking for >> that will find the categorical data support? :-) > > I'm not sure I follow. So the above discussion was just about what to name things, and I was saying that we don't need to use the word "enum" in the API itself, whatever the design ends up looking like. That said, I am not personally sold on the idea of using these things in enum-like roles. There are already tons of "enum" libraries on PyPI (I linked some of them in the last thread on this), and I don't see how this design could handle all the basic use cases for enums. Flag bits are one of the most common enums, after all, but red|green is just NaL. So I'm +0 on just sticking to categorical data. > Natural Naming seems like a great idea for people > that want something like an actual enum (i.e., a way to avoid magic > numbers). We could even imagine some nice with-hacks: > > ? ? colors = enum(['red', 'green', 'blue') > ? ? with colors: > ? ? ? ? foo.fill(red) > ? ? ? ? bar.fill(blue) FYI you can't really do this with a context manager. This is the closest I managed: https://gist.github.com/2347382 and you'll note that it still requires reaching up the stack and directly rewriting the C fields of a PyFrameObject while it is in the middle of executing... this is surprisingly less horrible than it sounds, but that still leaves a lot of room for horribleness. >>>> I'm disturbed to see you adding special cases to the core ufunc >>>> dispatch machinery for these things. I'm -1 on that. We should clean >>>> up the generic ufunc machinery so that it doesn't need special cases >>>> to handle adding a simple type like this. >>> This could certainly be improved, I agree. >> I don't want to be Mr. Grumpypants here, but I do want to make sure >> we're speaking the same language: what "-1" means is "I consider this >> a show-stopper and will oppose merging any code that does not improve >> on this". (Of course you also always have the option of trying to >> change my mind. Even Mr. Grumpypants can be swayed by logic!) > Well, a few comments. The special case in array_richcompare is due to > the lack of string ufuncs. I think it would be great to have string > ufuncs, but I also think it is a separate concern and outside the scope > of this proposal. 
The special case in arraydescr_typename_get is for the > same reason as datetime special case, the need to access dtype metadata. > I don't think you are really concerned about these two, though? > > That leaves the special case in > PyUFunc_SimpleBinaryComparisonTypeResolver. As I said, I chaffed a bit > when I put that in. On the other hand, having dtypes with this extent of > attached metadata, and potentially dynamic metadata, is unique in NumPy. > It was simple and straightforward to add those few lines of code, and > does not affect performance. How invasive will the changes to core ufunc > machinery be to accommodate a type like this more generally? I took the > easy way because I was new to the numpy codebase and did not feel > confident mucking with the central ufunc code. However, maybe the > dispatch can be accomplished easily with the casting machinery. I am not > so sure, I will have to investigate. ?Of course, I welcome input, > suggestions, and proposals on the best way to improve this. I haven't gone back and looked over all the special cases in detail, but my general point is that ufunc's need to be able to access dtype metadata, and the fact that we're now talking about hard-coding special case workarounds for this for a third dtype is pretty compelling evidence of that. We'd already have full-fledged third-party categorical dtypes if they didn't need special cases in numpy. So I think we should fix the root problem instead of continuing to paper over it. We're not talking about a major re-architecting of numpy or anything. -n From charlesr.harris at gmail.com Sun Jun 17 22:22:26 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 17 Jun 2012 20:22:26 -0600 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett wrote: > Hi, > > On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett > wrote: > > Hi, > > > > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: > >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris > >> wrote: > >>> > >>> > >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett < > matthew.brett at gmail.com> > >>> wrote: > >>>> > >>>> Hi, > >>>> > >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for > >>>> matrices that are numerically rank deficient: > >>>> > >>>> If I repeatedly make random matrices, then set the first column to be > >>>> equal to the sum of the second and third columns: > >>>> > >>>> def make_deficient(): > >>>> X = np.random.normal(size=(40, 10)) > >>>> deficient_X = X.copy() > >>>> deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] > >>>> return deficient_X > >>>> > >>>> then the current numpy.linalg.matrix_rank algorithm returns full rank > >>>> (10) in about 8 percent of cases (see appended script). > >>>> > >>>> I think this is a tolerance problem. The ``matrix_rank`` algorithm > >>>> does this by default: > >>>> > >>>> S = spl.svd(M, compute_uv=False) > >>>> tol = S.max() * np.finfo(S.dtype).eps > >>>> return np.sum(S > tol) > >>>> > >>>> I guess we'd we want the lowest tolerance that nearly always or always > >>>> identifies numerically rank deficient matrices. I suppose one way of > >>>> looking at whether the tolerance is in the right range is to compare > >>>> the calculated tolerance (``tol``) to the minimum singular value > >>>> (``S.min()``) because S.min() in our case should be very small and > >>>> indicate the rank deficiency. 
The mean value of tol / S.min() for the > >>>> current algorithm, across many iterations, is about 2.8. We might > >>>> hope this value would be higher than 1, but not much higher, otherwise > >>>> we might be rejecting too many columns. > >>>> > >>>> Our current algorithm for tolerance is the same as the 2-norm of M * > >>>> eps. We're citing Golub and Van Loan for this, but now I look at our > >>>> copy (p 261, last para) - they seem to be suggesting using u * |M| > >>>> where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think the Golub > >>>> and Van Loan suggestion corresponds to: > >>>> > >>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 > >>>> > >>>> This tolerance gives full rank for these rank-deficient matrices in > >>>> about 39 percent of cases (tol / S.min() ratio of 1.7) > >>>> > >>>> We see on p 56 (section 2.3.2) that: > >>>> > >>>> m, n = M.shape > >>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 > >>>> > >>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). > Setting: > >>>> > >>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) > >>>> > >>>> gives about 0.5 percent error (tol / S.min() of 4.4) > >>>> > >>>> Using the Mathworks threshold [2]: > >>>> > >>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > >>>> > >>>> There are no false negatives (0 percent rank 10), but tol / S.min() is > >>>> around 110 - so conservative, in this case. > >>>> > >>>> So - summary - I'm worrying our current threshold is too small, > >>>> letting through many rank-deficient matrices without detection. I may > >>>> have misread Golub and Van Loan, but maybe we aren't doing what they > >>>> suggest. Maybe what we could use is either the MATLAB threshold or > >>>> something like: > >>>> > >>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) > >>>> > >>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This > >>>> gives 0 percent misses and tol / S.min() of 8.7. > >>>> > >>>> What do y'all think? > >>>> > >>>> Best, > >>>> > >>>> Matthew > >>>> > >>>> [1] > >>>> > http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon > >>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html > >>>> > >>>> Output from script: > >>>> > >>>> Percent undetected current: 9.8, tol / S.min(): 2.762 > >>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 > >>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 > >>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 > >>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 > >>>> > >>>> > >>> > >>> > >>> The polynomial fitting uses eps times the largest array dimension for > the > >>> relative condition number. IIRC, that choice traces back to numerical > >>> recipes. > > > > Chuck - sorry - I didn't understand what you were saying, and now I > > think you were proposing the MATLAB algorithm. I can't find that in > > Numerical Recipes - can you? It would be helpful as a reference. > > > >> This is the same as Matlab, right? > > > > Yes, I believe so, i.e: > > > > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > > > > from my original email. > > > >> If the Matlab condition is the most conservative, then it seems like a > >> reasonable choice -- conservative is good so long as your false > >> positive rate doesn't become to high, and presumably Matlab has enough > >> user experience to know whether the false positive rate is too high. > > > > Are we agreeing to go for the Matlab algorithm? 
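For concreteness, a throwaway comparison of the thresholds under discussion, reusing the make_deficient() example quoted above (a sketch, not the original appended script):

    import numpy as np

    def make_deficient():
        X = np.random.normal(size=(40, 10))
        X[:, 0] = X[:, 1] + X[:, 2]
        return X

    def rank_with(M, tol_func):
        S = np.linalg.svd(M, compute_uv=False)
        return np.sum(S > tol_func(S, M.shape))

    thresholds = {
        'current default': lambda S, shape:
            S.max() * np.finfo(S.dtype).eps,
        'MATLAB max(m, n)': lambda S, shape:
            S.max() * np.finfo(S.dtype).eps * max(shape),
        'NR 2007 sqrt(m + n + 1)': lambda S, shape:
            S.max() * np.finfo(S.dtype).eps / 2. * np.sqrt(shape[0] + shape[1] + 1),
    }

    n_trials = 1000
    for name, tol_func in thresholds.items():
        undetected = sum(rank_with(make_deficient(), tol_func) == 10
                         for _ in range(n_trials))
        print('%-25s %.1f%% reported full rank'
              % (name + ':', 100. * undetected / n_trials))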
> > As extra data, current Numerical Recipes (2007, p 67) appears to prefer: > > tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) > That's interesting, as something like that with a square root was my first choice for the least squares, but then someone mentioned the NR choice. That was all on the mailing list way several years back when I was fixing up the polynomial fitting routine. The NR reference is on page 517 of the 1986 edition (FORTRAN), which might be hard to come by these days ;) > There's a discussion of algorithms in: > > @article{konstantinides1988statistical, > title={Statistical analysis of effective singular values in matrix > rank determination}, > author={Konstantinides, K. and Yao, K.}, > journal={Acoustics, Speech and Signal Processing, IEEE Transactions on}, > volume={36}, > number={5}, > pages={757--763}, > year={1988}, > publisher={IEEE} > } > > Yes, restricted access: > http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1585&tag=1 > > Cheers, > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Mon Jun 18 03:47:14 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Mon, 18 Jun 2012 10:47:14 +0300 Subject: [Numpy-discussion] Preferring gfortran over g77 on OS X and other distributions? Message-ID: f2py, by default, seems to prefer g77 (no longer maintained, deprecated, speedy, doesn't support Fortran 90 or Fortran 95) over gfortran (maintained, slower, Fortran 90 and Fortran 95 support). This causes problems when we try to compile Fortran 90 extensions using f2py on platforms where both g77 and gfortran are installed without manually switching the compiler's flags. It is a very minor edit to the fcompiler/__init__.py file to prefer gfortran over g77 on OS X, and I can think of almost no reason not to do so, since the Vectorize framework (OS X tuned LAPACK/BLAS) appears to be ABI compatible with gfortran. I am not sure what the situation is on the distributions that numpy is trying to support, but my feeling is that g77 should not be preferred when gfortran is available. Regards, Aron -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at gmail.com Mon Jun 18 06:14:34 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Mon, 18 Jun 2012 12:14:34 +0200 Subject: [Numpy-discussion] numpy allocation event hooks Message-ID: Based on some previous discussion on the numpy list [1] and in now-cancelled PRs [2,3], I'd like to solicit opinions on adding an interface for numpy memory allocation event tracking, as implemented in this PR: https://github.com/numpy/numpy/pull/309 A brief summary of the changes: - PyDataMem_NEW/FREE/RENEW become functions in the numpy API. (they used to be macros for malloc/free/realloc) These are the functions used to manage allocations for array's internal data. Most other numpy data is allocated through Python's allocator. - PyDataMem_NEW/RENEW return void* instead of char*. - Adds PyDataMem_SetEventHook() to the API, with this description: * Sets the allocation event hook for numpy array data. * Takes a PyDataMem_EventHookFunc *, which has the signature: * void hook(void *old, void *new, size_t size, void *user_data). * Also takes a void *user_data, and void **old_data. * * Returns a pointer to the previous hook or NULL. If old_data is * non-NULL, the previous user_data pointer will be copied to it. 
* * If not NULL, hook will be called at the end of each PyDataMem_NEW/FREE/RENEW: * result = PyDataMem_NEW(size) -> (*hook)(NULL, result, size, user_data) * PyDataMem_FREE(ptr) -> (*hook)(ptr, NULL, 0, user_data) * result = PyDataMem_RENEW(ptr, size) -> (*hook)(ptr, result, size, user_data) * * When the hook is called, the GIL will be held by the calling * thread. The hook should be written to be reentrant, if it performs * operations that might cause new allocation events (such as the * creation/descruction numpy objects, or creating/destroying Python * objects which might cause a gc) The PR also includes an example using the hook functions to track allocation via Python callback funcions (in tools/allocation_tracking). Why I think this is worth adding to numpy, even though other tools may be able to provide similar functionality: - numpy arrays use orders of magnitude more memory than most python objects, and this is often a limiting factor in algorithms. - numpy can behave in complicated ways with regards to memory management, e.g., views, OWNDATA, temporaries, etc., making it sometimes difficult to know where memory usage problems are happening and why. - numpy attracts a large number of programmers with limited low-level programming expertise, and who don't have the skills to use external tools (or time/motivation to acquire those skills), but still need to be able to diagnose these sorts of problems. - Other tools are not well integrated with Python, and vary a great deal between OS and compiler setup. I appreciate any feedback. Ray Jones [1] http://mail.scipy.org/pipermail/numpy-discussion/2012-May/062373.html [2] (python callbacks) https://github.com/numpy/numpy/pull/284 [3] (C-level logging) https://github.com/numpy/numpy/pull/301 From d.s.seljebotn at astro.uio.no Mon Jun 18 09:46:59 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 18 Jun 2012 15:46:59 +0200 Subject: [Numpy-discussion] numpy allocation event hooks In-Reply-To: References: Message-ID: <4FDF3153.3050808@astro.uio.no> On 06/18/2012 12:14 PM, Thouis (Ray) Jones wrote: > Based on some previous discussion on the numpy list [1] and in > now-cancelled PRs [2,3], I'd like to solicit opinions on adding an > interface for numpy memory allocation event tracking, as implemented > in this PR: > > https://github.com/numpy/numpy/pull/309 > > A brief summary of the changes: > > - PyDataMem_NEW/FREE/RENEW become functions in the numpy API. > (they used to be macros for malloc/free/realloc) > These are the functions used to manage allocations for array's > internal data. Most other numpy data is allocated through Python's > allocator. > > - PyDataMem_NEW/RENEW return void* instead of char*. > > - Adds PyDataMem_SetEventHook() to the API, with this description: > * Sets the allocation event hook for numpy array data. > * Takes a PyDataMem_EventHookFunc *, which has the signature: > * void hook(void *old, void *new, size_t size, void *user_data). > * Also takes a void *user_data, and void **old_data. > * > * Returns a pointer to the previous hook or NULL. If old_data is > * non-NULL, the previous user_data pointer will be copied to it. 
> * > * If not NULL, hook will be called at the end of each PyDataMem_NEW/FREE/RENEW: > * result = PyDataMem_NEW(size) -> (*hook)(NULL, result, > size, user_data) > * PyDataMem_FREE(ptr) -> (*hook)(ptr, NULL, 0, user_data) > * result = PyDataMem_RENEW(ptr, size) -> (*hook)(ptr, result, size, > user_data) > * > * When the hook is called, the GIL will be held by the calling > * thread. The hook should be written to be reentrant, if it performs > * operations that might cause new allocation events (such as the > * creation/descruction numpy objects, or creating/destroying Python > * objects which might cause a gc) > > > The PR also includes an example using the hook functions to track > allocation via Python callback funcions (in > tools/allocation_tracking). > > Why I think this is worth adding to numpy, even though other tools may > be able to provide similar functionality: > > - numpy arrays use orders of magnitude more memory than most python > objects, and this is often a limiting factor in algorithms. > > - numpy can behave in complicated ways with regards to memory > management, e.g., views, OWNDATA, temporaries, etc., making it > sometimes difficult to know where memory usage problems are > happening and why. > > - numpy attracts a large number of programmers with limited low-level > programming expertise, and who don't have the skills to use external > tools (or time/motivation to acquire those skills), but still need > to be able to diagnose these sorts of problems. > > - Other tools are not well integrated with Python, and vary a great > deal between OS and compiler setup. > > I appreciate any feedback. Are the hooks able to change how allocation happens/override allocation? If one goes to this much pain already, I think one might as well go the extra step and allow hooks to override memory allocation. At least something to think about -- of course the above (as I understand it) would be a good start on a pluggable allocator even if it isn't done right away. Examples: - Allocate NumPy arrays in process-shared memory using shmem/mmap - Allocate NumPy arrays on some boundary (16-byte, 4096-byte..) using memalign Dag From thouis at gmail.com Mon Jun 18 09:58:19 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Mon, 18 Jun 2012 15:58:19 +0200 Subject: [Numpy-discussion] numpy allocation event hooks In-Reply-To: <4FDF3153.3050808@astro.uio.no> References: <4FDF3153.3050808@astro.uio.no> Message-ID: On Mon, Jun 18, 2012 at 3:46 PM, Dag Sverre Seljebotn wrote: > On 06/18/2012 12:14 PM, Thouis (Ray) Jones wrote: >> Based on some previous discussion on the numpy list [1] and in >> now-cancelled PRs [2,3], I'd like to solicit opinions on adding an >> interface for numpy memory allocation event tracking, as implemented >> in this PR: >> >> https://github.com/numpy/numpy/pull/309 >> >> A brief summary of the changes: >> >> - PyDataMem_NEW/FREE/RENEW become functions in the numpy API. >> ? ?(they used to be macros for malloc/free/realloc) >> ? ?These are the functions used to manage allocations for array's >> ? ?internal data. ?Most other numpy data is allocated through Python's >> ? ?allocator. >> >> - PyDataMem_NEW/RENEW return void* instead of char*. >> >> - Adds PyDataMem_SetEventHook() to the API, with this description: >> ? * Sets the allocation event hook for numpy array data. >> ? * Takes a PyDataMem_EventHookFunc *, which has the signature: >> ? * ? ? ? ?void hook(void *old, void *new, size_t size, void *user_data). >> ? * ? Also takes a void *user_data, and void **old_data. 
>> ? * >> ? * Returns a pointer to the previous hook or NULL. ?If old_data is >> ? * non-NULL, the previous user_data pointer will be copied to it. >> ? * >> ? * If not NULL, hook will be called at the end of each PyDataMem_NEW/FREE/RENEW: >> ? * ? result = PyDataMem_NEW(size) ? ? ? ?-> ?(*hook)(NULL, result, >> size, user_data) >> ? * ? PyDataMem_FREE(ptr) ? ? ? ? ? ? ? ? -> ?(*hook)(ptr, NULL, 0, user_data) >> ? * ? result = PyDataMem_RENEW(ptr, size) -> ?(*hook)(ptr, result, size, >> user_data) >> ? * >> ? * When the hook is called, the GIL will be held by the calling >> ? * thread. ?The hook should be written to be reentrant, if it performs >> ? * operations that might cause new allocation events (such as the >> ? * creation/descruction numpy objects, or creating/destroying Python >> ? * objects which might cause a gc) >> >> >> The PR also includes an example using the hook functions to track >> allocation via Python callback funcions (in >> tools/allocation_tracking). >> >> Why I think this is worth adding to numpy, even though other tools may >> be able to provide similar functionality: >> >> - numpy arrays use orders of magnitude more memory than most python >> ? ?objects, and this is often a limiting factor in algorithms. >> >> - numpy can behave in complicated ways with regards to memory >> ? ?management, e.g., views, OWNDATA, temporaries, etc., making it >> ? ?sometimes difficult to know where memory usage problems are >> ? ?happening and why. >> >> - numpy attracts a large number of programmers with limited low-level >> ? ?programming expertise, and who don't have the skills to use external >> ? ?tools (or time/motivation to acquire those skills), but still need >> ? ?to be able to diagnose these sorts of problems. >> >> - Other tools are not well integrated with Python, and vary a great >> ? ?deal between OS and compiler setup. >> >> I appreciate any feedback. > > Are the hooks able to change how allocation happens/override allocation? > If one goes to this much pain already, I think one might as well go the > extra step and allow hooks to override memory allocation. > > At least something to think about -- of course the above (as I > understand it) would be a good start on a pluggable allocator even if it > isn't done right away. > > Examples: > > ?- Allocate NumPy arrays in process-shared memory using shmem/mmap > ?- Allocate NumPy arrays on some boundary (16-byte, 4096-byte..) using > memalign That's not present in the current change, but the choice to use "EventHook" rather than the more generic "Hook" was to avoid colliding with a change like that in the future. From bobtnur78 at gmail.com Mon Jun 18 11:55:41 2012 From: bobtnur78 at gmail.com (bob tnur) Date: Mon, 18 Jun 2012 11:55:41 -0400 Subject: [Numpy-discussion] convert any non square matrix in to square matrix using numpy Message-ID: Hi, how I can convert (by adding zero) of any non-square numpy matrix in to square matrix using numpy? then how to find the minimum number in each row except the zeros added(for making square matrix)? ;) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsyu80 at gmail.com Mon Jun 18 13:04:44 2012 From: tsyu80 at gmail.com (Tony Yu) Date: Mon, 18 Jun 2012 13:04:44 -0400 Subject: [Numpy-discussion] convert any non square matrix in to square matrix using numpy In-Reply-To: References: Message-ID: On Mon, Jun 18, 2012 at 11:55 AM, bob tnur wrote: > Hi, > how I can convert (by adding zero) of any non-square numpy matrix in to > square matrix using numpy? 
then how to find the minimum number in each row > except the zeros added(for making square matrix)? ;) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Hi Bob, I'm not quite sure what you're looking for (esp. the second question), but maybe something like the following? #~~~ example code import numpy as np nonsquare = np.random.random(size=(3, 5)) M, N = nonsquare.shape width = max(M, N) square = np.zeros((width, width)) square[:M, :N] = nonsquare min_rows = np.min(nonsquare, axis=1) #~~~ -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Mon Jun 18 13:22:38 2012 From: e.antero.tammi at gmail.com (eat) Date: Mon, 18 Jun 2012 20:22:38 +0300 Subject: [Numpy-discussion] convert any non square matrix in to square matrix using numpy In-Reply-To: References: Message-ID: Hi, On Mon, Jun 18, 2012 at 6:55 PM, bob tnur wrote: > Hi, > how I can convert (by adding zero) of any non-square numpy matrix in to > square matrix using numpy? then how to find the minimum number in each row > except the zeros added(for making square matrix)? ;) > Perhaps something like this: In []: def make_square(A): ..: s= A.shape ..: if s[0]< s[1]: ....: return r_[A, zeros((s[1]- s[0], s[1]), dtype= A.dtype)] ..: return c_[A, zeros((s[0], s[0]- s[1]), dtype= A.dtype)] ..: In []: A= rand(4, 2) In []: make_square(A) Out[]: array([[ 0.76109774, 0.42980812, 0. , 0. ], [ 0.11810978, 0.59622975, 0. , 0. ], [ 0.54991376, 0.29315485, 0. , 0. ], [ 0.78182313, 0.3828001 , 0. , 0. ]]) In []: make_square(A.T) Out[]: array([[ 0.76109774, 0.11810978, 0.54991376, 0.78182313], [ 0.42980812, 0.59622975, 0.29315485, 0.3828001 ], [ 0. , 0. , 0. , 0. ], [ 0. , 0. , 0. , 0. ]]) will help you. My 2 cents, -eat > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Jun 18 18:50:52 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 18 Jun 2012 15:50:52 -0700 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris wrote: > > > On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett >> wrote: >> > Hi, >> > >> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: >> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >> >> wrote: >> >>> >> >>> >> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >> >>> >> >>> wrote: >> >>>> >> >>>> Hi, >> >>>> >> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >> >>>> matrices that are numerically rank deficient: >> >>>> >> >>>> If I repeatedly make random matrices, then set the first column to be >> >>>> equal to the sum of the second and third columns: >> >>>> >> >>>> def make_deficient(): >> >>>> ? ?X = np.random.normal(size=(40, 10)) >> >>>> ? ?deficient_X = X.copy() >> >>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >> >>>> ? ?return deficient_X >> >>>> >> >>>> then the current numpy.linalg.matrix_rank algorithm returns full rank >> >>>> (10) in about 8 percent of cases (see appended script). >> >>>> >> >>>> I think this is a tolerance problem. 
?The ``matrix_rank`` algorithm >> >>>> does this by default: >> >>>> >> >>>> S = spl.svd(M, compute_uv=False) >> >>>> tol = S.max() * np.finfo(S.dtype).eps >> >>>> return np.sum(S > tol) >> >>>> >> >>>> I guess we'd we want the lowest tolerance that nearly always or >> >>>> always >> >>>> identifies numerically rank deficient matrices. ?I suppose one way of >> >>>> looking at whether the tolerance is in the right range is to compare >> >>>> the calculated tolerance (``tol``) to the minimum singular value >> >>>> (``S.min()``) because S.min() in our case should be very small and >> >>>> indicate the rank deficiency. The mean value of tol / S.min() for the >> >>>> current algorithm, across many iterations, is about 2.8. ?We might >> >>>> hope this value would be higher than 1, but not much higher, >> >>>> otherwise >> >>>> we might be rejecting too many columns. >> >>>> >> >>>> Our current algorithm for tolerance is the same as the 2-norm of M * >> >>>> eps. ?We're citing Golub and Van Loan for this, but now I look at our >> >>>> copy (p 261, last para) - they seem to be suggesting using u * |M| >> >>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the >> >>>> Golub >> >>>> and Van Loan suggestion corresponds to: >> >>>> >> >>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >> >>>> >> >>>> This tolerance gives full rank for these rank-deficient matrices in >> >>>> about 39 percent of cases (tol / S.min() ratio of 1.7) >> >>>> >> >>>> We see on p 56 (section 2.3.2) that: >> >>>> >> >>>> m, n = M.shape >> >>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >> >>>> >> >>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). >> >>>> ?Setting: >> >>>> >> >>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >> >>>> >> >>>> gives about 0.5 percent error (tol / S.min() of 4.4) >> >>>> >> >>>> Using the Mathworks threshold [2]: >> >>>> >> >>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >> >>>> >> >>>> There are no false negatives (0 percent rank 10), but tol / S.min() >> >>>> is >> >>>> around 110 - so conservative, in this case. >> >>>> >> >>>> So - summary - I'm worrying our current threshold is too small, >> >>>> letting through many rank-deficient matrices without detection. ?I >> >>>> may >> >>>> have misread Golub and Van Loan, but maybe we aren't doing what they >> >>>> suggest. ?Maybe what we could use is either the MATLAB threshold or >> >>>> something like: >> >>>> >> >>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >> >>>> >> >>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . >> >>>> This >> >>>> gives 0 percent misses and tol / S.min() of 8.7. >> >>>> >> >>>> What do y'all think? >> >>>> >> >>>> Best, >> >>>> >> >>>> Matthew >> >>>> >> >>>> [1] >> >>>> >> >>>> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >> >>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >> >>>> >> >>>> Output from script: >> >>>> >> >>>> Percent undetected current: 9.8, tol / S.min(): 2.762 >> >>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >> >>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >> >>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): >> >>>> 8.734 >> >>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >> >>>> >> >>>> >> >>> >> >>> >> >>> The polynomial fitting uses eps times the largest array dimension for >> >>> the >> >>> relative condition number. IIRC, that choice traces back to numerical >> >>> recipes. 
>> > >> > Chuck - sorry - I didn't understand what you were saying, and now I >> > think you were proposing the MATLAB algorithm. ? I can't find that in >> > Numerical Recipes - can you? ?It would be helpful as a reference. >> > >> >> This is the same as Matlab, right? >> > >> > Yes, I believe so, i.e: >> > >> > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >> > >> > from my original email. >> > >> >> If the Matlab condition is the most conservative, then it seems like a >> >> reasonable choice -- conservative is good so long as your false >> >> positive rate doesn't become to high, and presumably Matlab has enough >> >> user experience to know whether the false positive rate is too high. >> > >> > Are we agreeing to go for the Matlab algorithm? >> >> As extra data, current Numerical Recipes (2007, p 67) appears to prefer: >> >> tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) > > > That's interesting, as something like that with a square root was my first > choice for the least squares, but then someone mentioned the NR choice. That > was all on the mailing list way several years back when I was fixing up the > polynomial fitting routine. The NR reference is on page 517 of the 1986 > edition (FORTRAN), which might be hard to come by these days ;) Thanks for tracking that down, it's very helpful. For those of you not near a huge University library or your own private copy, p517 says: "A plausible answer to the question "how small is small", is to edit in this fashion all singular values whose ratio to the largest singular value is less then N times the machine precision \epsilon. (You might argue for root N, or a constant, instead of N as the multiple; that starts getting into hardware-dependent questions). Earlier (p510) we see the (General Linear Least Squares) problem being set up as A = (N x M) where N >= M. The 2007 edition replaces the "(You might argue... )" text with: (p 795) "(This is a more conservative recommendation than the default in section 2.6 which scales as N^{1/2})" and this in turn refers to the threshold: tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) (p67) - which is justified as being (p71) " ... a default value based on expected roundoff error". I could not at first glance see any other justification for this threshold in the text. So, how about something like: def matrix_rank(M, tol='maxdim'): ... tol: {'maxdim', 'nr-roundoff'} or float If str, gives threshold strategy for tolerance relative to the maximum singular value, explained below. If float gives absolute tolerance below which singular values assumed zero. For the threshold strategies, we will call the maximum singular value``S.max()`` and the floating point epsilon for the working precision data type ``eps``. Default strategy is 'maxdim' corresponding to ``tol = S.max() * eps * max(M.shape)``. This is the MATLAB default; see also Numerical Recipes 2007. Other options are 'nr-roundoff' (also from Numerical Recipes 2007) corresponding to ``tol = S.max() * eps / 2 * np.sqrt(M.shape[0] + M.shape[1] + 1)``. ? Best, Matthew From ndbecker2 at gmail.com Wed Jun 20 10:58:21 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 20 Jun 2012 10:58:21 -0400 Subject: [Numpy-discussion] trivial question? 
Message-ID: Maybe I'm being slow, but is there any convenient function to calculate, for 2 vectors: \sum_i \sum_j x_i y_j (I had a matrix once, but it vanished without a trace) From robert.kern at gmail.com Wed Jun 20 11:01:08 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 20 Jun 2012 16:01:08 +0100 Subject: [Numpy-discussion] trivial question? In-Reply-To: References: Message-ID: On Wed, Jun 20, 2012 at 3:58 PM, Neal Becker wrote: > Maybe I'm being slow, but is there any convenient function to calculate, > for 2 vectors: > > \sum_i \sum_j x_i y_j > > (I had a matrix once, but it vanished without a trace) np.multiply.outer(x, y).sum() -- Robert Kern From charlesr.harris at gmail.com Wed Jun 20 11:57:02 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 Jun 2012 09:57:02 -0600 Subject: [Numpy-discussion] trivial question? In-Reply-To: References: Message-ID: On Wed, Jun 20, 2012 at 8:58 AM, Neal Becker wrote: > Maybe I'm being slow, but is there any convenient function to calculate, > for 2 vectors: > > \sum_i \sum_j x_i y_j > It factors, just do x.sum()*y.sum(). Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Wed Jun 20 16:26:03 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 20 Jun 2012 13:26:03 -0700 Subject: [Numpy-discussion] PyArray_MapIter{Reset, Next, New, Bind} not exposed? Message-ID: Hello, I wanted to create a function that visits elements in an array using the same rules as advanced indexing (with integer and boolean arrays) but does addition instead of assignment (discussed more here http://mail.scipy.org/pipermail/numpy-discussion/2012-June/062687.html). I looked at the code for advanced indexing and tried to copy that, but it uses PyArrayMapIterObject and the functions for using that object (PyArray_MapIterReset, and the equivalent Next,New and Bind functions as well as _swap_axes) are not exposed by the C API as far as I can tell. Is that on purpose? Is there an easy way to gain access to these functions? Should I give up on this approach? (The function I was trying to build is index_inc in here ( https://github.com/jsalvatier/advinc/blob/master/advinc/advinc.c)) Cheers, John -------------- next part -------------- An HTML attachment was scrubbed... URL: From balarsen at lanl.gov Wed Jun 20 16:32:14 2012 From: balarsen at lanl.gov (Larsen, Brian A) Date: Wed, 20 Jun 2012 20:32:14 +0000 Subject: [Numpy-discussion] frompyfunc question Message-ID: Hello all, I was looking to wrap hasattr in a numpy ufunc and got some weird behavior. Here is a transcript: In [5]: import numpy as np In [6]: np.__version__ Out[6]: '1.6.2' In [7]: b = [1,2,3] In [7]: nphas = np.frompyfunc(hasattr, 2, 1) In [8]: hasattr(b, 'extend') Out[8]: True In [9]: nphas(b, 'extend') Out[9]: NotImplemented In [10]: nphas(b, ['extend']*2) Out[10]: NotImplemented In [11]: type(nphas(b, ['extend']*2)) Out[11]: NotImplementedType This isn't an exception but an object returned, what what or why is not implemented? What I really want to do is be able to run hasattr like isinstance is run, isinstance(b, (list, str)) Anyone have any thoughts/advice here? Cheers, Brian -- Brian A. 
Larsen ISR-1 Space Science and Applications Los Alamos National Laboratory PO Box 1663, MS-D466 Los Alamos, NM 87545 USA (For overnight add: SM-30, Bikini Atoll Road) Phone: 505-665-7691 Fax: 505-665-7395 email: balarsen at lanl.gov Correspondence / Technical data or Software Publicly Available -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jun 20 16:48:33 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jun 2012 13:48:33 -0700 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC Message-ID: Hi, Our Debian friends were hammering our code tests before the upcoming freeze, and found the following very odd thing on 32-bit PPC running Debian squeeze and numpy 1.6.2 or current trunk. Consider the following script: Here is the output from some example runs of this script: np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ True False False False False False False False False False] [ True False False False False False False False False False] [ True False False False False False False False False False] (that's the most common result) (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ True True True True True True True True True True] [ True True True True True True True True True True] [ True True True True True True True True True True] (this happens maybe 10% of the time) (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ True True True True True True True True True True] [ True False False False False False False False False False] [ True True True True True True True True True True] (less than 10% of the time - order of True, False prints _of the same comparison_ is random. This only seems to happen with complex256. Is there anything I can do to debug this further? Does anyone want a login to this machine to have a look? See you, Matthew From travis at continuum.io Wed Jun 20 16:56:08 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 20 Jun 2012 15:56:08 -0500 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: This looks like a problem with comparisons of floating point numbers rather than a byteswapping problem per-say. Try to use an almost equal comparison instead. -Travis On Jun 20, 2012, at 3:48 PM, Matthew Brett wrote: > Hi, > > Our Debian friends were hammering our code tests before the upcoming > freeze, and found the following very odd thing on 32-bit PPC running > Debian squeeze and numpy 1.6.2 or current trunk. 
> > Consider the following script: > > > > Here is the output from some example runs of this script: > > np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ True False False False False False False False False False] > [ True False False False False False False False False False] > [ True False False False False False False False False False] > > (that's the most common result) > > (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ True True True True True True True True True True] > [ True True True True True True True True True True] > [ True True True True True True True True True True] > > (this happens maybe 10% of the time) > > (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ True True True True True True True True True True] > [ True False False False False False False False False False] > [ True True True True True True True True True True] > > (less than 10% of the time - order of True, False prints _of the same > comparison_ is random. This only seems to happen with complex256. > > Is there anything I can do to debug this further? Does anyone want a > login to this machine to have a look? > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Wed Jun 20 18:00:12 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jun 2012 15:00:12 -0700 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: Hi, On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant wrote: > This looks like a problem with comparisons of floating point numbers rather than a byteswapping problem per-say. ? Try to use an almost equal comparison instead. Is that right - that the byteswapped versions might not be strictly equal to identical numbers but not byteswapped? 
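The script itself did not survive in the archive. A hypothetical reconstruction of the kind of check being described (the same values stored byteswapped, with the byte order flag flipped, so that on a well-behaved build the comparison is exactly True and the difference exactly zero) might look like:

    import numpy as np

    # assumes a build that actually provides complex256 (16-byte long double)
    arr = np.arange(10, dtype=np.complex256)
    bs_arr = arr.byteswap().newbyteorder()   # same values, opposite byte order
    print(arr)
    print(bs_arr)
    print(arr == bs_arr)    # expected: all True, exactly
    print(arr - bs_arr)     # expected: all zeros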
But I should maybe have been clearer - they also subtract wrongly: (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] (wrong) (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j 7.0+0.0j 8.0+0.0j 9.0+0.0j] [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j] [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j] [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j] (right) See you, Matthew From travis at continuum.io Wed Jun 20 18:04:37 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 20 Jun 2012 17:04:37 -0500 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: <5AFD7E3F-88A8-434C-924C-15F62A431717@continuum.io> That is clearly wrong and worth a bug report. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jun 20, 2012, at 5:00 PM, Matthew Brett wrote: > Hi, > > On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant wrote: >> This looks like a problem with comparisons of floating point numbers rather than a byteswapping problem per-say. Try to use an almost equal comparison instead. > > Is that right - that the byteswapped versions might not be strictly > equal to identical numbers but not byteswapped? 
> > But I should maybe have been clearer - they also subtract wrongly: > > > > (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > > (wrong) > > (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j 0.0+0.0j 0.0+0.0j] > [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j 0.0+0.0j 0.0+0.0j] > [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j 0.0+0.0j 0.0+0.0j] > > (right) > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Wed Jun 20 18:05:05 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 Jun 2012 16:05:05 -0600 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: On Wed, Jun 20, 2012 at 4:00 PM, Matthew Brett wrote: > Hi, > > On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant > wrote: > > This looks like a problem with comparisons of floating point numbers > rather than a byteswapping problem per-say. Try to use an almost equal > comparison instead. > > Is that right - that the byteswapped versions might not be strictly > equal to identical numbers but not byteswapped? > > But I should maybe have been clearer - they also subtract wrongly: > > > > (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > > (wrong) > > (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j 0.0+0.0j 0.0+0.0j] > [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j 0.0+0.0j 0.0+0.0j] > [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j 0.0+0.0j 0.0+0.0j] > > (right) > > See you, > > Long doubles on PPC consist of two doubles, so I expect you need to swap both doubles instead of 16 bytes. Strictly speaking, numpy doesn't support non ieee floats. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Jun 20 18:08:04 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 Jun 2012 16:08:04 -0600 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: On Wed, Jun 20, 2012 at 2:48 PM, Matthew Brett wrote: > Hi, > > Our Debian friends were hammering our code tests before the upcoming > freeze, and found the following very odd thing on 32-bit PPC running > Debian squeeze and numpy 1.6.2 or current trunk. > > Consider the following script: > > > > Here is the output from some example runs of this script: > > np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ True False False False False False False False False False] > [ True False False False False False False False False False] > [ True False False False False False False False False False] > > (that's the most common result) > > (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ True True True True True True True True True True] > [ True True True True True True True True True True] > [ True True True True True True True True True True] > > (this happens maybe 10% of the time) > > (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > 7.0+0.0j 8.0+0.0j 9.0+0.0j] > [ True True True True True True True True True True] > [ True False False False False False False False False False] > [ True True True True True True True True True True] > > (less than 10% of the time - order of True, False prints _of the same > comparison_ is random. This only seems to happen with complex256. > > Is there anything I can do to debug this further? Does anyone want a > login to this machine to have a look? > Try swapped float128 and see what that looks like. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jun 20 18:11:01 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jun 2012 15:11:01 -0700 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: Hi, On Wed, Jun 20, 2012 at 3:05 PM, Charles R Harris wrote: > > > On Wed, Jun 20, 2012 at 4:00 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant >> wrote: >> > This looks like a problem with comparisons of floating point numbers >> > rather than a byteswapping problem per-say. ? Try to use an almost equal >> > comparison instead. >> >> Is that right - that the byteswapped versions might not be strictly >> equal to identical numbers but not byteswapped? 
>> >> But I should maybe have been clearer - they also subtract wrongly: >> >> >> >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> (wrong) >> >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> >> (right) >> >> See you, >> > > Long doubles on PPC consist of two doubles, so I expect you need to swap > both doubles instead of 16 bytes. Strictly speaking, numpy doesn't support > non ieee floats. Well - the byteswapping appears to be correct in that the array is displayed with the correct values, but then, when doing a subtraction on the array, most of the time it is incorrect, but whether it is correct or incorrect, appears to be random even with the same variables and memory. Float128 and other numpy dtypes appear to be correct using the same tests. Best, Matthew From matthew.brett at gmail.com Wed Jun 20 18:46:59 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jun 2012 15:46:59 -0700 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: Hi, On Wed, Jun 20, 2012 at 3:11 PM, Matthew Brett wrote: > Hi, > > On Wed, Jun 20, 2012 at 3:05 PM, Charles R Harris > wrote: >> >> >> On Wed, Jun 20, 2012 at 4:00 PM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant >>> wrote: >>> > This looks like a problem with comparisons of floating point numbers >>> > rather than a byteswapping problem per-say. ? Try to use an almost equal >>> > comparison instead. >>> >>> Is that right - that the byteswapped versions might not be strictly >>> equal to identical numbers but not byteswapped? 
>>> >>> But I should maybe have been clearer - they also subtract wrongly: >>> >>> >>> >>> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py >>> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >>> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >>> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >>> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >>> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >>> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >>> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >>> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >>> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >>> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >>> >>> (wrong) >>> >>> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py >>> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >>> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >>> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >>> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >>> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >>> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >>> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >>> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >>> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >>> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >>> >>> (right) >>> >>> See you, >>> >> >> Long doubles on PPC consist of two doubles, so I expect you need to swap >> both doubles instead of 16 bytes. Strictly speaking, numpy doesn't support >> non ieee floats. > > Well - the byteswapping appears to be correct in that the array is > displayed with the correct values, but then, when doing a subtraction > on the array, most of the time it is incorrect, but whether it is > correct or incorrect, appears to be random even with the same > variables and memory. > > Float128 and other numpy dtypes appear to be correct using the same tests. http://projects.scipy.org/numpy/ticket/2174 Best, Matthew From charlesr.harris at gmail.com Thu Jun 21 01:43:44 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 Jun 2012 23:43:44 -0600 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: On Wed, Jun 20, 2012 at 4:11 PM, Matthew Brett wrote: > Hi, > > On Wed, Jun 20, 2012 at 3:05 PM, Charles R Harris > wrote: > > > > > > On Wed, Jun 20, 2012 at 4:00 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant > >> wrote: > >> > This looks like a problem with comparisons of floating point numbers > >> > rather than a byteswapping problem per-say. Try to use an almost > equal > >> > comparison instead. > >> > >> Is that right - that the byteswapped versions might not be strictly > >> equal to identical numbers but not byteswapped? 
> >> > >> But I should maybe have been clearer - they also subtract wrongly: > >> > >> > >> > >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> > >> (wrong) > >> > >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j 6.0+0.0j > >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > >> 0.0+0.0j 0.0+0.0j 0.0+0.0j] > >> [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > >> 0.0+0.0j 0.0+0.0j 0.0+0.0j] > >> [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > >> 0.0+0.0j 0.0+0.0j 0.0+0.0j] > >> > >> (right) > >> > >> See you, > >> > > > > Long doubles on PPC consist of two doubles, so I expect you need to swap > > both doubles instead of 16 bytes. Strictly speaking, numpy doesn't > support > > non ieee floats. > > Well - the byteswapping appears to be correct in that the array is > displayed with the correct values, but then, when doing a subtraction > on the array, most of the time it is incorrect, but whether it is > correct or incorrect, appears to be random even with the same > variables and memory. > > Float128 and other numpy dtypes appear to be correct using the same tests. > Thinking about it, that makes sense because the swapped version is probably incorrect ;) That is, the PPC was (is?) selectable to run either little endian or big endian, so the real test would be if long doubles were portable between machines set up different ways. The only machines I know of are little endian, but IIRC, there was at least one brand that was big endian. However, I suspect we are just reversing the whole 16 bytes, so even though that is pretty much a meaningless thing to do, it should work... Is this something that only happens on 32 bit machines? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jun 21 02:11:37 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 20 Jun 2012 23:11:37 -0700 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: Hi, On Wed, Jun 20, 2012 at 10:43 PM, Charles R Harris wrote: > > > On Wed, Jun 20, 2012 at 4:11 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Wed, Jun 20, 2012 at 3:05 PM, Charles R Harris >> wrote: >> > >> > >> > On Wed, Jun 20, 2012 at 4:00 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant >> >> wrote: >> >> > This looks like a problem with comparisons of floating point numbers >> >> > rather than a byteswapping problem per-say. ? Try to use an almost >> >> > equal >> >> > comparison instead. >> >> >> >> Is that right - that the byteswapped versions might not be strictly >> >> equal to identical numbers but not byteswapped? 
>> >> >> >> But I should maybe have been clearer - they also subtract wrongly: >> >> >> >> >> >> >> >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> >> >> (wrong) >> >> >> >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j ?6.0+0.0j >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> >> >> >> (right) >> >> >> >> See you, >> >> >> > >> > Long doubles on PPC consist of two doubles, so I expect you need to swap >> > both doubles instead of 16 bytes. Strictly speaking, numpy doesn't >> > support >> > non ieee floats. >> >> Well - the byteswapping appears to be correct in that the array is >> displayed with the correct values, but then, when doing a subtraction >> on the array, most of the time it is incorrect, but whether it is >> correct or incorrect, appears to be random even with the same >> variables and memory. >> >> Float128 and other numpy dtypes appear to be correct using the same tests. > > > Thinking about it, that makes sense because the swapped version is probably > incorrect ;) That is, the PPC was (is?) selectable to run either little > endian or big endian, so the real test would be if long doubles were > portable between machines set up different ways. The only machines I know of > are little endian, but IIRC, there was at least one brand that was big > endian. However, I suspect we are just reversing the whole 16 bytes, so even > though that is pretty much a meaningless thing to do, it should work... > > Is this something that only happens on 32 bit machines? The PPC machines I have record themselves as big endian (2 running OSX and 1 running Debian wheezy). I can only get the Debian wheezy machine to misbehave in this way. The original report of the problem was on a POWER7 machine running Debian - wheezy I think. I guess this is 64 bit - Yarik - do you know? 
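(A small illustration of the two byte-swap interpretations under discussion; this sketch is not from the thread, and it uses complex128, whose two adjacent float64s stand in for the double-double layout of float128/complex256 on PPC:)

import numpy as np

a = np.arange(4, dtype=np.complex128)
raw = a.view(np.uint8).reshape(-1, 16)        # raw bytes, one row per element

# reverse all 16 bytes of each item
swap_whole = raw[:, ::-1]
# reverse each 8-byte component separately
swap_parts = raw.reshape(-1, 2, 8)[:, :, ::-1].reshape(-1, 16)

swapped = a.byteswap().view(np.uint8).reshape(-1, 16)
print(np.array_equal(swapped, swap_parts))    # expected True: numpy swaps per component
print(np.array_equal(swapped, swap_whole))    # expected False: it does not reverse the whole item
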
See you, Matthew From charlesr.harris at gmail.com Thu Jun 21 02:57:57 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Jun 2012 00:57:57 -0600 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: On Thu, Jun 21, 2012 at 12:11 AM, Matthew Brett wrote: > Hi, > > On Wed, Jun 20, 2012 at 10:43 PM, Charles R Harris > wrote: > > > > > > On Wed, Jun 20, 2012 at 4:11 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Wed, Jun 20, 2012 at 3:05 PM, Charles R Harris > >> wrote: > >> > > >> > > >> > On Wed, Jun 20, 2012 at 4:00 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant < > travis at continuum.io> > >> >> wrote: > >> >> > This looks like a problem with comparisons of floating point > numbers > >> >> > rather than a byteswapping problem per-say. Try to use an almost > >> >> > equal > >> >> > comparison instead. > >> >> > >> >> Is that right - that the byteswapped versions might not be strictly > >> >> equal to identical numbers but not byteswapped? > >> >> > >> >> But I should maybe have been clearer - they also subtract wrongly: > >> >> > >> >> > >> >> > >> >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > >> >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j > 6.0+0.0j > >> >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j > 6.0+0.0j > >> >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j > 6.0+0.0j > >> >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j > 6.0+0.0j > >> >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j > 6.0+0.0j > >> >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> >> > >> >> (wrong) > >> >> > >> >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py > >> >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j > 6.0+0.0j > >> >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> >> [ 0.0+0.0j 1.0+0.0j 2.0+0.0j 3.0+0.0j 4.0+0.0j 5.0+0.0j > 6.0+0.0j > >> >> 7.0+0.0j 8.0+0.0j 9.0+0.0j] > >> >> [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j > >> >> 0.0+0.0j 0.0+0.0j 0.0+0.0j] > >> >> [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j > >> >> 0.0+0.0j 0.0+0.0j 0.0+0.0j] > >> >> [ 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j 0.0+0.0j > 0.0+0.0j > >> >> 0.0+0.0j 0.0+0.0j 0.0+0.0j] > >> >> > >> >> (right) > >> >> > >> >> See you, > >> >> > >> > > >> > Long doubles on PPC consist of two doubles, so I expect you need to > swap > >> > both doubles instead of 16 bytes. Strictly speaking, numpy doesn't > >> > support > >> > non ieee floats. > >> > >> Well - the byteswapping appears to be correct in that the array is > >> displayed with the correct values, but then, when doing a subtraction > >> on the array, most of the time it is incorrect, but whether it is > >> correct or incorrect, appears to be random even with the same > >> variables and memory. > >> > >> Float128 and other numpy dtypes appear to be correct using the same > tests. > > > > > > Thinking about it, that makes sense because the swapped version is > probably > > incorrect ;) That is, the PPC was (is?) selectable to run either little > > endian or big endian, so the real test would be if long doubles were > > portable between machines set up different ways. 
The only machines I > know of > > are little endian, but IIRC, there was at least one brand that was big > > endian. However, I suspect we are just reversing the whole 16 bytes, so > even > > though that is pretty much a meaningless thing to do, it should work... > > > > Is this something that only happens on 32 bit machines? > > The PPC machines I have record themselves as big endian (2 running OSX > and 1 running Debian wheezy). > > I can only get the Debian wheezy machine to misbehave in this way. > > The original report of the problem was on a POWER7 machine running > Debian - wheezy I think. I guess this is 64 bit - Yarik - do you > know? > > Looks like I got little/big endian reversed. Anyway, this is very strange. Same compilers on both machines? I don't understand why this wouldn't show up on other machines with float128, and in particular why is should only happen for complex256 and Debian. What is the extended precision type in OSX on PPC? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jun 21 05:01:16 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 21 Jun 2012 02:01:16 -0700 Subject: [Numpy-discussion] Bizarre errors with byteswapping, complex256, PPC In-Reply-To: References: Message-ID: Hi, On Wed, Jun 20, 2012 at 11:57 PM, Charles R Harris wrote: > > > On Thu, Jun 21, 2012 at 12:11 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Wed, Jun 20, 2012 at 10:43 PM, Charles R Harris >> wrote: >> > >> > >> > On Wed, Jun 20, 2012 at 4:11 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Wed, Jun 20, 2012 at 3:05 PM, Charles R Harris >> >> wrote: >> >> > >> >> > >> >> > On Wed, Jun 20, 2012 at 4:00 PM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> On Wed, Jun 20, 2012 at 1:56 PM, Travis Oliphant >> >> >> >> >> >> wrote: >> >> >> > This looks like a problem with comparisons of floating point >> >> >> > numbers >> >> >> > rather than a byteswapping problem per-say. ? Try to use an almost >> >> >> > equal >> >> >> > comparison instead. >> >> >> >> >> >> Is that right - that the byteswapped versions might not be strictly >> >> >> equal to identical numbers but not byteswapped? 
>> >> >> >> >> >> But I should maybe have been clearer - they also subtract wrongly: >> >> >> >> >> >> >> >> >> >> >> >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py >> >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j >> >> >> ?6.0+0.0j >> >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j >> >> >> ?6.0+0.0j >> >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j >> >> >> ?6.0+0.0j >> >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j >> >> >> ?6.0+0.0j >> >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j >> >> >> ?6.0+0.0j >> >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> >> >> >> >> (wrong) >> >> >> >> >> >> (np-devel)[mb312 at joshlegacy ~/tmp]$ python funny_bs.py >> >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j >> >> >> ?6.0+0.0j >> >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> >> [ 0.0+0.0j ?1.0+0.0j ?2.0+0.0j ?3.0+0.0j ?4.0+0.0j ?5.0+0.0j >> >> >> ?6.0+0.0j >> >> >> ?7.0+0.0j ?8.0+0.0j ?9.0+0.0j] >> >> >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> >> >> ?0.0+0.0j >> >> >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> >> >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> >> >> ?0.0+0.0j >> >> >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> >> >> [ 0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j >> >> >> ?0.0+0.0j >> >> >> ?0.0+0.0j ?0.0+0.0j ?0.0+0.0j] >> >> >> >> >> >> (right) >> >> >> >> >> >> See you, >> >> >> >> >> > >> >> > Long doubles on PPC consist of two doubles, so I expect you need to >> >> > swap >> >> > both doubles instead of 16 bytes. Strictly speaking, numpy doesn't >> >> > support >> >> > non ieee floats. >> >> >> >> Well - the byteswapping appears to be correct in that the array is >> >> displayed with the correct values, but then, when doing a subtraction >> >> on the array, most of the time it is incorrect, but whether it is >> >> correct or incorrect, appears to be random even with the same >> >> variables and memory. >> >> >> >> Float128 and other numpy dtypes appear to be correct using the same >> >> tests. >> > >> > >> > Thinking about it, that makes sense because the swapped version is >> > probably >> > incorrect ;) That is, the PPC was (is?) selectable to run either little >> > endian or big endian, so the real test would be if long doubles were >> > portable between machines set up different ways. The only machines I >> > know of >> > are little endian, but IIRC, there was at least one brand that was big >> > endian. However, I suspect we are just reversing the whole 16 bytes, so >> > even >> > though that is pretty much a meaningless thing to do, it should work... >> > >> > Is this something that only happens on 32 bit machines? >> >> The PPC machines I have record themselves as big endian (2 running OSX >> and 1 running Debian wheezy). >> >> I can only get the Debian wheezy machine to misbehave in this way. >> >> The original report of the problem was on a POWER7 machine running >> Debian - wheezy I think. ?I guess this is 64 bit - Yarik - do you >> know? >> > > Looks like I got little/big endian reversed. Anyway, this is very strange. > Same compilers on both machines? gcc-4.6.3-1 on the machine I have access to. The POWER7 machine appears to be running an older kernel, so my guess is it has an older gcc too, but I don't have access I'm afraid. 
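For what it's worth, the original funny_bs.py is not shown in the thread, so this is only a guess at its shape, but a minimal script along these lines exercises the same path (np.complex256 only exists on platforms where the long double takes 16 bytes of storage, e.g. Linux on x86-64 or PPC):

import numpy as np

a = np.arange(10).astype(np.complex256)
b = a.byteswap().newbyteorder()    # same values, opposite byte-order dtype
print(a)
print(b)
print(b - a)                       # expected: all zeros
print(np.all(a == b))              # expected: True
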
> I don't understand why this wouldn't show > up on other machines with float128, and in particular why is should only > happen for complex256 and Debian. What is the extended precision type in OSX > on PPC? OSX uses double pairs too - and I've been testing on them for a while - so I was also surprised to see this when we got to Debian... See you, Matthew From travis at continuum.io Thu Jun 21 06:11:36 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 21 Jun 2012 05:11:36 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch Message-ID: Hey all, I made a branch called with_maskna and then merged Nathaniel's PR which removes the mask_na support from master. I then applied a patch to fix the boolean indexing problem reported by Ralf. I then created a NumPy 1.7.x maintenance branch from which the release of NumPy 1.7 will be made. Ondrej Certik and I will be managing the release of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get NumPy 1.7 out the door. Thanks, Ondrej for being willing to help in this way. In principal only bug-fixes should be pushed to the NumPy 1.7 branch at this point. The target is to make a release of NumPy 1.7.x by July 9th. The schedule we will work for is: RC1 -- June 25 RC2 -- July 5 Release -- July 13 NumPy 1.7 is a significant release and has several changes many of which are documented in the release notes. Several new code paths were added which can have a subtle impact on code. As we make the release candidates, it will be very helpful to receive as much feedback as possible on how any changes affect your code. We will work on the release notes over the coming weeks so that they have as much information as possible. After NumPy 1.7, there is a NumPy 1.8 planned for later this year. Best regards, -Travis From ndbecker2 at gmail.com Thu Jun 21 08:15:15 2012 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 21 Jun 2012 08:15:15 -0400 Subject: [Numpy-discussion] trivial question? References: Message-ID: Robert Kern wrote: > On Wed, Jun 20, 2012 at 3:58 PM, Neal Becker wrote: >> Maybe I'm being slow, but is there any convenient function to calculate, >> for 2 vectors: >> >> \sum_i \sum_j x_i y_j >> >> (I had a matrix once, but it vanished without a trace) > > np.multiply.outer(x, y).sum() > I guess that's the same as np.outer (x, y).sum() ? From pierre.haessig at crans.org Thu Jun 21 08:22:15 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 21 Jun 2012 14:22:15 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: Message-ID: <4FE311F7.8030800@crans.org> Hi, Glad to see that 1.7 is coming soon ! Le 21/06/2012 12:11, Travis Oliphant a ?crit : > NumPy 1.7 is a significant release and has several changes many of which are documented in the release notes. I browsed the sources on github and ended up here : https://github.com/numpy/numpy/tree/maintenance/1.7.x/doc/release I didn't find release notes for 1.7 but there is a file for 2.0 which content suggest it applies to 1.7. https://github.com/numpy/numpy/blob/maintenance/1.7.x/doc/release/2.0.0-notes.rst Is it indeed the file you mentioned ? Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From pierre.haessig at crans.org Thu Jun 21 08:50:12 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 21 Jun 2012 14:50:12 +0200 Subject: [Numpy-discussion] "can cast safely" table Message-ID: <4FE31884.5020207@crans.org> Hi, While getting through the ufunc documentation, (http://docs.scipy.org/numpy/docs/numpy-docs/reference/ufuncs.rst/) I took the liberty to change one line in the code segment which generates the "can cast safely" table. I wanted to increase the readability if the table by increasing its contrast. To this end, I replaced the character '0' by '-'. (I copy pasted the result below). Now two questions : 1) One thing is that some of you may not like this change and I'm fine with that because this was just for fun ! 2) A slightly more important thing is to double check that the table content is unchanged. Indeed, the table caption says it was generated on a 32 bits machine while my computer is running a 64 bits Linux. I believe that the table I generated was the same, but please make sure I didn't mess up something. Best, Pierre Quick preview of the change : the "can cast safely" table before : X ? b h i l q p B H I L Q P e f d g F D G S U V O M m ? 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 b 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 h 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 i 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0 0 l 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0 0 q 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0 0 p 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0 0 B 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 H 0 0 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0 I 0 0 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 L 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 Q 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 P 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 1 1 1 1 1 1 0 0 e 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0 0 g 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 0 0 F 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 D 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 G 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 U 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 O 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 M 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 m 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 the "can cast safely" table after : X ? b h i l q p B H I L Q P e f d g F D G S U V O M m ? 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 b - 1 1 1 1 1 1 - - - - - - 1 1 1 1 1 1 1 1 1 1 1 - - h - - 1 1 1 1 1 - - - - - - - 1 1 1 1 1 1 1 1 1 1 - - i - - - 1 1 1 1 - - - - - - - - 1 1 - 1 1 1 1 1 1 - - l - - - - 1 1 1 - - - - - - - - 1 1 - 1 1 1 1 1 1 - - q - - - - 1 1 1 - - - - - - - - 1 1 - 1 1 1 1 1 1 - - p - - - - 1 1 1 - - - - - - - - 1 1 - 1 1 1 1 1 1 - - B - - 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 - - H - - - 1 1 1 1 - 1 1 1 1 1 - 1 1 1 1 1 1 1 1 1 1 - - I - - - - 1 1 1 - - 1 1 1 1 - - 1 1 - 1 1 1 1 1 1 - - L - - - - - - - - - - 1 1 1 - - 1 1 - 1 1 1 1 1 1 - - Q - - - - - - - - - - 1 1 1 - - 1 1 - 1 1 1 1 1 1 - - P - - - - - - - - - - 1 1 1 - - 1 1 - 1 1 1 1 1 1 - - e - - - - - - - - - - - - - 1 1 1 1 1 1 1 1 1 1 1 - - f - - - - - - - - - - - - - - 1 1 1 1 1 1 1 1 1 1 - - d - - - - - - - - - - - - - - - 1 1 - 1 1 1 1 1 1 - - g - - - - - - - - - - - - - - - - 1 - - 1 1 1 1 1 - - F - - - - - - - - - - - - - - - - - 1 1 1 1 1 1 1 - - D - - - - - - - - - - - - - - - - - - 1 1 1 1 1 1 - - G - - - - - - - - - - - - - - - - - - - 1 1 1 1 1 - - S - - - - - - - - - - - - - - - - - - - - 1 1 1 1 - - U - - - - - - - - - - - - - - - - - - - - - 1 1 1 - - V - - - - - - - - - - - - - - - - - - - - - - 1 1 - - O - - - - - - - - - - - - - - - - - - - - - - 1 1 - - M - - - - - - - - - - - - - - - - - - - - - - - - 1 - m - - - - - - - - - - - - - - - - - - - - - - - - - 1 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From charlesr.harris at gmail.com Thu Jun 21 09:07:25 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Jun 2012 07:07:25 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: Message-ID: On Thu, Jun 21, 2012 at 4:11 AM, Travis Oliphant wrote: > Hey all, > > I made a branch called with_maskna and then merged Nathaniel's PR which > removes the mask_na support from master. I then applied a patch to fix the > boolean indexing problem reported by Ralf. > > I then created a NumPy 1.7.x maintenance branch from which the release of > NumPy 1.7 will be made. Ondrej Certik and I will be managing the release > of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get > NumPy 1.7 out the door. Thanks, Ondrej for being willing to help in this > way. > > In principal only bug-fixes should be pushed to the NumPy 1.7 branch at > this point. The target is to make a release of NumPy 1.7.x by July 9th. > The schedule we will work for is: > > RC1 -- June 25 > RC2 -- July 5 > Release -- July 13 > > NumPy 1.7 is a significant release and has several changes many of which > are documented in the release notes. Several new code paths were added > which can have a subtle impact on code. As we make the release > candidates, it will be very helpful to receive as much feedback as possible > on how any changes affect your code. We will work on the release notes > over the coming weeks so that they have as much information as possible. > > After NumPy 1.7, there is a NumPy 1.8 planned for later this year. > > Hmm, I was going to add the type specific sorts for object and structured types. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Thu Jun 21 09:10:12 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Jun 2012 07:10:12 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: Message-ID: On Thu, Jun 21, 2012 at 7:07 AM, Charles R Harris wrote: > > > On Thu, Jun 21, 2012 at 4:11 AM, Travis Oliphant wrote: > >> Hey all, >> >> I made a branch called with_maskna and then merged Nathaniel's PR which >> removes the mask_na support from master. I then applied a patch to fix the >> boolean indexing problem reported by Ralf. >> >> I then created a NumPy 1.7.x maintenance branch from which the release of >> NumPy 1.7 will be made. Ondrej Certik and I will be managing the release >> of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get >> NumPy 1.7 out the door. Thanks, Ondrej for being willing to help in this >> way. >> >> In principal only bug-fixes should be pushed to the NumPy 1.7 branch at >> this point. The target is to make a release of NumPy 1.7.x by July 9th. >> The schedule we will work for is: >> >> RC1 -- June 25 >> RC2 -- July 5 >> Release -- July 13 >> >> NumPy 1.7 is a significant release and has several changes many of which >> are documented in the release notes. Several new code paths were added >> which can have a subtle impact on code. As we make the release >> candidates, it will be very helpful to receive as much feedback as possible >> on how any changes affect your code. We will work on the release notes >> over the coming weeks so that they have as much information as possible. >> >> After NumPy 1.7, there is a NumPy 1.8 planned for later this year. >> >> > Hmm, I was going to add the type specific sorts for object and structured > types. > Also, there is some additional cleanup that needs to be done for macros. Probably it would have been helpful to schedule the branch for a week or two in the future so we could all get the little odds and ends fixed up first. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bobtnur78 at gmail.com Thu Jun 21 10:59:25 2012 From: bobtnur78 at gmail.com (bob tnur) Date: Thu, 21 Jun 2012 10:59:25 -0400 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! Message-ID: Hi all numpy fun;) This question is already posted in stackoverflow by some people, I am just thinking that numpy python will do this with trick;) I guess numpy will be every ones choice as its popularity increases. The question is herein: http://stackoverflow.com/questions/10074270/how-can-i-find-the-minimum-number-of-lines-needed-to-cover-all-the-zeros-in-a-2 Have fun with numpy:) Bob -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Jun 21 11:03:48 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 21 Jun 2012 16:03:48 +0100 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: References: Message-ID: On Thu, Jun 21, 2012 at 3:59 PM, bob tnur wrote: > Hi all numpy fun;) > This question is already posted in stackoverflow by some people, I am just > thinking that numpy python will do this with trick;) I guess numpy will be > every ones choice as its popularity increases. 
The question is herein: > http://stackoverflow.com/questions/10074270/how-can-i-find-the-minimum-number-of-lines-needed-to-cover-all-the-zeros-in-a-2 My "numpy solution" for this is just $ pip install munkres http://pypi.python.org/pypi/munkres -- Robert Kern From travis at continuum.io Thu Jun 21 11:25:18 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 21 Jun 2012 10:25:18 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: Message-ID: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> I thought it was clear we were doing a 1.7 release before SciPy. It seems pretty urgent that we get something out sooner than later. I know there is never enough time to do all the things we want to do. There is time before the first Release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the Pull requests, Ondrej can make sure they are added to the 1.7.x. branch by Monday. We can also delay the first Release Candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release. Speaking of code changes... What are the cleanups for macros that need to be done? I was looking at the code and notice that where before I could do PyArray_NDIM(obj), Mark's code now does PyArray_NDIM((PyArrayObject *)obj). Is that intentional? That's not as nice to type. Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....). That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism. I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason. These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users. I'm going to be a lot more resistant to that sort of change in the code base when I see it. One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than changing the macros that don't require semicolons to now require semicolons: NPY_BEGIN_THREADS_DEF NPY_BEGIN_THREADS NPY_ALLOW_C_API NPY_ALLOW_C_API_DEF NPY_DISABLE_C_API That feels like a gratuitous style change that will force users of those macros to re-write their code. Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C-code like this will require without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require colons, then we do that, but we can't get rid of the old ones --- especially in a 1.x release. 
Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows. There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile. Thanks, -Travis On Jun 21, 2012, at 8:10 AM, Charles R Harris wrote: > > > On Thu, Jun 21, 2012 at 7:07 AM, Charles R Harris wrote: > > > On Thu, Jun 21, 2012 at 4:11 AM, Travis Oliphant wrote: > Hey all, > > I made a branch called with_maskna and then merged Nathaniel's PR which removes the mask_na support from master. I then applied a patch to fix the boolean indexing problem reported by Ralf. > > I then created a NumPy 1.7.x maintenance branch from which the release of NumPy 1.7 will be made. Ondrej Certik and I will be managing the release of NumPy 1.7. Ondrej is the author of SymPy and has agreed to help get NumPy 1.7 out the door. Thanks, Ondrej for being willing to help in this way. > > In principal only bug-fixes should be pushed to the NumPy 1.7 branch at this point. The target is to make a release of NumPy 1.7.x by July 9th. The schedule we will work for is: > > RC1 -- June 25 > RC2 -- July 5 > Release -- July 13 > > NumPy 1.7 is a significant release and has several changes many of which are documented in the release notes. Several new code paths were added which can have a subtle impact on code. As we make the release candidates, it will be very helpful to receive as much feedback as possible on how any changes affect your code. We will work on the release notes over the coming weeks so that they have as much information as possible. > > After NumPy 1.7, there is a NumPy 1.8 planned for later this year. > > > Hmm, I was going to add the type specific sorts for object and structured types. > > Also, there is some additional cleanup that needs to be done for macros. Probably it would have been helpful to schedule the branch for a week or two in the future so we could all get the little odds and ends fixed up first. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jun 21 13:20:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 Jun 2012 11:20:49 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Thu, Jun 21, 2012 at 9:25 AM, Travis Oliphant wrote: > I thought it was clear we were doing a 1.7 release before SciPy. 
It > seems pretty urgent that we get something out sooner than later. I > know there is never enough time to do all the things we want to do. > > The usual practice is to announce a schedule first. > There is time before the first Release candidate to make changes on the > 1.7.x branch. If you want to make the changes on master, and just > indicate the Pull requests, Ondrej can make sure they are added to the > 1.7.x. branch by Monday. We can also delay the first Release Candidate > by a few days to next Wednesday and then bump everything 3 days if that > will help. There will be a follow-on 1.8 release before the end of the > year --- so there is time to make changes for that release as well. The > next release will not take a year to get out, so we shouldn't feel > pressured to get *everything* in this release. > What are we going to do for 1.8? > > > Speaking of code changes... > > What are the cleanups for macros that need to be done? I was looking at > the code and notice that where before I could do PyArray_NDIM(obj), Mark's > code now does PyArray_NDIM((PyArrayObject *)obj). Is that intentional? > Yes, the functions will give warnings otherwise. > That's not as nice to type. > So? The point is to have correctness, not ease of typing. > Is that assuming that PyArray_NDIM will become a function and need a > specific object type for its argument (and everything else cast....). > That's one clear disadvantage of inline functions versus macros in my mind: > no automatic polymorphism. > That's a disadvantage of Python. The virtue of inline functions is precisely type checking. > I don't think type safety is a big win for macros like these. We need > to be more judicious about which macros are scheduled for function > inlining. Some just don't benefit from the type-safety implications as > much as others do, and you end up requiring everyone to change their code > downstream for no real reason. > > These sorts of changes really feel to me like unnecessary spelling changes > that require work from extension writers who now have to modify their code > with no real gain. There seems to be a lot of that going on in the code > base and I'm not really convinced that it's useful for end-users. > Good style and type checking are useful. Numpy needs more of both. > I'm going to be a lot more resistant to that sort of change in the code > base when I see it. > Numpy is a team effort. There are people out there who write better code than you do, you should learn from them. > > One particularly glaring example to my lens on the world: I think it > would have been better to define new macros which require semicolons than > changing the macros that don't require semicolons to now require > semicolons: > > NPY_BEGIN_THREADS_DEF > NPY_BEGIN_THREADS > NPY_ALLOW_C_API > NPY_ALLOW_C_API_DEF > NPY_DISABLE_C_API > > That feels like a gratuitous style change that will force users of those > macros to re-write their code. > It doesn't seem to be much of a problem. > Sure, it's a simple change, but it's a simple change that doesn't do > anything for you as an end user. I think I'm going to back this change > out, in fact. I can't see requiring people to change their C-code like > this will require without a clear benefit to them. I'm quite sure there > is code out there that uses these documented APIs (without the semicolon). > If we want to define new macros that require colons, then we do that, but > we can't get rid of the old ones --- especially in a 1.x release. 
> > Our policy should not be to allow gratuitous style changes just because we > think something is prettier another way. The NumPy code base has come > from multiple sources and reflects several styles. It also follows an > older style of C-programming (that is quite common in the Python code > base). It can be changed, but those changes shouldn't be painful for a > library user without some specific gain for them that the change allows. > > You use that word 'gratuitous' a lot, I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous, it was doing things in a way many found more intuitive and generalized better to different polynomial basis. People have different ideas, that doesn't make them gratuitous. > There are significant users of NumPy out there still on 1.4. Even the > policy of deprecation that has been discussed will not help people trying > to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple > times. The easier we can make this process for users the better. I > remain convinced that it's better and am much more comfortable with making > a release that requires a re-compile (that will succeed without further > code changes --- because of backward compatibility efforts) than to have > supposed ABI compatibility with subtle semantic changes and required C-code > changes when you do happen to re-compile. > > Cleanups need to be made bit by bit. I don't think we have done anything that will cause undo trouble. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Thu Jun 21 14:33:59 2012 From: e.antero.tammi at gmail.com (eat) Date: Thu, 21 Jun 2012 21:33:59 +0300 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: References: Message-ID: Heh, On Thu, Jun 21, 2012 at 6:03 PM, Robert Kern wrote: > On Thu, Jun 21, 2012 at 3:59 PM, bob tnur wrote: > > Hi all numpy fun;) > > This question is already posted in stackoverflow by some people, I am > just > > thinking that numpy python will do this with trick;) I guess numpy will > be > > every ones choice as its popularity increases. The question is herein: > > > http://stackoverflow.com/questions/10074270/how-can-i-find-the-minimum-number-of-lines-needed-to-cover-all-the-zeros-in-a-2 > > My "numpy solution" for this is just > > $ pip install munkres > munkres seems to be a pure python implementation ;-). FWIIW, There exists pure python implementation(s) to outperform munkresimplementation more than 200 times already with a 100x100 random cost matrix, based on shortest path variant of the Hungarian algorithm (more details of the algorithms can be found for example at http://www.assignmentproblems.com/). How the assignment algorithms are (typically) described, it actually may be quite a tedious job to create more performance ones utilizing numpy arrays instead of lists of lists. My 2 cents, -eat > > http://pypi.python.org/pypi/munkres > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Thu Jun 21 14:39:46 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 21 Jun 2012 20:39:46 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Thu, Jun 21, 2012 at 7:20 PM, Charles R Harris wrote: > > > On Thu, Jun 21, 2012 at 9:25 AM, Travis Oliphant wrote: > >> >> One particularly glaring example to my lens on the world: I think it >> would have been better to define new macros which require semicolons than >> changing the macros that don't require semicolons to now require >> semicolons: >> >> NPY_BEGIN_THREADS_DEF >> NPY_BEGIN_THREADS >> NPY_ALLOW_C_API >> NPY_ALLOW_C_API_DEF >> NPY_DISABLE_C_API >> >> That feels like a gratuitous style change that will force users of those >> macros to re-write their code. >> > > It doesn't seem to be much of a problem. > Well, we did do a SciPy maintenance release for this.... Overall I agree with you Chuck that cleanups are needed, but if there's too much impact for users of this particular change -- which would be nice to see confirmed by pointing to actual code instead of just asserted -- then I don't see the harm in undoing it. > > >> Sure, it's a simple change, but it's a simple change that doesn't do >> anything for you as an end user. I think I'm going to back this change >> out, in fact. I can't see requiring people to change their C-code like >> this will require without a clear benefit to them. I'm quite sure there >> is code out there that uses these documented APIs (without the semicolon). >> If we want to define new macros that require colons, then we do that, but >> we can't get rid of the old ones --- especially in a 1.x release. >> >> Our policy should not be to allow gratuitous style changes just because >> we think something is prettier another way. The NumPy code base has come >> from multiple sources and reflects several styles. It also follows an >> older style of C-programming (that is quite common in the Python code >> base). It can be changed, but those changes shouldn't be painful for a >> library user without some specific gain for them that the change allows. >> >> > You use that word 'gratuitous' a lot, I don't think it means what you > think it means. For instance, the new polynomial coefficient order wasn't > gratuitous, it was doing things in a way many found more intuitive and > generalized better to different polynomial basis. People have different > ideas, that doesn't make them gratuitous. > > >> There are significant users of NumPy out there still on 1.4. Even the >> policy of deprecation that has been discussed will not help people trying >> to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple >> times. The easier we can make this process for users the better. I >> remain convinced that it's better and am much more comfortable with making >> a release that requires a re-compile (that will succeed without further >> code changes --- because of backward compatibility efforts) than to have >> supposed ABI compatibility with subtle semantic changes and required C-code >> changes when you do happen to re-compile. >> > Best to have neither a re-compile nor ABI incompatibility. That said, I'd prefer the former over the latter any day of the week if I'd have to choose. Ralf > Cleanups need to be made bit by bit. I don't think we have done anything > that will cause undo trouble. 
> > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Jun 21 14:49:54 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 21 Jun 2012 20:49:54 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Thu, Jun 21, 2012 at 5:25 PM, Travis Oliphant wrote: > I thought it was clear we were doing a 1.7 release before SciPy. It > seems pretty urgent that we get something out sooner than later. I > know there is never enough time to do all the things we want to do. > > There is time before the first Release candidate to make changes on the > 1.7.x branch. If you want to make the changes on master, and just > indicate the Pull requests, Ondrej can make sure they are added to the > 1.7.x. branch by Monday. We can also delay the first Release Candidate > by a few days to next Wednesday and then bump everything 3 days if that > will help. There will be a follow-on 1.8 release before the end of the > year --- so there is time to make changes for that release as well. The > next release will not take a year to get out, so we shouldn't feel > pressured to get *everything* in this release. > What about http://projects.scipy.org/numpy/ticket/2108? Someone needs to at least answer the question of how much of datetime is unusable on Windows with the current code. If that's not a lot then perhaps this is not a blocker, but we did consider it one until now..... Of the other tickets (http://projects.scipy.org/numpy/report/3) it would also be good to get an assessment of which ones are critical. Perhaps none of them are and the branch is in good shape for a release, but some of those segfaults would be nice to have fixed. Debian multi-arch support too, as discussed on this list recently. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Thu Jun 21 15:31:17 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 21 Jun 2012 15:31:17 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Thu, Jun 21, 2012 at 2:49 PM, Ralf Gommers wrote: > > > On Thu, Jun 21, 2012 at 5:25 PM, Travis Oliphant > wrote: >> >> I thought it was clear we were doing a 1.7 release before SciPy. ? It >> seems pretty urgent that we get something out sooner than later. ? ? ?I know >> there is never enough time to do all the things we want to do. >> >> There is time before the first Release candidate to make changes on the >> 1.7.x branch. ? If you want to make the changes on master, and just indicate >> the Pull requests, Ondrej can make sure they are added to the 1.7.x. branch >> by Monday. ? ?We can also delay the first Release Candidate by a few days to >> next Wednesday and then bump everything 3 days if that will help. ? ? There >> will be a follow-on 1.8 release before the end of the year --- so there is >> time to make changes for that release as well. ? ?The next release will not >> take a year to get out, so we shouldn't feel pressured to get *everything* >> in this release. > > > What about http://projects.scipy.org/numpy/ticket/2108? 
Someone needs to at > least answer the question of how much of datetime is unusable on Windows > with the current code. If that's not a lot then perhaps this is not a > blocker, but we did consider it one until now..... pandas has become a heavy consumer of datetime64 recently, and we haven't had any issues using VS2003 and VS2008, but haven't tested heavily against NumPy compiled with mingw outside of the version shipped in Enthought Python Distribution (the test suite passes fine, last time I checked). > Of the other tickets (http://projects.scipy.org/numpy/report/3) it would > also be good to get an assessment of which ones are critical. Perhaps none > of them are and the branch is in good shape for a release, but some of those > segfaults would be nice to have fixed. Debian multi-arch support too, as > discussed on this list recently. > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ralf.gommers at googlemail.com Thu Jun 21 15:53:13 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 21 Jun 2012 21:53:13 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Thu, Jun 21, 2012 at 9:31 PM, Wes McKinney wrote: > On Thu, Jun 21, 2012 at 2:49 PM, Ralf Gommers > wrote: > > > > > > On Thu, Jun 21, 2012 at 5:25 PM, Travis Oliphant > > wrote: > >> > >> I thought it was clear we were doing a 1.7 release before SciPy. It > >> seems pretty urgent that we get something out sooner than later. I > know > >> there is never enough time to do all the things we want to do. > >> > >> There is time before the first Release candidate to make changes on the > >> 1.7.x branch. If you want to make the changes on master, and just > indicate > >> the Pull requests, Ondrej can make sure they are added to the 1.7.x. > branch > >> by Monday. We can also delay the first Release Candidate by a few > days to > >> next Wednesday and then bump everything 3 days if that will help. > There > >> will be a follow-on 1.8 release before the end of the year --- so there > is > >> time to make changes for that release as well. The next release will > not > >> take a year to get out, so we shouldn't feel pressured to get > *everything* > >> in this release. > > > > > > What about http://projects.scipy.org/numpy/ticket/2108? Someone needs > to at > > least answer the question of how much of datetime is unusable on Windows > > with the current code. If that's not a lot then perhaps this is not a > > blocker, but we did consider it one until now..... > > pandas has become a heavy consumer of datetime64 recently, and we > haven't had any issues using VS2003 and VS2008, but haven't tested > heavily against NumPy compiled with mingw outside of the version > shipped in Enthought Python Distribution (the test suite passes fine, > last time I checked). > Thanks Wes. It's indeed a MinGW-specific issue. EPD ships MinGW 4.5.2, which should work but has issues when producing binary installers that aren't yet resolved AFAIK. David C. last reported on that a few months ago that he didn't see an easy solution. All releases until now have been done with MinGW 3.4.5, which has a datetime problem. So we still need a confirmation about whether current issues with 3.4.5 are acceptable, or we need a fix or another way of creating binaries. 
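For reference, the kind of datetime64 behaviour that needs checking in a MinGW-built binary is along these lines (just a guess at a minimal smoke test, not taken from the ticket itself):

import numpy as np

d = np.array(['2012-06-21', '2012-07-09'], dtype='datetime64[D]')
print(d[1] - d[0])                       # timedelta64 difference in days
print(d[0] + np.timedelta64(7, 'D'))     # simple arithmetic
print(np.arange('2012-06', '2012-08', dtype='datetime64[D]')[:3])   # date ranges
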
Ralf > > Of the other tickets (http://projects.scipy.org/numpy/report/3) it would > > also be good to get an assessment of which ones are critical. Perhaps > none > > of them are and the branch is in good shape for a release, but some of > those > > segfaults would be nice to have fixed. Debian multi-arch support too, as > > discussed on this list recently. > > > > Ralf > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Jun 21 17:04:40 2012 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 21 Jun 2012 22:04:40 +0100 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: References: Message-ID: On Thu, Jun 21, 2012 at 7:33 PM, eat wrote: > Heh, > > On Thu, Jun 21, 2012 at 6:03 PM, Robert Kern wrote: >> >> On Thu, Jun 21, 2012 at 3:59 PM, bob tnur wrote: >> > Hi all numpy fun;) >> > This question is already posted in stackoverflow by some people, I am >> > just >> > thinking that numpy python will do this with trick;) I guess numpy will >> > be >> > every ones choice as its popularity increases. The question is herein: >> > >> > http://stackoverflow.com/questions/10074270/how-can-i-find-the-minimum-number-of-lines-needed-to-cover-all-the-zeros-in-a-2 >> >> My "numpy solution" for this is just >> >> ?$ pip install munkres > > munkres seems to be a pure python implementation ;-). Oops! I could have sworn that I once tried one named munkres that used numpy. But that was several years ago. -- Robert Kern From ben.root at ou.edu Thu Jun 21 20:59:09 2012 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 21 Jun 2012 20:59:09 -0400 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: References: Message-ID: On Thursday, June 21, 2012, Robert Kern wrote: > On Thu, Jun 21, 2012 at 7:33 PM, eat > > wrote: > > Heh, > > > > On Thu, Jun 21, 2012 at 6:03 PM, Robert Kern > > wrote: > >> > >> On Thu, Jun 21, 2012 at 3:59 PM, bob tnur > > wrote: > >> > Hi all numpy fun;) > >> > This question is already posted in stackoverflow by some people, I am > >> > just > >> > thinking that numpy python will do this with trick;) I guess numpy > will > >> > be > >> > every ones choice as its popularity increases. The question is herein: > >> > > >> > > http://stackoverflow.com/questions/10074270/how-can-i-find-the-minimum-number-of-lines-needed-to-cover-all-the-zeros-in-a-2 > >> > >> My "numpy solution" for this is just > >> > >> $ pip install munkres > > > > munkres seems to be a pure python implementation ;-). > > Oops! I could have sworn that I once tried one named munkres that used > numpy. But that was several years ago. > > There is a development branch of sk-learn with an implementation of the hungarian assignment solver using numpy. It will even do non-square matrices and matrices with an empty dimension. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Fri Jun 22 00:51:50 2012 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Fri, 22 Jun 2012 06:51:50 +0200 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! 
In-Reply-To: References: Message-ID: <20120622045150.GA13499@phare.normalesup.org> On Thu, Jun 21, 2012 at 08:59:09PM -0400, Benjamin Root wrote: > > munkres seems to be a pure python implementation ;-). > Oops! I could have sworn that I once tried one named munkres that used > numpy. But that was several years ago. > There is a development branch of sk-learn with an implementation of the > hungarian assignment solver using numpy. It will even do non-square > matrices and matrices with an empty dimension. Yes, absolutely, thanks to Ben: https://github.com/GaelVaroquaux/scikit-learn/blob/hungarian/sklearn/utils/hungarian.py I never merged this in the main scikit-learn tree, because munkres is not used so far. Maybe I should merge it in the main tree, or maybe it should be added to scipy or numpy. Ga?l From thouis at gmail.com Fri Jun 22 03:49:58 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Fri, 22 Jun 2012 09:49:58 +0200 Subject: [Numpy-discussion] Issue tracking In-Reply-To: <3C64F2BE-50E5-403C-9022-71233A6E3449@continuum.io> References: <3C64F2BE-50E5-403C-9022-71233A6E3449@continuum.io> Message-ID: On Mon, Jun 4, 2012 at 7:43 PM, Travis Oliphant wrote: > I have turned on issue tracking and started a few labels. ? Feel free to add > more / adjust the names as appropriate. ? ? I am trying to find someone who > can help manage the migration from Trac. Are the github issues set up sufficiently for Trac to be disabled and github to take over? Ray Jones From e.antero.tammi at gmail.com Fri Jun 22 09:42:02 2012 From: e.antero.tammi at gmail.com (eat) Date: Fri, 22 Jun 2012 16:42:02 +0300 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: <20120622045150.GA13499@phare.normalesup.org> References: <20120622045150.GA13499@phare.normalesup.org> Message-ID: Hi, On Fri, Jun 22, 2012 at 7:51 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Thu, Jun 21, 2012 at 08:59:09PM -0400, Benjamin Root wrote: > > > munkres seems to be a pure python implementation ;-). > > > Oops! I could have sworn that I once tried one named munkres that > used > > numpy. But that was several years ago. > > > There is a development branch of sk-learn with an implementation of > the > > hungarian assignment solver using numpy. It will even do non-square > > matrices and matrices with an empty dimension. > > Yes, absolutely, thanks to Ben: > > https://github.com/GaelVaroquaux/scikit-learn/blob/hungarian/sklearn/utils/hungarian.py > I never merged this in the main scikit-learn tree, because munkres is not > used so far. Maybe I should merge it in the main tree, or maybe it should > be added to scipy or numpy. > I made some simple timing comparisons (see attached picture) between numpy based hungarian and pure python shortest path based hungarian_sp. It seems that pure python based implementation outperforms numpy based implementation. Timings are averaged over five runs. The difference cannot totally be explained by different algorithms (although shortest path based seem to scale better). Rather the heavy access to rows and columns seem to favor list of lists. So this type of algorithms may indeed be real challenges for numpy. Regards, -eat > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: compare.png Type: image/png Size: 96356 bytes Desc: not available URL: From ben.root at ou.edu Fri Jun 22 09:48:30 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 22 Jun 2012 09:48:30 -0400 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: References: <20120622045150.GA13499@phare.normalesup.org> Message-ID: On Fri, Jun 22, 2012 at 9:42 AM, eat wrote: > Hi, > > On Fri, Jun 22, 2012 at 7:51 AM, Gael Varoquaux < > gael.varoquaux at normalesup.org> wrote: > >> On Thu, Jun 21, 2012 at 08:59:09PM -0400, Benjamin Root wrote: >> > > munkres seems to be a pure python implementation ;-). >> >> > Oops! I could have sworn that I once tried one named munkres that >> used >> > numpy. But that was several years ago. >> >> > There is a development branch of sk-learn with an implementation of >> the >> > hungarian assignment solver using numpy. It will even do non-square >> > matrices and matrices with an empty dimension. >> >> Yes, absolutely, thanks to Ben: >> >> https://github.com/GaelVaroquaux/scikit-learn/blob/hungarian/sklearn/utils/hungarian.py >> I never merged this in the main scikit-learn tree, because munkres is not >> used so far. Maybe I should merge it in the main tree, or maybe it should >> be added to scipy or numpy. >> > I made some simple timing comparisons (see attached picture) between numpy > based hungarian and pure python shortest path based hungarian_sp. It seems > that pure python based implementation outperforms numpy based > implementation. Timings are averaged over five runs. > > The difference cannot totally be explained by different algorithms > (although shortest path based seem to scale better). Rather the heavy > access to rows and columns seem to favor list of lists. So this type of > algorithms may indeed be real challenges for numpy. > > eat, Thanks for that analysis. Personally, I never needed high-performance so I never bothered to optimize it. However, it does appear that there is an order-of-magnitude difference between the two, and so it might be worth it to see what can be done to fix that. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Jun 22 10:25:40 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 22 Jun 2012 09:25:40 -0500 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: References: <20120622045150.GA13499@phare.normalesup.org> Message-ID: Accessing individual elements of NumPy arrays is slower than accessing individual elements of lists --- around 2.5x-3x slower. NumPy has to do more work to figure out what kind of indexing you are trying to do because of its flexibility. It also has to create the Python object to return. In contrast, the list approach already has the Python objects created and you are just returning pointers to them and there is much less flexibility in the kinds of indexing you can do. Simple timings show that a.item(i,j) is about 2x slower than list element access (but faster than a[i,j] which is about 2.5x to 3x slower). The slowness of a.item is due to the need to create the Python object to return (there are just raw bytes there) so it gives some idea of the relative cost of each part of the slowness of a[i,j]. 
Also, math on the array scalars returned from NumPy will be slower than math on integers and floats --- because NumPy re-uses the ufunc machinery which is not optimized at all for scalars. The take-away is that NumPy is built for doing vectorized operations on bytes of data. It is not optimized for doing element-by-element individual access. The right approach there is to just use lists (or use a version specialized for the kind of data in the lists that removes the boxing and unboxing). Here are my timings using IPython for NumPy indexing: 1-D: In[2]: a = arange(100) In [3]: %timeit [a.item(i) for i in xrange(100)] 10000 loops, best of 3: 25.6 us per loop In [4]: %timeit [a[i] for i in xrange(100)] 10000 loops, best of 3: 31.8 us per loop In [5]: al = a.tolist() In [6]: %timeit [al[i] for i in xrange(100)] 100000 loops, best of 3: 10.6 us per loop 2-D: In [7]: a = arange(100).reshape(10,10) In [8]: al = a.tolist() In [9]: %timeit [al[i][j] for i in xrange(10) for j in xrange(10)] 10000 loops, best of 3: 18.6 us per loop In [10]: %timeit [a[i,j] for i in xrange(10) for j in xrange(10)] 10000 loops, best of 3: 44.4 us per loop In [11]: %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)] 10000 loops, best of 3: 34.2 us per loop -Travis On Jun 22, 2012, at 8:48 AM, Benjamin Root wrote: > > > On Fri, Jun 22, 2012 at 9:42 AM, eat wrote: > Hi, > > On Fri, Jun 22, 2012 at 7:51 AM, Gael Varoquaux wrote: > On Thu, Jun 21, 2012 at 08:59:09PM -0400, Benjamin Root wrote: > > > munkres seems to be a pure python implementation ;-). > > > Oops! I could have sworn that I once tried one named munkres that used > > numpy. But that was several years ago. > > > There is a development branch of sk-learn with an implementation of the > > hungarian assignment solver using numpy. It will even do non-square > > matrices and matrices with an empty dimension. > > Yes, absolutely, thanks to Ben: > https://github.com/GaelVaroquaux/scikit-learn/blob/hungarian/sklearn/utils/hungarian.py > I never merged this in the main scikit-learn tree, because munkres is not > used so far. Maybe I should merge it in the main tree, or maybe it should > be added to scipy or numpy. > I made some simple timing comparisons (see attached picture) between numpy based hungarian and pure python shortest path based hungarian_sp. It seems that pure python based implementation outperforms numpy based implementation. Timings are averaged over five runs. > > The difference cannot totally be explained by different algorithms (although shortest path based seem to scale better). Rather the heavy access to rows and columns seem to favor list of lists. So this type of algorithms may indeed be real challenges for numpy. > > > eat, > > Thanks for that analysis. Personally, I never needed high-performance so I never bothered to optimize it. However, it does appear that there is an order-of-magnitude difference between the two, and so it might be worth it to see what can be done to fix that. > > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Jun 22 11:05:19 2012 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 22 Jun 2012 11:05:19 -0400 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! 
In-Reply-To: References: <20120622045150.GA13499@phare.normalesup.org> Message-ID: On Fri, Jun 22, 2012 at 10:25 AM, Travis Oliphant wrote: > Accessing individual elements of NumPy arrays is slower than accessing > individual elements of lists --- around 2.5x-3x slower. NumPy has to do > more work to figure out what kind of indexing you are trying to do because > of its flexibility. It also has to create the Python object to return. > In contrast, the list approach already has the Python objects created and > you are just returning pointers to them and there is much less flexibility > in the kinds of indexing you can do. > > Simple timings show that a.item(i,j) is about 2x slower than list element > access (but faster than a[i,j] which is about 2.5x to 3x slower). The > slowness of a.item is due to the need to create the Python object to return > (there are just raw bytes there) so it gives some idea of the relative cost > of each part of the slowness of a[i,j]. > > Also, math on the array scalars returned from NumPy will be slower than > math on integers and floats --- because NumPy re-uses the ufunc machinery > which is not optimized at all for scalars. > > The take-away is that NumPy is built for doing vectorized operations on > bytes of data. It is not optimized for doing element-by-element > individual access. The right approach there is to just use lists (or use > a version specialized for the kind of data in the lists that removes the > boxing and unboxing). > > Here are my timings using IPython for NumPy indexing: > > 1-D: > > In[2]: a = arange(100) > > In [3]: %timeit [a.item(i) for i in xrange(100)] > 10000 loops, best of 3: 25.6 us per loop > > In [4]: %timeit [a[i] for i in xrange(100)] > 10000 loops, best of 3: 31.8 us per loop > > In [5]: al = a.tolist() > > In [6]: %timeit [al[i] for i in xrange(100)] > 100000 loops, best of 3: 10.6 us per loop > > > > 2-D: > > In [7]: a = arange(100).reshape(10,10) > > In [8]: al = a.tolist() > > In [9]: %timeit [al[i][j] for i in xrange(10) for j in xrange(10)] > 10000 loops, best of 3: 18.6 us per loop > > In [10]: %timeit [a[i,j] for i in xrange(10) for j in xrange(10)] > 10000 loops, best of 3: 44.4 us per loop > > In [11]: %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)] > 10000 loops, best of 3: 34.2 us per loop > > > > -Travis > > However, what is the timing/memory cost of converting a large numpy array that already exists into python list of lists? If all my processing before the munkres step is using NumPy, converting it into python lists has a cost. Also, your timings indicate only ~2x slowdown, while the timing tests done by eat show an order-of-magnitude difference. I suspect there is great room for improvement before even starting to worry about the array access issues. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Jun 22 11:13:12 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 22 Jun 2012 10:13:12 -0500 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: References: <20120622045150.GA13499@phare.normalesup.org> Message-ID: <70872533-10C5-468B-861F-27A08B69C4F1@continuum.io> > > > -Travis > > > However, what is the timing/memory cost of converting a large numpy array that already exists into python list of lists? If all my processing before the munkres step is using NumPy, converting it into python lists has a cost. 
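A rough way to measure that round-trip cost for a concrete size (a hypothetical sketch, with an arbitrary 500x500 random matrix standing in for the real cost matrix):

    import time
    import numpy as np

    cost = np.random.rand(500, 500)      # placeholder cost matrix

    t0 = time.time()
    rows = cost.tolist()                 # box every element into Python floats, once
    t1 = time.time()
    back = np.array(rows)                # and convert back after the pure-Python step
    t2 = time.time()

    # Both conversions are O(n**2), so against an O(n**3) assignment solver
    # they amount to a small, one-off overhead at the boundary.
    print("tolist: %.4f s, np.array: %.4f s" % (t1 - t0, t2 - t1))
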
Also, your timings indicate only ~2x slowdown, while the timing tests done by eat show an order-of-magnitude difference. I suspect there is great room for improvement before even starting to worry about the array access issues. > If you are also doing scalar math with the results returned from NumPy array element access, then a.item(i,j) will be faster because it returns a Python object which will use it's scalar math instead of re-using vectorized math operations which a[i,j] will do. I haven't looked at the code yet, just re-emphasizing known issues with trying to use NumPy as an arbitrary container of "elements" rather than a container of bytes that you do vectorized operations on. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tom.KACVINSKY at 3ds.com Fri Jun 22 11:16:31 2012 From: Tom.KACVINSKY at 3ds.com (KACVINSKY Tom) Date: Fri, 22 Jun 2012 15:16:31 +0000 Subject: [Numpy-discussion] Building numpy with MSVC 2010 and Intel Fortran 10.x (and higher) Message-ID: I have reason to build Python 2.6.8 and numpy 1.4.1 with MSVC 2010 and Intel Fortran 10.1 (and higher). I also am building with MKL 10.3. So far, I am able to get the setup to recognize the MKL libraries: C:\Users\tky\Python\numpy-1.6.2>python setup.py build --compiler=msvc --fcompiler=intel Running from numpy source directory.Forcing DISTUTILS_USE_SDK=1 F2PY Version 2 blas_opt_info: blas_mkl_info: FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] lapack_opt_info: lapack_mkl_info: mkl_info: FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll mkl_intel_lp64_dll mkl_core_dll', 'mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll mkl_intel_lp64_dll mkl_core_dll', 'mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] But where things are failing is during the compiler checks: customize IntelFCompiler Found executable C:\Program Files (x86)\Intel\Compiler\Fortran\10.1.011\EM64T\bin\ifort.exe Found executable C:\Program Files (x86)\Intel\Compiler\Fortran\10.1.011\EM64T\bin\ifort.exe Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Users\tky\Python\release\lib\distutils\core.py", line 152, in setup dist.run_commands() File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 975, in run_commands self.run_command(cmd) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File 
"C:\Users\tky\Python\release\lib\distutils\command\build.py", line 134, in run self.run_command(cmd_name) File "C:\Users\tky\Python\release\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 694, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "C:\Users\tky\Python\release\lib\distutils\command\config.py", line 257, in try_link self._check_compiler() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\config.py", line 77, in _check_compiler self.fcompiler.customize(self.distribution) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 502, in customize get_flags('arch', aflags) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 491, in get_flags flags.extend(getattr(self.flag_vars, tag)) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\environment.py", line 37, in __getattr__ return self._get_var(name, conf_desc) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\environment.py", line 51, in _get_var var = self._hook_handler(name, hook) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 698, in _environment_hook return hook() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\intel.py", line 63, in get_flags_arch v = self.get_version() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 432, in get_version raise CompilerNotFound() numpy.distutils.fcompiler.CompilerNotFound Any ideas on how to circumvent this? Thanks, Tom This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email. For other languages, go to http://www.3ds.com/terms/email-disclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tom.KACVINSKY at 3ds.com Fri Jun 22 11:21:59 2012 From: Tom.KACVINSKY at 3ds.com (KACVINSKY Tom) Date: Fri, 22 Jun 2012 15:21:59 +0000 Subject: [Numpy-discussion] Building numpy with MSVC 2010 and Intel Fortran 10.x (and higher) In-Reply-To: References: Message-ID: Ooops. The message about the Fortran compiler version comes from building numpy 1.6.1. The problem I am seeing with numpy 1.4.1 follows. 
No module named msvccompiler in numpy.distutils; trying from distutils customize IntelFCompiler Found executable c:\Program Files (x86)\Intel\Composer XE\bin\intel64\ifort.exe Found executable c:\Program Files (x86)\Intel\Composer XE\bin\intel64\ifort.exe C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Inumpy\core\src\private -Inump y\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\include -IC:\Users\tky\Pyth on\release\include -IC:\Users\tky\Python\release\PC /Tc_configtest.c /Fo_configtest.obj Found executable C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\cl.exe C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\link.exe /nologo /INCREMENTAL:NO _configtest.obj /OUT:_configtest.exe /MANI FESTFILE:_configtest.exe.manifest Found executable C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\link.exe mt.exe -nologo -manifest _configtest.exe.manifest -outputresource:_configtest.exe;1 Found executable C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\x64\mt.exe _configtest.exe.manifest : general error c1010070: Failed to load and parse the manifest. The system cannot find the file specified. failure. removing: _configtest.c _configtest.obj Traceback (most recent call last): File "setup.py", line 187, in setup_package() File "setup.py", line 180, in setup_package configuration=configuration ) File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Users\tky\Python\release\lib\distutils\core.py", line 152, in setup dist.run_commands() File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 975, in run_commands self.run_command(cmd) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "C:\Users\tky\Python\release\lib\distutils\command\build.py", line 134, in run self.run_command(cmd_name) File "C:\Users\tky\Python\release\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 657, in get_mathlib_info raise RuntimeError("Broken toolchain: cannot link a simple C program") RuntimeError: Broken toolchain: cannot link a simple C program From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of KACVINSKY Tom Sent: Friday, June 22, 2012 11:17 AM To: 'numpy-discussion at scipy.org' Subject: [Numpy-discussion] Building numpy with MSVC 2010 and Intel Fortran 10.x (and higher) I have reason to build Python 2.6.8 and numpy 1.4.1 with MSVC 2010 and Intel Fortran 10.1 (and higher). I also am building with MKL 10.3. 
So far, I am able to get the setup to recognize the MKL libraries: C:\Users\tky\Python\numpy-1.6.2>python setup.py build --compiler=msvc --fcompiler=intel Running from numpy source directory.Forcing DISTUTILS_USE_SDK=1 F2PY Version 2 blas_opt_info: blas_mkl_info: FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] lapack_opt_info: lapack_mkl_info: mkl_info: FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll mkl_intel_lp64_dll mkl_core_dll', 'mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll mkl_intel_lp64_dll mkl_core_dll', 'mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] But where things are failing is during the compiler checks: customize IntelFCompiler Found executable C:\Program Files (x86)\Intel\Compiler\Fortran\10.1.011\EM64T\bin\ifort.exe Found executable C:\Program Files (x86)\Intel\Compiler\Fortran\10.1.011\EM64T\bin\ifort.exe Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Users\tky\Python\release\lib\distutils\core.py", line 152, in setup dist.run_commands() File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 975, in run_commands self.run_command(cmd) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "C:\Users\tky\Python\release\lib\distutils\command\build.py", line 134, in run self.run_command(cmd_name) File "C:\Users\tky\Python\release\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 694, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "C:\Users\tky\Python\release\lib\distutils\command\config.py", line 257, in try_link self._check_compiler() File 
"C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\config.py", line 77, in _check_compiler self.fcompiler.customize(self.distribution) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 502, in customize get_flags('arch', aflags) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 491, in get_flags flags.extend(getattr(self.flag_vars, tag)) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\environment.py", line 37, in __getattr__ return self._get_var(name, conf_desc) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\environment.py", line 51, in _get_var var = self._hook_handler(name, hook) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 698, in _environment_hook return hook() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\intel.py", line 63, in get_flags_arch v = self.get_version() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 432, in get_version raise CompilerNotFound() numpy.distutils.fcompiler.CompilerNotFound Any ideas on how to circumvent this? Thanks, Tom This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email. For other languages, go to http://www.3ds.com/terms/email-disclaimer This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email. For other languages, go to http://www.3ds.com/terms/email-disclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From e.antero.tammi at gmail.com Fri Jun 22 12:12:56 2012 From: e.antero.tammi at gmail.com (eat) Date: Fri, 22 Jun 2012 19:12:56 +0300 Subject: [Numpy-discussion] Good way to develop numpy as popular choice! In-Reply-To: References: <20120622045150.GA13499@phare.normalesup.org> Message-ID: Hi, On Fri, Jun 22, 2012 at 6:05 PM, Benjamin Root wrote: > > > On Fri, Jun 22, 2012 at 10:25 AM, Travis Oliphant wrote: > >> Accessing individual elements of NumPy arrays is slower than accessing >> individual elements of lists --- around 2.5x-3x slower. NumPy has to do >> more work to figure out what kind of indexing you are trying to do because >> of its flexibility. It also has to create the Python object to return. >> In contrast, the list approach already has the Python objects created and >> you are just returning pointers to them and there is much less flexibility >> in the kinds of indexing you can do. >> >> Simple timings show that a.item(i,j) is about 2x slower than list element >> access (but faster than a[i,j] which is about 2.5x to 3x slower). 
The >> slowness of a.item is due to the need to create the Python object to return >> (there are just raw bytes there) so it gives some idea of the relative cost >> of each part of the slowness of a[i,j]. >> >> Also, math on the array scalars returned from NumPy will be slower than >> math on integers and floats --- because NumPy re-uses the ufunc machinery >> which is not optimized at all for scalars. >> >> The take-away is that NumPy is built for doing vectorized operations on >> bytes of data. It is not optimized for doing element-by-element >> individual access. The right approach there is to just use lists (or use >> a version specialized for the kind of data in the lists that removes the >> boxing and unboxing). >> >> Here are my timings using IPython for NumPy indexing: >> >> 1-D: >> >> In[2]: a = arange(100) >> >> In [3]: %timeit [a.item(i) for i in xrange(100)] >> 10000 loops, best of 3: 25.6 us per loop >> >> In [4]: %timeit [a[i] for i in xrange(100)] >> 10000 loops, best of 3: 31.8 us per loop >> >> In [5]: al = a.tolist() >> >> In [6]: %timeit [al[i] for i in xrange(100)] >> 100000 loops, best of 3: 10.6 us per loop >> >> >> >> 2-D: >> >> In [7]: a = arange(100).reshape(10,10) >> >> In [8]: al = a.tolist() >> >> In [9]: %timeit [al[i][j] for i in xrange(10) for j in xrange(10)] >> 10000 loops, best of 3: 18.6 us per loop >> >> In [10]: %timeit [a[i,j] for i in xrange(10) for j in xrange(10)] >> 10000 loops, best of 3: 44.4 us per loop >> >> In [11]: %timeit [a.item(i,j) for i in xrange(10) for j in xrange(10)] >> 10000 loops, best of 3: 34.2 us per loop >> >> >> >> -Travis >> >> > However, what is the timing/memory cost of converting a large numpy array > that already exists into python list of lists? If all my processing before > the munkres step is using NumPy, converting it into python lists has a > cost. Also, your timings indicate only ~2x slowdown, while the timing > tests done by eat show an order-of-magnitude difference. I suspect there > is great room for improvement before even starting to worry about the array > access issues. > To create list of list from array is quite fast, like In []: A= rand(500, 500) In []: %timeit A.tolist() 100 loops, best of 3: 10.8 ms per loop Regards, -eat > > Cheers! > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Fri Jun 22 13:29:42 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 22 Jun 2012 19:29:42 +0200 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: <3C64F2BE-50E5-403C-9022-71233A6E3449@continuum.io> Message-ID: On Fri, Jun 22, 2012 at 9:49 AM, Thouis (Ray) Jones wrote: > On Mon, Jun 4, 2012 at 7:43 PM, Travis Oliphant > wrote: > > I have turned on issue tracking and started a few labels. Feel free to > add > > more / adjust the names as appropriate. I am trying to find someone > who > > can help manage the migration from Trac. > > Are the github issues set up sufficiently for Trac to be disabled and > github to take over? You lost me here. You were going to set up a test site where we could see the Trac --> Github conversion could be tested, before actually pushing that conversion to the numpy Github repo. If you sent a message that that was ready, I must have missed it. 
The current state of labels on https://github.com/numpy/numpy/issues is also far from complete (no prios, components). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tom.KACVINSKY at 3ds.com Fri Jun 22 15:41:54 2012 From: Tom.KACVINSKY at 3ds.com (KACVINSKY Tom) Date: Fri, 22 Jun 2012 19:41:54 +0000 Subject: [Numpy-discussion] Building numpy with MSVC 2010 and Intel Fortran 10.x (and higher) In-Reply-To: References: Message-ID: I found the problem. It was a missing /manifest option in the distutils bundled with Python 2.6 (how this ever worked without my patch I'll never understand). Anyway, I now have numpy built with MKL 10.3, MSVC 2010, and Intel Fortran 11. What I need to know is how to verify that the linear algebra routines are using MKL without benchmarking numpy. Thoughts on the matter? From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of KACVINSKY Tom Sent: Friday, June 22, 2012 11:22 AM To: 'numpy-discussion at scipy.org' Subject: Re: [Numpy-discussion] Building numpy with MSVC 2010 and Intel Fortran 10.x (and higher) Ooops. The message about the Fortran compiler version comes from building numpy 1.6.1. The problem I am seeing with numpy 1.4.1 follows. No module named msvccompiler in numpy.distutils; trying from distutils customize IntelFCompiler Found executable c:\Program Files (x86)\Intel\Composer XE\bin\intel64\ifort.exe Found executable c:\Program Files (x86)\Intel\Composer XE\bin\intel64\ifort.exe C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Inumpy\core\src\private -Inump y\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray -Inumpy\core\src\umath -Inumpy\core\include -IC:\Users\tky\Pyth on\release\include -IC:\Users\tky\Python\release\PC /Tc_configtest.c /Fo_configtest.obj Found executable C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\cl.exe C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\link.exe /nologo /INCREMENTAL:NO _configtest.obj /OUT:_configtest.exe /MANI FESTFILE:_configtest.exe.manifest Found executable C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\link.exe mt.exe -nologo -manifest _configtest.exe.manifest -outputresource:_configtest.exe;1 Found executable C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\bin\x64\mt.exe _configtest.exe.manifest : general error c1010070: Failed to load and parse the manifest. The system cannot find the file specified. failure. 
removing: _configtest.c _configtest.obj Traceback (most recent call last): File "setup.py", line 187, in setup_package() File "setup.py", line 180, in setup_package configuration=configuration ) File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Users\tky\Python\release\lib\distutils\core.py", line 152, in setup dist.run_commands() File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 975, in run_commands self.run_command(cmd) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "C:\Users\tky\Python\release\lib\distutils\command\build.py", line 134, in run self.run_command(cmd_name) File "C:\Users\tky\Python\release\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 657, in get_mathlib_info raise RuntimeError("Broken toolchain: cannot link a simple C program") RuntimeError: Broken toolchain: cannot link a simple C program From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of KACVINSKY Tom Sent: Friday, June 22, 2012 11:17 AM To: 'numpy-discussion at scipy.org' Subject: [Numpy-discussion] Building numpy with MSVC 2010 and Intel Fortran 10.x (and higher) I have reason to build Python 2.6.8 and numpy 1.4.1 with MSVC 2010 and Intel Fortran 10.1 (and higher). I also am building with MKL 10.3. 
So far, I am able to get the setup to recognize the MKL libraries: C:\Users\tky\Python\numpy-1.6.2>python setup.py build --compiler=msvc --fcompiler=intel Running from numpy source directory.Forcing DISTUTILS_USE_SDK=1 F2PY Version 2 blas_opt_info: blas_mkl_info: FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] lapack_opt_info: lapack_mkl_info: mkl_info: FOUND: libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll mkl_intel_lp64_dll mkl_core_dll', 'mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] FOUND: libraries = ['mkl_sequential_dll mkl_intel_lp64_dll mkl_core_dll', 'mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] library_dirs = ['u:\\users\\tky\\mkllib'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['u:\\users\\tky\\mklinc'] But where things are failing is during the compiler checks: customize IntelFCompiler Found executable C:\Program Files (x86)\Intel\Compiler\Fortran\10.1.011\EM64T\bin\ifort.exe Found executable C:\Program Files (x86)\Intel\Compiler\Fortran\10.1.011\EM64T\bin\ifort.exe Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Users\tky\Python\release\lib\distutils\core.py", line 152, in setup dist.run_commands() File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 975, in run_commands self.run_command(cmd) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build.py", line 37, in run old_build.run(self) File "C:\Users\tky\Python\release\lib\distutils\command\build.py", line 134, in run self.run_command(cmd_name) File "C:\Users\tky\Python\release\lib\distutils\cmd.py", line 333, in run_command self.distribution.run_command(command) File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in run_command cmd_obj.run() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 152, in run self.build_sources() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 163, in build_sources self.build_library_sources(*libname_info) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 298, in build_library_sources sources = self.generate_sources(sources, (lib_name, build_info)) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line 385, in generate_sources source = func(extension, build_dir) File "numpy\core\setup.py", line 694, in get_mathlib_info st = config_cmd.try_link('int main(void) { return 0;}') File "C:\Users\tky\Python\release\lib\distutils\command\config.py", line 257, in try_link self._check_compiler() File 
"C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\config.py", line 77, in _check_compiler self.fcompiler.customize(self.distribution) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 502, in customize get_flags('arch', aflags) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 491, in get_flags flags.extend(getattr(self.flag_vars, tag)) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\environment.py", line 37, in __getattr__ return self._get_var(name, conf_desc) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\environment.py", line 51, in _get_var var = self._hook_handler(name, hook) File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 698, in _environment_hook return hook() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\intel.py", line 63, in get_flags_arch v = self.get_version() File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", line 432, in get_version raise CompilerNotFound() numpy.distutils.fcompiler.CompilerNotFound Any ideas on how to circumvent this? Thanks, Tom This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email. For other languages, go to http://www.3ds.com/terms/email-disclaimer This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email. For other languages, go to http://www.3ds.com/terms/email-disclaimer This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged. If you are not one of the named recipients or have received this email in error, (i) you should not read, disclose, or copy it, (ii) please notify sender of your receipt by reply email and delete this email and all attachments, (iii) Dassault Systemes does not accept or assume any liability or responsibility for any use of or reliance on this email. For other languages, go to http://www.3ds.com/terms/email-disclaimer -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Fri Jun 22 16:42:19 2012 From: travis at continuum.io (Travis Oliphant) Date: Fri, 22 Jun 2012 15:42:19 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: > > The usual practice is to announce a schedule first. I just did announce the schedule. > > There is time before the first Release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the Pull requests, Ondrej can make sure they are added to the 1.7.x. 
branch by Monday. We can also delay the first Release Candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release. > > What are we going to do for 1.8? Let's get 1.7 out the door first. > > Yes, the functions will give warnings otherwise. I think this needs to be revisited. I don't think these changes are necessary for *every* use of macros. It can cause a lot of effort for people downstream without concrete benefit. > > That's not as nice to type. > > So? The point is to have correctness, not ease of typing. I'm not sure if a pun was intended there or not. C is not a safe and fully-typed system. That is one of its weaknesses according to many. But, I would submit that not being forced to give everything a "type" (and recognizing the tradeoffs that implies) is also one reason it gets used. > > Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....). That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism. > > That's a disadvantage of Python. The virtue of inline functions is precisely type checking. Right, but we need to be more conscientious about this. Not every use of Macros should be replaced by inline function calls and the requisite *forced* type-checking. type-chekcing is not *universally* a virtue --- if it were, nobody would use Python. > > I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason. > > These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users. > > Good style and type checking are useful. Numpy needs more of both. You can assert it, but it doesn't make it so. "Good style" depends on what you are trying to accomplish and on your point of view. NumPy's style is not the product of one person, it's been adapted from multiple styles and inherits quite a bit from Python's style. I don't make any claims for it other than it allowed me to write it with the time and experience I had 7 years ago. We obviously disagree about this point. I'm sorry about that. I'm pretty flexible usually --- that's probably one of your big criticisms of my "style". But, one of the things I feel quite strongly about is how hard we make it for NumPy users to upgrade. There are two specific things I disagree with pretty strongly: 1) Changing defined macros that should work the same on PyArrayObjects or PyObjects to now *require* types --- if we want to introduce new macros that require types than we can --- as long as it just provides warnings but still compiles then I suppose I could find this acceptable. 2) Changing MACROS to require semicolons when they were previously not needed. I'm going to be very hard-nosed about this one. > > I'm going to be a lot more resistant to that sort of change in the code base when I see it. 
> > Numpy is a team effort. There are people out there who write better code than you do, you should learn from them. Exactly! It's a team effort. I'm part of that team as well, and while I don't always have strong opinions about things. When I do, I'm going to voice it. I've learned long ago there are people that write better code than me. There are people that write better code than you. That is not the question here at all. The question here is not requiring a *re-write* of code in order to get their extensions to compile using NumPy headers. We should not be making people change their code to get their extensions to compile in NumPy 1.X > > > One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than changing the macros that don't require semicolons to now require semicolons: > > NPY_BEGIN_THREADS_DEF > NPY_BEGIN_THREADS > NPY_ALLOW_C_API > NPY_ALLOW_C_API_DEF > NPY_DISABLE_C_API > > That feels like a gratuitous style change that will force users of those macros to re-write their code. > > It doesn't seem to be much of a problem. Unfortunately, I don't trust your judgment on that. My experience and understanding tells a much different story. I'm sorry if you disagree with me. > > Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C-code like this will require without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). If we want to define new macros that require colons, then we do that, but we can't get rid of the old ones --- especially in a 1.x release. > > Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows. > > > You use that word 'gratuitous' a lot, I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous, it was doing things in a way many found more intuitive and generalized better to different polynomial basis. People > have different ideas, that doesn't make them gratuitous. That's a slightly different issue. At least you created a new object and api which is a *little* better. My complaint about the choice there is now there *must* be two interfaces and added confusion as people will have to figure out which assumption is being used. I don't really care about the coefficient order --- really I don't. Either one is fine in my mind. I recognize the reasons. The problem is *changing* it without a *really* good reason. Now, we have to have two different APIs. I would much preferred to have poly1d disappear and just use your much nicer polynomial classes. Now, it can't and we are faced with a user-story that is either difficult for someone transitioning from MATLAB or a "why did you do that?" puzzled look from a new user as to why we support both coefficient orders. Of course, that could be our story --- hey we support all kinds of orders, it doesn't really matter, you just have to tell us what you mean when passing in an unadorned array of coefficients. 
But, this is a different issue. I'm using the word 'gratuitous' to mean that it is "uncalled for and lacks a good reason". There needs to be much better reasons given for code changes that require someone to re-write working code than "it's better style" or even "it will help new programmers avoid errors". Let's write another interface that new programmers can use that fits the world the way you see it, don't change what's already working just because you don't like it or wish a different choice had been made. > > There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile. > > > Cleanups need to be made bit by bit. I don't think we have done anything that will cause undo trouble. I disagree substantially on the impact of these changes. You can disagree about my awareness of NumPy users, but I think I understand a large number of them and why NumPy has been successful in getting users. I agree that we have been unsuccessful at getting serious developers and I'm convinced by you and Mark as to why that is. But, we can't sacrifice users for the sake of getting developers who will spend their free time trying to get around the organic pile that NumPy is at this point. Because of this viewpoint, I think there is some adaptation and cleanup right now, needed, so that significant users of NumPy can upgrade based on the changes that have occurred without causing them annoying errors (even simple changes can be a pain in the neck to fix). I do agree changes can be made. I realize you've worked hard to keep the code-base in a state that you find more adequate. I think you go overboard on that front, but I acknowledge that there are people that appreciate this. I do feel very strongly that we should not require users to have to re-write working C-code in order to use a new minor version number in NumPy, regardless of how the code "looks" or how much "better" it is according to some idealized standard. The macro changes are border-line (at least I believe code will still compile --- just raise warnings, but I need to be sure about this). The changes that require semi-colons are not acceptable at all. Look Charles, I believe we can continue to work productively together and our differences can be a strength to the community. I hope you feel the same way. I will continue to respect and listen to your perspective --- especially when I disagree with it. -Travis -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ischnell at enthought.com Fri Jun 22 17:04:59 2012 From: ischnell at enthought.com (Ilan Schnell) Date: Fri, 22 Jun 2012 16:04:59 -0500 Subject: [Numpy-discussion] Building numpy with MSVC 2010 and Intel Fortran 10.x (and higher) In-Reply-To: References: Message-ID: It you link dynamically to MKL (vs statically to ATLAS), the easiest way is to look at the size of the C extension numpy/linalg/lapack_lite.pyd I have (on 32-bit Windows) 24KB when linking to MKL, as supposed to 2.3MB when linking statically to ATLAS, where all lapack symbols are linked in. - Ilan On Fri, Jun 22, 2012 at 2:41 PM, KACVINSKY Tom wrote: > I found the problem.? It was a missing /manifest option in the distutils > bundled with Python 2.6 (how this ever worked without my patch I'll never > understand).? ?Anyway, I now have numpy built with MKL 10.3, MSVC 2010, and > Intel Fortran 11.? What I need to know is how to verify that the linear > algebra routines are using MKL without benchmarking numpy.? Thoughts on the > matter? > > > > From: numpy-discussion-bounces at scipy.org > [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of KACVINSKY Tom > Sent: Friday, June 22, 2012 11:22 AM > To: 'numpy-discussion at scipy.org' > Subject: Re: [Numpy-discussion] Building numpy with MSVC 2010 and Intel > Fortran 10.x (and higher) > > > > Ooops. The message about the Fortran compiler version comes from building > numpy 1.6.1.? The problem I am seeing with numpy 1.4.1 follows. > > > > No module named msvccompiler in numpy.distutils; trying from distutils > > customize IntelFCompiler > > Found executable c:\Program Files (x86)\Intel\Composer > XE\bin\intel64\ifort.exe > > Found executable c:\Program Files (x86)\Intel\Composer > XE\bin\intel64\ifort.exe > > C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\cl.exe /c > /nologo /Ox /MD /W3 /GS- /DNDEBUG -Inumpy\core\src\private -Inump > > y\core\src -Inumpy\core -Inumpy\core\src\npymath -Inumpy\core\src\multiarray > -Inumpy\core\src\umath -Inumpy\core\include -IC:\Users\tky\Pyth > > on\release\include -IC:\Users\tky\Python\release\PC /Tc_configtest.c > /Fo_configtest.obj > > Found executable C:\Program Files (x86)\Microsoft Visual Studio > 10.0\VC\BIN\amd64\cl.exe > > C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\BIN\amd64\link.exe > /nologo /INCREMENTAL:NO _configtest.obj /OUT:_configtest.exe /MANI > > FESTFILE:_configtest.exe.manifest > > Found executable C:\Program Files (x86)\Microsoft Visual Studio > 10.0\VC\BIN\amd64\link.exe > > mt.exe -nologo -manifest _configtest.exe.manifest > -outputresource:_configtest.exe;1 > > Found executable C:\Program Files (x86)\Microsoft > SDKs\Windows\v7.0A\bin\x64\mt.exe > > > > _configtest.exe.manifest : general error c1010070: Failed to load and parse > the manifest. The system cannot find the file specified. > > failure. > > removing: _configtest.c _configtest.obj > > Traceback (most recent call last): > > ? File "setup.py", line 187, in > > ??? setup_package() > > ? File "setup.py", line 180, in setup_package > > ??? configuration=configuration ) > > ? File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\core.py", line 186, > in setup > > ??? return old_setup(**new_attr) > > ? File "C:\Users\tky\Python\release\lib\distutils\core.py", line 152, in > setup > > ??? dist.run_commands() > > ? File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 975, in > run_commands > > ??? self.run_command(cmd) > > ? File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in > run_command > > ??? 
cmd_obj.run() > > ? File "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build.py", > line 37, in run > > ??? old_build.run(self) > > ? File "C:\Users\tky\Python\release\lib\distutils\command\build.py", line > 134, in run > > ??? self.run_command(cmd_name) > > ? File "C:\Users\tky\Python\release\lib\distutils\cmd.py", line 333, in > run_command > > ??? self.distribution.run_command(command) > > ? File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in > run_command > > ??? cmd_obj.run() > > ? File > "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line > 152, in run > > ??? self.build_sources() > > ? File > "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line > 163, in build_sources > > ??? self.build_library_sources(*libname_info) > > ? File > "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line > 298, in build_library_sources > > ??? sources = self.generate_sources(sources, (lib_name, build_info)) > > ? File > "C:\Users\tky\Python\numpy-1.4.1\numpy\distutils\command\build_src.py", line > 385, in generate_sources > > ??? source = func(extension, build_dir) > > ? File "numpy\core\setup.py", line 657, in get_mathlib_info > > ??? raise RuntimeError("Broken toolchain: cannot link a simple C program") > > RuntimeError: Broken toolchain: cannot link a simple C program > > > > > > > > From: numpy-discussion-bounces at scipy.org > [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of KACVINSKY Tom > Sent: Friday, June 22, 2012 11:17 AM > To: 'numpy-discussion at scipy.org' > Subject: [Numpy-discussion] Building numpy with MSVC 2010 and Intel Fortran > 10.x (and higher) > > > > I have reason to build Python 2.6.8 and numpy 1.4.1 with MSVC 2010 and Intel > Fortran 10.1 (and higher).? I also am building with MKL 10.3. > > > > So far, I am able to get the setup to recognize the MKL libraries: > > > > C:\Users\tky\Python\numpy-1.6.2>python setup.py build --compiler=msvc > --fcompiler=intel > > Running from numpy source directory.Forcing DISTUTILS_USE_SDK=1 > > F2PY Version 2 > > blas_opt_info: > > blas_mkl_info: > > ? FOUND: > > ??? libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] > > ??? library_dirs = ['u:\\users\\tky\\mkllib'] > > ??? define_macros = [('SCIPY_MKL_H', None)] > > ??? include_dirs = ['u:\\users\\tky\\mklinc'] > > > > ? FOUND: > > ??? libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] > > ??? library_dirs = ['u:\\users\\tky\\mkllib'] > > ??? define_macros = [('SCIPY_MKL_H', None)] > > ??? include_dirs = ['u:\\users\\tky\\mklinc'] > > > > lapack_opt_info: > > lapack_mkl_info: > > mkl_info: > > ? FOUND: > > ??? libraries = ['mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] > > ??? library_dirs = ['u:\\users\\tky\\mkllib'] > > ??? define_macros = [('SCIPY_MKL_H', None)] > > ??? include_dirs = ['u:\\users\\tky\\mklinc'] > > > > ? FOUND: > > ??? libraries = ['mkl_sequential_dll mkl_intel_lp64_dll mkl_core_dll', > 'mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] > > ??? library_dirs = ['u:\\users\\tky\\mkllib'] > > ??? define_macros = [('SCIPY_MKL_H', None)] > > ??? include_dirs = ['u:\\users\\tky\\mklinc'] > > > > ? FOUND: > > ??? libraries = ['mkl_sequential_dll mkl_intel_lp64_dll mkl_core_dll', > 'mkl_sequential_dll', 'mkl_intel_lp64_dll', 'mkl_core_dll'] > > ??? library_dirs = ['u:\\users\\tky\\mkllib'] > > ??? define_macros = [('SCIPY_MKL_H', None)] > > ??? 
include_dirs = ['u:\\users\\tky\\mklinc'] > > > > But where things are failing is during the compiler checks: > > > > customize IntelFCompiler > > Found executable C:\Program Files > (x86)\Intel\Compiler\Fortran\10.1.011\EM64T\bin\ifort.exe > > Found executable C:\Program Files > (x86)\Intel\Compiler\Fortran\10.1.011\EM64T\bin\ifort.exe > > Traceback (most recent call last): > > ? File "setup.py", line 214, in > > ??? setup_package() > > ? File "setup.py", line 207, in setup_package > > ??? configuration=configuration ) > > ? File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\core.py", line 186, > in setup > > ??? return old_setup(**new_attr) > > ? File "C:\Users\tky\Python\release\lib\distutils\core.py", line 152, in > setup > > ??? dist.run_commands() > > ? File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 975, in > run_commands > > ??? self.run_command(cmd) > > ? File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in > run_command > > ??? cmd_obj.run() > > ? File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build.py", > line 37, in run > > ??? old_build.run(self) > > ? File "C:\Users\tky\Python\release\lib\distutils\command\build.py", line > 134, in run > > ??? self.run_command(cmd_name) > > ? File "C:\Users\tky\Python\release\lib\distutils\cmd.py", line 333, in > run_command > > ??? self.distribution.run_command(command) > > ? File "C:\Users\tky\Python\release\lib\distutils\dist.py", line 995, in > run_command > > ??? cmd_obj.run() > > ? File > "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line > 152, in run > > ??? self.build_sources() > > ? File > "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line > 163, in build_sources > > ??? self.build_library_sources(*libname_info) > > ? File > "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line > 298, in build_library_sources > > ??? sources = self.generate_sources(sources, (lib_name, build_info)) > > ? File > "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\build_src.py", line > 385, in generate_sources > > ??? source = func(extension, build_dir) > > ? File "numpy\core\setup.py", line 694, in get_mathlib_info > > ??? st = config_cmd.try_link('int main(void) { return 0;}') > > ? File "C:\Users\tky\Python\release\lib\distutils\command\config.py", line > 257, in try_link > > ??? self._check_compiler() > > ? File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\command\config.py", > line 77, in _check_compiler > > ??? self.fcompiler.customize(self.distribution) > > ? File > "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", > line 502, in customize > > ??? get_flags('arch', aflags) > > ? File > "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", > line 491, in get_flags > > ??? flags.extend(getattr(self.flag_vars, tag)) > > ? File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\environment.py", > line 37, in __getattr__ > > ??? return self._get_var(name, conf_desc) > > ? File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\environment.py", > line 51, in _get_var > > ??? var = self._hook_handler(name, hook) > > ? File > "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", > line 698, in _environment_hook > > ??? return hook() > > ? File "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\intel.py", > line 63, in get_flags_arch > > ??? v = self.get_version() > > ? 
File > "C:\Users\tky\Python\numpy-1.6.2\numpy\distutils\fcompiler\__init__.py", > line 432, in get_version > > ??? raise CompilerNotFound() > > numpy.distutils.fcompiler.CompilerNotFound > > > > Any ideas on how to circumvent this? > > > > Thanks, > > > > Tom > > > > > > This email and any attachments are intended solely for the use of the > individual or entity to whom it is addressed and may be confidential and/or > privileged. > > If you are not one of the named recipients or have received this email in > error, > > (i) you should not read, disclose, or copy it, > > (ii) please notify sender of your receipt by reply email and delete this > email and all attachments, > > (iii) Dassault Systemes does not accept or assume any liability or > responsibility for any use of or reliance on this email. > > For other languages, go to http://www.3ds.com/terms/email-disclaimer > > This email and any attachments are intended solely for the use of the > individual or entity to whom it is addressed and may be confidential and/or > privileged. > > If you are not one of the named recipients or have received this email in > error, > > (i) you should not read, disclose, or copy it, > > (ii) please notify sender of your receipt by reply email and delete this > email and all attachments, > > (iii) Dassault Systemes does not accept or assume any liability or > responsibility for any use of or reliance on this email. > > For other languages, go to http://www.3ds.com/terms/email-disclaimer > > This email and any attachments are intended solely for the use of the > individual or entity to whom it is addressed and may be confidential and/or > privileged. > > If you are not one of the named recipients or have received this email in > error, > > (i) you should not read, disclose, or copy it, > > (ii) please notify sender of your receipt by reply email and delete this > email and all attachments, > > (iii) Dassault Systemes does not accept or assume any liability or > responsibility for any use of or reliance on this email. > > For other languages, go to http://www.3ds.com/terms/email-disclaimer > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From fperez.net at gmail.com Fri Jun 22 22:50:08 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 22 Jun 2012 19:50:08 -0700 Subject: [Numpy-discussion] Is IPython useful for your research/industry work? Feedback wanted for grant proposal. Message-ID: Hi folks, sorry for the cross-post, but I expect all replies to this to happen off-list. I'm in the process of writing an NSF grant that will partly include IPython support, and along with Brian we will soon be doing more of the same. In the past we haven't had the best of luck with the NFS, hopefully this time it will be better. I think one mistake we've made has been to have very little in the way of hard evidence of the value (if any) that IPython provides to the scientific work of others and to industry. So I would greatly appreciate if you can contact me off-list (best at Fernando.Perez at berkeley.edu) with any info that I could use in a typical NSF grant application. 
I'm not looking for marketing-type testimonials nor letters of support (the NSF frowns on those), but rather specific info, best if backed by journal citations, on how and where IPython plays an important role in your research or industry project (while the NSF is a science funding agency, it also has as part of its mission the economic well-being of the US). I'd also like to clarify that I'm not looking for quotes strictly of personal use as an interactive shell, since I know in this community most people do that. What I'm after are things like: - a research project that builds on IPython in some capacity - important results obtained with the IPython parallel machinery that were better/easier/whatever than a classical approach with other tools - uses of the notebook in education - anything else along these lines you can think of, that goes beyond pure personal shell use. Thanks! Again, in the interest of keeping list noise down, please reply directly to me: Fernando.Perez at berkeley.edu. f From charlesr.harris at gmail.com Fri Jun 22 23:14:05 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 22 Jun 2012 21:14:05 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Fri, Jun 22, 2012 at 2:42 PM, Travis Oliphant wrote: > > The usual practice is to announce a schedule first. > > > I just did announce the schedule. > > What has been done in the past is that an intent to fork is announced some two weeks in advance so that people can weigh in on what needs to be done before the fork. The immediate fork was a bit hasty. Likewise, when I suggested going to the github issue tracking, I opened a discussion on needed tags, but voila, there it was with an incomplete set and no discussion. That to seemed hasty. > > >> There is time before the first Release candidate to make changes on the >> 1.7.x branch. If you want to make the changes on master, and just >> indicate the Pull requests, Ondrej can make sure they are added to the >> 1.7.x. branch by Monday. We can also delay the first Release Candidate >> by a few days to next Wednesday and then bump everything 3 days if that >> will help. There will be a follow-on 1.8 release before the end of the >> year --- so there is time to make changes for that release as well. The >> next release will not take a year to get out, so we shouldn't feel >> pressured to get *everything* in this release. >> > > What are we going to do for 1.8? > > > Let's get 1.7 out the door first. > Mark proposed a schedule for the next several releases, I'd like to know if we are going to follow it. > > > Yes, the functions will give warnings otherwise. > > > I think this needs to be revisited. I don't think these changes are > necessary for *every* use of macros. It can cause a lot of effort for > people downstream without concrete benefit. > The idea is to slowly move towards hiding the innards of the array type. This has been under discussion since 1.3 came out. It is certainly the case that not all macros need to go away. > > > >> That's not as nice to type. >> > > So? The point is to have correctness, not ease of typing. > > > I'm not sure if a pun was intended there or not. C is not a safe and > fully-typed system. That is one of its weaknesses according to many. > But, I would submit that not being forced to give everything a "type" (and > recognizing the tradeoffs that implies) is also one reason it gets used. 
> C was famous for bugs due to the lack of function prototypes. This was fixed with C99 and the stricter typing was a great help. > > > > >> Is that assuming that PyArray_NDIM will become a function and need a >> specific object type for its argument (and everything else cast....). >> That's one clear disadvantage of inline functions versus macros in my mind: >> no automatic polymorphism. >> > > That's a disadvantage of Python. The virtue of inline functions is > precisely type checking. > > > Right, but we need to be more conscientious about this. Not every use of > Macros should be replaced by inline function calls and the requisite > *forced* type-checking. type-chekcing is not *universally* a virtue --- > if it were, nobody would use Python. > > > >> I don't think type safety is a big win for macros like these. We need >> to be more judicious about which macros are scheduled for function >> inlining. Some just don't benefit from the type-safety implications as >> much as others do, and you end up requiring everyone to change their code >> downstream for no real reason. >> >> These sorts of changes really feel to me like unnecessary spelling >> changes that require work from extension writers who now have to modify >> their code with no real gain. There seems to be a lot of that going on in >> the code base and I'm not really convinced that it's useful for end-users. >> > > Good style and type checking are useful. Numpy needs more of both. > > > You can assert it, but it doesn't make it so. "Good style" depends on > what you are trying to accomplish and on your point of view. NumPy's style > is not the product of one person, it's been adapted from multiple styles > and inherits quite a bit from Python's style. I don't make any claims for > it other than it allowed me to write it with the time and experience I had > 7 years ago. We obviously disagree about this point. I'm sorry about > that. I'm pretty flexible usually --- that's probably one of your big > criticisms of my "style". > Curiously, my criticism would be more that you are inflexible, slow to change old habits. > > But, one of the things I feel quite strongly about is how hard we make it > for NumPy users to upgrade. There are two specific things I disagree > with pretty strongly: > > 1) Changing defined macros that should work the same on PyArrayObjects or > PyObjects to now *require* types --- if we want to introduce new macros > that require types than we can --- as long as it just provides warnings but > still compiles then I suppose I could find this acceptable. > > 2) Changing MACROS to require semicolons when they were previously not > needed. I'm going to be very hard-nosed about this one. > > > >> I'm going to be a lot more resistant to that sort of change in the code >> base when I see it. >> > > Numpy is a team effort. There are people out there who write better code > than you do, you should learn from them. > > > Exactly! It's a team effort. I'm part of that team as well, and while I > don't always have strong opinions about things. When I do, I'm going to > voice it. > > I've learned long ago there are people that write better code than me. > There are people that write better code than you. > Of course. Writing code is not my profession, and even if it were, there are people out there who would be immeasurable better. I have tried to improve my style over the years by reading books and browsing code by people who are better than me. 
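[The exchange above about PyArray_NDIM, inline functions, and type checking is easier to see in a small, self-contained sketch. Everything below is a toy stand-in invented for illustration; the struct, macro, and function names are not NumPy's actual definitions. The point is only what the compiler can and cannot check under each spelling.]

#include <stdio.h>

/* Toy model of an array object; a stand-in for the real thing. */
typedef struct {
    int nd;                     /* number of dimensions */
} toy_array;

/* Macro style: accepts any pointer at all, because the cast silences the
 * compiler instead of informing it. */
#define TOY_NDIM(obj) (((toy_array *)(obj))->nd)

/* Inline-function style: the argument type is checked at compile time,
 * at the cost of an explicit cast when the caller only holds a generic
 * object pointer. */
static inline int
toy_ndim(const toy_array *arr)
{
    return arr->nd;
}

int main(void)
{
    toy_array a = { 3 };
    double not_an_array = 0.0;

    printf("%d\n", TOY_NDIM(&a));   /* works, no checking involved    */
    printf("%d\n", toy_ndim(&a));   /* works, and the type is checked */

    /* TOY_NDIM(&not_an_array) would compile silently and read garbage;
     * toy_ndim(&not_an_array) would draw a compiler diagnostic. Neither
     * call is made here, so the example stays runnable. */
    (void)not_an_array;
    return 0;
}

[Both positions in the thread are visible here: the macro is polymorphic and terse, while the function is checked but pickier about what it is handed.]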
I also recognize common bad habits naive coders tend to pick up when they start out, not least because I have at one time or another had many of the same bad habits. That is not the question here at all. The question here is not > requiring a *re-write* of code in order to get their extensions to compile > using NumPy headers. We should not be making people change their code to > get their extensions to compile in NumPy 1.X > I think a bit of rewrite here and there along the way is more palatable than a big change coming in as one big lump, especially if the changes are done with a long term goal in mind. We are working towards a Numpy 2, but we can't just go off for a year or two and write it, we have to get there step by step. And that requires a plan. > > > >> >> One particularly glaring example to my lens on the world: I think it >> would have been better to define new macros which require semicolons than >> changing the macros that don't require semicolons to now require >> semicolons: >> >> NPY_BEGIN_THREADS_DEF >> NPY_BEGIN_THREADS >> NPY_ALLOW_C_API >> NPY_ALLOW_C_API_DEF >> NPY_DISABLE_C_API >> >> That feels like a gratuitous style change that will force users of those >> macros to re-write their code. >> > > It doesn't seem to be much of a problem. > > > Unfortunately, I don't trust your judgment on that. My experience and > understanding tells a much different story. I'm sorry if you disagree > with me. > > I'm sorry I made you sorry ;) The problem here is that you don't come forth with specifics. People tell you things, but you don't say who or what their specific problem was. Part of working with a team is keeping folks informed, it isn't that useful to appeal to authority. I watch the list, which is admittedly a small window into the community, and I haven't seen show stoppers. Bugs, sure, but that isn't the same thing. > > >> Sure, it's a simple change, but it's a simple change that doesn't do >> anything for you as an end user. I think I'm going to back this change >> out, in fact. I can't see requiring people to change their C-code like >> this will require without a clear benefit to them. I'm quite sure there >> is code out there that uses these documented APIs (without the semicolon). >> If we want to define new macros that require colons, then we do that, but >> we can't get rid of the old ones --- especially in a 1.x release. >> >> Our policy should not be to allow gratuitous style changes just because >> we think something is prettier another way. The NumPy code base has come >> from multiple sources and reflects several styles. It also follows an >> older style of C-programming (that is quite common in the Python code >> base). It can be changed, but those changes shouldn't be painful for a >> library user without some specific gain for them that the change allows. >> >> > You use that word 'gratuitous' a lot, I don't think it means what you > think it means. For instance, the new polynomial coefficient order wasn't > gratuitous, it was doing things in a way many found more intuitive and > generalized better to different polynomial basis. People > > have different ideas, that doesn't make them gratuitous. > > > That's a slightly different issue. At least you created a new object > and api which is a *little* better. My complaint about the choice there > is now there *must* be two interfaces and added confusion as people will > have to figure out which assumption is being used. I don't really care > about the coefficient order --- really I don't. 
Either one is fine in my > mind. I recognize the reasons. The problem is *changing* it without a > *really* good reason. Now, we have to have two different APIs. I would > much preferred to have poly1d disappear and just use your much nicer > polynomial classes. Now, it can't and we are faced with a user-story > that is either difficult for someone transitioning from MATLAB > Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school, they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools. or a "why did you do that?" puzzled look from a new user as to why we > support both coefficient orders. Of course, that could be our story --- > hey we support all kinds of orders, it doesn't really matter, you just have > to tell us what you mean when passing in an unadorned array of > coefficients. But, this is a different issue. > > I'm using the word 'gratuitous' to mean that it is "uncalled for and lacks > a good reason". There needs to be much better reasons given for code > changes that require someone to re-write working code than "it's better > style" or even "it will help new programmers avoid errors". Let's write > another interface that new programmers can use that fits the world the way > you see it, don't change what's already working just because you don't like > it or wish a different choice had been made. > Well, and that was exactly what you meant when you called to coefficient order 'gratuitous' in your first post to me about it. The problem was that you didn't understand why I made the change until I explained it, but rather made the charge sans explanation. It might be that some of the other things you call gratuitous are less so than you think. These are hasty judgements I think. > > > >> There are significant users of NumPy out there still on 1.4. Even the >> policy of deprecation that has been discussed will not help people trying >> to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple >> times. The easier we can make this process for users the better. I >> remain convinced that it's better and am much more comfortable with making >> a release that requires a re-compile (that will succeed without further >> code changes --- because of backward compatibility efforts) than to have >> supposed ABI compatibility with subtle semantic changes and required C-code >> changes when you do happen to re-compile. >> >> > Cleanups need to be made bit by bit. I don't think we have done anything > that will cause undo trouble. > > > I disagree substantially on the impact of these changes. You can disagree > about my awareness of NumPy users, but I think I understand a large number > of them and why NumPy has been successful in getting users. I agree that > we have been unsuccessful at getting serious developers and I'm convinced > by you and Mark as to why that is. But, we can't sacrifice users for the > sake of getting developers who will spend their free time trying to get > around the organic pile that NumPy is at this point. > > Because of this viewpoint, I think there is some adaptation and cleanup > right now, needed, so that significant users of NumPy can upgrade based on > the changes that have occurred without causing them annoying errors (even > simple changes can be a pain in the neck to fix). 
> > I do agree changes can be made. I realize you've worked hard to keep > the code-base in a state that you find more adequate. I think you go > overboard on that front, but I acknowledge that there are people that > appreciate this. I do feel very strongly that we should not require > users to have to re-write working C-code in order to use a new minor > version number in NumPy, regardless of how the code "looks" or how much > "better" it is according to some idealized standard. > > The macro changes are border-line (at least I believe code will still > compile --- just raise warnings, but I need to be sure about this). The > changes that require semi-colons are not acceptable at all. > I was tempted to back them out myself, but I don't think the upshot will be earth shaking. > > Look Charles, I believe we can continue to work productively together and > our differences can be a strength to the community. I hope you feel the > same way. I will continue to respect and listen to your perspective --- > especially when I disagree with it. > Sounds like a threat to me. Who are you to judge? If you are going to be the dictator, let's put that out there and make it official. Chuck. -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sat Jun 23 03:32:08 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 23 Jun 2012 09:32:08 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: <4FE570F8.9060307@astro.uio.no> On 06/23/2012 05:14 AM, Charles R Harris wrote: > > > On Fri, Jun 22, 2012 at 2:42 PM, Travis Oliphant > wrote: > >> >> The usual practice is to announce a schedule first. > > I just did announce the schedule. > > > What has been done in the past is that an intent to fork is announced > some two weeks in advance so that people can weigh in on what needs to > be done before the fork. The immediate fork was a bit hasty. Likewise, > when I suggested going to the github issue tracking, I opened a > discussion on needed tags, but voila, there it was with an incomplete > set and no discussion. That to seemed hasty. > >> >> There is time before the first Release candidate to make >> changes on the 1.7.x branch. If you want to make the changes >> on master, and just indicate the Pull requests, Ondrej can >> make sure they are added to the 1.7.x. branch by Monday. We >> can also delay the first Release Candidate by a few days to >> next Wednesday and then bump everything 3 days if that will >> help. There will be a follow-on 1.8 release before the end >> of the year --- so there is time to make changes for that >> release as well. The next release will not take a year to >> get out, so we shouldn't feel pressured to get *everything* in >> this release. >> >> >> What are we going to do for 1.8? > > Let's get 1.7 out the door first. > > > Mark proposed a schedule for the next several releases, I'd like to know > if we are going to follow it. > > >> >> Yes, the functions will give warnings otherwise. > > I think this needs to be revisited. I don't think these changes are > necessary for *every* use of macros. It can cause a lot of effort > for people downstream without concrete benefit. > > > The idea is to slowly move towards hiding the innards of the array type. > This has been under discussion since 1.3 came out. It is certainly the > case that not all macros need to go away. > > >> >> That's not as nice to type. 
>> >> >> So? The point is to have correctness, not ease of typing. > > I'm not sure if a pun was intended there or not. C is not a safe > and fully-typed system. That is one of its weaknesses according > to many. But, I would submit that not being forced to give > everything a "type" (and recognizing the tradeoffs that implies) is > also one reason it gets used. > > > C was famous for bugs due to the lack of function prototypes. This was > fixed with C99 and the stricter typing was a great help. > > > >> >> Is that assuming that PyArray_NDIM will become a function and >> need a specific object type for its argument (and everything >> else cast....). That's one clear disadvantage of inline >> functions versus macros in my mind: no automatic polymorphism. >> >> >> That's a disadvantage of Python. The virtue of inline functions is >> precisely type checking. > > Right, but we need to be more conscientious about this. Not every > use of Macros should be replaced by inline function calls and the > requisite *forced* type-checking. type-chekcing is not > *universally* a virtue --- if it were, nobody would use Python. > >> >> I don't think type safety is a big win for macros like these. >> We need to be more judicious about which macros are >> scheduled for function inlining. Some just don't benefit from >> the type-safety implications as much as others do, and you end >> up requiring everyone to change their code downstream for no >> real reason. >> >> These sorts of changes really feel to me like unnecessary >> spelling changes that require work from extension writers who >> now have to modify their code with no real gain. There seems >> to be a lot of that going on in the code base and I'm not >> really convinced that it's useful for end-users. >> >> >> Good style and type checking are useful. Numpy needs more of both. > > You can assert it, but it doesn't make it so. "Good style" depends > on what you are trying to accomplish and on your point of view. > NumPy's style is not the product of one person, it's been adapted > from multiple styles and inherits quite a bit from Python's style. > I don't make any claims for it other than it allowed me to write it > with the time and experience I had 7 years ago. We obviously > disagree about this point. I'm sorry about that. I'm pretty > flexible usually --- that's probably one of your big criticisms of > my "style". > > > Curiously, my criticism would be more that you are inflexible, slow to > change old habits. > > > But, one of the things I feel quite strongly about is how hard we > make it for NumPy users to upgrade. There are two specific things > I disagree with pretty strongly: > > 1) Changing defined macros that should work the same on > PyArrayObjects or PyObjects to now *require* types --- if we want to > introduce new macros that require types than we can --- as long as > it just provides warnings but still compiles then I suppose I could > find this acceptable. > > 2) Changing MACROS to require semicolons when they were previously > not needed. I'm going to be very hard-nosed about this one. > >> >> I'm going to be a lot more resistant to that sort of change in >> the code base when I see it. >> >> >> Numpy is a team effort. There are people out there who write >> better code than you do, you should learn from them. > > Exactly! It's a team effort. I'm part of that team as well, and > while I don't always have strong opinions about things. When I do, > I'm going to voice it. 
> > I've learned long ago there are people that write better code than > me. There are people that write better code than you. > > > Of course. Writing code is not my profession, and even if it were, there > are people out there who would be immeasurable better. I have tried to > improve my style over the years by reading books and browsing code by > people who are better than me. I also recognize common bad habits naive > coders tend to pick up when they start out, not least because I have at > one time or another had many of the same bad habits. > > That is not the question here at all. The question here is not > requiring a *re-write* of code in order to get their extensions to > compile using NumPy headers. We should not be making people > change their code to get their extensions to compile in NumPy 1.X > > > I think a bit of rewrite here and there along the way is more palatable > than a big change coming in as one big lump, especially if the changes > are done with a long term goal in mind. We are working towards a Numpy > 2, but we can't just go off for a year or two and write it, we have to > get there step by step. And that requires a plan. To me you sound like you expect that people just need to change, say, PyArray_SHAPE(obj) to PyArray_SHAPE((PyArrayObject*)obj) But that's not the reality. The reality is that most users of the NumPy C API are required to do: #if WHATEVERNUMPYVERSIONDEFINE > 0x... PyArray_SHAPE(obj) #else PyArray_SHAPE((PyArrayObject*)obj) #endif or, perhaps, PyArray_SHAPE(CAST_IF_NEW_NUMPY obj). Or perhaps write a shim wrapper to insulate themselves from the NumPy API. At least if you want to cleanly compile against all the last ~3 versions of NumPy cleanly without warnings -- which any good developer wishes (unless there are *features* in newer versions that make a hard dependency on the newest version logical). Thus, cleaning up the NumPy API makes users' code much more ugly and difficult to read. "Gradual changes along the way" means there will be lots of different #if tests like that, which is at least harder to remember and work with than a single #if test for 1.x vs 2.x. Dag From d.s.seljebotn at astro.uio.no Sat Jun 23 03:34:28 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 23 Jun 2012 09:34:28 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <4FE570F8.9060307@astro.uio.no> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <4FE570F8.9060307@astro.uio.no> Message-ID: <4FE57184.5000407@astro.uio.no> On 06/23/2012 09:32 AM, Dag Sverre Seljebotn wrote: > On 06/23/2012 05:14 AM, Charles R Harris wrote: >> >> >> On Fri, Jun 22, 2012 at 2:42 PM, Travis Oliphant> > wrote: >> >>> >>> The usual practice is to announce a schedule first. >> >> I just did announce the schedule. >> >> >> What has been done in the past is that an intent to fork is announced >> some two weeks in advance so that people can weigh in on what needs to >> be done before the fork. The immediate fork was a bit hasty. Likewise, >> when I suggested going to the github issue tracking, I opened a >> discussion on needed tags, but voila, there it was with an incomplete >> set and no discussion. That to seemed hasty. >> >>> >>> There is time before the first Release candidate to make >>> changes on the 1.7.x branch. If you want to make the changes >>> on master, and just indicate the Pull requests, Ondrej can >>> make sure they are added to the 1.7.x. branch by Monday. 
We >>> can also delay the first Release Candidate by a few days to >>> next Wednesday and then bump everything 3 days if that will >>> help. There will be a follow-on 1.8 release before the end >>> of the year --- so there is time to make changes for that >>> release as well. The next release will not take a year to >>> get out, so we shouldn't feel pressured to get *everything* in >>> this release. >>> >>> >>> What are we going to do for 1.8? >> >> Let's get 1.7 out the door first. >> >> >> Mark proposed a schedule for the next several releases, I'd like to know >> if we are going to follow it. >> >> >>> >>> Yes, the functions will give warnings otherwise. >> >> I think this needs to be revisited. I don't think these changes are >> necessary for *every* use of macros. It can cause a lot of effort >> for people downstream without concrete benefit. >> >> >> The idea is to slowly move towards hiding the innards of the array type. >> This has been under discussion since 1.3 came out. It is certainly the >> case that not all macros need to go away. >> >> >>> >>> That's not as nice to type. >>> >>> >>> So? The point is to have correctness, not ease of typing. >> >> I'm not sure if a pun was intended there or not. C is not a safe >> and fully-typed system. That is one of its weaknesses according >> to many. But, I would submit that not being forced to give >> everything a "type" (and recognizing the tradeoffs that implies) is >> also one reason it gets used. >> >> >> C was famous for bugs due to the lack of function prototypes. This was >> fixed with C99 and the stricter typing was a great help. >> >> >> >>> >>> Is that assuming that PyArray_NDIM will become a function and >>> need a specific object type for its argument (and everything >>> else cast....). That's one clear disadvantage of inline >>> functions versus macros in my mind: no automatic polymorphism. >>> >>> >>> That's a disadvantage of Python. The virtue of inline functions is >>> precisely type checking. >> >> Right, but we need to be more conscientious about this. Not every >> use of Macros should be replaced by inline function calls and the >> requisite *forced* type-checking. type-chekcing is not >> *universally* a virtue --- if it were, nobody would use Python. >> >>> >>> I don't think type safety is a big win for macros like these. >>> We need to be more judicious about which macros are >>> scheduled for function inlining. Some just don't benefit from >>> the type-safety implications as much as others do, and you end >>> up requiring everyone to change their code downstream for no >>> real reason. >>> >>> These sorts of changes really feel to me like unnecessary >>> spelling changes that require work from extension writers who >>> now have to modify their code with no real gain. There seems >>> to be a lot of that going on in the code base and I'm not >>> really convinced that it's useful for end-users. >>> >>> >>> Good style and type checking are useful. Numpy needs more of both. >> >> You can assert it, but it doesn't make it so. "Good style" depends >> on what you are trying to accomplish and on your point of view. >> NumPy's style is not the product of one person, it's been adapted >> from multiple styles and inherits quite a bit from Python's style. >> I don't make any claims for it other than it allowed me to write it >> with the time and experience I had 7 years ago. We obviously >> disagree about this point. I'm sorry about that. 
I'm pretty >> flexible usually --- that's probably one of your big criticisms of >> my "style". >> >> >> Curiously, my criticism would be more that you are inflexible, slow to >> change old habits. >> >> >> But, one of the things I feel quite strongly about is how hard we >> make it for NumPy users to upgrade. There are two specific things >> I disagree with pretty strongly: >> >> 1) Changing defined macros that should work the same on >> PyArrayObjects or PyObjects to now *require* types --- if we want to >> introduce new macros that require types than we can --- as long as >> it just provides warnings but still compiles then I suppose I could >> find this acceptable. >> >> 2) Changing MACROS to require semicolons when they were previously >> not needed. I'm going to be very hard-nosed about this one. >> >>> >>> I'm going to be a lot more resistant to that sort of change in >>> the code base when I see it. >>> >>> >>> Numpy is a team effort. There are people out there who write >>> better code than you do, you should learn from them. >> >> Exactly! It's a team effort. I'm part of that team as well, and >> while I don't always have strong opinions about things. When I do, >> I'm going to voice it. >> >> I've learned long ago there are people that write better code than >> me. There are people that write better code than you. >> >> >> Of course. Writing code is not my profession, and even if it were, there >> are people out there who would be immeasurable better. I have tried to >> improve my style over the years by reading books and browsing code by >> people who are better than me. I also recognize common bad habits naive >> coders tend to pick up when they start out, not least because I have at >> one time or another had many of the same bad habits. >> >> That is not the question here at all. The question here is not >> requiring a *re-write* of code in order to get their extensions to >> compile using NumPy headers. We should not be making people >> change their code to get their extensions to compile in NumPy 1.X >> >> >> I think a bit of rewrite here and there along the way is more palatable >> than a big change coming in as one big lump, especially if the changes >> are done with a long term goal in mind. We are working towards a Numpy >> 2, but we can't just go off for a year or two and write it, we have to >> get there step by step. And that requires a plan. > > To me you sound like you expect that people just need to change, say, > > PyArray_SHAPE(obj) > > to > > PyArray_SHAPE((PyArrayObject*)obj) > > But that's not the reality. The reality is that most users of the NumPy > C API are required to do: > > #if WHATEVERNUMPYVERSIONDEFINE> 0x... > PyArray_SHAPE(obj) > #else > PyArray_SHAPE((PyArrayObject*)obj) > #endif > > or, perhaps, PyArray_SHAPE(CAST_IF_NEW_NUMPY obj). Whoops. Terribly sorry, bad example -- I guess fixes to the users code would make it work with any NumPy version. And I guess an extra semicolon never hurts for the macros either? So by now I wish I could retract that post. Realized it five seconds too late :-) Dag > > Or perhaps write a shim wrapper to insulate themselves from the NumPy API. > > At least if you want to cleanly compile against all the last ~3 versions > of NumPy cleanly without warnings -- which any good developer wishes > (unless there are *features* in newer versions that make a hard > dependency on the newest version logical). Thus, cleaning up the NumPy > API makes users' code much more ugly and difficult to read. 
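[The "shim wrapper" idea quoted above is worth spelling out, since it is the usual way a downstream project insulates itself from this kind of churn. The sketch below is an assumption-laden illustration, not an official recipe: the wrapper name MYPKG_NDIM is invented, and the exact version macro and cutoff to test should be checked against numpyconfig.h in the NumPy versions actually targeted.]

/* mypkg_compat.h: a minimal sketch of a compatibility shim, under the
 * assumptions stated above. */
#ifndef MYPKG_COMPAT_H
#define MYPKG_COMPAT_H

#include <Python.h>
#include <numpy/arrayobject.h>

#if defined(NPY_API_VERSION) && NPY_API_VERSION >= 0x00000007
/* Newer headers may expect a PyArrayObject* here (for example when the
 * deprecated API is turned off), so do the cast once in this header
 * rather than at every call site. */
#define MYPKG_NDIM(obj) PyArray_NDIM((PyArrayObject *)(obj))
#else
/* Older headers: the macro already accepts a generic object pointer. */
#define MYPKG_NDIM(obj) PyArray_NDIM(obj)
#endif

#endif /* MYPKG_COMPAT_H */

[Call sites then use MYPKG_NDIM everywhere, and only this one header needs attention when a new NumPy release changes what the accessor expects.]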
> > "Gradual changes along the way" means there will be lots of different > #if tests like that, which is at least harder to remember and work with > than a single #if test for 1.x vs 2.x. > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From thouis at gmail.com Sat Jun 23 05:03:08 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Sat, 23 Jun 2012 11:03:08 +0200 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: <3C64F2BE-50E5-403C-9022-71233A6E3449@continuum.io> Message-ID: On Fri, Jun 22, 2012 at 7:29 PM, Ralf Gommers wrote: > > > On Fri, Jun 22, 2012 at 9:49 AM, Thouis (Ray) Jones > wrote: >> >> On Mon, Jun 4, 2012 at 7:43 PM, Travis Oliphant >> wrote: >> > I have turned on issue tracking and started a few labels. ? Feel free to >> > add >> > more / adjust the names as appropriate. ? ? I am trying to find someone >> > who >> > can help manage the migration from Trac. >> >> Are the github issues set up sufficiently for Trac to be disabled and >> github to take over? > > > You lost me here. You were going to set up a test site where we could see > the Trac --> Github conversion could be tested, before actually pushing that > conversion to the numpy Github repo. If you sent a message that that was > ready, I must have missed it. > > The current state of labels on https://github.com/numpy/numpy/issues is also > far from complete (no prios, components). I wasn't completely clear. What I meant to ask: "Are the github issues (and labels) set up well enough for Trac to be disabled for accepting new bugs and to point users filing new bugs to github instead?" (The answer to which is "no", based on your reply). I was under the impression that github issues could become the default for new bugs even before the old bugs were moved, but perhaps I misunderstood. I can see arguments for and against this. The primary argument in favor is that it would be easier to transition old bugs to a known set of labels, rather than trying to define the labels at the same time as moving the bugs. (This is more a concern on my part about stepping on toes than a difficulty in knowing what labels are needed, though.) Ray Jones From ralf.gommers at googlemail.com Sat Jun 23 05:13:23 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 23 Jun 2012 11:13:23 +0200 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: <3C64F2BE-50E5-403C-9022-71233A6E3449@continuum.io> Message-ID: On Sat, Jun 23, 2012 at 11:03 AM, Thouis (Ray) Jones wrote: > On Fri, Jun 22, 2012 at 7:29 PM, Ralf Gommers > wrote: > > > > > > On Fri, Jun 22, 2012 at 9:49 AM, Thouis (Ray) Jones > > wrote: > >> > >> On Mon, Jun 4, 2012 at 7:43 PM, Travis Oliphant > >> wrote: > >> > I have turned on issue tracking and started a few labels. Feel free > to > >> > add > >> > more / adjust the names as appropriate. I am trying to find > someone > >> > who > >> > can help manage the migration from Trac. > >> > >> Are the github issues set up sufficiently for Trac to be disabled and > >> github to take over? > > > > > > You lost me here. You were going to set up a test site where we could see > > the Trac --> Github conversion could be tested, before actually pushing > that > > conversion to the numpy Github repo. If you sent a message that that was > > ready, I must have missed it. 
> > > > The current state of labels on https://github.com/numpy/numpy/issues is > also > > far from complete (no prios, components). > > I wasn't completely clear. What I meant to ask: > > "Are the github issues (and labels) set up well enough for Trac to be > disabled for accepting new bugs and to point users filing new bugs to > github instead?" > > (The answer to which is "no", based on your reply). > I don't think it's a problem that a few issues have already been filed on Github, but we'll have to properly label them by hand later. Making Github the default or only option now would be a bit strange. It would be better to first do the conversion, or at least have it far enough along that we have agreed on workflow and labels to use. Ralf > I was under the impression that github issues could become the default > for new bugs even before the old bugs were moved, but perhaps I > misunderstood. I can see arguments for and against this. The primary > argument in favor is that it would be easier to transition old bugs to > a known set of labels, rather than trying to define the labels at the > same time as moving the bugs. (This is more a concern on my part > about stepping on toes than a difficulty in knowing what labels are > needed, though.) > > Ray Jones > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at gmail.com Sat Jun 23 05:23:14 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Sat, 23 Jun 2012 11:23:14 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Sat, Jun 23, 2012 at 5:14 AM, Charles R Harris wrote: > What has been done in the past is that an intent to fork is announced some > two weeks in advance so that people can weigh in on what needs to be done > before the fork. The immediate fork was a bit hasty. Likewise, when I > suggested going to the github issue tracking, I opened a discussion on > needed tags, but voila, there it was with an incomplete set and no > discussion. That to seemed hasty. I don't have a particular dog in this fight, but it seems like neither creating the fork nor turning on issues are worth disagreeing to much about. There's going to be a 1.7 fork sometime soon, and whether it gets created now or after discussion seems mostly academic. Even if there were changes that needed to go into both branches, git makes that straightforward. Likewise github issues. Turning them on has minimal cost, especially given that pull requests already go through github, and gives another route for bug reporting and a way to experiment with issues to inform the discussion. > [...] > Most folks aren't going to transition from MATLAB or IDL. Engineers tend to > stick with the tools they learned in school, they aren't interested in the > tool itself as long as they can get their job done. And getting the job done > is what they are paid for. That said, I doubt they would have much problem > making the adjustment if they were inclined to switch tools. > [...] My own experience is the opposite. Most programmers/engineers I've worked with are happy to transition away from Matlab, but part of why they're willing to is that it's not that difficult to retarget Matlab knowledge onto numpy/scipy/matplotlib knowledge. 
Making that transition as easy as possible (as I think matplotlib does particularly well) is a good goal. I agree that getting the job done is what they're paid for, but python/numpy/scipy/matplotlib allow them to get that job done much faster and more easily. Ray Jones From thouis at gmail.com Sat Jun 23 05:30:04 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Sat, 23 Jun 2012 11:30:04 +0200 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: <3C64F2BE-50E5-403C-9022-71233A6E3449@continuum.io> Message-ID: On Sat, Jun 23, 2012 at 11:13 AM, Ralf Gommers wrote: > > > On Sat, Jun 23, 2012 at 11:03 AM, Thouis (Ray) Jones > wrote: >> >> On Fri, Jun 22, 2012 at 7:29 PM, Ralf Gommers >> wrote: >> > >> > >> > On Fri, Jun 22, 2012 at 9:49 AM, Thouis (Ray) Jones >> > wrote: >> >> >> >> On Mon, Jun 4, 2012 at 7:43 PM, Travis Oliphant >> >> wrote: >> >> > I have turned on issue tracking and started a few labels. ? Feel free >> >> > to >> >> > add >> >> > more / adjust the names as appropriate. ? ? I am trying to find >> >> > someone >> >> > who >> >> > can help manage the migration from Trac. >> >> >> >> Are the github issues set up sufficiently for Trac to be disabled and >> >> github to take over? >> > >> > >> > You lost me here. You were going to set up a test site where we could >> > see >> > the Trac --> Github conversion could be tested, before actually pushing >> > that >> > conversion to the numpy Github repo. If you sent a message that that was >> > ready, I must have missed it. >> > >> > The current state of labels on https://github.com/numpy/numpy/issues is >> > also >> > far from complete (no prios, components). >> >> I wasn't completely clear. ?What I meant to ask: >> >> "Are the github issues (and labels) set up well enough for Trac to be >> disabled for accepting new bugs and to point users filing new bugs to >> github instead?" >> >> (The answer to which is "no", based on your reply). > > > I don't think it's a problem that a few issues have already been filed on > Github, but we'll have to properly label them by hand later. > > Making Github the default or only option now would be a bit strange. It > would be better to first do the conversion, or at least have it far enough > along that we have agreed on workflow and labels to use. My concern is that transitioning first would define the workflow/labels based on what's in Trac, rather than on what would work best with github. But maybe the best way to move things forward is to do the transition to a test project, and see what comes out. Ray Jones From charlesr.harris at gmail.com Sat Jun 23 08:01:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 Jun 2012 06:01:09 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Sat, Jun 23, 2012 at 3:23 AM, Thouis (Ray) Jones wrote: > On Sat, Jun 23, 2012 at 5:14 AM, Charles R Harris > wrote: > > What has been done in the past is that an intent to fork is announced > some > > two weeks in advance so that people can weigh in on what needs to be done > > before the fork. The immediate fork was a bit hasty. Likewise, when I > > suggested going to the github issue tracking, I opened a discussion on > > needed tags, but voila, there it was with an incomplete set and no > > discussion. That to seemed hasty. 
> > I don't have a particular dog in this fight, but it seems like neither > creating the fork nor turning on issues are worth disagreeing to much > about. There's going to be a 1.7 fork sometime soon, and whether it > gets created now or after discussion seems mostly academic. Even if > there were changes that needed to go into both branches, git makes > that straightforward. Likewise github issues. Turning them on has > minimal cost, especially given that pull requests already go through > github, and gives another route for bug reporting and a way to > experiment with issues to inform the discussion. > > > [...] > > Most folks aren't going to transition from MATLAB or IDL. Engineers tend > to > > stick with the tools they learned in school, they aren't interested in > the > > tool itself as long as they can get their job done. And getting the job > done > > is what they are paid for. That said, I doubt they would have much > problem > > making the adjustment if they were inclined to switch tools. > > [...] > > My own experience is the opposite. Most programmers/engineers I've > worked with are happy to transition away from Matlab, but part of why > they're willing to is that it's not that difficult to retarget Matlab > knowledge onto numpy/scipy/matplotlib knowledge. Making that > transition as easy as possible (as I think matplotlib does > particularly well) is a good goal. I agree that getting the job done > is what they're paid for, but python/numpy/scipy/matplotlib allow them > to get that job done much faster and more easily. > Haven't seen that myself. When engineers' time is paid for out of contracts and the work is on a schedule, they generally don't have the time to chase after new things unless the payoff is substantial. Matlab is also widely used for rapid prototyping of control systems with the models then translated to run on the actual hardware. Not to mention that device makers often provide Simulink models of their hardware.That sort of thing is not available in Python. I agree that there are many places that Python could work better, but the old rule of thumb is that the effective savings need to be on the order of a factor of ten to drive a new technology takeover of a widespread existing technology. Of course, that doesn't hold for new markets. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Jun 23 08:12:29 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 Jun 2012 06:12:29 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Sat, Jun 23, 2012 at 3:23 AM, Thouis (Ray) Jones wrote: > On Sat, Jun 23, 2012 at 5:14 AM, Charles R Harris > wrote: > > What has been done in the past is that an intent to fork is announced > some > > two weeks in advance so that people can weigh in on what needs to be done > > before the fork. The immediate fork was a bit hasty. Likewise, when I > > suggested going to the github issue tracking, I opened a discussion on > > needed tags, but voila, there it was with an incomplete set and no > > discussion. That to seemed hasty. > > I don't have a particular dog in this fight, but it seems like neither > creating the fork nor turning on issues are worth disagreeing to much > about. There's going to be a 1.7 fork sometime soon, and whether it > gets created now or after discussion seems mostly academic. 
Even if > there were changes that needed to go into both branches, git makes > that straightforward. Likewise github issues. Turning them on has > minimal cost, especially given that pull requests already go through > github, and gives another route for bug reporting and a way to > experiment with issues to inform the discussion. > >From my point of view, the haste seems to be driven by SciPy2012. And why the rush after we have wasted three months running in circles for lack of a decision, with Mark and Nathaniel sent off to write a report that had no impact on the final outcome. The github thing also ended the thread and now someone has to clean up the result. It also appears that that work is being done by request rather than by a volunteer, that has subtle implications in the long run. Things have been happening by fits and starts, with issues picked up and than dropped half done. That isn't a good way to move forward. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Jun 24 07:31:19 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 24 Jun 2012 13:31:19 +0200 Subject: [Numpy-discussion] Issue tracking In-Reply-To: References: <3C64F2BE-50E5-403C-9022-71233A6E3449@continuum.io> Message-ID: On Sat, Jun 23, 2012 at 11:30 AM, Thouis (Ray) Jones wrote: > On Sat, Jun 23, 2012 at 11:13 AM, Ralf Gommers > wrote: > > > > > > On Sat, Jun 23, 2012 at 11:03 AM, Thouis (Ray) Jones > > wrote: > >> > >> On Fri, Jun 22, 2012 at 7:29 PM, Ralf Gommers > >> wrote: > >> > > >> > > >> > On Fri, Jun 22, 2012 at 9:49 AM, Thouis (Ray) Jones > > >> > wrote: > >> >> > >> >> On Mon, Jun 4, 2012 at 7:43 PM, Travis Oliphant > > >> >> wrote: > >> >> > I have turned on issue tracking and started a few labels. Feel > free > >> >> > to > >> >> > add > >> >> > more / adjust the names as appropriate. I am trying to find > >> >> > someone > >> >> > who > >> >> > can help manage the migration from Trac. > >> >> > >> >> Are the github issues set up sufficiently for Trac to be disabled and > >> >> github to take over? > >> > > >> > > >> > You lost me here. You were going to set up a test site where we could > >> > see > >> > the Trac --> Github conversion could be tested, before actually > pushing > >> > that > >> > conversion to the numpy Github repo. If you sent a message that that > was > >> > ready, I must have missed it. > >> > > >> > The current state of labels on https://github.com/numpy/numpy/issuesis > >> > also > >> > far from complete (no prios, components). > >> > >> I wasn't completely clear. What I meant to ask: > >> > >> "Are the github issues (and labels) set up well enough for Trac to be > >> disabled for accepting new bugs and to point users filing new bugs to > >> github instead?" > >> > >> (The answer to which is "no", based on your reply). > > > > > > I don't think it's a problem that a few issues have already been filed on > > Github, but we'll have to properly label them by hand later. > > > > Making Github the default or only option now would be a bit strange. It > > would be better to first do the conversion, or at least have it far > enough > > along that we have agreed on workflow and labels to use. > > My concern is that transitioning first would define the > workflow/labels based on what's in Trac, rather than on what would > work best with github. Trac is not unique, most bug trackers have similar concepts (milestones, components, prios, issue types). 
> But maybe the best way to move things forward > is to do the transition to a test project, and see what comes out. > +1 Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Jun 25 00:09:17 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 24 Jun 2012 23:09:17 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: > > What has been done in the past is that an intent to fork is announced some two weeks in advance so that people can weigh in on what needs to be done before the fork. The immediate fork was a bit hasty. Likewise, when I suggested going to the github issue tracking, I opened a discussion on needed tags, but voila, there it was with an incomplete set and no discussion. That to seemed hasty. My style is just different. I like to do things and then if discussion requires an alteration, we alter. It is just a different style. The labels can be altered, they are not set in stone. I prefer to have something to talk about and a starting point to alter from --- especially on potential bike-shedding discussions. There are several people that can make changes to the labels. If we have difficulty agreeing then we can go from that point. >> >> There is time before the first Release candidate to make changes on the 1.7.x branch. If you want to make the changes on master, and just indicate the Pull requests, Ondrej can make sure they are added to the 1.7.x. branch by Monday. We can also delay the first Release Candidate by a few days to next Wednesday and then bump everything 3 days if that will help. There will be a follow-on 1.8 release before the end of the year --- so there is time to make changes for that release as well. The next release will not take a year to get out, so we shouldn't feel pressured to get *everything* in this release. >> >> What are we going to do for 1.8? > > Let's get 1.7 out the door first. > > Mark proposed a schedule for the next several releases, I'd like to know if we are going to follow it. We should discuss it again. I don't recall the specifics and I believe it was just a proposal. I do not recall much feedback on it. > >> >> Yes, the functions will give warnings otherwise. > > I think this needs to be revisited. I don't think these changes are necessary for *every* use of macros. It can cause a lot of effort for people downstream without concrete benefit. > > The idea is to slowly move towards hiding the innards of the array type. This has been under discussion since 1.3 came out. It is certainly the case that not all macros need to go away. I know it's been under discussion, but it looks like a lot of changes were made just last year (and I am just starting to understand the implications of all those changes). I think there are many NumPy users that will be in the same position over the coming years. This is a bit more than just hiding the array innards. The array innards have been "hidden" by using the macros since NumPy 1.0. There was a specific intent to create macros for all array access and encourage use of those macros --- precisely so that the array object could change. The requirement of ABI compatibility was not pre-envisioned in NumPy 1.0 Neither was NumPy 1.0 trying to provide type-safety in all cases. I don't recall a discussion on the value of having macros that can be interpreted at least as both PyObject * and PyArrayObject *. 
Perhaps this is possible, and I just need to be educated. But, my opinion is that it's not always useful to require type-casting especially between those two. > >> >> That's not as nice to type. >> >> So? The point is to have correctness, not ease of typing. > > I'm not sure if a pun was intended there or not. C is not a safe and fully-typed system. That is one of its weaknesses according to many. But, I would submit that not being forced to give everything a "type" (and recognizing the tradeoffs that implies) is also one reason it gets used. > > C was famous for bugs due to the lack of function prototypes. This was fixed with C99 and the stricter typing was a great help. Bugs are not "due to lack of function prototypes". Bugs are due to mistakes that programmers make (and I know all about mistakes programmers make). Function prototypes can help detect some kinds of mistakes which is helpful. But, this doesn't help the question of how to transition a weakly-typed program or whether or not that is even a useful exercise. > > >> >> Is that assuming that PyArray_NDIM will become a function and need a specific object type for its argument (and everything else cast....). That's one clear disadvantage of inline functions versus macros in my mind: no automatic polymorphism. >> >> That's a disadvantage of Python. The virtue of inline functions is precisely type checking. > > Right, but we need to be more conscientious about this. Not every use of Macros should be replaced by inline function calls and the requisite *forced* type-checking. type-chekcing is not *universally* a virtue --- if it were, nobody would use Python. > >> >> I don't think type safety is a big win for macros like these. We need to be more judicious about which macros are scheduled for function inlining. Some just don't benefit from the type-safety implications as much as others do, and you end up requiring everyone to change their code downstream for no real reason. >> >> These sorts of changes really feel to me like unnecessary spelling changes that require work from extension writers who now have to modify their code with no real gain. There seems to be a lot of that going on in the code base and I'm not really convinced that it's useful for end-users. >> >> Good style and type checking are useful. Numpy needs more of both. > > You can assert it, but it doesn't make it so. "Good style" depends on what you are trying to accomplish and on your point of view. NumPy's style is not the product of one person, it's been adapted from multiple styles and inherits quite a bit from Python's style. I don't make any claims for it other than it allowed me to write it with the time and experience I had 7 years ago. We obviously disagree about this point. I'm sorry about that. I'm pretty flexible usually --- that's probably one of your big criticisms of my "style". > > Curiously, my criticism would be more that you are inflexible, slow to change old habits. I don't mind changing old habits at all. In fact, I don't think you know me very well if that's your take. You have a very narrow window into my activity and behavior. Of course habits are always hard to change (that's why they call them habits). Mostly, I need to be convinced of the value of changing old patterns --- just like everyone else (including existing NumPy users). On the type-question, I'm just not convinced that the most pressing matter in NumPy and SciPy is to re-write existing code to be more strictly typed. 
I'm quite open to other view points on that --- as long as backward compatibility is preserved, or a clear upgrade story is provided to existing users. > > > But, one of the things I feel quite strongly about is how hard we make it for NumPy users to upgrade. There are two specific things I disagree with pretty strongly: > > 1) Changing defined macros that should work the same on PyArrayObjects or PyObjects to now *require* types --- if we want to introduce new macros that require types than we can --- as long as it just provides warnings but still compiles then I suppose I could find this acceptable. > > 2) Changing MACROS to require semicolons when they were previously not needed. I'm going to be very hard-nosed about this one. > >> >> I'm going to be a lot more resistant to that sort of change in the code base when I see it. >> >> Numpy is a team effort. There are people out there who write better code than you do, you should learn from them. > > Exactly! It's a team effort. I'm part of that team as well, and while I don't always have strong opinions about things. When I do, I'm going to voice it. > > I've learned long ago there are people that write better code than me. There are people that write better code than you. > > Of course. Writing code is not my profession, and even if it were, there are people out there who would be immeasurable better. I have tried to improve my style over the years by reading books and browsing code by people who are better than me. I also recognize common bad habits naive coders tend to pick up when they start out, not least because I have at one time or another had many of the same bad habits. We are really quite a like here. I have done and continue to do exactly the same thing. My priorities are just different. I don't believe it is universally useful to alter patterns in existing code. I have typically adapted my style to the code I'm working with. Numeric had a style which I adapted to. Python has a style which I adapted to. I think that people reading code and seeing multiple styles will find the code harder to read. Such changes of style take work, and quite often the transition is not worth the effort. I have not been nor will I continue to be in opposition to changes that improve things (under any developers notion of "improvement"). The big exception to that is when it seems to me that the changes will make it more difficult for existing users to use their code. I know you are trying to make it easier for NumPy developers as you understand it. I really admire you for doing what you feel strongly about. I think we are both in our way trying to encourage more NumPy developers (you by making the code easier to get in to) and me by trying to figure out acceptable ways to fund them. I just think that we must recognize the users out there who have written to the existing NumPy interface. Any change that requires effort from users should be met with skepticism. We can write new interfaces and encourage new users to use those new interfaces. We can even re-write NumPy internals to use those interfaces. But, we can't just change documented interfaces (and even be careful about undocumented but implied interfaces -- I agree that this gets difficult to really adhere to, but we can and should try at least for heavily used code-paths). One thing I'm deeply aware of is the limited audience of this list compared to the user base of NumPy and the intertia of old NumPy releases. Discussions on this list are just not visible to the wider user base. 
My recent activity and interest is in protecting that user-base from the difficulties that recent changes are going to be on people upgrading from 1.5. My failing last year was to encourage and pay for (through Enthought) Mark's full-time activity on this list but not have the time to provide enough guidance to him about my understanding of the implications of his changes and think hard enough about those to understand them in the time. > > That is not the question here at all. The question here is not requiring a *re-write* of code in order to get their extensions to compile using NumPy headers. We should not be making people change their code to get their extensions to compile in NumPy 1.X > > I think a bit of rewrite here and there along the way is more palatable than a big change coming in as one big lump, especially if the changes are done with a long term goal in mind. We are working towards a Numpy 2, but we can't just go off for a year or two and write it, we have to get there step by step. And that requires a plan. We see things a little differently on that front, I think. A bit of re-write here and there for down-stream users is exactly the wrong approach in my view. I think it depends on the user. For one who is tracking every NumPy release and has time to make any and all changes needed, I think you are right --- that approach will work for them. However, there are people out there who are using NumPy in ways (either significantly or only indirectly) where having to change *any* code from one release to another will make them seriously annoyed and we will start losing users. > > >> >> >> One particularly glaring example to my lens on the world: I think it would have been better to define new macros which require semicolons than changing the macros that don't require semicolons to now require semicolons: >> >> NPY_BEGIN_THREADS_DEF >> NPY_BEGIN_THREADS >> NPY_ALLOW_C_API >> NPY_ALLOW_C_API_DEF >> NPY_DISABLE_C_API >> >> That feels like a gratuitous style change that will force users of those macros to re-write their code. >> >> It doesn't seem to be much of a problem. > > Unfortunately, I don't trust your judgment on that. My experience and understanding tells a much different story. I'm sorry if you disagree with me. > > > I'm sorry I made you sorry ;) The problem here is that you don't come forth with specifics. People tell you things, but you don't say who or what their specific problem was. Part of working with a team is keeping folks informed, it isn't that useful to appeal to authority. I watch the list, which is admittedly a small window into the community, and I haven't seen show stoppers. Bugs, sure, but that isn't the same thing. I came up with a very specific thing. I'm not sure what you are talking about. If you are talking about discussions with people off list, then I can't speak for them unless they have allowed me to. I encourage them to speak up here as often as they can. Yes, you will have to trust that a little bit of concern might just be an iceberg waiting to sink the ship. >> >> Sure, it's a simple change, but it's a simple change that doesn't do anything for you as an end user. I think I'm going to back this change out, in fact. I can't see requiring people to change their C-code like this will require without a clear benefit to them. I'm quite sure there is code out there that uses these documented APIs (without the semicolon). 
If we want to define new macros that require colons, then we do that, but we can't get rid of the old ones --- especially in a 1.x release. >> >> Our policy should not be to allow gratuitous style changes just because we think something is prettier another way. The NumPy code base has come from multiple sources and reflects several styles. It also follows an older style of C-programming (that is quite common in the Python code base). It can be changed, but those changes shouldn't be painful for a library user without some specific gain for them that the change allows. >> >> >> You use that word 'gratuitous' a lot, I don't think it means what you think it means. For instance, the new polynomial coefficient order wasn't gratuitous, it was doing things in a way many found more intuitive and generalized better to different polynomial basis. People >> have different ideas, that doesn't make them gratuitous. > > That's a slightly different issue. At least you created a new object and api which is a *little* better. My complaint about the choice there is now there *must* be two interfaces and added confusion as people will have to figure out which assumption is being used. I don't really care about the coefficient order --- really I don't. Either one is fine in my mind. I recognize the reasons. The problem is *changing* it without a *really* good reason. Now, we have to have two different APIs. I would much preferred to have poly1d disappear and just use your much nicer polynomial classes. Now, it can't and we are faced with a user-story that is either difficult for someone transitioning from MATLAB > > Most folks aren't going to transition from MATLAB or IDL. Engineers tend to stick with the tools they learned in school, they aren't interested in the tool itself as long as they can get their job done. And getting the job done is what they are paid for. That said, I doubt they would have much problem making the adjustment if they were inclined to switch tools. I don't share your pessimism. You really think that "most folks aren't going to transition". It's happening now. It's been happening for several years. > > or a "why did you do that?" puzzled look from a new user as to why we support both coefficient orders. Of course, that could be our story --- hey we support all kinds of orders, it doesn't really matter, you just have to tell us what you mean when passing in an unadorned array of coefficients. But, this is a different issue. > > I'm using the word 'gratuitous' to mean that it is "uncalled for and lacks a good reason". There needs to be much better reasons given for code changes that require someone to re-write working code than "it's better style" or even "it will help new programmers avoid errors". Let's write another interface that new programmers can use that fits the world the way you see it, don't change what's already working just because you don't like it or wish a different choice had been made. > > Well, and that was exactly what you meant when you called to coefficient order 'gratuitous' in your first post to me about it. The problem was that you didn't understand why I made the change until I explained it, but rather made the charge sans explanation. It might be that some of the other things you call gratuitous are less so than you think. These are hasty judgements I think. I'm sure we all have our share of hasty judgments to go around. Even after your explanation, I still disagree with it. 
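To make the two coefficient conventions under discussion concrete, here is a small illustration only (both constructors are the documented ones; the quadratic chosen is arbitrary):

import numpy as np
from numpy.polynomial import Polynomial

# The same quadratic, x**2 + 2*x + 3, spelled in the two conventions:
p_old = np.poly1d([1, 2, 3])     # poly1d: highest-degree coefficient first
p_new = Polynomial([3, 2, 1])    # numpy.polynomial: constant term first

print(p_old(2.0), p_new(2.0))    # 11.0 11.0 -- same polynomial, opposite coefficient order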
But, I appreciate the reminder to give you the benefit of the doubt when I encounter something that makes me raise my eyebrows. I hope you will do the same. > > >> >> There are significant users of NumPy out there still on 1.4. Even the policy of deprecation that has been discussed will not help people trying to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple times. The easier we can make this process for users the better. I remain convinced that it's better and am much more comfortable with making a release that requires a re-compile (that will succeed without further code changes --- because of backward compatibility efforts) than to have supposed ABI compatibility with subtle semantic changes and required C-code changes when you do happen to re-compile. >> >> >> Cleanups need to be made bit by bit. I don't think we have done anything that will cause undo trouble. > > I disagree substantially on the impact of these changes. You can disagree about my awareness of NumPy users, but I think I understand a large number of them and why NumPy has been successful in getting users. I agree that we have been unsuccessful at getting serious developers and I'm convinced by you and Mark as to why that is. But, we can't sacrifice users for the sake of getting developers who will spend their free time trying to get around the organic pile that NumPy is at this point. > > Because of this viewpoint, I think there is some adaptation and cleanup right now, needed, so that significant users of NumPy can upgrade based on the changes that have occurred without causing them annoying errors (even simple changes can be a pain in the neck to fix). > > I do agree changes can be made. I realize you've worked hard to keep the code-base in a state that you find more adequate. I think you go overboard on that front, but I acknowledge that there are people that appreciate this. I do feel very strongly that we should not require users to have to re-write working C-code in order to use a new minor version number in NumPy, regardless of how the code "looks" or how much "better" it is according to some idealized standard. > > The macro changes are border-line (at least I believe code will still compile --- just raise warnings, but I need to be sure about this). The changes that require semi-colons are not acceptable at all. > > I was tempted to back them out myself, but I don't think the upshot will be earth shaking. I think it's important that code using NumPy headers that compiled with 1.5 will compile with 1.7. > > > Look Charles, I believe we can continue to work productively together and our differences can be a strength to the community. I hope you feel the same way. I will continue to respect and listen to your perspective --- especially when I disagree with it. > > Sounds like a threat to me. Who are you to judge? If you are going to be the dictator, let's put that out there and make it official. Wow, charles! I think you should re-read what I wrote. It was not a threat at all. It was an appeal to work more closely together, and a commitment on my end to listen to your point of view and try to sift from any of my own opposition the chaff from the wheat. I am just not thinking in those terms at all. I do not think it is appropriate to talk about a dictator in this context. I have no control over what you do, and you have no control over what I do. We can only work cooperatively or independently for the benefit of NumPy. 
Perhaps there are things I've said and done that really bother you, or have offended you. I'm sorry for anything I've said that might have grated on you personally. I do appreciate your voice, ability, perspective, and skill. I suspect there are others in the NumPy community that feel the same way. Best regards, -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Jun 25 00:11:42 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 24 Jun 2012 23:11:42 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Jun 23, 2012, at 4:23 AM, Thouis (Ray) Jones wrote: > On Sat, Jun 23, 2012 at 5:14 AM, Charles R Harris > wrote: >> What has been done in the past is that an intent to fork is announced some >> two weeks in advance so that people can weigh in on what needs to be done >> before the fork. The immediate fork was a bit hasty. Likewise, when I >> suggested going to the github issue tracking, I opened a discussion on >> needed tags, but voila, there it was with an incomplete set and no >> discussion. That to seemed hasty. > > I don't have a particular dog in this fight, but it seems like neither > creating the fork nor turning on issues are worth disagreeing to much > about. There's going to be a 1.7 fork sometime soon, and whether it > gets created now or after discussion seems mostly academic. Even if > there were changes that needed to go into both branches, git makes > that straightforward. Likewise github issues. Turning them on has > minimal cost, especially given that pull requests already go through > github, and gives another route for bug reporting and a way to > experiment with issues to inform the discussion. Yes, this is exactly my perspective. Let's use the power of github and avoid discussions that don't need to happen and have more of them that do. -Travis From travis at continuum.io Mon Jun 25 00:23:18 2012 From: travis at continuum.io (Travis Oliphant) Date: Sun, 24 Jun 2012 23:23:18 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Jun 23, 2012, at 7:12 AM, Charles R Harris wrote: > > > On Sat, Jun 23, 2012 at 3:23 AM, Thouis (Ray) Jones wrote: > On Sat, Jun 23, 2012 at 5:14 AM, Charles R Harris > wrote: > > What has been done in the past is that an intent to fork is announced some > > two weeks in advance so that people can weigh in on what needs to be done > > before the fork. The immediate fork was a bit hasty. Likewise, when I > > suggested going to the github issue tracking, I opened a discussion on > > needed tags, but voila, there it was with an incomplete set and no > > discussion. That to seemed hasty. > > I don't have a particular dog in this fight, but it seems like neither > creating the fork nor turning on issues are worth disagreeing to much > about. There's going to be a 1.7 fork sometime soon, and whether it > gets created now or after discussion seems mostly academic. Even if > there were changes that needed to go into both branches, git makes > that straightforward. Likewise github issues. Turning them on has > minimal cost, especially given that pull requests already go through > github, and gives another route for bug reporting and a way to > experiment with issues to inform the discussion. > > From my point of view, the haste seems to be driven by SciPy2012. 
And why the rush after we have wasted three months running in circles for lack of a decision, with Mark and Nathaniel sent off to write a report that had no impact on the final outcome. The github thing also ended the thread and now someone has to clean up the result. It also appears that that work is being done by request rather than by a volunteer, that has subtle implications in the long run. >

The report has tremendous impact on the final outcome --- especially because the outcome is not *final*. I think the report helped clarify exactly what the differences were between Mark and Nathaniel's viewpoints and absolutely impacted the outcome for 1.7. I don't agree with your interpretation of events. I'm not sure what is meant by "request rather than volunteer", but I think it has something to do with your perspective on how NumPy should be developed.

> Things have been happening by fits and starts, with issues picked up and than dropped half done. That isn't a good way to move forward. >

That's the problem with volunteer labor. It's at the whim of the time people have available. The only time it's different is when people have resources to make it different. Issues are picked up when people have the time to pick them up. It turns out that good people are hard to find and it takes time to get them engaged. NumFOCUS is actively raising money to fund technology fellowships in order to provide full-time support to both mentors and students. The hope is that good people who want to continue to help the NumPy project will be found and supported.

Best,

-Travis

> Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From klonuo at gmail.com Mon Jun 25 06:27:54 2012
From: klonuo at gmail.com (klo uo)
Date: Mon, 25 Jun 2012 12:27:54 +0200
Subject: [Numpy-discussion] Numpy logo in VTK
Message-ID: 

I was reading the mayavi documentation and one of the examples (tvtk.ImageData) resembled the Numpy logo grid. I added a barchart and tweaked the colormap a bit and thought to post it for fun:

========================================
import numpy as np
from tvtk.api import tvtk
from mayavi import mlab

def view(dataset):
    fig = mlab.figure(bgcolor=(1, 1, 1), fgcolor=(0, 0, 0),
                      figure=dataset.class_name[3:])
    surf = mlab.pipeline.surface(dataset, opacity=0.2)
    mlab.pipeline.surface(mlab.pipeline.extract_edges(surf),
                          color=(0, 0, 0), line_width=.1)
    mlab.barchart(n, extent=[0.05, 4.5, 0.05, 4.5, -.35, 1])

n = ([[1,0,0,1],
      [1,0,1,1],
      [1,1,0,1],
      [1,0,0,1]])

data = np.random.random((5, 5, 5))

i = tvtk.ImageData(spacing=(1, 1, 1), origin=(0, 0, 0))
i.point_data.scalars = data.ravel()
i.point_data.scalars.name = 'scalars'
i.dimensions = data.shape

view(i)
========================================

Cheers
-------------- next part --------------
A non-text attachment was scrubbed...
Name: np.vtk.png
Type: image/png
Size: 51675 bytes
Desc: not available
URL: 

From tmp50 at ukr.net Mon Jun 25 07:20:34 2012
From: tmp50 at ukr.net (Dmitrey)
Date: Mon, 25 Jun 2012 14:20:34 +0300
Subject: [Numpy-discussion] numpy bug with ndarray subclassing
Message-ID: <62086.1340623234.7127925784463409152@ffe2.ukr.net>

I will use a workaround but I think you'd better fix the numpy bug:

from numpy import ndarray, float64, asanyarray, array

class asdf(ndarray):
    __array_priority__ = 10

    def __new__(self, vals1, vals2):
        obj = asanyarray(vals1).view(self)
        obj.vals2 = vals2
        return obj

    def __add__(self, other):
        print('add')
        assert not isinstance(other, asdf), 'unimplemented'
        return asdf(self.view(ndarray) + other, self.vals2)

    def __radd__(self, other):
        print('radd')
        assert not isinstance(other, asdf), 'unimplemented'
        return asdf(self.view(ndarray) + other, self.vals2)

a = asdf(array((1, 2, 3)), array((10, 20, 30)))
z = float64(1.0)

print(a.__array_priority__)  # 10
print(z.__array_priority__)  # -1000000.0

r2 = a + z
print(r2.vals2)  # ok, prints 'add' and (10,20,30)

r1 = z + a
print(r1.vals2)  # doesn't print "radd" (i.e. doesn't enter asdf.__radd__ at all);
                 # raises AttributeError: "'asdf' object has no attribute 'vals2'"

tried in Python2 + numpy 1.6.1 and Python3 + numpy 1.7.0 dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
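The "workaround" mentioned above is usually spelled with the standard ndarray subclassing hooks rather than with __radd__. The sketch below is not from the thread; it reuses the asdf/vals2 names from the report and only shows the documented pattern: __array_finalize__ guarantees that every new asdf instance has a vals2 attribute, and __array_wrap__ lets the asdf input (the one with the highest __array_priority__) re-attach its metadata to a ufunc result. It does not change which special method Python dispatches to for z + a; it only keeps the result from losing vals2.

import numpy as np

class asdf(np.ndarray):
    __array_priority__ = 10

    def __new__(cls, vals1, vals2):
        obj = np.asanyarray(vals1).view(cls)
        obj.vals2 = vals2
        return obj

    def __array_finalize__(self, obj):
        # Called for explicit construction, view casting and new-from-template,
        # so vals2 always exists (None when there is nothing to copy it from).
        self.vals2 = getattr(obj, 'vals2', None)

    def __array_wrap__(self, out_arr, context=None):
        # Called after a ufunc on the input with the highest __array_priority__;
        # re-attach the metadata to the result.
        out_arr = out_arr.view(type(self))
        out_arr.vals2 = self.vals2
        return out_arr

a = asdf(np.array((1, 2, 3)), np.array((10, 20, 30)))
z = np.float64(1.0)
print((a + z).vals2)   # [10 20 30]
print((z + a).vals2)   # no AttributeError with the hooks above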
From charlesr.harris at gmail.com Mon Jun 25 12:20:23 2012
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 25 Jun 2012 10:20:23 -0600
Subject: [Numpy-discussion] Created NumPy 1.7.x branch
In-Reply-To: 
References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io>
Message-ID: 

On Sun, Jun 24, 2012 at 10:09 PM, Travis Oliphant wrote: > > > What has been done in the past is that an intent to fork is announced some > two weeks in advance so that people can weigh in on what needs to be done > before the fork. The immediate fork was a bit hasty. Likewise, when I > suggested going to the github issue tracking, I opened a discussion on > needed tags, but voila, there it was with an incomplete set and no > discussion. That to seemed hasty. > > > My style is just different. I like to do things and then if discussion > requires an alteration, we alter. It is just a different style. The > labels can be altered, they are not set in stone. I prefer to have > something to talk about and a starting point to alter from --- especially > on potential bike-shedding discussions. There are several people that can > make changes to the labels. If we have difficulty agreeing then we can > go from that point. > > >> >>> There is time before the first Release candidate to make changes on the >>> 1.7.x branch. If you want to make the changes on master, and just >>> indicate the Pull requests, Ondrej can make sure they are added to the >>> 1.7.x. branch by Monday. We can also delay the first Release Candidate >>> by a few days to next Wednesday and then bump everything 3 days if that >>> will help. There will be a follow-on 1.8 release before the end of the >>> year --- so there is time to make changes for that release as well. The >>> next release will not take a year to get out, so we shouldn't feel >>> pressured to get *everything* in this release. >>> >> >> What are we going to do for 1.8? >> >> >> Let's get 1.7 out the door first. >> > > Mark proposed a schedule for the next several releases, I'd like to know > if we are going to follow it. > > > We should discuss it again. I don't recall the specifics and I believe it > was just a proposal.
I do not recall much feedback on it. > > > >> >> Yes, the functions will give warnings otherwise. >> >> >> I think this needs to be revisited. I don't think these changes are >> necessary for *every* use of macros. It can cause a lot of effort for >> people downstream without concrete benefit. >> > > The idea is to slowly move towards hiding the innards of the array type. > This has been under discussion since 1.3 came out. It is certainly the case > that not all macros need to go away. > > > I know it's been under discussion, but it looks like a lot of changes were > made just last year (and I am just starting to understand the implications > of all those changes). I think there are many NumPy users that will be > in the same position over the coming years. This is a bit more than > just hiding the array innards. The array innards have been "hidden" by > using the macros since NumPy 1.0. There was a specific intent to create > macros for all array access and encourage use of those macros --- precisely > so that the array object could change. The requirement of ABI > compatibility was not pre-envisioned in NumPy 1.0 > > Neither was NumPy 1.0 trying to provide type-safety in all cases. I > don't recall a discussion on the value of having macros that can be > interpreted at least as both PyObject * and PyArrayObject *. Perhaps > this is possible, and I just need to be educated. But, my opinion is that > it's not always useful to require type-casting especially between those > two. > > > >> >> >>> That's not as nice to type. >>> >> >> So? The point is to have correctness, not ease of typing. >> >> >> I'm not sure if a pun was intended there or not. C is not a safe and >> fully-typed system. That is one of its weaknesses according to many. >> But, I would submit that not being forced to give everything a "type" (and >> recognizing the tradeoffs that implies) is also one reason it gets used. >> > > C was famous for bugs due to the lack of function prototypes. This was > fixed with C99 and the stricter typing was a great help. > > > Bugs are not "due to lack of function prototypes". Bugs are due to > mistakes that programmers make (and I know all about mistakes programmers > make). Function prototypes can help detect some kinds of mistakes which is > helpful. But, this doesn't help the question of how to transition a > weakly-typed program or whether or not that is even a useful exercise. > Oh, come on. Writing correct C code used to be a guru exercise. A friend of mine, a Putnam fellow, was the Weitek guru for drivers. To say bugs are programmer mistakes is information free, the question is how to minimize programmer mistakes. > > >> >> >> >>> Is that assuming that PyArray_NDIM will become a function and need a >>> specific object type for its argument (and everything else cast....). >>> That's one clear disadvantage of inline functions versus macros in my mind: >>> no automatic polymorphism. >>> >> >> That's a disadvantage of Python. The virtue of inline functions is >> precisely type checking. >> >> >> Right, but we need to be more conscientious about this. Not every use >> of Macros should be replaced by inline function calls and the requisite >> *forced* type-checking. type-chekcing is not *universally* a virtue --- >> if it were, nobody would use Python. >> >> >> >>> I don't think type safety is a big win for macros like these. We >>> need to be more judicious about which macros are scheduled for function >>> inlining. 
Some just don't benefit from the type-safety implications as >>> much as others do, and you end up requiring everyone to change their code >>> downstream for no real reason. >>> >>> These sorts of changes really feel to me like unnecessary spelling >>> changes that require work from extension writers who now have to modify >>> their code with no real gain. There seems to be a lot of that going on in >>> the code base and I'm not really convinced that it's useful for end-users. >>> >> >> Good style and type checking are useful. Numpy needs more of both. >> >> >> You can assert it, but it doesn't make it so. "Good style" depends on >> what you are trying to accomplish and on your point of view. NumPy's style >> is not the product of one person, it's been adapted from multiple styles >> and inherits quite a bit from Python's style. I don't make any claims for >> it other than it allowed me to write it with the time and experience I had >> 7 years ago. We obviously disagree about this point. I'm sorry about >> that. I'm pretty flexible usually --- that's probably one of your big >> criticisms of my "style". >> > > Curiously, my criticism would be more that you are inflexible, slow to > change old habits. > > > I don't mind changing old habits at all. In fact, I don't think you know > me very well if that's your take. You have a very narrow window into my > activity and behavior. Of course habits are always hard to change > (that's why they call them habits). Mostly, I need to be convinced of > the value of changing old patterns --- just like everyone else (including > existing NumPy users). On the type-question, I'm just not convinced that > the most pressing matter in NumPy and SciPy is to re-write existing code to > be more strictly typed. I'm quite open to other view points on that > --- as long as backward compatibility is preserved, or a clear upgrade > story is provided to existing users. > > > > >> >> But, one of the things I feel quite strongly about is how hard we make it >> for NumPy users to upgrade. There are two specific things I disagree >> with pretty strongly: >> >> 1) Changing defined macros that should work the same on PyArrayObjects or >> PyObjects to now *require* types --- if we want to introduce new macros >> that require types than we can --- as long as it just provides warnings but >> still compiles then I suppose I could find this acceptable. >> >> 2) Changing MACROS to require semicolons when they were previously not >> needed. I'm going to be very hard-nosed about this one. >> >> >> >>> I'm going to be a lot more resistant to that sort of change in the code >>> base when I see it. >>> >> >> Numpy is a team effort. There are people out there who write better code >> than you do, you should learn from them. >> >> >> Exactly! It's a team effort. I'm part of that team as well, and while >> I don't always have strong opinions about things. When I do, I'm going to >> voice it. >> >> I've learned long ago there are people that write better code than me. >> There are people that write better code than you. >> > > Of course. Writing code is not my profession, and even if it were, there > are people out there who would be immeasurable better. I have tried to > improve my style over the years by reading books and browsing code by > people who are better than me. I also recognize common bad habits naive > coders tend to pick up when they start out, not least because I have at one > time or another had many of the same bad habits. > > > We are really quite a like here. 
I have done and continue to do exactly > the same thing. My priorities are just different. I don't believe it is > universally useful to alter patterns in existing code. I have typically > adapted my style to the code I'm working with. Numeric had a style which > I adapted to. Python has a style which I adapted to. I think that > people reading code and seeing multiple styles will find the code harder to > read. Such changes of style take work, and quite often the transition > is not worth the effort. I have not been nor will I continue to be in > opposition to changes that improve things (under any developers notion of > "improvement"). The big exception to that is when it seems to me that the > changes will make it more difficult for existing users to use their code. > > I know you are trying to make it easier for NumPy developers as you > understand it. I really admire you for doing what you feel strongly about. > I think we are both in our way trying to encourage more NumPy developers > (you by making the code easier to get in to) and me by trying to figure out > acceptable ways to fund them. > > I just think that we must recognize the users out there who have written > to the existing NumPy interface. Any change that requires effort from > users should be met with skepticism. We can write new interfaces and > encourage new users to use those new interfaces. We can even re-write > NumPy internals to use those interfaces. But, we can't just change > documented interfaces (and even be careful about undocumented but implied > interfaces -- I agree that this gets difficult to really adhere to, but we > can and should try at least for heavily used code-paths). One thing I'm > deeply aware of is the limited audience of this list compared to the user > base of NumPy and the intertia of old NumPy releases. Discussions on > this list are just not visible to the wider user base. My recent activity > and interest is in protecting that user-base from the difficulties that > recent changes are going to be on people upgrading from 1.5. > > My failing last year was to encourage and pay for (through Enthought) > Mark's full-time activity on this list but not have the time to provide > enough guidance to him about my understanding of the implications of his > changes and think hard enough about those to understand them in the time. > I thought Mark's activities actually declined once he entered the Enthought black hole. To be more specific, Mark did things that interested Enthought. I'd like to know what Mark himself would have liked to do. When an original thinker with impressive skills comes along it is worth letting them have a fair amount of freedom to move things, it's the only way to avoid stagnation. > > > That is not the question here at all. The question here is not >> requiring a *re-write* of code in order to get their extensions to compile >> using NumPy headers. We should not be making people change their code to >> get their extensions to compile in NumPy 1.X >> > > I think a bit of rewrite here and there along the way is more palatable > than a big change coming in as one big lump, especially if the changes are > done with a long term goal in mind. We are working towards a Numpy 2, but > we can't just go off for a year or two and write it, we have to get there > step by step. And that requires a plan. > > > We see things a little differently on that front, I think. A bit of > re-write here and there for down-stream users is exactly the wrong approach > in my view. 
I think it depends on the user. For one who is tracking > every NumPy release and has time to make any and all changes needed, I > think you are right --- that approach will work for them. However, there > are people out there who are using NumPy in ways (either significantly or > only indirectly) where having to change *any* code from one release to > another will make them seriously annoyed and we will start losing users. > Remember the lessons of 2.0, and of Python 3.0 for that matter. > > >> >> >> >>> >>> One particularly glaring example to my lens on the world: I think it >>> would have been better to define new macros which require semicolons than >>> changing the macros that don't require semicolons to now require >>> semicolons: >>> >>> NPY_BEGIN_THREADS_DEF >>> NPY_BEGIN_THREADS >>> NPY_ALLOW_C_API >>> NPY_ALLOW_C_API_DEF >>> NPY_DISABLE_C_API >>> >>> That feels like a gratuitous style change that will force users of those >>> macros to re-write their code. >>> >> >> It doesn't seem to be much of a problem. >> >> >> Unfortunately, I don't trust your judgment on that. My experience and >> understanding tells a much different story. I'm sorry if you disagree >> with me. >> >> > I'm sorry I made you sorry ;) The problem here is that you don't come > forth with specifics. People tell you things, but you don't say who or what > their specific problem was. Part of working with a team is keeping folks > informed, it isn't that useful to appeal to authority. I watch the list, > which is admittedly a small window into the community, and I haven't seen > show stoppers. Bugs, sure, but that isn't the same thing. > > > I came up with a very specific thing. I'm not sure what you are talking > about. If you are talking about discussions with people off list, then > I can't speak for them unless they have allowed me to. I encourage them to > speak up here as often as they can. Yes, you will have to trust that a > little bit of concern might just be an iceberg waiting to sink the ship. > > >> >>> Sure, it's a simple change, but it's a simple change that doesn't do >>> anything for you as an end user. I think I'm going to back this change >>> out, in fact. I can't see requiring people to change their C-code like >>> this will require without a clear benefit to them. I'm quite sure there >>> is code out there that uses these documented APIs (without the semicolon). >>> If we want to define new macros that require colons, then we do that, but >>> we can't get rid of the old ones --- especially in a 1.x release. >>> >>> Our policy should not be to allow gratuitous style changes just because >>> we think something is prettier another way. The NumPy code base has come >>> from multiple sources and reflects several styles. It also follows an >>> older style of C-programming (that is quite common in the Python code >>> base). It can be changed, but those changes shouldn't be painful for a >>> library user without some specific gain for them that the change allows. >>> >>> >> You use that word 'gratuitous' a lot, I don't think it means what you >> think it means. For instance, the new polynomial coefficient order wasn't >> gratuitous, it was doing things in a way many found more intuitive and >> generalized better to different polynomial basis. People >> >> have different ideas, that doesn't make them gratuitous. >> >> >> That's a slightly different issue. At least you created a new object >> and api which is a *little* better. 
My complaint about the choice there >> is now there *must* be two interfaces and added confusion as people will >> have to figure out which assumption is being used. I don't really care >> about the coefficient order --- really I don't. Either one is fine in my >> mind. I recognize the reasons. The problem is *changing* it without a >> *really* good reason. Now, we have to have two different APIs. I would >> much preferred to have poly1d disappear and just use your much nicer >> polynomial classes. Now, it can't and we are faced with a user-story >> that is either difficult for someone transitioning from MATLAB >> > > Most folks aren't going to transition from MATLAB or IDL. Engineers tend > to stick with the tools they learned in school, they aren't interested in > the tool itself as long as they can get their job done. And getting the job > done is what they are paid for. That said, I doubt they would have much > problem making the adjustment if they were inclined to switch tools. > > > I don't share your pessimism. You really think that "most folks aren't > going to transition". It's happening now. It's been happening for > several years. > I still haven't seen it. Once upon a time code for optical design was a new thing and many folks wrote their own, myself for one. These days they reach for Code V or Zemax. When they make the schematics they use something like Solidworks. When it comes time for thermal anaysis they run the Solidworks design into another commercial program. When it comes time to manufacture the parts another package takes the Solidworks data and produces nc instructions to drive the tools. The thing is, there is a whole ecosystem built around a few standard design tools. Similar considerations hold in civil engineering, architecture, and many other areas. Another example would be Linux on the desktop. That never really took off, Microsoft is still the dominant presence there. Where Linux succeeded was in embedded devices and smart phones, markets that hadn't yet developed a large ecosystem and where pennies count. Now to Matlab, suppose you want to analyse thermal effects on an orbiting satellite. Do you sit down and start writing new code in Python or do you buy a package for Matlab that deals with orbital calculations and knows all about shading and illumination? Suppose further that you have a few weeks to pull it off and have used the Matlab tools in the past. Matlab wins in this situation, Python isn't even a consideration. There are certainly places for Python out there. HPC is one, because last I looked Matlab licenses were still based around the number of cpu cores, so there are significant cost savings. Research that needs innovative software is another area where Python has an advantage. First, because in research it is expected that time will be spent exploring new things, and second because it is easier to write Python than Matlab scripts and there are more tools available at no cost. On the other hand, if you need sophisticated mathematics, Mathematica is the easy way to go. Engineering is a big area, and only a small part of it offers opportunity for Python to make inroads. > > or a "why did you do that?" puzzled look from a new user as to why we >> support both coefficient orders. Of course, that could be our story --- >> hey we support all kinds of orders, it doesn't really matter, you just have >> to tell us what you mean when passing in an unadorned array of >> coefficients. But, this is a different issue. 
>> >> I'm using the word 'gratuitous' to mean that it is "uncalled for and >> lacks a good reason". There needs to be much better reasons given for >> code changes that require someone to re-write working code than "it's >> better style" or even "it will help new programmers avoid errors". Let's >> write another interface that new programmers can use that fits the world >> the way you see it, don't change what's already working just because you >> don't like it or wish a different choice had been made. >> > > Well, and that was exactly what you meant when you called to coefficient > order 'gratuitous' in your first post to me about it. The problem was that > you didn't understand why I made the change until I explained it, but > rather made the charge sans explanation. It might be that some of the other > things you call gratuitous are less so than you think. These are hasty > judgements I think. > > > I'm sure we all have our share of hasty judgments to go around. Even > after your explanation, I still disagree with it. But, I appreciate the > reminder to give you the benefit of the doubt when I encounter something > that makes me raise my eyebrows. I hope you will do the same. > > > >> >> >> >>> There are significant users of NumPy out there still on 1.4. Even the >>> policy of deprecation that has been discussed will not help people trying >>> to upgrade from 1.4 to 1.8. They will be forced to upgrade multiple >>> times. The easier we can make this process for users the better. I >>> remain convinced that it's better and am much more comfortable with making >>> a release that requires a re-compile (that will succeed without further >>> code changes --- because of backward compatibility efforts) than to have >>> supposed ABI compatibility with subtle semantic changes and required C-code >>> changes when you do happen to re-compile. >>> >>> >> Cleanups need to be made bit by bit. I don't think we have done anything >> that will cause undo trouble. >> >> >> I disagree substantially on the impact of these changes. You can >> disagree about my awareness of NumPy users, but I think I understand a >> large number of them and why NumPy has been successful in getting users. >> I agree that we have been unsuccessful at getting serious developers and >> I'm convinced by you and Mark as to why that is. But, we can't sacrifice >> users for the sake of getting developers who will spend their free time >> trying to get around the organic pile that NumPy is at this point. >> >> Because of this viewpoint, I think there is some adaptation and cleanup >> right now, needed, so that significant users of NumPy can upgrade based on >> the changes that have occurred without causing them annoying errors (even >> simple changes can be a pain in the neck to fix). >> >> I do agree changes can be made. I realize you've worked hard to keep >> the code-base in a state that you find more adequate. I think you go >> overboard on that front, but I acknowledge that there are people that >> appreciate this. I do feel very strongly that we should not require >> users to have to re-write working C-code in order to use a new minor >> version number in NumPy, regardless of how the code "looks" or how much >> "better" it is according to some idealized standard. >> >> The macro changes are border-line (at least I believe code will still >> compile --- just raise warnings, but I need to be sure about this). The >> changes that require semi-colons are not acceptable at all. 
>> > > I was tempted to back them out myself, but I don't think the upshot will > be earth shaking. > > > I think it's important that code using NumPy headers that compiled with > 1.5 will compile with 1.7. > > > >> >> Look Charles, I believe we can continue to work productively together and >> our differences can be a strength to the community. I hope you feel the >> same way. I will continue to respect and listen to your perspective --- >> especially when I disagree with it. >> > > Sounds like a threat to me. Who are you to judge? If you are going to be > the dictator, let's put that out there and make it official. > > > Wow, charles! I think you should re-read what I wrote. It was not a > threat at all. It was an appeal to work more closely together, and a > commitment on my end to listen to your point of view and try to sift from > any of my own opposition the chaff from the wheat. > > I am just not thinking in those terms at all. I do not think it is > appropriate to talk about a dictator in this context. I have no control > over what you do, and you have no control over what I do. We can only > work cooperatively or independently for the benefit of NumPy. > > Perhaps there are things I've said and done that really bother you, or > have offended you. I'm sorry for anything I've said that might have grated > on you personally. I do appreciate your voice, ability, perspective, and > skill. I suspect there are others in the NumPy community that feel the > same way. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Mon Jun 25 13:41:58 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 12:41:58 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: >> >> C was famous for bugs due to the lack of function prototypes. This was fixed with C99 and the stricter typing was a great help. > > Bugs are not "due to lack of function prototypes". Bugs are due to mistakes that programmers make (and I know all about mistakes programmers make). Function prototypes can help detect some kinds of mistakes which is helpful. But, this doesn't help the question of how to transition a weakly-typed program or whether or not that is even a useful exercise. > > Oh, come on. Writing correct C code used to be a guru exercise. A friend of mine, a Putnam fellow, was the Weitek guru for drivers. To say bugs are programmer mistakes is information free, the question is how to minimize programmer mistakes. Bugs *are* programmer mistakes. Let's put responsibility where it lies. Of course, writing languages that help programmers make fewer mistakes (or catch them earlier when they do) are a good thing. I'm certainly not arguing against that. But, I reiterate that just because a better way to write new code under some metric is discovered or understood does not mean that all current code should be re-written to use that style. That's the only comment I'm making. Also, you mention the lessons from Python 2 and Python 3, but I'm not sure we would agree on what those lessons actually were, so I wouldn't rely on that as a way of getting your point across if it matters. Best, -Travis -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben.root at ou.edu Mon Jun 25 13:55:25 2012 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 25 Jun 2012 13:55:25 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 1:41 PM, Travis Oliphant wrote: > >> C was famous for bugs due to the lack of function prototypes. This was >> fixed with C99 and the stricter typing was a great help. >> >> >> Bugs are not "due to lack of function prototypes". Bugs are due to >> mistakes that programmers make (and I know all about mistakes programmers >> make). Function prototypes can help detect some kinds of mistakes which is >> helpful. But, this doesn't help the question of how to transition a >> weakly-typed program or whether or not that is even a useful exercise. >> > > Oh, come on. Writing correct C code used to be a guru exercise. A friend > of mine, a Putnam fellow, was the Weitek guru for drivers. To say bugs are > programmer mistakes is information free, the question is how to minimize > programmer mistakes. > > > Bugs *are* programmer mistakes. Let's put responsibility where it lies. > Of course, writing languages that help programmers make fewer mistakes > (or catch them earlier when they do) are a good thing. I'm certainly not > arguing against that. > > But, I reiterate that just because a better way to write new code under > some metric is discovered or understood does not mean that all current code > should be re-written to use that style. That's the only comment I'm > making. > > Also, you mention the lessons from Python 2 and Python 3, but I'm not sure > we would agree on what those lessons actually were, so I wouldn't rely on > that as a way of getting your point across if it matters. > > Best, > > -Travis > > At the risk of starting a language flame war, my take of Charles' comment about the lessons of python 3.0 is its success in getting packages transitioned smoothly (still an on-going process), versus what happened with Perl 5. Perl 5 was a major change that happened all at once and no-one adopted it for the longest time. Meanwhile, python incremented itself from the 2.x series to the 3.x series in a very nice manner with a well-thought-out plan that was visible to all. At least, that is my understanding and perception. Take it with as much salt as you (or your doctor) desires. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From perry at stsci.edu Mon Jun 25 13:56:54 2012 From: perry at stsci.edu (Perry Greenfield) Date: Mon, 25 Jun 2012 13:56:54 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> Message-ID: <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> On Jun 25, 2012, at 12:20 PM, Charles R Harris wrote: >>> >>> Most folks aren't going to transition from MATLAB or IDL. >>> Engineers tend to stick with the tools they learned in school, >>> they aren't interested in the tool itself as long as they can get >>> their job done. And getting the job done is what they are paid >>> for. That said, I doubt they would have much problem making the >>> adjustment if they were inclined to switch tools. >> >> I don't share your pessimism. You really think that "most folks >> aren't going to transition". It's happening now. It's been >> happening for several years. >> >> > I still haven't seen it. 
Once upon a time code for optical design > was a new thing and many folks wrote their own, myself for one. > These days they reach for Code V or Zemax. When they make the > schematics they use something like Solidworks. When it comes time > for thermal anaysis they run the Solidworks design into another > commercial program. When it comes time to manufacture the parts > another package takes the Solidworks data and produces nc > instructions to drive the tools. The thing is, there is a whole > ecosystem built around a few standard design tools. Similar > considerations hold in civil engineering, architecture, and many > other areas. > > Another example would be Linux on the desktop. That never really > took off, Microsoft is still the dominant presence there. Where > Linux succeeded was in embedded devices and smart phones, markets > that hadn't yet developed a large ecosystem and where pennies count. > > Now to Matlab, suppose you want to analyse thermal effects on an > orbiting satellite. Do you sit down and start writing new code in > Python or do you buy a package for Matlab that deals with orbital > calculations and knows all about shading and illumination? Suppose > further that you have a few weeks to pull it off and have used the > Matlab tools in the past. Matlab wins in this situation, Python > isn't even a consideration. > > There are certainly places for Python out there. HPC is one, because > last I looked Matlab licenses were still based around the number of > cpu cores, so there are significant cost savings. Research that > needs innovative software is another area where Python has an > advantage. First, because in research it is expected that time will > be spent exploring new things, and second because it is easier to > write Python than Matlab scripts and there are more tools available > at no cost. On the other hand, if you need sophisticated > mathematics, Mathematica is the easy way to go. > > Engineering is a big area, and only a small part of it offers > opportunity for Python to make inroads. > It's hard to generalize that much here. There are some areas in what you say is true, particularly if whole industries rely on libraries that have much time involved in developing them, and for which it is particularly difficult to break away. But there are plenty of other areas where it isn't that hard. I'd characterize the process a bit differently. I would agree that it is pretty hard to get someone who has been using matlab or IDL for many years to transition. That doesn't happen very often (if it does, it's because all the other people they work with are using a different tool and they are forced to). I think we are targeting the younger people; those that do not have a lot of experience tied up in matlab or IDL. For example, IDL is very well established in astronomy, and we've seen few make that switch if they already have been using IDL for a while. But we are seeing many more younger astronomers choose Python over IDL these days. 
Perry From charlesr.harris at gmail.com Mon Jun 25 15:25:09 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 25 Jun 2012 13:25:09 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> Message-ID: On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield wrote: > > On Jun 25, 2012, at 12:20 PM, Charles R Harris wrote: > >>> > >>> Most folks aren't going to transition from MATLAB or IDL. > >>> Engineers tend to stick with the tools they learned in school, > >>> they aren't interested in the tool itself as long as they can get > >>> their job done. And getting the job done is what they are paid > >>> for. That said, I doubt they would have much problem making the > >>> adjustment if they were inclined to switch tools. > >> > >> I don't share your pessimism. You really think that "most folks > >> aren't going to transition". It's happening now. It's been > >> happening for several years. > >> > >> > > I still haven't seen it. Once upon a time code for optical design > > was a new thing and many folks wrote their own, myself for one. > > These days they reach for Code V or Zemax. When they make the > > schematics they use something like Solidworks. When it comes time > > for thermal anaysis they run the Solidworks design into another > > commercial program. When it comes time to manufacture the parts > > another package takes the Solidworks data and produces nc > > instructions to drive the tools. The thing is, there is a whole > > ecosystem built around a few standard design tools. Similar > > considerations hold in civil engineering, architecture, and many > > other areas. > > > > Another example would be Linux on the desktop. That never really > > took off, Microsoft is still the dominant presence there. Where > > Linux succeeded was in embedded devices and smart phones, markets > > that hadn't yet developed a large ecosystem and where pennies count. > > > > Now to Matlab, suppose you want to analyse thermal effects on an > > orbiting satellite. Do you sit down and start writing new code in > > Python or do you buy a package for Matlab that deals with orbital > > calculations and knows all about shading and illumination? Suppose > > further that you have a few weeks to pull it off and have used the > > Matlab tools in the past. Matlab wins in this situation, Python > > isn't even a consideration. > > > > There are certainly places for Python out there. HPC is one, because > > last I looked Matlab licenses were still based around the number of > > cpu cores, so there are significant cost savings. Research that > > needs innovative software is another area where Python has an > > advantage. First, because in research it is expected that time will > > be spent exploring new things, and second because it is easier to > > write Python than Matlab scripts and there are more tools available > > at no cost. On the other hand, if you need sophisticated > > mathematics, Mathematica is the easy way to go. > > > > Engineering is a big area, and only a small part of it offers > > opportunity for Python to make inroads. > > > It's hard to generalize that much here. There are some areas in what > you say is true, particularly if whole industries rely on libraries > that have much time involved in developing them, and for which it is > particularly difficult to break away. 
But there are plenty of other > areas where it isn't that hard. > > I'd characterize the process a bit differently. I would agree that it > is pretty hard to get someone who has been using matlab or IDL for > many years to transition. That doesn't happen very often (if it does, > it's because all the other people they work with are using a different > tool and they are forced to). I think we are targeting the younger > people; those that do not have a lot of experience tied up in matlab > or IDL. For example, IDL is very well established in astronomy, and > we've seen few make that switch if they already have been using IDL > for a while. But we are seeing many more younger astronomers choose > Python over IDL these days. > I didn't bring up the Astronomy experience, but I think that is a special case because it is a fairly small area and to some extent you had the advantage of a supported center, STSci, maintaining some software. There are also a lot of amateurs who can appreciate the low costs and simplicity of Python. The software engineers use tends to be set early, in college or in their first jobs. I suspect that these days professional astronomers spend a number of years in graduate school where they have time to experiment a bit. That is a nice luxury to have. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From perry at stsci.edu Mon Jun 25 18:21:30 2012 From: perry at stsci.edu (Perry Greenfield) Date: Mon, 25 Jun 2012 18:21:30 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> Message-ID: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote: > > > On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield > wrote: > > It's hard to generalize that much here. There are some areas in what > you say is true, particularly if whole industries rely on libraries > that have much time involved in developing them, and for which it is > particularly difficult to break away. But there are plenty of other > areas where it isn't that hard. > > I'd characterize the process a bit differently. I would agree that it > is pretty hard to get someone who has been using matlab or IDL for > many years to transition. That doesn't happen very often (if it does, > it's because all the other people they work with are using a different > tool and they are forced to). I think we are targeting the younger > people; those that do not have a lot of experience tied up in matlab > or IDL. For example, IDL is very well established in astronomy, and > we've seen few make that switch if they already have been using IDL > for a while. But we are seeing many more younger astronomers choose > Python over IDL these days. > > I didn't bring up the Astronomy experience, but I think that is a > special case because it is a fairly small area and to some extent > you had the advantage of a supported center, STSci, maintaining some > software. There are also a lot of amateurs who can appreciate the > low costs and simplicity of Python. > > The software engineers use tends to be set early, in college or in > their first jobs. I suspect that these days professional astronomers > spend a number of years in graduate school where they have time to > experiment a bit. That is a nice luxury to have. > Sure. 
But it's not unusual for an invasive technology (that's us) to take root in certain niches before spreading more widely. Another way of looking at such things is: is what we are seeking to replace that much worse? If the gains are marginal, then it is very hard to displace. But if there are significant advantages, eventually they will win through. I tend to think Python and the scientific stack does offer the potential for great advantages over IDL or matlab. But that doesn't make it easy. Perry From srean.list at gmail.com Mon Jun 25 18:29:22 2012 From: srean.list at gmail.com (srean) Date: Mon, 25 Jun 2012 17:29:22 -0500 Subject: [Numpy-discussion] Semantics of index arrays and a request to fix the user guide Message-ID: >From the user guide: ----------------------------- > Boolean arrays must be of the same shape as the array being indexed, > or broadcastable to the same shape. In the most straightforward case, > the boolean array has the same shape. Comment: So far so good, but the doc has not told me yet what the shape of the output is. -------------- user guide continues with an example: ------------------------------------------------------ > The result is a 1-D array containing all the elements in the indexed array corresponding to all the true elements in the boolean array. Comment: -------------- Now it is not clear from that line whether that description of the result's shape is generally true or specific to the example. So the reader (me) is still confused. User Guide continues: -------------------------------- > With broadcasting, multidimensional arrays may be the result. For example... Comment: -------------- I will get to the example in a minute, but there is no explanation of the mechanism used to arrive at the output shape. Is it the shape that the index array was broadcast to? Or is it something else? If it is the latter, what is it? Example ------------ The example indexes a (5,7) array with a (5,) index array. Now this is very confusing because it seems to contradict the original documentation: (5,) is neither the same shape as (5,7) nor is it broadcastable to it. The steps of conventional broadcasting would yield (5,7) (5,) then (5,7) (1,5) and then an error because 7 and 5 don't match. User guide continues: ------------------------------ > Combining index arrays with slices. > In effect, the slice is converted to an index array > np.array([[1,2]]) (shape (1,2)) that is broadcast with > the index array to produce a resultant array of shape (3,2). Comment: ------------- Here the two arrays have shape (3,) and (1,2), so how does broadcasting yield the shape (3,2)? Broadcasting is supposed to proceed trailing dimension first, but it seems in these examples it is doing the opposite. ===== So could someone explain the semantics and make the user guide more precise? Assuming the user guide will be the first document a new user reads, it is surprisingly difficult to read, primarily because it gets into advanced topics too soon and partially because of ambiguous language. The numpy reference, on the other hand, is very clear, as is Travis's book, which I am glad to say I actually bought a long time ago.
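For concreteness, here is the kind of interactive session I am puzzling over (y and b are just toy arrays I made up; the shapes shown are what numpy actually returns):

>>> import numpy as np
>>> y = np.arange(35).reshape(5, 7)
>>> y[y > 20].shape                     # boolean of the same shape: 1-D result
(14,)
>>> b = np.array([False, False, False, True, True])
>>> y[b].shape                          # (5,) boolean against (5,7): picks whole rows
(2, 7)
>>> y[np.array([0, 2, 4]), 1:3].shape   # index array combined with a slice
(3, 2)

So the (5,) case clearly works; I just cannot reconcile it with the broadcasting rules as the guide states them.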
Thanks, srean From charlesr.harris at gmail.com Mon Jun 25 20:01:52 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 25 Jun 2012 18:01:52 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> Message-ID: On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield wrote: > > On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote: > > > > > > > On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield > > wrote: > > > > It's hard to generalize that much here. There are some areas in what > > you say is true, particularly if whole industries rely on libraries > > that have much time involved in developing them, and for which it is > > particularly difficult to break away. But there are plenty of other > > areas where it isn't that hard. > > > > I'd characterize the process a bit differently. I would agree that it > > is pretty hard to get someone who has been using matlab or IDL for > > many years to transition. That doesn't happen very often (if it does, > > it's because all the other people they work with are using a different > > tool and they are forced to). I think we are targeting the younger > > people; those that do not have a lot of experience tied up in matlab > > or IDL. For example, IDL is very well established in astronomy, and > > we've seen few make that switch if they already have been using IDL > > for a while. But we are seeing many more younger astronomers choose > > Python over IDL these days. > > > > I didn't bring up the Astronomy experience, but I think that is a > > special case because it is a fairly small area and to some extent > > you had the advantage of a supported center, STSci, maintaining some > > software. There are also a lot of amateurs who can appreciate the > > low costs and simplicity of Python. > > > > The software engineers use tends to be set early, in college or in > > their first jobs. I suspect that these days professional astronomers > > spend a number of years in graduate school where they have time to > > experiment a bit. That is a nice luxury to have. > > > Sure. But it's not unusual for an invasive technology (that's us) to > take root in certain niches before spreading more widely. > > Another way of looking at such things is: is what we are seeking to > replace that much worse? If the gains are marginal, then it is very > hard to displace. But if there are significant advantages, eventually > they will win through. I tend to think Python and the scientific stack > does offer the potential for great advantages over IDL or matlab. But > that doesn't make it easy. > I didn't say we couldn't make inroads. The original proposition was that we needed a polynomial class compatible with Matlab. I didn't think compatibility with Matlab mattered so much in that case because not many people switch, as you have agreed is the case, and those who start fresh, or are the adventurous sort, can adapt without a problem. In other words, IMHO, it wasn't a pressing issue and could be decided on the merits of the interface, which I thought of in terms of series approximation. In particular, it wasn't a 'gratuitous' choice as I had good reasons to do things the way I did. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Mon Jun 25 20:10:46 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 19:10:46 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> Message-ID: <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> You are still missing the point that there was already a choice that was made in the previous class --- made in Numeric actually. You made a change to that. It is the change that is 'gratuitous'. The pain and unnecessary overhead of having two competing standards is the problem --- not whether one is 'right' or not. That is a different discussion entirely. -- Travis Oliphant (on a mobile) 512-826-7480 On Jun 25, 2012, at 7:01 PM, Charles R Harris wrote: > > > On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield wrote: > > On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote: > > > > > > > On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield > > wrote: > > > > It's hard to generalize that much here. There are some areas in what > > you say is true, particularly if whole industries rely on libraries > > that have much time involved in developing them, and for which it is > > particularly difficult to break away. But there are plenty of other > > areas where it isn't that hard. > > > > I'd characterize the process a bit differently. I would agree that it > > is pretty hard to get someone who has been using matlab or IDL for > > many years to transition. That doesn't happen very often (if it does, > > it's because all the other people they work with are using a different > > tool and they are forced to). I think we are targeting the younger > > people; those that do not have a lot of experience tied up in matlab > > or IDL. For example, IDL is very well established in astronomy, and > > we've seen few make that switch if they already have been using IDL > > for a while. But we are seeing many more younger astronomers choose > > Python over IDL these days. > > > > I didn't bring up the Astronomy experience, but I think that is a > > special case because it is a fairly small area and to some extent > > you had the advantage of a supported center, STSci, maintaining some > > software. There are also a lot of amateurs who can appreciate the > > low costs and simplicity of Python. > > > > The software engineers use tends to be set early, in college or in > > their first jobs. I suspect that these days professional astronomers > > spend a number of years in graduate school where they have time to > > experiment a bit. That is a nice luxury to have. > > > Sure. But it's not unusual for an invasive technology (that's us) to > take root in certain niches before spreading more widely. > > Another way of looking at such things is: is what we are seeking to > replace that much worse? If the gains are marginal, then it is very > hard to displace. But if there are significant advantages, eventually > they will win through. I tend to think Python and the scientific stack > does offer the potential for great advantages over IDL or matlab. But > that doesn't make it easy. > > I didn't say we couldn't make inroads. The original proposition was that we needed a polynomial class compatible with Matlab. 
I didn't think compatibility with Matlab mattered so much in that case because not many people switch, as you have agreed is the case, and those who start fresh, or are the adventurous sort, can adapt without a problem. In other words, IMHO, it wasn't a pressing issue and could be decided on the merits of the interface, which I thought of in terms of series approximation. In particular, it wasn't a 'gratuitous' choice as I had good reasons to do things the way I did. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Mon Jun 25 20:21:31 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 25 Jun 2012 17:21:31 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 5:10 PM, Travis Oliphant wrote: > You are still missing the point that there was already a choice that was > made in the previous class --- made in Numeric actually. > > You made a change to that. ?It is the change that is 'gratuitous'. As someone who played a role in that change (by talking with Chuck about it, I didn't do the actual hard work), I'd like to pitch in. I think it's unfair to use the word gratuitous here, which is defined as: Adjective: Uncalled for; lacking good reason; unwarranted. It is true that the change happened to not consider enough the reasons that existed for the previous state of affairs, but it's *not* true that there were no good reasons for it. Calling something gratuitous is fairly derogatory, as it implies that it was done without any thinking whatsoever, and that was most certainly not the case here. It was a deliberate and considered change for what were *thought* to be good reasons. It's possible that, had there been feedback from you at the time, those reasons would have been appreciated as not being sufficient to make the change, or that a different solution would have been arrived at. But to say that there were no good reason is unfair to those who did spend the time thinking about the problem, and who thought the reasons they had found were indeed good ones. That particular issue was simply one of the best examples of what happens in a project when there are not enough eyes to provide feedback on its evolution: even with the best intentions, the few doing the work may make changes that might not have gone through with more input from others. But the alternative was to paralyze numpy completely, which I think would have been a worse outcome. I know that this particular issue grates you quite a bit, but I urge you to be fair in your appreciation of how it came to be: through the work of well-intentioned and thoughtful (but not omniscient) people when you weren't participating actively in numpy development. 
Cheers, f From josef.pktd at gmail.com Mon Jun 25 20:25:54 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 Jun 2012 20:25:54 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 8:10 PM, Travis Oliphant wrote: > You are still missing the point that there was already a choice that was > made in the previous class --- made in Numeric actually. > > You made a change to that. ?It is the change that is 'gratuitous'. ?The pain > and unnecessary overhead of having two competing standards is the problem > --- not whether one is 'right' or not. ?That is a different discussion > entirely. I remember there was a discussion about the order of the coefficients on the mailing list and all in favor of the new order, IIRC. I cannot find the thread. I know I was. At least I'm switching pretty much to the new polynomial classes, and don't really care about the inherited choice before that any more. So, I'm pretty much in favor of updating, if new choices are more convenient and more familiar to new users. Josef > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Jun 25, 2012, at 7:01 PM, Charles R Harris > wrote: > > > > On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield wrote: >> >> >> On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote: >> >> > >> > >> > On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield >> > wrote: >> > >> > It's hard to generalize that much here. There are some areas in what >> > you say is true, particularly if whole industries rely on libraries >> > that have much time involved in developing them, and for which it is >> > particularly difficult to break away. But there are plenty of other >> > areas where it isn't that hard. >> > >> > I'd characterize the process a bit differently. I would agree that it >> > is pretty hard to get someone who has been using matlab or IDL for >> > many years to transition. That doesn't happen very often (if it does, >> > it's because all the other people they work with are using a different >> > tool and they are forced to). I think we are targeting the younger >> > people; those that do not have a lot of experience tied up in matlab >> > or IDL. For example, IDL is very well established in astronomy, and >> > we've seen few make that switch if they already have been using IDL >> > for a while. But we are seeing many more younger astronomers choose >> > Python over IDL these days. >> > >> > I didn't bring up the Astronomy experience, but I think that is a >> > special case because it is a fairly small area and to some extent >> > you had the advantage of a supported center, STSci, maintaining some >> > software. There are also a lot of amateurs who can appreciate the >> > low costs and simplicity of Python. >> > >> > The software engineers use tends to be set early, in college or in >> > their first jobs. I suspect that these days professional astronomers >> > spend a number of years in graduate school where they have time to >> > experiment a bit. That is a nice luxury to have. >> > >> Sure. But it's not unusual for an invasive technology (that's us) to >> take root in certain niches before spreading more widely. 
>> >> Another way of looking at such things is: is what we are seeking to >> replace that much worse? If the gains are marginal, then it is very >> hard to displace. But if there are significant advantages, eventually >> they will win through. I tend to think Python and the scientific stack >> does offer the potential for great advantages over IDL or matlab. But >> that doesn't make it easy. > > > I didn't say we couldn't make inroads. The original proposition was that we > needed a polynomial class compatible with Matlab. I didn't think > compatibility with Matlab mattered so much in that case because not many > people switch, as you have agreed is the case, and those who start fresh, or > are the adventurous sort, can adapt without a problem. In other words, IMHO, > it wasn't a pressing issue and could be decided on the merits of the > interface, which I thought of in terms of series approximation.? In > particular, it wasn't a 'gratuitous' choice as I had good reasons to do > things the way I did. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon Jun 25 20:53:12 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 Jun 2012 20:53:12 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 8:25 PM, wrote: > On Mon, Jun 25, 2012 at 8:10 PM, Travis Oliphant wrote: >> You are still missing the point that there was already a choice that was >> made in the previous class --- made in Numeric actually. >> >> You made a change to that. ?It is the change that is 'gratuitous'. ?The pain >> and unnecessary overhead of having two competing standards is the problem >> --- not whether one is 'right' or not. ?That is a different discussion >> entirely. > > I remember there was a discussion about the order of the coefficients > on the mailing list and all in favor of the new order, IIRC. I cannot > find the thread. I know I was. > > At least I'm switching pretty much to the new polynomial classes, and > don't really care about the inherited choice before that any more. > > So, I'm pretty much in favor of updating, if new choices are more > convenient and more familiar to new users. just to add a bit more information, given the existence of both poly's nobody had to rewrite flipping order in scipy.signal.residuez b, a = map(asarray, (b, a)) gain = a[0] brev, arev = b[::-1], a[::-1] krev, brev = polydiv(brev, arev) if krev == []: k = [] else: k = krev[::-1] b = brev[::-1] while my arma_process class can start at the same time with def __init__(self, ar, ma, nobs=None): self.ar = np.asarray(ar) self.ma = np.asarray(ma) self.arpoly = np.polynomial.Polynomial(self.ar) self.mapoly = np.polynomial.Polynomial(self.ma) As a downstream user of numpy and observer of the mailing list for a few years, I think the gradual improvements have gone down pretty well. At least I haven't seen any mayor complaints on the mailing list. 
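(To make the coefficient-order difference behind the residuez and arma_process snippets above concrete, a toy session; the same list [1, 2, 3] simply means a different polynomial in each convention:)

>>> from numpy import poly1d
>>> from numpy.polynomial import Polynomial
>>> poly1d([1, 2, 3])(10)        # old convention: highest degree first, x**2 + 2*x + 3
123
>>> Polynomial([1, 2, 3])(10)    # new convention: constant term first, 1 + 2*x + 3*x**2
321.0
>>> Polynomial([3, 2, 1])(10)    # flipping the coefficients maps one onto the other
123.0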
For me, the big problem was numpy 1.4.0 where several packages where not available because of binary compatibility, NaN's didn't concern me much, current incomplete transition to new MinGW and gcc is currently a bit of a problem. Purely as an observer, my impression was also that the internal numpy c source cleanup, started by David C., I guess, didn't cause any big problems that would have created lots of complaints on the numpy mailing list. Josef > > Josef > >> >> -- >> Travis Oliphant >> (on a mobile) >> 512-826-7480 >> >> >> On Jun 25, 2012, at 7:01 PM, Charles R Harris >> wrote: >> >> >> >> On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield wrote: >>> >>> >>> On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote: >>> >>> > >>> > >>> > On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield >>> > wrote: >>> > >>> > It's hard to generalize that much here. There are some areas in what >>> > you say is true, particularly if whole industries rely on libraries >>> > that have much time involved in developing them, and for which it is >>> > particularly difficult to break away. But there are plenty of other >>> > areas where it isn't that hard. >>> > >>> > I'd characterize the process a bit differently. I would agree that it >>> > is pretty hard to get someone who has been using matlab or IDL for >>> > many years to transition. That doesn't happen very often (if it does, >>> > it's because all the other people they work with are using a different >>> > tool and they are forced to). I think we are targeting the younger >>> > people; those that do not have a lot of experience tied up in matlab >>> > or IDL. For example, IDL is very well established in astronomy, and >>> > we've seen few make that switch if they already have been using IDL >>> > for a while. But we are seeing many more younger astronomers choose >>> > Python over IDL these days. >>> > >>> > I didn't bring up the Astronomy experience, but I think that is a >>> > special case because it is a fairly small area and to some extent >>> > you had the advantage of a supported center, STSci, maintaining some >>> > software. There are also a lot of amateurs who can appreciate the >>> > low costs and simplicity of Python. >>> > >>> > The software engineers use tends to be set early, in college or in >>> > their first jobs. I suspect that these days professional astronomers >>> > spend a number of years in graduate school where they have time to >>> > experiment a bit. That is a nice luxury to have. >>> > >>> Sure. But it's not unusual for an invasive technology (that's us) to >>> take root in certain niches before spreading more widely. >>> >>> Another way of looking at such things is: is what we are seeking to >>> replace that much worse? If the gains are marginal, then it is very >>> hard to displace. But if there are significant advantages, eventually >>> they will win through. I tend to think Python and the scientific stack >>> does offer the potential for great advantages over IDL or matlab. But >>> that doesn't make it easy. >> >> >> I didn't say we couldn't make inroads. The original proposition was that we >> needed a polynomial class compatible with Matlab. I didn't think >> compatibility with Matlab mattered so much in that case because not many >> people switch, as you have agreed is the case, and those who start fresh, or >> are the adventurous sort, can adapt without a problem. 
In other words, IMHO, >> it wasn't a pressing issue and could be decided on the merits of the >> interface, which I thought of in terms of series approximation.? In >> particular, it wasn't a 'gratuitous' choice as I had good reasons to do >> things the way I did. >> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From travis at continuum.io Mon Jun 25 21:39:19 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 20:39:19 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote: > On Mon, Jun 25, 2012 at 5:10 PM, Travis Oliphant wrote: >> You are still missing the point that there was already a choice that was >> made in the previous class --- made in Numeric actually. >> >> You made a change to that. It is the change that is 'gratuitous'. > > As someone who played a role in that change (by talking with Chuck > about it, I didn't do the actual hard work), I'd like to pitch in. > > I think it's unfair to use the word gratuitous here, which is defined as: > > Adjective: > > Uncalled for; lacking good reason; unwarranted. I appreciate your perspective, but I still think it's fair to use that word. I think it's been interpreted more broadly then I intended and in a different color than I intended. My use of the word is closer to "uncalled for" and "unwarranted" than an isolated "lacking good reason". I know very well that anything done in NumPy has a "good reason" because the people who participate in NumPy development are very bright and capable. For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. That's why I used the word. I am not trying to be derogatory. I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes. I will repeat my very simple argument: I think this particular change was uncalled for because now we will have 2 different conventions in NumPy for polynomial order coefficients. I understand it's a different API so code won't break which is a good thing --- but it will be a wart for NumPy as we explain for years to come about the different conventions in the same code-base. Working on the NumPy code base implies respecting the conventions that are already in place --- not just disregarding them and doing whatever we want. I'm not really sure why I have to argue the existing users point of view so much recently. I would hope that all of us would have the perspective that the people who have adopted NumPy deserve to be treated with respect. The changes that grate on me are the ones that seem to take lightly existing users of NumPy. 
> > > It is true that the change happened to not consider enough the reasons > that existed for the previous state of affairs, but it's *not* true > that there were no good reasons for it. Of course not. I've tried to make the point very clearly that I understand the good reasons for it. I do understand them. 12-13 years ago, when the decisions were being made about current conventions, I would likely have been persuaded by them. > But to say that there were no good reason is unfair to those who did > spend the time thinking about the problem, and who thought the reasons > they had found were indeed good ones. I did not ever say there were no good reasons. Please do not add energy to the idea that I'm dis-regarding the reasoning of those who thought about this. I'm not. I said the changes were uncalled for and unwarranted. I stand by that assessment. I do not mean any dis-respect to the people who made the changes. In that context, it's also useful to recognize how unfair it is to existing users to change conventions and ignore the work they have put in to understanding and using what is there. It's also useful to consider the unfairness of ignoring the thinking and work that went in to the existing conventions and APIs. > I know that this particular issue grates you quite a bit, but I urge > you to be fair in your appreciation of how it came to be: through the > work of well-intentioned and thoughtful (but not omniscient) people > when you weren't participating actively in numpy development. I'm trying very hard to be fair --- especially to changes like this. What grates me are changes that affect our user base in a negative way --- specifically by causing code that used to work to no longer work or create alterations to real conventions. This kind of change is just not acceptable if we can avoid it. I'm really trying to understand why others do not feel so strongly about this, but I'm not persuaded by what I've heard so far. Please note that I'm not trying to assign blame. I recognize the part that my failings and inadequacies have played in this (I continue to be willing to listen to others assessments of those inadequacies and failings and do my best to learn from them). I'm just trying to create a different context for future discussions about these sorts of things. I love the changes that add features and capability for our users. I have not called for ripping these particular changes out even though I would be much, much happier if there were a single convention for polynomial-coefficient order in NumPy. In fact, most of my messages have included references to how to incorporate such changes with as little impact as possible -- finding some way to reconcile things, perhaps by focusing attention away from such things (adding a keyword to the poly1d class, perhaps, to allow it to be called in reverse order). 
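Concretely, I mean something along these lines -- just a sketch, the helper name and the keyword are invented, nothing like it exists in NumPy today:

import numpy as np

def as_poly1d(coef, order='desc'):
    # hypothetical helper: accept coefficients in either order and
    # always hand back a poly1d, which stores highest degree first.
    # order='desc' means highest degree first (poly1d / Matlab style),
    # order='asc' means constant term first (np.polynomial style).
    coef = np.atleast_1d(np.asarray(coef))
    if order == 'asc':
        coef = coef[::-1]
    elif order != 'desc':
        raise ValueError("order must be 'asc' or 'desc'")
    return np.poly1d(coef)

# the same polynomial, x**2 + 2*x + 3, spelled both ways
assert as_poly1d([1, 2, 3])(10) == as_poly1d([3, 2, 1], order='asc')(10) == 123

Existing callers would not notice anything, and people coming from the other convention would never have to remember which way to flip.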
Best, -Travis From travis at continuum.io Mon Jun 25 21:50:36 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 20:50:36 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Jun 25, 2012, at 7:53 PM, josef.pktd at gmail.com wrote: > On Mon, Jun 25, 2012 at 8:25 PM, wrote: >> On Mon, Jun 25, 2012 at 8:10 PM, Travis Oliphant wrote: >>> You are still missing the point that there was already a choice that was >>> made in the previous class --- made in Numeric actually. >>> >>> You made a change to that. It is the change that is 'gratuitous'. The pain >>> and unnecessary overhead of having two competing standards is the problem >>> --- not whether one is 'right' or not. That is a different discussion >>> entirely. >> >> I remember there was a discussion about the order of the coefficients >> on the mailing list and all in favor of the new order, IIRC. I cannot >> find the thread. I know I was. >> >> At least I'm switching pretty much to the new polynomial classes, and >> don't really care about the inherited choice before that any more. >> >> So, I'm pretty much in favor of updating, if new choices are more >> convenient and more familiar to new users. > > just to add a bit more information, given the existence of both poly's > > nobody had to rewrite flipping order in scipy.signal.residuez > b, a = map(asarray, (b, a)) > gain = a[0] > brev, arev = b[::-1], a[::-1] > krev, brev = polydiv(brev, arev) > if krev == []: > k = [] > else: > k = krev[::-1] > b = brev[::-1] > > while my arma_process class can start at the same time with > def __init__(self, ar, ma, nobs=None): > self.ar = np.asarray(ar) > self.ma = np.asarray(ma) > self.arpoly = np.polynomial.Polynomial(self.ar) > self.mapoly = np.polynomial.Polynomial(self.ma) That's a nice argument for a different convention, really it is. It's not enough for changing a convention that already exists. Now, the polynomial object could store coefficients in this order, but allow construction with the coefficients in the standard convention order. That would have been a fine compromise from my perspective. > > As a downstream user of numpy and observer of the mailing list for a > few years, I think the gradual improvements have gone down pretty > well. At least I haven't seen any mayor complaints on the mailing > list. You are an *active* user of NumPy. Your perspective is valuable, but it is one of many perspectives in the user community. What is missing in this discussion is the 100's of thousands of users of NumPy who never comment on this mailing list and won't. There are many that have not moved from 1.5.1 yet. I hope your optimism is correct about how difficult it will be to upgrade for them. As long as I hold any influence at all on the NumPy project, I will argue and fight on behalf of those users to the best that I can understand their perspective. > > For me, the big problem was numpy 1.4.0 where several packages where > not available because of binary compatibility, NaN's didn't concern me > much, current incomplete transition to new MinGW and gcc is currently > a bit of a problem. It is *much*, *much* easier to create binaries of downstream packages than to re-write APIs. 
I still think we would be better off to remove the promise of ABI compatibility in every .X release (perhaps we hold ABI compatibility for 2 releases). However, we should preserve API compatibility for every release. > > Purely as an observer, my impression was also that the internal numpy > c source cleanup, started by David C., I guess, didn't cause any big > problems that would have created lots of complaints on the numpy > mailing list. David C spent a lot of time ensuring his changes did not alter the compiling experience or the run-time experience of users of NumPy. This was greatly appreciated. Lack of complaints on the mailing list is not the metric we should be using. Most users will never comment on this list --- especially given how hard we've made it for people to feel like they will be listened to. We have to think about the implications of our changes on existing users. -Travis > > Josef > >> >> Josef >> >>> >>> -- >>> Travis Oliphant >>> (on a mobile) >>> 512-826-7480 >>> >>> >>> On Jun 25, 2012, at 7:01 PM, Charles R Harris >>> wrote: >>> >>> >>> >>> On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield wrote: >>>> >>>> >>>> On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote: >>>> >>>>> >>>>> >>>>> On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield >>>>> wrote: >>>>> >>>>> It's hard to generalize that much here. There are some areas in what >>>>> you say is true, particularly if whole industries rely on libraries >>>>> that have much time involved in developing them, and for which it is >>>>> particularly difficult to break away. But there are plenty of other >>>>> areas where it isn't that hard. >>>>> >>>>> I'd characterize the process a bit differently. I would agree that it >>>>> is pretty hard to get someone who has been using matlab or IDL for >>>>> many years to transition. That doesn't happen very often (if it does, >>>>> it's because all the other people they work with are using a different >>>>> tool and they are forced to). I think we are targeting the younger >>>>> people; those that do not have a lot of experience tied up in matlab >>>>> or IDL. For example, IDL is very well established in astronomy, and >>>>> we've seen few make that switch if they already have been using IDL >>>>> for a while. But we are seeing many more younger astronomers choose >>>>> Python over IDL these days. >>>>> >>>>> I didn't bring up the Astronomy experience, but I think that is a >>>>> special case because it is a fairly small area and to some extent >>>>> you had the advantage of a supported center, STSci, maintaining some >>>>> software. There are also a lot of amateurs who can appreciate the >>>>> low costs and simplicity of Python. >>>>> >>>>> The software engineers use tends to be set early, in college or in >>>>> their first jobs. I suspect that these days professional astronomers >>>>> spend a number of years in graduate school where they have time to >>>>> experiment a bit. That is a nice luxury to have. >>>>> >>>> Sure. But it's not unusual for an invasive technology (that's us) to >>>> take root in certain niches before spreading more widely. >>>> >>>> Another way of looking at such things is: is what we are seeking to >>>> replace that much worse? If the gains are marginal, then it is very >>>> hard to displace. But if there are significant advantages, eventually >>>> they will win through. I tend to think Python and the scientific stack >>>> does offer the potential for great advantages over IDL or matlab. But >>>> that doesn't make it easy. 
>>> >>> >>> I didn't say we couldn't make inroads. The original proposition was that we >>> needed a polynomial class compatible with Matlab. I didn't think >>> compatibility with Matlab mattered so much in that case because not many >>> people switch, as you have agreed is the case, and those who start fresh, or >>> are the adventurous sort, can adapt without a problem. In other words, IMHO, >>> it wasn't a pressing issue and could be decided on the merits of the >>> interface, which I thought of in terms of series approximation. In >>> particular, it wasn't a 'gratuitous' choice as I had good reasons to do >>> things the way I did. >>> >>> Chuck >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Mon Jun 25 22:37:23 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 Jun 2012 22:37:23 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 9:50 PM, Travis Oliphant wrote: > > On Jun 25, 2012, at 7:53 PM, josef.pktd at gmail.com wrote: > >> On Mon, Jun 25, 2012 at 8:25 PM, ? wrote: >>> On Mon, Jun 25, 2012 at 8:10 PM, Travis Oliphant wrote: >>>> You are still missing the point that there was already a choice that was >>>> made in the previous class --- made in Numeric actually. >>>> >>>> You made a change to that. ?It is the change that is 'gratuitous'. ?The pain >>>> and unnecessary overhead of having two competing standards is the problem >>>> --- not whether one is 'right' or not. ?That is a different discussion >>>> entirely. >>> >>> I remember there was a discussion about the order of the coefficients >>> on the mailing list and all in favor of the new order, IIRC. I cannot >>> find the thread. I know I was. >>> >>> At least I'm switching pretty much to the new polynomial classes, and >>> don't really care about the inherited choice before that any more. >>> >>> So, I'm pretty much in favor of updating, if new choices are more >>> convenient and more familiar to new users. >> >> just to add a bit more information, given the existence of both poly's >> >> nobody had to rewrite ?flipping order in scipy.signal.residuez >> ? ?b, a = map(asarray, (b, a)) >> ? ?gain = a[0] >> ? ?brev, arev = b[::-1], a[::-1] >> ? ?krev, brev = polydiv(brev, arev) >> ? ?if krev == []: >> ? ? ? ?k = [] >> ? ?else: >> ? ? ? ?k = krev[::-1] >> ? ?b = brev[::-1] >> >> while my arma_process class can start at the same time with >> ? ?def __init__(self, ar, ma, nobs=None): >> ? ? ? ?self.ar = np.asarray(ar) >> ? ? ? ?self.ma = np.asarray(ma) >> ? ? ? ?self.arpoly = np.polynomial.Polynomial(self.ar) >> ? ? ? ?self.mapoly = np.polynomial.Polynomial(self.ma) > > That's a nice argument for a different convention, really it is. ? It's not enough for changing a convention that already exists. 
?Now, the polynomial object could store coefficients in this order, but allow construction with the coefficients in the standard convention order. ?That would have been a fine compromise from my perspective. I'm much happier with the current solution. As long as I stick with the np.polynomial classes, I don't have to *think* about coefficient order. With a hybrid I would always have to worry about whether this animal is facing front or back. I wouldn't mind if the old order is eventually deprecated and dropped. (Another example: NIST polynomial follow the new order, 2nd section http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html no [::-1] in the second version.) > >> >> As a downstream user of numpy and observer of the mailing list for a >> few years, I think the gradual improvements have gone down pretty >> well. At least I haven't seen any mayor complaints on the mailing >> list. > > You are an *active* user of NumPy. ? ?Your perspective is valuable, but it is one of many perspectives in the user community. ?What is missing in this discussion is the 100's of thousands of users of NumPy who never comment on this mailing list and won't. ?There are many that have not moved from 1.5.1 yet. ? ?I hope your optimism is correct about how difficult it will be to upgrade for them. ? ?As long as I hold any influence at all on the NumPy project, I will argue and fight on behalf of those users to the best that I can understand their perspective. oops, my working version >>> np.__version__ '1.5.1' I'm testing and maintaining statsmodels compatibility from numpy 1.4.1 and scipy 0.7.2 to the current released versions (with a compat directory). statsmodels dropped numpy 1.3 support, because I didn't want to give up using numpy.polynomial. Most of the 100,000s of numpy users that never show up on the mailing list won't worry much about most changes, because package managers and binary builders and developers of application packages take care of most of it. When I use matplotlib, I don't care whether it uses masked arrays, or other array types internally (and rely on Benjamin and others to represent matplotlib usage/users). Wes is recommending users to use the pandas API to insulate them from changes in numpy's datetimes. > >> >> For me, the big problem was numpy 1.4.0 where several packages where >> not available because of binary compatibility, NaN's didn't concern me >> much, current incomplete transition to new MinGW and gcc is currently >> a bit of a problem. > > It is *much*, *much* easier to create binaries of downstream packages than to re-write APIs. ? ?I still think we would be better off to remove the promise of ABI compatibility in every .X release (perhaps we hold ABI compatibility for 2 releases). ? However, we should preserve API compatibility for every release. freeze the API wherever it got by "historical accident"? > >> >> Purely as an observer, my impression was also that the internal numpy >> c source cleanup, started by David C., I guess, didn't cause any big >> problems that would have created lots of complaints on the numpy >> mailing list. > > David C spent a lot of time ensuring his changes did not alter the compiling experience or the run-time experience of users of NumPy. ? ?This was greatly appreciated. ? Lack of complaints on the mailing list is not the metric we should be using. ? Most users will never comment on this list --- especially given how hard we've made it for people to feel like they will be listened to. 
I think for some things, questions and complaints on the mailing list or stackoverflow is a very good metric. My reason to appreciate David's work, is reflected in that the number of installation issues on Windows has disappeared from the mailing list. I just easy_installed numpy into a virtualenv without any problems at all (it just worked), which was the last issue on Windows that I know of (last seen on stackoverflow). easy_installing scipy into a virtualenv almost worked (needed some help). > > We have to think about the implications of our changes on existing users. Yes, Josef > > -Travis > > > > > >> >> Josef >> >>> >>> Josef >>> >>>> >>>> -- >>>> Travis Oliphant >>>> (on a mobile) >>>> 512-826-7480 >>>> >>>> >>>> On Jun 25, 2012, at 7:01 PM, Charles R Harris >>>> wrote: >>>> >>>> >>>> >>>> On Mon, Jun 25, 2012 at 4:21 PM, Perry Greenfield wrote: >>>>> >>>>> >>>>> On Jun 25, 2012, at 3:25 PM, Charles R Harris wrote: >>>>> >>>>>> >>>>>> >>>>>> On Mon, Jun 25, 2012 at 11:56 AM, Perry Greenfield >>>>>> wrote: >>>>>> >>>>>> It's hard to generalize that much here. There are some areas in what >>>>>> you say is true, particularly if whole industries rely on libraries >>>>>> that have much time involved in developing them, and for which it is >>>>>> particularly difficult to break away. But there are plenty of other >>>>>> areas where it isn't that hard. >>>>>> >>>>>> I'd characterize the process a bit differently. I would agree that it >>>>>> is pretty hard to get someone who has been using matlab or IDL for >>>>>> many years to transition. That doesn't happen very often (if it does, >>>>>> it's because all the other people they work with are using a different >>>>>> tool and they are forced to). I think we are targeting the younger >>>>>> people; those that do not have a lot of experience tied up in matlab >>>>>> or IDL. For example, IDL is very well established in astronomy, and >>>>>> we've seen few make that switch if they already have been using IDL >>>>>> for a while. But we are seeing many more younger astronomers choose >>>>>> Python over IDL these days. >>>>>> >>>>>> I didn't bring up the Astronomy experience, but I think that is a >>>>>> special case because it is a fairly small area and to some extent >>>>>> you had the advantage of a supported center, STSci, maintaining some >>>>>> software. There are also a lot of amateurs who can appreciate the >>>>>> low costs and simplicity of Python. >>>>>> >>>>>> The software engineers use tends to be set early, in college or in >>>>>> their first jobs. I suspect that these days professional astronomers >>>>>> spend a number of years in graduate school where they have time to >>>>>> experiment a bit. That is a nice luxury to have. >>>>>> >>>>> Sure. But it's not unusual for an invasive technology (that's us) to >>>>> take root in certain niches before spreading more widely. >>>>> >>>>> Another way of looking at such things is: is what we are seeking to >>>>> replace that much worse? If the gains are marginal, then it is very >>>>> hard to displace. But if there are significant advantages, eventually >>>>> they will win through. I tend to think Python and the scientific stack >>>>> does offer the potential for great advantages over IDL or matlab. But >>>>> that doesn't make it easy. >>>> >>>> >>>> I didn't say we couldn't make inroads. The original proposition was that we >>>> needed a polynomial class compatible with Matlab. 
I didn't think >>>> compatibility with Matlab mattered so much in that case because not many >>>> people switch, as you have agreed is the case, and those who start fresh, or >>>> are the adventurous sort, can adapt without a problem. In other words, IMHO, >>>> it wasn't a pressing issue and could be decided on the merits of the >>>> interface, which I thought of in terms of series approximation. ?In >>>> particular, it wasn't a 'gratuitous' choice as I had good reasons to do >>>> things the way I did. >>>> >>>> Chuck >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Mon Jun 25 22:38:24 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 25 Jun 2012 19:38:24 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant wrote: > > On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote: > > For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. ? ?That's why I used the word. ? I am not trying to be derogatory. ? I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes. > For reference, here's the (long) thread where this came to be: http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html It's worth noting that at the time, the discussion was for an addition to *scipy*, not to numpy. I don't know when things were moved over to numpy. > Working on the NumPy code base implies respecting the conventions that are already in place --- not just disregarding them and doing whatever we want. ? ? I'm not really sure why I have to argue the existing users point of view so much recently. ? ?I would hope that all of us would have the perspective that the people who have adopted NumPy deserve to be treated with respect. ? ?The changes that grate on me are the ones that seem to take lightly existing users of NumPy. > I certainly appreciate the need to not break user habits/code, as we struggle with the very same issue in IPython all the time. And obviously at this point numpy is 'core infrastructure' enough that breaking backwards compatibility in any way should be very strongly discouraged (things were probably a bit different back in 2009). 
>> I know that this particular issue grates you quite a bit, but I urge >> you to be fair in your appreciation of how it came to be: through the >> work of well-intentioned and thoughtful (but not omniscient) people >> when you weren't participating actively in numpy development. > > I'm trying very hard to be fair --- especially to changes like this. ?What grates me are changes that affect our user base in a negative way --- specifically by causing code that used to work to no longer work or create alterations to real conventions. ?This kind of change is just not acceptable if we can avoid it. ? I'm really trying to understand why others do not feel so strongly about this, but I'm not persuaded by what I've heard so far. I just want to note that I'm not advocating for *any* backwards-compatibility breakage in numpy at this point... I was just providing context for a discussion that happened back in 2009, and in the scipy list. I certainly feel pretty strongly at this point about the importance of preserving working code *today*, given the role of numpy at the 'root node' of the scipy ecosystem tree and the size of said tree. Best, f From travis at continuum.io Mon Jun 25 23:04:02 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 22:04:02 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <2E286066-C6EF-4E34-AC77-B58863362AEC@continuum.io> On Jun 25, 2012, at 9:38 PM, Fernando Perez wrote: > On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant wrote: >> >> On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote: > >> >> For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. That's why I used the word. I am not trying to be derogatory. I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes. >> > > For reference, here's the (long) thread where this came to be: > > http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html > > It's worth noting that at the time, the discussion was for an addition > to *scipy*, not to numpy. I don't know when things were moved over to > numpy. > Yes, it's also worth noting the discussion took place on the SciPy list. The fact that NumPy decisions were made on the SciPy mailing list is not a pattern we should repeat. While the two communities have overlap, they are not the same. It is important to remind ourselves of this (especially those of us who feel at home in both). From that thread, I wish that ideas of Anne and David had been listened to instead of just dismissed out of hand, like was done. Anne suggested putting the polynomial class in SciPy (where there would have been less consternation about the coefficient order change --- although many seem to really want to ingore the entire Controls and LTI-system communities where the other convention is common). David suggested allowing both orders to be specified. That is still a good idea in my view. Thanks for doing the research to bring the thread up again. 
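To be concrete about the convention already in place: it is the one the rest of the toolchain speaks. A couple of toy calls (the values shown are what NumPy returns):

>>> import numpy as np
>>> np.polyval([1, -3, 2], 2)      # x**2 - 3*x + 2 evaluated at x = 2
0
>>> sorted(np.roots([1, -3, 2]))   # roots of the same coefficient list
[1.0, 2.0]

np.polyfit hands back its coefficients in the same highest-degree-first order, and scipy.signal's (b, a) transfer-function coefficients are documented in the same descending-powers order -- which is exactly what the Controls and LTI folks expect.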
> > I just want to note that I'm not advocating for *any* > backwards-compatibility breakage in numpy at this point... I was just > providing context for a discussion that happened back in 2009, and in > the scipy list. I certainly feel pretty strongly at this point about > the importance of preserving working code *today*, given the role of > numpy at the 'root node' of the scipy ecosystem tree and the size of > said tree. Thank you for re-iterating that position. The polynomial order question is moot at this point. It's not going to change. We just need to also keep maintaining poly1d's interface. -Travis > > Best, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ondrej.certik at gmail.com Mon Jun 25 23:10:14 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 25 Jun 2012 20:10:14 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 7:38 PM, Fernando Perez wrote: > On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant wrote: >> >> On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote: > >> >> For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. ? ?That's why I used the word. ? I am not trying to be derogatory. ? I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes. >> > > For reference, here's the (long) thread where this came to be: > > http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html > > It's worth noting that at the time, the discussion was for an addition > to *scipy*, not to numpy. ?I don't know when things were moved over to > numpy. > > >> Working on the NumPy code base implies respecting the conventions that are already in place --- not just disregarding them and doing whatever we want. ? ? I'm not really sure why I have to argue the existing users point of view so much recently. ? ?I would hope that all of us would have the perspective that the people who have adopted NumPy deserve to be treated with respect. ? ?The changes that grate on me are the ones that seem to take lightly existing users of NumPy. >> > > I certainly appreciate the need to not break user habits/code, as we > struggle with the very same issue in IPython all the time. ?And > obviously at this point numpy is 'core infrastructure' enough that > breaking backwards compatibility in any way should be very strongly > discouraged (things were probably a bit different back in 2009). > >>> I know that this particular issue grates you quite a bit, but I urge >>> you to be fair in your appreciation of how it came to be: through the >>> work of well-intentioned and thoughtful (but not omniscient) people >>> when you weren't participating actively in numpy development. >> >> I'm trying very hard to be fair --- especially to changes like this. 
?What grates me are changes that affect our user base in a negative way --- specifically by causing code that used to work to no longer work or create alterations to real conventions. ?This kind of change is just not acceptable if we can avoid it. ? I'm really trying to understand why others do not feel so strongly about this, but I'm not persuaded by what I've heard so far. > > I just want to note that I'm not advocating for *any* > backwards-compatibility breakage in numpy at this point... I was just > providing context for a discussion that happened back in 2009, and in > the scipy list. ?I certainly feel pretty strongly at this point about > the importance of preserving working code *today*, given the role of > numpy at the 'root node' of the scipy ecosystem tree and the size of > said tree. I think that everybody strongly agrees that backward incompatible changes should not be made. Sometimes it can be more subtle, see for example this numpy bug report in Debian: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589835 and read the dozens of emails that it generated, e.g. http://lists.debian.org/debian-python/2010/07/msg00048.html, and so on. I've been hit by this problem too, that's why I remember it -- suddenly many packages that depend on NumPy stopped working in a subtle way and I had to spent hours figuring out what went wrong and that the problem is not in h5py, but actually that NumPy has changed its ABI, or more precisely the problem is described here (some new members were added to a C datastructure): http://lists.debian.org/debian-python/2010/07/msg00045.html I am sure that this ABI change had to be done and there were good reasons for it and this particular change probably even couldn't have been avoided. But nevertheless it has caused headaches to a lot of people downstream. I just looked into the release notes for NumPy 1.4.0 and didn't find this change nor how to fix it in there. I am just posting this as a particular, concrete, real life example of consequences for the end users. My understanding is that Travis is simply trying to stress "We have to think about the implications of our changes on existing users." and also that little changes (with the best intentions!) that however mean either a breakage or confusion for users (due to historical reasons) should be avoided if possible. And I very strongly feel the same way. And I think that most people on this list do as well. But sometimes I guess mistakes are made anyway. What can be done to avoid similar issues like with the polynomial order in the future? Ondrej From travis at continuum.io Mon Jun 25 23:13:33 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 22:13:33 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <204BE916-1E3C-4E44-A15B-7A31013555AE@continuum.io> >> >> That's a nice argument for a different convention, really it is. It's not enough for changing a convention that already exists. Now, the polynomial object could store coefficients in this order, but allow construction with the coefficients in the standard convention order. That would have been a fine compromise from my perspective. > > I'm much happier with the current solution. As long as I stick with > the np.polynomial classes, I don't have to *think* about coefficient > order. 
With a hybrid I would always have to worry about whether this > animal is facing front or back. I don't think you would have to worry about it at all. It would just be an interface you personally wouldn't ever call. In other words, you just provide the option for someone else to specify their *input* arrays in reverse order. You could keep them stored in this "natural order" just as they are now. > I wouldn't mind if the old order is eventually deprecated and dropped. > > (Another example: NIST polynomial follow the new order, 2nd section > http://jpktd.blogspot.ca/2012/03/numerical-accuracy-in-linear-least.html > no [::-1] in the second version.) Thanks for providing the additional references. I do recognize that the convention is in use elsewhere. >> >> It is *much*, *much* easier to create binaries of downstream packages than to re-write APIs. I still think we would be better off to remove the promise of ABI compatibility in every .X release (perhaps we hold ABI compatibility for 2 releases). However, we should preserve API compatibility for every release. > > freeze the API wherever it got by "historical accident"? Not quite. You can add new and different APIs. You just can't change old ones. You also have to be careful about changes that break the implied but not specified code contract of current users. Even the strategy of "deprecating APIs needs to be used very judiciously and occasionally. We can deprecate APIs but can't remove them for several releases --- say 4 or 5. You are correct, I'm concerned about users that have built additional packages on top of NumPy. Some of these we know about, many of them we don't know about --- as they are in internal systems. Many users are shielded from NumPy changes by other APIs, this is an avenue of exploration that can and will continue. We aren't there yet, though, and I don't think the "plans for NumPy change" have previously considered enough the impact on users of NumPy. Thank for your voicing your comments and perspective. -Travis From josef.pktd at gmail.com Mon Jun 25 23:22:26 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 Jun 2012 23:22:26 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 11:10 PM, Ond?ej ?ert?k wrote: > On Mon, Jun 25, 2012 at 7:38 PM, Fernando Perez wrote: >> On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant wrote: >>> >>> On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote: >> >>> >>> For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. ? ?That's why I used the word. ? I am not trying to be derogatory. ? I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes. >>> >> >> For reference, here's the (long) thread where this came to be: >> >> http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html >> >> It's worth noting that at the time, the discussion was for an addition >> to *scipy*, not to numpy. ?I don't know when things were moved over to >> numpy. 
>> >> >>> Working on the NumPy code base implies respecting the conventions that are already in place --- not just disregarding them and doing whatever we want. ? ? I'm not really sure why I have to argue the existing users point of view so much recently. ? ?I would hope that all of us would have the perspective that the people who have adopted NumPy deserve to be treated with respect. ? ?The changes that grate on me are the ones that seem to take lightly existing users of NumPy. >>> >> >> I certainly appreciate the need to not break user habits/code, as we >> struggle with the very same issue in IPython all the time. ?And >> obviously at this point numpy is 'core infrastructure' enough that >> breaking backwards compatibility in any way should be very strongly >> discouraged (things were probably a bit different back in 2009). >> >>>> I know that this particular issue grates you quite a bit, but I urge >>>> you to be fair in your appreciation of how it came to be: through the >>>> work of well-intentioned and thoughtful (but not omniscient) people >>>> when you weren't participating actively in numpy development. >>> >>> I'm trying very hard to be fair --- especially to changes like this. ?What grates me are changes that affect our user base in a negative way --- specifically by causing code that used to work to no longer work or create alterations to real conventions. ?This kind of change is just not acceptable if we can avoid it. ? I'm really trying to understand why others do not feel so strongly about this, but I'm not persuaded by what I've heard so far. >> >> I just want to note that I'm not advocating for *any* >> backwards-compatibility breakage in numpy at this point... I was just >> providing context for a discussion that happened back in 2009, and in >> the scipy list. ?I certainly feel pretty strongly at this point about >> the importance of preserving working code *today*, given the role of >> numpy at the 'root node' of the scipy ecosystem tree and the size of >> said tree. > > I think that everybody strongly agrees that backward incompatible > changes should not be made. > > Sometimes it can be more subtle, > see for example this numpy bug report in Debian: > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589835 > > and read the dozens of emails that it generated, e.g. > http://lists.debian.org/debian-python/2010/07/msg00048.html, and so > on. I've been hit by this problem too, that's why I remember it -- > suddenly many packages that depend on NumPy stopped working in a > subtle way and I had to spent hours figuring out what went wrong and > that the problem is not in h5py, but actually that NumPy has changed > its ABI, or more precisely the problem is described here (some new > members were added to a C datastructure): > http://lists.debian.org/debian-python/2010/07/msg00045.html > I am sure that this ABI change had to be done and there were good > reasons for it and this particular change probably even couldn't have > been avoided. But nevertheless it has caused headaches to a lot of > people downstream. I just looked into the release notes for NumPy > 1.4.0 and didn't find this change nor how to fix it in there. I am > just posting this as a particular, concrete, real life example of > consequences for the end users. > > My understanding is that Travis is simply trying to stress "We have to > think about the implications of our changes on existing users." and > also that little changes (with the best intentions!) 
that however mean > either a breakage or confusion for users (due to historical reasons) > should be avoided if possible. And I very strongly feel the same way. > And I think that most people on this list do as well. That's not the case that Travi's has in mind. This was an ABI break, not API break. It took quite some time and an 1.4.1 to recover from it. Although there were some indication of the ABI break before the 1.4.0 release, it was only found out after the release (as byproduct of datetime). Many packages on windows were never available for 1.4.0 because not many package developers wanted to recompile for 1.4.0, (like h5py) Josef > > But sometimes I guess mistakes are made anyway. What can be done to > avoid similar issues like with the polynomial order in the future? > > Ondrej > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Mon Jun 25 23:33:46 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 22:33:46 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <895DAD05-5B24-4EBC-84F4-10442A4531B3@continuum.io> >> >> I just want to note that I'm not advocating for *any* >> backwards-compatibility breakage in numpy at this point... I was just >> providing context for a discussion that happened back in 2009, and in >> the scipy list. I certainly feel pretty strongly at this point about >> the importance of preserving working code *today*, given the role of >> numpy at the 'root node' of the scipy ecosystem tree and the size of >> said tree. > > I think that everybody strongly agrees that backward incompatible > changes should not be made. > > Sometimes it can be more subtle, > see for example this numpy bug report in Debian: There are a lot of subtleties and different users and different expectations. It does make it difficult to know the best course of action. I appreciate the perspective of as many people as possible --- especially those who have managed code bases with a large number of users. What should have happened in this case, in my mind, is that NumPy 1.4.0 should have been 1.5.0 and advertised that there was a break in the ABI and that all extensions would have to be re-built against the new version. This would have been some pain for one class of users (primarily package maintainers) and no pain for another class. There was no API breakage. We just needed to communicate clearly. Because we guessed wrongly that the changes made did not change the ABI, we did not communicate clearly during the release. This was a mistake. I was a large part of that mistake. I also understand the impact that the unsolved packaging problem in the Python community has created (at least for non-academic users and HPC users). Some take this example as you can't change the ABI. That's not quite my perspective for what it's worth. I don't think you should have a habit of changing the ABI (because it does create some hassle for downstream users), but especially today when there are many pre-packaged distributions of Python, occassional changes that require a re-compile of downstream dependencies does not constitute the kind of breakage I'm talking about. 
The kind of breakage I'm talking about is the kind that causes code that used to work to stop working (either because it won't compile against the new headers) or because the behavior of operations changes in subtle ways. Both kinds of changes have happened between 1.5.x and 1.7.x. Some believe these changes are inconsequential. I hope they are right. I don't believe we have enough data to make that call, and there is some evidence I am aware of from people in other organizations that their are changes that will make upgrading difficult for people --- much more difficult than an ABI breakage would have been. You can change things. You just have to be cautious and more careful. It's definitely more painful. Changes that will require *any* work by a user of NumPy other than a re-compile of their code should only be on major version numbers, preferably have a backward-compatible header to use, and a document that describes all the changes that must be made to move the code forward. I'm not trying to throw stones. My glass house and my own sins would not justify such behavior. I apologize if it has come off that way at any time. Best, -Travis From cournape at gmail.com Mon Jun 25 23:35:30 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Jun 2012 04:35:30 +0100 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: > > My understanding is that Travis is simply trying to stress "We have to > think about the implications of our changes on existing users." and > also that little changes (with the best intentions!) that however mean > either a breakage or confusion for users (due to historical reasons) > should be avoided if possible. And I very strongly feel the same way. > And I think that most people on this list do as well. I think Travis is more concerned about API than ABI changes (in that example for 1.4, the ABI breakage was caused by a change that was pushed by Travis IIRC). The relative importance of API vs ABI is a tough one: I think ABI breakage is as bad as API breakage (but matter in different circumstances), but it is hard to improve the situation around our ABI without changing the API (especially everything around macros and publicly accessible structures). Changing this is politically difficult because nobody will upgrade to a new numpy with a different API just because it is cleaner, but without a cleaner API, it will be difficult to implement quite a few improvements. The situation is not that different form python 3, which has seen a poor adoption, and only starts having interesting feature on its own now. As for more concrete actions: I believe Wes McKinney has a comprehensive suite with multiple versions of numpy/pandas, I can't seem to find where that was mentioned, though. This would be a good starting point to check ABI matters (say pandas, mpl, scipy on top of multiple numpy). 
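A rough sketch of what such a check could look like (entirely hypothetical -- the package list, version numbers and use of pip here are placeholders, not an existing tool): build the downstream packages once against a base numpy, then upgrade numpy alone and see whether their compiled extensions still import.

import subprocess
import sys

BASE = "1.4.1"                                # numpy the extensions get compiled against
CANDIDATES = ["1.5.1", "1.6.2"]               # newer numpys to try at runtime
DOWNSTREAM = ["scipy", "h5py", "matplotlib"]  # packages with compiled extensions

def run(cmd):
    print(" ".join(cmd))
    subprocess.check_call(cmd)

# 1. Pin the base numpy, then build/install the downstream packages against it.
run(["pip", "install", "numpy==" + BASE])
run(["pip", "install"] + DOWNSTREAM)

# 2. Swap in each candidate numpy, leaving the extensions untouched, and
#    check that they still import, i.e. that the ABI has not changed.
for version in CANDIDATES:
    run(["pip", "install", "numpy==" + version])
    for pkg in DOWNSTREAM:
        code = subprocess.call([sys.executable, "-c", "import " + pkg])
        print("numpy %s + %s: %s" % (version, pkg, "ok" if code == 0 else "BROKEN"))
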
David From ondrej.certik at gmail.com Mon Jun 25 23:42:51 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 25 Jun 2012 20:42:51 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 8:35 PM, David Cournapeau wrote: > On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: > >> >> My understanding is that Travis is simply trying to stress "We have to >> think about the implications of our changes on existing users." and >> also that little changes (with the best intentions!) that however mean >> either a breakage or confusion for users (due to historical reasons) >> should be avoided if possible. And I very strongly feel the same way. >> And I think that most people on this list do as well. > > I think Travis is more concerned about API than ABI changes (in that > example for 1.4, the ABI breakage was caused by a change that was > pushed by Travis IIRC). > > The relative importance of API vs ABI is a tough one: I think ABI > breakage is as bad as API breakage (but matter in different > circumstances), but it is hard to improve the situation around our ABI > without changing the API (especially everything around macros and > publicly accessible structures). Changing this is politically > difficult because nobody will upgrade to a new numpy with a different > API just because it is cleaner, but without a cleaner API, it will be > difficult to implement quite a few improvements. The situation is not > that different form python 3, which has seen a poor adoption, and only > starts having interesting feature on its own now. > > As for more concrete actions: I believe Wes McKinney has a > comprehensive suite with multiple versions of numpy/pandas, I can't > seem to find where that was mentioned, though. This would be a good > starting point to check ABI matters (say pandas, mpl, scipy on top of > multiple numpy). I will try to check as many packages as I can to see what actual problems arise. I have created an issue for it: https://github.com/numpy/numpy/issues/319 Feel free to add more packages that you feel are important. I will try to check at least the ones that are in the issue, and more if I have time. I will close the issue once the upgrade path is clearly documented in the release for every thing that breaks. Ondrej From cournape at gmail.com Tue Jun 26 00:12:19 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Jun 2012 05:12:19 +0100 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 4:42 AM, Ond?ej ?ert?k wrote: > On Mon, Jun 25, 2012 at 8:35 PM, David Cournapeau wrote: >> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: >> >>> >>> My understanding is that Travis is simply trying to stress "We have to >>> think about the implications of our changes on existing users." and >>> also that little changes (with the best intentions!) that however mean >>> either a breakage or confusion for users (due to historical reasons) >>> should be avoided if possible. And I very strongly feel the same way. 
>>> And I think that most people on this list do as well. >> >> I think Travis is more concerned about API than ABI changes (in that >> example for 1.4, the ABI breakage was caused by a change that was >> pushed by Travis IIRC). >> >> The relative importance of API vs ABI is a tough one: I think ABI >> breakage is as bad as API breakage (but matter in different >> circumstances), but it is hard to improve the situation around our ABI >> without changing the API (especially everything around macros and >> publicly accessible structures). Changing this is politically >> difficult because nobody will upgrade to a new numpy with a different >> API just because it is cleaner, but without a cleaner API, it will be >> difficult to implement quite a few improvements. The situation is not >> that different form python 3, which has seen a poor adoption, and only >> starts having interesting feature on its own now. >> >> As for more concrete actions: I believe Wes McKinney has a >> comprehensive suite with multiple versions of numpy/pandas, I can't >> seem to find where that was mentioned, though. This would be a good >> starting point to check ABI matters (say pandas, mpl, scipy on top of >> multiple numpy). > > I will try to check as many packages as I can to see what actual problems > arise. I have created an issue for it: > > https://github.com/numpy/numpy/issues/319 > > Feel free to add more packages that you feel are important. I will try to check > at least the ones that are in the issue, and more if I have time. I will > close the issue once the upgrade path is clearly documented in the release > for every thing that breaks. I believe the basis can be 1.4.1 against which we build different packages, and then test each new version. There are also tools to check ABI compatibility (e.g. http://ispras.linuxbase.org/index.php/ABI_compliance_checker), but I have never used them. Being able to tell when a version of numpy breaks ABI would already be a good improvement. David From travis at continuum.io Tue Jun 26 00:17:29 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 23:17:29 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <028B7B6F-5AA7-4862-86A0-B6E90982F006@continuum.io> On Jun 25, 2012, at 10:35 PM, David Cournapeau wrote: > On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: > >> >> My understanding is that Travis is simply trying to stress "We have to >> think about the implications of our changes on existing users." and >> also that little changes (with the best intentions!) that however mean >> either a breakage or confusion for users (due to historical reasons) >> should be avoided if possible. And I very strongly feel the same way. >> And I think that most people on this list do as well. > > I think Travis is more concerned about API than ABI changes (in that > example for 1.4, the ABI breakage was caused by a change that was > pushed by Travis IIRC). In the present climate, I'm going to have to provide additional context to a comment like this. This is not an accurate enough characterization of events. I was trying to get date-time changes in, for sure. I generally like feature additions to NumPy. (Robert Kern was also involved with that effort and it was funded by an active user of NumPy. 
I was concerned that the changes would break the ABI. In fact, I expected them to --- I was not against such changes, even though it was a change in previously discussed policy. We just needed to advertise them widely. Other voices, prevailed, however, and someone else believed the changes would not break ABI compatibility. Unfortunately, I did not have much time to look into the matter as I was working full time on other things. If I had had my way we would have released NumPy 1.5 at the time and widely advertised the ABI breakage (and moved at the same time to a design that would have made it easier to upgrade without breaking the ABI). I do not believe it would have been that big of a deal as long as we communicated correctly about the release. I still don't think it's correct to be overly concerned about ABI breakage in a world where packages can just be re-compiled against the new version in a matter of minutes with one hand and with the other make changes to the code base that change existing code behavior. I think the fact that the latter has occurred is evidence that we have to sacrifice one of them. And ABI compatibility is the preferred one to sacrifice by a long stretch in my view. -Travis From cournape at gmail.com Tue Jun 26 00:43:34 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Jun 2012 05:43:34 +0100 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <028B7B6F-5AA7-4862-86A0-B6E90982F006@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <028B7B6F-5AA7-4862-86A0-B6E90982F006@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 5:17 AM, Travis Oliphant wrote: > > On Jun 25, 2012, at 10:35 PM, David Cournapeau wrote: > >> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: >> >>> >>> My understanding is that Travis is simply trying to stress "We have to >>> think about the implications of our changes on existing users." and >>> also that little changes (with the best intentions!) that however mean >>> either a breakage or confusion for users (due to historical reasons) >>> should be avoided if possible. And I very strongly feel the same way. >>> And I think that most people on this list do as well. >> >> I think Travis is more concerned about API than ABI changes (in that >> example for 1.4, the ABI breakage was caused by a change that was >> pushed by Travis IIRC). > > In the present climate, I'm going to have to provide additional context to a comment like this. ?This is not an accurate enough characterization of events. ? I was trying to get date-time changes in, for sure. ? I generally like feature additions to NumPy. ? (Robert Kern was also involved with that effort and it was funded by an active user of NumPy. ? ?I was concerned that the changes would break the ABI. I did not mean to go back at old history, sorry. My main point was to highlight ABI vs API issues. Numpy needs to decide whether it attempts to keep ABI or not. We already had this discussion 2 years ago (for the issue mentioned by Ondrej), and the decision was not made. The arguments and their value did not really change. The issue is thus that a decision needs to be made over that disagreement in one way or the other. 
David From travis at continuum.io Tue Jun 26 00:48:29 2012 From: travis at continuum.io (Travis Oliphant) Date: Mon, 25 Jun 2012 23:48:29 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <028B7B6F-5AA7-4862-86A0-B6E90982F006@continuum.io> Message-ID: <837E81A4-FD81-4525-A278-3C86858D1689@continuum.io> >> In the present climate, I'm going to have to provide additional context to a comment like this. This is not an accurate enough characterization of events. I was trying to get date-time changes in, for sure. I generally like feature additions to NumPy. (Robert Kern was also involved with that effort and it was funded by an active user of NumPy. I was concerned that the changes would break the ABI. > > I did not mean to go back at old history, sorry. My main point was to > highlight ABI vs API issues. Numpy needs to decide whether it attempts > to keep ABI or not. We already had this discussion 2 years ago (for > the issue mentioned by Ondrej), and the decision was not made. The > arguments and their value did not really change. The issue is thus > that a decision needs to be made over that disagreement in one way or > the other. > Thank you for clarifying and for being willing to look to the future. I agree a decision needs to be made. I think we will need to break the ABI. At this point, I don't know of any pressing features that would require it short of NumPy 2.0. -Travis > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Tue Jun 26 01:09:04 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 25 Jun 2012 22:09:04 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <837E81A4-FD81-4525-A278-3C86858D1689@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <028B7B6F-5AA7-4862-86A0-B6E90982F006@continuum.io> <837E81A4-FD81-4525-A278-3C86858D1689@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 9:48 PM, Travis Oliphant wrote: > I agree a decision needs to be made. ? I think we will need to break the ABI. ? ?At this point, I don't know of any pressing features that would require it short of NumPy 2.0. Sorry, I don't quite know how to parse the above, do you mean: 1. We will need to break ABI in the upcoming 1.7 release or 2. We will need to be more willing to accept ABI breakages in .Y releases (in X.Y convention) Just curious... 
Cheers, f From travis at continuum.io Tue Jun 26 01:20:58 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 00:20:58 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <028B7B6F-5AA7-4862-86A0-B6E90982F006@continuum.io> <837E81A4-FD81-4525-A278-3C86858D1689@continuum.io> Message-ID: On Jun 26, 2012, at 12:09 AM, Fernando Perez wrote: > On Mon, Jun 25, 2012 at 9:48 PM, Travis Oliphant wrote: >> I agree a decision needs to be made. I think we will need to break the ABI. At this point, I don't know of any pressing features that would require it short of NumPy 2.0. > > Sorry, I don't quite know how to parse the above, do you mean: > > 1. We will need to break ABI in the upcoming 1.7 release > > or > > 2. We will need to be more willing to accept ABI breakages in .Y > releases (in X.Y convention) Eventually we will need to break the ABI. We might as well wait until 2.0 at this point. -Travis From fperez.net at gmail.com Tue Jun 26 01:40:28 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 25 Jun 2012 22:40:28 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <028B7B6F-5AA7-4862-86A0-B6E90982F006@continuum.io> <837E81A4-FD81-4525-A278-3C86858D1689@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 10:20 PM, Travis Oliphant wrote: > Eventually we will need to break the ABI. ? We might as well wait until 2.0 at this point. Ah, got it; thanks for the clarification, I just didn't understand the original. Cheers, f From scopatz at gmail.com Tue Jun 26 02:19:07 2012 From: scopatz at gmail.com (Anthony Scopatz) Date: Tue, 26 Jun 2012 01:19:07 -0500 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: This is awesome! On Mon, Jun 25, 2012 at 5:27 AM, klo uo wrote: > I was reading mayavi documentation and one of the examples > (tvtk.ImageData) resembled Numpy logo grid. > I added barchart and tweaked a bit colormap and thought to post it for fun: > > ======================================== > import numpy as np > from tvtk.api import tvtk > from mayavi import mlab > > def view(dataset): > fig = mlab.figure(bgcolor=(1, 1, 1), fgcolor=(0, 0, 0), > figure=dataset.class_name[3:]) > surf = mlab.pipeline.surface(dataset, opacity=0.2) > mlab.pipeline.surface(mlab.pipeline.extract_edges(surf), color=(0, > 0, 0), line_width=.1 ) > mlab.barchart(n, extent=[0.05, 4.5, 0.05, 4.5, -.35, 1]) > > n=([[1,0,0,1], [1,0,1,1], [1,1,0,1], [1,0,0,1]]) > > data = np.random.random((5, 5, 5)) > i = tvtk.ImageData(spacing=(1, 1, 1), origin=(0, 0, 0)) > i.point_data.scalars = data.ravel() > i.point_data.scalars.name = 'scalars' > i.dimensions = data.shape > > view(i) > ======================================== > > > Cheers > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From klonuo at gmail.com Tue Jun 26 02:52:08 2012 From: klonuo at gmail.com (klo uo) Date: Tue, 26 Jun 2012 08:52:08 +0200 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: Heh, thanks :) It's free interpretation made from quick idea then immediately shared. Original logo can be made exact I guess with interlaced planes and shallower bars or similar... On Tue, Jun 26, 2012 at 8:19 AM, Anthony Scopatz wrote: > This is awesome! > From ondrej.certik at gmail.com Tue Jun 26 03:36:12 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 26 Jun 2012 00:36:12 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: Message-ID: On Thu, Jun 21, 2012 at 3:11 AM, Travis Oliphant wrote: > Hey all, > > I made a branch called with_maskna and then merged Nathaniel's PR which removes the mask_na support from master. ?I then applied a patch to fix the boolean indexing problem reported by Ralf. > > I then created a NumPy 1.7.x maintenance branch from which the release of NumPy 1.7 will be made. ? Ondrej Certik and I will be managing the release of NumPy 1.7. ? Ondrej is the author of SymPy and has agreed to help get NumPy 1.7 out the door. ? Thanks, Ondrej for being willing to help in this way. > > In principal only bug-fixes should be pushed to the NumPy 1.7 branch at this point. ? The target is to make a release of NumPy 1.7.x by July 9th. ? The schedule we will work for is: > > RC1 -- June 25 > RC2 -- July ?5 > Release -- July 13 I worked on the release notes: https://github.com/numpy/numpy/pull/318 Please let me know if you think that I forgot some important feature or if you have any suggestions for improvement. If it looks pretty good, then I will start testing NumPy against packages (https://github.com/numpy/numpy/issues/319). Ondrej From d.s.seljebotn at astro.uio.no Tue Jun 26 05:27:24 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 26 Jun 2012 11:27:24 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <4FE9807C.6020504@astro.uio.no> On 06/26/2012 05:35 AM, David Cournapeau wrote: > On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: > >> >> My understanding is that Travis is simply trying to stress "We have to >> think about the implications of our changes on existing users." and >> also that little changes (with the best intentions!) that however mean >> either a breakage or confusion for users (due to historical reasons) >> should be avoided if possible. And I very strongly feel the same way. >> And I think that most people on this list do as well. > > I think Travis is more concerned about API than ABI changes (in that > example for 1.4, the ABI breakage was caused by a change that was > pushed by Travis IIRC). > > The relative importance of API vs ABI is a tough one: I think ABI > breakage is as bad as API breakage (but matter in different > circumstances), but it is hard to improve the situation around our ABI > without changing the API (especially everything around macros and > publicly accessible structures). Changing this is politically But I think it is *possible* to get to a situation where ABI isn't broken without changing API. I have posted such a proposal. 
If one uses the kind of C-level duck typing I describe in the link below, one would do typedef PyObject PyArrayObject; typedef struct { ... } NumPyArray; /* used to be PyArrayObject */ Thus, a ABI-hiding PyArray_SHAPE function could take either a PyArrayObject* or a PyObject*, since they would be the same. http://thread.gmane.org/gmane.comp.python.numeric.general/49997 (The technical parts are a bit out of date; me and Robert Bradshaw are in the 4th iteration of that concept for use within Cython, we are now hovering around perfect-hashing lookup tables that have 1ns branch-miss-free lookups and uses ~20us for construction/initialization). Dag From cournape at gmail.com Tue Jun 26 05:58:59 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Jun 2012 10:58:59 +0100 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <4FE9807C.6020504@astro.uio.no> References: <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4FE9807C.6020504@astro.uio.no> Message-ID: On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn wrote: > On 06/26/2012 05:35 AM, David Cournapeau wrote: >> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k ?wrote: >> >>> >>> My understanding is that Travis is simply trying to stress "We have to >>> think about the implications of our changes on existing users." and >>> also that little changes (with the best intentions!) that however mean >>> either a breakage or confusion for users (due to historical reasons) >>> should be avoided if possible. And I very strongly feel the same way. >>> And I think that most people on this list do as well. >> >> I think Travis is more concerned about API than ABI changes (in that >> example for 1.4, the ABI breakage was caused by a change that was >> pushed by Travis IIRC). >> >> The relative importance of API vs ABI is a tough one: I think ABI >> breakage is as bad as API breakage (but matter in different >> circumstances), but it is hard to improve the situation around our ABI >> without changing the API (especially everything around macros and >> publicly accessible structures). Changing this is politically > > But I think it is *possible* to get to a situation where ABI isn't > broken without changing API. I have posted such a proposal. > If one uses the kind of C-level duck typing I describe in the link > below, one would do > > typedef PyObject PyArrayObject; > > typedef struct { > ? ?... > } NumPyArray; /* used to be PyArrayObject */ Maybe we're just in violent agreement, but whatever ends up being used would require to change the *current* C API, right ? If one wants to allow for changes in our structures more freely, we have to hide them from the headers, which means breaking the code that depends on the structure binary layout. Any code that access those directly will need to be changed. There is the particular issue of iterator, which seem quite difficult to make "ABI-safe" without losing significant performance. 
cheers, David From d.s.seljebotn at astro.uio.no Tue Jun 26 06:41:54 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 26 Jun 2012 12:41:54 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4FE9807C.6020504@astro.uio.no> Message-ID: <4FE991F2.6000202@astro.uio.no> On 06/26/2012 11:58 AM, David Cournapeau wrote: > On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn > wrote: >> On 06/26/2012 05:35 AM, David Cournapeau wrote: >>> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: >>> >>>> >>>> My understanding is that Travis is simply trying to stress "We have to >>>> think about the implications of our changes on existing users." and >>>> also that little changes (with the best intentions!) that however mean >>>> either a breakage or confusion for users (due to historical reasons) >>>> should be avoided if possible. And I very strongly feel the same way. >>>> And I think that most people on this list do as well. >>> >>> I think Travis is more concerned about API than ABI changes (in that >>> example for 1.4, the ABI breakage was caused by a change that was >>> pushed by Travis IIRC). >>> >>> The relative importance of API vs ABI is a tough one: I think ABI >>> breakage is as bad as API breakage (but matter in different >>> circumstances), but it is hard to improve the situation around our ABI >>> without changing the API (especially everything around macros and >>> publicly accessible structures). Changing this is politically >> >> But I think it is *possible* to get to a situation where ABI isn't >> broken without changing API. I have posted such a proposal. >> If one uses the kind of C-level duck typing I describe in the link >> below, one would do >> >> typedef PyObject PyArrayObject; >> >> typedef struct { >> ... >> } NumPyArray; /* used to be PyArrayObject */ > > Maybe we're just in violent agreement, but whatever ends up being used > would require to change the *current* C API, right ? If one wants to Accessing arr->dims[i] directly would need to change. But that's been discouraged for a long time. By "API" I meant access through the macros. One of the changes under discussion here is to change PyArray_SHAPE from a macro that accepts both PyObject* and PyArrayObject* to a function that only accepts PyArrayObject* (hence breakage). I'm saying that under my proposal, assuming I or somebody else can find the time to implement it under, you can both make it a function and have it accept both PyObject* and PyArrayObject* (since they are the same), undoing the breakage but allowing to hide the ABI. (It doesn't give you full flexibility in ABI, it does require that you somewhere have an "npy_intp dims[nd]" with the same lifetime as your object, etc., but I don't consider that a big disadvantage). > allow for changes in our structures more freely, we have to hide them > from the headers, which means breaking the code that depends on the > structure binary layout. Any code that access those directly will need > to be changed. > > There is the particular issue of iterator, which seem quite difficult > to make "ABI-safe" without losing significant performance. I don't agree (for some meanings of "ABI-safe"). You can export the data (dataptr/shape/strides) through the ABI, then the iterator uses these in whatever way it wishes consumer-side. 
Sort of like PEP 3118 without the performance degradation. The only sane way IMO of doing iteration is building it into the consumer anyway. I didn't think about whether API breakage would be needed for iterators though, that may be the case, I just didn't look at it yet. Dag From cournape at gmail.com Tue Jun 26 07:48:30 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Jun 2012 12:48:30 +0100 Subject: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch) Message-ID: Hi, I am just continuing the discussion around ABI/API, the technical side of things that is, as this is unrelated to 1.7.x. release. On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn wrote: > On 06/26/2012 11:58 AM, David Cournapeau wrote: >> On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn >> ?wrote: >>> On 06/26/2012 05:35 AM, David Cournapeau wrote: >>>> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k ? ?wrote: >>>> >>>>> >>>>> My understanding is that Travis is simply trying to stress "We have to >>>>> think about the implications of our changes on existing users." and >>>>> also that little changes (with the best intentions!) that however mean >>>>> either a breakage or confusion for users (due to historical reasons) >>>>> should be avoided if possible. And I very strongly feel the same way. >>>>> And I think that most people on this list do as well. >>>> >>>> I think Travis is more concerned about API than ABI changes (in that >>>> example for 1.4, the ABI breakage was caused by a change that was >>>> pushed by Travis IIRC). >>>> >>>> The relative importance of API vs ABI is a tough one: I think ABI >>>> breakage is as bad as API breakage (but matter in different >>>> circumstances), but it is hard to improve the situation around our ABI >>>> without changing the API (especially everything around macros and >>>> publicly accessible structures). Changing this is politically >>> >>> But I think it is *possible* to get to a situation where ABI isn't >>> broken without changing API. I have posted such a proposal. >>> If one uses the kind of C-level duck typing I describe in the link >>> below, one would do >>> >>> typedef PyObject PyArrayObject; >>> >>> typedef struct { >>> ? ? ... >>> } NumPyArray; /* used to be PyArrayObject */ >> >> Maybe we're just in violent agreement, but whatever ends up being used >> would require to change the *current* C API, right ? If one wants to > > Accessing arr->dims[i] directly would need to change. But that's been > discouraged for a long time. By "API" I meant access through the macros. > > One of the changes under discussion here is to change PyArray_SHAPE from > a macro that accepts both PyObject* and PyArrayObject* to a function > that only accepts PyArrayObject* (hence breakage). I'm saying that under > my proposal, assuming I or somebody else can find the time to implement > it under, you can both make it a function and have it accept both > PyObject* and PyArrayObject* (since they are the same), undoing the > breakage but allowing to hide the ABI. > > (It doesn't give you full flexibility in ABI, it does require that you > somewhere have an "npy_intp dims[nd]" with the same lifetime as your > object, etc., but I don't consider that a big disadvantage). > >> allow for changes in our structures more freely, we have to hide them >> from the headers, which means breaking the code that depends on the >> structure binary layout. Any code that access those directly will need >> to be changed. 
>> >> There is the particular issue of iterator, which seem quite difficult >> to make "ABI-safe" without losing significant performance. > > I don't agree (for some meanings of "ABI-safe"). You can export the data > (dataptr/shape/strides) through the ABI, then the iterator uses these in > whatever way it wishes consumer-side. Sort of like PEP 3118 without the > performance degradation. The only sane way IMO of doing iteration is > building it into the consumer anyway. (I have not read the whole cython discussion yet) What do you mean by "building iteration in the consumer" ? My understanding is that any data export would be done through a level of indirection (dataptr/shape/strides). Conceptually, I can't see how one could keep ABI without that level of indirection without some compile. In the case of iterator, that means multiple pointer chasing per sample -- i.e. the tight loop issue you mentioned earlier for PyArray_DATA is the common case for iterator. I can only see two ways of doing fast (special casing) iteration: compile-time special casing or runtime optimization. Compile-time requires access to the internals (even if one were to use C++ with advanced template magic ala STL/iterator, I don't think one can get performance if everything is not in the headers, but maybe C++ compilers are super smart those days in ways I can't comprehend). I would think runtime is the long-term solution, but that's far away, David From travis at continuum.io Tue Jun 26 09:17:46 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 08:17:46 -0500 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: It would be really awesome to have a script like this to generate the logo. That's pretty amazing. Would you be able to tweak it up a bit and then we could take a poll here? Perhaps we change the logo to a variation of what your script produces. Can you export a PNG? -Travis On Jun 26, 2012, at 1:52 AM, klo uo wrote: > Heh, thanks :) > It's free interpretation made from quick idea then immediately shared. > Original logo can be made exact I guess with interlaced planes and > shallower bars or similar... > > > On Tue, Jun 26, 2012 at 8:19 AM, Anthony Scopatz wrote: >> This is awesome! >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Tue Jun 26 09:40:29 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 26 Jun 2012 15:40:29 +0200 Subject: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch) In-Reply-To: References: Message-ID: <4FE9BBCD.4070009@astro.uio.no> On 06/26/2012 01:48 PM, David Cournapeau wrote: > Hi, > > I am just continuing the discussion around ABI/API, the technical side > of things that is, as this is unrelated to 1.7.x. release. > > On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn > wrote: >> On 06/26/2012 11:58 AM, David Cournapeau wrote: >>> On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn >>> wrote: >>>> On 06/26/2012 05:35 AM, David Cournapeau wrote: >>>>> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: >>>>> >>>>>> >>>>>> My understanding is that Travis is simply trying to stress "We have to >>>>>> think about the implications of our changes on existing users." and >>>>>> also that little changes (with the best intentions!) 
that however mean >>>>>> either a breakage or confusion for users (due to historical reasons) >>>>>> should be avoided if possible. And I very strongly feel the same way. >>>>>> And I think that most people on this list do as well. >>>>> >>>>> I think Travis is more concerned about API than ABI changes (in that >>>>> example for 1.4, the ABI breakage was caused by a change that was >>>>> pushed by Travis IIRC). >>>>> >>>>> The relative importance of API vs ABI is a tough one: I think ABI >>>>> breakage is as bad as API breakage (but matter in different >>>>> circumstances), but it is hard to improve the situation around our ABI >>>>> without changing the API (especially everything around macros and >>>>> publicly accessible structures). Changing this is politically >>>> >>>> But I think it is *possible* to get to a situation where ABI isn't >>>> broken without changing API. I have posted such a proposal. >>>> If one uses the kind of C-level duck typing I describe in the link >>>> below, one would do >>>> >>>> typedef PyObject PyArrayObject; >>>> >>>> typedef struct { >>>> ... >>>> } NumPyArray; /* used to be PyArrayObject */ >>> >>> Maybe we're just in violent agreement, but whatever ends up being used >>> would require to change the *current* C API, right ? If one wants to >> >> Accessing arr->dims[i] directly would need to change. But that's been >> discouraged for a long time. By "API" I meant access through the macros. >> >> One of the changes under discussion here is to change PyArray_SHAPE from >> a macro that accepts both PyObject* and PyArrayObject* to a function >> that only accepts PyArrayObject* (hence breakage). I'm saying that under >> my proposal, assuming I or somebody else can find the time to implement >> it under, you can both make it a function and have it accept both >> PyObject* and PyArrayObject* (since they are the same), undoing the >> breakage but allowing to hide the ABI. >> >> (It doesn't give you full flexibility in ABI, it does require that you >> somewhere have an "npy_intp dims[nd]" with the same lifetime as your >> object, etc., but I don't consider that a big disadvantage). >> >>> allow for changes in our structures more freely, we have to hide them >>> from the headers, which means breaking the code that depends on the >>> structure binary layout. Any code that access those directly will need >>> to be changed. >>> >>> There is the particular issue of iterator, which seem quite difficult >>> to make "ABI-safe" without losing significant performance. >> >> I don't agree (for some meanings of "ABI-safe"). You can export the data >> (dataptr/shape/strides) through the ABI, then the iterator uses these in >> whatever way it wishes consumer-side. Sort of like PEP 3118 without the >> performance degradation. The only sane way IMO of doing iteration is >> building it into the consumer anyway. > > (I have not read the whole cython discussion yet) I'll try to write a summary and post it when I can get around to it. > > What do you mean by "building iteration in the consumer" ? My "consumer" is the user of the NumPy C API. So I meant that the iteration logic is all in C header files and compiled again for each such consumer. Iterators don't cross the ABI boundary. > understanding is that any data export would be done through a level of > indirection (dataptr/shape/strides). Conceptually, I can't see how one > could keep ABI without that level of indirection without some compile. > In the case of iterator, that means multiple pointer chasing per > sample -- i.e. 
the tight loop issue you mentioned earlier for > PyArray_DATA is the common case for iterator. Even if you do indirection, iterator utilities that are compiled in the "consumer"/user code can cache the data that's retrieved. Iterators just do // setup crossing ABI npy_intp *shape = PyArray_DIMS(arr); npy_intp *strides = PyArray_STRIDES(arr); ... // performance-sensitive code just accesses cached pointers and don't // cross ABI We're probably in violent agreement and just talking past one another...? > > I can only see two ways of doing fast (special casing) iteration: > compile-time special casing or runtime optimization. Compile-time > requires access to the internals (even if one were to use C++ with > advanced template magic ala STL/iterator, I don't think one can get > performance if everything is not in the headers, but maybe C++ > compilers are super smart those days in ways I can't comprehend). I > would think runtime is the long-term solution, but that's far away, Going slightly OT, then IMO, the *only* long-term solution in 2012 is LLVM. That allows you to do any level of inlining and special casing and optimization at run-time, which is the only way of matching needs for performance with using Python at all. Mark Florisson is heading down that road this summer with his 'minivect' project (essentially, code generation for optimal iteration over NumPy (or NumPy-like) arrays that can be used both by Cython (C code generation backend) and Numba (LLVM code generation backend)). Relying on C++ metaprogramming to implement iterators is like using the technology of the 80's to build the NumPy of the 2010's. It can only be exported to Python in a crippled form, so kind of useless. (C++ to implement the core that sits behind an ABI is another matter, I don't have an opinion on that. But iterators can't be behind the ABI, as I think we agree on.) Dag From charlesr.harris at gmail.com Tue Jun 26 10:00:53 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 Jun 2012 08:00:53 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Mon, Jun 25, 2012 at 9:10 PM, Ond?ej ?ert?k wrote: > On Mon, Jun 25, 2012 at 7:38 PM, Fernando Perez > wrote: > > On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant > wrote: > >> > >> On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote: > > > >> > >> For context, consider that for many years, the word "gratuitous" has > been used in a non-derogatory way in the Python ecosystem to describe > changes to semantics and syntax that don't have benefits significant enough > to offset the pain it will cause to existing users. That's why I used > the word. I am not trying to be derogatory. I am trying to be clear > that we need to respect existing users of NumPy more than we have done from > 1.5 to 1.7 in the enthusiasm to make changes. > >> > > > > For reference, here's the (long) thread where this came to be: > > > > http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html > > > > It's worth noting that at the time, the discussion was for an addition > > to *scipy*, not to numpy. I don't know when things were moved over to > > numpy. > > > > > >> Working on the NumPy code base implies respecting the conventions that > are already in place --- not just disregarding them and doing whatever we > want. 
I'm not really sure why I have to argue the existing users point > of view so much recently. I would hope that all of us would have the > perspective that the people who have adopted NumPy deserve to be treated > with respect. The changes that grate on me are the ones that seem to > take lightly existing users of NumPy. > >> > > > > I certainly appreciate the need to not break user habits/code, as we > > struggle with the very same issue in IPython all the time. And > > obviously at this point numpy is 'core infrastructure' enough that > > breaking backwards compatibility in any way should be very strongly > > discouraged (things were probably a bit different back in 2009). > > > >>> I know that this particular issue grates you quite a bit, but I urge > >>> you to be fair in your appreciation of how it came to be: through the > >>> work of well-intentioned and thoughtful (but not omniscient) people > >>> when you weren't participating actively in numpy development. > >> > >> I'm trying very hard to be fair --- especially to changes like this. > What grates me are changes that affect our user base in a negative way --- > specifically by causing code that used to work to no longer work or create > alterations to real conventions. This kind of change is just not > acceptable if we can avoid it. I'm really trying to understand why others > do not feel so strongly about this, but I'm not persuaded by what I've > heard so far. > > > > I just want to note that I'm not advocating for *any* > > backwards-compatibility breakage in numpy at this point... I was just > > providing context for a discussion that happened back in 2009, and in > > the scipy list. I certainly feel pretty strongly at this point about > > the importance of preserving working code *today*, given the role of > > numpy at the 'root node' of the scipy ecosystem tree and the size of > > said tree. > > I think that everybody strongly agrees that backward incompatible > changes should not be made. > > Sometimes it can be more subtle, > see for example this numpy bug report in Debian: > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589835 > > and read the dozens of emails that it generated, e.g. > http://lists.debian.org/debian-python/2010/07/msg00048.html, and so > on. I've been hit by this problem too, that's why I remember it -- > suddenly many packages that depend on NumPy stopped working in a > subtle way and I had to spent hours figuring out what went wrong and > that the problem is not in h5py, but actually that NumPy has changed > its ABI, or more precisely the problem is described here (some new > members were added to a C datastructure): > http://lists.debian.org/debian-python/2010/07/msg00045.html > I am sure that this ABI change had to be done and there were good > reasons for it and this particular change probably even couldn't have > been avoided. But nevertheless it has caused headaches to a lot of > people downstream. I just looked into the release notes for NumPy > 1.4.0 and didn't find this change nor how to fix it in there. I am > just posting this as a particular, concrete, real life example of > consequences for the end users. > Let us note that that problem was due to Travis convincing David to include the Datetime work in the release against David's own best judgement. The result was a delay of several months until Ralf could get up to speed and get 1.4.1 out. Let us also note that poly1d is actually not the same as Matlab poly1d. 
> > My understanding is that Travis is simply trying to stress "We have to > think about the implications of our changes on existing users." and > also that little changes (with the best intentions!) that however mean > either a breakage or confusion for users (due to historical reasons) > should be avoided if possible. And I very strongly feel the same way. > And I think that most people on this list do as well. > > But sometimes I guess mistakes are made anyway. What can be done to > avoid similar issues like with the polynomial order in the future? > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Tue Jun 26 10:08:01 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 26 Jun 2012 16:08:01 +0200 Subject: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch) In-Reply-To: References: Message-ID: <4FE9C241.4000302@astro.uio.no> On 06/26/2012 01:48 PM, David Cournapeau wrote: > Hi, > > I am just continuing the discussion around ABI/API, the technical side > of things that is, as this is unrelated to 1.7.x. release. > > On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn > wrote: >> On 06/26/2012 11:58 AM, David Cournapeau wrote: >>> On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn >>> wrote: >>>> On 06/26/2012 05:35 AM, David Cournapeau wrote: >>>>> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k wrote: >>>>> >>>>>> >>>>>> My understanding is that Travis is simply trying to stress "We have to >>>>>> think about the implications of our changes on existing users." and >>>>>> also that little changes (with the best intentions!) that however mean >>>>>> either a breakage or confusion for users (due to historical reasons) >>>>>> should be avoided if possible. And I very strongly feel the same way. >>>>>> And I think that most people on this list do as well. >>>>> >>>>> I think Travis is more concerned about API than ABI changes (in that >>>>> example for 1.4, the ABI breakage was caused by a change that was >>>>> pushed by Travis IIRC). >>>>> >>>>> The relative importance of API vs ABI is a tough one: I think ABI >>>>> breakage is as bad as API breakage (but matter in different >>>>> circumstances), but it is hard to improve the situation around our ABI >>>>> without changing the API (especially everything around macros and >>>>> publicly accessible structures). Changing this is politically >>>> >>>> But I think it is *possible* to get to a situation where ABI isn't >>>> broken without changing API. I have posted such a proposal. >>>> If one uses the kind of C-level duck typing I describe in the link >>>> below, one would do >>>> >>>> typedef PyObject PyArrayObject; >>>> >>>> typedef struct { >>>> ... >>>> } NumPyArray; /* used to be PyArrayObject */ >>> >>> Maybe we're just in violent agreement, but whatever ends up being used >>> would require to change the *current* C API, right ? If one wants to >> >> Accessing arr->dims[i] directly would need to change. But that's been >> discouraged for a long time. By "API" I meant access through the macros. >> >> One of the changes under discussion here is to change PyArray_SHAPE from >> a macro that accepts both PyObject* and PyArrayObject* to a function >> that only accepts PyArrayObject* (hence breakage). 
I'm saying that under >> my proposal, assuming I or somebody else can find the time to implement >> it under, you can both make it a function and have it accept both >> PyObject* and PyArrayObject* (since they are the same), undoing the >> breakage but allowing to hide the ABI. >> >> (It doesn't give you full flexibility in ABI, it does require that you >> somewhere have an "npy_intp dims[nd]" with the same lifetime as your >> object, etc., but I don't consider that a big disadvantage). >> >>> allow for changes in our structures more freely, we have to hide them >>> from the headers, which means breaking the code that depends on the >>> structure binary layout. Any code that access those directly will need >>> to be changed. >>> >>> There is the particular issue of iterator, which seem quite difficult >>> to make "ABI-safe" without losing significant performance. >> >> I don't agree (for some meanings of "ABI-safe"). You can export the data >> (dataptr/shape/strides) through the ABI, then the iterator uses these in >> whatever way it wishes consumer-side. Sort of like PEP 3118 without the >> performance degradation. The only sane way IMO of doing iteration is >> building it into the consumer anyway. > > (I have not read the whole cython discussion yet) So here's the summary. It's rather complicated but also incredibly neat :-) And technical details can be hidden behind a tight API. - We introduce a C-level metaclass, "extensibletype", which to each type adds a branch-miss-free string->pointer hash table. The ndarray type is made an instance of this metaclass, so that you can do PyCustomSlots_GetTable(array_object->ob_type) - The hash table uses a perfect hashing scheme: a) We take the lower 64 bits of md5 of the lookup string (this can be done compile-time or module-load-time) as a pre-hash "h". b) When looking up the table for a key with pre-hash "h", the index in the table is given by ((h >> table->r) & table->m1) ^ table->d[r & table->m2] Then, *if* the element is present, it will always be found on the first try; the table is guaranteed collisionless. This means that an expensive branch-miss can be avoided. It is really incredibly fast in practice, with a 0.5 ns penalty on my 1.8 GHz laptop. The magic is in finding the right table->r and table->d. For a 64-slot table, parameters r and d[0]..d[63] can be found in 10us on my machine (it's an O(n) operation). (table->d[i] has type uint16_t) (This algorithm was found in an academic paper which I'm too lazy to dig up from that thread right now; perfect hashing is an active research field.) The result? You can use this table to store function pointers in the type, like C++ virtual tables or like the built-in slots like tp_get_buffer, but *without* having to agree on everything at compile-time like in C++. And the only penalty is ~0.5 ns per call and some cache usage. Cython would use this to replace the current custom "cdef class" vtable with something more tools could agree on, e.g. store function pointers in the table with keys like "method:foo:i4i8->f4" But NumPy could easily store entries relating to its C API in the same hash table, "numpy:SHAPE" Then, the C API functions would all take PyObject*, look up the fast hash table on the ob_type. This allows for incredibly flexible duck typing on the C level. PyArray_Check would just check for the presence of the C API but not care about the actual Python type, i.e., no PyObject_TypeCheck. Me and Robert have talked a lot about this and will move forward with it for Cython. 
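To make the lookup side concrete, here is a minimal, self-contained C
sketch of the scheme. The names (PerfectTable, TableEntry, table_lookup),
the tiny pre-hash values and the hand-picked parameters are all invented
for the example; a real table would take the low 64 bits of md5(key) as
the pre-hash and find r and d[] with the parameter search described above,
so read this as an illustration of the idea rather than the actual
Cython/NumPy implementation:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* One slot of the table: the pre-hash and key are kept so a lookup can
   confirm that the entry really is the one asked for. */
typedef struct {
    uint64_t prehash;      /* low 64 bits of md5(key) in the real scheme */
    const char *key;
    void *ptr;             /* e.g. a function pointer for "numpy:SHAPE" */
} TableEntry;

/* A 4-slot table; real tables would be 64 slots or more. */
typedef struct {
    uint8_t r;             /* shift chosen so buckets do not collide internally */
    uint64_t m1;           /* number of slots - 1 (power of two) */
    uint64_t m2;           /* number of displacements - 1 (power of two) */
    uint16_t d[4];         /* per-bucket displacement */
    TableEntry entries[4];
} PerfectTable;

/* The branch-miss-free part: one shift, two masks, one xor, then a single
   guard comparison to confirm the key is actually exported. */
static void *table_lookup(const PerfectTable *t, uint64_t h, const char *key)
{
    uint64_t slot = ((h >> t->r) & t->m1) ^ t->d[h & t->m2];
    const TableEntry *e = &t->entries[slot];
    if (e->key != NULL && e->prehash == h && strcmp(e->key, key) == 0)
        return e->ptr;
    return NULL;           /* this type does not export that key */
}

static int shape_stub = 42;   /* stand-in for whatever pointer gets stored */

int main(void)
{
    /* Parameters found by hand for three made-up pre-hashes (0x5, 0xD, 0x7);
       the O(n) search would do this automatically at table-build time. */
    PerfectTable t = {
        .r = 2, .m1 = 3, .m2 = 3,
        .d = {0, 0, 0, 1},
        .entries = {
            [1] = {0x5, "numpy:SHAPE",    &shape_stub},
            [3] = {0xD, "numpy:DATA",     &shape_stub},
            [0] = {0x7, "numpy:ITEMSIZE", &shape_stub},
        },
    };
    printf("numpy:SHAPE present: %d\n", table_lookup(&t, 0x5, "numpy:SHAPE") != NULL);
    printf("numpy:DATA  present: %d\n", table_lookup(&t, 0xD, "numpy:DATA") != NULL);
    printf("missing key present: %d\n", table_lookup(&t, 0x9, "numpy:NOT_THERE") != NULL);
    return 0;
}

The hot path is one shift, two masks, one xor and a single guard
comparison, so a key that is present is found on the first probe.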
Obviously I don't expect others than me to pick it up for NumPy so we'll see... I'll write up a specification document sometimes over the next couple of weeks as we need that even if only for Cython. Dag From d.s.seljebotn at astro.uio.no Tue Jun 26 10:10:26 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 26 Jun 2012 16:10:26 +0200 Subject: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch) In-Reply-To: <4FE9C241.4000302@astro.uio.no> References: <4FE9C241.4000302@astro.uio.no> Message-ID: <4FE9C2D2.1090500@astro.uio.no> On 06/26/2012 04:08 PM, Dag Sverre Seljebotn wrote: > On 06/26/2012 01:48 PM, David Cournapeau wrote: >> Hi, >> >> I am just continuing the discussion around ABI/API, the technical side >> of things that is, as this is unrelated to 1.7.x. release. >> >> On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn >> wrote: >>> On 06/26/2012 11:58 AM, David Cournapeau wrote: >>>> On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn >>>> wrote: >>>>> On 06/26/2012 05:35 AM, David Cournapeau wrote: >>>>>> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej >>>>>> ?ert?k wrote: >>>>>> >>>>>>> >>>>>>> My understanding is that Travis is simply trying to stress "We >>>>>>> have to >>>>>>> think about the implications of our changes on existing users." and >>>>>>> also that little changes (with the best intentions!) that however >>>>>>> mean >>>>>>> either a breakage or confusion for users (due to historical reasons) >>>>>>> should be avoided if possible. And I very strongly feel the same >>>>>>> way. >>>>>>> And I think that most people on this list do as well. >>>>>> >>>>>> I think Travis is more concerned about API than ABI changes (in that >>>>>> example for 1.4, the ABI breakage was caused by a change that was >>>>>> pushed by Travis IIRC). >>>>>> >>>>>> The relative importance of API vs ABI is a tough one: I think ABI >>>>>> breakage is as bad as API breakage (but matter in different >>>>>> circumstances), but it is hard to improve the situation around our >>>>>> ABI >>>>>> without changing the API (especially everything around macros and >>>>>> publicly accessible structures). Changing this is politically >>>>> >>>>> But I think it is *possible* to get to a situation where ABI isn't >>>>> broken without changing API. I have posted such a proposal. >>>>> If one uses the kind of C-level duck typing I describe in the link >>>>> below, one would do >>>>> >>>>> typedef PyObject PyArrayObject; >>>>> >>>>> typedef struct { >>>>> ... >>>>> } NumPyArray; /* used to be PyArrayObject */ >>>> >>>> Maybe we're just in violent agreement, but whatever ends up being used >>>> would require to change the *current* C API, right ? If one wants to >>> >>> Accessing arr->dims[i] directly would need to change. But that's been >>> discouraged for a long time. By "API" I meant access through the macros. >>> >>> One of the changes under discussion here is to change PyArray_SHAPE from >>> a macro that accepts both PyObject* and PyArrayObject* to a function >>> that only accepts PyArrayObject* (hence breakage). I'm saying that under >>> my proposal, assuming I or somebody else can find the time to implement >>> it under, you can both make it a function and have it accept both >>> PyObject* and PyArrayObject* (since they are the same), undoing the >>> breakage but allowing to hide the ABI. 
>>> >>> (It doesn't give you full flexibility in ABI, it does require that you >>> somewhere have an "npy_intp dims[nd]" with the same lifetime as your >>> object, etc., but I don't consider that a big disadvantage). >>> >>>> allow for changes in our structures more freely, we have to hide them >>>> from the headers, which means breaking the code that depends on the >>>> structure binary layout. Any code that access those directly will need >>>> to be changed. >>>> >>>> There is the particular issue of iterator, which seem quite difficult >>>> to make "ABI-safe" without losing significant performance. >>> >>> I don't agree (for some meanings of "ABI-safe"). You can export the data >>> (dataptr/shape/strides) through the ABI, then the iterator uses these in >>> whatever way it wishes consumer-side. Sort of like PEP 3118 without the >>> performance degradation. The only sane way IMO of doing iteration is >>> building it into the consumer anyway. >> >> (I have not read the whole cython discussion yet) > > So here's the summary. It's rather complicated but also incredibly neat > :-) And technical details can be hidden behind a tight API. > > - We introduce a C-level metaclass, "extensibletype", which to each type > adds a branch-miss-free string->pointer hash table. The ndarray type is > made an instance of this metaclass, so that you can do > > PyCustomSlots_GetTable(array_object->ob_type) > > - The hash table uses a perfect hashing scheme: > > a) We take the lower 64 bits of md5 of the lookup string (this can be > done compile-time or module-load-time) as a pre-hash "h". > > b) When looking up the table for a key with pre-hash "h", the index in > the table is given by > > ((h >> table->r) & table->m1) ^ table->d[r & table->m2] Sorry, typo. Should be ((h >> table->r) & table->m1) ^ table->d[h & table->m2] What happens is that "h & table->m2" sorts the keys of the table into n buckets. Then "r" is selected (among 64 possible choices) so that there's no intra-bucket collisions. Finally, d is chosen so that none of the buckets collide, starting with the largest one. Dag > > Then, *if* the element is present, it will always be found on the first > try; the table is guaranteed collisionless. This means that an expensive > branch-miss can be avoided. It is really incredibly fast in practice, > with a 0.5 ns penalty on my 1.8 GHz laptop. > > The magic is in finding the right table->r and table->d. For a 64-slot > table, parameters r and d[0]..d[63] can be found in 10us on my machine > (it's an O(n) operation). (table->d[i] has type uint16_t) > > (This algorithm was found in an academic paper which I'm too lazy to dig > up from that thread right now; perfect hashing is an active research > field.) > > The result? You can use this table to store function pointers in the > type, like C++ virtual tables or like the built-in slots like > tp_get_buffer, but *without* having to agree on everything at > compile-time like in C++. And the only penalty is ~0.5 ns per call and > some cache usage. > > Cython would use this to replace the current custom "cdef class" vtable > with something more tools could agree on, e.g. store function pointers > in the table with keys like > > "method:foo:i4i8->f4" > > But NumPy could easily store entries relating to its C API in the same > hash table, > > "numpy:SHAPE" > > Then, the C API functions would all take PyObject*, look up the fast > hash table on the ob_type. > > This allows for incredibly flexible duck typing on the C level. 
> > PyArray_Check would just check for the presence of the C API but not > care about the actual Python type, i.e., no PyObject_TypeCheck. > > Me and Robert have talked a lot about this and will move forward with it > for Cython. Obviously I don't expect others than me to pick it up for > NumPy so we'll see... I'll write up a specification document sometimes > over the next couple of weeks as we need that even if only for Cython. > > Dag From cournape at gmail.com Tue Jun 26 10:15:25 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Jun 2012 15:15:25 +0100 Subject: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch) In-Reply-To: <4FE9BBCD.4070009@astro.uio.no> References: <4FE9BBCD.4070009@astro.uio.no> Message-ID: On Tue, Jun 26, 2012 at 2:40 PM, Dag Sverre Seljebotn wrote: > On 06/26/2012 01:48 PM, David Cournapeau wrote: >> Hi, >> >> I am just continuing the discussion around ABI/API, the technical side >> of things that is, as this is unrelated to 1.7.x. release. >> >> On Tue, Jun 26, 2012 at 11:41 AM, Dag Sverre Seljebotn >> ?wrote: >>> On 06/26/2012 11:58 AM, David Cournapeau wrote: >>>> On Tue, Jun 26, 2012 at 10:27 AM, Dag Sverre Seljebotn >>>> ? ?wrote: >>>>> On 06/26/2012 05:35 AM, David Cournapeau wrote: >>>>>> On Tue, Jun 26, 2012 at 4:10 AM, Ond?ej ?ert?k ? ? ?wrote: >>>>>> >>>>>>> >>>>>>> My understanding is that Travis is simply trying to stress "We have to >>>>>>> think about the implications of our changes on existing users." and >>>>>>> also that little changes (with the best intentions!) that however mean >>>>>>> either a breakage or confusion for users (due to historical reasons) >>>>>>> should be avoided if possible. And I very strongly feel the same way. >>>>>>> And I think that most people on this list do as well. >>>>>> >>>>>> I think Travis is more concerned about API than ABI changes (in that >>>>>> example for 1.4, the ABI breakage was caused by a change that was >>>>>> pushed by Travis IIRC). >>>>>> >>>>>> The relative importance of API vs ABI is a tough one: I think ABI >>>>>> breakage is as bad as API breakage (but matter in different >>>>>> circumstances), but it is hard to improve the situation around our ABI >>>>>> without changing the API (especially everything around macros and >>>>>> publicly accessible structures). Changing this is politically >>>>> >>>>> But I think it is *possible* to get to a situation where ABI isn't >>>>> broken without changing API. I have posted such a proposal. >>>>> If one uses the kind of C-level duck typing I describe in the link >>>>> below, one would do >>>>> >>>>> typedef PyObject PyArrayObject; >>>>> >>>>> typedef struct { >>>>> ? ? ?... >>>>> } NumPyArray; /* used to be PyArrayObject */ >>>> >>>> Maybe we're just in violent agreement, but whatever ends up being used >>>> would require to change the *current* C API, right ? If one wants to >>> >>> Accessing arr->dims[i] directly would need to change. But that's been >>> discouraged for a long time. By "API" I meant access through the macros. >>> >>> One of the changes under discussion here is to change PyArray_SHAPE from >>> a macro that accepts both PyObject* and PyArrayObject* to a function >>> that only accepts PyArrayObject* (hence breakage). 
I'm saying that under >>> my proposal, assuming I or somebody else can find the time to implement >>> it under, you can both make it a function and have it accept both >>> PyObject* and PyArrayObject* (since they are the same), undoing the >>> breakage but allowing to hide the ABI. >>> >>> (It doesn't give you full flexibility in ABI, it does require that you >>> somewhere have an "npy_intp dims[nd]" with the same lifetime as your >>> object, etc., but I don't consider that a big disadvantage). >>> >>>> allow for changes in our structures more freely, we have to hide them >>>> from the headers, which means breaking the code that depends on the >>>> structure binary layout. Any code that access those directly will need >>>> to be changed. >>>> >>>> There is the particular issue of iterator, which seem quite difficult >>>> to make "ABI-safe" without losing significant performance. >>> >>> I don't agree (for some meanings of "ABI-safe"). You can export the data >>> (dataptr/shape/strides) through the ABI, then the iterator uses these in >>> whatever way it wishes consumer-side. Sort of like PEP 3118 without the >>> performance degradation. The only sane way IMO of doing iteration is >>> building it into the consumer anyway. >> >> (I have not read the whole cython discussion yet) > > I'll try to write a summary and post it when I can get around to it. > >> >> What do you mean by "building iteration in the consumer" ? My > > "consumer" is the user of the NumPy C API. So I meant that the iteration > logic is all in C header files and compiled again for each such > consumer. Iterators don't cross the ABI boundary. > >> understanding is that any data export would be done through a level of >> indirection (dataptr/shape/strides). Conceptually, I can't see how one >> could keep ABI without that level of indirection without some compile. >> In the case of iterator, that means multiple pointer chasing per >> sample -- i.e. the tight loop issue you mentioned earlier for >> PyArray_DATA is the common case for iterator. > > Even if you do indirection, iterator utilities that are compiled in the > "consumer"/user code can cache the data that's retrieved. > > Iterators just do > > // setup crossing ABI > npy_intp *shape = PyArray_DIMS(arr); > npy_intp *strides = PyArray_STRIDES(arr); > ... > // performance-sensitive code just accesses cached pointers and don't > // cross ABI The problem is that iterators need more that this. But thinking more about it, I am not so dead sure we could not get there. I will need to play with some code. > > Going slightly OT, then IMO, the *only* long-term solution in 2012 is > LLVM. That allows you to do any level of inlining and special casing and > optimization at run-time, which is the only way of matching needs for > performance with using Python at all. > > Mark Florisson is heading down that road this summer with his 'minivect' > project (essentially, code generation for optimal iteration over NumPy > (or NumPy-like) arrays that can be used both by Cython (C code > generation backend) and Numba (LLVM code generation backend)). > > Relying on C++ metaprogramming to implement iterators is like using the > technology of the 80's to build the NumPy of the 2010's. It can only be > exported to Python in a crippled form, so kind of useless. (C++ to > implement the core that sits behind an ABI is another matter, I don't > have an opinion on that. But iterators can't be behind the ABI, as I > think we agree on.) 
Well, no need to convince me about which of the two solutions is the most appropriate. I was just trying to appear more unbiased than I really am :) David From travis at continuum.io Tue Jun 26 10:52:42 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 09:52:42 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Jun 26, 2012, at 9:00 AM, Charles R Harris wrote: > > > On Mon, Jun 25, 2012 at 9:10 PM, Ond?ej ?ert?k wrote: > On Mon, Jun 25, 2012 at 7:38 PM, Fernando Perez wrote: > > On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant wrote: > >> > >> On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote: > > > >> > >> For context, consider that for many years, the word "gratuitous" has been used in a non-derogatory way in the Python ecosystem to describe changes to semantics and syntax that don't have benefits significant enough to offset the pain it will cause to existing users. That's why I used the word. I am not trying to be derogatory. I am trying to be clear that we need to respect existing users of NumPy more than we have done from 1.5 to 1.7 in the enthusiasm to make changes. > >> > > > > For reference, here's the (long) thread where this came to be: > > > > http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html > > > > It's worth noting that at the time, the discussion was for an addition > > to *scipy*, not to numpy. I don't know when things were moved over to > > numpy. > > > > > >> Working on the NumPy code base implies respecting the conventions that are already in place --- not just disregarding them and doing whatever we want. I'm not really sure why I have to argue the existing users point of view so much recently. I would hope that all of us would have the perspective that the people who have adopted NumPy deserve to be treated with respect. The changes that grate on me are the ones that seem to take lightly existing users of NumPy. > >> > > > > I certainly appreciate the need to not break user habits/code, as we > > struggle with the very same issue in IPython all the time. And > > obviously at this point numpy is 'core infrastructure' enough that > > breaking backwards compatibility in any way should be very strongly > > discouraged (things were probably a bit different back in 2009). > > > >>> I know that this particular issue grates you quite a bit, but I urge > >>> you to be fair in your appreciation of how it came to be: through the > >>> work of well-intentioned and thoughtful (but not omniscient) people > >>> when you weren't participating actively in numpy development. > >> > >> I'm trying very hard to be fair --- especially to changes like this. What grates me are changes that affect our user base in a negative way --- specifically by causing code that used to work to no longer work or create alterations to real conventions. This kind of change is just not acceptable if we can avoid it. I'm really trying to understand why others do not feel so strongly about this, but I'm not persuaded by what I've heard so far. > > > > I just want to note that I'm not advocating for *any* > > backwards-compatibility breakage in numpy at this point... I was just > > providing context for a discussion that happened back in 2009, and in > > the scipy list. 
I certainly feel pretty strongly at this point about > > the importance of preserving working code *today*, given the role of > > numpy at the 'root node' of the scipy ecosystem tree and the size of > > said tree. > > I think that everybody strongly agrees that backward incompatible > changes should not be made. > > Sometimes it can be more subtle, > see for example this numpy bug report in Debian: > > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589835 > > and read the dozens of emails that it generated, e.g. > http://lists.debian.org/debian-python/2010/07/msg00048.html, and so > on. I've been hit by this problem too, that's why I remember it -- > suddenly many packages that depend on NumPy stopped working in a > subtle way and I had to spent hours figuring out what went wrong and > that the problem is not in h5py, but actually that NumPy has changed > its ABI, or more precisely the problem is described here (some new > members were added to a C datastructure): > http://lists.debian.org/debian-python/2010/07/msg00045.html > I am sure that this ABI change had to be done and there were good > reasons for it and this particular change probably even couldn't have > been avoided. But nevertheless it has caused headaches to a lot of > people downstream. I just looked into the release notes for NumPy > 1.4.0 and didn't find this change nor how to fix it in there. I am > just posting this as a particular, concrete, real life example of > consequences for the end users. > > Let us note that that problem was due to Travis convincing David to include the Datetime work in the release against David's own best judgement. The result was a delay of several months until Ralf could get up to speed and get 1.4.1 out. Let us also note that poly1d is actually not the same as Matlab poly1d. This is not accurate, Charles. Please stop trying to dredge up old history you don't know the full story about and are trying to create an alternate reality about. It doesn't help anything and is quite poisonous to this mailing list. You have a narrative about the past that seems very different from mine --- and you apparently blame me personally for all that is wrong with NumPy. This is not a helpful perspective and it just alienates us further and is a very polarizing perspective. This is not good for the community nor for our ability to work productively together. I hope that it is not a permanent reality and you will find a way to see things in a different light. -Travis > > > My understanding is that Travis is simply trying to stress "We have to > think about the implications of our changes on existing users." and > also that little changes (with the best intentions!) that however mean > either a breakage or confusion for users (due to historical reasons) > should be avoided if possible. And I very strongly feel the same way. > And I think that most people on this list do as well. > > But sometimes I guess mistakes are made anyway. What can be done to > avoid similar issues like with the polynomial order in the future? > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Tue Jun 26 11:02:05 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 10:02:05 -0500 Subject: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch) In-Reply-To: <4FE9C241.4000302@astro.uio.no> References: <4FE9C241.4000302@astro.uio.no> Message-ID: >> >> (I have not read the whole cython discussion yet) > > So here's the summary. It's rather complicated but also incredibly neat > :-) And technical details can be hidden behind a tight API. Could you provide a bit more context for this list. I think this is an important technology concept. I'd like to understand better how well it jives with Numba-produced APIs and how we can make use of it in NumPy. Where exactly would this be used in the NumPy API? What would it replace? > > - We introduce a C-level metaclass, "extensibletype", which to each > type adds a branch-miss-free string->pointer hash table. The ndarray > type is made an instance of this metaclass, so that you can do > > PyCustomSlots_GetTable(array_object->ob_type) > > - The hash table uses a perfect hashing scheme: > > a) We take the lower 64 bits of md5 of the lookup string (this can be > done compile-time or module-load-time) as a pre-hash "h". > > b) When looking up the table for a key with pre-hash "h", the index > in the table is given by > > ((h >> table->r) & table->m1) ^ table->d[r & table->m2] > > Then, *if* the element is present, it will always be found on the first > try; the table is guaranteed collisionless. This means that an expensive > branch-miss can be avoided. It is really incredibly fast in practice, > with a 0.5 ns penalty on my 1.8 GHz laptop. > > The magic is in finding the right table->r and table->d. For a 64-slot > table, parameters r and d[0]..d[63] can be found in 10us on my machine > (it's an O(n) operation). (table->d[i] has type uint16_t) > > (This algorithm was found in an academic paper which I'm too lazy to dig > up from that thread right now; perfect hashing is an active research field.) > > The result? You can use this table to store function pointers in the > type, like C++ virtual tables or like the built-in slots like > tp_get_buffer, but *without* having to agree on everything at > compile-time like in C++. And the only penalty is ~0.5 ns per call and > some cache usage. > > Cython would use this to replace the current custom "cdef class" vtable > with something more tools could agree on, e.g. store function pointers > in the table with keys like > > "method:foo:i4i8->f4" > > But NumPy could easily store entries relating to its C API in the same > hash table, > > "numpy:SHAPE" > > Then, the C API functions would all take PyObject*, look up the fast > hash table on the ob_type. > > This allows for incredibly flexible duck typing on the C level. This does sound very nice. > > PyArray_Check would just check for the presence of the C API but not > care about the actual Python type, i.e., no PyObject_TypeCheck. > > Me and Robert have talked a lot about this and will move forward with it > for Cython. Obviously I don't expect others than me to pick it up for > NumPy so we'll see... I'll write up a specification document sometimes > over the next couple of weeks as we need that even if only for Cython. We will look forward to what you come up with. 
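To be concrete about what this would replace, here is a toy model (plain
C, no CPython, every name invented for the sketch) of the difference as I
understand it: the current pattern, where consumers compile the binary
layout of the array struct into their own code, versus routing an accessor
such as "numpy:SHAPE" through a per-type table of function pointers:

#include <stdio.h>

typedef long intp;                /* stand-in for npy_intp */

/* Old world: the consumer's binary bakes in the offsets of nd/dimensions,
   so any change to this layout is an ABI break. */
typedef struct {
    int nd;
    intp *dimensions;
} OldArray;
#define OLD_SHAPE(a) ((a)->dimensions)

/* New world: the consumer sees an opaque object plus a lookup. */
typedef struct Obj Obj;
typedef struct {
    intp *(*get_shape)(Obj *);    /* what "numpy:SHAPE" would resolve to */
} TypeTable;
struct Obj {
    const TypeTable *table;       /* stands in for ob_type and its hash table */
    /* everything below is private and free to change between releases */
    int nd;
    intp dims[3];
};

static intp *obj_get_shape(Obj *o) { return o->dims; }
static const TypeTable array_table = { obj_get_shape };

static intp *new_shape(Obj *o)
{
    /* in the real proposal this would be the perfect-hash lookup of
       "numpy:SHAPE" on the object's type via PyCustomSlots_GetTable */
    return o->table->get_shape(o);
}

int main(void)
{
    Obj a = { &array_table, 2, {3, 4, 0} };
    intp *shape = new_shape(&a);
    printf("shape = (%ld, %ld)\n", (long)shape[0], (long)shape[1]);
    return 0;
}

If a per-call lookup is too heavy for a tight loop, the consumer can
presumably do the lookup once and keep the pointer around, which looks
like the same caching argument that was made for the iterators.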
Best regards, -Travis From charlesr.harris at gmail.com Tue Jun 26 11:33:41 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 Jun 2012 09:33:41 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 8:52 AM, Travis Oliphant wrote: > > On Jun 26, 2012, at 9:00 AM, Charles R Harris wrote: > > > > On Mon, Jun 25, 2012 at 9:10 PM, Ond?ej ?ert?k wrote: > >> On Mon, Jun 25, 2012 at 7:38 PM, Fernando Perez >> wrote: >> > On Mon, Jun 25, 2012 at 6:39 PM, Travis Oliphant >> wrote: >> >> >> >> On Jun 25, 2012, at 7:21 PM, Fernando Perez wrote: >> > >> >> >> >> For context, consider that for many years, the word "gratuitous" has >> been used in a non-derogatory way in the Python ecosystem to describe >> changes to semantics and syntax that don't have benefits significant enough >> to offset the pain it will cause to existing users. That's why I used >> the word. I am not trying to be derogatory. I am trying to be clear >> that we need to respect existing users of NumPy more than we have done from >> 1.5 to 1.7 in the enthusiasm to make changes. >> >> >> > >> > For reference, here's the (long) thread where this came to be: >> > >> > http://mail.scipy.org/pipermail/scipy-dev/2009-October/012958.html >> > >> > It's worth noting that at the time, the discussion was for an addition >> > to *scipy*, not to numpy. I don't know when things were moved over to >> > numpy. >> > >> > >> >> Working on the NumPy code base implies respecting the conventions that >> are already in place --- not just disregarding them and doing whatever we >> want. I'm not really sure why I have to argue the existing users point >> of view so much recently. I would hope that all of us would have the >> perspective that the people who have adopted NumPy deserve to be treated >> with respect. The changes that grate on me are the ones that seem to >> take lightly existing users of NumPy. >> >> >> > >> > I certainly appreciate the need to not break user habits/code, as we >> > struggle with the very same issue in IPython all the time. And >> > obviously at this point numpy is 'core infrastructure' enough that >> > breaking backwards compatibility in any way should be very strongly >> > discouraged (things were probably a bit different back in 2009). >> > >> >>> I know that this particular issue grates you quite a bit, but I urge >> >>> you to be fair in your appreciation of how it came to be: through the >> >>> work of well-intentioned and thoughtful (but not omniscient) people >> >>> when you weren't participating actively in numpy development. >> >> >> >> I'm trying very hard to be fair --- especially to changes like this. >> What grates me are changes that affect our user base in a negative way --- >> specifically by causing code that used to work to no longer work or create >> alterations to real conventions. This kind of change is just not >> acceptable if we can avoid it. I'm really trying to understand why others >> do not feel so strongly about this, but I'm not persuaded by what I've >> heard so far. >> > >> > I just want to note that I'm not advocating for *any* >> > backwards-compatibility breakage in numpy at this point... I was just >> > providing context for a discussion that happened back in 2009, and in >> > the scipy list. 
I certainly feel pretty strongly at this point about >> > the importance of preserving working code *today*, given the role of >> > numpy at the 'root node' of the scipy ecosystem tree and the size of >> > said tree. >> >> I think that everybody strongly agrees that backward incompatible >> changes should not be made. >> >> Sometimes it can be more subtle, >> see for example this numpy bug report in Debian: >> >> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=589835 >> >> and read the dozens of emails that it generated, e.g. >> http://lists.debian.org/debian-python/2010/07/msg00048.html, and so >> on. I've been hit by this problem too, that's why I remember it -- >> suddenly many packages that depend on NumPy stopped working in a >> subtle way and I had to spent hours figuring out what went wrong and >> that the problem is not in h5py, but actually that NumPy has changed >> its ABI, or more precisely the problem is described here (some new >> members were added to a C datastructure): >> http://lists.debian.org/debian-python/2010/07/msg00045.html >> I am sure that this ABI change had to be done and there were good >> reasons for it and this particular change probably even couldn't have >> been avoided. But nevertheless it has caused headaches to a lot of >> people downstream. I just looked into the release notes for NumPy >> 1.4.0 and didn't find this change nor how to fix it in there. I am >> just posting this as a particular, concrete, real life example of >> consequences for the end users. >> > > Let us note that that problem was due to Travis convincing David to > include the Datetime work in the release against David's own best > judgement. The result was a delay of several months until Ralf could get up > to speed and get 1.4.1 out. Let us also note that poly1d is actually not > the same as Matlab poly1d. > > > This is not accurate, Charles. Please stop trying to dredge up old > history you don't know the full story about and are trying to create an > alternate reality about. It doesn't help anything and is quite poisonous > to this mailing list. > I didn't start the discussion of 1.4, nor did I raise the issue at the time as I didn't think it would be productive. We moved forward. But in any case, I asked David at the time why the datetime stuff got included. I'd welcome your version if you care to offer it. That would be more useful than accusing me of creating an alternative reality and would clear the air. > You have a narrative about the past that seems very different from mine > --- and you apparently blame me personally for all that is wrong with NumPy. > You started this blame game. You could have simply said, "here is how we will move forward." This is not a helpful perspective and it just alienates us further and is a > very polarizing perspective. This is not good for the community nor for > our ability to work productively together. > Calling this and that 'gratuitous' is already damaging to the community. Them's fightin' words. If you didn't want a fight you could have simply pointed out a path forward. I hope that it is not a permanent reality and you will find a way to see > things in a different light. > > I see things as I see them. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Tue Jun 26 11:43:56 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 10:43:56 -0500 Subject: [Numpy-discussion] NumPy 1.7 release delays Message-ID: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> Hey all, After some more investigation, I'm not optimistic that we will be able to get a 1.7 release out before SciPy. I would like to get a beta release out by SciPy (or even an rc1 release). But, given the number of code changes and differences between 1.5.x and 1.7, I think we will need an extended beta release stage for 1.7 that will allow as many users as possible to try out the new code base and report back any regressions or backward incompatibilities that need to be fixed before the final release. The fundamental rule I think we have is that "code depending on NumPy that worked with 1.5.x should continue to work with 1.7 without alterations required by the user" This does not mean we can't add new APIs or deprecate old APIs --- but I think that we do have to be much more careful about when deprecated APIs become unavailable. There is a lot of code that assumes the current API. Both code that is in released packages and code that is in "unreleased packages" which we are not even aware of. I don't want to finalize the 1.7 release until we get enough feedback from end-users about the impact of all the changes. This will likely take a longer beta-release period than usual: certainly not until after SciPy where we will make a concerted effort to get people to try the new 1.7 beta and report back on the impact on their code-base. Ondrej is helping out on this effort which I really appreciate. Other people who have time to help with the release effort --- especially in fixing regressions will be greatly appreciated. We are also using this time to 1) setup Continuous Integration services for NumPy using both Jenkins and Travis-CI and 2) migrate the issue tracker to github. Ondrej is heading up #1 and Ray Jones is heading up #2. Please coordinate with them if you'd like to help out on any of those areas. Thanks, -Travis From travis at continuum.io Tue Jun 26 12:24:56 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 11:24:56 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: >> Let us note that that problem was due to Travis convincing David to include the Datetime work in the release against David's own best judgement. The result was a delay of several months until Ralf could get up to speed and get 1.4.1 out. Let us also note that poly1d is actually not the same as Matlab poly1d. > > This is not accurate, Charles. Please stop trying to dredge up old history you don't know the full story about and are trying to create an alternate reality about. It doesn't help anything and is quite poisonous to this mailing list. > > I didn't start the discussion of 1.4, nor did I raise the issue at the time as I didn't think it would be productive. We moved forward. But in any case, I asked David at the time why the datetime stuff got included. I'd welcome your version if you care to offer it. That would be more useful than accusing me of creating an alternative reality and would clear the air. 
The datetime stuff got included because it is a very useful and important feature for multiple users. It still needed work, but it was in a state where it could be tried. It did require breaking ABI compatibility in the state it was in. My approach was to break ABI compatibility and move forward (there were other things we could do at the time that are still needed in the code base that will break ABI compatibility in the future). David didn't want to break ABI compatibility and so tried to satisfy two competing desires in a way that did not ultimately work. These things happen. We all get to share responsibility for the outcome. > You have a narrative about the past that seems very different from mine --- and you apparently blame me personally for all that is wrong with NumPy. > > You started this blame game. You could have simply said, "here is how we will move forward." I'm sorry you feel that way. My intent was not to assign blame --- but of course mailing lists can be notoriously hard to actually communicate intent. My intent was to provide context for why I think we should move forward in a particular way. > > This is not a helpful perspective and it just alienates us further and is a very polarizing perspective. This is not good for the community nor for our ability to work productively together. > > Calling this and that 'gratuitous' is already damaging to the community. Them's fightin' words. If you didn't want a fight you could have simply pointed out a path forward. They were not intended as "fighting words". I used the term in a very specific way as used by the Python developers themselves in describing their hope in moving from Python 2 to Python 3. Clearly your semantic environment interpreted them differently. As I have emphasized, I did not mean to disrespect you or anyone else by using that term. From where I sit, however, it seems you are anxious for a fight and so interpret everything I say in the worst possible light. If that is really the case, then this is a very bad state of affairs. We can't really communicate at that point. It will be impossible to agree on anything, and the whole idea of finding consensus just won't work. That's what I'm concerned about, fundamentally. You don't seem to be willing to give me the benefit of the doubt at all. Just like anyone who has created something, I feel a sense of "ownership" of NumPy. It might be helpful to recognize that I also feel that way about SciPy. In the case of SciPy, however, I have handed that project off to Ralf, Pauli, Warren, Josef, and others who are able to spend the time on it that it deserves. That internal mental decision to formally "hand off" SciPy did not come, though, until the end of last year and the first of this year. Perhaps it should have come sooner, but SciPy took a lot of time from me during a lot of formative years and I've always had very high hopes for it. It's hard to let that go. I am not ready to formally "hand off" my involvement with NumPy at all --- especially not now that I understand so much better what NumPy should and can be and how it's being used. Of course, I recognize that it's a team effort. I can't help but feel that you wish I would just "hand off" things to someone else and get out of Dodge. I understand that NumPy would not be what it is today without your contributions, those of David, Mark, Robert, Pauli and so many other people, but I'm not going anywhere at least for the foreseeable future. 
I've respected that "team effort" perspective from the beginning and remain respectful of it. I recognize that you must feel some sense of "ownership" of NumPy as well. I suspect there are several others that feel the same way. Right now, though, we need to work as hard as we can to reconcile our different perspectives so that we can do our very best to serve and respect the time of the users who have adopted NumPy. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From thouis at gmail.com Tue Jun 26 12:29:03 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Tue, 26 Jun 2012 18:29:03 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 5:33 PM, Charles R Harris wrote: > Calling this and that 'gratuitous' is already damaging to the community. > Them's fightin' words. If you didn't want a fight you could have simply > pointed out a path forward. I disagree. If a change is gratuitous, and someone call's it out for being so, it's not a reason to get offended. Even if someone call's a change stupid, one should take a large step back before taking offense, just because they were responsible for the change. Defend the change, give the reasons it's not gratuitous/stupid/ugly/whatever, but keep calm and carry on. This sort of back-and-forth sniping should be taken off list. From travis at continuum.io Tue Jun 26 12:46:08 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 11:46:08 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <3AA497E1-C4C1-45AA-951B-1B3DAF625AC4@continuum.io> On Jun 26, 2012, at 11:29 AM, Thouis (Ray) Jones wrote: > On Tue, Jun 26, 2012 at 5:33 PM, Charles R Harris > wrote: >> Calling this and that 'gratuitous' is already damaging to the community. >> Them's fightin' words. If you didn't want a fight you could have simply >> pointed out a path forward. > > I disagree. If a change is gratuitous, and someone call's it out for > being so, it's not a reason to get offended. Even if someone call's a > change stupid, one should take a large step back before taking > offense, just because they were responsible for the change. Defend > the change, give the reasons it's not gratuitous/stupid/ugly/whatever, > but keep calm and carry on. > > This sort of back-and-forth sniping should be taken off list. I agree. I will try to refrain from this. Please call me out if I slip up and react to something posted. -Travis > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jsalvati at u.washington.edu Tue Jun 26 12:46:46 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 26 Jun 2012 09:46:46 -0700 Subject: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted? 
Message-ID: Hello, If you increment an array using advanced indexing and have repeated indexes, the array doesn't get repeatedly incremented, http://comments.gmane.org/gmane.comp.python.numeric.general/50291. I wrote a C function that does incrementing with repeated indexes correctly. The branch is here (https://github.com/jsalvatier/numpy see the last two commits). Would a patch with a cleaned up version of a function like this be accepted into numpy? I'm not experienced writing numpy C code so I'm sure it still needs improvement. If you compile and install that branch, you can test the code using: from numpy import * from numpy.core.multiarray import index_increment a = arange(12).reshape((3,4)) index = ([1,1,2,0], [0,0,2,3]) vals = [50,50., 30.,16] b = index_increment(a, index, vals) print b """ should print out: [[ 0. 1. 2. 19.] [ 104. 5. 6. 7.] [ 8. 9. 40. 11.]] """ Cheers, John -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Jun 26 12:48:44 2012 From: cournape at gmail.com (David Cournapeau) Date: Tue, 26 Jun 2012 17:48:44 +0100 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 5:24 PM, Travis Oliphant wrote: > >> Let us note that that problem was due to Travis convincing David to >> include the Datetime work in the release against David's own best judgement. >> The result was a delay of several months until Ralf could get up to speed >> and get 1.4.1 out. Let us also note that poly1d is actually not the same as >> Matlab poly1d. >> >> >> This is not accurate, Charles. ?Please stop trying to dredge up old >> history you don't know the full story about and are trying to create an >> alternate reality about. ? It doesn't help anything and is quite poisonous >> to this mailing list. > > > I didn't start the discussion of 1.4, nor did I raise the issue at the time > as I didn't think it would be productive. We moved forward. But in any case, > I asked David at the time why the datetime stuff got included. I'd welcome > your version if you care to offer it. That would be more useful than > accusing me of creating an alternative reality and would clear the air. > > > The datetime stuff got included because it is a very useful and important > feature for multiple users. ? It still needed work, but it was in a state > where it could be tried. ? It did require breaking ABI compatibility in the > state it was in. ? My approach was to break ABI compatibility and move > forward (there were other things we could do at the time that are still > needed in the code base that will break ABI compatibility in the future). > ?David didn't want to break ABI compatibility and so tried to satisfy two > competing desires in a way that did not ultimately work. ? ? These things > happen. ? ?We all get to share responsibility for the outcome. I think Chuck alludes to the fact that I was rather reserved about merging datetime before *anyone* knew about breaking the ABI. I don't feel responsible for this issue (except I maybe should have pushed more strongly about datetime being included), but I am also not interested in making a big deal out of it, certainly not two years after the fact. 
I am merely point this out so that you realize that you may both have a different view that could be seen as valid depending on what you are willing to highlight. I suggest that Chuck and you take this off-list, David From ben.root at ou.edu Tue Jun 26 12:52:11 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 26 Jun 2012 12:52:11 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 12:48 PM, David Cournapeau wrote: > On Tue, Jun 26, 2012 at 5:24 PM, Travis Oliphant > wrote: > > > >> Let us note that that problem was due to Travis convincing David to > >> include the Datetime work in the release against David's own best > judgement. > >> The result was a delay of several months until Ralf could get up to > speed > >> and get 1.4.1 out. Let us also note that poly1d is actually not the > same as > >> Matlab poly1d. > >> > >> > >> This is not accurate, Charles. Please stop trying to dredge up old > >> history you don't know the full story about and are trying to create an > >> alternate reality about. It doesn't help anything and is quite > poisonous > >> to this mailing list. > > > > > > I didn't start the discussion of 1.4, nor did I raise the issue at the > time > > as I didn't think it would be productive. We moved forward. But in any > case, > > I asked David at the time why the datetime stuff got included. I'd > welcome > > your version if you care to offer it. That would be more useful than > > accusing me of creating an alternative reality and would clear the air. > > > > > > The datetime stuff got included because it is a very useful and important > > feature for multiple users. It still needed work, but it was in a state > > where it could be tried. It did require breaking ABI compatibility in > the > > state it was in. My approach was to break ABI compatibility and move > > forward (there were other things we could do at the time that are still > > needed in the code base that will break ABI compatibility in the future). > > David didn't want to break ABI compatibility and so tried to satisfy two > > competing desires in a way that did not ultimately work. These things > > happen. We all get to share responsibility for the outcome. > > I think Chuck alludes to the fact that I was rather reserved about > merging datetime before *anyone* knew about breaking the ABI. I don't > feel responsible for this issue (except I maybe should have pushed > more strongly about datetime being included), but I am also not > interested in making a big deal out of it, certainly not two years > after the fact. I am merely point this out so that you realize that > you may both have a different view that could be seen as valid > depending on what you are willing to highlight. > > I suggest that Chuck and you take this off-list, > > David > Or, we could raise funds for NumFOCUS by selling tickets for a brawl between the two at SciPy2012... I kid, I kid! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Tue Jun 26 12:52:40 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 11:52:40 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <1AFCCB33-47D8-4A28-AD3D-E7649E4A13B8@continuum.io> > > I think Chuck alludes to the fact that I was rather reserved about > merging datetime before *anyone* knew about breaking the ABI. I don't > feel responsible for this issue (except I maybe should have pushed > more strongly about datetime being included), but I am also not > interested in making a big deal out of it, certainly not two years > after the fact. I am merely point this out so that you realize that > you may both have a different view that could be seen as valid > depending on what you are willing to highlight. > > I suggest that Chuck and you take this off-list, Agreed! -Travis > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue Jun 26 12:57:35 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 11:57:35 -0500 Subject: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted? In-Reply-To: References: Message-ID: On Jun 26, 2012, at 11:46 AM, John Salvatier wrote: > Hello, > > If you increment an array using advanced indexing and have repeated indexes, the array doesn't get repeatedly incremented, http://comments.gmane.org/gmane.comp.python.numeric.general/50291. I wrote a C function that does incrementing with repeated indexes correctly. The branch is here (https://github.com/jsalvatier/numpy see the last two commits). Would a patch with a cleaned up version of a function like this be accepted into numpy? I'm not experienced writing numpy C code so I'm sure it still needs improvement. This is great. It is an often-requested feature. It's *very difficult* to do without changing fundamentally what NumPy is. But, yes this would be a great pull request. Thanks, -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Jun 26 12:59:07 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 11:59:07 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <44DEAACF-5799-428F-9B36-81BE577FE301@continuum.io> > > Or, we could raise funds for NumFOCUS by selling tickets for a brawl between the two at SciPy2012... > > I kid, I kid! Thanks for the humor. Unfortunately, I would be no match physically with someone used to the cold of Logan.... 
:-) -Travis From charlesr.harris at gmail.com Tue Jun 26 13:22:03 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 Jun 2012 11:22:03 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 10:48 AM, David Cournapeau wrote: > On Tue, Jun 26, 2012 at 5:24 PM, Travis Oliphant > wrote: > > > >> Let us note that that problem was due to Travis convincing David to > >> include the Datetime work in the release against David's own best > judgement. > >> The result was a delay of several months until Ralf could get up to > speed > >> and get 1.4.1 out. Let us also note that poly1d is actually not the > same as > >> Matlab poly1d. > >> > >> > >> This is not accurate, Charles. Please stop trying to dredge up old > >> history you don't know the full story about and are trying to create an > >> alternate reality about. It doesn't help anything and is quite > poisonous > >> to this mailing list. > > > > > > I didn't start the discussion of 1.4, nor did I raise the issue at the > time > > as I didn't think it would be productive. We moved forward. But in any > case, > > I asked David at the time why the datetime stuff got included. I'd > welcome > > your version if you care to offer it. That would be more useful than > > accusing me of creating an alternative reality and would clear the air. > > > > > > The datetime stuff got included because it is a very useful and important > > feature for multiple users. It still needed work, but it was in a state > > where it could be tried. It did require breaking ABI compatibility in > the > > state it was in. My approach was to break ABI compatibility and move > > forward (there were other things we could do at the time that are still > > needed in the code base that will break ABI compatibility in the future). > > David didn't want to break ABI compatibility and so tried to satisfy two > > competing desires in a way that did not ultimately work. These things > > happen. We all get to share responsibility for the outcome. > > I think Chuck alludes to the fact that I was rather reserved about > merging datetime before *anyone* knew about breaking the ABI. Exactly. > I don't > feel responsible for this issue (except I maybe should have pushed > more strongly about datetime being included), I think you left out a 'not'. I don't mean to imply that you were in anyway the blame. And you have been pretty adamant about not allowing late merges of large bits of code since then. It falls in the lessons learned category. but I am also not > interested in making a big deal out of it, certainly not two years > after the fact. I am merely point this out so that you realize that > you may both have a different view that could be seen as valid > depending on what you are willing to highlight. > > I suggest that Chuck and you take this off-list, > I don't think there is much more to say, although I would suggest Travis be more careful about criticising previous work, ala 'gratuitous', 'not listening', etc. We got 1.3, 1.4, 1.5, and 1.6 out without any help from him, and I think we did a pretty damn good job of working with the community and improving the code in the process. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jsalvati at u.washington.edu Tue Jun 26 13:27:31 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 26 Jun 2012 10:27:31 -0700 Subject: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted? In-Reply-To: References: Message-ID: Can you clarify why it would be super hard? I just reused the code for advanced indexing (a modification of PyArray_SetMap). Am I missing something crucial? On Tue, Jun 26, 2012 at 9:57 AM, Travis Oliphant wrote: > > On Jun 26, 2012, at 11:46 AM, John Salvatier wrote: > > Hello, > > If you increment an array using advanced indexing and have repeated > indexes, the array doesn't get repeatedly incremented, > http://comments.gmane.org/gmane.comp.python.numeric.general/50291. I > wrote a C function that does incrementing with repeated indexes correctly. > The branch is here (https://github.com/jsalvatier/numpy see the last two > commits). Would a patch with a cleaned up version of a function like this > be accepted into numpy? I'm not experienced writing numpy C code so I'm > sure it still needs improvement. > > > This is great. It is an often-requested feature. It's *very difficult* > to do without changing fundamentally what NumPy is. But, yes this would be > a great pull request. > > Thanks, > > -Travis > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Tue Jun 26 14:34:58 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 26 Jun 2012 14:34:58 -0400 Subject: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted? In-Reply-To: References: Message-ID: Hi, I think he was referring that making NUMPY_ARRAY_OBJECT[...] syntax support the operation that you said is hard. But having a separate function do it is less complicated as you said. Fred On Tue, Jun 26, 2012 at 1:27 PM, John Salvatier wrote: > Can you clarify why it would be super hard? I just reused the code for > advanced indexing (a modification of PyArray_SetMap). Am I missing something > crucial? > > > > On Tue, Jun 26, 2012 at 9:57 AM, Travis Oliphant > wrote: >> >> >> On Jun 26, 2012, at 11:46 AM, John Salvatier wrote: >> >> Hello, >> >> If you increment an array using advanced indexing and have repeated >> indexes, the array doesn't get repeatedly >> incremented,?http://comments.gmane.org/gmane.comp.python.numeric.general/50291. >> I wrote a C function that does incrementing with repeated indexes correctly. >> The branch is here (https://github.com/jsalvatier/numpy?see the last two >> commits). Would a patch with a cleaned up version of a function like this be >> accepted into numpy? I'm not experienced writing numpy C code so I'm sure it >> still needs improvement. >> >> >> This is great. ? It is an often-requested feature. ? It's *very difficult* >> to do without changing fundamentally what NumPy is. ?But, yes this would be >> a great pull request. 
>> >> Thanks, >> >> -Travis >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsalvati at u.washington.edu Tue Jun 26 14:48:13 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 26 Jun 2012 11:48:13 -0700 Subject: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted? In-Reply-To: References: Message-ID: Right, that makes sense. Thanks. On Tue, Jun 26, 2012 at 11:34 AM, Fr?d?ric Bastien wrote: > Hi, > > I think he was referring that making NUMPY_ARRAY_OBJECT[...] syntax > support the operation that you said is hard. But having a separate > function do it is less complicated as you said. > > Fred > > On Tue, Jun 26, 2012 at 1:27 PM, John Salvatier > wrote: > > Can you clarify why it would be super hard? I just reused the code for > > advanced indexing (a modification of PyArray_SetMap). Am I missing > something > > crucial? > > > > > > > > On Tue, Jun 26, 2012 at 9:57 AM, Travis Oliphant > > wrote: > >> > >> > >> On Jun 26, 2012, at 11:46 AM, John Salvatier wrote: > >> > >> Hello, > >> > >> If you increment an array using advanced indexing and have repeated > >> indexes, the array doesn't get repeatedly > >> incremented, > http://comments.gmane.org/gmane.comp.python.numeric.general/50291. > >> I wrote a C function that does incrementing with repeated indexes > correctly. > >> The branch is here (https://github.com/jsalvatier/numpy see the last > two > >> commits). Would a patch with a cleaned up version of a function like > this be > >> accepted into numpy? I'm not experienced writing numpy C code so I'm > sure it > >> still needs improvement. > >> > >> > >> This is great. It is an often-requested feature. It's *very > difficult* > >> to do without changing fundamentally what NumPy is. But, yes this > would be > >> a great pull request. > >> > >> Thanks, > >> > >> -Travis > >> > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Tue Jun 26 14:51:08 2012 From: srean.list at gmail.com (srean) Date: Tue, 26 Jun 2012 13:51:08 -0500 Subject: [Numpy-discussion] Semantics of index arrays and a request to fix the user guide In-Reply-To: References: Message-ID: Hi All, my question might have got lost due to the intense activity around the 1.7 release. Now that it has quietened down, would appreciate any help regarding my confusion about how index arrays work (especially when broadcasted). 
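[For readers following this thread: the behaviour being asked about can be pinned down with a short,
self-contained sketch. The (5, 7) array mirrors the user-guide example quoted below; the array names
are invented for the illustration, and the commented shapes are what NumPy returns for these cases.]

    import numpy as np

    a = np.arange(35).reshape(5, 7)   # same shape as the user-guide example

    # Boolean index of the same shape as `a`: the result is always 1-D, with
    # one element per True entry, whatever the shape of `a` is.
    mask = a > 20
    print(a[mask].shape)              # (14,)

    # A 1-D boolean of length 5 is not broadcast against (5, 7); it indexes
    # the first axis only, selecting whole rows.
    rows = np.array([False, True, True, False, True])
    print(a[rows].shape)              # (3, 7)

    # Integer index arrays broadcast against *each other*, not against `a`:
    # shapes (2, 1) and (3,) broadcast to (2, 3), which is the result shape.
    i = np.array([[0], [2]])
    j = np.array([1, 3, 5])
    print(a[i, j].shape)              # (2, 3)

    # An index array combined with a slice: the (3,) index picks rows, the
    # slice keeps columns 1 and 2, giving shape (3, 2); this is the case the
    # guide describes as broadcasting against np.array([[1, 2]]).
    print(a[np.array([0, 2, 4]), 1:3].shape)   # (3, 2)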
-- srean On Mon, Jun 25, 2012 at 5:29 PM, srean wrote: > From the user guide: > ----------------------------- > >> Boolean arrays must be of the same shape as the array being indexed, >> or broadcastable to the same shape. In the most straightforward case, >> ?the boolean array has the same shape. > > Comment: So far so good, but the doc has not told me yet what is the > shape or the output. > -------------- > > user guide continues with an example: > ------------------------------------------------------ > >> The result is a 1-D array containing all the elements in the indexed array corresponding to all the true elements in the boolean array. > > > Comment: > -------------- > > Now it is not clear from that line whether the shape of the result is > generally true or is it specific to the example. So the reader(me) is > still confused. > >There is no explanation about > the mechanism used to arrive at the output shape, is it the shape of > what the index array was broadcasted to ? or is it something else, if > it is the latter, what is it. > > Example > ------------ > > The example indexes a (5,7) array with a (5,) index array. Now this > is confusing because it seems to contradict the original > documentation because > (5,) is neither the same shape as (5,7) nor is it broadcastable to it. > > The steps of the conventional broaddcasting would yield > > (5,7) > (5,) > > then > > (5,7) > (1,5) > > then an error because 7 and 5 dont match. > > > > User guide continues: > ------------------------------ > >> Combining index arrays with slices. > >> In effect, the slice is converted to an index array >> np.array([[1,2]]) (shape (1,2)) that is broadcast with >> ?the index array to produce a resultant array of shape (3,2). > > comment: > ------------- > > Here the two arrays have shape > (3,) and (1,2) so how does broadcasting yield the shape 3,2. > Broadcasting is supposed to proceed trailing dimension first but it > seems in these examples it is doing the opposite. > > ===== > > So could someone explain the semantics and make the user guide more precise. > > Assuming the user guide will be the first document the new user will > read it is surprisingly difficult to read, primarily because it gets > into advanced topics to soon and partially because of ambiguous > language. The numpy reference on the other hand is very clear as is > Travis's book which I am glad to say I actually bought a long time > ago. > > Thanks, > ?srean From ralf.gommers at googlemail.com Tue Jun 26 15:10:23 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 26 Jun 2012 21:10:23 +0200 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 5:43 PM, Travis Oliphant wrote: > > Hey all, > > After some more investigation, I'm not optimistic that we will be able to > get a 1.7 release out before SciPy. I would like to get a beta release > out by SciPy (or even an rc1 release). But, given the number of code > changes and differences between 1.5.x and 1.7, I think we will need an > extended beta release stage for 1.7 that will allow as many users as > possible to try out the new code base and report back any regressions or > backward incompatibilities that need to be fixed before the final release. 
> +1 > The fundamental rule I think we have is that "code depending on NumPy that > worked with 1.5.x should continue to work with 1.7 without alterations > required by the user" > The rule should be 1.6.x imho. Undoing things that were changed in between 1.5.x and 1.6.x makes very little sense; numpy 1.6.0 has been out for over a year. > This does not mean we can't add new APIs or deprecate old APIs --- but I > think that we do have to be much more careful about when deprecated APIs > become unavailable. There is a lot of code that assumes the current > API. Both code that is in released packages and code that is in > "unreleased packages" which we are not even aware of. > I think you are mainly talking here about changes that had unintended side-effects, and broke things without anyone realizing that in time. If you read the 1.5.0, 1.6.0 and 1.7.0 release notes, there have been very few actual deprecations. Besides that, we have a long standing policy of removing those things that do get deprecated after one minor release: http://projects.scipy.org/numpy/wiki/ApiDeprecation. If you propose to change that, I suggest discussing it in a separate thread. > I don't want to finalize the 1.7 release until we get enough feedback from > end-users about the impact of all the changes. This will likely take a > longer beta-release period than usual: certainly not until after SciPy > where we will make a concerted effort to get people to try the new 1.7 beta > and report back on the impact on their code-base. > > Ondrej is helping out on this effort which I really appreciate. Other > people who have time to help with the release effort --- especially in > fixing regressions will be greatly appreciated. > Did you happen to see https://github.com/numpy/numpy/blob/master/doc/HOWTO_RELEASE.rst.txt? Among other things, it lists a few things that are still to be done (merge doc wiki edits, flip the "raise_warnings" switch) and details on the Wine / MinGW setup that may be useful. I did just spot a mistake there by the way, we're still on MinGW 3.4.5. Cheers, Ralf We are also using this time to 1) setup Continuous Integration services for > NumPy using both Jenkins and Travis-CI and 2) migrate the issue tracker to > github. Ondrej is heading up #1 and Ray Jones is heading up #2. Please > coordinate with them if you'd like to help out on any of those areas. > > Thanks, > > -Travis > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Jun 26 15:10:50 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 14:10:50 -0500 Subject: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted? In-Reply-To: References: Message-ID: On Jun 26, 2012, at 1:34 PM, Fr?d?ric Bastien wrote: > Hi, > > I think he was referring that making NUMPY_ARRAY_OBJECT[...] syntax > support the operation that you said is hard. But having a separate > function do it is less complicated as you said. Yes. That's precisely what I meant. Thank you for clarifying. -Travis > > Fred > > On Tue, Jun 26, 2012 at 1:27 PM, John Salvatier > wrote: >> Can you clarify why it would be super hard? I just reused the code for >> advanced indexing (a modification of PyArray_SetMap). Am I missing something >> crucial? 
>> >> >> >> On Tue, Jun 26, 2012 at 9:57 AM, Travis Oliphant >> wrote: >>> >>> >>> On Jun 26, 2012, at 11:46 AM, John Salvatier wrote: >>> >>> Hello, >>> >>> If you increment an array using advanced indexing and have repeated >>> indexes, the array doesn't get repeatedly >>> incremented, http://comments.gmane.org/gmane.comp.python.numeric.general/50291. >>> I wrote a C function that does incrementing with repeated indexes correctly. >>> The branch is here (https://github.com/jsalvatier/numpy see the last two >>> commits). Would a patch with a cleaned up version of a function like this be >>> accepted into numpy? I'm not experienced writing numpy C code so I'm sure it >>> still needs improvement. >>> >>> >>> This is great. It is an often-requested feature. It's *very difficult* >>> to do without changing fundamentally what NumPy is. But, yes this would be >>> a great pull request. >>> >>> Thanks, >>> >>> -Travis >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Tue Jun 26 15:20:48 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 14:20:48 -0500 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> Message-ID: <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> On Jun 26, 2012, at 2:10 PM, Ralf Gommers wrote: > > > On Tue, Jun 26, 2012 at 5:43 PM, Travis Oliphant wrote: > > Hey all, > > After some more investigation, I'm not optimistic that we will be able to get a 1.7 release out before SciPy. I would like to get a beta release out by SciPy (or even an rc1 release). But, given the number of code changes and differences between 1.5.x and 1.7, I think we will need an extended beta release stage for 1.7 that will allow as many users as possible to try out the new code base and report back any regressions or backward incompatibilities that need to be fixed before the final release. > > +1 > > > The fundamental rule I think we have is that "code depending on NumPy that worked with 1.5.x should continue to work with 1.7 without alterations required by the user" > > The rule should be 1.6.x imho. Undoing things that were changed in between 1.5.x and 1.6.x makes very little sense; numpy 1.6.0 has been out for over a year. Unfortunately, I think there are issues we are just now seeing with code that was released in 1.6.x, and there are many people who have not moved forward to 1.6.x yet. The rule should in fact be that code working with NumPy 1.0 should work with 1.7 (except for "bug-fixes"). I realize that with some of the semantics it's going to be hard to be pedantic about the "rule". But, I'm going to be very responsive to users of 1.5.x and even possibly 1.3.x who have code issues in trying to move forward. > > > This does not mean we can't add new APIs or deprecate old APIs --- but I think that we do have to be much more careful about when deprecated APIs become unavailable. There is a lot of code that assumes the current API. 
Both code that is in released packages and code that is in "unreleased packages" which we are not even aware of. > > I think you are mainly talking here about changes that had unintended side-effects, and broke things without anyone realizing that in time. If you read the 1.5.0, 1.6.0 and 1.7.0 release notes, there have been very few actual deprecations. > > Besides that, we have a long standing policy of removing those things that do get deprecated after one minor release: http://projects.scipy.org/numpy/wiki/ApiDeprecation. If you propose to change that, I suggest discussing it in a separate thread. We need to change that, I think. I feel pretty strongly that we can't just remove APIs after one minor release after observing more of NumPy's use in the wild. APIs should exist for at least 5 years and preferably only change on major releases. > > > I don't want to finalize the 1.7 release until we get enough feedback from end-users about the impact of all the changes. This will likely take a longer beta-release period than usual: certainly not until after SciPy where we will make a concerted effort to get people to try the new 1.7 beta and report back on the impact on their code-base. > > Ondrej is helping out on this effort which I really appreciate. Other people who have time to help with the release effort --- especially in fixing regressions will be greatly appreciated. > > Did you happen to see https://github.com/numpy/numpy/blob/master/doc/HOWTO_RELEASE.rst.txt? Among other things, it lists a few things that are still to be done (merge doc wiki edits, flip the "raise_warnings" switch) and details on the Wine / MinGW setup that may be useful. I did just spot a mistake there by the way, we're still on MinGW 3.4.5. It's nice to have a document like this. Of course, I've seen it. I don't think we will be using Wine and MinGW to do the Windows builds, though. Thanks, -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Jun 26 15:34:04 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 14:34:04 -0500 Subject: [Numpy-discussion] API policy Message-ID: <905AFE89-FA0B-48E0-A5C3-13A39BDC55E1@continuum.io> I think we need to update this document: http://projects.scipy.org/numpy/wiki/ApiDeprecation I don't think this characterizes the opinion of all involved in NumPy development (it is certainly not the way I view our commitment to users). Incidentally, in the migration from Trac we should move all pages like this from Trac to Github pages or some other location. The idea that APIs should disappear after one minor release really needs to be re-visited -- especially if there is a strong interest in changing the APIs as there has been in the move from 1.5.x to 1.6 and then from 1.6 to 1.7. This created a situation where a large number of people who did not take the 1.6.x upgrade could potentially have APIs that disappear. Most open source projects do not change APIs that rapidly. One example: OpenGL supported every entry point for 17 years prior to OpenGL 3.0 At the very least APIs, can raise warnings, they can be moved to a legacy headers location, etc, but we are not doing anyone a service by having their code stop working. It frustrates users and makes people hesitate about adopting our library as a dependency. Thanks, -Travis -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Tue Jun 26 15:48:44 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 26 Jun 2012 21:48:44 +0200 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 9:20 PM, Travis Oliphant wrote: > > On Jun 26, 2012, at 2:10 PM, Ralf Gommers wrote: > > > > On Tue, Jun 26, 2012 at 5:43 PM, Travis Oliphant wrote: > >> >> Hey all, >> >> After some more investigation, I'm not optimistic that we will be able to >> get a 1.7 release out before SciPy. I would like to get a beta release >> out by SciPy (or even an rc1 release). But, given the number of code >> changes and differences between 1.5.x and 1.7, I think we will need an >> extended beta release stage for 1.7 that will allow as many users as >> possible to try out the new code base and report back any regressions or >> backward incompatibilities that need to be fixed before the final release. >> > > +1 > > >> The fundamental rule I think we have is that "code depending on NumPy >> that worked with 1.5.x should continue to work with 1.7 without alterations >> required by the user" >> > > The rule should be 1.6.x imho. Undoing things that were changed in between > 1.5.x and 1.6.x makes very little sense; numpy 1.6.0 has been out for over > a year. > > > Unfortunately, I think there are issues we are just now seeing with code > that was released in 1.6.x, and there are many people who have not moved > forward to 1.6.x yet. > Some examples would be nice. A lot of people did move already. And I haven't seen reports of those that tried and got stuck. Also, Debian and Python(x, y) have 1.6.2, EPD has 1.6.1. I think the number of cases we're talking about here is in fact limited. But discussion of those cases is necessary if a change would break 1.6.x. The rule should in fact be that code working with NumPy 1.0 should work > with 1.7 (except for "bug-fixes"). > That's a good rule. Hard to ensure for corner cases which didn't have test coverage though. I realize that with some of the semantics it's going to be hard to be > pedantic about the "rule". But, I'm going to be very responsive to users > of 1.5.x and even possibly 1.3.x who have code issues in trying to move > forward. > > > >> This does not mean we can't add new APIs or deprecate old APIs --- but I >> think that we do have to be much more careful about when deprecated APIs >> become unavailable. There is a lot of code that assumes the current >> API. Both code that is in released packages and code that is in >> "unreleased packages" which we are not even aware of. >> > > I think you are mainly talking here about changes that had unintended > side-effects, and broke things without anyone realizing that in time. If > you read the 1.5.0, 1.6.0 and 1.7.0 release notes, there have been very few > actual deprecations. > > Besides that, we have a long standing policy of removing those things that > do get deprecated after one minor release: > http://projects.scipy.org/numpy/wiki/ApiDeprecation. If you propose to > change that, I suggest discussing it in a separate thread. > > > We need to change that, I think. I feel pretty strongly that we can't > just remove APIs after one minor release after observing more of NumPy's > use in the wild. APIs should exist for at least 5 years and preferably > only change on major releases. 
> > > >> I don't want to finalize the 1.7 release until we get enough feedback >> from end-users about the impact of all the changes. This will likely take >> a longer beta-release period than usual: certainly not until after SciPy >> where we will make a concerted effort to get people to try the new 1.7 beta >> and report back on the impact on their code-base. >> >> Ondrej is helping out on this effort which I really appreciate. Other >> people who have time to help with the release effort --- especially in >> fixing regressions will be greatly appreciated. >> > > Did you happen to see > https://github.com/numpy/numpy/blob/master/doc/HOWTO_RELEASE.rst.txt? > Among other things, it lists a few things that are still to be done (merge > doc wiki edits, flip the "raise_warnings" switch) and details on the Wine / > MinGW setup that may be useful. I did just spot a mistake there by the way, > we're still on MinGW 3.4.5. > > > It's nice to have a document like this. Of course, I've seen it. I > don't think we will be using Wine and MinGW to do the Windows builds, > though. > Any more details? If you are thinking about using MSVC for numpy, will it work with existing scipy and other binaries? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Jun 26 15:51:39 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 14:51:39 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> Message-ID: <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> > > Exactly. > > I don't > feel responsible for this issue (except I maybe should have pushed > more strongly about datetime being included), > > I think you left out a 'not'. I don't mean to imply that you were in anyway the blame. And you have been pretty adamant about not allowing late merges of large bits of code since then. It falls in the lessons learned category. > > but I am also not > interested in making a big deal out of it, certainly not two years > after the fact. I am merely point this out so that you realize that > you may both have a different view that could be seen as valid > depending on what you are willing to highlight. > > I suggest that Chuck and you take this off-list, > > I don't think there is much more to say, although I would suggest Travis be more careful about criticising previous work, ala 'gratuitous', 'not listening', etc. We got 1.3, 1.4, 1.5, and 1.6 out without any help from him, and I think we did a pretty damn good job of working with the community and improving the code in the process. Wow! Again, your attitude surprises me and I can't just let a public comment like that go unaddressed. Not *any* help from me. Is that really the way you view it. Amazing! No wonder people new to the project lose sight of where it came from if that's the kind of dialogue and spin you spread. So, you are going to disregard anything I've done during that time. 
The personal time spent on bug fixes and code enhancements, the active discussions with people, the work on datetime, the contribution of resources, the growing of the community, the teaching, the talking, the actively trying to figure out just how to improve not only the state of the code but also how it gets written, the documentation improvements (from my early donation of my book). Just because you are not aware personally of something or I don't comment on this list, it doesn't mean I'm not active. I was not as active as I wanted to be sometimes (I do have other responsibilities), but this kind of statement is pretty hurtful as well as being completely inaccurate. "The community" is not just people that post to this list and a few users of SciPy that you know about. "The community" is much larger than that, and I've been working with them too --- all along, even when I wasn't actively making releases. I would suggest that you be more careful about accusing who is and who isn't "helping" with things. -Travis -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Tue Jun 26 16:01:57 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Tue, 26 Jun 2012 15:01:57 -0500 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: <4FEA1535.7020407@creativetrax.com> On 6/26/12 2:48 PM, Ralf Gommers wrote: > Unfortunately, I think there are issues we are just now seeing with > code that was released in 1.6.x, and there are many people who have > not moved forward to 1.6.x yet. > > > Some examples would be nice. I'll bite. Here's an issue that prevents Sage from upgrading to 1.6.2 from 1.5.1: https://github.com/numpy/numpy/issues/291 People are actively working on it (Thanks! Travis commented 13 hours ago about the root of the problem, I think). Thanks, Jason From d.s.seljebotn at astro.uio.no Tue Jun 26 16:06:30 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 26 Jun 2012 22:06:30 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> References: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> Message-ID: <4FEA1646.1020008@astro.uio.no> On 06/26/2012 09:51 PM, Travis Oliphant wrote: >> >> Exactly. >> >> I don't >> feel responsible for this issue (except I maybe should have pushed >> more strongly about datetime being included), >> >> >> I think you left out a 'not'. I don't mean to imply that you were in >> anyway the blame. And you have been pretty adamant about not allowing >> late merges of large bits of code since then. It falls in the lessons >> learned category. >> >> but I am also not >> interested in making a big deal out of it, certainly not two years >> after the fact. I am merely point this out so that you realize that >> you may both have a different view that could be seen as valid >> depending on what you are willing to highlight. >> >> I suggest that Chuck and you take this off-list, >> >> >> I don't think there is much more to say, although I would suggest >> Travis be more careful about criticising previous work, ala >> 'gratuitous', 'not listening', etc. 
We got 1.3, 1.4, 1.5, and 1.6 out >> without any help from him, and I think we did a pretty damn good job >> of working with the community and improving the code in the process. > > Wow! Again, your attitude surprises me and I can't just let a public > comment like that go unaddressed. Not *any* help from me. Is that really I hereby call you out!, per your comment earlier :-) Something the Sage project does very well is meeting often in person (granted, that's a lot easier to pull off for academics than people who have real work to be done). In my experience getting to know somebody better in person does wonders to email clarity -- one needs to know how somebody else's mind works to be able to read their emails well, and that's better picked up in person. Cython's had one workshop, and it did improve discussion climate. (And I don't think a SciPy conference, brawl or not, is a good replacement for an honest NumPy developer workshop to a nice cottage *by invitation only*, there's too much stuff going on, and too little undivided attention to one another.) Dag From charlesr.harris at gmail.com Tue Jun 26 16:07:29 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 Jun 2012 14:07:29 -0600 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 1:51 PM, Travis Oliphant wrote: > > Exactly. > > >> I don't >> feel responsible for this issue (except I maybe should have pushed >> more strongly about datetime being included), > > > I think you left out a 'not'. I don't mean to imply that you were in > anyway the blame. And you have been pretty adamant about not allowing late > merges of large bits of code since then. It falls in the lessons learned > category. > > but I am also not >> interested in making a big deal out of it, certainly not two years >> after the fact. I am merely point this out so that you realize that >> you may both have a different view that could be seen as valid >> depending on what you are willing to highlight. >> >> I suggest that Chuck and you take this off-list, >> > > I don't think there is much more to say, although I would suggest Travis > be more careful about criticising previous work, ala 'gratuitous', 'not > listening', etc. We got 1.3, 1.4, 1.5, and 1.6 out without any help from > him, and I think we did a pretty damn good job of working with the > community and improving the code in the process. > > > Wow! Again, your attitude surprises me and I can't just let a public > comment like that go unaddressed. Not *any* help from me. Is that really > the way you view it. Amazing! No wonder people new to the project lose > sight of where it came from if that's the kind of dialogue and spin you > spread. > > So, you are going to disregard anything I've done during that time. The > personal time spent on bug fixes and code enhancements, the active > discussions with people, the work on datetime, the contribution of > resources, the growing of the community, the teaching, the talking, the > actively trying to figure out just how to improve not only the state of the > code but also how it gets written, the documentation improvements (from my > early donation of my book). 
Just because you are not aware personally of > something or I don't comment on this list, it doesn't mean I'm not active. > I was not as active as I wanted to be sometimes (I do have other > responsibilities), but this kind of statement is pretty hurtful as well as > being completely inaccurate. > > "The community" is not just people that post to this list and a few users > of SciPy that you know about. "The community" is much larger than that, > and I've been working with them too --- all along, even when I wasn't > actively making releases. I would suggest that you be more careful > about accusing who is and who isn't "helping" with things. > I haven't been spinning. OTOH charris at f16 [numpy.git (master)]$ git log v1.2.1..v1.3.0 | grep -i oliphant | wc -l 23 charris at f16 [numpy.git (master)]$ git log v1.2.1..v1.3.0 | grep -i harris | wc -l 151 charris at f16 [numpy.git (master)]$ git log v1.2.1..v1.3.0 | grep -i cournapeau | wc -l 554 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Tue Jun 26 16:10:07 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 15:10:07 -0500 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: > > Unfortunately, I think there are issues we are just now seeing with code that was released in 1.6.x, and there are many people who have not moved forward to 1.6.x yet. > > Some examples would be nice. A lot of people did move already. And I haven't seen reports of those that tried and got stuck. Also, Debian and Python(x, y) have 1.6.2, EPD has 1.6.1. One issues is the one that Sage identified about the array interface regression as noted by Jason. Any other regressions from 1.5.x need to be addressed as well. We'll have to decide on a case-by-case basis if there are issues that conflict with 1.6.x behavior. > > It's nice to have a document like this. Of course, I've seen it. I don't think we will be using Wine and MinGW to do the Windows builds, though. > > Any more details? If you are thinking about using MSVC for numpy, will it work with existing scipy and other binaries? It will need to. We need to make sure that whatever we do works for your SciPy binaries. -Travis > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Tue Jun 26 16:11:56 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Tue, 26 Jun 2012 15:11:56 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <4FEA1646.1020008@astro.uio.no> References: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> <4FEA1646.1020008@astro.uio.no> Message-ID: <4FEA178C.2020900@creativetrax.com> On 6/26/12 3:06 PM, Dag Sverre Seljebotn wrote: > Something the Sage project does very well is meeting often in person Another thing we have that has improved the mailing list climate is a "sage-flame" list [1] that serves as a venting release valve for anyone to post *anything* at all. There have been multiple occasions where we called on people to move their discussion to sage-flame, and overall it's worked very nicely. 
Having a public forum to argue things out seems to help, and my guess is that most of us may peek at it every now and then for kicks and giggles. Thanks, Jason [1] https://groups.google.com/forum/?fromgroups#!forum/sage-flame From thouis at gmail.com Tue Jun 26 16:27:51 2012 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Tue, 26 Jun 2012 22:27:51 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <4FEA178C.2020900@creativetrax.com> References: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> <4FEA1646.1020008@astro.uio.no> <4FEA178C.2020900@creativetrax.com> Message-ID: On Tue, Jun 26, 2012 at 10:11 PM, Jason Grout wrote: > On 6/26/12 3:06 PM, Dag Sverre Seljebotn wrote: >> Something the Sage project does very well is meeting often in person > > Another thing we have that has improved the mailing list climate is a > "sage-flame" list [1] +1 ! Speaking as someone trying to get started in contributing to numpy, I find this discussion extremely off-putting. It's childish, meaningless, and spiteful, and I think it's doing more harm than any possible good that could come out of continuing it. From ralf.gommers at googlemail.com Tue Jun 26 16:31:41 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 26 Jun 2012 22:31:41 +0200 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 10:10 PM, Travis Oliphant wrote: > >> Unfortunately, I think there are issues we are just now seeing with code >> that was released in 1.6.x, and there are many people who have not moved >> forward to 1.6.x yet. >> > > Some examples would be nice. A lot of people did move already. And I > haven't seen reports of those that tried and got stuck. Also, Debian and > Python(x, y) have 1.6.2, EPD has 1.6.1. > > > One issues is the one that Sage identified about the array interface > regression as noted by Jason. > That's a good example, and indeed should be fixed. This is clearly a case where no one will be relying on the new behavior; no one wants that object array that's returned in 1.6.x. Any other regressions from 1.5.x need to be addressed as well. We'll > have to decide on a case-by-case basis if there are issues that conflict > with 1.6.x behavior. > Sounds good. > >> It's nice to have a document like this. Of course, I've seen it. I >> don't think we will be using Wine and MinGW to do the Windows builds, >> though. >> > > Any more details? If you are thinking about using MSVC for numpy, will it > work with existing scipy and other binaries? > > > It will need to. We need to make sure that whatever we do works for your > SciPy binaries. > Great. Please document the new setup, if it's better than the old one it will probably make sense to adopt it for SciPy too. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Tue Jun 26 16:31:50 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 15:31:50 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> <4FEA1646.1020008@astro.uio.no> <4FEA178C.2020900@creativetrax.com> Message-ID: <5791B4AC-CC39-4853-98AD-5EAAE3E0F651@continuum.io> On Jun 26, 2012, at 3:27 PM, Thouis (Ray) Jones wrote: > On Tue, Jun 26, 2012 at 10:11 PM, Jason Grout > wrote: >> On 6/26/12 3:06 PM, Dag Sverre Seljebotn wrote: >>> Something the Sage project does very well is meeting often in person >> >> Another thing we have that has improved the mailing list climate is a >> "sage-flame" list [1] > > +1 ! > > Speaking as someone trying to get started in contributing to numpy, I > find this discussion extremely off-putting. It's childish, > meaningless, and spiteful, and I think it's doing more harm than any > possible good that could come out of continuing it. Thank you for the reminder. I was already called out for not stopping. Thanks, Dag. A flame-list might indeed be a good idea at this point if there is further need for "clearing the air" -Travis > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Tue Jun 26 16:35:05 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 26 Jun 2012 22:35:05 +0200 Subject: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch) In-Reply-To: References: <4FE9C241.4000302@astro.uio.no> Message-ID: <4FEA1CF9.8080306@astro.uio.no> On 06/26/2012 05:02 PM, Travis Oliphant wrote: >>> >>> (I have not read the whole cython discussion yet) >> >> So here's the summary. It's rather complicated but also incredibly neat >> :-) And technical details can be hidden behind a tight API. > > Could you provide a bit more context for this list. I think this is an important technology concept. I'd like to understand better how well it jives with Numba-produced APIs and how we can make use of it in NumPy. > > Where exactly would this be used in the NumPy API? What would it replace? Right. I thought I did that :-) I realize I might sometimes be too brief, part of the "problem" is I'm used to Cython development where I can start a sentence and then Mark Florisson or Robert Bradshaw can finish it. I'll try to step through how PyArray_DIMS could work under a refactored API from a C client. To do this I gloss over some of the finer points etc. and just make a premature decision here and there. Almost none of the types or functions below already exists, I'll assume we implement them (I do have a good start on the reference implementation). We'll add a new C-level slot called "numpy:SHAPE" to the ndarray type, and hook the PyArray_DIMS to use this slot. Inside NumPy ------------ The PyArray_Type (?) definition changes from being a PyTypeObject to a PyExtensibleTypeObject, and PyExtensibleType_Ready is called instead of PyType_Ready. This builds the perfect lookup table etc. I'll omit the details. 
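[The omitted detail here amounts to choosing a hash parameter so that every registered slot gets its
own bucket, which is what makes the caller-side lookup a single probe. Below is a small Python model
of that step. It is an illustration only, not NumPy or Cython code: the names build_table, find and
bucket_of, the table size, and the starting multiplier are all invented; only the "numpy:SHAPE" key
and its prehash constant are taken from the sketch above.]

    import numpy as np

    TABLE_BITS = 3                      # 8 buckets, plenty for a handful of slots
    TABLE_SIZE = 1 << TABLE_BITS
    MASK64 = (1 << 64) - 1

    def bucket_of(prehash, multiplier):
        return ((prehash * multiplier) & MASK64) >> (64 - TABLE_BITS)

    def build_table(slots):
        """slots: dict mapping interned key string -> (prehash, funcptr)."""
        m = 0x9E3779B97F4A7C15          # arbitrary odd starting parameter
        for _ in range(100000):
            table = [None] * TABLE_SIZE
            for key, (prehash, funcptr) in slots.items():
                b = bucket_of(prehash, m)
                if table[b] is not None:
                    break               # collision: try the next parameter
                table[b] = (key, funcptr)
            else:
                return m, table         # collision-free: this is "the perfect table"
            m += 2                      # stay odd
        raise ValueError("need a bigger table for this slot set")

    def find(table, multiplier, prehash, interned_key):
        # one multiply, one shift, one identity comparison; no probing loop
        entry = table[bucket_of(prehash, multiplier)]
        if entry is not None and entry[0] is interned_key:
            return entry[1]
        return None

    # Usage with the one slot from the walkthrough; the prehash constant is the
    # NPY_SHAPE_SLOT_PREHASH value, the slot function is a stand-in.
    KEY_SHAPE = "numpy:SHAPE"
    slots = {KEY_SHAPE: (0xa8cf70dc5f598f40, lambda arr: arr.shape)}
    m, table = build_table(slots)
    dims = find(table, m, 0xa8cf70dc5f598f40, KEY_SHAPE)
    print(dims(np.zeros((3, 4))))       # (3, 4), via the looked-up slot function

[The point of the parameter search is that lookups never loop or miss a branch: the caller computes
one bucket from the precomputed prehash and compares interned keys by identity.]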
The caller
----------

First we need some macro module initialization setup (part of NumPy
include files):

/* lower-64-bits of md5 of "numpy:SHAPE" */
#define NPY_SHAPE_SLOT_PREHASH 0xa8cf70dc5f598f40ULL
/* hold an interned "numpy:SHAPE" string */
static char *_Npy_interned_numpy_SHAPE;

Then initialize interned key in import_array():

... import_array(...)
{
    ...
    PyCustomSlotsInternerContext interner = PyCustomSlots_GetInterner();
    _Npy_interned_numpy_SHAPE = PyCustomSlots_InternLiteral("numpy:SHAPE");
    ...
}

Then, let's get rid of that PyArrayObject (in the *API*; of course
there's still some struct representing the NumPy array internally but
its layout is no longer exposed anywhere). That means always using
PyObject, just like the Python API does, e.g., PyDict_GetItem gets a
PyObject even if it must be a dict. But for backwards compatibility,
let's throw in:

typedef PyObject PyArrayObject;

Now, change PyArray_Check a bit (likely/unlikely indicates branch hints,
e.g. __builtin_expect in gcc). Some context:

typedef struct {
    char *interned_key;
    uintptr_t flags;
    void *funcptr;
} PyCustomSlot;

Then:

static inline int PyArray_Check(PyObject *arr) {
    /* "it is an array if it has the "numpy:SHAPE" slot"
       This is a bad choice of test but for simplicity... */
    if (likely(PyCustomSlots_Check(arr->ob_type))) {
        PyCustomSlot *slot;
        slot = PyCustomSlots_Find(arr->ob_type, NPY_SHAPE_SLOT_PREHASH,
                                  _Npy_interned_numpy_SHAPE);
        if (likely(slot != NULL)) return 1;
    }
    return 0;
}

Finally, we can write our new PyArray_DIMS:
Will keep you posted, Dag From jdh2358 at gmail.com Tue Jun 26 16:39:47 2012 From: jdh2358 at gmail.com (John Hunter) Date: Tue, 26 Jun 2012 15:39:47 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> <4FEA1646.1020008@astro.uio.no> <4FEA178C.2020900@creativetrax.com> Message-ID: On Tue, Jun 26, 2012 at 3:27 PM, Thouis (Ray) Jones wrote: > +1 ! > > Speaking as someone trying to get started in contributing to numpy, I > find this discussion extremely off-putting. ?It's childish, > meaningless, and spiteful, and I think it's doing more harm than any > possible good that could come out of continuing it. Hey Thouis, Just chiming in to encourage you not to get discouraged. There is a large, mostly silent majority who feel just the same way you do, it's just that they are silent precisely because they want to write good code and contribute and not participate in long, unproductive email threads that border on flame wars. You've made helpful comments here already advising people to take this offlist. After that there is nothing much to do but roll up your sleeves, make some pull requests, and engage in a worthwhile discussion about work. There are lots of people here who will engage you on that. From jason-sage at creativetrax.com Tue Jun 26 16:40:21 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Tue, 26 Jun 2012 15:40:21 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <5791B4AC-CC39-4853-98AD-5EAAE3E0F651@continuum.io> References: <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> <4FEA1646.1020008@astro.uio.no> <4FEA178C.2020900@creativetrax.com> <5791B4AC-CC39-4853-98AD-5EAAE3E0F651@continuu m.io> Message-ID: <4FEA1E35.90007@creativetrax.com> On 6/26/12 3:31 PM, Travis Oliphant wrote: > Thank you for the reminder. I was already called out for not stopping. Thanks, Dag. A flame-list might indeed be a good idea at this point if there is further need for "clearing the air" > Also, having it set up before it is needed is part of the solution. Setting it up in the heat of the moment can just further inflame feelings. You put a pressure valve in at the start, instead of waiting for a hole to blow in the side :). Sort of like all the governance discussions about setting up a decision procedure before having to face a huge decision.... Jason From d.s.seljebotn at astro.uio.no Tue Jun 26 16:49:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 26 Jun 2012 22:49:50 +0200 Subject: [Numpy-discussion] moving forward around ABI/API compatibilities (was numpy 1.7.x branch) In-Reply-To: <4FEA1CF9.8080306@astro.uio.no> References: <4FE9C241.4000302@astro.uio.no> <4FEA1CF9.8080306@astro.uio.no> Message-ID: <4FEA206E.80104@astro.uio.no> On 06/26/2012 10:35 PM, Dag Sverre Seljebotn wrote: > On 06/26/2012 05:02 PM, Travis Oliphant wrote: >>>> >>>> (I have not read the whole cython discussion yet) >>> >>> So here's the summary. It's rather complicated but also incredibly neat >>> :-) And technical details can be hidden behind a tight API. >> >> Could you provide a bit more context for this list. I think this is an important technology concept. I'd like to understand better how well it jives with Numba-produced APIs and how we can make use of it in NumPy. >> >> Where exactly would this be used in the NumPy API? 
What would it replace? > > Right. I thought I did that :-) I realize I might sometimes be too > brief, part of the "problem" is I'm used to Cython development where I > can start a sentence and then Mark Florisson or Robert Bradshaw can > finish it. > > I'll try to step through how PyArray_DIMS could work under a refactored > API from a C client. To do this I gloss over some of the finer points > etc. and just make a premature decision here and there. Almost none of > the types or functions below already exists, I'll assume we implement > them (I do have a good start on the reference implementation). > > We'll add a new C-level slot called "numpy:SHAPE" to the ndarray type, > and hook the PyArray_DIMS to use this slot. > > Inside NumPy > ------------ > > The PyArray_Type (?) definition changes from being a PyTypeObject to a > PyExtensibleTypeObject, and PyExtensibleType_Ready is called instead of > PyType_Ready. This builds the perfect lookup table etc. I'll omit the > details. > > The caller > ---------- > > First we need some macro module initialization setup (part of NumPy > include files): > > /* lower-64-bits of md5 of "numpy:SHAPE" */ > #define NPY_SHAPE_SLOT_PREHASH 0xa8cf70dc5f598f40ULL > /* hold an interned "numpy:SHAPE" string */ > static char *_Npy_interned_numpy_SHAPE; > > Then initialize interned key in import_array(): > > ... import_array(...) > { > ... > PyCustomSlotsInternerContext interner = PyCustomSlots_GetInterner(); > _Npy_interned_numpy_SHAPE = PyCustomSlots_InternLiteral("numpy:SHAPE"); > ... > } > > Then, let's get rid of that PyArrayObject (in the *API*; of course > there's still some struct representing the NumPy array internally but > its layout is no longer exposed anywhere). That means always using > PyObject, just like the Python API does, e.g., PyDict_GetItem gets a > PyObject even if it must be a dict. But for backwards compatability, > let's throw in: > > typedef PyObject PyArrayObject; > > Now, change PyArray_Check a bit (likely/unlikely indicates branch hints, > e.g. __builtin_expect in gcc). Some context: > > typedef struct { > char *interned_key; > uintptr_t flags; > void *funcptr; > } PyCustomSlot; > > Then: > > static inline int PyArray_Check(PyObject *arr) { > /* "it is an array if it has the "numpy:SHAPE" slot" > This is a bad choice of test but for simplicity... */ > if (likely(PyCustomSlots_Check(arr->ob_type)) { > PyCustomSlot *slot; > slot = PyCustomSlots_Find(arr->ob_type, > NPY_SHAPE_SLOT_PREHASH, _Npy_interned_numpy_SHAPE) > if (likely(slot != NULL)) return 1; > } > return 0; > } > > Finally, we can write our new PyArray_DIMS: > First bug report: > static inline npy_intp *PyArray_DIMS(PyObject *arr) { > PyCustomSlot *slot = PyCustomSlots_FindAssumePresent(arr->tp_base, > NPY_SHAPE_SLOT_PREHASH); > return (*slot->funcptr)(arr); last line should be npy_intp *(*func)(PyObject*); func = slot->funcptr; /* tbd throw in cast for C++ */ return (*func)(arr); Dag > } > > What goes on here is: > > - PyCustomSlots_Check checks whether the metaclass > (arr->ob_type->tp_base) is the PyExtensibleType_Type, which is a class > we agree upon by SEP > > - PyCustomSlots_Find takes the prehash of the key which through the > parametrized hash function gives the position in the hash table. At that > position in the PyCustomSlot array, one either finds the element (by > comparing the interned key by pointer value), or the element is not in > the table (so no loops or branch misses). 
> > - Finally, inside PyArray_DIMS we assume that PyArray_Check has > already been called. Thus, since we know the slot is in the table, we > can skip even the check and shave off a nanosecond. > > What is replaced > ---------------- > > Largely the macros and existing function pointers imported by > import_array. However, some of the functions (in particular constructors > etc.) would work just like before. Only OOP "methods" change their > behaviour. > > Compared to the macros, there should be ~4-7 ns penalty per call on my > computer (1.9 GHz). However, compared to making PyArray_SHAPE a function > going through the import_array function table, the cost is only a couple > of ns. > >>> Me and Robert have talked a lot about this and will move forward with it >>> for Cython. Obviously I don't expect others than me to pick it up for >>> NumPy so we'll see... I'll write up a specification document sometimes >>> over the next couple of weeks as we need that even if only for Cython. >> >> We will look forward to what you come up with. > > Will keep you posted, > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From andrea.gavana at gmail.com Tue Jun 26 17:02:22 2012 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Tue, 26 Jun 2012 23:02:22 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> <4FEA1646.1020008@astro.uio.no> <4FEA178C.2020900@creativetrax.com> Message-ID: On 26 June 2012 22:39, John Hunter wrote: > On Tue, Jun 26, 2012 at 3:27 PM, Thouis (Ray) Jones wrote: >> +1 ! >> >> Speaking as someone trying to get started in contributing to numpy, I >> find this discussion extremely off-putting. ?It's childish, >> meaningless, and spiteful, and I think it's doing more harm than any >> possible good that could come out of continuing it. > > Hey Thouis, > > Just chiming in to encourage you not to get discouraged. ?There is a > large, mostly silent majority who feel just the same way you do, it's > just that they are silent precisely because they want to write good > code and contribute and not participate in long, unproductive email > threads that border on flame wars. ?You've made helpful comments here > already advising people to take this offlist. ?After that there is > nothing much to do but roll up your sleeves, make some pull requests, > and engage in a worthwhile discussion about work. ?There are lots of > people here who will engage you on that. +1 from a pretty much silent user. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ From ralf.gommers at googlemail.com Tue Jun 26 17:02:44 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 26 Jun 2012 23:02:44 +0200 Subject: [Numpy-discussion] API policy In-Reply-To: <905AFE89-FA0B-48E0-A5C3-13A39BDC55E1@continuum.io> References: <905AFE89-FA0B-48E0-A5C3-13A39BDC55E1@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 9:34 PM, Travis Oliphant wrote: > I think we need to update this document: > http://projects.scipy.org/numpy/wiki/ApiDeprecation > Sounds fine to me to make the period for removal longer, or even to by default aim to not remove deprecated API's at all in minor release (unless the deprecation is due to buggy or incorrect behavior?). 
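[Aside: a hypothetical sketch of what "deprecated but not removed" looks
like in practice -- a keep-and-warn shim on a made-up function, not
numpy's actual code:]

import warnings

def scaled_sum(data, density=None, normed=None):
    # Hypothetical example: the deprecated `normed` keyword keeps working,
    # warns, and is mapped onto its replacement `density` instead of being
    # removed in the next minor release.
    if normed is not None:
        warnings.warn("`normed` is deprecated; use `density` instead",
                      DeprecationWarning, stacklevel=2)
        if density is None:
            density = normed
    total = float(sum(data))
    return total / len(data) if density else total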
> I don't think this characterizes the opinion of all involved in NumPy > development (it is certainly not the way I view our commitment to users). > Incidentally, in the migration from Trac we should move all pages like this > from Trac to Github pages or some other location. > > The idea that APIs should disappear after one minor release really needs > to be re-visited -- especially if there is a strong interest in changing > the APIs as there has been in the move from 1.5.x to 1.6 and then from 1.6 > to 1.7. This created a situation where a large number of people who did > not take the 1.6.x upgrade could potentially have APIs that disappear. > This last sentence doesn't make sense, I'm sorry. Please read the release notes. In 1.6.0 there was exactly one deprecation, the "normed" keyword in histogram(). And in 1.6.1 and 1.6.2 there were none of course. I agree with what you're arguing for here (as little impact as possible on existing users), but your view of especially 1.6.x seems to be skewed by regressions and changes that were either unintended or thought to be okay because the affected numpy behavior was undocumented / off-label / untested. The poor test coverage being the number one culprit (example regression: http://projects.scipy.org/numpy/ticket/2078). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jun 26 17:42:47 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Jun 2012 14:42:47 -0700 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett wrote: > Hi, > > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris > wrote: >> >> >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett >>> wrote: >>> > Hi, >>> > >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >>> >> wrote: >>> >>> >>> >>> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >>> >>> >>> >>> wrote: >>> >>>> >>> >>>> Hi, >>> >>>> >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >>> >>>> matrices that are numerically rank deficient: >>> >>>> >>> >>>> If I repeatedly make random matrices, then set the first column to be >>> >>>> equal to the sum of the second and third columns: >>> >>>> >>> >>>> def make_deficient(): >>> >>>> ? ?X = np.random.normal(size=(40, 10)) >>> >>>> ? ?deficient_X = X.copy() >>> >>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >>> >>>> ? ?return deficient_X >>> >>>> >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns full rank >>> >>>> (10) in about 8 percent of cases (see appended script). >>> >>>> >>> >>>> I think this is a tolerance problem. ?The ``matrix_rank`` algorithm >>> >>>> does this by default: >>> >>>> >>> >>>> S = spl.svd(M, compute_uv=False) >>> >>>> tol = S.max() * np.finfo(S.dtype).eps >>> >>>> return np.sum(S > tol) >>> >>>> >>> >>>> I guess we'd we want the lowest tolerance that nearly always or >>> >>>> always >>> >>>> identifies numerically rank deficient matrices. ?I suppose one way of >>> >>>> looking at whether the tolerance is in the right range is to compare >>> >>>> the calculated tolerance (``tol``) to the minimum singular value >>> >>>> (``S.min()``) because S.min() in our case should be very small and >>> >>>> indicate the rank deficiency. 
The mean value of tol / S.min() for the >>> >>>> current algorithm, across many iterations, is about 2.8. ?We might >>> >>>> hope this value would be higher than 1, but not much higher, >>> >>>> otherwise >>> >>>> we might be rejecting too many columns. >>> >>>> >>> >>>> Our current algorithm for tolerance is the same as the 2-norm of M * >>> >>>> eps. ?We're citing Golub and Van Loan for this, but now I look at our >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * |M| >>> >>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the >>> >>>> Golub >>> >>>> and Van Loan suggestion corresponds to: >>> >>>> >>> >>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >>> >>>> >>> >>>> This tolerance gives full rank for these rank-deficient matrices in >>> >>>> about 39 percent of cases (tol / S.min() ratio of 1.7) >>> >>>> >>> >>>> We see on p 56 (section 2.3.2) that: >>> >>>> >>> >>>> m, n = M.shape >>> >>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >>> >>>> >>> >>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). >>> >>>> ?Setting: >>> >>>> >>> >>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >>> >>>> >>> >>>> gives about 0.5 percent error (tol / S.min() of 4.4) >>> >>>> >>> >>>> Using the Mathworks threshold [2]: >>> >>>> >>> >>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>> >>>> >>> >>>> There are no false negatives (0 percent rank 10), but tol / S.min() >>> >>>> is >>> >>>> around 110 - so conservative, in this case. >>> >>>> >>> >>>> So - summary - I'm worrying our current threshold is too small, >>> >>>> letting through many rank-deficient matrices without detection. ?I >>> >>>> may >>> >>>> have misread Golub and Van Loan, but maybe we aren't doing what they >>> >>>> suggest. ?Maybe what we could use is either the MATLAB threshold or >>> >>>> something like: >>> >>>> >>> >>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >>> >>>> >>> >>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . >>> >>>> This >>> >>>> gives 0 percent misses and tol / S.min() of 8.7. >>> >>>> >>> >>>> What do y'all think? >>> >>>> >>> >>>> Best, >>> >>>> >>> >>>> Matthew >>> >>>> >>> >>>> [1] >>> >>>> >>> >>>> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >>> >>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >>> >>>> >>> >>>> Output from script: >>> >>>> >>> >>>> Percent undetected current: 9.8, tol / S.min(): 2.762 >>> >>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >>> >>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >>> >>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): >>> >>>> 8.734 >>> >>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >>> >>>> >>> >>>> >>> >>> >>> >>> >>> >>> The polynomial fitting uses eps times the largest array dimension for >>> >>> the >>> >>> relative condition number. IIRC, that choice traces back to numerical >>> >>> recipes. >>> > >>> > Chuck - sorry - I didn't understand what you were saying, and now I >>> > think you were proposing the MATLAB algorithm. ? I can't find that in >>> > Numerical Recipes - can you? ?It would be helpful as a reference. >>> > >>> >> This is the same as Matlab, right? >>> > >>> > Yes, I believe so, i.e: >>> > >>> > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>> > >>> > from my original email. 
>>> > >>> >> If the Matlab condition is the most conservative, then it seems like a >>> >> reasonable choice -- conservative is good so long as your false >>> >> positive rate doesn't become to high, and presumably Matlab has enough >>> >> user experience to know whether the false positive rate is too high. >>> > >>> > Are we agreeing to go for the Matlab algorithm? >>> >>> As extra data, current Numerical Recipes (2007, p 67) appears to prefer: >>> >>> tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) >> >> >> That's interesting, as something like that with a square root was my first >> choice for the least squares, but then someone mentioned the NR choice. That >> was all on the mailing list way several years back when I was fixing up the >> polynomial fitting routine. The NR reference is on page 517 of the 1986 >> edition (FORTRAN), which might be hard to come by these days ;) > > Thanks for tracking that down, it's very helpful. > > For those of you not near a huge University library or your own > private copy, p517 says: > > "A plausible answer to the question "how small is small", is to edit > in this fashion all singular values whose ratio to the largest > singular value is less then N times the machine precision \epsilon. > (You might argue for root N, or a constant, instead of N as the > multiple; that starts getting into hardware-dependent questions). > > Earlier (p510) we see the (General Linear Least Squares) problem being > set up as A = (N x M) where N >= M. > > The 2007 edition replaces the "(You might argue... )" text with: (p 795) > > "(This is a more conservative recommendation than the default in > section 2.6 which scales as N^{1/2})" > > and this in turn refers to the threshold: > > tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) > > (p67) - which is justified as being (p71) " ... a default value based > on expected roundoff error". > > I could not at first glance see any other justification for this > threshold in the text. > > So, how about something like: > > def matrix_rank(M, tol='maxdim'): > ... > > ? ?tol: {'maxdim', 'nr-roundoff'} or float > ? ? ? ? If str, gives threshold strategy for tolerance relative to > the maximum singular value, explained below. ?If float gives absolute > tolerance below which singular values assumed zero. ?For the threshold > strategies, ?we will call the maximum singular value``S.max()`` and > the floating point epsilon for the working precision data type > ``eps``. ?Default strategy is 'maxdim' ? corresponding to ``tol = > S.max() * eps * max(M.shape)``. ?This is the MATLAB default; see also > Numerical Recipes 2007. ?Other options are 'nr-roundoff' (also from > Numerical Recipes 2007) corresponding to ``tol = S.max() * eps / 2 * > np.sqrt(M.shape[0] + M.shape[1] + 1)``. > > ? Thinking about this more, I'm tempted to just go for the matlab / old NR solution, that is: tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) and not offer any other options. My reasoning is that the matrix_rank code is basically trivial; it is a convenience function. If someone wants a better matrix rank it's reasonable to copy these few lines into a new function. So providing multiple options for tolerance seems like overkill. Then the question is whether to use NR-2007: tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) or the matlab default above. 
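[Aside: for concreteness, a minimal sketch of the whole convenience
function with the MATLAB-style default; it just combines the svd-based
computation quoted above with the max(m, n) threshold, and ignores the
scalar and 1-d special-casing a real implementation would need:]

import numpy as np

def matrix_rank(M, tol=None):
    # Minimal sketch: assumes M is 2-d; a float tol is used as given.
    M = np.asarray(M)
    S = np.linalg.svd(M, compute_uv=False)
    if tol is None:
        # MATLAB-style default; the NR-2007 alternative would change only
        # this line.
        tol = S.max() * np.finfo(S.dtype).eps * max(M.shape)
    return np.sum(S > tol)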
I'm leaning to the matlab version because we can be more or less sure that it does not miss actual numerical rank deficiency and it might be what people are expecting, if they shift from matlab or compare to matlab. The NR-2007 might well be closer to an accurate threshold for numerical rank deficiency, but I haven't tested it with all variants and all platforms, and the reduced false positives seems a minor gain compared to the slight increase in risk of false negatives and difference from matlab. What do y'all think, Matthew From jsalvati at u.washington.edu Tue Jun 26 17:53:57 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 26 Jun 2012 14:53:57 -0700 Subject: [Numpy-discussion] What's the most numpythonic way to support multiple types in a C extension? Message-ID: I want to support multiple types in the index_increment function that I've written here: https://github.com/jsalvatier/numpy/blob/master/numpy/core/src/multiarray/mapping.c I need to check that the first argument's type can support addition, cast the dataptr to the appropriate type and do the addition operation for that type. It looks like some of the numpy code uses .c.src files to do templating. Is that what I want to do here? Is the syntax described somewhere? John -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Jun 26 17:57:32 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 Jun 2012 15:57:32 -0600 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett wrote: > Hi, > > On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett > wrote: > > Hi, > > > > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris > > wrote: > >> > >> > >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett > > >> wrote: > >>> > >>> Hi, > >>> > >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett < > matthew.brett at gmail.com> > >>> wrote: > >>> > Hi, > >>> > > >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith > wrote: > >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris > >>> >> wrote: > >>> >>> > >>> >>> > >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett > >>> >>> > >>> >>> wrote: > >>> >>>> > >>> >>>> Hi, > >>> >>>> > >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank > for > >>> >>>> matrices that are numerically rank deficient: > >>> >>>> > >>> >>>> If I repeatedly make random matrices, then set the first column > to be > >>> >>>> equal to the sum of the second and third columns: > >>> >>>> > >>> >>>> def make_deficient(): > >>> >>>> X = np.random.normal(size=(40, 10)) > >>> >>>> deficient_X = X.copy() > >>> >>>> deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] > >>> >>>> return deficient_X > >>> >>>> > >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns full > rank > >>> >>>> (10) in about 8 percent of cases (see appended script). > >>> >>>> > >>> >>>> I think this is a tolerance problem. The ``matrix_rank`` > algorithm > >>> >>>> does this by default: > >>> >>>> > >>> >>>> S = spl.svd(M, compute_uv=False) > >>> >>>> tol = S.max() * np.finfo(S.dtype).eps > >>> >>>> return np.sum(S > tol) > >>> >>>> > >>> >>>> I guess we'd we want the lowest tolerance that nearly always or > >>> >>>> always > >>> >>>> identifies numerically rank deficient matrices. 
I suppose one > way of > >>> >>>> looking at whether the tolerance is in the right range is to > compare > >>> >>>> the calculated tolerance (``tol``) to the minimum singular value > >>> >>>> (``S.min()``) because S.min() in our case should be very small and > >>> >>>> indicate the rank deficiency. The mean value of tol / S.min() for > the > >>> >>>> current algorithm, across many iterations, is about 2.8. We might > >>> >>>> hope this value would be higher than 1, but not much higher, > >>> >>>> otherwise > >>> >>>> we might be rejecting too many columns. > >>> >>>> > >>> >>>> Our current algorithm for tolerance is the same as the 2-norm of > M * > >>> >>>> eps. We're citing Golub and Van Loan for this, but now I look at > our > >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * |M| > >>> >>>> where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think the > >>> >>>> Golub > >>> >>>> and Van Loan suggestion corresponds to: > >>> >>>> > >>> >>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 > >>> >>>> > >>> >>>> This tolerance gives full rank for these rank-deficient matrices > in > >>> >>>> about 39 percent of cases (tol / S.min() ratio of 1.7) > >>> >>>> > >>> >>>> We see on p 56 (section 2.3.2) that: > >>> >>>> > >>> >>>> m, n = M.shape > >>> >>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 > >>> >>>> > >>> >>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). > >>> >>>> Setting: > >>> >>>> > >>> >>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) > >>> >>>> > >>> >>>> gives about 0.5 percent error (tol / S.min() of 4.4) > >>> >>>> > >>> >>>> Using the Mathworks threshold [2]: > >>> >>>> > >>> >>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > >>> >>>> > >>> >>>> There are no false negatives (0 percent rank 10), but tol / > S.min() > >>> >>>> is > >>> >>>> around 110 - so conservative, in this case. > >>> >>>> > >>> >>>> So - summary - I'm worrying our current threshold is too small, > >>> >>>> letting through many rank-deficient matrices without detection. I > >>> >>>> may > >>> >>>> have misread Golub and Van Loan, but maybe we aren't doing what > they > >>> >>>> suggest. Maybe what we could use is either the MATLAB threshold > or > >>> >>>> something like: > >>> >>>> > >>> >>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) > >>> >>>> > >>> >>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . > >>> >>>> This > >>> >>>> gives 0 percent misses and tol / S.min() of 8.7. > >>> >>>> > >>> >>>> What do y'all think? > >>> >>>> > >>> >>>> Best, > >>> >>>> > >>> >>>> Matthew > >>> >>>> > >>> >>>> [1] > >>> >>>> > >>> >>>> > http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon > >>> >>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html > >>> >>>> > >>> >>>> Output from script: > >>> >>>> > >>> >>>> Percent undetected current: 9.8, tol / S.min(): 2.762 > >>> >>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 > >>> >>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 > >>> >>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): > >>> >>>> 8.734 > >>> >>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 > >>> >>>> > >>> >>>> > >>> >>> > >>> >>> > >>> >>> The polynomial fitting uses eps times the largest array dimension > for > >>> >>> the > >>> >>> relative condition number. IIRC, that choice traces back to > numerical > >>> >>> recipes. 
> >>> > > >>> > Chuck - sorry - I didn't understand what you were saying, and now I > >>> > think you were proposing the MATLAB algorithm. I can't find that in > >>> > Numerical Recipes - can you? It would be helpful as a reference. > >>> > > >>> >> This is the same as Matlab, right? > >>> > > >>> > Yes, I believe so, i.e: > >>> > > >>> > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > >>> > > >>> > from my original email. > >>> > > >>> >> If the Matlab condition is the most conservative, then it seems > like a > >>> >> reasonable choice -- conservative is good so long as your false > >>> >> positive rate doesn't become to high, and presumably Matlab has > enough > >>> >> user experience to know whether the false positive rate is too high. > >>> > > >>> > Are we agreeing to go for the Matlab algorithm? > >>> > >>> As extra data, current Numerical Recipes (2007, p 67) appears to > prefer: > >>> > >>> tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) > >> > >> > >> That's interesting, as something like that with a square root was my > first > >> choice for the least squares, but then someone mentioned the NR choice. > That > >> was all on the mailing list way several years back when I was fixing up > the > >> polynomial fitting routine. The NR reference is on page 517 of the 1986 > >> edition (FORTRAN), which might be hard to come by these days ;) > > > > Thanks for tracking that down, it's very helpful. > > > > For those of you not near a huge University library or your own > > private copy, p517 says: > > > > "A plausible answer to the question "how small is small", is to edit > > in this fashion all singular values whose ratio to the largest > > singular value is less then N times the machine precision \epsilon. > > (You might argue for root N, or a constant, instead of N as the > > multiple; that starts getting into hardware-dependent questions). > > > > Earlier (p510) we see the (General Linear Least Squares) problem being > > set up as A = (N x M) where N >= M. > > > > The 2007 edition replaces the "(You might argue... )" text with: (p 795) > > > > "(This is a more conservative recommendation than the default in > > section 2.6 which scales as N^{1/2})" > > > > and this in turn refers to the threshold: > > > > tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) > > > > (p67) - which is justified as being (p71) " ... a default value based > > on expected roundoff error". > > > > I could not at first glance see any other justification for this > > threshold in the text. > > > > So, how about something like: > > > > def matrix_rank(M, tol='maxdim'): > > ... > > > > tol: {'maxdim', 'nr-roundoff'} or float > > If str, gives threshold strategy for tolerance relative to > > the maximum singular value, explained below. If float gives absolute > > tolerance below which singular values assumed zero. For the threshold > > strategies, we will call the maximum singular value``S.max()`` and > > the floating point epsilon for the working precision data type > > ``eps``. Default strategy is 'maxdim' corresponding to ``tol = > > S.max() * eps * max(M.shape)``. This is the MATLAB default; see also > > Numerical Recipes 2007. Other options are 'nr-roundoff' (also from > > Numerical Recipes 2007) corresponding to ``tol = S.max() * eps / 2 * > > np.sqrt(M.shape[0] + M.shape[1] + 1)``. > > > > ? 
> > Thinking about this more, I'm tempted to just go for the matlab / old > NR solution, that is: > > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > > and not offer any other options. My reasoning is that the matrix_rank > code is basically trivial; it is a convenience function. If someone > wants a better matrix rank it's reasonable to copy these few lines > into a new function. So providing multiple options for tolerance > seems like overkill. > > Then the question is whether to use NR-2007: > > tol = S.max() * np.finfo(M.dtype).eps / 2. * np.sqrt(m + n + 1.) > > or the matlab default above. I'm leaning to the matlab version > because we can be more or less sure that it does not miss actual > numerical rank deficiency and it might be what people are expecting, > if they shift from matlab or compare to matlab. The NR-2007 might > well be closer to an accurate threshold for numerical rank deficiency, > but I haven't tested it with all variants and all platforms, and the > reduced false positives seems a minor gain compared to the slight > increase in risk of false negatives and difference from matlab. > > What do y'all think, > > I'm fine with that, and agree that it is likely to lead to fewer folks wondering why Matlab and numpy are different. A good explanation in the function documentation would be useful. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbpoline at gmail.com Tue Jun 26 18:41:12 2012 From: jbpoline at gmail.com (JB Poline) Date: Tue, 26 Jun 2012 15:41:12 -0700 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? Message-ID: >On Sat, Jun 16, 2012 at 4:39 PM, Nathaniel Smith wrote: >> On Sat, Jun 16, 2012 at 9:03 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith wrote: >>>> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >>>> wrote: >>>>> >>>>> >>>>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >>>>> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank for >>>>>> matrices that are numerically rank deficient: >>>>>> >>>>>> If I repeatedly make random matrices, then set the first column to be >>>>>> equal to the sum of the second and third columns: >>>>>> >>>>>> def make_deficient(): >>>>>> ? ?X = np.random.normal(size=(40, 10)) >>>>>> ? ?deficient_X = X.copy() >>>>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >>>>>> ? ?return deficient_X >>>>>> >>>>>> then the current numpy.linalg.matrix_rank algorithm returns full rank >>>>>> (10) in about 8 percent of cases (see appended script). >>>>>> >>>>>> I think this is a tolerance problem. ?The ``matrix_rank`` algorithm >>>>>> does this by default: >>>>>> >>>>>> S = spl.svd(M, compute_uv=False) >>>>>> tol = S.max() * np.finfo(S.dtype).eps >>>>>> return np.sum(S > tol) >>>>>> >>>>>> I guess we'd we want the lowest tolerance that nearly always or always >>>>>> identifies numerically rank deficient matrices. ?I suppose one way of >>>>>> looking at whether the tolerance is in the right range is to compare >>>>>> the calculated tolerance (``tol``) to the minimum singular value >>>>>> (``S.min()``) because S.min() in our case should be very small and >>>>>> indicate the rank deficiency. The mean value of tol / S.min() for the >>>>>> current algorithm, across many iterations, is about 2.8. ?We might >>>>>> hope this value would be higher than 1, but not much higher, otherwise >>>>>> we might be rejecting too many columns. 
>>>>>> >>>>>> Our current algorithm for tolerance is the same as the 2-norm of M * >>>>>> eps. ?We're citing Golub and Van Loan for this, but now I look at our >>>>>> copy (p 261, last para) - they seem to be suggesting using u * |M| >>>>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the Golub >>>>>> and Van Loan suggestion corresponds to: >>>>>> >>>>>> tol = np.linalg.norm(M, np.inf) * np.finfo(M.dtype).eps / 2 >>>>>> >>>>>> This tolerance gives full rank for these rank-deficient matrices in >>>>>> about 39 percent of cases (tol / S.min() ratio of 1.7) >>>>>> >>>>>> We see on p 56 (section 2.3.2) that: >>>>>> >>>>>> m, n = M.shape >>>>>> 1 / sqrt(n) . |M|_{inf} <= |M|_2 >>>>>> >>>>>> So we can get an upper bound on |M|_{inf} with |M|_2 * sqrt(n). ?Setting: >>>>>> >>>>>> tol = S.max() * np.finfo(M.dtype).eps / 2 * np.sqrt(n) >>>>>> >>>>>> gives about 0.5 percent error (tol / S.min() of 4.4) >>>>>> >>>>>> Using the Mathworks threshold [2]: >>>>>> >>>>>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>>>>> >>>>>> There are no false negatives (0 percent rank 10), but tol / S.min() is >>>>>> around 110 - so conservative, in this case. >>>>>> >>>>>> So - summary - I'm worrying our current threshold is too small, >>>>>> letting through many rank-deficient matrices without detection. ?I may >>>>>> have misread Golub and Van Loan, but maybe we aren't doing what they >>>>>> suggest. ?Maybe what we could use is either the MATLAB threshold or >>>>>> something like: >>>>>> >>>>>> tol = S.max() * np.finfo(M.dtype).eps * np.sqrt(n) >>>>>> >>>>>> - so 2 * the upper bound for the inf norm = 2 * |M|_2 * sqrt(n) . This >>>>>> gives 0 percent misses and tol / S.min() of 8.7. >>>>>> >>>>>> What do y'all think? >>>>>> >>>>>> Best, >>>>>> >>>>>> Matthew >>>>>> >>>>>> [1] >>>>>> http://matthew-brett.github.com/pydagogue/floating_error.html#machine-epsilon >>>>>> [2] http://www.mathworks.com/help/techdoc/ref/rank.html >>>>>> >>>>>> Output from script: >>>>>> >>>>>> Percent undetected current: 9.8, tol / S.min(): 2.762 >>>>>> Percent undetected inf norm: 39.1, tol / S.min(): 1.667 >>>>>> Percent undetected upper bound inf norm: 0.5, tol / S.min(): 4.367 >>>>>> Percent undetected upper bound inf norm * 2: 0.0, tol / S.min(): 8.734 >>>>>> Percent undetected MATLAB: 0.0, tol / S.min(): 110.477 >>>>>> >>>>>> >>>>> >>>>> >>>>> The polynomial fitting uses eps times the largest array dimension for the >>>>> relative condition number. IIRC, that choice traces back to numerical >>>>> recipes. >>> >>> Chuck - sorry - I didn't understand what you were saying, and now I >>> think you were proposing the MATLAB algorithm. ? I can't find that in >>> Numerical Recipes - can you? ?It would be helpful as a reference. >>> >>>> This is the same as Matlab, right? >>> >>> Yes, I believe so, i.e: >>> >>> tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >>> >>> from my original email. >>> >>>> If the Matlab condition is the most conservative, then it seems like a >>>> reasonable choice -- conservative is good so long as your false >>>> positive rate doesn't become to high, and presumably Matlab has enough >>>> user experience to know whether the false positive rate is too high. >>> >>> Are we agreeing to go for the Matlab algorithm? >>> >>> If so, how should this be managed? ?Just changing it may change the >>> output of code using numpy >= 1.5.0, but then again, the threshold is >>> probably incorrect. 
>>> >>> Fix and break or FutureWarning with something like: >>> >>> def matrix_rank(M, tol=None): >>> >>> where ``tol`` can be a string like ``maxdim``? >> >> I dunno, I don't think we should do a big deprecation dance for every >> bug fix. Is this a bug fix, so numpy will simply start producing more >> accurate results on a given problem? I guess there isn't really a >> right answer here (though claiming that [a, b, a+b] is full-rank is >> clearly broken, and the matlab algorithm seems reasonable for >> answering the specific question of whether a matrix is full rank), so >> we'll have to hope some users speak up... > >I don't see a problem changing this as a bugfix. >statsmodels still has, I think, the original scipy.stats.models >version for rank which is still much higher for any non-huge array and >float, cond=1.0e-12. > >Josef + 1 for making the default "matlab" : it sounds like it would be the least confusing. It also seems to me that a bug fix is probably right procedure. Last, I like best having only the matlab default (options seem uncessary). cheers JB From ben.root at ou.edu Tue Jun 26 19:39:40 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 26 Jun 2012 19:39:40 -0400 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: On Tuesday, June 26, 2012, Charles R Harris wrote: > > > On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett wrote: > > Hi, > > On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett > wrote: > > Hi, > > > > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris > > wrote: > >> > >> > >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett > > >> wrote: > >>> > >>> Hi, > >>> > >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett < > matthew.brett at gmail.com> > >>> wrote: > >>> > Hi, > >>> > > >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith > wrote: > >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris > >>> >> wrote: > >>> >>> > >>> >>> > >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett > >>> >>> > >>> >>> wrote: > >>> >>>> > >>> >>>> Hi, > >>> >>>> > >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank > for > >>> >>>> matrices that are numerically rank deficient: > >>> >>>> > >>> >>>> If I repeatedly make random matrices, then set the first column > to be > >>> >>>> equal to the sum of the second and third columns: > >>> >>>> > >>> >>>> def make_deficient(): > >>> >>>> X = np.random.normal(size=(40, 10)) > >>> >>>> deficient_X = X.copy() > >>> >>>> deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] > >>> >>>> return deficient_X > >>> >>>> > >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns full > rank > >>> >>>> (10) in about 8 percent of cases (see appended script). > >>> >>>> > >>> >>>> I think this is a tolerance problem. The ``matrix_rank`` > algorithm > >>> >>>> does this by default: > >>> >>>> > >>> >>>> S = spl.svd(M, compute_uv=False) > >>> >>>> tol = S.max() * np.finfo(S.dtype).eps > >>> >>>> return np.sum(S > tol) > >>> >>>> > >>> >>>> I guess we'd we want the lowest tolerance that nearly always or > >>> >>>> always > >>> >>>> identifies numerically rank deficient matrices. I suppose one > way of > >>> >>>> looking at whether the tolerance is in the right range is to > compare > >>> >>>> the calculated tolerance (``tol``) to the minimum singular value > >>> >>>> (``S.min()``) because S.min() in our case should be very small and > >>> >>>> indicate the rank deficiency. 
The mean value of tol / S.min() for > the > >>> >>>> current algorithm, across many iterations, is about 2.8. We might > >>> >>>> hope this value would be higher than 1, but not much higher, > >>> >>>> otherwise > >>> >>>> we might be rejecting too many columns. > >>> >>>> > >>> >>>> Our current algorithm for tolerance is the same as the 2-norm of > M * > >>> >>>> eps. We're citing Golub and Van Loan for this, but now I look at > our > >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * |M| > >>> >>>> where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think the > >>> >>>> Golub > > > I'm fine with that, and agree that it is likely to lead to fewer folks > wondering why Matlab and numpy are different. A good explanation in the > function documentation would be useful. > > Chuck > > One potential problem is that it implies that it will always be the same as any version of matlab's tolerance. What if they change it in a future release? How likely are we to even notice? Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue Jun 26 19:46:20 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Jun 2012 16:46:20 -0700 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, On Tue, Jun 26, 2012 at 4:39 PM, Benjamin Root wrote: > > > On Tuesday, June 26, 2012, Charles R Harris wrote: >> >> >> >> On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett >> wrote: >> >> Hi, >> >> On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett >> wrote: >> > Hi, >> > >> > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris >> > wrote: >> >> >> >> >> >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett >> >> >> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett >> >>> >> >>> wrote: >> >>> > Hi, >> >>> > >> >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith >> >>> > wrote: >> >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >> >>> >> wrote: >> >>> >>> >> >>> >>> >> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett One potential problem is that it implies that it will always be the same as any version of matlab's tolerance. What if they change it in a future release? How likely are we to even notice? >> >>> >>> >> >>> >>> wrote: >> >>> >>>> >> >>> >>>> Hi, >> >>> >>>> >> >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full rank >> >>> >>>> for >> >>> >>>> matrices that are numerically rank deficient: >> >>> >>>> >> >>> >>>> If I repeatedly make random matrices, then set the first column >> >>> >>>> to be >> >>> >>>> equal to the sum of the second and third columns: >> >>> >>>> >> >>> >>>> def make_deficient(): >> >>> >>>> ? ?X = np.random.normal(size=(40, 10)) >> >>> >>>> ? ?deficient_X = X.copy() >> >>> >>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >> >>> >>>> ? ?return deficient_X >> >>> >>>> >> >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns full >> >>> >>>> rank >> >>> >>>> (10) in about 8 percent of cases (see appended script). >> >>> >>>> >> >>> >>>> I think this is a tolerance problem. 
?The ``matrix_rank`` >> >>> >>>> algorithm >> >>> >>>> does this by default: >> >>> >>>> >> >>> >>>> S = spl.svd(M, compute_uv=False) >> >>> >>>> tol = S.max() * np.finfo(S.dtype).eps >> >>> >>>> return np.sum(S > tol) >> >>> >>>> >> >>> >>>> I guess we'd we want the lowest tolerance that nearly always or >> >>> >>>> always >> >>> >>>> identifies numerically rank deficient matrices. ?I suppose one >> >>> >>>> way of >> >>> >>>> looking at whether the tolerance is in the right range is to >> >>> >>>> compare >> >>> >>>> the calculated tolerance (``tol``) to the minimum singular value >> >>> >>>> (``S.min()``) because S.min() in our case should be very small >> >>> >>>> and >> >>> >>>> indicate the rank deficiency. The mean value of tol / S.min() for >> >>> >>>> the >> >>> >>>> current algorithm, across many iterations, is about 2.8. ?We >> >>> >>>> might >> >>> >>>> hope this value would be higher than 1, but not much higher, >> >>> >>>> otherwise >> >>> >>>> we might be rejecting too many columns. >> >>> >>>> >> >>> >>>> Our current algorithm for tolerance is the same as the 2-norm of >> >>> >>>> M * >> >>> >>>> eps. ?We're citing Golub and Van Loan for this, but now I look at >> >>> >>>> our >> >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * >> >>> >>>> |M| >> >>> >>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think the >> >>> >>>> Golub >> >> >> I'm fine with that, and agree that it is likely to lead to fewer folks >> wondering why Matlab and numpy are different. A good explanation in the >> function documentation would be useful. >> >> Chuck >> > > One potential problem is that it implies that it will always be the same as > any version of matlab's tolerance. ?What if they change it in a future > release? How likely are we to even notice? I guess that matlab is unlikely to change for the same reason that we would be reluctant to change, once we've found an acceptable value. I was thinking that we would say something like: """ The default tolerance is : tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) This corresponds to the tolerance suggested in NR page X, and to the tolerance used by MATLAB at the time of writing (June 2012; see http://www.mathworks.com/help/techdoc/ref/rank.html). """ I don't know whether we would want to track changes made by matlab - maybe we could have that discussion if they do change? Best, Matthew From ben.root at ou.edu Tue Jun 26 19:50:08 2012 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 26 Jun 2012 19:50:08 -0400 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <4032F07C-6BE4-4FDF-8627-9AA772AA9097@continuum.io> <4FEA1646.1020008@astro.uio.no> <4FEA178C.2020900@creativetrax.com> Message-ID: On Tuesday, June 26, 2012, Thouis (Ray) Jones wrote: > On Tue, Jun 26, 2012 at 10:11 PM, Jason Grout > > wrote: > > On 6/26/12 3:06 PM, Dag Sverre Seljebotn wrote: > >> Something the Sage project does very well is meeting often in person > > > > Another thing we have that has improved the mailing list climate is a > > "sage-flame" list [1] > > +1 ! > > Speaking as someone trying to get started in contributing to numpy, I > find this discussion extremely off-putting. It's childish, > meaningless, and spiteful, and I think it's doing more harm than any > possible good that could come out of continuing it. 
And if you still feel dissuaded from contributing here, you are always welcome over at the matplotlib lists. Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue Jun 26 19:59:59 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 26 Jun 2012 16:59:59 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 1:10 PM, Travis Oliphant wrote: > One issues is the one that Sage identified about the array interface > regression as noted by Jason. ? ?Any other regressions from 1.5.x need to be > addressed as well. ? ?We'll have to decide on a case-by-case basis if there > are issues that conflict with 1.6.x behavior. > One thing this discussion made me think about, is that it would be great to identify a few key projects that: - use numpy heavily - have reasonably solid test suites and create a special build job that runs *those* test suites periodically. Not necessarily on every last numpy commit, but at least on a reasonable schedule. I think having that kind of information readily available, and with the ability to switch which numpy branch/commit those tests do get run against, could be very valuable as an early warning system for numpy to know if an apparently inconsequential change has unexpected side effects downstream. In IPython we've really benefited greatly from our improved CI infrastructure, but that only goes as far as catching *our own* problems. This kind of downstream integration testing could be very useful. Cheers, f From charlesr.harris at gmail.com Tue Jun 26 20:04:49 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 Jun 2012 18:04:49 -0600 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: On Tue, Jun 26, 2012 at 5:46 PM, Matthew Brett wrote: > Hi, > > On Tue, Jun 26, 2012 at 4:39 PM, Benjamin Root wrote: > > > > > > On Tuesday, June 26, 2012, Charles R Harris wrote: > >> > >> > >> > >> On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett > > >> wrote: > >> > >> Hi, > >> > >> On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett > > >> wrote: > >> > Hi, > >> > > >> > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris > >> > wrote: > >> >> > >> >> > >> >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett > >> >> > >> >> wrote: > >> >>> > >> >>> Hi, > >> >>> > >> >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett > >> >>> > >> >>> wrote: > >> >>> > Hi, > >> >>> > > >> >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith > >> >>> > wrote: > >> >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris > >> >>> >> wrote: > >> >>> >>> > >> >>> >>> > >> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett > One potential problem is that it implies that it will always be the > same as any version of matlab's tolerance. What if they change it in > a future release? How likely are we to even notice? 
> > > >> >>> >>> > >> >>> >>> wrote: > >> >>> >>>> > >> >>> >>>> Hi, > >> >>> >>>> > >> >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full > rank > >> >>> >>>> for > >> >>> >>>> matrices that are numerically rank deficient: > >> >>> >>>> > >> >>> >>>> If I repeatedly make random matrices, then set the first column > >> >>> >>>> to be > >> >>> >>>> equal to the sum of the second and third columns: > >> >>> >>>> > >> >>> >>>> def make_deficient(): > >> >>> >>>> X = np.random.normal(size=(40, 10)) > >> >>> >>>> deficient_X = X.copy() > >> >>> >>>> deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] > >> >>> >>>> return deficient_X > >> >>> >>>> > >> >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns > full > >> >>> >>>> rank > >> >>> >>>> (10) in about 8 percent of cases (see appended script). > >> >>> >>>> > >> >>> >>>> I think this is a tolerance problem. The ``matrix_rank`` > >> >>> >>>> algorithm > >> >>> >>>> does this by default: > >> >>> >>>> > >> >>> >>>> S = spl.svd(M, compute_uv=False) > >> >>> >>>> tol = S.max() * np.finfo(S.dtype).eps > >> >>> >>>> return np.sum(S > tol) > >> >>> >>>> > >> >>> >>>> I guess we'd we want the lowest tolerance that nearly always or > >> >>> >>>> always > >> >>> >>>> identifies numerically rank deficient matrices. I suppose one > >> >>> >>>> way of > >> >>> >>>> looking at whether the tolerance is in the right range is to > >> >>> >>>> compare > >> >>> >>>> the calculated tolerance (``tol``) to the minimum singular > value > >> >>> >>>> (``S.min()``) because S.min() in our case should be very small > >> >>> >>>> and > >> >>> >>>> indicate the rank deficiency. The mean value of tol / S.min() > for > >> >>> >>>> the > >> >>> >>>> current algorithm, across many iterations, is about 2.8. We > >> >>> >>>> might > >> >>> >>>> hope this value would be higher than 1, but not much higher, > >> >>> >>>> otherwise > >> >>> >>>> we might be rejecting too many columns. > >> >>> >>>> > >> >>> >>>> Our current algorithm for tolerance is the same as the 2-norm > of > >> >>> >>>> M * > >> >>> >>>> eps. We're citing Golub and Van Loan for this, but now I look > at > >> >>> >>>> our > >> >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * > >> >>> >>>> |M| > >> >>> >>>> where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think > the > >> >>> >>>> Golub > >> > >> > >> I'm fine with that, and agree that it is likely to lead to fewer folks > >> wondering why Matlab and numpy are different. A good explanation in the > >> function documentation would be useful. > >> > >> Chuck > >> > > > > One potential problem is that it implies that it will always be the same > as > > any version of matlab's tolerance. What if they change it in a future > > release? How likely are we to even notice? > > I guess that matlab is unlikely to change for the same reason that we > would be reluctant to change, once we've found an acceptable value. > > I was thinking that we would say something like: > > """ > The default tolerance is : > > tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) > > This corresponds to the tolerance suggested in NR page X, and to the > tolerance used by MATLAB at the time of writing (June 2012; see > http://www.mathworks.com/help/techdoc/ref/rank.html). > """ > > I don't know whether we would want to track changes made by matlab - > maybe we could have that discussion if they do change? > I wouldn't bother tracking Matlab, but I think the alternative threshold could be mentioned in the notes. 
Something like A less conservative threshold is ... Maybe mention that because of numerical uncertainty there will always be a chance that the computed rank could be wrong, but that with the conservative threshold the rank is very unlikely to be less than the computed rank. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Tue Jun 26 20:15:42 2012 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 26 Jun 2012 20:15:42 -0400 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 7:59 PM, Fernando Perez wrote: > On Tue, Jun 26, 2012 at 1:10 PM, Travis Oliphant wrote: >> One issues is the one that Sage identified about the array interface >> regression as noted by Jason. ? ?Any other regressions from 1.5.x need to be >> addressed as well. ? ?We'll have to decide on a case-by-case basis if there >> are issues that conflict with 1.6.x behavior. >> > > One thing this discussion made me think about, is that it would be > great to identify a few key projects that: > > - use numpy heavily > - have reasonably solid test suites > > and create a special build job that runs *those* test suites > periodically. ?Not necessarily on every last numpy commit, but at > least on a reasonable schedule. > > I think having that kind of information readily available, and with > the ability to switch which numpy branch/commit those tests do get run > against, could be very valuable as an early warning system for numpy > to know if an apparently inconsequential change has unexpected side > effects downstream. > > In IPython we've really benefited greatly from our improved CI > infrastructure, but that only goes as far as catching *our own* > problems. ?This kind of downstream integration testing could be very > useful. > +1. Was thinking the same thing. My uninformed opinion from the sidelines: For me, this begged the question of why projects would wait so long and be upgrading 1.5.x -> 1.7.x. it sounded to me like an outreach problem. The whole point of having release candidates is so that downstream users (and especially big public downstream libraries) can test the release candidate and give feedback on any changes that affect them. This feedback step is especially crucial for a project without 100% test coverage (for new code and old)... Putting more restrictions on changes that can be made in releases doesn't seem to me to be the right fix, though, admittedly, numpy is a bit of a different beast than other projects. I would think you would want downstream projects not to wait 2 years to upgrade and skip a couple of minor releases. Skipper From wesmckinn at gmail.com Tue Jun 26 20:37:42 2012 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 26 Jun 2012 20:37:42 -0400 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 8:15 PM, Skipper Seabold wrote: > On Tue, Jun 26, 2012 at 7:59 PM, Fernando Perez wrote: >> On Tue, Jun 26, 2012 at 1:10 PM, Travis Oliphant wrote: >>> One issues is the one that Sage identified about the array interface >>> regression as noted by Jason. ? ?Any other regressions from 1.5.x need to be >>> addressed as well. ? 
?We'll have to decide on a case-by-case basis if there >>> are issues that conflict with 1.6.x behavior. >>> >> >> One thing this discussion made me think about, is that it would be >> great to identify a few key projects that: >> >> - use numpy heavily >> - have reasonably solid test suites >> >> and create a special build job that runs *those* test suites >> periodically. ?Not necessarily on every last numpy commit, but at >> least on a reasonable schedule. >> >> I think having that kind of information readily available, and with >> the ability to switch which numpy branch/commit those tests do get run >> against, could be very valuable as an early warning system for numpy >> to know if an apparently inconsequential change has unexpected side >> effects downstream. >> >> In IPython we've really benefited greatly from our improved CI >> infrastructure, but that only goes as far as catching *our own* >> problems. ?This kind of downstream integration testing could be very >> useful. >> > > +1. Was thinking the same thing. > > My uninformed opinion from the sidelines: For me, this begged the > question of why projects would wait so long and be upgrading 1.5.x -> > 1.7.x. it sounded to me like an outreach problem. The whole point of > having release candidates is so that downstream users (and especially > big public downstream libraries) can test the release candidate and > give feedback on any changes that affect them. This feedback step is > especially crucial for a project without 100% test coverage (for new > code and old)... Putting more restrictions on changes that can be made > in releases doesn't seem to me to be the right fix, though, > admittedly, numpy is a bit of a different beast than other projects. I > would think you would want downstream projects not to wait 2 years to > upgrade and skip a couple of minor releases. > > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion +1. We've begun running pandas's test suite internally against NumPy git master on Jenkins. It has already turned up bugs and behavior changes in a few short weeks. We should definitely do this on a more grand scale (especially since pandas 0.8.0 is now littered with hacks around NumPy 1.6 datetime bugs. fortunately nothing was fatally broken but it came close). - Wes From ondrej.certik at gmail.com Tue Jun 26 20:40:24 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 26 Jun 2012 17:40:24 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 4:59 PM, Fernando Perez wrote: > On Tue, Jun 26, 2012 at 1:10 PM, Travis Oliphant wrote: >> One issues is the one that Sage identified about the array interface >> regression as noted by Jason. ? ?Any other regressions from 1.5.x need to be >> addressed as well. ? ?We'll have to decide on a case-by-case basis if there >> are issues that conflict with 1.6.x behavior. >> > > One thing this discussion made me think about, is that it would be > great to identify a few key projects that: > > - use numpy heavily > - have reasonably solid test suites > > and create a special build job that runs *those* test suites > periodically. ?Not necessarily on every last numpy commit, but at > least on a reasonable schedule. 
> > I think having that kind of information readily available, and with > the ability to switch which numpy branch/commit those tests do get run > against, could be very valuable as an early warning system for numpy > to know if an apparently inconsequential change has unexpected side > effects downstream. I think that is a great idea. It would simply recompile numpy, but leave the other library intact, and then run some tests on the other library, which could be as simple as "import h5py", or more complicated. > > In IPython we've really benefited greatly from our improved CI > infrastructure, but that only goes as far as catching *our own* Do you use anything else besides Travis CI? I donated money to them and they enabled pull request testing for SymPy and it's invaluable. We also use our custom sympy-bot (https://github.com/sympy/sympy-bot) to test pull request, but now when Travis can do that, we might just use that. NumPy now has Travis for both master and pull requests and so it is pretty well covered. I need to setup some Jenkins instances for Windows (and Mac) testing, and then we can also add linux Jenkins instance to test numpy against a few other libraries. > problems. ?This kind of downstream integration testing could be very > useful. Ondrej From fperez.net at gmail.com Tue Jun 26 21:00:43 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 26 Jun 2012 18:00:43 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 5:40 PM, Ond?ej ?ert?k wrote: > Do you use anything else besides Travis CI? Yes, we use both Shining Panda and Travis CI: https://jenkins.shiningpanda.com/ipython/ http://travis-ci.org/#!/ipython/ipython The SP setup is more complete, including Mac and Windows bots. > I donated money to them and they enabled pull request > testing for SymPy and it's invaluable. We also use > our custom sympy-bot (https://github.com/sympy/sympy-bot) to test pull > request, but now > when Travis can do that, we might just use that. We have a version of that: after Aaron Meurer gave us an invaluable and detailed report on how you guys used it, Thomas Kluyver built for us our new test_pr script: https://github.com/ipython/ipython/blob/master/tools/test_pr.py which we regularly use now in most PRs, e.g.: https://github.com/ipython/ipython/pull/2015#issuecomment-6566387 It has proven to be *extremely* useful. This is some of the infrastructure that I hope we'll gradually start using across all the projects (the topic of some of the threads in the numfocus list). In IPython, our ability to rapidly absorb code has improved tremendously in part thanks to the smooth workflow these tools give us; just in the month of June we've merged 116 PRs totaling over 400 commits: (master)dreamweaver[ipython]> git log --oneline --since 2012-06-01 | grep "Merge pull request" | wc -l 116 (master)dreamweaver[ipython]> git log --oneline --since 2012-06-01 | wc -l 438 There's no way to keep that pace unless we can really trust our testing machinery to let us know what's safe by the time we get to code review. As our tools mature, I really hope we'll start using them more across different projects, because the benefit they provide is undeniable. Cheers, f From jbednar at inf.ed.ac.uk Tue Jun 26 21:39:02 2012 From: jbednar at inf.ed.ac.uk (James A. 
Bednar) Date: Wed, 27 Jun 2012 02:39:02 +0100 Subject: [Numpy-discussion] NumPy-Discussion Digest, Vol 69, Issue 77 In-Reply-To: References: Message-ID: <20458.25654.986363.327246@mckellar.inf.ed.ac.uk> | From: Rebekah Pratt | Date: Jun 25 23:59:59 2012 -0400 | | Hey, greetings from orlando! How has re entry to texas been? Our | holiday is going well, although a little fast. The whole disney | experience is cooler than I was expecting. We are all a bit tired | out though, and adah moody as anything. Anyway best get to sleep, | but wanted to say hi, and hope all going well with you. Sounds exciting! We've been limping along with very little sleep, but it's good to be home. Lots of sunshine and swimming, though 107 degrees F is not really all that fun. The kids are loving it here. Not much time to check email, so it's really piling up. Hope you have a good trip back... Jim -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. From matthew.brett at gmail.com Tue Jun 26 22:07:26 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Jun 2012 19:07:26 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: Hi, On Tue, Jun 26, 2012 at 6:00 PM, Fernando Perez wrote: > On Tue, Jun 26, 2012 at 5:40 PM, Ond?ej ?ert?k wrote: >> Do you use anything else besides Travis CI? > > Yes, we use both Shining Panda and Travis CI: > > https://jenkins.shiningpanda.com/ipython/ > http://travis-ci.org/#!/ipython/ipython > > The SP setup is more complete, including Mac and Windows bots. > >> I donated money to them and they enabled pull request >> testing for SymPy and it's invaluable. We also use >> our custom sympy-bot (https://github.com/sympy/sympy-bot) to test pull >> request, but now >> when Travis can do that, we might just use that. > > We have a version of that: after Aaron Meurer gave us an invaluable > and detailed report on how you guys used it, Thomas Kluyver built for > us our new test_pr script: > > https://github.com/ipython/ipython/blob/master/tools/test_pr.py > > which we regularly use now in most PRs, e.g.: > > https://github.com/ipython/ipython/pull/2015#issuecomment-6566387 > > It has proven to be *extremely* useful. > > This is some of the infrastructure that I hope we'll gradually start > using across all the projects (the topic of some of the threads in the > numfocus list). ?In IPython, our ability to rapidly absorb code has > improved tremendously in part thanks to the smooth workflow these > tools give us; just in the month of June we've merged 116 PRs totaling > over 400 commits: > > (master)dreamweaver[ipython]> git log --oneline --since 2012-06-01 | > grep "Merge pull request" | wc -l > 116 > > (master)dreamweaver[ipython]> git log --oneline --since 2012-06-01 | wc -l > 438 > > There's no way to keep that pace unless we can really trust our > testing machinery to let us know what's safe by the time we get to > code review. > > As our tools mature, I really hope we'll start using them more across > different projects, because the benefit they provide is undeniable. We (nipy'ers) are heavy users of numpy and scipy. 
We use travis-ci for testing individual commits to personal repos: https://github.com/nipy/nibabel/blob/master/.travis.yml (using standard travis-ci python test machinery, multiple python versions) https://github.com/nipy/dipy/blob/master/.travis.yml https://github.com/nipy/nipy/blob/master/.travis.yml (using a hack to test against a system python, to avoid multiple compiles of numpy / scipy). We've also been discussing numpy / scipy compiles on the Travis-CI mailing list : https://groups.google.com/d/topic/travis-ci/uJgu35XKdmI/discussion. For the main repos we use buildbot and test on: Ubuntu Maverick 32-bit Debian sid 64-bit OSX 10.4 PPC OSX 10.5 Intel Debian wheezy PPC Debian squeeze ARM (a Raspberry PI no less) WIndows XP 32 bit SPARC (courtesy of our friends at NeuroDebian) http://nipy.bic.berkeley.edu/builders We've found several issues with numpy using these, and I've fed them back as I found them, http://projects.scipy.org/numpy/ticket/2076 http://projects.scipy.org/numpy/ticket/2077 http://projects.scipy.org/numpy/ticket/2174 They are particularly useful for difficult to reproduce problems because they test often and leave a record that we can point to. As I've said before, y'all are welcome to use these machines for numpy builds / tests. Best, Matthew From matthew.brett at gmail.com Tue Jun 26 22:29:01 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Jun 2012 19:29:01 -0700 Subject: [Numpy-discussion] Matrix rank default tolerance - is it too low? In-Reply-To: References: Message-ID: Hi, On Tue, Jun 26, 2012 at 5:04 PM, Charles R Harris wrote: > > > On Tue, Jun 26, 2012 at 5:46 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Jun 26, 2012 at 4:39 PM, Benjamin Root wrote: >> > >> > >> > On Tuesday, June 26, 2012, Charles R Harris wrote: >> >> >> >> >> >> >> >> On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett >> >> >> >> wrote: >> >> >> >> Hi, >> >> >> >> On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett >> >> >> >> wrote: >> >> > Hi, >> >> > >> >> > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris >> >> > wrote: >> >> >> >> >> >> >> >> >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett >> >> >> >> >> >> wrote: >> >> >>> >> >> >>> Hi, >> >> >>> >> >> >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett >> >> >>> >> >> >>> wrote: >> >> >>> > Hi, >> >> >>> > >> >> >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith >> >> >>> > wrote: >> >> >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris >> >> >>> >> wrote: >> >> >>> >>> >> >> >>> >>> >> >> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett >> One potential problem is that it implies that it will always be the >> same as any version of matlab's tolerance. ?What if they change it in >> a future release? How likely are we to even notice? >> >> >> >> >>> >>> >> >> >>> >>> wrote: >> >> >>> >>>> >> >> >>> >>>> Hi, >> >> >>> >>>> >> >> >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full >> >> >>> >>>> rank >> >> >>> >>>> for >> >> >>> >>>> matrices that are numerically rank deficient: >> >> >>> >>>> >> >> >>> >>>> If I repeatedly make random matrices, then set the first >> >> >>> >>>> column >> >> >>> >>>> to be >> >> >>> >>>> equal to the sum of the second and third columns: >> >> >>> >>>> >> >> >>> >>>> def make_deficient(): >> >> >>> >>>> ? ?X = np.random.normal(size=(40, 10)) >> >> >>> >>>> ? ?deficient_X = X.copy() >> >> >>> >>>> ? ?deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2] >> >> >>> >>>> ? 
?return deficient_X >> >> >>> >>>> >> >> >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns >> >> >>> >>>> full >> >> >>> >>>> rank >> >> >>> >>>> (10) in about 8 percent of cases (see appended script). >> >> >>> >>>> >> >> >>> >>>> I think this is a tolerance problem. ?The ``matrix_rank`` >> >> >>> >>>> algorithm >> >> >>> >>>> does this by default: >> >> >>> >>>> >> >> >>> >>>> S = spl.svd(M, compute_uv=False) >> >> >>> >>>> tol = S.max() * np.finfo(S.dtype).eps >> >> >>> >>>> return np.sum(S > tol) >> >> >>> >>>> >> >> >>> >>>> I guess we'd we want the lowest tolerance that nearly always >> >> >>> >>>> or >> >> >>> >>>> always >> >> >>> >>>> identifies numerically rank deficient matrices. ?I suppose one >> >> >>> >>>> way of >> >> >>> >>>> looking at whether the tolerance is in the right range is to >> >> >>> >>>> compare >> >> >>> >>>> the calculated tolerance (``tol``) to the minimum singular >> >> >>> >>>> value >> >> >>> >>>> (``S.min()``) because S.min() in our case should be very small >> >> >>> >>>> and >> >> >>> >>>> indicate the rank deficiency. The mean value of tol / S.min() >> >> >>> >>>> for >> >> >>> >>>> the >> >> >>> >>>> current algorithm, across many iterations, is about 2.8. ?We >> >> >>> >>>> might >> >> >>> >>>> hope this value would be higher than 1, but not much higher, >> >> >>> >>>> otherwise >> >> >>> >>>> we might be rejecting too many columns. >> >> >>> >>>> >> >> >>> >>>> Our current algorithm for tolerance is the same as the 2-norm >> >> >>> >>>> of >> >> >>> >>>> M * >> >> >>> >>>> eps. ?We're citing Golub and Van Loan for this, but now I look >> >> >>> >>>> at >> >> >>> >>>> our >> >> >>> >>>> copy (p 261, last para) - they seem to be suggesting using u * >> >> >>> >>>> |M| >> >> >>> >>>> where u = (p 61, section 2.4.2) eps / ?2. (see [1]). I think >> >> >>> >>>> the >> >> >>> >>>> Golub >> >> >> >> >> >> I'm fine with that, and agree that it is likely to lead to fewer folks >> >> wondering why Matlab and numpy are different. A good explanation in the >> >> function documentation would be useful. >> >> >> >> Chuck >> >> >> > >> > One potential problem is that it implies that it will always be the same >> > as >> > any version of matlab's tolerance. ?What if they change it in a future >> > release? How likely are we to even notice? >> >> I guess that matlab is unlikely to change for the same reason that we >> would be reluctant to change, once we've found an acceptable value. >> >> I was thinking that we would say something like: >> >> """ >> The default tolerance is : >> >> ?tol = S.max() * np.finfo(M.dtype).eps * max((m, n)) >> >> This corresponds to the tolerance suggested in NR page X, and to the >> tolerance used by MATLAB at the time of writing (June 2012; see >> http://www.mathworks.com/help/techdoc/ref/rank.html). >> """ >> >> I don't know whether we would want to track changes made by matlab - >> maybe we could have that discussion if they do change? > > > I wouldn't bother tracking Matlab, but I think the alternative threshold > could be mentioned in the notes. Something like > > A less conservative threshold is ... > > Maybe mention that because of numerical uncertainty there will always be a > chance that the computed rank could be wrong, but that with the conservative > threshold the rank is very unlikely to be less than the computed rank. Sounds good to me. Would anyone object to a pull request with these changes (matlab tolerance default, description in docstring)? 
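To make the change concrete, the heart of it would look something like this (a sketch only, not the exact patch - error checking, the vector special case and the docstring update are left out):

import numpy as np

def matrix_rank_sketch(M, tol=None):
    M = np.asarray(M)
    S = np.linalg.svd(M, compute_uv=False)
    if tol is None:
        # Proposed default: scale eps by the largest singular value and by
        # the larger matrix dimension, matching MATLAB's rank().
        tol = S.max() * np.finfo(S.dtype).eps * max(M.shape)
    return np.sum(S > tol)

# The rank-deficient construction from earlier in the thread should then
# come out as 9 essentially every time, rather than 10 in ~8% of runs:
X = np.random.normal(size=(40, 10))
X[:, 0] = X[:, 1] + X[:, 2]
print(matrix_rank_sketch(X))
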
Cheers, Matthew From travis at continuum.io Tue Jun 26 23:13:54 2012 From: travis at continuum.io (Travis Oliphant) Date: Tue, 26 Jun 2012 22:13:54 -0500 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: <1E8CCAF9-20F6-48DB-B5E1-A8B74BC194F4@continuum.io> > For the main repos we use buildbot and test on: > > Ubuntu Maverick 32-bit > Debian sid 64-bit > OSX 10.4 PPC > OSX 10.5 Intel > Debian wheezy PPC > Debian squeeze ARM (a Raspberry PI no less) > WIndows XP 32 bit > SPARC (courtesy of our friends at NeuroDebian) > > http://nipy.bic.berkeley.edu/builders > > We've found several issues with numpy using these, and I've fed them > back as I found them, > > http://projects.scipy.org/numpy/ticket/2076 > http://projects.scipy.org/numpy/ticket/2077 > http://projects.scipy.org/numpy/ticket/2174 > > They are particularly useful for difficult to reproduce problems > because they test often and leave a record that we can point to. As > I've said before, y'all are welcome to use these machines for numpy > builds / tests. Now that Ondrej is working on getting continuous integration up for NumPy, I would encourage him to take you up on that offer. Can these machines run a Jenkins slave? Having periodic tests of Sage, Pandas, matplotlib, scipy, and other projects is a major priority and really critical before we can really talk about how to migrate the APIs. Thankfully, Ondrej is available to help get this project started and working this summer. -Travis > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Tue Jun 26 23:33:18 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Jun 2012 20:33:18 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: <1E8CCAF9-20F6-48DB-B5E1-A8B74BC194F4@continuum.io> References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> <1E8CCAF9-20F6-48DB-B5E1-A8B74BC194F4@continuum.io> Message-ID: Hi, On Tue, Jun 26, 2012 at 8:13 PM, Travis Oliphant wrote: >> For the main repos we use buildbot and test on: >> >> Ubuntu Maverick 32-bit >> Debian sid 64-bit >> OSX 10.4 PPC >> OSX 10.5 Intel >> Debian wheezy PPC >> Debian squeeze ARM (a Raspberry PI no less) >> WIndows XP 32 bit >> SPARC (courtesy of our friends at NeuroDebian) >> >> http://nipy.bic.berkeley.edu/builders >> >> We've found several issues with numpy using these, and I've fed them >> back as I found them, >> >> http://projects.scipy.org/numpy/ticket/2076 >> http://projects.scipy.org/numpy/ticket/2077 >> http://projects.scipy.org/numpy/ticket/2174 >> >> They are particularly useful for difficult to reproduce problems >> because they test often and leave a record that we can point to. ?As >> I've said before, y'all are welcome to use these machines for numpy >> builds / tests. > > Now that Ondrej is working on getting continuous integration up for NumPy, ?I would encourage him to take you up on that offer. ? Can these machines run a Jenkins slave? I believe so, but haven't tested. 
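The job such a slave would run needn't be much more than a thin driver around each project's own test suite - something like the sketch below (the package list is just an example, and it assumes each package exposes a numpy-style test() returning a nose result with wasSuccessful(), which isn't true everywhere):

import subprocess
import sys

# Downstream packages to exercise against the numpy that was just built;
# purely illustrative - use whatever the slave actually has installed.
PACKAGES = ['scipy', 'matplotlib', 'pandas']

def run_suite(name):
    # Run each suite in a fresh interpreter so a crash or segfault in one
    # package doesn't take the rest of the job down with it.
    code = ("import sys, %(p)s; "
            "sys.exit(0 if %(p)s.test().wasSuccessful() else 1)" % {'p': name})
    return subprocess.call([sys.executable, '-c', code])

failed = [p for p in PACKAGES if run_suite(p) != 0]
if failed:
    print('Downstream failures: ' + ', '.join(failed))
    sys.exit(1)
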
Best, Matthew From travis at continuum.io Wed Jun 27 01:08:14 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 27 Jun 2012 00:08:14 -0500 Subject: [Numpy-discussion] NumPy 1.7 release plans Message-ID: <25C91477-BEBA-42DE-AE84-7498C9426F2D@continuum.io> In my enthusiasm of finding someone to help with the release of NumPy 1.7 and my desire to get something released by the SciPy conference, I was hasty and didn't gather enough feedback from others about the release of NumPy 1.7. I'm sorry about that. I would like to get NumPy 1.7 out the door as quickly as we can *and* make sure it is as well tested as we can --- in fact I'm hoping we can also use this opportunity to set up a Continuous Integration system for NumPy that will essentially extend NumPy's testing infrastructure and make it easier to do releases in the future. Ondrej, the author of SymPy, has agreed to help on both the release and the Continuous Integration side. Ideally we would also start producing a code coverage report and a vbench report for NumPy as well. This is much more likely to happen if there are other people willing to pitch in (By the way, one of the goals of NumFOCUS is to provide Continuous Integration and code coverage resources to all of the Scientific Python projects as funds and community resources become available --- please email numfocus at googlegroups.com if you are interested in helping with that effort). So, I would propose a code-freeze by July 13th with a beta release of NumPy 1.7 by July 17th. We will work to get that beta release actively tested by as many projects as possible, leading to a release candidate by July 31. If all goes well I could imagine a release by August 14. If we need to make another release candidate, then we can do that August 14th and push the release to August 28th. Let me know if there are any concerns about this updated schedule. Best regards, -Travis From ralf.gommers at googlemail.com Wed Jun 27 01:25:16 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 27 Jun 2012 07:25:16 +0200 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <895DAD05-5B24-4EBC-84F4-10442A4531B3@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <895DAD05-5B24-4EBC-84F4-10442A4531B3@continuum.io> Message-ID: Travis, apologies in advance if the tone of this message is too strong - please take it as a sign of how frustrating I find the discussion on this point. On Tue, Jun 26, 2012 at 5:33 AM, Travis Oliphant wrote: ... > What should have happened in this case, in my mind, is that NumPy 1.4.0 > should have been 1.5.0 and advertised that there was a break in the ABI and > that all extensions would have to be re-built against the new version. > This would have been some pain for one class of users (primarily package > maintainers) and no pain for another class. Please please stop asserting this. It's plain wrong. It has been explained to you multiple times by multiple people how bad the consequences of breaking the ABI are. It leads to random segfaults when existing installers are not updated or when users pick the wrong installer by accident (which undoubtedly some will). It also leads to a large increase in the number of installers that maintainers for every single package that depends on numpy will have to build. Including for releases they've already made in the past. 
The assertion that users nowadays mainly use bundles like EPD or package managers is also extremely pointless. Last week NumPy had over 7000 downloads on SF alone; the cumulative total stands at almost 1.7 million. If even 0.1% of those downloads are of the wrong binary, that's 7 users *every week* with a very serious problem. API breakage is also bad, and I'm not going to argue here about which kind of breakage is worse. What I will point out though is that we now have datetime merged back in while keeping ABI compatibility, thanks to Mark's efforts. That shows it's hardly ever really necessary to break the ABI. Finally, it has been agreed several times on this list to not break the ABI for minor releases, period. Let's please stick to that decision. Best regards, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From klonuo at gmail.com Wed Jun 27 01:52:39 2012 From: klonuo at gmail.com (klo uo) Date: Wed, 27 Jun 2012 07:52:39 +0200 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: I'll try, thanks ^_^ As I said I'm learning mayavi, and wasn't expecting that drawing trivial objects like planes would be trouble. It seems it is, and the way to do this would be using triangular_mesh function. So I looked elsewhere, i.e. matplotlib and abused bars: ======================================== from numpy import arange, ones import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D fig = plt.figure() ax = fig.add_subplot(111, projection='3d') o = ones(4) r = arange(4) # planes: for z in arange(3)+1: ax.bar(r, o*4, zs=z, zdir='x', alpha=.05, width=1) ax.bar(r, o*4, zs=z, zdir='y', alpha=.05, width=1) ax.bar(r, o*4, zs=z, zdir='z', alpha=.05, width=1) # N for i in [1, 2]: ax.bar3d([i], [0], [i], [.9], [.1], [.9], color='y', linewidth=.1) ax.bar3d(o+(i*(-1)**i), o-1, r, o-.1, o-.9, o-.1, color='y', linewidth=.1) # cage ax.bar3d([0], [0], [0], [4], [4], [4], alpha=.05, color='w', linewidth=0) plt.show() # plt.savefig('numpy.png') ======================================== Results in attachment. Annotation stuff can be turned off in above case like this: ======================================== ax.grid(False) for a in (ax.w_xaxis, ax.w_yaxis, ax.w_zaxis): for t in a.get_ticklines()+a.get_ticklabels(): t.set_visible(False) a.line.set_visible(False) a.pane.set_visible(False) ======================================== I'll try later with mayavi again, simply because it can export to 3D format suitable for further enhancing and rendering, if needed. Default lights in matplotlib 3D scene seem too low and colors are pale compared to mayavi, and I'm not sure if there is way to tweak it. Hope to see other competitors with different approach ;) Cheers -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 52277 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image2.png Type: image/png Size: 85571 bytes Desc: not available URL: From fperez.net at gmail.com Wed Jun 27 01:59:49 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 26 Jun 2012 22:59:49 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <895DAD05-5B24-4EBC-84F4-10442A4531B3@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 10:25 PM, Ralf Gommers wrote: > > On Tue, Jun 26, 2012 at 5:33 AM, Travis Oliphant > wrote: > ... >> >> What should have happened in this case, in my mind, is that NumPy 1.4.0 >> should have been 1.5.0 and advertised that there was a break in the ABI and >> that all extensions would have to be re-built against the new version. >> ?This would have been some pain for one class of users (primarily package >> maintainers) and no pain for another class. > > > Please please stop asserting this. It's plain wrong. It has been explained > to you multiple times by multiple people how bad the consequences of > breaking the ABI are. It leads to random segfaults when existing installers > are not updated or when users pick the wrong installer by accident (which > undoubtedly some will). It also leads to a large increase in the number of > installers that maintainers for every single package that depends on numpy > will have to build. Including for releases they've already made in the past. An additional perspective on the issue of ABI breakage: even for those of us who live in a distro-managed universe (ubuntu in my case), the moment numpy breaks ABI means that it becomes *much* harder to use the new numpy because I'd have to start recompiling all binary dependencies, some of which are not pleasant to start rebuilding (such as VTK for mayavi). So that means I'm much less likely to use an ABI-incompatible numpy for everyday work, and therefore less likely to find bugs, report them, etc. I typically run dev versions of numpy, scipy and matplotlib all the time, except when numpy breaks ABI, which means I have to 'pin' numpy to the system one and only update the others. Now, obviously that doesn't mean that ABI can never be broken, but it's just another data point for you as you evaluate the cost of ABI breakage. It is significant even for those who operate under the benefit of managed packages, because numpy is effectively the root node of the dependency tree for virtually all scientific python packages. I hope this is useful as additional data on the issue. Cheers, f From travis at continuum.io Wed Jun 27 02:02:41 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 27 Jun 2012 01:02:41 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <895DAD05-5B24-4EBC-84F4-10442A4531B3@continuum.io> Message-ID: <2ADA61BB-148A-4BF8-B23B-BB1852C56801@continuum.io> I do understand the issues around ABI breakage. I just want to speak up for the people who are affected by API breakage who are not as vocal on this list. I believe we should have similar frustration and concern at talk of API breakage as there is about talk of ABI breakage. 
-Travis On Jun 27, 2012, at 12:59 AM, Fernando Perez wrote: > On Tue, Jun 26, 2012 at 10:25 PM, Ralf Gommers > wrote: >> >> On Tue, Jun 26, 2012 at 5:33 AM, Travis Oliphant >> wrote: >> ... >>> >>> What should have happened in this case, in my mind, is that NumPy 1.4.0 >>> should have been 1.5.0 and advertised that there was a break in the ABI and >>> that all extensions would have to be re-built against the new version. >>> This would have been some pain for one class of users (primarily package >>> maintainers) and no pain for another class. >> >> >> Please please stop asserting this. It's plain wrong. It has been explained >> to you multiple times by multiple people how bad the consequences of >> breaking the ABI are. It leads to random segfaults when existing installers >> are not updated or when users pick the wrong installer by accident (which >> undoubtedly some will). It also leads to a large increase in the number of >> installers that maintainers for every single package that depends on numpy >> will have to build. Including for releases they've already made in the past. > > An additional perspective on the issue of ABI breakage: even for those > of us who live in a distro-managed universe (ubuntu in my case), the > moment numpy breaks ABI means that it becomes *much* harder to use the > new numpy because I'd have to start recompiling all binary > dependencies, some of which are not pleasant to start rebuilding (such > as VTK for mayavi). So that means I'm much less likely to use an > ABI-incompatible numpy for everyday work, and therefore less likely to > find bugs, report them, etc. I typically run dev versions of numpy, > scipy and matplotlib all the time, except when numpy breaks ABI, > which means I have to 'pin' numpy to the system one and only update > the others. > > Now, obviously that doesn't mean that ABI can never be broken, but > it's just another data point for you as you evaluate the cost of ABI > breakage. It is significant even for those who operate under the > benefit of managed packages, because numpy is effectively the root > node of the dependency tree for virtually all scientific python > packages. > > I hope this is useful as additional data on the issue. > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From klonuo at gmail.com Wed Jun 27 02:04:48 2012 From: klonuo at gmail.com (klo uo) Date: Wed, 27 Jun 2012 08:04:48 +0200 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: Damn it, N is inverted and I noticed it now after posting. 
Sorry about that, here is correct one: ======================================== from numpy import arange, ones import matplotlib.pyplot as plt from mpl_toolkits.mplot3d import Axes3D fig = plt.figure() ax = fig.add_subplot(111, projection='3d') o = ones(4) r = arange(4) # planes: for z in arange(3)+1: ax.bar(r, o*4, zs=z, zdir='x', alpha=.05, width=1) ax.bar(r, o*4, zs=z, zdir='y', alpha=.05, width=1) ax.bar(r, o*4, zs=z, zdir='z', alpha=.05, width=1) # N for i in [1, 2]: ax.bar3d([3-i], [0], [i], [.9], [.1], [.9], color='y', linewidth=.1) ax.bar3d(o+(i*(-1)**i), o-1, r, o-.1, o-.9, o-.1, color='y', linewidth=.1) # cage ax.bar3d([0], [0], [0], [4], [4], [4], alpha=.05, color='w', linewidth=0) plt.show() # plt.savefig('numpy.png') ======================================== From fperez.net at gmail.com Wed Jun 27 02:18:02 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 26 Jun 2012 23:18:02 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <2ADA61BB-148A-4BF8-B23B-BB1852C56801@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <895DAD05-5B24-4EBC-84F4-10442A4531B3@continuum.io> <2ADA61BB-148A-4BF8-B23B-BB1852C56801@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 11:02 PM, Travis Oliphant wrote: > ?I just want to speak up for the people who are affected by API breakage who are not as vocal on this list. Certainly! And indeed I bet you that's a community underrepresented here: those of us who are on this list are likely to be up to speed on what's happening with the API and can therefore adjust to changes quickly, simply because we know they have occurred. Random J. User who gets an upstream update and all of a sudden finds previously working code to break is unlikely to be active here and will be very, very unhappy. If anything, the lesson is: for a project that's so deep in the dependency tree as numpy is, A{P,B}I stability is a paramount concern, with a cost that gets higher the more successful the project is. This means AXIs should evolve only in backwards-compatible ways when at all possible, with backwards-compatibility being broken only in: - clearly designated points that are agreed upon by as many as possible - with clear explanations of how old codes need to be adapted to the new interface to continue working - if at all possible with advance warnings, and even better, a system for 'future' loading. Python in fact has the __future__ imports that help quite a bit for people to start adapting their codes. How about creating a numpy.future module where new, non-backward-compatible APIs could go? That would give the adventurous a way to play with new features (hence getting them better tested) as well as an easier path for gradual migration to the new features by everyone. This may have already been discussed before, forgive me if I'm repeating well-known material. 
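Just to make the idea concrete, downstream code could then opt in explicitly, along these lines (entirely hypothetical - no numpy.future module exists today, and matrix_rank is only picked as an example because its default is being debated elsewhere on this list):

# Hypothetical opt-in pattern; numpy.future does not exist.
try:
    # New, possibly backwards-incompatible behaviour, for the adventurous.
    from numpy.future import matrix_rank
except ImportError:
    # Fall back to the current, stable API.
    from numpy.linalg import matrix_rank
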
Cheers, f From ondrej.certik at gmail.com Wed Jun 27 02:55:24 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 26 Jun 2012 23:55:24 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: Hi Fernando, On Tue, Jun 26, 2012 at 6:00 PM, Fernando Perez wrote: > On Tue, Jun 26, 2012 at 5:40 PM, Ond?ej ?ert?k wrote: >> Do you use anything else besides Travis CI? > > Yes, we use both Shining Panda and Travis CI: > > https://jenkins.shiningpanda.com/ipython/ With NumPy, I am still there in the waitinglist, number 92. So I guess it will take a while. > http://travis-ci.org/#!/ipython/ipython > > The SP setup is more complete, including Mac and Windows bots. > >> I donated money to them and they enabled pull request >> testing for SymPy and it's invaluable. We also use >> our custom sympy-bot (https://github.com/sympy/sympy-bot) to test pull >> request, but now >> when Travis can do that, we might just use that. > > We have a version of that: after Aaron Meurer gave us an invaluable > and detailed report on how you guys used it, Thomas Kluyver built for > us our new test_pr script: > > https://github.com/ipython/ipython/blob/master/tools/test_pr.py > > which we regularly use now in most PRs, e.g.: > > https://github.com/ipython/ipython/pull/2015#issuecomment-6566387 > > It has proven to be *extremely* useful. I see, yes. > > This is some of the infrastructure that I hope we'll gradually start > using across all the projects (the topic of some of the threads in the > numfocus list). ?In IPython, our ability to rapidly absorb code has > improved tremendously in part thanks to the smooth workflow these > tools give us; just in the month of June we've merged 116 PRs totaling > over 400 commits: > > (master)dreamweaver[ipython]> git log --oneline --since 2012-06-01 | > grep "Merge pull request" | wc -l > 116 > > (master)dreamweaver[ipython]> git log --oneline --since 2012-06-01 | wc -l > 438 > > There's no way to keep that pace unless we can really trust our > testing machinery to let us know what's safe by the time we get to > code review. > > As our tools mature, I really hope we'll start using them more across > different projects, because the benefit they provide is undeniable. Thanks for the write up. Yes, I have exactly the same experience with SymPy's pull requests. So I have personally spent a lot of time trying to streamline the process and making sure that we can trust it and so on. My bet is that Travis CI will be *the* tool of choice for most projects at least on linux, to test the pull requests. Ondrej From ondrej.certik at gmail.com Wed Jun 27 02:57:51 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Tue, 26 Jun 2012 23:57:51 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: Hi Matthew, On Tue, Jun 26, 2012 at 7:07 PM, Matthew Brett wrote: > Hi, > > On Tue, Jun 26, 2012 at 6:00 PM, Fernando Perez wrote: >> On Tue, Jun 26, 2012 at 5:40 PM, Ond?ej ?ert?k wrote: >>> Do you use anything else besides Travis CI? >> >> Yes, we use both Shining Panda and Travis CI: >> >> https://jenkins.shiningpanda.com/ipython/ >> http://travis-ci.org/#!/ipython/ipython >> >> The SP setup is more complete, including Mac and Windows bots. 
>> >>> I donated money to them and they enabled pull request >>> testing for SymPy and it's invaluable. We also use >>> our custom sympy-bot (https://github.com/sympy/sympy-bot) to test pull >>> request, but now >>> when Travis can do that, we might just use that. >> >> We have a version of that: after Aaron Meurer gave us an invaluable >> and detailed report on how you guys used it, Thomas Kluyver built for >> us our new test_pr script: >> >> https://github.com/ipython/ipython/blob/master/tools/test_pr.py >> >> which we regularly use now in most PRs, e.g.: >> >> https://github.com/ipython/ipython/pull/2015#issuecomment-6566387 >> >> It has proven to be *extremely* useful. >> >> This is some of the infrastructure that I hope we'll gradually start >> using across all the projects (the topic of some of the threads in the >> numfocus list). ?In IPython, our ability to rapidly absorb code has >> improved tremendously in part thanks to the smooth workflow these >> tools give us; just in the month of June we've merged 116 PRs totaling >> over 400 commits: >> >> (master)dreamweaver[ipython]> git log --oneline --since 2012-06-01 | >> grep "Merge pull request" | wc -l >> 116 >> >> (master)dreamweaver[ipython]> git log --oneline --since 2012-06-01 | wc -l >> 438 >> >> There's no way to keep that pace unless we can really trust our >> testing machinery to let us know what's safe by the time we get to >> code review. >> >> As our tools mature, I really hope we'll start using them more across >> different projects, because the benefit they provide is undeniable. > > We (nipy'ers) are heavy users of numpy and scipy. > > We use travis-ci for testing individual commits to personal repos: > > https://github.com/nipy/nibabel/blob/master/.travis.yml > > (using standard travis-ci python test machinery, multiple python versions) > > https://github.com/nipy/dipy/blob/master/.travis.yml > https://github.com/nipy/nipy/blob/master/.travis.yml > > (using a hack to test against a system python, to avoid multiple > compiles of numpy / scipy). ?We've also been discussing numpy / scipy > compiles on the Travis-CI mailing list : > https://groups.google.com/d/topic/travis-ci/uJgu35XKdmI/discussion. > > For the main repos we use buildbot and test on: > > Ubuntu Maverick 32-bit > Debian sid 64-bit > OSX 10.4 PPC > OSX 10.5 Intel > Debian wheezy PPC > Debian squeeze ARM (a Raspberry PI no less) > WIndows XP 32 bit > SPARC (courtesy of our friends at NeuroDebian) > > http://nipy.bic.berkeley.edu/builders > > We've found several issues with numpy using these, and I've fed them > back as I found them, > > http://projects.scipy.org/numpy/ticket/2076 > http://projects.scipy.org/numpy/ticket/2077 > http://projects.scipy.org/numpy/ticket/2174 > > They are particularly useful for difficult to reproduce problems > because they test often and leave a record that we can point to. ?As > I've said before, y'all are welcome to use these machines for numpy > builds / tests. This is amazing, thanks a lot for the email. I'll talk to you offlist. 
Thanks, Ondrej From cgohlke at uci.edu Wed Jun 27 03:38:48 2012 From: cgohlke at uci.edu (Christoph Gohlke) Date: Wed, 27 Jun 2012 00:38:48 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: <1E8CCAF9-20F6-48DB-B5E1-A8B74BC194F4@continuum.io> References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> <1E8CCAF9-20F6-48DB-B5E1-A8B74BC194F4@continuum.io> Message-ID: <4FEAB888.7070807@uci.edu> On 6/26/2012 8:13 PM, Travis Oliphant wrote: >> For the main repos we use buildbot and test on: >> >> Ubuntu Maverick 32-bit >> Debian sid 64-bit >> OSX 10.4 PPC >> OSX 10.5 Intel >> Debian wheezy PPC >> Debian squeeze ARM (a Raspberry PI no less) >> WIndows XP 32 bit >> SPARC (courtesy of our friends at NeuroDebian) >> >> http://nipy.bic.berkeley.edu/builders >> >> We've found several issues with numpy using these, and I've fed them >> back as I found them, >> >> http://projects.scipy.org/numpy/ticket/2076 >> http://projects.scipy.org/numpy/ticket/2077 >> http://projects.scipy.org/numpy/ticket/2174 >> >> They are particularly useful for difficult to reproduce problems >> because they test often and leave a record that we can point to. As >> I've said before, y'all are welcome to use these machines for numpy >> builds / tests. > > Now that Ondrej is working on getting continuous integration up for NumPy, I would encourage him to take you up on that offer. Can these machines run a Jenkins slave? > > Having periodic tests of Sage, Pandas, matplotlib, scipy, and other projects is a major priority and really critical before we can really talk about how to migrate the APIs. Thankfully, Ondrej is available to help get this project started and working this summer. > > -Travis > FWIW: I can relatively easy (batch script) build numpy from github and run the test suites of many packages available at against it. For example at are the test results of assimulo, bitarray, bottleneck, h5py, matplotlib, numexpr, pandas, pygame, scipy, skimage, sklearn, statsmodels, and pytables, built against numpy-1.6.x and run against numpy-1.7.0.dev-66bd39f on win-amd64-py2.7. Christoph From klonuo at gmail.com Wed Jun 27 05:40:42 2012 From: klonuo at gmail.com (klo uo) Date: Wed, 27 Jun 2012 11:40:42 +0200 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: I continued in this mpl trip, with small animation sequence: ======================================== # animation ax.view_init(90,-90) plt.ion() plt.draw() plt.show() for l in arange(25): ax.set_xlim3d(1.5-.1*l,2.5+.1*l) ax.set_ylim3d(1.5-.1*l,2.5+.1*l) ax.view_init(90-3*l, -90+l) plt.draw() plt.title("NumPy") plt.ioff() plt.show() ======================================== Try it or check it out on YouTube: www.youtube.com/watch?v=mpYPS_zXAFw Whole script in attachment -------------- next part -------------- A non-text attachment was scrubbed... Name: nl.py Type: application/octet-stream Size: 1058 bytes Desc: not available URL: From brennan.williams at visualreservoir.com Wed Jun 27 05:51:48 2012 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Wed, 27 Jun 2012 10:51:48 +0100 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: <4FEAD7B4.1010004@visualreservoir.com> You're now reminding me of the old spinning SGI logo.... 
http://www.youtube.com/watch?v=Nqf6TjE49N8 Brennan On 27/06/2012 10:40 a.m., klo uo wrote: > I continued in this mpl trip, with small animation sequence: > > ======================================== > # animation > ax.view_init(90,-90) > plt.ion() > plt.draw() > plt.show() > > for l in arange(25): > ax.set_xlim3d(1.5-.1*l,2.5+.1*l) > ax.set_ylim3d(1.5-.1*l,2.5+.1*l) > ax.view_init(90-3*l, -90+l) > plt.draw() > > plt.title("NumPy") > plt.ioff() > plt.show() > ======================================== > > Try it or check it out on YouTube: www.youtube.com/watch?v=mpYPS_zXAFw > > Whole script in attachment > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From klonuo at gmail.com Wed Jun 27 06:09:00 2012 From: klonuo at gmail.com (klo uo) Date: Wed, 27 Jun 2012 12:09:00 +0200 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: <4FEAD7B4.1010004@visualreservoir.com> References: <4FEAD7B4.1010004@visualreservoir.com> Message-ID: Yeah, camera is in cliche, I know :D Something more original can be done, perhaps some idea of transforming grid in 2D (in Z plane) for opening sequence and then emerging latices in some analogy with numpy arrays, finishing with complete figure, but I guess not in matplotlib ;) From charlesr.harris at gmail.com Wed Jun 27 07:45:18 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 Jun 2012 05:45:18 -0600 Subject: [Numpy-discussion] NumPy 1.7 release plans In-Reply-To: <25C91477-BEBA-42DE-AE84-7498C9426F2D@continuum.io> References: <25C91477-BEBA-42DE-AE84-7498C9426F2D@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 11:08 PM, Travis Oliphant wrote: > In my enthusiasm of finding someone to help with the release of NumPy 1.7 > and my desire to get something released by the SciPy conference, I was > hasty and didn't gather enough feedback from others about the release of > NumPy 1.7. I'm sorry about that. > > I would like to get NumPy 1.7 out the door as quickly as we can *and* make > sure it is as well tested as we can --- in fact I'm hoping we can also use > this opportunity to setup a Continuous Integration system for NumPy that > will essentially extend NumPy's testing infrastructure and make it easier > to do releases in the future. Ondrej, the author of SymPy, has agreed to > help on the both the release and the Continuous Integration side. Ideally > we would also start producing a code coverage report and a vbench report > for NumPy as well. This is much more likely to happen if there are other > people willing to pictch in > > (By the way, one of the goals of NumFOCUS is to provide Continuous > Integration and code coverage resources to all of the Scientific Python > projects as funds and community resources become available --- please email > numfocus at googlegroups.com if you are interested in helping with that > effort). > > So, I would propose a code-freeze by July 13th with a beta release of > NumPy 1.7 by July 17th. We will work to get that beta release actively > tested by as many projects as possible, leading to a release candidate by > July 31. If all goes well I could imagine a release by August 14. If > we need to make another release candidate, then we can do that August 14th > and push the release to August 28th. > > Let me know if there are any concerns about this updated schedule. > > That schedule sounds good to me. 
I thought Nathaniel did excellent work in getting tox and Travis CI started up. Kudos there. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Jun 27 08:28:37 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 27 Jun 2012 07:28:37 -0500 Subject: [Numpy-discussion] NumPy 1.7 release plans In-Reply-To: References: <25C91477-BEBA-42DE-AE84-7498C9426F2D@continuum.io> Message-ID: <0F8F6670-59D4-4844-BD30-D90E5B17AA18@continuum.io> Great. Yes, the Travis-CI stuff looks great. There are a lot of good CI things happening on a lot of fronts. It is encouraging to see. It would be good to consolidate them --- or at least have a place to go to look at output from many of them. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jun 27, 2012, at 6:45 AM, Charles R Harris wrote: > > > On Tue, Jun 26, 2012 at 11:08 PM, Travis Oliphant wrote: > In my enthusiasm of finding someone to help with the release of NumPy 1.7 and my desire to get something released by the SciPy conference, I was hasty and didn't gather enough feedback from others about the release of NumPy 1.7. I'm sorry about that. > > I would like to get NumPy 1.7 out the door as quickly as we can *and* make sure it is as well tested as we can --- in fact I'm hoping we can also use this opportunity to setup a Continuous Integration system for NumPy that will essentially extend NumPy's testing infrastructure and make it easier to do releases in the future. Ondrej, the author of SymPy, has agreed to help on the both the release and the Continuous Integration side. Ideally we would also start producing a code coverage report and a vbench report for NumPy as well. This is much more likely to happen if there are other people willing to pictch in > > (By the way, one of the goals of NumFOCUS is to provide Continuous Integration and code coverage resources to all of the Scientific Python projects as funds and community resources become available --- please email numfocus at googlegroups.com if you are interested in helping with that effort). > > So, I would propose a code-freeze by July 13th with a beta release of NumPy 1.7 by July 17th. We will work to get that beta release actively tested by as many projects as possible, leading to a release candidate by July 31. If all goes well I could imagine a release by August 14. If we need to make another release candidate, then we can do that August 14th and push the release to August 28th. > > Let me know if there are any concerns about this updated schedule. > > > That schedule sounds good to me. > > I thought Nathaniel did excellent work in getting tox and Travis CI started up. Kudos there. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jniehof at lanl.gov Wed Jun 27 08:58:44 2012 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Wed, 27 Jun 2012 06:58:44 -0600 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: <4FEB0384.7020507@lanl.gov> On 06/26/2012 06:15 PM, Skipper Seabold wrote: > My uninformed opinion from the sidelines: For me, this begged the > question of why projects would wait so long and be upgrading 1.5.x -> > 1.7.x. 
it sounded to me like an outreach problem.

lenny: none
squeeze: 1.4.1
wheezy: 1.6.2
hardy: 1.0.4
lucid: 1.3.0
precise: 1.6.1

Even those of us who are developing downstream aren't necessarily rushing to hand-roll all the dependencies. Our users certainly aren't and the alternatives of abandoning support for earlier versions of numpy or having a bunch of version tests aren't appealing, either. I am somewhat surprised by the notion of breaking API compatibility without changing the major version number. -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From jdh2358 at gmail.com Wed Jun 27 09:20:24 2012 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 27 Jun 2012 08:20:24 -0500 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: > Some examples would be nice. A lot of people did move already. And I haven't > seen reports of those that tried and got stuck. Also, Debian and Python(x, > y) have 1.6.2, EPD has 1.6.1. In my company, the numpy for our production python install is well behind 1.6. In the world of trading, the upgrade cycle can be slow, because when people have production trading systems that are working and running stably, they have little or no incentive to upgrade. I know Travis has been doing a lot of consulting inside major banks and investment houses, and these are probably the kinds of people he sees regularly. You also have a fair amount of personnel turnover over the years, so that the developer who wrote the trading system may have moved on, and an upgrade which breaks the code is difficult to repair because the original developers are gone. So people are loathe to upgrade. 
It is certainly true that deprecations that have lived for a > single point release cycle have not been vetted by a large part of the > user community. > I'd also venture a guess that many of those installations don't have adequate test suites. > > In my group, we try to stay as close to the bleeding edge as possible > so as to not fall behind and make an upgrade painful, but we are not > the rule. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwang at continuum.io Wed Jun 27 10:43:22 2012 From: pwang at continuum.io (Peter Wang) Date: Wed, 27 Jun 2012 09:43:22 -0500 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Wed, Jun 27, 2012 at 9:33 AM, Charles R Harris wrote: > On Wed, Jun 27, 2012 at 7:20 AM, John Hunter wrote: >> because the original developers are gone. ?So people are loathe to >> upgrade. ?It is certainly true that deprecations that have lived for a >> single point release cycle have not been vetted by a large part of the >> user community. > > I'd also venture a guess that many of those installations don't have > adequate test suites. Depends how you define "adequate". If these companies stopped being able to make money, model science, or serve up web applications due to the lack of a test suite, then they would immediately put all their efforts on it. But for users for whom Numpy (and software in general) is merely a means to an end, the management of technical debt and future technology risk is merely one component of all the risk factors facing the organization. Every hour spent managing code and technical debt is an hour of lost business opportunity, and that balance is very conscientiously weighed and oftentimes the decision is not in the direction of quality of software process. In my experience, it's a toss up.. most people have reasonable unit tests and small integration tests, but large scale smoke tests can be very difficult to maintain or to justify to upper management. Because Numpy can be both a component of larger software or a direct tool in its own right, I've found that it makes a big difference whether an organization sees code as a means to an end, or an ends unto itself. -Peter From charlesr.harris at gmail.com Wed Jun 27 11:13:36 2012 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 Jun 2012 09:13:36 -0600 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> Message-ID: On Wed, Jun 27, 2012 at 8:43 AM, Peter Wang wrote: > On Wed, Jun 27, 2012 at 9:33 AM, Charles R Harris > wrote: > > On Wed, Jun 27, 2012 at 7:20 AM, John Hunter wrote: > >> because the original developers are gone. So people are loathe to > >> upgrade. It is certainly true that deprecations that have lived for a > >> single point release cycle have not been vetted by a large part of the > >> user community. > > > > I'd also venture a guess that many of those installations don't have > > adequate test suites. > > Depends how you define "adequate". If these companies stopped being > able to make money, model science, or serve up web applications due to > the lack of a test suite, then they would immediately put all their > efforts on it. 
But for users for whom Numpy (and software in general) > is merely a means to an end, the management of technical debt and > future technology risk is merely one component of all the risk factors > facing the organization. Every hour spent managing code and technical > debt is an hour of lost business opportunity, and that balance is very > conscientiously weighed and oftentimes the decision is not in the > direction of quality of software process. > > In my experience, it's a toss up.. most people have reasonable unit > tests and small integration tests, but large scale smoke tests can be > very difficult to maintain or to justify to upper management. Because > Numpy can be both a component of larger software or a direct tool in > its own right, I've found that it makes a big difference whether an > organization sees code as a means to an end, or an ends unto itself. > > Yep. What I meant by adequate was adequate for safely upgrading/refactoring. My experience is that folks will stay with what they have as long as it gets the job done. Competition can drive changes, but even that can be an unreliable prod as it may take several years before losing an edge begins to hurt. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From vs at it.uu.se Wed Jun 27 11:20:53 2012 From: vs at it.uu.se (Virgil Stokes) Date: Wed, 27 Jun 2012 17:20:53 +0200 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: <4FEB24D5.6040805@it.uu.se> On 27-Jun-2012 11:40, klo uo wrote: > I continued in this mpl trip, with small animation sequence: > > ======================================== > # animation > ax.view_init(90,-90) > plt.ion() > plt.draw() > plt.show() > > for l in arange(25): > ax.set_xlim3d(1.5-.1*l,2.5+.1*l) > ax.set_ylim3d(1.5-.1*l,2.5+.1*l) > ax.view_init(90-3*l, -90+l) > plt.draw() > > plt.title("NumPy") > plt.ioff() > plt.show() > ======================================== > > Try it or check it out on YouTube: www.youtube.com/watch?v=mpYPS_zXAFw > > Whole script in attachment > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion +1 --- looks good. --V -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Jun 27 11:37:28 2012 From: travis at continuum.io (Travis Oliphant) Date: Wed, 27 Jun 2012 10:37:28 -0500 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: <4FEB24D5.6040805@it.uu.se> References: <4FEB24D5.6040805@it.uu.se> Message-ID: <9200870E-42A5-4F14-A769-CF733E52E968@continuum.io> This is cool. It would be nice to put these things somewhere where they could be available for reference. 
-Travis On Jun 27, 2012, at 10:20 AM, Virgil Stokes wrote: > On 27-Jun-2012 11:40, klo uo wrote: >> I continued in this mpl trip, with small animation sequence: >> >> ======================================== >> # animation >> ax.view_init(90,-90) >> plt.ion() >> plt.draw() >> plt.show() >> >> for l in arange(25): >> ax.set_xlim3d(1.5-.1*l,2.5+.1*l) >> ax.set_ylim3d(1.5-.1*l,2.5+.1*l) >> ax.view_init(90-3*l, -90+l) >> plt.draw() >> >> plt.title("NumPy") >> plt.ioff() >> plt.show() >> ======================================== >> >> Try it or check it out on YouTube: www.youtube.com/watch?v=mpYPS_zXAFw >> >> Whole script in attachment >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > +1 --- looks good. > --V > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jun 27 12:25:45 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Jun 2012 17:25:45 +0100 Subject: [Numpy-discussion] What's the most numpythonic way to support multiple types in a C extension? In-Reply-To: References: Message-ID: On Tue, Jun 26, 2012 at 10:53 PM, John Salvatier wrote: > I want to support multiple types in the index_increment function that I've > written here: > https://github.com/jsalvatier/numpy/blob/master/numpy/core/src/multiarray/mapping.c > > I need to check that the first argument's type can support addition, cast > the dataptr to the appropriate type and do the addition operation for that > type. It looks like some of the numpy code uses .c.src files to do > templating. Is that what I want to do here? Is the syntax described > somewhere? The proper way would be use the ufunc machinery, which already knows how to perform addition on arbitrary numpy dtypes... unfortunately this may be more complicated than you are hoping :-/. Since there's nothing about this operation that is specific to the addition operation or to the double type, I guess the ideal API would actually be something like, an extra method added to binary ufuncs np.add.inplace_indexed(a, idx, b) which would be equivalent to a[idx] += b except that duplicate indices would be handled properly, and it would avoid making a copy in the case of fancy indexing. You could look at the implementation of ufunc.reduceat (numpy/core/src/umath/ufunc_object.c:PyUFunc_Reduceat) for an idea of how such fancy ufunc methods can be done. (An even more ideal API would find some way to make this work naturally with where=, but it's not obvious to me how that would work.) 
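A minimal sketch of the semantic difference being discussed, using only plain NumPy (inplace_indexed above is a proposal, not an existing method, and the bincount trick below only covers this simple 1-d integer-index case):

import numpy as np

a = np.zeros(3)
idx = np.array([0, 0, 1])          # note the repeated index 0
b = np.ones(3)

# Fancy-indexed in-place add: index 0 gets written only once, because
# a[idx] += b works on a copy of the selected elements.
a[idx] += b
print(a)                           # [ 1.  1.  0.]

# Accumulating semantics that an indexed-increment function would give,
# written as an explicit loop for clarity:
c = np.zeros(3)
for i, v in zip(idx, b):
    c[i] += v
print(c)                           # [ 2.  1.  0.]

# Same result without the Python loop, for this simple case:
d = np.bincount(idx, weights=b, minlength=3)
print(d)                           # [ 2.  1.  0.]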
-n From vs at it.uu.se Wed Jun 27 12:34:38 2012 From: vs at it.uu.se (Virgil Stokes) Date: Wed, 27 Jun 2012 18:34:38 +0200 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: References: Message-ID: <4FEB361E.8090509@it.uu.se> On 27-Jun-2012 08:04, klo uo wrote: > from numpy import arange, ones > import matplotlib.pyplot as plt > from mpl_toolkits.mplot3d import Axes3D > > fig = plt.figure() > ax = fig.add_subplot(111, projection='3d') > > o = ones(4) > r = arange(4) > > # planes: > for z in arange(3)+1: > ax.bar(r, o*4, zs=z, zdir='x', alpha=.05, width=1) > ax.bar(r, o*4, zs=z, zdir='y', alpha=.05, width=1) > ax.bar(r, o*4, zs=z, zdir='z', alpha=.05, width=1) > > # N > for i in [1, 2]: > ax.bar3d([3-i], [0], [i], [.9], [.1], [.9], color='y', linewidth=.1) > ax.bar3d(o+(i*(-1)**i), o-1, r, o-.1, o-.9, o-.1, color='y', linewidth=.1) > > # cage > ax.bar3d([0], [0], [0], [4], [4], [4], alpha=.05, color='w', linewidth=0) > > plt.show() > # plt.savefig('numpy.png') Umh... The first version that you posted looks ok on my screen (N is not inverted). And this version shows no difference in the "N"; but, it does show tick marks labeled with numerical values. --V From njs at pobox.com Wed Jun 27 12:52:29 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Jun 2012 17:52:29 +0100 Subject: [Numpy-discussion] API policy In-Reply-To: References: <905AFE89-FA0B-48E0-A5C3-13A39BDC55E1@continuum.io> Message-ID: On Tue, Jun 26, 2012 at 10:02 PM, Ralf Gommers wrote: > I agree with what you're arguing for here (as little impact as possible on > existing users), but your view of especially 1.6.x seems to be skewed by > regressions and changes that were either unintended or thought to be okay > because the affected numpy behavior was undocumented / off-label / untested. > The poor test coverage being the number one culprit (example regression: > http://projects.scipy.org/numpy/ticket/2078). Thanks for the reminder... https://github.com/numpy/numpy/pull/323 -n From njs at pobox.com Wed Jun 27 14:17:23 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Jun 2012 19:17:23 +0100 Subject: [Numpy-discussion] Combined versus separate build Message-ID: Currently the numpy build system(s) support two ways of building numpy: either by compiling a giant concatenated C file, or by the more conventional route of first compiling each .c file to a .o file, and then linking those together. I gather from comments in the source code that the former is the traditional method, and the latter is the newer "experimental" approach. It's easy to break one of these builds without breaking the other (I just did this with the NA branch, and David had to clean up after me), and I don't see what value we really get from having both options -- it seems to just double the size of the test matrix without adding value. Now that the separate build seems to be fully supported, maybe it's time to finish the "experiment" and pick one approach to support going forward? I guess the arguments for each would be: - The monolithic build in principle allows for some extra intra-procedural optimization. I won't believe this until I see benchmarks, though; numpy doesn't have a lot of tiny inline-able function calls or anything like that. - The separate build is probably more convenient for developers, allowing faster rebuilds. Numpy builds fast enough for me that I'm not too worried about which approach we use, but it definitely seems worthwhile to reduce the number of configurations we have to support one way or the other. 
-N From njs at pobox.com Wed Jun 27 14:22:10 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Jun 2012 19:22:10 +0100 Subject: [Numpy-discussion] Non-deterministic test failure in master Message-ID: According to the Travis-CI build logs, this code produces non-deterministic behaviour in master: a = np.arange(5) a[:3] = a[2:] assert_equal(a, [2, 3, 4, 3, 4]) Sometimes 'a' is [2, 3, 4, 3, 4], and sometimes it is [4, 3, 4, 3, 4]. The latter is what you get if the assignment is done 'backwards', like: a[2] = a[4] a[1] = a[3] a[0] = a[2] For example, in this build the above test failed on Python 3.2 (but passed on all other versions): http://travis-ci.org/#!/numpy/numpy/jobs/1676766 while in this build, it failed on Python 2.5 (but passed on all other versions): http://travis-ci.org/#!/numpy/numpy/jobs/1722121 Looks like we have a memcpy somewhere that should be a memmove? -n From cournape at gmail.com Wed Jun 27 14:50:12 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 27 Jun 2012 19:50:12 +0100 Subject: [Numpy-discussion] Combined versus separate build In-Reply-To: References: Message-ID: On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith wrote: > Currently the numpy build system(s) support two ways of building > numpy: either by compiling a giant concatenated C file, or by the more > conventional route of first compiling each .c file to a .o file, and > then linking those together. I gather from comments in the source code > that the former is the traditional method, and the latter is the newer > "experimental" approach. > > It's easy to break one of these builds without breaking the other (I > just did this with the NA branch, and David had to clean up after me), > and I don't see what value we really get from having both options -- > it seems to just double the size of the test matrix without adding > value. There is unfortunately a big value in it: there is no standard way in C to share symbols within a library without polluting the whole process namespace, except on windows where the default is to export nothing. Most compilers support it (I actually know of none that does not support it in some way or the others), but that's platform-specific. I do find the multi-file support useful when developing (it does not make the full build faster, but I find partial rebuild too slow without it). David From njs at pobox.com Wed Jun 27 15:07:42 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Jun 2012 20:07:42 +0100 Subject: [Numpy-discussion] Combined versus separate build In-Reply-To: References: Message-ID: On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau wrote: > On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith wrote: >> Currently the numpy build system(s) support two ways of building >> numpy: either by compiling a giant concatenated C file, or by the more >> conventional route of first compiling each .c file to a .o file, and >> then linking those together. I gather from comments in the source code >> that the former is the traditional method, and the latter is the newer >> "experimental" approach. >> >> It's easy to break one of these builds without breaking the other (I >> just did this with the NA branch, and David had to clean up after me), >> and I don't see what value we really get from having both options -- >> it seems to just double the size of the test matrix without adding >> value. 
> > There is unfortunately a big value in it: there is no standard way in > C to share symbols within a library without polluting the whole > process namespace, except on windows where the default is to export > nothing. > > Most compilers support it (I actually know of none that does not > support it in some way or the others), but that's platform-specific. IIRC this isn't too tricky to arrange for with gcc, but why is this an issue in the first place for a Python extension module? Extension modules are opened without RTLD_GLOBAL, which means that they *never* export any symbols. At least, that's how it should work on Linux and most Unix-alikes; I don't know much about OS X's linker, except that it's unusual in other ways. -N From cournape at gmail.com Wed Jun 27 15:29:54 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 27 Jun 2012 20:29:54 +0100 Subject: [Numpy-discussion] Combined versus separate build In-Reply-To: References: Message-ID: On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith wrote: > On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau wrote: >> On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith wrote: >>> Currently the numpy build system(s) support two ways of building >>> numpy: either by compiling a giant concatenated C file, or by the more >>> conventional route of first compiling each .c file to a .o file, and >>> then linking those together. I gather from comments in the source code >>> that the former is the traditional method, and the latter is the newer >>> "experimental" approach. >>> >>> It's easy to break one of these builds without breaking the other (I >>> just did this with the NA branch, and David had to clean up after me), >>> and I don't see what value we really get from having both options -- >>> it seems to just double the size of the test matrix without adding >>> value. >> >> There is unfortunately a big value in it: there is no standard way in >> C to share symbols within a library without polluting the whole >> process namespace, except on windows where the default is to export >> nothing. >> >> Most compilers support it (I actually know of none that does not >> support it in some way or the others), but that's platform-specific. > > IIRC this isn't too tricky to arrange for with gcc No, which is why this is supported for gcc and windows :) >, but why is this an > issue in the first place for a Python extension module? Extension > modules are opened without RTLD_GLOBAL, which means that they *never* > export any symbols. At least, that's how it should work on Linux and > most Unix-alikes; I don't know much about OS X's linker, except that > it's unusual in other ways. The pragmatic answer is that if it were not an issue, python itself would not bother with it. Every single extension module in python itself is built from a single compilation unit. This is also why we have this awful system to export the numpy C API with array of function pointers instead of simply exporting things in a standard way. See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html Looking quickly at the 2.7.3 sources, the more detailed answer is that python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what happens when neither of them is used is implementation-dependent. It seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There also may be consequences on the use of RTLD_LOCAL in embedded mode (I have ancient and bad memories with matlab related to this, but I forgot the details). 
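For what it's worth, the flags CPython will actually pass to dlopen() when importing an extension module can be inspected, and changed, from Python itself on POSIX systems. A small sketch (ctypes just supplies the platform's RTLD_* constants here; the module name in the comment is hypothetical):

import sys
import ctypes

flags = sys.getdlopenflags()
print(hex(flags))
print("RTLD_GLOBAL set: %s" % bool(flags & ctypes.RTLD_GLOBAL))

# The flags can be changed before an import, e.g. to force RTLD_GLOBAL
# for one particular extension module, then restored:
# sys.setdlopenflags(flags | ctypes.RTLD_GLOBAL)
# import some_extension_module
# sys.setdlopenflags(flags)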
David From njs at pobox.com Wed Jun 27 15:53:50 2012 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 Jun 2012 20:53:50 +0100 Subject: [Numpy-discussion] Combined versus separate build In-Reply-To: References: Message-ID: On Wed, Jun 27, 2012 at 8:29 PM, David Cournapeau wrote: > On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith wrote: >> On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau wrote: >>> On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith wrote: >>>> Currently the numpy build system(s) support two ways of building >>>> numpy: either by compiling a giant concatenated C file, or by the more >>>> conventional route of first compiling each .c file to a .o file, and >>>> then linking those together. I gather from comments in the source code >>>> that the former is the traditional method, and the latter is the newer >>>> "experimental" approach. >>>> >>>> It's easy to break one of these builds without breaking the other (I >>>> just did this with the NA branch, and David had to clean up after me), >>>> and I don't see what value we really get from having both options -- >>>> it seems to just double the size of the test matrix without adding >>>> value. >>> >>> There is unfortunately a big value in it: there is no standard way in >>> C to share symbols within a library without polluting the whole >>> process namespace, except on windows where the default is to export >>> nothing. >>> >>> Most compilers support it (I actually know of none that does not >>> support it in some way or the others), but that's platform-specific. >> >> IIRC this isn't too tricky to arrange for with gcc > > No, which is why this is supported for gcc and windows :) > >>, but why is this an >> issue in the first place for a Python extension module? Extension >> modules are opened without RTLD_GLOBAL, which means that they *never* >> export any symbols. At least, that's how it should work on Linux and >> most Unix-alikes; I don't know much about OS X's linker, except that >> it's unusual in other ways. > > The pragmatic answer is that if it were not an issue, python itself > would not bother with it. Every single extension module in python > itself is built from a single compilation unit. This is also why we > have this awful system to export the numpy C API with array of > function pointers instead of simply exporting things in a standard > way. The array-of-function-pointers is solving the opposite problem, of exporting functions *without* having global symbols. > See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html > > Looking quickly at the 2.7.3 sources, the more detailed answer is that > python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what > happens when neither of them is used is implementation-dependent. It > seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There > also may be consequences on the use of RTLD_LOCAL in embedded mode (I > have ancient and bad memories with matlab related to this, but I > forgot the details). See, I knew OS X was quirky :-). That's what I get for trusting dlopen(3). But seriously, what compilers do we support that don't have -fvisibility=hidden? ...Is there even a list of compilers we support available anywhere? 
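One quick way to see what actually leaks on a given platform is to ask the dynamic linker whether a symbol can be resolved through the process-wide namespace. A rough, POSIX-only sketch; the symbol names are only examples, and whether any particular one is visible depends on the build and platform:

import ctypes
import numpy.core.multiarray       # make sure the extension is loaded

whole_process = ctypes.CDLL(None)  # dlopen(NULL): the global namespace

for name in ("PyFloat_Type", "PyArray_Type"):
    try:
        ctypes.c_void_p.in_dll(whole_process, name)
        print(name + ": visible process-wide")
    except ValueError:
        print(name + ": not visible process-wide")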
-N From d.s.seljebotn at astro.uio.no Wed Jun 27 15:57:13 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 27 Jun 2012 21:57:13 +0200 Subject: [Numpy-discussion] Combined versus separate build In-Reply-To: References: Message-ID: <4FEB6599.9020904@astro.uio.no> On 06/27/2012 09:53 PM, Nathaniel Smith wrote: > On Wed, Jun 27, 2012 at 8:29 PM, David Cournapeau wrote: >> On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith wrote: >>> On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau wrote: >>>> On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith wrote: >>>>> Currently the numpy build system(s) support two ways of building >>>>> numpy: either by compiling a giant concatenated C file, or by the more >>>>> conventional route of first compiling each .c file to a .o file, and >>>>> then linking those together. I gather from comments in the source code >>>>> that the former is the traditional method, and the latter is the newer >>>>> "experimental" approach. >>>>> >>>>> It's easy to break one of these builds without breaking the other (I >>>>> just did this with the NA branch, and David had to clean up after me), >>>>> and I don't see what value we really get from having both options -- >>>>> it seems to just double the size of the test matrix without adding >>>>> value. >>>> >>>> There is unfortunately a big value in it: there is no standard way in >>>> C to share symbols within a library without polluting the whole >>>> process namespace, except on windows where the default is to export >>>> nothing. >>>> >>>> Most compilers support it (I actually know of none that does not >>>> support it in some way or the others), but that's platform-specific. >>> >>> IIRC this isn't too tricky to arrange for with gcc >> >> No, which is why this is supported for gcc and windows :) >> >>> , but why is this an >>> issue in the first place for a Python extension module? Extension >>> modules are opened without RTLD_GLOBAL, which means that they *never* >>> export any symbols. At least, that's how it should work on Linux and >>> most Unix-alikes; I don't know much about OS X's linker, except that >>> it's unusual in other ways. >> >> The pragmatic answer is that if it were not an issue, python itself >> would not bother with it. Every single extension module in python >> itself is built from a single compilation unit. This is also why we >> have this awful system to export the numpy C API with array of >> function pointers instead of simply exporting things in a standard >> way. > > The array-of-function-pointers is solving the opposite problem, of > exporting functions *without* having global symbols. > >> See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html >> >> Looking quickly at the 2.7.3 sources, the more detailed answer is that >> python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what >> happens when neither of them is used is implementation-dependent. It >> seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There >> also may be consequences on the use of RTLD_LOCAL in embedded mode (I >> have ancient and bad memories with matlab related to this, but I >> forgot the details). > > See, I knew OS X was quirky :-). That's what I get for trusting dlopen(3). > > But seriously, what compilers do we support that don't have > -fvisibility=hidden? ...Is there even a list of compilers we support > available anywhere? 
You could at the very least switch the default for a couple of releases, introducing a new flag with a "please email numpy-discussion if you use this" note, and see if anybody complains? Dag From cournape at gmail.com Wed Jun 27 16:05:43 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 27 Jun 2012 21:05:43 +0100 Subject: [Numpy-discussion] Combined versus separate build In-Reply-To: References: Message-ID: On Wed, Jun 27, 2012 at 8:53 PM, Nathaniel Smith wrote: > On Wed, Jun 27, 2012 at 8:29 PM, David Cournapeau wrote: >> On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith wrote: >>> On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau wrote: >>>> On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith wrote: >>>>> Currently the numpy build system(s) support two ways of building >>>>> numpy: either by compiling a giant concatenated C file, or by the more >>>>> conventional route of first compiling each .c file to a .o file, and >>>>> then linking those together. I gather from comments in the source code >>>>> that the former is the traditional method, and the latter is the newer >>>>> "experimental" approach. >>>>> >>>>> It's easy to break one of these builds without breaking the other (I >>>>> just did this with the NA branch, and David had to clean up after me), >>>>> and I don't see what value we really get from having both options -- >>>>> it seems to just double the size of the test matrix without adding >>>>> value. >>>> >>>> There is unfortunately a big value in it: there is no standard way in >>>> C to share symbols within a library without polluting the whole >>>> process namespace, except on windows where the default is to export >>>> nothing. >>>> >>>> Most compilers support it (I actually know of none that does not >>>> support it in some way or the others), but that's platform-specific. >>> >>> IIRC this isn't too tricky to arrange for with gcc >> >> No, which is why this is supported for gcc and windows :) >> >>>, but why is this an >>> issue in the first place for a Python extension module? Extension >>> modules are opened without RTLD_GLOBAL, which means that they *never* >>> export any symbols. At least, that's how it should work on Linux and >>> most Unix-alikes; I don't know much about OS X's linker, except that >>> it's unusual in other ways. >> >> The pragmatic answer is that if it were not an issue, python itself >> would not bother with it. Every single extension module in python >> itself is built from a single compilation unit. This is also why we >> have this awful system to export the numpy C API with array of >> function pointers instead of simply exporting things in a standard >> way. > > The array-of-function-pointers is solving the opposite problem, of > exporting functions *without* having global symbols. I meant that the lack of standard around symbols and namespaces is why we have to do those hacks. Most platforms have much better solutions to those problems. > >> See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html >> >> Looking quickly at the 2.7.3 sources, the more detailed answer is that >> python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what >> happens when neither of them is used is implementation-dependent. It >> seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There >> also may be consequences on the use of RTLD_LOCAL in embedded mode (I >> have ancient and bad memories with matlab related to this, but I >> forgot the details). > > See, I knew OS X was quirky :-). 
That's what I get for trusting dlopen(3). > > But seriously, what compilers do we support that don't have > -fvisibility=hidden? ...Is there even a list of compilers we support > available anywhere? Well, I am not sure how all this is handled on the big guys (bluegen and co), for once. There is also the issue of the consequence on statically linking numpy to python: I don't what they are (I would actually like to make statically linked numpy into python easier, not harder). David From cournape at gmail.com Wed Jun 27 16:07:13 2012 From: cournape at gmail.com (David Cournapeau) Date: Wed, 27 Jun 2012 21:07:13 +0100 Subject: [Numpy-discussion] Combined versus separate build In-Reply-To: <4FEB6599.9020904@astro.uio.no> References: <4FEB6599.9020904@astro.uio.no> Message-ID: On Wed, Jun 27, 2012 at 8:57 PM, Dag Sverre Seljebotn wrote: > On 06/27/2012 09:53 PM, Nathaniel Smith wrote: >> On Wed, Jun 27, 2012 at 8:29 PM, David Cournapeau ?wrote: >>> On Wed, Jun 27, 2012 at 8:07 PM, Nathaniel Smith ?wrote: >>>> On Wed, Jun 27, 2012 at 7:50 PM, David Cournapeau ?wrote: >>>>> On Wed, Jun 27, 2012 at 7:17 PM, Nathaniel Smith ?wrote: >>>>>> Currently the numpy build system(s) support two ways of building >>>>>> numpy: either by compiling a giant concatenated C file, or by the more >>>>>> conventional route of first compiling each .c file to a .o file, and >>>>>> then linking those together. I gather from comments in the source code >>>>>> that the former is the traditional method, and the latter is the newer >>>>>> "experimental" approach. >>>>>> >>>>>> It's easy to break one of these builds without breaking the other (I >>>>>> just did this with the NA branch, and David had to clean up after me), >>>>>> and I don't see what value we really get from having both options -- >>>>>> it seems to just double the size of the test matrix without adding >>>>>> value. >>>>> >>>>> There is unfortunately a big value in it: there is no standard way in >>>>> C to share symbols within a library without polluting the whole >>>>> process namespace, except on windows where the default is to export >>>>> nothing. >>>>> >>>>> Most compilers support it (I actually know of none that does not >>>>> support it in some way or the others), but that's platform-specific. >>>> >>>> IIRC this isn't too tricky to arrange for with gcc >>> >>> No, which is why this is supported for gcc and windows :) >>> >>>> , but why is this an >>>> issue in the first place for a Python extension module? Extension >>>> modules are opened without RTLD_GLOBAL, which means that they *never* >>>> export any symbols. At least, that's how it should work on Linux and >>>> most Unix-alikes; I don't know much about OS X's linker, except that >>>> it's unusual in other ways. >>> >>> The pragmatic answer is that if it were not an issue, python itself >>> would not bother with it. Every single extension module in python >>> itself is built from a single compilation unit. This is also why we >>> have this awful system to export the numpy C API with array of >>> function pointers instead of simply exporting things in a standard >>> way. >> >> The array-of-function-pointers is solving the opposite problem, of >> exporting functions *without* having global symbols. >> >>> See this: http://docs.python.org/release/2.5.3/ext/using-cobjects.html >>> >>> Looking quickly at the 2.7.3 sources, the more detailed answer is that >>> python actually does not use RTLD_LOCAL (nor RTLD_GLOBAL), and what >>> happens when neither of them is used is implementation-dependent. 
It >>> seems to be RTLD_LOCAL on linux, and RTLD_GLOBAL on mac os x. There >>> also may be consequences on the use of RTLD_LOCAL in embedded mode (I >>> have ancient and bad memories with matlab related to this, but I >>> forgot the details). >> >> See, I knew OS X was quirky :-). That's what I get for trusting dlopen(3). >> >> But seriously, what compilers do we support that don't have >> -fvisibility=hidden? ...Is there even a list of compilers we support >> available anywhere? > > You could at the very least switch the default for a couple of releases, > introducing a new flag with a "please email numpy-discussion if you use > this" note, and see if anybody complains? Yes, we could. That's actually why I set up travis-CI to build both configurations in the first place :) (see https://github.com/numpy/numpy/issues/315) David From jsalvati at u.washington.edu Wed Jun 27 16:10:13 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 27 Jun 2012 13:10:13 -0700 Subject: [Numpy-discussion] What's the most numpythonic way to support multiple types in a C extension? In-Reply-To: References: Message-ID: Thanks nathaniel, that does tricky... On Wed, Jun 27, 2012 at 9:25 AM, Nathaniel Smith wrote: > On Tue, Jun 26, 2012 at 10:53 PM, John Salvatier > wrote: > > I want to support multiple types in the index_increment function that > I've > > written here: > > > https://github.com/jsalvatier/numpy/blob/master/numpy/core/src/multiarray/mapping.c > > > > I need to check that the first argument's type can support addition, cast > > the dataptr to the appropriate type and do the addition operation for > that > > type. It looks like some of the numpy code uses .c.src files to do > > templating. Is that what I want to do here? Is the syntax described > > somewhere? > > The proper way would be use the ufunc machinery, which already knows > how to perform addition on arbitrary numpy dtypes... unfortunately > this may be more complicated than you are hoping :-/. > > Since there's nothing about this operation that is specific to the > addition operation or to the double type, I guess the ideal API would > actually be something like, an extra method added to binary ufuncs > np.add.inplace_indexed(a, idx, b) > which would be equivalent to > a[idx] += b > except that duplicate indices would be handled properly, and it would > avoid making a copy in the case of fancy indexing. You could look at > the implementation of ufunc.reduceat > (numpy/core/src/umath/ufunc_object.c:PyUFunc_Reduceat) for an idea of > how such fancy ufunc methods can be done. > > (An even more ideal API would find some way to make this work > naturally with where=, but it's not obvious to me how that would > work.) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Wed Jun 27 16:51:29 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 27 Jun 2012 22:51:29 +0200 Subject: [Numpy-discussion] NumPy 1.7 release plans In-Reply-To: References: <25C91477-BEBA-42DE-AE84-7498C9426F2D@continuum.io> Message-ID: On Wed, Jun 27, 2012 at 1:45 PM, Charles R Harris wrote: > > > On Tue, Jun 26, 2012 at 11:08 PM, Travis Oliphant wrote: > >> In my enthusiasm of finding someone to help with the release of NumPy 1.7 >> and my desire to get something released by the SciPy conference, I was >> hasty and didn't gather enough feedback from others about the release of >> NumPy 1.7. I'm sorry about that. >> >> I would like to get NumPy 1.7 out the door as quickly as we can *and* >> make sure it is as well tested as we can --- in fact I'm hoping we can also >> use this opportunity to setup a Continuous Integration system for NumPy >> that will essentially extend NumPy's testing infrastructure and make it >> easier to do releases in the future. Ondrej, the author of SymPy, has >> agreed to help on the both the release and the Continuous Integration side. >> Ideally we would also start producing a code coverage report and a vbench >> report for NumPy as well. This is much more likely to happen if there are >> other people willing to pictch in >> >> (By the way, one of the goals of NumFOCUS is to provide Continuous >> Integration and code coverage resources to all of the Scientific Python >> projects as funds and community resources become available --- please email >> numfocus at googlegroups.com if you are interested in helping with that >> effort). >> >> So, I would propose a code-freeze by July 13th with a beta release of >> NumPy 1.7 by July 17th. We will work to get that beta release actively >> tested by as many projects as possible, leading to a release candidate by >> July 31. If all goes well I could imagine a release by August 14. If >> we need to make another release candidate, then we can do that August 14th >> and push the release to August 28th. >> >> Let me know if there are any concerns about this updated schedule. >> >> > That schedule sounds good to me. > Sounds good to me too. If no one else has gotten around to it by then, I'll make some time to merge the wiki doc edits the week before the 13th. > I thought Nathaniel did excellent work in getting tox and Travis CI > started up. Kudos there. > > Seconded. It's already quite useful. Ralf Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Jun 27 16:46:49 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 27 Jun 2012 22:46:49 +0200 Subject: [Numpy-discussion] Preferring gfortran over g77 on OS X and other distributions? In-Reply-To: References: Message-ID: On Mon, Jun 18, 2012 at 9:47 AM, Aron Ahmadia wrote: > f2py, by default, seems to prefer g77 (no longer maintained, deprecated, > speedy, doesn't support Fortran 90 or Fortran 95) over gfortran > (maintained, slower, Fortran 90 and Fortran 95 support). > > This causes problems when we try to compile Fortran 90 extensions using > f2py on platforms where both g77 and gfortran are installed without > manually switching the compiler's flags. 
It is a very minor edit to the > fcompiler/__init__.py file to prefer gfortran over g77 on OS X, and I can > think of almost no reason not to do so, since the Vectorize framework (OS X > tuned LAPACK/BLAS) appears to be ABI compatible with gfortran. I am not > sure what the situation is on the distributions that numpy is trying to > support, but my feeling is that g77 should not be preferred when gfortran > is available. > On Windows g77 is still the default. But indeed, on OS X gfortran is the recommended Fortran compiler. A PR for this would be useful. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Wed Jun 27 17:26:02 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Wed, 27 Jun 2012 23:26:02 +0200 Subject: [Numpy-discussion] Preferring gfortran over g77 on OS X and other distributions? In-Reply-To: References: Message-ID: I've promoted gfortran to be the default compiler on OS X over vendor compilers (to be more compatible with Linux), and made a similar adjustment for the platform detection. I've promoted gfortran over g77 but not vendor compilers on the other 'nixes. I left the Windows compiler options alone. https://github.com/numpy/numpy/pull/325 A On Wed, Jun 27, 2012 at 10:46 PM, Ralf Gommers wrote: > > > On Mon, Jun 18, 2012 at 9:47 AM, Aron Ahmadia wrote: > >> f2py, by default, seems to prefer g77 (no longer maintained, deprecated, >> speedy, doesn't support Fortran 90 or Fortran 95) over gfortran >> (maintained, slower, Fortran 90 and Fortran 95 support). >> >> This causes problems when we try to compile Fortran 90 extensions using >> f2py on platforms where both g77 and gfortran are installed without >> manually switching the compiler's flags. It is a very minor edit to the >> fcompiler/__init__.py file to prefer gfortran over g77 on OS X, and I can >> think of almost no reason not to do so, since the Vectorize framework (OS X >> tuned LAPACK/BLAS) appears to be ABI compatible with gfortran. I am not >> sure what the situation is on the distributions that numpy is trying to >> support, but my feeling is that g77 should not be preferred when gfortran >> is available. >> > > On Windows g77 is still the default. But indeed, on OS X gfortran is the > recommended Fortran compiler. A PR for this would be useful. > > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From x.piter at gmail.com Wed Jun 27 17:38:15 2012 From: x.piter at gmail.com (x.piter at gmail.com) Date: Wed, 27 Jun 2012 23:38:15 +0200 Subject: [Numpy-discussion] dot() function question Message-ID: <87zk7oigfs.fsf@cica.cica> Hi list. I have got completely cunfused with the numpy.dot() function. dot(A,B) does: - matrix multiplication if A and B are of MxN and NxK sizey - dot product if A and B are of size M How how can I perform matrix multiplication of two vectors? (in matlab I do it like a*a') Thanks. Petro. From warren.weckesser at enthought.com Wed Jun 27 18:01:38 2012 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 27 Jun 2012 17:01:38 -0500 Subject: [Numpy-discussion] dot() function question In-Reply-To: <87zk7oigfs.fsf@cica.cica> References: <87zk7oigfs.fsf@cica.cica> Message-ID: On Wed, Jun 27, 2012 at 4:38 PM, wrote: > Hi list. > I have got completely cunfused with the numpy.dot() function. 
> dot(A,B) does: > - matrix multiplication if A and B are of MxN and NxK sizey > - dot product if A and B are of size M > How how can I perform matrix multiplication of two vectors? > (in matlab I do it like a*a') > If 'a' is a 1D numpy array, you can use numpy.outer: In [6]: a = array([1, -2, 3]) In [7]: outer(a, a) Out[7]: array([[ 1, -2, 3], [-2, 4, -6], [ 3, -6, 9]]) Warren > Thanks. > Petro. > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shailendra.vikas at gmail.com Wed Jun 27 19:38:02 2012 From: shailendra.vikas at gmail.com (astronomer) Date: Wed, 27 Jun 2012 16:38:02 -0700 (PDT) Subject: [Numpy-discussion] memory allocation at assignment Message-ID: <34083731.post@talk.nabble.com> Hi All, I am wondering if there any difference in memory overhead between the following code. a=numpy.arange(10) b=numpy.arange(10) c=a+b and a=numpy.arange(10) b=numpy.arange(10) c=numpy.empty_likes(a) c[:]=a+b Does the later code make a temproray array for the result of (a+b) and then copy it to c. I beleive it does that, but i wanted to make sure. Thanks, -- View this message in context: http://old.nabble.com/memory-allocation-at-assignment-tp34083731p34083731.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From jsalvati at u.washington.edu Wed Jun 27 19:47:37 2012 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 27 Jun 2012 16:47:37 -0700 Subject: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted? In-Reply-To: References: Message-ID: I've submitted a pull request ( https://github.com/numpy/numpy/pull/326 ). I'm new to the numpy and python internals, so feedback is greatly appreciated. On Tue, Jun 26, 2012 at 12:10 PM, Travis Oliphant wrote: > > On Jun 26, 2012, at 1:34 PM, Fr?d?ric Bastien wrote: > > > Hi, > > > > I think he was referring that making NUMPY_ARRAY_OBJECT[...] syntax > > support the operation that you said is hard. But having a separate > > function do it is less complicated as you said. > > Yes. That's precisely what I meant. Thank you for clarifying. > > -Travis > > > > > Fred > > > > On Tue, Jun 26, 2012 at 1:27 PM, John Salvatier > > wrote: > >> Can you clarify why it would be super hard? I just reused the code for > >> advanced indexing (a modification of PyArray_SetMap). Am I missing > something > >> crucial? > >> > >> > >> > >> On Tue, Jun 26, 2012 at 9:57 AM, Travis Oliphant > >> wrote: > >>> > >>> > >>> On Jun 26, 2012, at 11:46 AM, John Salvatier wrote: > >>> > >>> Hello, > >>> > >>> If you increment an array using advanced indexing and have repeated > >>> indexes, the array doesn't get repeatedly > >>> incremented, > http://comments.gmane.org/gmane.comp.python.numeric.general/50291. > >>> I wrote a C function that does incrementing with repeated indexes > correctly. > >>> The branch is here (https://github.com/jsalvatier/numpy see the last > two > >>> commits). Would a patch with a cleaned up version of a function like > this be > >>> accepted into numpy? I'm not experienced writing numpy C code so I'm > sure it > >>> still needs improvement. > >>> > >>> > >>> This is great. It is an often-requested feature. It's *very > difficult* > >>> to do without changing fundamentally what NumPy is. But, yes this > would be > >>> a great pull request. 
> >>> > >>> Thanks, > >>> > >>> -Travis > >>> > >>> > >>> > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jun 27 20:34:28 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 28 Jun 2012 01:34:28 +0100 Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: <34083731.post@talk.nabble.com> References: <34083731.post@talk.nabble.com> Message-ID: On Thu, Jun 28, 2012 at 12:38 AM, astronomer wrote: > > Hi All, > I am wondering if there any difference in memory overhead between the > following code. > a=numpy.arange(10) > b=numpy.arange(10) > c=a+b > > and > a=numpy.arange(10) > b=numpy.arange(10) > c=numpy.empty_likes(a) > c[:]=a+b > > Does the later code make a temproray array for the result of (a+b) and then > copy it to c. I beleive it does that, but i wanted to make sure. Yes it does. If you want to avoid this extra copy, and have a pre-existing output array, you can do: np.add(a, b, out=c) ('+' on numpy array's is just a synonym for np.add; np.add is a ufunc, and all ufunc's accept this syntax: http://docs.scipy.org/doc/numpy/reference/ufuncs.html ) -n From klonuo at gmail.com Wed Jun 27 22:27:42 2012 From: klonuo at gmail.com (klo uo) Date: Thu, 28 Jun 2012 04:27:42 +0200 Subject: [Numpy-discussion] Numpy logo in VTK In-Reply-To: <4FEB361E.8090509@it.uu.se> References: <4FEB361E.8090509@it.uu.se> Message-ID: In the first version this line: ax.bar3d([i], [0], [i], [.9], [.1], [.9], color='y', linewidth=.1) is responsible for diagonal in N, and it is inverted. In the second version you quoted this is corrected with: ax.bar3d([3-i], [0], [i], [.9], [.1], [.9], color='y', linewidth=.1) Also snippet for clearing axis decorations, (grid, ticks, lines...) is posted separately besides first version. Anyhow attached python script in later mail (with youtube link) has all this together plus anim sequence On Wed, Jun 27, 2012 at 6:34 PM, Virgil Stokes wrote: > On 27-Jun-2012 08:04, klo uo wrote: >> from numpy import arange, ones >> import matplotlib.pyplot as plt >> from mpl_toolkits.mplot3d import Axes3D >> >> fig = plt.figure() >> ax = fig.add_subplot(111, projection='3d') >> >> o = ones(4) >> r = arange(4) >> >> # planes: >> for z in arange(3)+1: >> ? ? ax.bar(r, o*4, zs=z, zdir='x', alpha=.05, width=1) >> ? ? ax.bar(r, o*4, zs=z, zdir='y', alpha=.05, width=1) >> ? ? ax.bar(r, o*4, zs=z, zdir='z', alpha=.05, width=1) >> >> # N >> for i in [1, 2]: >> ? ? ?ax.bar3d([3-i], [0], [i], [.9], [.1], [.9], color='y', linewidth=.1) >> ? ? ?ax.bar3d(o+(i*(-1)**i), o-1, r, o-.1, o-.9, o-.1, color='y', linewidth=.1) >> >> # cage >> ax.bar3d([0], [0], [0], [4], [4], [4], alpha=.05, color='w', linewidth=0) >> >> plt.show() >> # plt.savefig('numpy.png') > Umh... 
> The first version that you posted looks ok on my screen (N is not inverted). And > this version shows no difference in the "N"; but, it does show tick marks > labeled with numerical values. > > --V > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From srean.list at gmail.com Thu Jun 28 01:44:36 2012 From: srean.list at gmail.com (srean) Date: Thu, 28 Jun 2012 00:44:36 -0500 Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: References: <34083731.post@talk.nabble.com> Message-ID: > Yes it does. If you want to avoid this extra copy, and have a > pre-existing output array, you can do: > > np.add(a, b, out=c) > > ('+' on numpy array's is just a synonym for np.add; np.add is a ufunc, > and all ufunc's accept this syntax: > ?http://docs.scipy.org/doc/numpy/reference/ufuncs.html > ) Is the creation of the tmp as expensive as creation of a new numpy array or is it somewhat lighter weight (like being just a data buffer). I sometimes use the c[:] syntax thinking I might benefit from numpy.array re-use. But now I think that was misguided. From travis at continuum.io Thu Jun 28 08:50:58 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 28 Jun 2012 07:50:58 -0500 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <895DAD05-5B24-4EBC-84F4-10442A4531B3@continuum.io> <2ADA61BB-148A-4BF8-B23B-BB1852C56801@continuum.io> Message-ID: <244AAC5A-BC22-4F1C-B67E-DEB41C63B3A1@continuum.io> On Jun 27, 2012, at 1:18 AM, Fernando Perez wrote: > On Tue, Jun 26, 2012 at 11:02 PM, Travis Oliphant wrote: >> I just want to speak up for the people who are affected by API breakage who are not as vocal on this list. > > Certainly! And indeed I bet you that's a community underrepresented > here: those of us who are on this list are likely to be up to speed on > what's happening with the API and can therefore adjust to changes > quickly, simply because we know they have occurred. Random J. User > who gets an upstream update and all of a sudden finds previously > working code to break is unlikely to be active here and will be very, > very unhappy > If anything, the lesson is: for a project that's so deep in the > dependency tree as numpy is, A{P,B}I stability is a paramount concern, > with a cost that gets higher the more successful the project is. This > means AXIs should evolve only in backwards-compatible ways when at all > possible, with backwards-compatibility being broken only in: > > - clearly designated points that are agreed upon by as many as possible > - with clear explanations of how old codes need to be adapted to the > new interface to continue working > - if at all possible with advance warnings, and even better, a system > for 'future' loading. > This is a good reminder. I agree with your views here. I've not been able to communicate very well my attitudes on this and I've been saddened at how eager some seem to pick apart my words to find problems with them. My discussion about the ABI and API breakage should not be taken as an assertion that I don't recognize that ABI breakage is bad and has consequences. I'm a little surprised that people assume I haven't been listening or paying attention or something. 
But, I recognize that I don't always communicate clearly enough. I do understand the consequences of ABI breakage. I also understand the pain involved. I have no plans to break the ABI. There is a certain group who is affected by ABI breakage and another group *more* affected by API breakage. It feels like this list is particularly populated with people who feel pain by ABI breakage whereas the people who feel pain with API breakage are not as vocal, don' t track this list, etc. But, their stories are just as compelling to me. I understand the pain they feel as well when the NumPy API breaks. It's just as important that we take them into consideration. That's my only point. Right now, though, arguing over the relative importance of ABI or API breakage is moot. I was simply pointing out my perspective that I think a single ABI breakage in 1.5.0 would have been better than the API and use-case breakages that have been reported (I know these are only very weakly correlated so it's just an analogy). If you disagree with me, that's fine. Just understand that any frustration you feel about the thought of ABI breakage is the same as the frustration I feel about changes that cause working code to break for people. I also understand that it's not quite the same thing because the phrase "changes that cause working code to break" is too strong. Some code that "works" has "work-arounds and hacks" and assumptions about APIs. In other word, it is possible that some NumPy-dependent code out there works "accidentally". Of course, what is a "hack" or an "accidental" usage is not at all clear. I can't define it. It takes judgment to make a decision. This judgment requires an awareness of the "intention of the original" code, how big the user-base is of the group that is making the "hack". How difficult it is to remedy the situation, etc. These are hard problems. I don't claim to understand how to solve all of them. I don't claim that I won't make serious mistakes. All I can do is offer my experience, my awareness of the code history (including the code history of Numeric and Numarray), and my interactions with many downstream users. We need good judgment from as many NumPy developers as possible. That judgment must be colored with empathy for as many users of NumPy as possible. Best, -Travis > > Python in fact has the __future__ imports that help quite a bit for > people to start adapting their codes. How about creating a > numpy.future module where new, non-backward-compatible APIs could go? > That would give the adventurous a way to play with new features (hence > getting them better tested) as well as an easier path for gradual > migration to the new features by everyone. > > This may have already been discussed before, forgive me if I'm > repeating well-known material. This is a > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From srean.list at gmail.com Thu Jun 28 00:38:09 2012 From: srean.list at gmail.com (srean) Date: Wed, 27 Jun 2012 23:38:09 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow Message-ID: Hi List, this has been brought up several times, and the response has been generally positive but it has fallen through the cracks. So here are a few repeat requests. 
Am keeping it terse just for brevity i) Split the list into [devel] and [help] and as was mentioned recently [rant/flame]: some request for help get drowned out during active development related discussions and simple help requests pollutes more urgent development related matters. ii) Stackoverflow like site for help as well as for proposals. The silent majority has been referred to a few times recently. I suspect there does exist many lurkers on the list who do prefer one discussed solution over the other but for various reasons do not break out of their lurk mode to send a mail saying "I prefer this solution". Such an interface will also help in keeping track of the level of support as compared to mails that are larges hunks of quoted text with a line or two stating ones preference or seconding a proposal. One thing I have learned from traffic accidents is that if one asks for a help of the assembled crowd, no one knows how to respond. On the other hand if you say "hey there in a blue shirt could you get some water" you get instant results. So pardon me for taking the presumptuous liberty to request Travis to please set it up or delegate. Splitting the lists shouldn't be hard work, setting up overflow might be more work in comparison. Best -- srean From pierre.haessig at crans.org Thu Jun 28 02:20:00 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 28 Jun 2012 08:20:00 +0200 Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: References: <34083731.post@talk.nabble.com> Message-ID: <4FEBF790.2000303@crans.org> Le 28/06/2012 02:34, Nathaniel Smith a ?crit : > Yes it does. If you want to avoid this extra copy, and have a > pre-existing output array, you can do: > > np.add(a, b, out=c) And is there a temporary copy when using inplace operators like: c = a.copy() c += b Is there a temporary (c+b) array which is then assigned to c, or is it really an inplace assignment as the operator += would suggest ? Pierre -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From travis at continuum.io Thu Jun 28 08:25:30 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 28 Jun 2012 07:25:30 -0500 Subject: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8 Message-ID: Hey all, I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not the 1.7 release). What does everyone think of that? -Travis From aron at ahmadia.net Thu Jun 28 09:30:37 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Thu, 28 Jun 2012 15:30:37 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: We try to support numpy questions on http://scicomp.stackexchange.com, which is a StackOverflow site dedicated towards technical computing issues that gets a fair amount of traffic from mathematicians and computational scientists. We could always use more questions and answerers :) A On Thu, Jun 28, 2012 at 6:38 AM, srean wrote: > Hi List, > > this has been brought up several times, and the response has been > generally positive but it has fallen through the cracks. So here are a > few repeat requests. 
Am keeping it terse just for brevity > > i) Split the list into [devel] and [help] and as was mentioned > recently [rant/flame]: > > some request for help get drowned out during active development > related discussions and simple help requests pollutes more urgent > development related matters. > > ii) Stackoverflow like site for help as well as for proposals. > > The silent majority has been referred to a few times recently. I > suspect there does exist many lurkers on the list who do prefer one > discussed solution over the other but for various reasons do not break > out of their lurk mode to send a mail saying "I prefer this solution". > Such an interface will also help in keeping track of the level of > support as compared to mails that are larges hunks of quoted text with > a line or two stating ones preference or seconding a proposal. > > One thing I have learned from traffic accidents is that if one asks > for a help of the assembled crowd, no one knows how to respond. On the > other hand if you say "hey there in a blue shirt could you get some > water" you get instant results. So pardon me for taking the > presumptuous liberty to request Travis to please set it up or > delegate. > > Splitting the lists shouldn't be hard work, setting up overflow might > be more work in comparison. > > Best > -- srean > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Thu Jun 28 09:28:58 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 28 Jun 2012 08:28:58 -0500 Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: References: <34083731.post@talk.nabble.com> Message-ID: Yes, the creation of the tmp *is* the creation of a new NumPy array. So, it is as expensive. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jun 28, 2012, at 12:44 AM, srean wrote: >> Yes it does. If you want to avoid this extra copy, and have a >> pre-existing output array, you can do: >> >> np.add(a, b, out=c) >> >> ('+' on numpy array's is just a synonym for np.add; np.add is a ufunc, >> and all ufunc's accept this syntax: >> http://docs.scipy.org/doc/numpy/reference/ufuncs.html >> ) > > > Is the creation of the tmp as expensive as creation of a new numpy > array or is it somewhat lighter weight (like being just a data > buffer). I sometimes use the c[:] syntax thinking I might benefit from > numpy.array re-use. But now I think that was misguided. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Thu Jun 28 09:33:07 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 28 Jun 2012 08:33:07 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> There are some good ideas here. I propose splitting this list into devel and users lists. This might best be done by creating a new list for users and using this list for development. Travis -- Travis Oliphant (on a mobile) 512-826-7480 On Jun 27, 2012, at 11:38 PM, srean wrote: > Hi List, > > this has been brought up several times, and the response has been > generally positive but it has fallen through the cracks. So here are a > few repeat requests. 
Am keeping it terse just for brevity > > i) Split the list into [devel] and [help] and as was mentioned > recently [rant/flame]: > > some request for help get drowned out during active development > related discussions and simple help requests pollutes more urgent > development related matters. > > ii) Stackoverflow like site for help as well as for proposals. > > The silent majority has been referred to a few times recently. I > suspect there does exist many lurkers on the list who do prefer one > discussed solution over the other but for various reasons do not break > out of their lurk mode to send a mail saying "I prefer this solution". > Such an interface will also help in keeping track of the level of > support as compared to mails that are larges hunks of quoted text with > a line or two stating ones preference or seconding a proposal. > > One thing I have learned from traffic accidents is that if one asks > for a help of the assembled crowd, no one knows how to respond. On the > other hand if you say "hey there in a blue shirt could you get some > water" you get instant results. So pardon me for taking the > presumptuous liberty to request Travis to please set it up or > delegate. > > Splitting the lists shouldn't be hard work, setting up overflow might > be more work in comparison. > > Best > -- srean > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Thu Jun 28 09:35:48 2012 From: travis at continuum.io (Travis Oliphant) Date: Thu, 28 Jun 2012 08:35:48 -0500 Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: <4FEBF790.2000303@crans.org> References: <34083731.post@talk.nabble.com> <4FEBF790.2000303@crans.org> Message-ID: <39908C1C-80B6-434C-8832-437F9FC37D36@continuum.io> -- Travis Oliphant (on a mobile) 512-826-7480 On Jun 28, 2012, at 1:20 AM, Pierre Haessig wrote: > Le 28/06/2012 02:34, Nathaniel Smith a ?crit : >> >> Yes it does. If you want to avoid this extra copy, and have a >> pre-existing output array, you can do: >> >> np.add(a, b, out=c) > And is there a temporary copy when using inplace operators like: > > c = a.copy() > c += b > > Is there a temporary (c+b) array which is then assigned to c, or is it really an inplace assignment as the operator += would suggest ? > It really is inplace. As Nathaniel mentioned --- all ufuncs take an out keyword. The inplace mechanism uses this so that one input and the output are the same. Travis > Pierre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ondrej.certik at gmail.com Thu Jun 28 07:53:19 2012 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Thu, 28 Jun 2012 04:53:19 -0700 Subject: [Numpy-discussion] Non-deterministic test failure in master In-Reply-To: References: Message-ID: Hi Nathaniel, On Wed, Jun 27, 2012 at 11:22 AM, Nathaniel Smith wrote: > According to the Travis-CI build logs, this code produces > non-deterministic behaviour in master: > > ?a = np.arange(5) > ?a[:3] = a[2:] > ?assert_equal(a, [2, 3, 4, 3, 4]) > > Sometimes 'a' is [2, 3, 4, 3, 4], and sometimes it is [4, 3, 4, 3, 4]. 
> The latter is what you get if the assignment is done 'backwards', > like: > a[2] = a[4] > a[1] = a[3] > a[0] = a[2] > > For example, in this build the above test failed on Python 3.2 (but > passed on all other versions): > http://travis-ci.org/#!/numpy/numpy/jobs/1676766 > while in this build, it failed on Python 2.5 (but passed on all other versions): > http://travis-ci.org/#!/numpy/numpy/jobs/1722121 > > Looks like we have a memcpy somewhere that should be a memmove? I also noticed this failure a few days ago. What do you think is the best way to debug this? I don't know how to reproduce it. Any ideas? Ondrej From cournape at gmail.com Thu Jun 28 09:53:14 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 28 Jun 2012 14:53:14 +0100 Subject: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8 In-Reply-To: References: Message-ID: Hi Travis, On Thu, Jun 28, 2012 at 1:25 PM, Travis Oliphant wrote: > Hey all, > > I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not the 1.7 release). What does everyone think of that? I think it would depend on the state of 1.7. I am unwilling to drop support for 2.4 in 1.8 unless we make 1.7 an LTS release that would be supported up to 2014 Q1 (when RHEL 5 stops getting security fixes - RHEL 5 is the one platform that warrants supporting 2.4 IMO). In my mind, it means 1.7 needs to be stable. The work by Ondrej (and others) to make sure we break neither the API nor the ABI over a few releases would help achieve that. David From pierre.haessig at crans.org Thu Jun 28 02:13:56 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 28 Jun 2012 08:13:56 +0200 Subject: [Numpy-discussion] Non-deterministic test failure in master In-Reply-To: References: Message-ID: <4FEBF624.9090906@crans.org> Hi Nathaniel, On 27/06/2012 20:22, Nathaniel Smith wrote: > According to the Travis-CI build logs, this code produces > non-deterministic behaviour in master: You mean non-deterministic across different builds, not across different executions on the same build, right ? I just ran a small loop : N = 10000 N_good = 0 for i in range(N): a = np.arange(5) a[:3] = a[2:] if (a == [2,3,4,3,4]).all(): N_good += 1 print 'good result : %d/%d' % (N_good, N) and got 100% good replication. Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From shailendra.vikas at gmail.com Wed Jun 27 23:46:07 2012 From: shailendra.vikas at gmail.com (astronomer) Date: Wed, 27 Jun 2012 20:46:07 -0700 (PDT) Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: References: <34083731.post@talk.nabble.com> Message-ID: <34084248.post@talk.nabble.com> Hi Nathaniel, Thanks for clearing up my understanding. This is exactly what I needed. Thanks, Nathaniel Smith wrote: > > On Thu, Jun 28, 2012 at 12:38 AM, astronomer > wrote: >> >> Hi All, >> I am wondering if there is any difference in memory overhead between the >> following pieces of code. >> a=numpy.arange(10) >> b=numpy.arange(10) >> c=a+b >> >> and >> a=numpy.arange(10) >> b=numpy.arange(10) >> c=numpy.empty_like(a) >> c[:]=a+b >> >> Does the latter code make a temporary array for the result of (a+b) and >> then copy it to c? I believe it does, but I wanted to make sure. > > Yes it does.
If you want to avoid this extra copy, and have a > pre-existing output array, you can do: > > np.add(a, b, out=c) > > ('+' on numpy array's is just a synonym for np.add; np.add is a ufunc, > and all ufunc's accept this syntax: > http://docs.scipy.org/doc/numpy/reference/ufuncs.html > ) > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- View this message in context: http://old.nabble.com/memory-allocation-at-assignment-tp34083731p34084248.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From rhattersley at gmail.com Thu Jun 28 10:32:33 2012 From: rhattersley at gmail.com (Richard Hattersley) Date: Thu, 28 Jun 2012 15:32:33 +0100 Subject: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8 In-Reply-To: References: Message-ID: The project/environment we work with already targets Python 2.7, so it'd be fine for us and our collaborators. But it's hard to comment in a more altruistic way without knowing the impact of the change. Is it possible to summarise the benefits? (e.g. Simplifies NumPy codebase; allows better support for XXX under 2.5+; ...) On 28 June 2012 13:25, Travis Oliphant wrote: > Hey all, > > I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not the > 1.7 release). What does everyone think of that? > > -Travis > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Thu Jun 28 10:42:18 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 28 Jun 2012 10:42:18 -0400 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: +1 for a numpy-users list without "dev noise". -=- Olivier 2012/6/28 Travis Oliphant > There are some good ideas here. > > I propose splitting this list into devel and users lists. > > This might best be done by creating a new list for users and using this > list for development. > > Travis > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > > On Jun 27, 2012, at 11:38 PM, srean wrote: > > > Hi List, > > > > this has been brought up several times, and the response has been > > generally positive but it has fallen through the cracks. So here are a > > few repeat requests. Am keeping it terse just for brevity > > > > i) Split the list into [devel] and [help] and as was mentioned > > recently [rant/flame]: > > > > some request for help get drowned out during active development > > related discussions and simple help requests pollutes more urgent > > development related matters. > > > > ii) Stackoverflow like site for help as well as for proposals. > > > > The silent majority has been referred to a few times recently. I > > suspect there does exist many lurkers on the list who do prefer one > > discussed solution over the other but for various reasons do not break > > out of their lurk mode to send a mail saying "I prefer this solution". > > Such an interface will also help in keeping track of the level of > > support as compared to mails that are larges hunks of quoted text with > > a line or two stating ones preference or seconding a proposal. 
> > > > One thing I have learned from traffic accidents is that if one asks > > for a help of the assembled crowd, no one knows how to respond. On the > > other hand if you say "hey there in a blue shirt could you get some > > water" you get instant results. So pardon me for taking the > > presumptuous liberty to request Travis to please set it up or > > delegate. > > > > Splitting the lists shouldn't be hard work, setting up overflow might > > be more work in comparison. > > > > Best > > -- srean > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Thu Jun 28 10:44:54 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 28 Jun 2012 10:44:54 -0400 Subject: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8 In-Reply-To: References: Message-ID: 2012/6/28 David Cournapeau > Hi Travis, > > On Thu, Jun 28, 2012 at 1:25 PM, Travis Oliphant > wrote: > > Hey all, > > > > I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not > the 1.7 release). What does everyone think of that? > > I think it would depend on 1.7 state. I am unwilling to drop support > for 2.4 in 1.8 unless we make 1.7 a LTS, that would be supported up to > 2014 Q1 (when RHEL5 stops getting security fixes - RHEL 5 is the one > platform that warrants supporting 2.4 IMO) > > In my mind, it means 1.7 needs to be stable. Ondrej (and others) work > to make sure we break neither API or ABI since a few releases would > help achieving that. > > David > As a user stuck with Python 2.4 for an undefined period of time, I would definitely appreciate a long-term support release that would retain Python 2.4 compatibility. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at depagne.org Thu Jun 28 10:52:36 2012 From: eric at depagne.org (=?iso-8859-1?q?=C9ric_Depagne?=) Date: Thu, 28 Jun 2012 16:52:36 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: <201206281652.36197.eric@depagne.org> Le jeudi 28 juin 2012 15:33:07, Travis Oliphant a ?crit : > There are some good ideas here. > > I propose splitting this list into devel and users lists. > > This might best be done by creating a new list for users and using this > list for development. I second that idea. As one of the silent users of the list, with not (so) much interest in the details of the development (and even less in the public display of personal dislikes , I'd be happy to switch to a more users-oriented list. ?ric. > > Travis > > -- > Travis Oliphant > (on a mobile) > 512-826-7480 > > On Jun 27, 2012, at 11:38 PM, srean wrote: > > Hi List, > > > > this has been brought up several times, and the response has been > > generally positive but it has fallen through the cracks. So here are a > > few repeat requests. 
Am keeping it terse just for brevity > > > > i) Split the list into [devel] and [help] and as was mentioned > > > > recently [rant/flame]: > > some request for help get drowned out during active development > > > > related discussions and simple help requests pollutes more urgent > > development related matters. > > > > ii) Stackoverflow like site for help as well as for proposals. > > > > The silent majority has been referred to a few times recently. I > > > > suspect there does exist many lurkers on the list who do prefer one > > discussed solution over the other but for various reasons do not break > > out of their lurk mode to send a mail saying "I prefer this solution". > > Such an interface will also help in keeping track of the level of > > support as compared to mails that are larges hunks of quoted text with > > a line or two stating ones preference or seconding a proposal. > > > > One thing I have learned from traffic accidents is that if one asks > > for a help of the assembled crowd, no one knows how to respond. On the > > other hand if you say "hey there in a blue shirt could you get some > > water" you get instant results. So pardon me for taking the > > presumptuous liberty to request Travis to please set it up or > > delegate. > > > > Splitting the lists shouldn't be hard work, setting up overflow might > > be more work in comparison. > > > > Best > > -- srean > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Un clavier azerty en vaut deux ---------------------------------------------------------- ?ric Depagne eric at depagne.org From jdh2358 at gmail.com Thu Jun 28 10:51:32 2012 From: jdh2358 at gmail.com (John Hunter) Date: Thu, 28 Jun 2012 09:51:32 -0500 Subject: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8 In-Reply-To: References: Message-ID: On Thu, Jun 28, 2012 at 7:25 AM, Travis Oliphant wrote: > Hey all, > > I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not the 1.7 release). ? ? ?What does everyone think of that? As a tangential point, MPL is dropping support for python2.4 in it's next major release. As such we have put a lot of effort in making our upcoming point release extremely stable since it is likely to be the last 2.4 release. Our next major release, either designated 1.2 or 2.0 TBT) will have python3 support, and it seemed to much to try and support python versions from 2.4 on up. From lists at hilboll.de Thu Jun 28 10:54:00 2012 From: lists at hilboll.de (Andreas Hilboll) Date: Thu, 28 Jun 2012 16:54:00 +0200 Subject: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8 In-Reply-To: References: Message-ID: <739a2cae43399dd7bc3c8646a238a32d.squirrel@srv2.s4y.tournesol-consulting.eu> > Hi Travis, > > On Thu, Jun 28, 2012 at 1:25 PM, Travis Oliphant > wrote: >> Hey all, >> >> I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not >> the 1.7 release). ? ? ?What does everyone think of that? > > I think it would depend on 1.7 state. I am unwilling to drop support > for 2.4 in 1.8 unless we make 1.7 a LTS, that would be supported up to > 2014 Q1 (when RHEL5 stops getting security fixes - RHEL 5 is the one > platform that warrants supporting 2.4 IMO) +1 for the LTS "requirement". 
There are many people out there who cannot/wantnot install their own python just to support a new NumPy release. Unless, of course, there's compelling reasons to drop support for Python 2.4 (almost) immediately. From tim at cerazone.net Thu Jun 28 10:58:59 2012 From: tim at cerazone.net (Cera, Tim) Date: Thu, 28 Jun 2012 10:58:59 -0400 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: Similar to http://scicomp.stackexchange.com there is http://meta.programmers.stackexchange.com/ intended for programmers. Darn it, there are choices involved! I had proposed http://meta.programmers.stackexchange.com/ on this mailing list earlier and no-one seemed interested, but maybe now the time is right. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From aron at ahmadia.net Thu Jun 28 11:01:58 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Thu, 28 Jun 2012 17:01:58 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: Did you mean http://programmers.stackexchange.com? The meta sites on *. stackexchange.com are used (as one might guess) for meta discussions on the site. A On Thu, Jun 28, 2012 at 4:58 PM, Cera, Tim wrote: > Similar to http://scicomp.stackexchange.com there is > http://meta.programmers.stackexchange.com/ intended for programmers. > Darn it, there are choices involved! > > I had proposed http://meta.programmers.stackexchange.com/ on this mailing > list earlier and no-one seemed interested, but maybe now the time is right. > > Kindest regards, > Tim > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Jun 28 11:08:36 2012 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 28 Jun 2012 17:08:36 +0200 Subject: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8 In-Reply-To: References: Message-ID: On Thu, Jun 28, 2012 at 4:44 PM, Olivier Delalleau wrote: > 2012/6/28 David Cournapeau > >> Hi Travis, >> >> On Thu, Jun 28, 2012 at 1:25 PM, Travis Oliphant >> wrote: >> > Hey all, >> > >> > I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not >> the 1.7 release). What does everyone think of that? >> >> I think it would depend on 1.7 state. I am unwilling to drop support >> for 2.4 in 1.8 unless we make 1.7 a LTS, that would be supported up to >> 2014 Q1 (when RHEL5 stops getting security fixes - RHEL 5 is the one >> platform that warrants supporting 2.4 IMO) >> >> In my mind, it means 1.7 needs to be stable. Ondrej (and others) work >> to make sure we break neither API or ABI since a few releases would >> help achieving that. >> >> David >> > > As a user stuck with Python 2.4 for an undefined period of time, I would > definitely appreciate a long-term support release that would retain Python > 2.4 compatibility. > Hi, I have an honest question for you (and other 2.4 users). Many packages have long since dropped 2.4 compatibility. IPython and scikit-learn require 2.6 as a minimum, scikits-image and statsmodels 2.5. So what do you do about those packages, not use them at all, or use an older version? All those packages are improving (in my opinion) at a much faster rate than numpy. 
So if you do use them, up-to-date versions of those are likely to be more useful than a new version of numpy. In that light, does keeping 2.4 support really add significant value for you? Regards, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From x.piter at gmail.com Thu Jun 28 01:02:54 2012 From: x.piter at gmail.com (x.piter at gmail.com) Date: Thu, 28 Jun 2012 07:02:54 +0200 Subject: [Numpy-discussion] dot() function question References: <87zk7oigfs.fsf@cica.cica> Message-ID: <87bok4111d.fsf@cica.cica> Warren Weckesser writes: > In [6]: a = array([1, -2, 3]) > > In [7]: outer(a, a) > Out[7]: > array([[ 1, -2,? 3], > ?????? [-2,? 4, -6], > ?????? [ 3, -6,? 9]]) > > Warren Thanks, It is much nicer then my method of adding a zero column. Petro. From shish at keba.be Thu Jun 28 11:15:18 2012 From: shish at keba.be (Olivier Delalleau) Date: Thu, 28 Jun 2012 11:15:18 -0400 Subject: [Numpy-discussion] Dropping support for Python 2.4 in NumPy 1.8 In-Reply-To: References: Message-ID: 2012/6/28 Ralf Gommers > > > On Thu, Jun 28, 2012 at 4:44 PM, Olivier Delalleau wrote: > >> 2012/6/28 David Cournapeau >> >>> Hi Travis, >>> >>> On Thu, Jun 28, 2012 at 1:25 PM, Travis Oliphant >>> wrote: >>> > Hey all, >>> > >>> > I'd like to propose dropping support for Python 2.4 in NumPy 1.8 (not >>> the 1.7 release). What does everyone think of that? >>> >>> I think it would depend on 1.7 state. I am unwilling to drop support >>> for 2.4 in 1.8 unless we make 1.7 a LTS, that would be supported up to >>> 2014 Q1 (when RHEL5 stops getting security fixes - RHEL 5 is the one >>> platform that warrants supporting 2.4 IMO) >>> >>> In my mind, it means 1.7 needs to be stable. Ondrej (and others) work >>> to make sure we break neither API or ABI since a few releases would >>> help achieving that. >>> >>> David >>> >> >> As a user stuck with Python 2.4 for an undefined period of time, I would >> definitely appreciate a long-term support release that would retain Python >> 2.4 compatibility. >> > > Hi, I have an honest question for you (and other 2.4 users). Many packages > have long since dropped 2.4 compatibility. IPython and scikit-learn require > 2.6 as a minimum, scikits-image and statsmodels 2.5. So what do you do > about those packages, not use them at all, or use an older version? > > All those packages are improving (in my opinion) at a much faster rate > than numpy. So if you do use them, up-to-date versions of those are likely > to be more useful than a new version of numpy. In that light, does keeping > 2.4 support really add significant value for you? > I just don't use any package that is not Python 2.4-compatible. The application I currently work with requires numpy, scipy and theano. I might not need new features from newer numpy versions (not sure), but fixes for bugs and future compatibility issues that may come up would be nice. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim at cerazone.net Thu Jun 28 11:18:40 2012 From: tim at cerazone.net (Cera, Tim) Date: Thu, 28 Jun 2012 11:18:40 -0400 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: You are correct, I meant http://programmers.stackexchange.com/ And on a site like stackexchange I could actually edit my post instead of my mistake being permanent. :-) Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pierre.haessig at crans.org Thu Jun 28 12:06:45 2012 From: pierre.haessig at crans.org (Pierre Haessig) Date: Thu, 28 Jun 2012 18:06:45 +0200 Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: <39908C1C-80B6-434C-8832-437F9FC37D36@continuum.io> References: <34083731.post@talk.nabble.com> <4FEBF790.2000303@crans.org> <39908C1C-80B6-434C-8832-437F9FC37D36@continuum.io> Message-ID: <4FEC8115.6000000@crans.org> Hi, Le 28/06/2012 15:35, Travis Oliphant a ?crit : > It really is inplace. As Nathaniel mentioned --- all ufuncs take an out keyword. > > The inplace mechanism uses this so that one input and the output are the same. Thanks for the feedback about inplace assignment. On the other hand, just like srean mentionned, I think I also misused the "c[:] = a+b" syntax. I feel it's a bit confusing since this way of writing the assignment really feels likes it happens inplace. Good to know it's not the case. Best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 900 bytes Desc: OpenPGP digital signature URL: From tim at cerazone.net Thu Jun 28 12:36:28 2012 From: tim at cerazone.net (Cera, Tim) Date: Thu, 28 Jun 2012 12:36:28 -0400 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: A little more research shows that we could have a http://numpy.stackexchange.com. The requirements are just to have people involved. See http://area51.stackexchange.com/faq for more info. Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Thu Jun 28 13:43:25 2012 From: srean.list at gmail.com (srean) Date: Thu, 28 Jun 2012 12:43:25 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: If I remember correctly there used to be a stackexchange site at ask.scipy.org. It might be good to learn from that experience. I think handling with spam was a significant problem, but am not sure whether that is the reson why it got discontinued. Best srean On Thu, Jun 28, 2012 at 11:36 AM, Cera, Tim wrote: > > A little more research shows that we could have a > http://numpy.stackexchange.com. ?The requirements are just to have people > involved. See?http://area51.stackexchange.com/faq?for more info. > > Kindest regards, > Tim From chris.barker at noaa.gov Thu Jun 28 14:04:31 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 28 Jun 2012 11:04:31 -0700 Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: <4FEC8115.6000000@crans.org> References: <34083731.post@talk.nabble.com> <4FEBF790.2000303@crans.org> <39908C1C-80B6-434C-8832-437F9FC37D36@continuum.io> <4FEC8115.6000000@crans.org> Message-ID: On Thu, Jun 28, 2012 at 9:06 AM, Pierre Haessig > On the other hand, just like srean mentionned, I think I also misused > the "c[:] = a+b" syntax. > I feel it's a bit confusing since this way of writing the assignment > really feels likes it happens inplace. Good to know it's not the case. well, c is being modified in place -- it's the a+b that is creating a new array. so if you have a c around for another purpose (other than to store the result of a+b -- it might make sense to use this approach. Though a little faster might be: c[:] = a c += b -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? 
fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From chris.barker at noaa.gov Thu Jun 28 14:14:31 2012 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 28 Jun 2012 11:14:31 -0700 Subject: [Numpy-discussion] dot() function question In-Reply-To: <87zk7oigfs.fsf@cica.cica> References: <87zk7oigfs.fsf@cica.cica> Message-ID: On Wed, Jun 27, 2012 at 2:38 PM, wrote: > How how can I perform matrix multiplication of two vectors? > (in matlab I do it like a*a') np.outer is a bit cleaner, I suppose, but you can exactly the same thing you do with matlab if a is a column (single column 2-d array): In [40]: a = np.arange(4).reshape((-1,1)) In [41]: a Out[41]: array([[0], [1], [2], [3]]) In [42]: np.dot(a,a.T) Out[42]: array([[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 4, 6], [0, 3, 6, 9]]) or, of course, 2 arrays to begin with: In [13]: a = np.arange(4).reshape((4,1)) In [14]: b = np.arange(4).reshape((1,4)) In [15]: np.dot(a,b) Out[15]: array([[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 4, 6], [0, 3, 6, 9]]) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959?? voice 7600 Sand Point Way NE ??(206) 526-6329?? fax Seattle, WA ?98115 ? ? ??(206) 526-6317?? main reception Chris.Barker at noaa.gov From njs at pobox.com Thu Jun 28 14:42:33 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 28 Jun 2012 19:42:33 +0100 Subject: [Numpy-discussion] memory allocation at assignment In-Reply-To: References: <34083731.post@talk.nabble.com> <4FEBF790.2000303@crans.org> <39908C1C-80B6-434C-8832-437F9FC37D36@continuum.io> <4FEC8115.6000000@crans.org> Message-ID: On Thu, Jun 28, 2012 at 7:04 PM, Chris Barker wrote: > On Thu, Jun 28, 2012 at 9:06 AM, Pierre Haessig > >> On the other hand, just like srean mentionned, I think I also misused >> the "c[:] = a+b" syntax. >> I feel it's a bit confusing since this way of writing the assignment >> really feels likes it happens inplace. Good to know it's not the case. > > well, c is being modified in place -- it's the a+b that is creating a new array. > > so if you have a c around for another purpose (other than to store the > result of a+b -- it might make sense to use this approach. Though a > little faster might be: > > c[:] = a > c += b That should be faster than c[:] = a + b, but still slower than np.add(a, b, out=c). -n From njs at pobox.com Thu Jun 28 15:06:48 2012 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 28 Jun 2012 20:06:48 +0100 Subject: [Numpy-discussion] Non-deterministic test failure in master In-Reply-To: <4FEBF624.9090906@crans.org> References: <4FEBF624.9090906@crans.org> Message-ID: On Thu, Jun 28, 2012 at 7:13 AM, Pierre Haessig wrote: > Hi Nathaniel, > Le 27/06/2012 20:22, Nathaniel Smith a ?crit : >> According to the Travis-CI build logs, this code produces >> non-deterministic behaviour in master: > You mean non-deterministic across different builds, not across different > executions on the same build, right ? > > I just ran a small loop : > > N = 10000 > N_good = 0 > for i in range(N): > ? ?a = np.arange(5) > ? ?a[:3] = a[2:] > ? ?if (a == [2,3,4,3,4]).all(): > ? ? ? ?N_good += 1 > print 'good result : %d/%d' % (N_good, N) > > and got 100 % good replication. Yes, the current hypothesis is that there is one particular Travis-CI machine on which memcpy goes backwards, and so the test fails whenever the build gets assigned to that machine. (Apparently this is actually faster on some CPUs, and new versions of glibc are known to exploit this.) 
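To see why the copy direction matters for this particular overlap, here is a minimal pure-Python sketch of the two element orders (it only mimics the semantics; it is not the C path NumPy actually takes):

import numpy as np

a = np.arange(5)
for i in range(3):            # front-to-back: a[0]=a[2], a[1]=a[3], a[2]=a[4]
    a[i] = a[i + 2]
# a is now array([2, 3, 4, 3, 4]) -- the expected result

b = np.arange(5)
for i in reversed(range(3)):  # back-to-front: b[2]=b[4], b[1]=b[3], b[0]=b[2]
    b[i] = b[i + 2]
# b is now array([4, 3, 4, 3, 4]) -- the occasional result seen on Travis-CI

The front-to-back order is safe for this overlap because the destination slice starts before the source slice; the back-to-front order overwrites element 2 before it is read as a source, which is exactly what a memcpy that copies backwards would produce.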
https://github.com/numpy/numpy/pull/324 -n From cournape at gmail.com Thu Jun 28 15:32:35 2012 From: cournape at gmail.com (David Cournapeau) Date: Thu, 28 Jun 2012 20:32:35 +0100 Subject: [Numpy-discussion] Non-deterministic test failure in master In-Reply-To: References: <4FEBF624.9090906@crans.org> Message-ID: On Thu, Jun 28, 2012 at 8:06 PM, Nathaniel Smith wrote: > On Thu, Jun 28, 2012 at 7:13 AM, Pierre Haessig > wrote: >> Hi Nathaniel, >> Le 27/06/2012 20:22, Nathaniel Smith a ?crit : >>> According to the Travis-CI build logs, this code produces >>> non-deterministic behaviour in master: >> You mean non-deterministic across different builds, not across different >> executions on the same build, right ? >> >> I just ran a small loop : >> >> N = 10000 >> N_good = 0 >> for i in range(N): >> ? ?a = np.arange(5) >> ? ?a[:3] = a[2:] >> ? ?if (a == [2,3,4,3,4]).all(): >> ? ? ? ?N_good += 1 >> print 'good result : %d/%d' % (N_good, N) >> >> and got 100 % good replication. > > Yes, the current hypothesis is that there is one particular Travis-CI > machine on which memcpy goes backwards, and so the test fails whenever > the build gets assigned to that machine. (Apparently this is actually > faster on some CPUs, and new versions of glibc are known to exploit > this.) see also this: https://bugzilla.redhat.com/show_bug.cgi?id=638477 David From matthew.brett at gmail.com Thu Jun 28 15:42:29 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jun 2012 12:42:29 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: Hi, On Thu, Jun 28, 2012 at 7:42 AM, Olivier Delalleau wrote: > +1 for a numpy-users list without "dev noise". Moderately strong vote against splitting the mailing lists into devel and user. As we know, this list can be unhappy and distracting, but I don't think splitting the lists is the right approach to that problem. Splitting the lists sends the wrong signal. I'd rather that we show by example that the developers listen to all voices, and that the users should expect to become developers. In other words that the boundary between the user and developer is fluid and has no explicit boundaries. As data points, I make no distinction between scipy-devel and scipy-user, nor cython-devel and cython-user. Policing the distinction ('please post this on the user mailing list') is a boring job and doesn't make anyone more cheerful. I don't believe help questions are getting lost any more than devel questions are, but I'm happy to be corrected if someone has some data. Cheers, Matthew From tim at cerazone.net Thu Jun 28 15:46:15 2012 From: tim at cerazone.net (Cera, Tim) Date: Thu, 28 Jun 2012 15:46:15 -0400 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: That is really funny. Looking through the posts, there wasn't any spam (could have been deleted), but it wasn't used as much as I would think. Have to attract people who answer questions. Early on the registration seemed to be a problem. Solace, the software behind ask.scipy.org looks pretty nice, EXCEPT that the last commit was in 2009. On the other have it could be that it has reached perfection. :-) Kindest regards, Tim -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jason-sage at creativetrax.com Thu Jun 28 15:51:24 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Thu, 28 Jun 2012 14:51:24 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: Message-ID: <4FECB5BC.4090509@creativetrax.com> On 6/28/12 2:46 PM, Cera, Tim wrote: > That is really funny. Looking through the posts, there wasn't any spam > (could have been deleted), but it wasn't used as much as I would think. > Have to attract people who answer questions. Early on the > registration seemed to be a problem. > > Solace, the software behind ask.scipy.org looks > pretty nice, EXCEPT that the last commit was in 2009. On the other have > it could be that it has reached perfection. :-) I'll just note that askbot.org provides a nice platform for ask.sagemath.org (last commit to askbot was yesterday :). I think it's as easy as 'pip install askbot' [1] Jason [1] http://askbot.org/doc/install.html From fperez.net at gmail.com Thu Jun 28 16:30:37 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 28 Jun 2012 13:30:37 -0700 Subject: [Numpy-discussion] Created NumPy 1.7.x branch In-Reply-To: <244AAC5A-BC22-4F1C-B67E-DEB41C63B3A1@continuum.io> References: <5077EA0D-6FFB-4942-BFD0-76CD6754BECF@continuum.io> <754F229D-014B-4808-B12D-69F830990A2A@stsci.edu> <517F5AB1-F367-455F-8903-A0FF78B65ECA@stsci.edu> <2573665C-974A-4FD1-B799-2678177AD4B3@continuum.io> <895DAD05-5B24-4EBC-84F4-10442A4531B3@continuum.io> <2ADA61BB-148A-4BF8-B23B-BB1852C56801@continuum.io> <244AAC5A-BC22-4F1C-B67E-DEB41C63B3A1@continuum.io> Message-ID: On Thu, Jun 28, 2012 at 5:50 AM, Travis Oliphant wrote: >> Python in fact has the __future__ imports that help quite a bit for >> people to start adapting their codes. ?How about creating a >> numpy.future module where new, non-backward-compatible APIs could go? >> That would give the adventurous a way to play with new features (hence >> getting them better tested) as well as an easier path for gradual >> migration to the new features by everyone. >> >> This may have already been discussed before, forgive me if I'm >> repeating well-known material. > > This is ?a Did you mean to finish a sentence here and hit 'send' earlier than planned? :) Cheers, f From srean.list at gmail.com Thu Jun 28 16:42:35 2012 From: srean.list at gmail.com (srean) Date: Thu, 28 Jun 2012 15:42:35 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: In case this changes your mind (or assuages fears) just wanted to point out that many open source projects do this. It is not about claiming that one is more important than the other, nor does it reinforce the idea that developers and users live in separate silos, but more of directing the mails to different folders. No policing is required as well, just reply to the author and to the appropriate list. Right now reading numpy-discussion at scipy.org feels a lot like drinking from a fire hydrant when a couple of threads become very active. This is just anecdotal evidence, but I have had mails unanswered when there is one or two threads that are dominating the list. People are human and there will be situations where the top responders will be overburdened and I think the split will mitigate the problem somewhat. 
For whatever reasons, answering help requests are handled largely by a small set of star responders, though I suspect the answer is available more widely even among comparitively new users. I am hoping (a) that with a separate "ask for help" such enlightened new users can take up the slack (b) the information gets better organized (c) we do not impose on users who are not so interested in devel issues and vice versa. I take interest in devel related issues (apart from the distracting and what at times seem petty flamewars) and like reading the numpy source, but dont think every user have similar tastes neither should they. Best Srean On Thu, Jun 28, 2012 at 2:42 PM, Matthew Brett wrote: > Hi, > > On Thu, Jun 28, 2012 at 7:42 AM, Olivier Delalleau wrote: >> +1 for a numpy-users list without "dev noise". > > Moderately strong vote against splitting the mailing lists into devel and user. > > As we know, this list can be unhappy and distracting, but I don't > think splitting the lists is the right approach to that problem. > > Splitting the lists sends the wrong signal. ?I'd rather that we show > by example that the developers listen to all voices, and that the > users should expect to become developers. In other words that the > boundary between the user and developer is fluid and has no explicit > boundaries. > > As data points, I make no distinction between scipy-devel and > scipy-user, nor cython-devel and cython-user. ?Policing the > distinction ('please post this on the user mailing list') is a boring > job and doesn't make anyone more cheerful. > > I don't believe help questions are getting lost any more than devel > questions are, but I'm happy to be corrected if someone has some data. > > Cheers, > > Matthew From matthew.brett at gmail.com Thu Jun 28 17:07:46 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jun 2012 14:07:46 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: Hi, On Thu, Jun 28, 2012 at 1:42 PM, srean wrote: > In case this changes your mind (or assuages fears) just wanted to > point out that many open source projects do this. It is not about > claiming that one is more important than the other, nor does it > reinforce the idea that developers and users live in separate silos, > but more of directing the mails to different folders. No policing is > required as well, just reply to the author and to the appropriate > list. Yes, I know this split is common, but I don't think it works very well. I see that sympy, for example, has only one mailing list, and that works extremely well. I'd be interested to hear from the Cython and IPython guys as to whether they feel the user / devel split has helped or hurt. Ferando? Dag? And I continue to think it sends the wrong message. My impression is that, at the moment, we numpy-ers are trying to work out what kind of community we are. Are we a developer community, or are we some developers who are users of a library that we rely on, but do not contribute to? The split between a 'user' and a 'developer' carries an idea that is very important - exactly now. So, I (personally) think that exactly now we should not do this. Maybe later when we've really confronted the - ideas - that are the source of the current trouble. 
See you, Matthew From klemm at phys.ethz.ch Thu Jun 28 17:30:54 2012 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Thu, 28 Jun 2012 23:30:54 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: <94D778FD-4F9F-45AD-858F-54F2B7E318BF@phys.ethz.ch> On 28.06.2012, at 23:07, Matthew Brett wrote: > Hi, > > On Thu, Jun 28, 2012 at 1:42 PM, srean wrote: >> In case this changes your mind (or assuages fears) just wanted to >> point out that many open source projects do this. It is not about >> claiming that one is more important than the other, nor does it >> reinforce the idea that developers and users live in separate silos, >> but more of directing the mails to different folders. No policing is >> required as well, just reply to the author and to the appropriate >> list. > > Yes, I know this split is common, but I don't think it works very > well. > > I see that sympy, for example, has only one mailing list, and that > works extremely well. I'd be interested to hear from the Cython and > IPython guys as to whether they feel the user / devel split has helped > or hurt. Fernando? Dag? > > And I continue to think it sends the wrong message. > > My impression is that, at the moment, we numpy-ers are trying to work > out what kind of community we are. Are we a developer community, or > are we some developers who are users of a library that we rely on, but > do not contribute to? The split between a 'user' and a 'developer' > carries an idea that is very important - exactly now. So, I > (personally) think that exactly now we should not do this. Maybe > later when we've really confronted the - ideas - that are the source > of the current trouble. > Let me share the point of view of a typical(?) lurker on this list. I raised a few questions quite a while back that were very much in "user land". I will probably (unfortunately) never actively contribute to the development of numpy, but I like to know what's going on. As long as the bulk of postings are technical discussions, I am quite happy to receive (and often delete) long threads that are totally above my head. However, every once in a while there are these rather personal exchanges (I am loath to call them discussions) that basically clutter up everyone's inbox. In principle, I would be happy to just delete them after a very cursory reading, like almost all other posts. However, I have to admit they scare me, because this list was a place where even beginning users like myself could ask questions and get very helpful replies. The change in tone due to those discussions discourages me, at least, from posting simple questions. So if this rather harsh tone of personal arguments is going to continue, I would very much favour a user and a developer list, just because it lowers the barrier for new users to ask "stupid" questions. I would, however, very much prefer this list to go back to the previous style of being very technical with a supportive tone. Then I could still follow the discussions regarding the development of numpy and see some user questions mixed in...
Cheers, Hanno From matthew.brett at gmail.com Thu Jun 28 17:39:02 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jun 2012 14:39:02 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <94D778FD-4F9F-45AD-858F-54F2B7E318BF@phys.ethz.ch> References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <94D778FD-4F9F-45AD-858F-54F2B7E318BF@phys.ethz.ch> Message-ID: Hi, On Thu, Jun 28, 2012 at 2:30 PM, Hanno Klemm wrote: > > Am 28.06.2012 um 23:07 schrieb Matthew Brett: > >> Hi, >> >> On Thu, Jun 28, 2012 at 1:42 PM, srean wrote: >>> In case this changes your mind (or assuages fears) just wanted to >>> point out that many open source projects do this. It is not about >>> claiming that one is more important than the other, nor does it >>> reinforce the idea that developers and users live in separate silos, >>> but more of directing the mails to different folders. No policing is >>> required as well, just reply to the author and to the appropriate >>> list. >> >> Yes, I know this split is common, but I don't think it works very >> well. >> >> I see that sympy, for example, has only one mailing list, and that >> works extremely well. ?I'd be interested to hear from the Cython and >> IPython guys as to whether they feel the user / devel split has helped >> or hurt. ?Ferando? Dag? >> >> And I continue to think it sends the wrong message. >> >> My impression is that, at the moment, we numpy-ers are trying to work >> out what kind of community we are. Are we a developer community, or >> are we some developers who are users of a library that we rely on, but >> do not contribute to? ?The split between a 'user' and a 'developer' >> carries an idea that is very important - exactly now. ?So, I >> (personally) think that exactly now we should not do this. ? Maybe >> later when we've really confronted the - ideas - that are the source >> of the current trouble. >> > > Let me share the point of view of a typical(?) lurker on this list. I > have raised a few questions quite a while back that were very much in > "user land". I will probably (unfortunately) never actively contribute > to the development of numpy but I like to know what's going on. As > long as the bulk of postings are technical discussions I am quite > happy to receive (and often delete) long threads that are totally > above my head. However every once in a while there are these rather > personal exchanges (I am loath to call them discussions) that > basically clutter up everyones inbox. In principle, I would be happy > to just delete them after a very cursory reading like almost all other > posts, however, I have to admit they scare me, because this list was a > place where even beginning users like myself could ask questions and > get very helpful replies. The change in tone due to those discussions > is discouraging to post simple questions (at least to me). > > So if this rather harsh tone of personal arguments is going to > continue, I would very much favour a user and a developer list just > because it reduces the barrier of asking "stupid" questions for new > users. I would, however, very much prefer this list to go back to the > previous style of being very technical with a supposting tone. Then I > could still follow the discussions regarding the development of numpy > and see some user questions mixed in... Yes, I think everyone wants the tone to be better. 
My very clear impression is that these arguments are signs of stress about real and significant issues, and that when we get down to those issues, and resolve them, then we will be in a better place than we were before. I guess I'm hoping that we can be patient enough to see the shape of the problem that keeps making this stuff happen, Cheers, Matthew From fperez.net at gmail.com Thu Jun 28 17:57:24 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 28 Jun 2012 14:57:24 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: On Thu, Jun 28, 2012 at 2:07 PM, Matthew Brett wrote: > I see that sympy, for example, has only one mailing list, and that > works extremely well. ?I'd be interested to hear from the Cython and > IPython guys as to whether they feel the user / devel split has helped > or hurt. ?Ferando? Dag? There's evidence that projects can work successfully in either mode (single/dual lists), so I don't think this is a completely clear-cut question with a 'right' and a 'wrong' answer. What matters most is finding for each project and community what works best, and I think the main factor should be how truly disjoint are the topics and typical threads of the two lists. Before talking about IPython, we can consider Python itself, where there's a very clear division between the general and dev lists, and even the dev list has been recently split with a new 'ideas' list where more exploratory threads can take place, so that -dev can remain 100% focused on active, concrete development work on the main Python repo. And that strong separation of lists (which python-dev enforces strictly by calmly but firmly redirecting threads to other lists as soon as they seem off-topic for the narrow python-dev focus), seems to work pretty well for them. As far as IPython, I personally do prefer the separated lists, and I think it works quite well for us. IPython is a project often used by python beginners for simple learning of basic programming, and they just want to know how to tab-complete or how to get plots to run in non-blocking mode. Our -dev list is relatively high-traffic and with a weird mix of topics, given the rather eclectic nature of IPython: we have qt discussions, parallel computing, low-level networking/zeromq, javascript/web issues, protocol API threads, etc. All that can be overwhelming for novices (though obviously one hopes that novices would gradually learn from that and become interested in being developers). I think this is how I'd summarize it: - having two lists is friendlier to beginners, as it gives them an environment in which to ask questions that they may feel more comfortable in, because the level of the discussions tends to be not as complex as what happens in a -dev list. - but the cost it has is that it insulates users a bit more from the development ideas, perhaps lowering the likelihood that they will catch on to the development conversations and dig deeper into the project. My cartoon view of it would be: a. novice person | user list || dev list b. novice person || combined list where the | bars indicate 'barriers': in (a), a novice has a low barrier to become a good user, but a higher barrier to transfer into developer. With (b), there is no clear barrier to becoming a developer, but it's more intimidating for new users to join. 
I have heard (but I only have anecdotal evidence) of users saying that they feel more comfortable asking questions in user-only lists because of the level of the discussion, and that they can read all messages and learn something without having to filter threads that are way over their heads. Long answer, I know... But in short, I'm happy having two lists for IPython: I prefer to have the first transition (gaining active users) to be the easiest to make, because I think once users have become confident, the cost of digging deeper into development is actually pretty low. But I'm sure other projects can and have successfully made the opposite choice. Cheers, f From matthew.brett at gmail.com Thu Jun 28 18:03:39 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jun 2012 15:03:39 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: Hi, On Thu, Jun 28, 2012 at 2:57 PM, Fernando Perez wrote: > On Thu, Jun 28, 2012 at 2:07 PM, Matthew Brett wrote: >> I see that sympy, for example, has only one mailing list, and that >> works extremely well. ?I'd be interested to hear from the Cython and >> IPython guys as to whether they feel the user / devel split has helped >> or hurt. ?Ferando? Dag? > > There's evidence that projects can work successfully in either mode > (single/dual lists), so I don't think this is a completely clear-cut > question with a 'right' and a 'wrong' answer. ?What matters most is > finding for each project and community what works best, and I think > the main factor should be how truly disjoint are the topics and > typical threads of the two lists. > > Before talking about IPython, we can consider Python itself, where > there's a very clear division between the general and dev lists, and > even the dev list has been recently split with a new 'ideas' list > where more exploratory threads can take place, so that -dev can remain > 100% focused on active, concrete development work on the main Python > repo. ?And that strong separation of lists (which python-dev enforces > strictly by calmly but firmly redirecting threads to other lists as > soon as they seem off-topic for the narrow python-dev focus), seems to > work pretty well for them. > > As far as IPython, I personally do prefer the separated lists, and I > think it works quite well for us. ?IPython is a project often used by > python beginners for simple learning of basic programming, and they > just want to know how to tab-complete or how to get plots to run in > non-blocking mode. ?Our -dev list is relatively high-traffic and with > a weird mix of topics, given the rather eclectic nature of IPython: we > have qt discussions, parallel computing, low-level networking/zeromq, > javascript/web issues, protocol API threads, etc. ?All that can be > overwhelming for novices (though obviously one hopes that novices > would gradually learn from that and become interested in being > developers). > > I think this is how I'd summarize it: > > - having two lists is friendlier to beginners, as it gives them an > environment in which to ask questions that they may feel more > comfortable in, because the level of the discussions tends to be not > as complex as what happens in a -dev list. > > - but the cost it has is that it insulates users a bit more from the > development ideas, perhaps lowering the likelihood that they will > catch on to the development conversations and dig deeper into the > project. 
> > My cartoon view of it would be: > > a. novice person | user list ?|| dev list > > b. novice person || combined list > > where the | bars indicate 'barriers': in (a), a novice has a low > barrier to become a good user, but a higher barrier to transfer into > developer. ?With (b), there is no clear barrier to becoming a > developer, but it's more intimidating for new users to join. > > I have heard (but I only have anecdotal evidence) of users saying that > they feel more comfortable asking questions in user-only lists because > of the level of the discussion, and that they can read all messages > and learn something without having to filter threads that are way over > their heads. > > > Long answer, I know... But in short, I'm happy having two lists for > IPython: I prefer to have the first transition (gaining active users) > to be the easiest to make, because I think once users have become > confident, the cost of digging deeper into development is actually > pretty low. > > But I'm sure other projects can and have successfully made the opposite choice. Fernando - you told me a week or so ago that you'd come across a blog post or similar advocating a single list - do you remember the reference? Thanks, Matthew From srean.list at gmail.com Thu Jun 28 18:06:12 2012 From: srean.list at gmail.com (srean) Date: Thu, 28 Jun 2012 17:06:12 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: > And I continue to think it sends the wrong message. Maybe if you articulate your fears I will be able to appreciate your point of view more. > My impression is that, at the moment, we numpy-ers are trying to work > out what kind of community we are. Are we a developer community, or > are we some developers who are users of a library that we rely on, but > do not contribute to? I think it is fair to extrapolate that all of us would want the numpy community to grow. If that be so at some point not all of the users will be developers. Apart from ones own pet projects, all successful projects have more users than active developers. What I like about having two lists is that on one hand it does not prevent me or you from participating in both, on the other hand it allows those who dont want to delve too deeply in one aspect or the other, the option of a cleaner inbox, or the option of having separate inboxes. I for instance would like to be in both the lists, perhaps mostly as a lurker, but still would want to have two different folders just for better organization. To me this seems a win win. There is also a chance that more lurkers would speak up on the help list than here and I think that is a good thing. Best srean From srean.list at gmail.com Thu Jun 28 18:15:26 2012 From: srean.list at gmail.com (srean) Date: Thu, 28 Jun 2012 17:15:26 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: Could not have said this better even if I tried, so thank you for your long answer. -- srean On Thu, Jun 28, 2012 at 4:57 PM, Fernando Perez wrote: > Long answer, I know... 
From fperez.net at gmail.com Thu Jun 28 18:20:42 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 28 Jun 2012 15:20:42 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: On Thu, Jun 28, 2012 at 3:03 PM, Matthew Brett wrote: > Fernando - you told me a week or so ago that you'd come across a blog > post or similar advocating a single list - do you remember the > reference? Found it after some digging: http://www.kitware.com/blog/home/post/263 and upon rereading it, it doesn't really advocate anything specific about mailing lists, just talking in general about a project considering all of its constituents as a single community, rather than two groups. And in that view, one can even argue that a single community can still benefit from multiple lists, much like the python developers have agreed to have python-dev and python-ideas as a way of triaging exploratory discussions form day-to-day work. But that's the post I had mentioned to you: I probably read it thinking about mailing lists as I went, which is why I think I misquoted it somewhat to you. Cheers, f From matthew.brett at gmail.com Thu Jun 28 18:22:44 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jun 2012 15:22:44 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: Hi, On Thu, Jun 28, 2012 at 3:06 PM, srean wrote: >> And I continue to think it sends the wrong message. > > Maybe if you articulate your fears I will be able to appreciate your > point of view more. Ah - I'm afraid I don't know how to say what I mean more clearly :( I can repeat myself, more or less, to say that this split both encapsulates a distinction that I think we should not make, and distracts from the fundamental issues at stake behind the recent discussions. I suppose I'd add that it does some harm to seek technical solutions for fundamental societal problems. The technical solution may be more or less neutral in effect, but it takes the focus off the problem we should be dealing with. The joke about the drunk under a lamp post looking for his keys. See you, Matthew From fperez.net at gmail.com Thu Jun 28 18:23:12 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 28 Jun 2012 15:23:12 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: On Thu, Jun 28, 2012 at 3:06 PM, srean wrote: > ?What I like about having two lists is that on one hand it does not > prevent me or you from participating in both, on the other hand it > allows those who dont want to delve too deeply in one aspect or the > other, the option of a cleaner inbox, or the option of having separate > inboxes. I for instance would like to be in both the lists, perhaps > mostly as a lurker, but still would want to have two different folders > just for better organization. I just want to mention that even as a project leader, I benefit from this: when I'm swamped, I simply ignore the user list. Not a nice thing to do, perhaps, but given the choice between moving the project forward and helping a new user, with often very limited time, I think it's the best solution possible. 
Of course I do help in the user list when I can, but I mostly encourage more experienced users to help new ones, so that our small dev team can spend its limited time moving the project forward. Cheers, f From matthew.brett at gmail.com Thu Jun 28 18:31:14 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jun 2012 15:31:14 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: Hi, On Thu, Jun 28, 2012 at 3:20 PM, Fernando Perez wrote: > On Thu, Jun 28, 2012 at 3:03 PM, Matthew Brett wrote: >> Fernando - you told me a week or so ago that you'd come across a blog >> post or similar advocating a single list - do you remember the >> reference? > > Found it after some digging: > > http://www.kitware.com/blog/home/post/263 > > and upon rereading it, it doesn't really advocate anything specific > about mailing lists, just talking in general about a project > considering all of its constituents as a single community, rather than > two groups. > > And in that view, one can even argue that a single community can still > benefit from multiple lists, much like the python developers have > agreed to have python-dev and python-ideas as a way of triaging > exploratory discussions form day-to-day work. I'm not on the python mailing lists, but my impression is that python is in a different space from numpy. I mean, I have the impression (I may be wrong) that python already has a clear idea about how work gets done and how decisions are made. There's a mature PEP process and clear precedent for the process of working through difficult decisions. Numpy lacks this, and more fundamentally, does not appear to be sure to what extent it is a community project in the sense that I've understood it from other projects around us - like - say - IPython, sympy, and so on. So, it may not make sense to think in terms of a model that works for Python, or even, IPython. See you, Matthew From srean.list at gmail.com Thu Jun 28 20:13:35 2012 From: srean.list at gmail.com (srean) Date: Thu, 28 Jun 2012 19:13:35 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: > I'm not on the python mailing lists, but my impression is that python > is in a different space from numpy. ?I mean, I have the impression Indeed one could seek out philosphical differences between different projects. No two projects are the same but they can and often do have common issues. About the issues that Fernando mentioned I can say that they are real, they do apply and this I say from a from the experience of being on the numpy mailing list. I think that many silent numpy users will thank the creation of a low barrier, low noise (noise is context sensitive) forum where they can ask for help with what they feel are simple questions with easy answers. I still do not have a tangible grasp of what your fears are. It seems you are unhappy that this will split the community. It wont, its just two lists for the same community where mails have been sorted into different folders. It also seems the notion of developers and users is disagreeable to you and you are philosophically hesitant about accepting/recognizing that such a difference exists. I may be wrong, I do not intend to speak for you, I am only trying to understand your objections. 
First let me assure you they are labels on (temporary) roles not on a person (if that is what is making you uncomfortable). Different people occupy different states for different amounts of time. A question about how to run length decode an array of integers is very different from a question on which files to touch to add reduceat( ) support to the numexpression engine and how. It would be strange to take the position that there is no difference between the nature of these questions. Or to take the position that the person who is interest in the former is also keen to learn about the former (note: some would be, example: yours sincerely. I know the former ot the latter ) or at the least keen on receiving mails on extended discussion on the topic of lesser interest. It seems to me, that sorting these mails into different bins only improves the contextual signal to noise ratio, which the recipient can use as he/she feels fit. The only issue is if there will be enough volume for each of these bins. My perception is yes but this can certainly be revisited. In anycase it does not prevent nor hinder any activity, but allows flexible organization of content should one want it. > So, it may not make sense to think in terms of a model that works for Python, or even, IPython. I do not want to read too much into this, but this I do find kind of odd and confusing: to proactively solicit input from other related projects but then say that do do not apply once the views expressed werent in total agreement. This thread is coming close to veer into the non-technical/non-productive/argumentative zone. The type that I am fearful off, so I will stop here. But I would encourage you to churn these views in your mind, impersonally, to see if the idea of different lists have any merit and to seek out what are the tangible harm that can come out of it. I think this request has come before (hasten to add not initiated by me) and the response had been largely been in favor, but nothing has happened. So I would welcome information on: if indeed two lists are to be made, who gets to create those lists Best, srean From matthew.brett at gmail.com Thu Jun 28 20:29:23 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jun 2012 17:29:23 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: Hi, On Thu, Jun 28, 2012 at 5:13 PM, srean wrote: >> I'm not on the python mailing lists, but my impression is that python >> is in a different space from numpy. ?I mean, I have the impression > > Indeed one could seek out philosphical differences between different > projects. No two projects are the same but they can and often do have > common issues. About the issues that Fernando mentioned I can say that > they are real, they do apply and this I say from a from the experience > of being on the numpy mailing list. > > I think that many silent numpy users will thank the creation of a low > barrier, low noise (noise is context sensitive) forum where they can > ask for help with what they feel are simple questions with easy > answers. > > I still do not have a tangible grasp of what your fears are. It seems > you are unhappy that this will split the community. It wont, its just > two lists for the same community where mails have been sorted into > different folders. 
> > It also seems the notion of developers and users is disagreeable to > you and you are philosophically hesitant about accepting/recognizing > that such a difference exists. I may be wrong, I do not intend to > speak for you, I am only trying to understand your objections. Did you read the blog post that Fernando sent the link for? Now I read it, it captures the idea I was trying to get at rather well. > First let me assure you they are labels on (temporary) roles not on a > person (if that is what is making you uncomfortable). Different people > occupy different states for different amounts of time. > > ?A question about how to run length decode an array of integers is > very different from a question on which files to touch to add > reduceat( ) support to the numexpression engine and how. > > It would be strange to take the position that there is no difference > between the nature of these questions. Or to take the position that > the person who is interest in the former is also keen to learn about > the former (note: some would be, example: yours sincerely. I know the > former ot the latter ) or at the least keen on receiving mails on > extended discussion on the topic of lesser interest. > > ?It seems to me, that sorting these mails into different bins only > improves the contextual signal to noise ratio, which the recipient can > use as he/she feels fit. The only issue is if there will be enough > volume for each of these bins. My perception is yes but this can > certainly be revisited. ?In anycase it does not prevent nor hinder any > activity, but allows flexible organization of content should one want > it. > >> So, it may not make sense to think in terms of a model that works for Python, or even, IPython. > > I do not want to read too much into this, but this I do find kind of > odd and confusing: ?to proactively solicit input from other related > projects but then say that do do not apply once the views expressed > werent in total agreement. Well - I'm sure you feel the same way - I often find myself wanting to hear of people's experience in order to be able to think more clearly. In this case I wasn't expecting Fernando to agree with me, but to give his thoughts and experience. That in turn modified how I was thinking about the problem, and hence my response. Please - don't worry - I don't think the sky will fall if there is a separate user list, and nor do I think it much matters what I think about the matter. I'm only trying to shorten the bad period we're going through by helping to concentrate on the problem in hand. Cheers, Matthew From tjhnson at gmail.com Thu Jun 28 21:52:27 2012 From: tjhnson at gmail.com (T J) Date: Thu, 28 Jun 2012 18:52:27 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: On Thu, Jun 28, 2012 at 3:23 PM, Fernando Perez wrote: > On Thu, Jun 28, 2012 at 3:06 PM, srean wrote: > > What I like about having two lists is that on one hand it does not > > prevent me or you from participating in both, on the other hand it > > allows those who dont want to delve too deeply in one aspect or the > > other, the option of a cleaner inbox, or the option of having separate > > inboxes. I for instance would like to be in both the lists, perhaps > > mostly as a lurker, but still would want to have two different folders > > just for better organization. 
> > I just want to mention that even as a project leader, I benefit from > this: when I'm swamped, I simply ignore the user list. Not a nice > thing to do, perhaps, but given the choice between moving the project > forward and helping a new user, with often very limited time, I think > it's the best solution possible. Of course I do help in the user list > when I can, but I mostly encourage more experienced users to help new > ones, so that our small dev team can spend its limited time moving the > project forward. > > I'm okay with having two lists as it does filtering for me, but this seems like a sub-optimal solution. Observation: Some people would like to apply labels to incoming messages. Reality: Email was not really designed for that. We can hack it by using two different email addresses, but why not just keep this list as is and make a concentrated effort to promote the use of 2.0 technologies, like stackoverflow/askbot/etc? There, people can put as many tags as desired on questions: matrix, C-API, iteration, etc. Potentially, these tags would streamline everyone's workflow. The stackoverflow setup also makes it easier for users to search for solutions to common questions, and know that the top answer is still an accurate answer. [No one likes finding old invalid solutions.] The reputation system and up/down votes also help new users figure out which responses to trust. As others have explained, it does seem that there are distinct types of discussions that take place on this list. 1) There are community discussiuons/debates. Examples are the NA discussion, the bug tracker, release schedule, ABI/API changes, matrix rank tolerance too low, lazy evaluation, etc. These are clearly mailing-list topics. If you look at all the messages for the last two(!) months, it seems like this type of message has been the dominate type. 2) There are also standard questions. Recent examples are "memory allocation at assignment", "dot() function question", "not expected output of fill_diagonal", "silly isscalar question". These messages seem much more suited to the stackoverflow environment. In fact, I'd be happy if we redirected such questions to stackoverflow. This has the added benefit that responses to such questions will stay on topic. Note that if a stackoverflow question seeds a discussion, then someone can start a new thread on the mailing list which cite the stackoverflow question. tl;dr Keep this list the same, and push "user" questions to stackoverflow instead of pushing them to a user list. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Thu Jun 28 22:28:43 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Thu, 28 Jun 2012 22:28:43 -0400 Subject: [Numpy-discussion] PyArray_FILLWBYTE dangerous doc Message-ID: Hi, The doc of PyArray_FILLWBYTE here http://docs.scipy.org/doc/numpy/reference/c-api.array.html is this PyArray_FILLWBYTE(PyObject* obj, int val) Fill the array pointed to by obj ?which must be a (subclass of) bigndarray?with the contents of val (evaluated as a byte). In the code, what it does is call memset: numpy/core/include/numpy/ndarrayobject.h #define PyArray_FILLWBYTE(obj, val) memset(PyArray_DATA(obj), val, \ PyArray_NBYTES(obj)) This make it ignore completely the strides! So the easy fix would be to update the doc, the real fix is to test the contiguity before calling memset, if not contiguous, call something else appropriate. 
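To make the hazard concrete, here is a rough pure-Python sketch of what the
current macro effectively does to a non-contiguous view. It only simulates the
memset over PyArray_DATA/PyArray_NBYTES, and it assumes a view whose data
pointer coincides with the start of its base buffer, which is the case for the
simple slice below:

import numpy as np

base = np.zeros((4, 4), dtype=np.float64)
view = base[::2, ::2]          # non-contiguous view with 4 elements

# PyArray_FILLWBYTE(view, 1) expands to
#     memset(PyArray_DATA(view), 1, PyArray_NBYTES(view))
# i.e. it writes view.nbytes raw bytes starting at the view's data
# pointer, marching straight through the parent buffer:
raw = base.reshape(-1).view(np.uint8)   # the underlying bytes of base
raw[:view.nbytes] = 1                   # simulate the memset

print(view)   # only 2 of the 4 elements of the view were actually touched
print(base)   # ...while 2 elements that are not part of the view got clobbered

A stride-aware version would presumably need to check something like
PyArray_ISONESEGMENT or PyArray_ISCONTIGUOUS first and fall back to an
element-wise fill otherwise; the snippet above is only meant to show why the
bare memset is unsafe, not to propose the actual fix.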
Fred From srean.list at gmail.com Thu Jun 28 22:50:18 2012 From: srean.list at gmail.com (srean) Date: Thu, 28 Jun 2012 21:50:18 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: I like this solution and I think ask.scipy.org can be revived to take over that role, but this will need some policing to send standard questions there and also some hangout time at ask.scipy.org. I love the stackoverflow model but it requires more active participation of those who want to answer questions as compared to mailing lists. This because questions not only do not come to you by default but they also get knocked off the top page as more questions come in. Something to watch out for though I believe it wont be as bad as the main SO site. Meta^2 I have been top posting with abandon here. Not sure what is preferred here, top or bottom. Best srean On Thu, Jun 28, 2012 at 8:52 PM, T J wrote: > On Thu, Jun 28, 2012 at 3:23 PM, Fernando Perez > wrote: > I'm okay with having two lists as it does filtering for me, but this seems > like a?sub-optimal?solution. > > Observation: Some people would like to apply labels to incoming messages. > Reality: Email was not really designed for that. > > We can hack it by using two different email addresses, but why not just keep > this list as is and make a concentrated effort to promote the use of 2.0 > technologies, like stackoverflow/askbot/etc? ?There, people can put as many > tags as desired on questions: matrix, C-API, iteration, etc. Potentially, > these tags would streamline everyone's workflow. ?The stackoverflow setup > also makes it easier for users to search for solutions to common questions, > and know that the top answer is still an accurate answer. ?[No one likes > finding old invalid solutions.] ?The reputation system and up/down votes > also help new users figure out which responses to trust. > > As others have explained, it does seem that there are distinct types of > discussions that take place on this list. > > 1) ?There are community discussiuons/debates. > > Examples are the NA discussion, the bug tracker, release schedule, ABI/API > changes, matrix rank tolerance too low, lazy evaluation, etc. ? These are > clearly mailing-list topics. ? If you look at all the messages for the last > two(!) months, it seems like this type of message has been the dominate > type. > > 2) There are also standard questions. > > Recent examples are "memory allocation at assignment", ?"dot() function > question", "not expected output of fill_diagonal", "silly isscalar > question". ?These messages seem much more suited to the stackoverflow > environment. ?In fact, I'd be happy if we redirected such questions to > stackoverflow. ?This has the added benefit that responses to such questions > will stay on topic. ?Note that if a stackoverflow question seeds a > discussion, then someone can start a new thread on the mailing list which > cite the stackoverflow question. > > tl;dr > > Keep this list the same, and push "user" questions to stackoverflow instead > of pushing them to a user list. 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Thu Jun 28 23:49:38 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jun 2012 20:49:38 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: Hi, On Thu, Jun 28, 2012 at 7:50 PM, srean wrote: > I like this solution and I think ask.scipy.org can be revived to take > over that role, but this will need some policing to send standard > questions there and also some hangout time at ask.scipy.org. Sounds like a good idea to me too. If someone points me at it when the time comes, I'm happy to do hangout duty too. > I love the stackoverflow model but it requires more active > participation of ?those who want to answer questions as compared to > mailing lists. This because questions not only do not come to you by > default but they also ?get knocked off the top page as more questions > come in. Something to watch out for though I believe it wont be as bad > as the main SO site. > > Meta^2 I have been top posting with abandon here. Not sure what is > preferred here, top or bottom. Me I prefer posting under the relevant stuff, but I don't think you'll get any flak either way. Cheers, matthew From uschmitt at mineway.de Fri Jun 29 04:54:38 2012 From: uschmitt at mineway.de (Uwe Schmitt) Date: Fri, 29 Jun 2012 10:54:38 +0200 Subject: [Numpy-discussion] Strange problem Message-ID: <4FED6D4E.8080808@mineway.de> Hi, I have unreproducable crashes on a customers Win 7 machine with Python 2.7.2 and Numpy 1.6.1. He gets the following message: Problem signature: Problem Event Name: APPCRASH Application Name: python.exe Application Version: 0.0.0.0 Application Timestamp: 4df4ba7c Fault Module Name: umath.pyd Fault Module Version: 0.0.0.0 Fault Module Timestamp: 4e272b96 Exception Code: c0000005 Exception Offset: 0001983a OS Version: 6.1.7601.2.1.0.256.4 Locale ID: 2055 Additional Information 1: 0a9e Additional Information 2: 0a9e372d3b4ad19135b953a78882e789 Additional Information 3: 0a9e Additional Information 4: 0a9e372d3b4ad19135b953a78882e789 I know that I can not expect a clear answer without more information, but my customer is on hollidays and I just wanted to ask for some hints for possible reasons. The machine is not out of memory and despite this crash runs very stable. Regards, Uwe -- Dr. rer. nat. Uwe Schmitt Leitung F/E Mathematik mineway GmbH Geb?ude 4 Im Helmerswald 2 66121 Saarbr?cken Telefon: +49 (0)681 8390 5334 Telefax: +49 (0)681 830 4376 uschmitt at mineway.de www.mineway.de Gesch?ftsf?hrung: Dr.-Ing. Mathias Bauer Amtsgericht Saarbr?cken HRB 12339 -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Jun 29 04:57:48 2012 From: cournape at gmail.com (David Cournapeau) Date: Fri, 29 Jun 2012 09:57:48 +0100 Subject: [Numpy-discussion] Strange problem In-Reply-To: <4FED6D4E.8080808@mineway.de> References: <4FED6D4E.8080808@mineway.de> Message-ID: On Fri, Jun 29, 2012 at 9:54 AM, Uwe Schmitt wrote: > Hi, > > I have unreproducable crashes on a customers Win 7 machine with Python 2.7.2 > and > Numpy 1.6.1.? He gets the following message: > > ? Problem signature: > ? Problem Event Name: APPCRASH > ? Application Name: python.exe > ? Application Version: 0.0.0.0 > ? Application Timestamp: 4df4ba7c > ? 
Fault Module Name: umath.pyd > ? Fault Module Version: 0.0.0.0 > ? Fault Module Timestamp: 4e272b96 > ? Exception Code: c0000005 > ? Exception Offset: 0001983a > ? OS Version: 6.1.7601.2.1.0.256.4 > ? Locale ID: 2055 > ? Additional Information 1: 0a9e > ? Additional Information 2: 0a9e372d3b4ad19135b953a78882e789 > ? Additional Information 3: 0a9e > ? Additional Information 4: 0a9e372d3b4ad19135b953a78882e789 > > I know that I can not expect a clear answer without more information, but my > customer is on hollidays and I just wanted to ask for some hints for > possible > reasons. The machine is not out of memory and despite this crash runs very > stable. Is this on 32 or 64 bits windows ? Do you know if your customer uses only numpy, or other packages that depend on numpy C extension ? David From uschmitt at mineway.de Fri Jun 29 05:09:17 2012 From: uschmitt at mineway.de (Uwe Schmitt) Date: Fri, 29 Jun 2012 11:09:17 +0200 Subject: [Numpy-discussion] Strange problem In-Reply-To: References: <4FED6D4E.8080808@mineway.de> Message-ID: <4FED70BD.1080708@mineway.de> Am 29.06.2012 10:57, schrieb David Cournapeau: > > Is this on 32 or 64 bits windows ? Do you know if your customer uses > only numpy, or other packages that depend on numpy C extension ? It is 64 bit Windows. I forgot to say that a part of my numpy arrays are generated by a short Cython method wrapping open-ms library. As the code fragment is short, I post it here: def get_peaks(self): cdef _MSSpectrum[_Peak1D] * spec_ = self.inst cdef unsigned int n = spec_.size() cdef np.ndarray[np.float32_t, ndim=2] peaks peaks = np.zeros( [n,2], dtype=np.float32) cdef _Peak1D p cdef vector[_Peak1D].iterator it = spec_.begin() cdef int i = 0 while it != spec_.end(): peaks[i,0] = deref(it).getMZ() peaks[i,1] = deref(it).getIntensity() preincrement(it) i += 1 return peaks I am sure that this functions does not crash during execution. As spec_ 's class is derived from C++ STL std::vector<..> there should be no conflict between counting 'i' up to 'n' and testing 'it' against 'spec_.end()'. Regards, Uwe -- Dr. rer. nat. Uwe Schmitt Leitung F/E Mathematik mineway GmbH Geb?ude 4 Im Helmerswald 2 66121 Saarbr?cken Telefon: +49 (0)681 8390 5334 Telefax: +49 (0)681 830 4376 uschmitt at mineway.de www.mineway.de Gesch?ftsf?hrung: Dr.-Ing. Mathias Bauer Amtsgericht Saarbr?cken HRB 12339 -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Fri Jun 29 14:43:09 2012 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 29 Jun 2012 14:43:09 -0400 Subject: [Numpy-discussion] Would a patch with a function for incrementing an array with advanced indexing be accepted? In-Reply-To: References: Message-ID: Hi, I personnaly can't review this as this is too much in NumPy internal. My only comments is that you could add a test and an example in the doc for matrix[list]. I think it will be the most used case. Fred On Wed, Jun 27, 2012 at 7:47 PM, John Salvatier wrote: > I've submitted a pull request ( https://github.com/numpy/numpy/pull/326?). > I'm new to the numpy and python internals, so feedback is greatly > appreciated. > > > On Tue, Jun 26, 2012 at 12:10 PM, Travis Oliphant > wrote: >> >> >> On Jun 26, 2012, at 1:34 PM, Fr?d?ric Bastien wrote: >> >> > Hi, >> > >> > I think he was referring that making NUMPY_ARRAY_OBJECT[...] syntax >> > support the operation that you said is hard. But having a separate >> > function do it is less complicated as you said. >> >> Yes. 
That's precisely what I meant. ? Thank you for clarifying. >> >> -Travis >> >> > >> > Fred >> > >> > On Tue, Jun 26, 2012 at 1:27 PM, John Salvatier >> > wrote: >> >> Can you clarify why it would be super hard? I just reused the code for >> >> advanced indexing (a modification of PyArray_SetMap). Am I missing >> >> something >> >> crucial? >> >> >> >> >> >> >> >> On Tue, Jun 26, 2012 at 9:57 AM, Travis Oliphant >> >> wrote: >> >>> >> >>> >> >>> On Jun 26, 2012, at 11:46 AM, John Salvatier wrote: >> >>> >> >>> Hello, >> >>> >> >>> If you increment an array using advanced indexing and have repeated >> >>> indexes, the array doesn't get repeatedly >> >>> incremented, >> >>> http://comments.gmane.org/gmane.comp.python.numeric.general/50291. >> >>> I wrote a C function that does incrementing with repeated indexes >> >>> correctly. >> >>> The branch is here (https://github.com/jsalvatier/numpy see the last >> >>> two >> >>> commits). Would a patch with a cleaned up version of a function like >> >>> this be >> >>> accepted into numpy? I'm not experienced writing numpy C code so I'm >> >>> sure it >> >>> still needs improvement. >> >>> >> >>> >> >>> This is great. ? It is an often-requested feature. ? It's *very >> >>> difficult* >> >>> to do without changing fundamentally what NumPy is. ?But, yes this >> >>> would be >> >>> a great pull request. >> >>> >> >>> Thanks, >> >>> >> >>> -Travis >> >>> >> >>> >> >>> >> >>> _______________________________________________ >> >>> NumPy-Discussion mailing list >> >>> NumPy-Discussion at scipy.org >> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >>> >> >> >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jim.vickroy at noaa.gov Fri Jun 29 15:20:38 2012 From: jim.vickroy at noaa.gov (Jim Vickroy) Date: Fri, 29 Jun 2012 13:20:38 -0600 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: <4FEE0006.9080403@noaa.gov> As a lurker and user, I too wish for a distinct numpy-users list. -- jv On 6/28/2012 1:42 PM, Matthew Brett wrote: > Hi, > > On Thu, Jun 28, 2012 at 7:42 AM, Olivier Delalleau wrote: >> +1 for a numpy-users list without "dev noise". > Moderately strong vote against splitting the mailing lists into devel and user. > > As we know, this list can be unhappy and distracting, but I don't > think splitting the lists is the right approach to that problem. > > Splitting the lists sends the wrong signal. I'd rather that we show > by example that the developers listen to all voices, and that the > users should expect to become developers. In other words that the > boundary between the user and developer is fluid and has no explicit > boundaries. > > As data points, I make no distinction between scipy-devel and > scipy-user, nor cython-devel and cython-user. 
Policing the > distinction ('please post this on the user mailing list') is a boring > job and doesn't make anyone more cheerful. > > I don't believe help questions are getting lost any more than devel > questions are, but I'm happy to be corrected if someone has some data. > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sat Jun 30 01:52:31 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 29 Jun 2012 22:52:31 -0700 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: <4FEAB888.7070807@uci.edu> References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> <1E8CCAF9-20F6-48DB-B5E1-A8B74BC194F4@continuum.io> <4FEAB888.7070807@uci.edu> Message-ID: Hi, On Wed, Jun 27, 2012 at 12:38 AM, Christoph Gohlke wrote: > On 6/26/2012 8:13 PM, Travis Oliphant wrote: >>> For the main repos we use buildbot and test on: >>> >>> Ubuntu Maverick 32-bit >>> Debian sid 64-bit >>> OSX 10.4 PPC >>> OSX 10.5 Intel >>> Debian wheezy PPC >>> Debian squeeze ARM (a Raspberry PI no less) >>> WIndows XP 32 bit >>> SPARC (courtesy of our friends at NeuroDebian) >>> >>> http://nipy.bic.berkeley.edu/builders >>> >>> We've found several issues with numpy using these, and I've fed them >>> back as I found them, >>> >>> http://projects.scipy.org/numpy/ticket/2076 >>> http://projects.scipy.org/numpy/ticket/2077 >>> http://projects.scipy.org/numpy/ticket/2174 >>> >>> They are particularly useful for difficult to reproduce problems >>> because they test often and leave a record that we can point to. ?As >>> I've said before, y'all are welcome to use these machines for numpy >>> builds / tests. >> >> Now that Ondrej is working on getting continuous integration up for NumPy, ?I would encourage him to take you up on that offer. ? Can these machines run a Jenkins slave? >> >> Having periodic tests of Sage, Pandas, matplotlib, scipy, and other projects is a major priority and really critical before we can really talk about how to migrate the APIs. ? ?Thankfully, Ondrej is available to help get this project started and working this summer. >> >> -Travis >> > > > FWIW: I can relatively easy (batch script) build numpy from github and > run the test suites of many packages available at > against it. > > For example at > > are the test results of assimulo, bitarray, bottleneck, h5py, > matplotlib, numexpr, pandas, pygame, scipy, skimage, sklearn, > statsmodels, and pytables, built against numpy-1.6.x and run against > numpy-1.7.0.dev-66bd39f on win-amd64-py2.7. Thanks - that's very helpful. Do you have your build system documented somewhere? Is it easy to replicate do you think? Cheers, Matthew From aron at ahmadia.net Sat Jun 30 04:00:15 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Sat, 30 Jun 2012 10:00:15 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: As I mentioned before, numpy-related questions would be welcome on scicomp, and this would have the advantage of bringing in scientists and mathematicians from related fields who might be able to answer numerical questions that sit between mathematics, programming, and science that you might not otherwise. 
There's already somewhat of a critical mass of people hanging out at scicomp (500 unique visitors a day during the work week), and you can subscribe to the python-related tags if you want to filter out the other sorts of questions. A -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sat Jun 30 04:35:20 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 30 Jun 2012 10:35:20 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> Message-ID: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> +1 on scicomp.stackexchange.com For it to work, one would need to actively push users towards it though...so it would require a very clear pronouncement. Matthew: I'm happy with the split we did with Cython. It leaves me free to mostly ignore cython-users, and it saves users from thos 100+ post threads about inner workings. (I've had Cython users tell me several times that it is better that devs make Cython better than spend time helping newbies -- I feel helping out newbies is something advanced users can do too). I don't agree with your implication that the organization of mailing lists has much to do with governance. The mailing list split is a split of topics of discussion, not of the subscribers; anyone is welcome to post on cython-dev (e.g., ideas for new features or hashing out wanted semantics). However, a stackexchange-like solution may be a better fit than a users list. The. ask.scipy beta wasn't used much but it wasn't really promoted and users weren't pushed towards it. One advantage is pooling topics together; many new users may be unsure whether numpy or scipy or matplotlib or ipython or cython is the place to ask. There are 'inter-disiplinery' questions; currently numpy-discussion seems to catch some of that too, not just pure numpy. Dag -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Aron Ahmadia wrote: As I mentioned before, numpy-related questions would be welcome on scicomp, and this would have the advantage of bringing in scientists and mathematicians from related fields who might be able to answer numerical questions that sit between mathematics, programming, and science that you might not otherwise. There's already somewhat of a critical mass of people hanging out at scicomp (500 unique visitors a day during the work week), and you can subscribe to the python-related tags if you want to filter out the other sorts of questions. A -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sat Jun 30 04:55:36 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 30 Jun 2012 01:55:36 -0700 Subject: [Numpy-discussion] [ANN] IPython 0.13 is officially out! Message-ID: Hi all, on behalf of the IPython development team, and just in time for the imminent Debian freeze and SciPy 2012, I'm thrilled to announce, after an intense 6 months of work, the official release of IPython 0.13. This version contains several major new features, as well as a large amount of bug and regression fixes. The previous version (0.12) was released on December 19 2011, so in this development cycle we had: - ~6 months of work. - 373 pull requests merged. - 742 issues closed (non-pull requests). - contributions from 62 authors. - 1760 commits. - a diff of 114226 lines. 
This means that we closed a total of 1115 issues over 6 months, for a rate of almost 200 issues closed per month and almost 300 commits per month. We are very grateful to all of you who have contributed so enthusiastically to the project and have had the patience of pushing your contributions through our often lengthy review process. We've also welcomed several new members to the core IPython development group: J?rgen Stenarson (@jstenar - this really was an omission as J?rgen has been our Windows expert for a long time) and Matthias Bussonier (@Carreau), who has been very active on all fronts of the project. *Highlights* There is too much new work to write up here, so we refer you to our full What's New document (http://ipython.org/ipython-doc/rel-0.13/whatsnew/version0.13.html) for the full details. But the main highlights of this release are: * Brand new UI for the notebook, with major usability improvements (real menus, toolbar, and much more) * Manage all your parallel cluster configurations from the notebook with push-button simplicity (cluster start/stop with one button). * Cell magics: commands prefixed with %% apply to an entire cell. We ship with many cell magics by default, including timing, profiling, running cells under bash, Perl and Ruby as well as magics to interface seamlessly with Cython, R and Octave. * The IPython.parallel tools have received many fixes, optimizations, and a number of API improvements to make writing, profiling and debugging parallel codes with IPython much easier. * We have unified our interactive kernels (the basic ipython object you know and love) with the engines running in parallel, so that you can now use all IPython special tricks in parallel too. And you can connect a console or qtconsole to any parallel engine for direct, interactive execution, plotting and debugging in a cluster. *Downloads* Download links and instructions are at: http://ipython.org/download.html And IPython is also on PyPI: http://pypi.python.org/pypi/ipython Those contain a built version of the HTML docs; if you want pure source downloads with no docs, those are available on github: Tarball: https://github.com/ipython/ipython/tarball/rel-0.13 Zipball: https://github.com/ipython/ipython/zipball/rel-0.13 Please see our release notes for the full details on everything about this release: http://ipython.org/ipython-doc/rel-0.13/whatsnew/version0.13.html As usual, if you find any other problem, please file a ticket --or even better, a pull request fixing it-- on our github issues site (https://github.com/ipython/ipython/issues). Many thanks to all who contributed! Fernando, on behalf of the IPython development team. 
http://ipython.org From josef.pktd at gmail.com Sat Jun 30 07:59:56 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Jun 2012 07:59:56 -0400 Subject: [Numpy-discussion] NumPy 1.7 release delays In-Reply-To: References: <29DC563C-9A0C-4D8D-9F2A-E1A8D17D554B@continuum.io> <333C36E4-D2E4-4127-99C3-9D55468CA558@continuum.io> <1E8CCAF9-20F6-48DB-B5E1-A8B74BC194F4@continuum.io> <4FEAB888.7070807@uci.edu> Message-ID: On Sat, Jun 30, 2012 at 1:52 AM, Matthew Brett wrote: > Hi, > > On Wed, Jun 27, 2012 at 12:38 AM, Christoph Gohlke wrote: >> On 6/26/2012 8:13 PM, Travis Oliphant wrote: >>>> For the main repos we use buildbot and test on: >>>> >>>> Ubuntu Maverick 32-bit >>>> Debian sid 64-bit >>>> OSX 10.4 PPC >>>> OSX 10.5 Intel >>>> Debian wheezy PPC >>>> Debian squeeze ARM (a Raspberry PI no less) >>>> WIndows XP 32 bit >>>> SPARC (courtesy of our friends at NeuroDebian) >>>> >>>> http://nipy.bic.berkeley.edu/builders >>>> >>>> We've found several issues with numpy using these, and I've fed them >>>> back as I found them, >>>> >>>> http://projects.scipy.org/numpy/ticket/2076 >>>> http://projects.scipy.org/numpy/ticket/2077 >>>> http://projects.scipy.org/numpy/ticket/2174 >>>> >>>> They are particularly useful for difficult to reproduce problems >>>> because they test often and leave a record that we can point to. ?As >>>> I've said before, y'all are welcome to use these machines for numpy >>>> builds / tests. >>> >>> Now that Ondrej is working on getting continuous integration up for NumPy, ?I would encourage him to take you up on that offer. ? Can these machines run a Jenkins slave? >>> >>> Having periodic tests of Sage, Pandas, matplotlib, scipy, and other projects is a major priority and really critical before we can really talk about how to migrate the APIs. ? ?Thankfully, Ondrej is available to help get this project started and working this summer. >>> >>> -Travis >>> >> >> >> FWIW: I can relatively easy (batch script) build numpy from github and >> run the test suites of many packages available at >> against it. >> >> For example at >> >> are the test results of assimulo, bitarray, bottleneck, h5py, >> matplotlib, numexpr, pandas, pygame, scipy, skimage, sklearn, >> statsmodels, and pytables, built against numpy-1.6.x and run against >> numpy-1.7.0.dev-66bd39f on win-amd64-py2.7. > > Thanks - that's very helpful. same here. I just saw a python 3.2.3 bug in statsmodels. Josef > > Do you have your build system documented somewhere? ?Is it easy to > replicate do you think? > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Sat Jun 30 12:51:42 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Jun 2012 09:51:42 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> Message-ID: Hi, On Sat, Jun 30, 2012 at 1:35 AM, Dag Sverre Seljebotn wrote: > +1 on scicomp.stackexchange.com > > For it to work, one would need to actively push users towards it though...so > it would require a very clear pronouncement. > > Matthew: I'm happy with the split we did with Cython. It leaves me free to > mostly ignore cython-users, and it saves users from thos 100+ post threads > about inner workings. 
(I've had Cython users tell me several times that it > is better that devs make Cython better than spend time helping newbies -- I > feel helping out newbies is something advanced users can do too). Having heard from you and Fernando, I'm much more 50 - 50 than I was before. Although my experience is the same as TJ earlier - I don't filter my mail, I just skip the ones I don't want to read, often by subject line or the first few lines of the mail. > I don't agree with your implication that the organization of mailing lists > has much to do with governance. I think 'governance' would be a bad word for what I meant - more like 'tone'. I suppose they are strongly related but probably 'tone' comes first and then drives 'governance', and maybe the purpose of 'governance' is to preserve the 'tone' as people and circumstances change. > The mailing list split is a split of topics > of discussion, not of the subscribers; anyone is welcome to post on > cython-dev (e.g., ideas for new features or hashing out wanted semantics). Right. > However, a stackexchange-like solution may be a better fit than a users > list. The. ask.scipy beta wasn't used much but it wasn't really promoted and > users weren't pushed towards it. As a matter of interest - do y'all hang out much on stackexchange? I notice that I often go to stackexchange for a good answer, but it doesn't seem that good for - discussion. Or maybe it's just I'm not used to it. > One advantage is pooling topics together; many new users may be unsure > whether numpy or scipy or matplotlib or ipython or cython is the place to > ask. There are 'inter-disiplinery' questions; currently numpy-discussion > seems to catch some of that too, not just pure numpy. Yes, good point. See you, Matthew From fperez.net at gmail.com Sat Jun 30 13:10:26 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 30 Jun 2012 10:10:26 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> Message-ID: On Sat, Jun 30, 2012 at 9:51 AM, Matthew Brett wrote: > As a matter of interest - do y'all hang out much on stackexchange? ?I > notice that I often go to stackexchange for a good answer, but it > doesn't seem that good for - discussion. ?Or maybe it's just I'm not > used to it. I'm in the same boat as you, but this discussion has made me much more interested in starting to use it, and it sounds like it might really be a better solution for the kind of 'cross-project' questions that often feel a bit out of place in just about all the lists. People have made pretty convincinge (to me) arguments for that kind of system, perhaps we should give it a try instead of opening yet another ML... Cheers, f From jason-sage at creativetrax.com Sat Jun 30 13:13:05 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Sat, 30 Jun 2012 12:13:05 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> Message-ID: <4FEF33A1.5030606@creativetrax.com> On 6/30/12 12:10 PM, Fernando Perez wrote: > On Sat, Jun 30, 2012 at 9:51 AM, Matthew Brett wrote: >> As a matter of interest - do y'all hang out much on stackexchange? I >> notice that I often go to stackexchange for a good answer, but it >> doesn't seem that good for - discussion. Or maybe it's just I'm not >> used to it. 
> > I'm in the same boat as you, but this discussion has made me much more > interested in starting to use it I'm curious: do you mean using stackexchange.com itself, or using http://scicomp.stackexchange.com/ specifically? Thanks, Jason From fperez.net at gmail.com Sat Jun 30 13:31:55 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 30 Jun 2012 10:31:55 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <4FEF33A1.5030606@creativetrax.com> References: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> <4FEF33A1.5030606@creativetrax.com> Message-ID: On Sat, Jun 30, 2012 at 10:13 AM, Jason Grout wrote: > > I'm curious: do you mean using stackexchange.com itself, or using > http://scicomp.stackexchange.com/ specifically? I meant the latter, which seems like it would be the best suited for the topic of this discussion. I don't use the site myself yet (other than, as Matthew mentions, stumbling on it via googling for a question), but I'm growing more interested... Cheers, f From d.s.seljebotn at astro.uio.no Sat Jun 30 14:36:50 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 30 Jun 2012 20:36:50 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> <4FEF33A1.5030606@creativetrax.com> Message-ID: <4FEF4742.2000202@astro.uio.no> On 06/30/2012 07:31 PM, Fernando Perez wrote: > On Sat, Jun 30, 2012 at 10:13 AM, Jason Grout > wrote: >> >> I'm curious: do you mean using stackexchange.com itself, or using >> http://scicomp.stackexchange.com/ specifically? > > I meant the latter, which seems like it would be the best suited for > the topic of this discussion. I don't use the site myself yet (other > than, as Matthew mentions, stumbling on it via googling for a > question), but I'm growing more interested... It is rumored that a problem with some stackexchange sites is the host of nay-sayers saying that a question doesn't belong here but in this other silo instead, instead of just letting a culture develop (though my only interface to stack*.com is Google too so I don't really know). If one was to actively push people to that site instead of numpy-discussion, one should make sure that almost any discussion about scientific Python is welcome there (at least anything that does "import numpy" at some point). Perhaps have that discussion on meta.scicomp.stackexchange.com beforehand. Dag From fperez.net at gmail.com Sat Jun 30 14:44:01 2012 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 30 Jun 2012 11:44:01 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <4FEF4742.2000202@astro.uio.no> References: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> <4FEF33A1.5030606@creativetrax.com> <4FEF4742.2000202@astro.uio.no> Message-ID: On Sat, Jun 30, 2012 at 11:36 AM, Dag Sverre Seljebotn wrote: > It is rumored that a problem with some stackexchange sites is the host > of nay-sayers saying that a question doesn't belong here but in this > other silo instead, instead of just letting a culture develop (though my > only interface to stack*.com is Google too so I don't really know). Mmh, interesting... Not being a regular user myself, I have no idea. But it does sound like something worth clarifying before starting to push discussions in that direction. 
From d.s.seljebotn at astro.uio.no Sat Jun 30 14:57:21 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 30 Jun 2012 20:57:21 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> <4FEF33A1.5030606@creativetrax.com> <4FEF4742.2000202@astro.uio.no> Message-ID: <4FEF4C11.2040604@astro.uio.no> On 06/30/2012 08:44 PM, Fernando Perez wrote: > On Sat, Jun 30, 2012 at 11:36 AM, Dag Sverre Seljebotn > wrote: >> It is rumored that a problem with some stackexchange sites is the host >> of nay-sayers saying that a question doesn't belong here but in this >> other silo instead, instead of just letting a culture develop (though my >> only interface to stack*.com is Google too so I don't really know). > > Mmh, interesting... Not being a regular user myself, I have no idea. > But it does sound like something worth clarifying before starting to > push discussions in that direction. Specifically: http://news.ycombinator.com/item?id=4131462 But I see that Aron and Andy seem to have some authority on meta.scicomp so it can't be too bad on scicomp...? Dag From aron at ahmadia.net Sat Jun 30 15:10:21 2012 From: aron at ahmadia.net (Aron Ahmadia) Date: Sat, 30 Jun 2012 21:10:21 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <4FEF4C11.2040604@astro.uio.no> References: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> <4FEF33A1.5030606@creativetrax.com> <4FEF4742.2000202@astro.uio.no> <4FEF4C11.2040604@astro.uio.no> Message-ID: I and Geoff are moderators on scicomp, I'm happy to invest the effort in getting the community started there. One way to use scicomp is like a blog/faq, that is, if you get a specific question a lot here on the list or elsewhere, you can ask and answer it yourself on scicomp. If others find the post useful, they will vote it and the answer up. Navel-gazing questions with generic scope are generally discouraged, for a good feel for the sort of questions we'd be able to handle from a scipy/numpy perspective on scicomp, take a look at either the petsc or python tag feeds: http://scicomp.stackexchange.com/questions/tagged/petsc?sort=active&pagesize=15 http://scicomp.stackexchange.com/questions/tagged/python?sort=active&pagesize=15 A On Sat, Jun 30, 2012 at 8:57 PM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > On 06/30/2012 08:44 PM, Fernando Perez wrote: > > On Sat, Jun 30, 2012 at 11:36 AM, Dag Sverre Seljebotn > > wrote: > >> It is rumored that a problem with some stackexchange sites is the host > >> of nay-sayers saying that a question doesn't belong here but in this > >> other silo instead, instead of just letting a culture develop (though my > >> only interface to stack*.com is Google too so I don't really know). > > > > Mmh, interesting... Not being a regular user myself, I have no idea. > > But it does sound like something worth clarifying before starting to > > push discussions in that direction. > > Specifically: > > http://news.ycombinator.com/item?id=4131462 > > But I see that Aron and Andy seem to have some authority on meta.scicomp > so it can't be too bad on scicomp...? > > Dag > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jdh2358 at gmail.com Sat Jun 30 15:29:04 2012 From: jdh2358 at gmail.com (John Hunter) Date: Sat, 30 Jun 2012 14:29:04 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <4FEE0006.9080403@noaa.gov> References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> Message-ID: On Fri, Jun 29, 2012 at 2:20 PM, Jim Vickroy wrote: > As a lurker and user, I too wish for a distinct numpy-users list. -- jv > > This thread is a perfect example of why another list is needed. It's currently 42 semi-philosophical posts about what kind community numpy should be and what kinds of lists or stacks should serve it. There needs to be a place where people can ask simple 'how do I do x in numpy" questions without having to wade through hundreds of posts about release cycles, community input, process, and decisions about ABI and API compatibility in point versus major releases. Most people just don't care -- they just want to be reasonably sure that the developers do care and are doing it right. And if they want to participate or observe these discussions, they know where to go. It's like sausage making -- the more people get an inside look at how the sausage is made, the more they are afraid to eat it. In mpl we have a devel list and a users list. Preparing for a release, we might have a hundred emails about PR status and breakers and release cycles and god knows what. The users list gets "rc1 is ready for testing", "rc2 is ready for testing" and "v1.1.1 is released". That's about all most people want to know about our release process. JDH -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Jun 30 15:37:16 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Jun 2012 12:37:16 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> Message-ID: Hi, On Sat, Jun 30, 2012 at 12:29 PM, John Hunter wrote: > > > On Fri, Jun 29, 2012 at 2:20 PM, Jim Vickroy wrote: >> >> As a lurker and user, I too wish for a distinct numpy-users list. ?-- jv >> > > This thread is a perfect example of why another list is needed. ?It's > currently 42 semi-philosophical posts about what kind community numpy should > be and what kinds of lists or stacks should serve it. ?There needs to be a > place where people can ask simple 'how do I do x in numpy" questions without > having to wade through hundreds of posts about release cycles, community > input, process, and decisions about ABI and API compatibility in point > versus major releases. Oh - dear. I think the point that most of us agreed on was that having a different from: address wasn't a perfect solution for giving people space for asking newbie type questions. No-one has to read an email. If it looks boring or silly or irrelevant to your concerns, well, then ignore it. >?Most people just don't care -- they just want to be > reasonably sure that the developers do care and are doing it right. ?And if > they want to participate or observe these discussions, they know where to > go. ?It's like sausage making -- the more people get an inside look at how > the sausage is made, the more they are afraid to eat it. Not so in general. The more I hang out on the cython / sympy / ipython mailing lists, the more I feel like using and (if I can) contributing. > In mpl we have a devel list and a users list. 
?Preparing for a release, we > might have a hundred emails about PR status and breakers and release cycles > and god knows what. ?The users list gets "rc1 is ready for testing", "rc2 is > ready for testing" and "v1.1.1 is released". ?That's about all most people > want to know about our release process. I can see an argument for numpy-announce. Best, Matthew From d.s.seljebotn at astro.uio.no Sat Jun 30 16:05:07 2012 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 30 Jun 2012 22:05:07 +0200 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> Message-ID: <4FEF5BF3.4010701@astro.uio.no> On 06/30/2012 09:37 PM, Matthew Brett wrote: > Hi, > > On Sat, Jun 30, 2012 at 12:29 PM, John Hunter wrote: >> >> >> On Fri, Jun 29, 2012 at 2:20 PM, Jim Vickroy wrote: >>> >>> As a lurker and user, I too wish for a distinct numpy-users list. -- jv >>> >> >> This thread is a perfect example of why another list is needed. It's >> currently 42 semi-philosophical posts about what kind community numpy should >> be and what kinds of lists or stacks should serve it. There needs to be a >> place where people can ask simple 'how do I do x in numpy" questions without >> having to wade through hundreds of posts about release cycles, community >> input, process, and decisions about ABI and API compatibility in point >> versus major releases. > > Oh - dear. I think the point that most of us agreed on was that > having a different from: address wasn't a perfect solution for giving > people space for asking newbie type questions. No-one has to read an > email. If it looks boring or silly or irrelevant to your concerns, > well, then ignore it. I'd think most users sort different mailing lists into folders/tags/... automatically, not into their main inbox. Ignoring an email takes time; if I actually had to read the title of each thread of all the mailing lists I'm subscribed to it'd cost me a significant amount of time. >> Most people just don't care -- they just want to be >> reasonably sure that the developers do care and are doing it right. And if >> they want to participate or observe these discussions, they know where to >> go. It's like sausage making -- the more people get an inside look at how >> the sausage is made, the more they are afraid to eat it. > > Not so in general. The more I hang out on the cython / sympy / > ipython mailing lists, the more I feel like using and (if I can) > contributing. John specifically says "most people". You (and I) are not "most people". Dag From dhyams at gmail.com Sat Jun 30 16:15:49 2012 From: dhyams at gmail.com (Daniel Hyams) Date: Sat, 30 Jun 2012 16:15:49 -0400 Subject: [Numpy-discussion] Bug in pickling an ndarray? Message-ID: I am having trouble pickling (and then unpickling) an ndarray. Upon unpickling, the "base" attribute of the ndarray is set to some very strange string ("base" was None when the ndarray was pickled, so it should remain None). 
I have tried on various platforms and versions of numpy, with inconclusive results: # tested: Linux (Suse 11.1), numpy 1.5.1 BUG # Linux (Suse 11,0), numpy 1.6.1 OK # Linux (Mint Debian), numpy 1.6.1 BUG # Linux (Mint Debian), numpy 1.6.2 BUG # OSX (Snow Leopard), numpy 1.5.1rc1 BUG # OSX (Snow Leopard), numpy 1.6.2 BUG # Windows 7, numpy 1.4.1 OK I have attached a script below that can be used to check for the problem; I suppose that this is a bug report, unless I'm doing something terribly wrong or my expectations for the base attribute are off. ---------------- cut here --------------------------------- # this little demo shows a problem with the base attribute of an ndarray, when # pickling. Before pickling, dset.base is None, but after pickling, it is some # strange string. import cPickle as pickle import numpy print numpy.__version__ #import pickle dset = numpy.ones((2,2)) print "BEFORE PICKLING" print dset print "base = ",dset.base print dset.flags # pickle. s = pickle.dumps(dset) # now unpickle. dset = pickle.loads(s) print "AFTER PICKLING AND THEN IMMEDIATELY UNPICKLING" print dset print "base = ",dset.base print dset.flags -- Daniel Hyams dhyams at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sat Jun 30 16:25:22 2012 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 30 Jun 2012 13:25:22 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <4FEF5BF3.4010701@astro.uio.no> References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> <4FEF5BF3.4010701@astro.uio.no> Message-ID: Hi, On Sat, Jun 30, 2012 at 1:05 PM, Dag Sverre Seljebotn wrote: > On 06/30/2012 09:37 PM, Matthew Brett wrote: >> Hi, >> >> On Sat, Jun 30, 2012 at 12:29 PM, John Hunter ?wrote: >>> >>> >>> On Fri, Jun 29, 2012 at 2:20 PM, Jim Vickroy ?wrote: >>>> >>>> As a lurker and user, I too wish for a distinct numpy-users list. ?-- jv >>>> >>> >>> This thread is a perfect example of why another list is needed. ?It's >>> currently 42 semi-philosophical posts about what kind community numpy should >>> be and what kinds of lists or stacks should serve it. ?There needs to be a >>> place where people can ask simple 'how do I do x in numpy" questions without >>> having to wade through hundreds of posts about release cycles, community >>> input, process, and decisions about ABI and API compatibility in point >>> versus major releases. >> >> Oh - dear. ? I think the point that most of us agreed on was that >> having a different from: address wasn't a perfect solution for giving >> people space for asking newbie type questions. ?No-one has to read an >> email. ?If it looks boring or silly or irrelevant to your concerns, >> well, then ignore it. > > I'd think most users sort different mailing lists into folders/tags/... > automatically, not into their main inbox. > > Ignoring an email takes time; if I actually had to read the title of > each thread of all the mailing lists I'm subscribed to it'd cost me a > significant amount of time. > >>> ? Most people just don't care -- they just want to be >>> reasonably sure that the developers do care and are doing it right. ?And if >>> they want to participate or observe these discussions, they know where to >>> go. ?It's like sausage making -- the more people get an inside look at how >>> the sausage is made, the more they are afraid to eat it. >> >> Not so in general. 
?The more I hang out on the cython / sympy / >> ipython mailing lists, the more I feel like using and (if I can) >> contributing. > > John specifically says "most people". You (and I) are not "most people". Heads up - navel gazing alert. Read no further if you feel sick looking at navels. It's very obvious to some people on this thread that a user mailing list is necessary. It's less obvious to others. I personally don't think it's clear cut and there are arguments both ways. We each of us base our opinions and arguments on our experience. We were all 'users' once, as many of us were students once. I'm a 'user' of Cython, and Sympy and IPython. Like the rest of us, I'm trying to work out what I, as a user, would want, at the same time as wishing for the best community for numpy. Best, Matthew From josef.pktd at gmail.com Sat Jun 30 16:26:47 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Jun 2012 16:26:47 -0400 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <4FEF5BF3.4010701@astro.uio.no> References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> <4FEF5BF3.4010701@astro.uio.no> Message-ID: just some statistics http://stackoverflow.com/questions/tagged/numpy 769 followers, 2,850 questions tagged a guess: average response time for regular usage question far less than an hour http://stackoverflow.com/questions/tagged/scipy 446 followers, 991questions tagged http://stackoverflow.com/questions/tagged/matplotlib 438 followers, 1,861 questions tagged ... http://stackoverflow.com/questions/tagged/ipython 395 questions tagged http://stackoverflow.com/questions/tagged/pandas 91 followers, 174 questions tagged I'm only watching numpy and scipy, but mainly for unanswered questions because the fast response team is fast. http://stackoverflow.com/questions/tagged/r 2.5k followers, 14,057 questions tagged they also ask additional questions so they can build up a FAQ Josef From njs at pobox.com Sat Jun 30 16:30:46 2012 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 30 Jun 2012 21:30:46 +0100 Subject: [Numpy-discussion] Bug in pickling an ndarray? In-Reply-To: References: Message-ID: On Sat, Jun 30, 2012 at 9:15 PM, Daniel Hyams wrote: > I am having trouble pickling (and then unpickling) an ndarray. Upon > unpickling, the "base" attribute of the ndarray is set to some very strange > string ("base" was None when the ndarray was pickled, so it should remain > None). This sounds like correct behaviour to me -- is it causing you a problem? In general ndarray's don't keep things like memory layout, view sharing, etc. through pickling, and that means that things like .flags and .base may change. -n From tjhnson at gmail.com Sat Jun 30 16:43:45 2012 From: tjhnson at gmail.com (T J) Date: Sat, 30 Jun 2012 13:43:45 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> <4FEF5BF3.4010701@astro.uio.no> Message-ID: On Sat, Jun 30, 2012 at 1:26 PM, wrote: > just some statistics > > http://stackoverflow.com/questions/tagged/numpy > 769 followers, 2,850 questions tagged > > a guess: average response time for regular usage question far less than an > hour > > http://stackoverflow.com/questions/tagged/scipy > 446 followers, 991questions tagged > > > Yes they are frequently very quick and pinpoint. To provide yet another data point, I will mention that I used to be an avid follower of comp.text.tex. 
I would post questions there and also read it for knowledge. Now, I use http://tex.stackexchange.com/ almost exclusively. I know many others have done the same. I've also noticed a number of LaTeX gurus using the stackexchange site more and more. Try googling a LaTeX (esp TikZ) question. Would you rather read through an archived newsgroup (mailing list in NumPy's case) or have a webpage with useful features, embedded images, etc? jdh noticed this as well: the majority of the messages to numpy-discussion in the last 2 months have not been "usage" questions but decisions, releases, debates, etc. Personally, I would push for the stackexchange solution over a 'user' mailing list. That said, comp.text.tex and tex.stackexchange.com coexist just fine---it just means there is redudnancy and not the good kind IMO. -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Sat Jun 30 16:50:24 2012 From: srean.list at gmail.com (srean) Date: Sat, 30 Jun 2012 15:50:24 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> Message-ID: On Sat, Jun 30, 2012 at 2:29 PM, John Hunter wrote: > This thread is a perfect example of why another list is needed. +1 On Sat, Jun 30, 2012 at 2:37 PM, Matthew Brett wrote: > Oh - dear. I think the point that most of us agreed on was that > having a different from: address wasn't a perfect solution for giving > people space for asking newbie type questions. No-one has to read an > email. If it looks boring or silly or irrelevant to your concerns, > well, then ignore it. Looking at the same mails, it doesn't seem to me that most of us have agreed on that. It seems most have us have expressed that they will be satisfied with two different lists but are open about considering the stackoverflow model. The latter will require more work and time to get it going copmpared to the former. Aside: A logical conclusion of your "dont read mails that dont interest you" would be that spam is not a problem, after all no one has to read spam. If it looks boring or silly or irrelevant to your concerns, well, then ignore it. On Sat, Jun 30, 2012 at 1:57 PM, Dag Sverre Seljebotn wrote: > http://news.ycombinator.com/item?id=4131462 It seems it was mostly driven an argumentative troll, who had decided beforehand to disagree with some of the other folks and went about cooking up interpretations so that he/she can complain about them. Sadly, this list shows such tendencies at times as well. Anecdotal data-point: I have been happy with SO in general. It works for certain types of queries very well. OTOH if the answer to the question is known only to a few and he/she does not happen to be online at time the question was posted, and he/she does not "pull" such possible questions by key-words, that question is all but history. The difference is that on a mailing list questions are "pushed" on to people who might be able to answer it, whereas in SO model people have to actively seek questions they want to answer. Unanticipated, niche questions tend to disappear. 
From tjhnson at gmail.com Sat Jun 30 17:02:36 2012 From: tjhnson at gmail.com (T J) Date: Sat, 30 Jun 2012 14:02:36 -0700 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> Message-ID: On Sat, Jun 30, 2012 at 1:50 PM, srean wrote: > > Anecdotal data-point: > I have been happy with SO in general. It works for certain types of > queries very well. OTOH if the answer to the question is known only to > a few and he/she does not happen to be online at time the question > was posted, and he/she does not "pull" such possible questions by > key-words, that question is all but history. > > The difference is that on a mailing list questions are "pushed" on to > people who might be able to answer it, whereas in SO model people have > to actively seek questions they want to answer. Unanticipated, niche > questions tend to disappear. > Isn't that what the various sections are for? http://stackoverflow.com/questions?sort=newest http://stackoverflow.com/questions?sort=unanswered And then, if you want modification-by-modification updates: http://stackoverflow.com/questions?sort=active Entries are sorted by date and you can view as many pages worth as are available. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jun 30 17:06:56 2012 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Jun 2012 17:06:56 -0400 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> Message-ID: On Sat, Jun 30, 2012 at 5:02 PM, T J wrote: > On Sat, Jun 30, 2012 at 1:50 PM, srean wrote: >> >> >> Anecdotal data-point: >> I have been ?happy with SO in general. It works for certain types of >> queries very well. OTOH if the answer to the question is known only to >> a few and he/she does not happen to be online at ?time the question >> was posted, and he/she does not "pull" such possible questions by >> key-words, that question is all but history. >> >> The difference is that on a mailing list questions are "pushed" on to >> people who might be able to answer it, whereas in SO model people have >> to actively seek questions they want to answer. Unanticipated, niche >> questions tend to disappear. > > > Isn't that what the various sections are for? > > http://stackoverflow.com/questions?sort=newest > > http://stackoverflow.com/questions?sort=unanswered also by tag http://stackoverflow.com/questions/tagged/scipy?sort=unanswered&pagesize=50 sparse knowledge is scarse Josef > > And then, if you want modification-by-modification updates: > > http://stackoverflow.com/questions?sort=active > > Entries are sorted by date and you can view as many pages worth as are > available. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From srean.list at gmail.com Sat Jun 30 17:23:42 2012 From: srean.list at gmail.com (srean) Date: Sat, 30 Jun 2012 16:23:42 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> Message-ID: > Isn't that what the various sections are for? 
Indeed they are, but it still needs active "pulling" on behalf of those who would want to answer questions and even then a question can sink deep in the well. Deeper than what one typically monitors. Sometimes question are not appropriately tagged. Sometimes it is not obvious what the tag should be, or which tag is being monitored by the persons who might have the answer. Could be less of a problem for us given that its a more focused group and the predefined tags are not split too fine. I think the main issue is that SO requires more active engagement than a mailing list because checking for new mail has become something that almost everyone does by default anyway. Not saying SO is bad, I have benefited greatly from it, but this issues should be kept in mind. > http://stackoverflow.com/questions?sort=newest > http://stackoverflow.com/questions?sort=unanswered > And then, if you want modification-by-modification updates: > http://stackoverflow.com/questions?sort=active > Entries are sorted by date and you can view as many pages worth as are > available. From goxberry at gmail.com Sat Jun 30 17:30:57 2012 From: goxberry at gmail.com (Geoffrey Oxberry) Date: Sat, 30 Jun 2012 21:30:57 +0000 (UTC) Subject: [Numpy-discussion] Meta: help, devel and stackoverflow References: <96e6c662-a7e6-4b32-9c06-de5ca55a1cb8@email.android.com> <4FEF33A1.5030606@creativetrax.com> <4FEF4742.2000202@astro.uio.no> <4FEF4C11.2040604@astro.uio.no> Message-ID: Aron Ahmadia ahmadia.net> writes: > > > I and Geoff are moderators on scicomp, I'm happy to invest the effort > in getting the community started there. ?One way to use scicomp is like > a blog/faq, that is, if you get a specific question a lot here on the list or > elsewhere, you can ask and answer it yourself on scicomp. ?If others find > the post useful, they will vote it and the answer up. ? > > > Navel-gazing questions with generic scope are generally discouraged, > for a good feel for the sort of questions we'd be able to handle from a > scipy/numpy perspective on scicomp, take a look at either the > petsc or python tag feeds: > > http://scicomp.stackexchange.com/questions/tagged/petsc > > http://scicomp.stackexchange.com/questions/tagged/python > > On Sat, Jun 30, 2012 at 8:57 PM, Dag Sverre Seljebotn > astro.uio.no> wrote: > On 06/30/2012 08:44 PM, Fernando Perez wrote: > > On Sat, Jun 30, 2012 at 11:36 AM, Dag Sverre Seljebotn > > astro.uio.no> ?wrote: > >> It is rumored that a problem with some stackexchange sites is the host > >> of nay-sayers saying that a question doesn't belong here but in this > >> other silo instead, instead of just letting a culture develop (though my > >> only interface to stack*.com is Google too so I don't really know). > > > > Mmh, interesting... Not being a regular user myself, I have no idea. > > But it does sound like something worth clarifying before starting to > > push discussions in that direction. > Specifically:http://news.ycombinator.com/item?id=4131462 > But I see that Aron and Andy seem to have some authority on meta.scicomp > so it can't be too bad on scicomp...? 
> Dag_______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > To follow up on Aron's post, Computational Science Stack Exchange (http://scicomp.stackexchange.com/) is probably the best currently available Stack Exchange site for questions about NumPy usage (not NumPy development). I believe Tim Cera brought up Programmers Stack Exchange (http://programmers.stackexchange.com/). Programmers Stack Exchange would not accept NumPy questions, because it would be out of their scope. Programmers Stack Exchange focuses more on language-agnostic programming issues; they specifically point out that their site is not for programming tools. For code-heavy issues, (http://stackoverflow.com/) would be a better choice. The Area 51 process to create a NumPy Stack Exchange site will take time before it could progress to the point that the site is open as a public beta. For Computational Science Stack Exchange, it took 5 or so months to go from Area 51 proposal to public beta. In order to move from proposal to beta, the proposal needs to demonstrate that there will be enough active users to sustain a site. For now, I'd suggest posting on Computational Science Stack Exchange, because we already have a few NumPy questions, and could always use more questions on NumPy, SciPy, Matplotlib, and so on. Geoff From dhyams at gmail.com Sat Jun 30 18:33:49 2012 From: dhyams at gmail.com (Daniel Hyams) Date: Sat, 30 Jun 2012 18:33:49 -0400 Subject: [Numpy-discussion] Bug in pickling an ndarray? In-Reply-To: References: Message-ID: Hmmm, I wouldn't think that it is correct behavior; I would think that *any* ndarray arising from pickling would have its .base attribute set to None. If not, then who is really the one that owns the data? It was my understanding that .base should hold a reference to another ndarray that the data is really coming from, or it's None. It certainly shouldn't be some random string, should it? And yes, it is causing a problem for me, which is why I noticed it. In my application, ndarrays can come from various sources, pickling being one of them. Later in the app, I was wanting to resize the array, which you cannot do if the data is not really owned by that array...I had explicit check for myarray.base==None, which it is not when I get the ndarray from a pickle. -- Daniel Hyams dhyams at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Sat Jun 30 18:37:02 2012 From: travis at continuum.io (Travis Oliphant) Date: Sat, 30 Jun 2012 17:37:02 -0500 Subject: [Numpy-discussion] Bug in pickling an ndarray? In-Reply-To: References: Message-ID: <50544D1F-715D-4B3C-A7BD-72A38130FBA9@continuum.io> This is the expected behavior. It is not a bug. NumPy arrays after pickling are views into the String that is created by the pickling machinery. Thus, the base is set. This was done to avoid an additional memcpy. This avoids a copy, but yes, it does mean that you can't resize the array until you make another copy. Best regards, -Travis On Jun 30, 2012, at 5:33 PM, Daniel Hyams wrote: > Hmmm, I wouldn't think that it is correct behavior; I would think that *any* ndarray arising from pickling would have its .base attribute set to None. 
If not, then who is really the one that owns the data? > > It was my understanding that .base should hold a reference to another ndarray that the data is really coming from, or it's None. It certainly shouldn't be some random string, should it? > > And yes, it is causing a problem for me, which is why I noticed it. In my application, ndarrays can come from various sources, pickling being one of them. Later in the app, I was wanting to resize the array, which you cannot do if the data is not really owned by that array...I had explicit check for myarray.base==None, which it is not when I get the ndarray from a pickle. > > > -- > Daniel Hyams > dhyams at gmail.com > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Sat Jun 30 18:39:15 2012 From: jason-sage at creativetrax.com (Jason Grout) Date: Sat, 30 Jun 2012 17:39:15 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> Message-ID: <4FEF8013.8020602@creativetrax.com> On 6/30/12 4:23 PM, srean wrote: > Indeed they are, but it still needs active "pulling" on behalf of > those who would want to answer questions and even then a question can > sink deep in the well. Deeper than what one typically monitors. > Sometimes question are not appropriately tagged. Sometimes it is not > obvious what the tag should be, or which tag is being monitored by > the persons who might have the answer. > > Could be less of a problem for us given that its a more focused group > and the predefined tags are not split too fine. > > I think the main issue is that SO requires more active engagement than > a mailing list because checking for new mail has become something that > almost everyone does by default anyway. > > Not saying SO is bad, I have benefited greatly from it, but this > issues should be kept in mind. You can subscribe to be notified by email whenever a question is posted to a certain tag. So then it is no different than a mailing list as far as push/pull. As far as mistagging---that is no different than posting to the wrong mailing list, so I don't see how that is an extra problem. In fact, since it's easy to switch the tags, it's easier than a mailing list to shuttle a question to the right "mailing list"/tag. Thanks, Jason -- Jason Grout From robert.kern at gmail.com Sat Jun 30 18:41:51 2012 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 30 Jun 2012 23:41:51 +0100 Subject: [Numpy-discussion] Bug in pickling an ndarray? In-Reply-To: References: Message-ID: On Sat, Jun 30, 2012 at 11:33 PM, Daniel Hyams wrote: > Hmmm, I wouldn't think that it is correct behavior; I would think that *any* > ndarray arising from pickling would have its .base attribute set to None. > ?If not, then who is really the one that owns the data? > > It was my understanding that .base should hold a reference to another > ndarray that the data is really coming from, or it's None. ?It certainly > shouldn't be some random string, should it? It can be any object that will keep the data memory alive while the object is kept alive. It does not have to be an ndarray. In this case, the numpy unpickling constructor takes the string object that the underlying pickling machinery has just created and views its memory directly. 
In order to keep Python from freeing that memory, the string object needs to be kept alive via a reference, so it gets assigned to the .base. > And yes, it is causing a problem for me, which is why I noticed it. ?In my > application, ndarrays can come from various sources, pickling being one of > them. ?Later in the app, I was wanting to resize the array, which you cannot > do if the data is not really owned by that array... You also can't resize an array if any *other* array has a view on that array too, so checking for ownership isn't going to help. .resize() will raise an exception if it can't do this; it's better to just attempt it and catch the exception than to look before you leap. > I had explicit check for > myarray.base==None, which it is not when I get the ndarray from a pickle. That is not the way to check if an ndarray owns its data. Instead, check a.flags['OWNDATA'] -- Robert Kern From srean.list at gmail.com Sat Jun 30 19:25:30 2012 From: srean.list at gmail.com (srean) Date: Sat, 30 Jun 2012 18:25:30 -0500 Subject: [Numpy-discussion] Meta: help, devel and stackoverflow In-Reply-To: <4FEF8013.8020602@creativetrax.com> References: <1AD9EE2E-C286-456D-ABC1-C7EFB4E84C80@continuum.io> <4FEE0006.9080403@noaa.gov> <4FEF8013.8020602@creativetrax.com> Message-ID: > You can subscribe to be notified by email whenever a question is posted > to a certain tag. Absolutely true. > ?So then it is no different than a mailing list as far > as push/pull. There are a few differences though. New tags get created often, potentially in a decentralized fashion and dynamically, way more often than creation of lists. Thats why the need to actively monitor. Another is in frequency of subscription, how often does a user of SO subscribe to a tag. Yet another is that tags are usually are much more specific than a typical charter of a mailing list and thats a good thing because it makes things easier to find nd browse. I think if the tags are kept broad enough (or it is ensured that finer tags inherit from broader tags. For example numpy.foo where foo can be created according to the existing SO rules of tag creation ) and participants here are willing to subscribe to those tags, there wont be much of a difference. So, just two qualifiers. In addition if there is a way to bounce-n-answer user questions posted here to the SO forum relatively painlessy that will be quite nice too. May be something that creates a new user based on user's mail id, mails him/her the response and a password with which he/she can take control of the id. It is more polite and may be a good way for the SO site to collect more users. Best --srean From dhyams at gmail.com Sat Jun 30 20:04:46 2012 From: dhyams at gmail.com (Daniel Hyams) Date: Sat, 30 Jun 2012 20:04:46 -0400 Subject: [Numpy-discussion] Bug in pickling an ndarray? In-Reply-To: References: Message-ID: Thanks Travis and Robert for the clarification; it is much more clear what is going on now. As the demo code shows, also a.flags['OWNDATA'] is different on its way out of the pickle; which also makes sense now. So using that flag instead of checking a.base for None is equivalent, at least in this situation. So is it a bug, then, that, on Windows, .base is set to None (of course, this may be something that was fixed in later versions of numpy; I was only able to test Windows with numpy 1.4.1). 
I'll just make a copy and discard the original to work around the situation (which is what I already had done, but the inconsistent behavior across versions and platforms made me think it was a bug). Thanks again for the clear explanation of what is going on. On Sat, Jun 30, 2012 at 6:33 PM, Daniel Hyams wrote: > Hmmm, I wouldn't think that it is correct behavior; I would think that > *any* ndarray arising from pickling would have its .base attribute set to > None. If not, then who is really the one that owns the data? > > It was my understanding that .base should hold a reference to another > ndarray that the data is really coming from, or it's None. It certainly > shouldn't be some random string, should it? > > And yes, it is causing a problem for me, which is why I noticed it. In my > application, ndarrays can come from various sources, pickling being one of > them. Later in the app, I was wanting to resize the array, which you > cannot do if the data is not really owned by that array...I had explicit > check for myarray.base==None, which it is not when I get the ndarray from a > pickle. > > > -- > Daniel Hyams > dhyams at gmail.com > -- Daniel Hyams dhyams at gmail.com
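For reference, a minimal sketch of the behaviour described in this thread, written in the same Python 2 / cPickle style as the demo script earlier. It is an illustration only: the exact results depend on the NumPy version (the 1.5.x/1.6.x behaviour reported above is assumed), and the object held in .base is an implementation detail rather than something to rely on.

# Sketch, assuming the NumPy versions marked BUG in the list above:
# a freshly created array owns its buffer, while an array coming out of a
# pickle is a view into the string built by the pickling machinery, so
# OWNDATA is False and .base holds that string instead of None.
import cPickle as pickle
import numpy

a = numpy.zeros((3, 3))
print a.flags['OWNDATA']       # True
print a.base                   # None

b = pickle.loads(pickle.dumps(a))
print b.flags['OWNDATA']       # False on the affected versions
print type(b.base)             # the pickle's string object, not an ndarray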
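And a self-contained sketch of the workaround the thread converges on: attempt the in-place resize and fall back to a copy, rather than testing .base or OWNDATA up front, since ownership is not the only condition that can make resize refuse (run it as a plain script; interactive shells may hold extra references that also block an in-place resize).

# Sketch of the suggested workaround (Python 2 style, as above): try the
# in-place resize, and on failure resize a copy that owns its buffer.
import cPickle as pickle
import numpy

b = pickle.loads(pickle.dumps(numpy.zeros((3, 3))))
try:
    b.resize((5, 5))           # raises ValueError if b does not own its data
except ValueError:
    b = b.copy()               # the copy owns its buffer...
    b.resize((5, 5))           # ...so the in-place resize now succeeds
print b.shape                  # (5, 5)

numpy.resize(b, (7, 7)) is another option; it always returns a new array, so the ownership question never arises.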