From cournape at gmail.com Fri Aug 1 01:46:06 2014
From: cournape at gmail.com (David Cournapeau)
Date: Fri, 1 Aug 2014 14:46:06 +0900
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

The docstring at the beginning of the module is still relevant AFAIK: it
was about decreasing import times. See
http://mail.scipy.org/pipermail/numpy-discussion/2009-October/045981.html

On Fri, Aug 1, 2014 at 10:27 AM, Charles R Harris wrote:

> Hi All,
>
> The _inspect.py module looks like a numpy version of the python inspect
> module. ISTR that it was a workaround for problems with the early python
> versions, but that would have been back in 2009.
>
> Thoughts?
>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From olivier.grisel at ensta.org Fri Aug 1 04:20:29 2014
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Fri, 1 Aug 2014 10:20:29 +0200
Subject: [Numpy-discussion] OSX wheels for older numpy versions on pypi
In-Reply-To: References: Message-ID: 

2014-07-31 22:40 GMT+02:00 Matthew Brett :
>
> Sure, I built and uploaded:
>
> scipy-0.12.0 py27
> scipy-0.13.0 py27, 33, 34
>
> Are there any others you need?

Thanks, this is already great.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

From charlesr.harris at gmail.com Fri Aug 1 07:57:47 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Aug 2014 05:57:47 -0600
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Thu, Jul 31, 2014 at 11:46 PM, David Cournapeau wrote:

> The docstring at the beginning of the module is still relevant AFAIK: it
> was about decreasing import times. See
> http://mail.scipy.org/pipermail/numpy-discussion/2009-October/045981.html
>
>
> On Fri, Aug 1, 2014 at 10:27 AM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>> Hi All,
>>
>> The _inspect.py module looks like a numpy version of the python inspect
>> module. ISTR that it was a workaround for problems with the early python
>> versions, but that would have been back in 2009.
>>
>> Thoughts?
>>
>>
It's only used in one function.

def get_object_signature(obj):
    """
    Get the signature from obj
    """
    try:
        sig = formatargspec(*getargspec(obj))
    except TypeError as errmsg:
        sig = ''
#         msg = "Unable to retrieve the signature of %s '%s'\n"\
#               "(Initial error message: %s)"
#         warnings.warn(msg % (type(obj),
#                              getattr(obj, '__name__', '???'),
#                              errmsg))
    return sig

Where a local import would do as well. It also has bugs, so evidently isn't
called often ;)

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Fri Aug 1 08:37:51 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 1 Aug 2014 13:37:51 +0100
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Fri, Aug 1, 2014 at 12:57 PM, Charles R Harris wrote:
>
> On Thu, Jul 31, 2014 at 11:46 PM, David Cournapeau
> wrote:
>>
>> The docstring at the beginning of the module is still relevant AFAIK: it
>> was about decreasing import times.
See
>> http://mail.scipy.org/pipermail/numpy-discussion/2009-October/045981.html
>>
>>
>> On Fri, Aug 1, 2014 at 10:27 AM, Charles R Harris
>> wrote:
>>>
>>> Hi All,
>>>
>>> The _inspect.py module looks like a numpy version of the python inspect
>>> module. ISTR that it was a workaround for problems with the early python
>>> versions, but that would have been back in 2009.
>>>
>>> Thoughts?
>>>
>
> It's only used in one function.

Yes, one function that is called at startup, so no, a local import of
the stdlib inspect module would not help.

> def get_object_signature(obj):
>     """
>     Get the signature from obj
>     """
>     try:
>         sig = formatargspec(*getargspec(obj))
>     except TypeError as errmsg:
>         sig = ''
> #         msg = "Unable to retrieve the signature of %s '%s'\n"\
> #               "(Initial error message: %s)"
> #         warnings.warn(msg % (type(obj),
> #                              getattr(obj, '__name__', '???'),
> #                              errmsg))
>     return sig
>
> Where a local import would do as well. It also has bugs, so evidently isn't
> called often ;)

What bugs? Any bugs relevant to the objects that
get_object_signature() is called with? It does not have to work for
anything else but those.

--
Robert Kern

From charlesr.harris at gmail.com Fri Aug 1 09:03:38 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Aug 2014 07:03:38 -0600
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Fri, Aug 1, 2014 at 6:37 AM, Robert Kern wrote:

> On Fri, Aug 1, 2014 at 12:57 PM, Charles R Harris
> wrote:
> >
> > On Thu, Jul 31, 2014 at 11:46 PM, David Cournapeau
> > wrote:
> >>
> >> The docstring at the beginning of the module is still relevant AFAIK: it
> >> was about decreasing import times. See
> >>
> http://mail.scipy.org/pipermail/numpy-discussion/2009-October/045981.html
> >>
> >>
> >> On Fri, Aug 1, 2014 at 10:27 AM, Charles R Harris
> >> wrote:
> >>>
> >>> Hi All,
> >>>
> >>> The _inspect.py module looks like a numpy version of the python
> inspect
> >>> module. ISTR that it was a workaround for problems with the early
> python
> >>> versions, but that would have been back in 2009.
> >>>
> >>> Thoughts?
> >>>
> >
> > It's only used in one function.
>
> Yes, one function that is called at startup, so no, a local import of
> the stdlib inspect module would not help.
>
> > def get_object_signature(obj):
> >     """
> >     Get the signature from obj
> >     """
> >     try:
> >         sig = formatargspec(*getargspec(obj))
> >     except TypeError as errmsg:
> >         sig = ''
> > #         msg = "Unable to retrieve the signature of %s '%s'\n"\
> > #               "(Initial error message: %s)"
> > #         warnings.warn(msg % (type(obj),
> > #                              getattr(obj, '__name__', '???'),
> > #                              errmsg))
> >     return sig
> >
> > Where a local import would do as well. It also has bugs, so evidently
> isn't
> > called often ;)
>
> What bugs? Any bugs relevant to the objects that
> get_object_signature() is called with? It does not have to work for
> anything else but those.
>

Undefined variables in getargs. The only two functions used from the
module are very small and could simply be brought into `ma/core.py`. The
python inspect module is used elsewhere...

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com Fri Aug 1 09:54:06 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Aug 2014 07:54:06 -0600
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Fri, Aug 1, 2014 at 7:03 AM, Charles R Harris wrote:

>
>
>
> On Fri, Aug 1, 2014 at 6:37 AM, Robert Kern wrote:
>
>> On Fri, Aug 1, 2014 at 12:57 PM, Charles R Harris
>> wrote:
>> >
>> > On Thu, Jul 31, 2014 at 11:46 PM, David Cournapeau
>> > wrote:
>> >>
>> >> The docstring at the beginning of the module is still relevant AFAIK:
>> it
>> >> was about decreasing import times. See
>> >>
>> http://mail.scipy.org/pipermail/numpy-discussion/2009-October/045981.html
>> >>
>> >>
>> >> On Fri, Aug 1, 2014 at 10:27 AM, Charles R Harris
>> >> wrote:
>> >>>
>> >>> Hi All,
>> >>>
>> >>> The _inspect.py module looks like a numpy version of the python
>> inspect
>> >>> module. ISTR that it was a workaround for problems with the early
>> python
>> >>> versions, but that would have been back in 2009.
>> >>>
>> >>> Thoughts?
>> >>>
>> >
>> > It's only used in one function.
>>
>> Yes, one function that is called at startup, so no, a local import of
>> the stdlib inspect module would not help.
>>
>> > def get_object_signature(obj):
>> >     """
>> >     Get the signature from obj
>> >     """
>> >     try:
>> >         sig = formatargspec(*getargspec(obj))
>> >     except TypeError as errmsg:
>> >         sig = ''
>> > #         msg = "Unable to retrieve the signature of %s '%s'\n"\
>> > #               "(Initial error message: %s)"
>> > #         warnings.warn(msg % (type(obj),
>> > #                              getattr(obj, '__name__', '???'),
>> > #                              errmsg))
>> >     return sig
>> >
>> > Where a local import would do as well. It also has bugs, so evidently
>> isn't
>> > called often ;)
>>
>> What bugs? Any bugs relevant to the objects that
>> get_object_signature() is called with? It does not have to work for
>> anything else but those.
>>
>
> Undefined variables in getargs. The only two functions used from the
> module are very small and could simply be brought into `ma/core.py`. The
> python inspect module is used elsewhere...
>

Importing inspect looks to take about 500 ns on my machine, although it is
hard to be exact, as I suspect the file is sitting in the file cache. Would
probably be slower with hard disks. But as the inspect module is already
imported elsewhere, the python interpreter should also have it cached.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Fri Aug 1 09:59:53 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Fri, 1 Aug 2014 14:59:53 +0100
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Fri, Aug 1, 2014 at 2:54 PM, Charles R Harris wrote:

> Importing inspect looks to take about 500 ns on my machine, although it is
> hard to be exact, as I suspect the file is sitting in the file cache. Would
> probably be slower with hard disks.

Or where site-packages is on NFS.

> But as the inspect module is already
> imported elsewhere, the python interpreter should also have it cached.

Not on a normal import it's not.

>>> import numpy
>>> import sys
>>> sys.modules['inspect']
Traceback (most recent call last):
File "", line 1, in
KeyError: 'inspect'

You should feel free to remove whatever parts of `_inspect` are not
being used and to move the parts that are closer to where they are
used if you feel compelled to. Please do not replace the current uses
of `_inspect` with `inspect`.

--
Robert Kern

From charlesr.harris at gmail.com Fri Aug 1 10:23:39 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Aug 2014 08:23:39 -0600
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: On Fri, Aug 1, 2014 at 7:59 AM, Robert Kern wrote: > On Fri, Aug 1, 2014 at 2:54 PM, Charles R Harris > wrote: > > > Importing inspect looks to take about 500 ns on my machine. Although It > is > > hard to be exact, as I suspect the file is sitting in the file cache. > Would > > probably be slower with hard disks. > > Or where site-packages is on NFS. > > > But as the inspect module is already > > imported elsewhere, the python interpreter should also have it cached. > > Not on a normal import it's not. > > >>> import numpy > >>> import sys > >>> sys.modules['inspect'] > Traceback (most recent call last): > File "", line 1, in > KeyError: 'inspect' > There are two lazy imports of inspect. > > You should feel free to remove whatever parts of `_inspect` are not > being used and to move the parts that are closer to where they are > used if you feel compelled to. Please do not replace the current uses > of `_inspect` with `inspect`. > It is used in just one place. Is importing inspect so much slower than all the other imports we do? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Aug 1 10:29:01 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 1 Aug 2014 15:29:01 +0100 Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ? In-Reply-To: References: Message-ID: On Fri, Aug 1, 2014 at 3:23 PM, Charles R Harris wrote: > > On Fri, Aug 1, 2014 at 7:59 AM, Robert Kern wrote: >> >> On Fri, Aug 1, 2014 at 2:54 PM, Charles R Harris >> wrote: >> >> > Importing inspect looks to take about 500 ns on my machine. Although It >> > is >> > hard to be exact, as I suspect the file is sitting in the file cache. >> > Would >> > probably be slower with hard disks. >> >> Or where site-packages is on NFS. >> >> > But as the inspect module is already >> > imported elsewhere, the python interpreter should also have it cached. >> >> Not on a normal import it's not. >> >> >>> import numpy >> >>> import sys >> >>> sys.modules['inspect'] >> Traceback (most recent call last): >> File "", line 1, in >> KeyError: 'inspect' > > There are two lazy imports of inspect. Sure, but get_object_signature() is called unlazily when numpy is imported. >> You should feel free to remove whatever parts of `_inspect` are not >> being used and to move the parts that are closer to where they are >> used if you feel compelled to. Please do not replace the current uses >> of `_inspect` with `inspect`. > > It is used in just one place. So? That one place is always called whenever numpy is imported. > Is importing inspect so much slower than all > the other imports we do? Yeah, it's pretty bad. -- Robert Kern From charlesr.harris at gmail.com Fri Aug 1 11:23:51 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Aug 2014 09:23:51 -0600 Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ? In-Reply-To: References: Message-ID: On Fri, Aug 1, 2014 at 8:29 AM, Robert Kern wrote: > On Fri, Aug 1, 2014 at 3:23 PM, Charles R Harris > wrote: > > > > On Fri, Aug 1, 2014 at 7:59 AM, Robert Kern > wrote: > >> > >> On Fri, Aug 1, 2014 at 2:54 PM, Charles R Harris > >> wrote: > >> > >> > Importing inspect looks to take about 500 ns on my machine. Although > It > >> > is > >> > hard to be exact, as I suspect the file is sitting in the file cache. > >> > Would > >> > probably be slower with hard disks. > >> > >> Or where site-packages is on NFS. 
> >> > >> > But as the inspect module is already > >> > imported elsewhere, the python interpreter should also have it cached. > >> > >> Not on a normal import it's not. > >> > >> >>> import numpy > >> >>> import sys > >> >>> sys.modules['inspect'] > >> Traceback (most recent call last): > >> File "", line 1, in > >> KeyError: 'inspect' > > > > There are two lazy imports of inspect. > > Sure, but get_object_signature() is called unlazily when numpy is imported. > > >> You should feel free to remove whatever parts of `_inspect` are not > >> being used and to move the parts that are closer to where they are > >> used if you feel compelled to. Please do not replace the current uses > >> of `_inspect` with `inspect`. > > > > It is used in just one place. > > So? That one place is always called whenever numpy is imported. > > > Is importing inspect so much slower than all > > the other imports we do? > > Yeah, it's pretty bad. > > The buggy code is for tuple parameter unpacking, a path that is not exercised and a feature not in python 3. So... is it safe to excise that nasty bit of code, or does Enthought make use of the numpy _inspect module? The other (fixable) error is in formatargvalues, which is not in __all__ and not used as far as I can tell. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Aug 1 11:29:24 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 1 Aug 2014 16:29:24 +0100 Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ? In-Reply-To: References: Message-ID: On Fri, Aug 1, 2014 at 4:23 PM, Charles R Harris wrote: > > On Fri, Aug 1, 2014 at 8:29 AM, Robert Kern wrote: >> >> On Fri, Aug 1, 2014 at 3:23 PM, Charles R Harris >> wrote: >> > >> > On Fri, Aug 1, 2014 at 7:59 AM, Robert Kern >> > wrote: >> >> >> >> On Fri, Aug 1, 2014 at 2:54 PM, Charles R Harris >> >> wrote: >> >> >> >> > Importing inspect looks to take about 500 ns on my machine. Although >> >> > It >> >> > is >> >> > hard to be exact, as I suspect the file is sitting in the file cache. >> >> > Would >> >> > probably be slower with hard disks. >> >> >> >> Or where site-packages is on NFS. >> >> >> >> > But as the inspect module is already >> >> > imported elsewhere, the python interpreter should also have it >> >> > cached. >> >> >> >> Not on a normal import it's not. >> >> >> >> >>> import numpy >> >> >>> import sys >> >> >>> sys.modules['inspect'] >> >> Traceback (most recent call last): >> >> File "", line 1, in >> >> KeyError: 'inspect' >> > >> > There are two lazy imports of inspect. >> >> Sure, but get_object_signature() is called unlazily when numpy is >> imported. >> >> >> You should feel free to remove whatever parts of `_inspect` are not >> >> being used and to move the parts that are closer to where they are >> >> used if you feel compelled to. Please do not replace the current uses >> >> of `_inspect` with `inspect`. >> > >> > It is used in just one place. >> >> So? That one place is always called whenever numpy is imported. >> >> > Is importing inspect so much slower than all >> > the other imports we do? >> >> Yeah, it's pretty bad. >> > > The buggy code is for tuple parameter unpacking, a path that is not > exercised and a feature not in python 3. So... is it safe to excise that > nasty bit of code, "You should feel free to remove whatever parts of `_inspect` are not being used." > or does Enthought make use of the numpy _inspect module? No, of course not. It's _private for a reason. 
> The other (fixable) error is in formatargvalues, which is not in __all__ and > not used as far as I can tell. -- Robert Kern From charlesr.harris at gmail.com Fri Aug 1 11:35:55 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Aug 2014 09:35:55 -0600 Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ? In-Reply-To: References: Message-ID: On Fri, Aug 1, 2014 at 9:29 AM, Robert Kern wrote: > On Fri, Aug 1, 2014 at 4:23 PM, Charles R Harris > wrote: > > > > On Fri, Aug 1, 2014 at 8:29 AM, Robert Kern > wrote: > >> > >> On Fri, Aug 1, 2014 at 3:23 PM, Charles R Harris > >> wrote: > >> > > >> > On Fri, Aug 1, 2014 at 7:59 AM, Robert Kern > >> > wrote: > >> >> > >> >> On Fri, Aug 1, 2014 at 2:54 PM, Charles R Harris > >> >> wrote: > >> >> > >> >> > Importing inspect looks to take about 500 ns on my machine. > Although > >> >> > It > >> >> > is > >> >> > hard to be exact, as I suspect the file is sitting in the file > cache. > >> >> > Would > >> >> > probably be slower with hard disks. > >> >> > >> >> Or where site-packages is on NFS. > >> >> > >> >> > But as the inspect module is already > >> >> > imported elsewhere, the python interpreter should also have it > >> >> > cached. > >> >> > >> >> Not on a normal import it's not. > >> >> > >> >> >>> import numpy > >> >> >>> import sys > >> >> >>> sys.modules['inspect'] > >> >> Traceback (most recent call last): > >> >> File "", line 1, in > >> >> KeyError: 'inspect' > >> > > >> > There are two lazy imports of inspect. > >> > >> Sure, but get_object_signature() is called unlazily when numpy is > >> imported. > >> > >> >> You should feel free to remove whatever parts of `_inspect` are not > >> >> being used and to move the parts that are closer to where they are > >> >> used if you feel compelled to. Please do not replace the current uses > >> >> of `_inspect` with `inspect`. > >> > > >> > It is used in just one place. > >> > >> So? That one place is always called whenever numpy is imported. > >> > >> > Is importing inspect so much slower than all > >> > the other imports we do? > >> > >> Yeah, it's pretty bad. > >> > > > > The buggy code is for tuple parameter unpacking, a path that is not > > exercised and a feature not in python 3. So... is it safe to excise that > > nasty bit of code, > > "You should feel free to remove whatever parts of `_inspect` are not > being used." > > > or does Enthought make use of the numpy _inspect module? > > No, of course not. It's _private for a reason. > > > The other (fixable) error is in formatargvalues, which is not in __all__ > and > > not used as far as I can tell. > > There is a missing import of the disassembler, `dis`, which I suspect would add substantially to the import time. So it looks like the easy path is to excise the code. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri Aug 1 16:11:49 2014 From: cournape at gmail.com (David Cournapeau) Date: Sat, 2 Aug 2014 05:11:49 +0900 Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ? In-Reply-To: References: Message-ID: On Fri, Aug 1, 2014 at 11:23 PM, Charles R Harris wrote: > > > > On Fri, Aug 1, 2014 at 7:59 AM, Robert Kern wrote: > >> On Fri, Aug 1, 2014 at 2:54 PM, Charles R Harris >> wrote: >> >> > Importing inspect looks to take about 500 ns on my machine. Although >> It is >> > hard to be exact, as I suspect the file is sitting in the file cache. >> Would >> > probably be slower with hard disks. >> >> Or where site-packages is on NFS. 
>> >> > But as the inspect module is already
>> > imported elsewhere, the python interpreter should also have it cached.
>>
>> Not on a normal import it's not.
>>
>> >>> import numpy
>> >>> import sys
>> >>> sys.modules['inspect']
>> Traceback (most recent call last):
>> File "", line 1, in
>> KeyError: 'inspect'
>>
>
> There are two lazy imports of inspect.

Sure, but get_object_signature() is called unlazily when numpy is imported.

>> You should feel free to remove whatever parts of `_inspect` are not
>> being used and to move the parts that are closer to where they are
>> used if you feel compelled to. Please do not replace the current uses
>> of `_inspect` with `inspect`.
>
> It is used in just one place.

So? That one place is always called whenever numpy is imported.

> Is importing inspect so much slower than all
> the other imports we do?

Yeah, it's pretty bad.

-- 
Robert Kern

From charlesr.harris at gmail.com Fri Aug 1 11:23:51 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Aug 2014 09:23:51 -0600
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Fri, Aug 1, 2014 at 8:29 AM, Robert Kern wrote:

> On Fri, Aug 1, 2014 at 3:23 PM, Charles R Harris
> wrote:
> >
> > On Fri, Aug 1, 2014 at 7:59 AM, Robert Kern
> > wrote:
> >>
> >> On Fri, Aug 1, 2014 at 2:54 PM, Charles R Harris
> >> wrote:
> >>
> >> > Importing inspect looks to take about 500 ns on my machine, although
> >> > it is
> >> > hard to be exact, as I suspect the file is sitting in the file cache.
> >> > Would
> >> > probably be slower with hard disks.
> >>
> >> Or where site-packages is on NFS.
> So the hack certainly still makes sense; one just needs to fix whatever
> needs fixing (I am still not sure what's broken for the very specific
> use case that code was bundled for).
>
>
I'm not sure a one-time hit of 17 ms is worth fighting for ;) The problems
were that both the `string` and `dis` modules were used without importing
them. Evidently those code paths were never traversed, so I removed the
code using `dis` and raised an error there instead; it was for parsing
tuple arguments. The string.join was fixed using the string method.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cournape at gmail.com Fri Aug 1 22:22:19 2014
From: cournape at gmail.com (David Cournapeau)
Date: Sat, 2 Aug 2014 11:22:19 +0900
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Sat, Aug 2, 2014 at 11:17 AM, Charles R Harris wrote:

>
>
>
> On Fri, Aug 1, 2014 at 8:01 PM, David Cournapeau
> wrote:
>
>> On my machine, if I use inspect instead of _inspect in
>> numpy.compat.__init__, the import time increases ~ 25 % (from 82 ms to 99
>> ms).
>>
>> So the hack certainly still makes sense; one just needs to fix whatever
>> needs fixing (I am still not sure what's broken for the very specific
>> use case that code was bundled for).
>>
>>
> I'm not sure a one-time hit of 17 ms is worth fighting for ;) The problems
> were that both the `string` and `dis` modules were used without importing
> them.
>

Don't fix what ain't broken ;)

The 17 ms is not what matters, the % is. People regularly complain about
import times, and a 25 % increase in import time is significant (the above
timings are on my new macbook with SSD and 16 Gb RAM -- figures will easily
be an order of magnitude worse in common situations with slower computers,
slower HDD, NFS, etc...)

David

From charlesr.harris at gmail.com Fri Aug 1 22:36:52 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Aug 2014 20:36:52 -0600
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Fri, Aug 1, 2014 at 8:22 PM, David Cournapeau wrote:

>
>
>
> On Sat, Aug 2, 2014 at 11:17 AM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>>
>> On Fri, Aug 1, 2014 at 8:01 PM, David Cournapeau
>> wrote:
>>
>>> On my machine, if I use inspect instead of _inspect in
>>> numpy.compat.__init__, the import time increases ~ 25 % (from 82 ms to 99
>>> ms).
>>>
>>> So the hack certainly still makes sense; one just needs to fix whatever
>>> needs fixing (I am still not sure what's broken for the very specific
>>> use case that code was bundled for).
>>>
>>>
>> I'm not sure a one-time hit of 17 ms is worth fighting for ;) The
>> problems were that both the `string` and `dis` modules were used without
>> importing them.
>>
>
> Don't fix what ain't broken ;)
>
> The 17 ms is not what matters, the % is. People regularly complain about
> import times, and a 25 % increase in import time is significant (the above
> timings are on my new macbook with SSD and 16 Gb RAM -- figures will easily
> be an order of magnitude worse in common situations with slower computers,
> slower HDD, NFS, etc...)
>

Be interesting to compare times. Could you send along the code you used?
My machine is similar except it is a desktop with 2 SSDs in raid 0.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From cournape at gmail.com Sat Aug 2 00:42:38 2014
From: cournape at gmail.com (David Cournapeau)
Date: Sat, 2 Aug 2014 13:42:38 +0900
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Sat, Aug 2, 2014 at 11:36 AM, Charles R Harris wrote:

>
>
>
> On Fri, Aug 1, 2014 at 8:22 PM, David Cournapeau
> wrote:
>
>>
>>
>>
>> On Sat, Aug 2, 2014 at 11:17 AM, Charles R Harris <
>> charlesr.harris at gmail.com> wrote:
>>
>>>
>>>
>>>
>>> On Fri, Aug 1, 2014 at 8:01 PM, David Cournapeau
>>> wrote:
>>>
>>>> On my machine, if I use inspect instead of _inspect in
>>>> numpy.compat.__init__, the import time increases ~ 25 % (from 82 ms to 99
>>>> ms).
>>>>
>>>> So the hack certainly still makes sense; one just needs to fix whatever
>>>> needs fixing (I am still not sure what's broken for the very specific
>>>> use case that code was bundled for).
>>>>
>>>>
>>> I'm not sure a one-time hit of 17 ms is worth fighting for ;) The
>>> problems were that both the `string` and `dis` modules were used without
>>> importing them.
>>>
>>
>> Don't fix what ain't broken ;)
>>
>> The 17 ms is not what matters, the % is. People regularly complain about
>> import times, and a 25 % increase in import time is significant (the above
>> timings are on my new macbook with SSD and 16 Gb RAM -- figures will easily
>> be an order of magnitude worse in common situations with slower computers,
>> slower HDD, NFS, etc...)
>>
>
> Be interesting to compare times. Could you send along the code you used?
> My machine is similar except it is a desktop with 2 SSDs in raid 0.
>

I just hacked numpy.compat.__init__ to use inspect instead of _inspect:

diff --git a/numpy/compat/__init__.py b/numpy/compat/__init__.py
index 5b371f5..57f6d7f 100644
--- a/numpy/compat/__init__.py
+++ b/numpy/compat/__init__.py
@@ -10,11 +10,11 @@ extensions, which may be included for the following
reasons:
 """
 from __future__ import division, absolute_import, print_function

-from . import _inspect
+import inspect as _inspect
 from . import py3k
-from ._inspect import getargspec, formatargspec
+from inspect import getargspec, formatargspec
 from .py3k import *

 __all__ = []
-__all__.extend(_inspect.__all__)
+__all__.extend(["getargspec", "formatargspec"])
 __all__.extend(py3k.__all__)

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com Sat Aug 2 11:18:05 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sat, 2 Aug 2014 09:18:05 -0600
Subject: [Numpy-discussion] Remove numpy/compat/_inspect.py ?
In-Reply-To: References: Message-ID: 

On Fri, Aug 1, 2014 at 10:42 PM, David Cournapeau wrote:

>
>
>
> On Sat, Aug 2, 2014 at 11:36 AM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>>
>> On Fri, Aug 1, 2014 at 8:22 PM, David Cournapeau
>> wrote:
>>
>>>
>>>
>>>
>>> On Sat, Aug 2, 2014 at 11:17 AM, Charles R Harris <
>>> charlesr.harris at gmail.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Fri, Aug 1, 2014 at 8:01 PM, David Cournapeau
>>>> wrote:
>>>>
>>>>> On my machine, if I use inspect instead of _inspect in
>>>>> numpy.compat.__init__, the import time increases ~ 25 % (from 82 ms to 99
>>>>> ms).
>>>>>
>>>>> So the hack certainly still makes sense; one just needs to fix whatever
>>>>> needs fixing (I am still not sure what's broken for the very specific
>>>>> use case that code was bundled for).
>>>>> >>>>> >>>> I'm not sure a one time hit of 17 ms is worth fighting for ;) The >>>> problems were that both the `string` and `dis` modules were used without >>>> importing them. >>>> >>> >>> Don't fix what ain't broken ;) >>> >>> The 17 ms is not what matters, the % is. People regularly complain about >>> import times, and 25 % increase in import time is significant (the above >>> timing are on my new macbook with SSD and 16 Gb RAM -- figures will easily >>> be 1 order of magnitude worse in common situations with slower computers, >>> slower HDD, NFS, etc...) >>> >> >> Be interesting to compare times. Could you send along the code you used? >> My machine is similar except it is a desktop with 2 SSDs in raid 0. >> > > I just hacked numpy.lib.__init__ to use inspect instead of _inspect: > > diff --git a/numpy/compat/__init__.py b/numpy/compat/__init__.py > index 5b371f5..57f6d7f 100644 > --- a/numpy/compat/__init__.py > +++ b/numpy/compat/__init__.py > @@ -10,11 +10,11 @@ extensions, which may be included for the following > reasons: > """ > from __future__ import division, absolute_import, print_function > > -from . import _inspect > +import inspect as _inspect > from . import py3k > -from ._inspect import getargspec, formatargspec > +from inspect import getargspec, formatargspec > from .py3k import * > > __all__ = [] > -__all__.extend(_inspect.__all__) > +__all__.extend(["getargspec", "formatargspec"]) > __all__.extend(py3k.__all__) > > I was more interested in how you timed it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Aug 2 13:14:19 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 2 Aug 2014 11:14:19 -0600 Subject: [Numpy-discussion] It lives! Or at least is not undead Message-ID: charris at localhost [matmul (master)]$ python3.5 Python 3.5.0a0 (default:4425024f2e01, Aug 2 2014, 10:10:31) [GCC 4.8.3 20140624 (Red Hat 4.8.3-1)] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> import testing >>> a = np.ones(3).view(testing.marray) >>> a at 3 marray([ 3., 3., 3.]) >>> 3 at a marray([ 3., 3., 3.]) >>> a at np.eye(3) marray([ 1., 1., 1.]) >>> np.eye(3)@a marray([ 1., 1., 1.]) This was just for quick experimentation. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Aug 2 17:33:33 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 2 Aug 2014 15:33:33 -0600 Subject: [Numpy-discussion] Class to experiment with '@' Message-ID: Hi All, I've attached a subclass of ndarray that implements the new '@' operator for experimentation and comment. It is only intended for playing with that operator and may not work for other things. You will need to install python 3.5.0a1 to play with it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: matmul.py Type: text/x-python Size: 955 bytes Desc: not available URL: From charlesr.harris at gmail.com Sun Aug 3 15:26:18 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 3 Aug 2014 13:26:18 -0600 Subject: [Numpy-discussion] Class to experiment with '@' In-Reply-To: References: Message-ID: Oops, corrected version attached. I think we need generalized functions in linalg for mat x mat, mat x vec, and vec x mat, and versions that also work for object arrays. 
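In einsum terms, the three products would look something like the sketch
below (a rough sketch with made-up helper names, for illustration only,
not the contents of the attached file):

import numpy as np

def matmat(a, b):
    # matrix @ matrix: (..., i, j) x (..., j, k) -> (..., i, k)
    return np.einsum('...ij,...jk->...ik', a, b)

def matvec(a, b):
    # matrix @ vector: (..., i, j) x (..., j) -> (..., i)
    return np.einsum('...ij,...j->...i', a, b)

def vecmat(a, b):
    # vector @ matrix: (..., i) x (..., i, j) -> (..., j)
    return np.einsum('...i,...ij->...j', a, b)
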
I've used einsum in the attached file for the products, but it doesn't
work for object arrays.

This work is mainly to have a prototype version for the matmul operator
that can be used to verify behavior and write some tests.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: matmul.py
Type: text/x-python
Size: 929 bytes
Desc: not available
URL: 

From charlesr.harris at gmail.com Sun Aug 3 17:48:06 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 3 Aug 2014 15:48:06 -0600
Subject: [Numpy-discussion] Class to experiment with '@'
In-Reply-To: References: Message-ID: 

On Sun, Aug 3, 2014 at 1:26 PM, Charles R Harris wrote:

>
>
> Oops, corrected version attached. I think we need generalized functions in
> linalg for mat x mat, mat x vec, and vec x mat, and versions that also work
> for object arrays. I've used einsum in the attached file for the products,
> but it doesn't work for object arrays.
>
> This work is mainly to have a prototype version for the matmul operator
> that can be used to verify behavior and write some tests.
>

Another fix :(

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: matmul.py
Type: text/x-python
Size: 937 bytes
Desc: not available
URL: 

From t.b.poole at gmail.com Mon Aug 4 11:47:50 2014
From: t.b.poole at gmail.com (tpoole)
Date: Mon, 4 Aug 2014 08:58... (PDT)
Subject: [Numpy-discussion] Requesting Code Review of weighted covariance ENH
Message-ID: <1407167270713-38292.post@n7.nabble.com>

Hi everyone,

I've added the ability to handle weighted data in a covariance calculation,
in a similar manner to that already implemented for the calculation of a
weighted average.

https://github.com/tpoole/numpy/compare/weighted_cov

Could an experienced someone please look over my changes before I submit a
pull request?

Validation of the formula can be found in "Exponential smoothing weighted
correlations", F. Pozzi, T. Matteo, and T. Aste, Eur. Phys. J. B 85, 175
(2012), though it is unfortunately paywalled, and in "An Analysis of
WinCross, SPSS, and Mentor Procedures for Estimating the Variance of a
Weighted Mean", A. Madansky and H. G. B. Alexander,
www.analyticalgroup.com/download/weighted_variance.pdf for the "effective
number of samples".

Thanks,

Tom

--
View this message in context: http://numpy-discussion.10968.n7.nabble.com/Requesting-Code-Review-of-weighted-covariance-ENH-tp38292.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

From jtaylor.debian at googlemail.com Mon Aug 4 18:05:43 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Tue, 05 Aug 2014 00:05:43 +0200
Subject: [Numpy-discussion] last call for numpy 1.8.2 bugfixes
Message-ID: <53E003B7.3090100@googlemail.com>

hi,
as numpy 1.9 is going to be a relatively hard upgrade (the indexing changes
expose a couple of bugs in third-party packages, and there are a lot of
small incompatibilities), I will create a numpy 1.8.2 release
tomorrow with a couple of important or hard-to-work-around bugfixes.

The most important bugfix is fixing the wrong results that partition with
multiple selections could produce if the selections ended up in an equal
range, see https://github.com/numpy/numpy/issues/4836 (if the crash is
still unreproducible, help appreciated).

the rest of the fixes are small ones listed below.
If I have missed one or you consider one of the fixes too invasive for a
bugfix release please speak up now.
As the number of fixes is small I will skip a release candidate.

Make fftpack._raw_fft threadsafe
https://github.com/numpy/numpy/issues/4656

Prevent division by zero
https://github.com/numpy/numpy/issues/650

Fix lack of NULL check in array_richcompare
https://github.com/numpy/numpy/issues/4613

incorrect argument order to _copyto in np.nanmax, np.nanmin
https://github.com/numpy/numpy/issues/4628

Hold GIL for types with fields, fixes
https://github.com/numpy/numpy/issues/4642

svd ufunc typo
https://github.com/numpy/numpy/issues/4733

check alignment of strides for byteswap
https://github.com/numpy/numpy/issues/4774

add missing elementsize alignment check for simd reductions
https://github.com/numpy/numpy/issues/4853

ifort has issues with optimization flag /O2
https://github.com/numpy/numpy/issues/4602

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: 

From matthew.brett at gmail.com Mon Aug 4 18:09:39 2014
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 4 Aug 2014 15:09:39 -0700
Subject: [Numpy-discussion] last call for numpy 1.8.2 bugfixes
In-Reply-To: <53E003B7.3090100@googlemail.com>
References: <53E003B7.3090100@googlemail.com>
Message-ID: 

Hi,

On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor
wrote:
> hi,
> as numpy 1.9 is going to be a relatively hard upgrade (the indexing changes
> expose a couple of bugs in third-party packages, and there are a lot of
> small incompatibilities), I will create a numpy 1.8.2 release
> tomorrow with a couple of important or hard-to-work-around bugfixes.
>
> The most important bugfix is fixing the wrong results that partition with
> multiple selections could produce if the selections ended up in an equal
> range, see https://github.com/numpy/numpy/issues/4836 (if the crash is
> still unreproducible, help appreciated).
>
> the rest of the fixes are small ones listed below.
> If I have missed one or you consider one of the fixes too invasive for a
> bugfix release please speak up now.
> As the number of fixes is small I will skip a release candidate.
>
>
> Make fftpack._raw_fft threadsafe
> https://github.com/numpy/numpy/issues/4656
>
> Prevent division by zero
> https://github.com/numpy/numpy/issues/650
>
> Fix lack of NULL check in array_richcompare
> https://github.com/numpy/numpy/issues/4613
>
> incorrect argument order to _copyto in np.nanmax, np.nanmin
> https://github.com/numpy/numpy/issues/4628
>
> Hold GIL for types with fields, fixes
> https://github.com/numpy/numpy/issues/4642
>
> svd ufunc typo
> https://github.com/numpy/numpy/issues/4733
>
> check alignment of strides for byteswap
> https://github.com/numpy/numpy/issues/4774
>
> add missing elementsize alignment check for simd reductions
> https://github.com/numpy/numpy/issues/4853
>
> ifort has issues with optimization flag /O2
> https://github.com/numpy/numpy/issues/4602

Any chance of an RC to give us some time to test?
Cheers,

Matthew

From jtaylor.debian at googlemail.com Mon Aug 4 18:12:50 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Tue, 05 Aug 2014 00:12:50 +0200
Subject: [Numpy-discussion] last call for numpy 1.8.2 bugfixes
In-Reply-To: 
References: <53E003B7.3090100@googlemail.com>
Message-ID: <53E00562.5070602@googlemail.com>

On 05.08.2014 00:09, Matthew Brett wrote:
> Hi,
>
> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor
> wrote:
>> hi,
>> as numpy 1.9 is going to be a relatively hard upgrade (the indexing changes
>> expose a couple of bugs in third-party packages, and there are a lot of
>> small incompatibilities), I will create a numpy 1.8.2 release
>> tomorrow with a couple of important or hard-to-work-around bugfixes.
>>...
>
> Any chance of an RC to give us some time to test?
>

I hope I have only selected fixes that are safe and do not require an RC.
sure we could do one, but if there are issues we can also just make a
quick 1.8.3 release follow-up.

the main backport PR is: https://github.com/numpy/numpy/pull/4949
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: OpenPGP digital signature
URL: 

From njs at pobox.com Mon Aug 4 18:25:04 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 4 Aug 2014 23:25:04 +0100
Subject: [Numpy-discussion] last call for numpy 1.8.2 bugfixes
In-Reply-To: <53E00562.5070602@googlemail.com>
References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com>
Message-ID: 

On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor
wrote:
> On 05.08.2014 00:09, Matthew Brett wrote:
>> Hi,
>>
>> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor
>> wrote:
>>> hi,
>>> as numpy 1.9 is going to be a relatively hard upgrade (the indexing changes
>>> expose a couple of bugs in third-party packages, and there are a lot of
>>> small incompatibilities), I will create a numpy 1.8.2 release
>>> tomorrow with a couple of important or hard-to-work-around bugfixes.
>>>...
>>
>> Any chance of an RC to give us some time to test?
>>
>
> I hope I have only selected fixes that are safe and do not require an RC.
> sure we could do one, but if there are issues we can also just make a
> quick 1.8.3 release follow-up.
>
> the main backport PR is: https://github.com/numpy/numpy/pull/4949

It's probably better to just make an RC if it's not too much trouble...
it's always possible to misjudge what issues will arise; if there's a
real but non-catastrophic issue, then 1.8.2 will remain in use even if
1.8.3 is released afterwards, forcing downstream libraries to work
around the issues; and just in general it's good to have and follow
standard processes, because special cases lead to errors.

-n

--
Nathaniel J.
Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From matthew.brett at gmail.com Mon Aug 4 18:27:38 2014
From: matthew.brett at gmail.com (Matthew Brett)
Date: Mon, 4 Aug 2014 15:27:38 -0700
Subject: [Numpy-discussion] last call for numpy 1.8.2 bugfixes
In-Reply-To: 
References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com>
Message-ID: 

On Mon, Aug 4, 2014 at 3:25 PM, Nathaniel Smith wrote:
> On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor
> wrote:
>> On 05.08.2014 00:09, Matthew Brett wrote:
>>> Hi,
>>>
>>> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor
>>> wrote:
>>>> hi,
>>>> as numpy 1.9 is going to be a relatively hard upgrade (the indexing changes
>>>> expose a couple of bugs in third-party packages, and there are a lot of
>>>> small incompatibilities), I will create a numpy 1.8.2 release
>>>> tomorrow with a couple of important or hard-to-work-around bugfixes.
>>>>...
>>>
>>> Any chance of an RC to give us some time to test?
>>>
>>
>> I hope I have only selected fixes that are safe and do not require an RC.
>> sure we could do one, but if there are issues we can also just make a
>> quick 1.8.3 release follow-up.

A few days to test would be fine, I'd prefer an RC too,

Cheers,

Matthew

From jtaylor.debian at googlemail.com Mon Aug 4 18:46:14 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Tue, 05 Aug 2014 00:46:14 +0200
Subject: [Numpy-discussion] last call for numpy 1.8.2 bugfixes
In-Reply-To: 
References: <53E003B7.3090100@googlemail.com> <53E00562.5070602@googlemail.com>
Message-ID: <53E00D36.3080500@googlemail.com>

On 05.08.2014 00:27, Matthew Brett wrote:
> On Mon, Aug 4, 2014 at 3:25 PM, Nathaniel Smith wrote:
>> On Mon, Aug 4, 2014 at 11:12 PM, Julian Taylor
>> wrote:
>>> On 05.08.2014 00:09, Matthew Brett wrote:
>>>> Hi,
>>>>
>>>> On Mon, Aug 4, 2014 at 3:05 PM, Julian Taylor
>>>> wrote:
>>>>> hi,
>>>>> as numpy 1.9 is going to be a relatively hard upgrade (the indexing changes
>>>>> expose a couple of bugs in third-party packages, and there are a lot of
>>>>> small incompatibilities), I will create a numpy 1.8.2 release
>>>>> tomorrow with a couple of important or hard-to-work-around bugfixes.
>>>>> ...
>>>>
>>>> Any chance of an RC to give us some time to test?
>>>>
>>>
>>> I hope I have only selected fixes that are safe and do not require an RC.
>>> sure we could do one, but if there are issues we can also just make a
>>> quick 1.8.3 release follow-up.
>
> A few days to test would be fine, I'd prefer an RC too,
>

alright, I'll make an RC tomorrow and plan for release this weekend then.

From debruinjj at gmail.com Tue Aug 5 08:58:38 2014
From: debruinjj at gmail.com (Jurgens de Bruin)
Date: Tue, 5 Aug 2014 14:58:38 +0200
Subject: [Numpy-discussion] Array2 subset of array1
Message-ID: 

Hi,

I am new to numpy so any help would be greatly appreciated.

I have two arrays:

array1 = np.arange(1,100+1)
array2 = np.arange(1,50+1)

How can I calculate/determine if array2 is a subset of array1 (falls within
array1)?

Something like: array2 in array1 = TRUE for the case above.

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com Tue Aug 5 09:15:18 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 5 Aug 2014 14:15:18 +0100
Subject: [Numpy-discussion] Array2 subset of array1
In-Reply-To: References: Message-ID: 

On Tue, Aug 5, 2014 at 1:58 PM, Jurgens de Bruin wrote:
> Hi,
>
> I am new to numpy so any help would be greatly appreciated.
> I have two arrays:
>
> array1 = np.arange(1,100+1)
> array2 = np.arange(1,50+1)
>
> How can I calculate/determine if array2 is a subset of array1 (falls within
> array1)?
>
> Something like: array2 in array1 = TRUE for the case above.

Does this work?

np.in1d(array2, array1)

See:
http://docs.scipy.org/doc/numpy/reference/routines.set.html

(Note that while in1d does the best it can, set operations on arrays
will usually be slower than if you used a more appropriate data type
like 'set' or 'dict'.)

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From hoogendoorn.eelco at gmail.com Tue Aug 5 11:29:00 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Tue, 5 Aug 2014 17:29:00 +0200
Subject: [Numpy-discussion] Array2 subset of array1
In-Reply-To: 
References: 
Message-ID: 

np.all(np.in1d(array2, array1))

On Tue, Aug 5, 2014 at 2:58 PM, Jurgens de Bruin wrote:

> Hi,
>
> I am new to numpy so any help would be greatly appreciated.
>
> I have two arrays:
>
> array1 = np.arange(1,100+1)
> array2 = np.arange(1,50+1)
>
> How can I calculate/determine if array2 is a subset of array1 (falls
> within array1)?
>
> Something like: array2 in array1 = TRUE for the case above.
>
> Thanks
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sebastian at sipsolutions.net Tue Aug 5 12:33:50 2014
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 05 Aug 2014 18:33:50 +0200
Subject: [Numpy-discussion] Array2 subset of array1
In-Reply-To: 
References: 
Message-ID: <1407256430.3568.6.camel@sebastian-t440>

On Di, 2014-08-05 at 14:58 +0200, Jurgens de Bruin wrote:
> Hi,
>
> I am new to numpy so any help would be greatly appreciated.
>
> I have two arrays:
>
> array1 = np.arange(1,100+1)
> array2 = np.arange(1,50+1)
>
> How can I calculate/determine if array2 is a subset of array1 (falls
> within array1)?
>
> Something like: array2 in array1 = TRUE for the case above.
>

Just to be clear. You are looking for the whole of array2 (as a
block/subarray of array1) as far as I understand. And there is no obvious
numpy way to do this. Depending on your array sizes, you could blow up the
first array from (N,) to (N-M+1,M) and then check if any row matches
completely. There may be better tricks available though, especially if
array1 is large.

- Sebastian

> Thanks
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From hoogendoorn.eelco at gmail.com Tue Aug 5 12:59:41 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Tue, 5 Aug 2014 18:59:41 +0200
Subject: [Numpy-discussion] Array2 subset of array1
In-Reply-To: <1407256430.3568.6.camel@sebastian-t440>
References: <1407256430.3568.6.camel@sebastian-t440>
Message-ID: 

ah yes, that may indeed be what you want.

depending on your datatype, you could access the underlying raw data as a
string. b.tostring() in a.tostring() sort of works; but isn't entirely
safe, as you may have false positive matches which aren't aligned to your
datatype. using str.find in combination with dtype.itemsize could solve
that problem; though it isn't the most elegant solution, I'd say.
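
a rough sketch of that idea (untested, with a made-up helper name; assumes
both arrays are 1-D, contiguous, and share a dtype):

import numpy as np

def contains_block(a, b):
    # search for the raw bytes of b inside the raw bytes of a,
    # keeping only matches that are aligned to the itemsize
    sa, sb = a.tostring(), b.tostring()
    step = a.dtype.itemsize
    start = sa.find(sb)
    while start != -1:
        if start % step == 0:  # aligned match, not a false positive
            return True
        start = sa.find(sb, start + 1)
    return False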
also note that you need to check for identical datatypes and memory layout for this to guarantee correct results. On Tue, Aug 5, 2014 at 6:33 PM, Sebastian Berg wrote: > On Di, 2014-08-05 at 14:58 +0200, Jurgens de Bruin wrote: > > Hi, > > > > I am new to numpy so any help would be greatly appreciated. > > > > I have two arrays: > > > > array1 = np.arange(1,100+1) > > array2 = np.arange(1,50+1) > > > > How can I calculate/determine if array2 is a subset of array1 (falls > > within array 1) > > > > Something like : array2 in array1 = TRUE for the case above. > > > > Just to be clear. You are looking for the whole of array1 (as a > block/subarray) as far as I understand. And there is no obvious numpy > way to do this. Depending on your array sizes, you could blow up the > first array from (N,) to (N-M+1,M) and then check if any row matches > completely. There may be better tricks available though, especially if > array1 is large. > > - Sebastian > > > Thank > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue Aug 5 15:45:02 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 21:45:02 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate Message-ID: <53E1343E.7020805@googlemail.com> Hello, I am pleased to announce the first release candidate for numpy 1.8.2, a pure bugfix release for the 1.8.x series. https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ If no regressions show up the final release is planned this weekend. The upgrade is recommended for all users of the 1.8.x series. Following issues have been fixed: * gh-4836: partition produces wrong results for multiple selections in equal ranges * gh-4656: Make fftpack._raw_fft threadsafe * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin * gh-4613: Fix lack of NULL check in array_richcompare * gh-4642: Hold GIL for converting dtypes types with fields * gh-4733: fix np.linalg.svd(b, compute_uv=False) * gh-4853: avoid unaligned simd load on reductions on i386 * gh-4774: avoid unaligned access for strided byteswap * gh-650: Prevent division by zero when creating arrays from some buffers * gh-4602: ifort has issues with optimization flag O2, use O1 Source tarballs, windows installers and release notes can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From cgohlke at uci.edu Tue Aug 5 16:32:50 2014 From: cgohlke at uci.edu (Christoph Gohlke) Date: Tue, 05 Aug 2014 13:32:50 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate In-Reply-To: <53E1343E.7020805@googlemail.com> References: <53E1343E.7020805@googlemail.com> Message-ID: <53E13F72.8070501@uci.edu> On 8/5/2014 12:45 PM, Julian Taylor wrote: > Hello, > > I am pleased to announce the first release candidate for numpy 1.8.2, a > pure bugfix release for the 1.8.x series. 
> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ > > If no regressions show up the final release is planned this weekend. > The upgrade is recommended for all users of the 1.8.x series. > > Following issues have been fixed: > * gh-4836: partition produces wrong results for multiple selections in > equal ranges > * gh-4656: Make fftpack._raw_fft threadsafe > * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin > * gh-4613: Fix lack of NULL check in array_richcompare > * gh-4642: Hold GIL for converting dtypes types with fields > * gh-4733: fix np.linalg.svd(b, compute_uv=False) > * gh-4853: avoid unaligned simd load on reductions on i386 > * gh-4774: avoid unaligned access for strided byteswap > * gh-650: Prevent division by zero when creating arrays from some buffers > * gh-4602: ifort has issues with optimization flag O2, use O1 > > Source tarballs, windows installers and release notes can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ > > Cheers, > Julian Taylor > Hello, thank you. Looks good. All builds and tests pass on Windows (using msvc/MKL). Any chance gh-4722 can make it into the release? Fix seg fault converting empty string to object Christoph From jtaylor.debian at googlemail.com Tue Aug 5 16:57:17 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 05 Aug 2014 22:57:17 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate In-Reply-To: <53E13F72.8070501@uci.edu> References: <53E1343E.7020805@googlemail.com> <53E13F72.8070501@uci.edu> Message-ID: <53E1452D.5090001@googlemail.com> On 05.08.2014 22:32, Christoph Gohlke wrote: > On 8/5/2014 12:45 PM, Julian Taylor wrote: >> Hello, >> >> I am pleased to announce the first release candidate for numpy 1.8.2, a >> pure bugfix release for the 1.8.x series. >> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ >> >> If no regressions show up the final release is planned this weekend. >> The upgrade is recommended for all users of the 1.8.x series. >> >> Following issues have been fixed: >> * gh-4836: partition produces wrong results for multiple selections in >> equal ranges >> * gh-4656: Make fftpack._raw_fft threadsafe >> * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin >> * gh-4613: Fix lack of NULL check in array_richcompare >> * gh-4642: Hold GIL for converting dtypes types with fields >> * gh-4733: fix np.linalg.svd(b, compute_uv=False) >> * gh-4853: avoid unaligned simd load on reductions on i386 >> * gh-4774: avoid unaligned access for strided byteswap >> * gh-650: Prevent division by zero when creating arrays from some buffers >> * gh-4602: ifort has issues with optimization flag O2, use O1 >> >> Source tarballs, windows installers and release notes can be found at >> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ >> >> Cheers, >> Julian Taylor >> > > Hello, > > thank you. Looks good. All builds and tests pass on Windows (using > msvc/MKL). > > Any chance gh-4722 can make it into the release? > Fix seg fault converting empty string to object > > thanks, I missed that one, pretty simple, I'll add it to the final release. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From matthew.brett at gmail.com Tue Aug 5 17:27:14 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 5 Aug 2014 14:27:14 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate In-Reply-To: <53E1452D.5090001@googlemail.com> References: <53E1343E.7020805@googlemail.com> <53E13F72.8070501@uci.edu> <53E1452D.5090001@googlemail.com> Message-ID: Hi, On Tue, Aug 5, 2014 at 1:57 PM, Julian Taylor wrote: > On 05.08.2014 22:32, Christoph Gohlke wrote: >> On 8/5/2014 12:45 PM, Julian Taylor wrote: >>> Hello, >>> >>> I am pleased to announce the first release candidate for numpy 1.8.2, a >>> pure bugfix release for the 1.8.x series. >>> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ >>> >>> If no regressions show up the final release is planned this weekend. >>> The upgrade is recommended for all users of the 1.8.x series. >>> >>> Following issues have been fixed: >>> * gh-4836: partition produces wrong results for multiple selections in >>> equal ranges >>> * gh-4656: Make fftpack._raw_fft threadsafe >>> * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin >>> * gh-4613: Fix lack of NULL check in array_richcompare >>> * gh-4642: Hold GIL for converting dtypes types with fields >>> * gh-4733: fix np.linalg.svd(b, compute_uv=False) >>> * gh-4853: avoid unaligned simd load on reductions on i386 >>> * gh-4774: avoid unaligned access for strided byteswap >>> * gh-650: Prevent division by zero when creating arrays from some buffers >>> * gh-4602: ifort has issues with optimization flag O2, use O1 >>> >>> Source tarballs, windows installers and release notes can be found at >>> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ >>> >>> Cheers, >>> Julian Taylor >>> >> >> Hello, >> >> thank you. Looks good. All builds and tests pass on Windows (using >> msvc/MKL). >> >> Any chance gh-4722 can make it into the release? >> Fix seg fault converting empty string to object >> >> > > thanks, I missed that one, pretty simple, I'll add it to the final release. OSX wheels built and tested and uploaded OK : http://wheels.scikit-image.org https://travis-ci.org/matthew-brett/numpy-atlas-binaries/builds/31747958 Will test against the scipy stack later on today. Cheers, Matthew From derek at astro.physik.uni-goettingen.de Tue Aug 5 19:27:09 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Wed, 6 Aug 2014 01:27:09 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate In-Reply-To: References: <53E1343E.7020805@googlemail.com> <53E13F72.8070501@uci.edu> <53E1452D.5090001@googlemail.com> Message-ID: <418B93F6-DEC6-4104-A690-D75DA6B10C7D@astro.physik.uni-goettingen.de> On 5 Aug 2014, at 11:27 pm, Matthew Brett wrote: > OSX wheels built and tested and uploaded OK : > > http://wheels.scikit-image.org > > https://travis-ci.org/matthew-brett/numpy-atlas-binaries/builds/31747958 > > Will test against the scipy stack later on today. Built and tested against the Fink Python installation under OSX. 
Seems to resolve one of a couple of f2py test errors appearing with 1.8.1
on Python 3.3 and 3.4:

======================================================================
ERROR: test_return_real.TestCReturnReal.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python3.4/site-packages/nose/case.py", line 382, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/sw/lib/python3.4/site-packages/nose/util.py", line 470, in try_run
    return func()
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: No module named 'c_ext_return_real'

is gone on 3.4 now but still present on 3.3. Two errors of this kind
(with different numbers) remain:

======================================================================
ERROR: test_return_real.TestF90ReturnReal.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/sw/lib/python3.4/site-packages/nose/case.py", line 382, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/sw/lib/python3.4/site-packages/nose/util.py", line 470, in try_run
    return func()
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/sw/lib/python3.4/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: No module named '_test_ext_module_5415'

NumPy version 1.8.2rc1
NumPy is installed in /sw/lib/python3.4/site-packages/numpy
Python version 3.4.1 (default, Aug 3 2014, 21:02:44) [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
nose version 1.3.3

Cheers,
Derek

From matthew.brett at gmail.com Tue Aug 5 20:46:21 2014
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 5 Aug 2014 17:46:21 -0700
Subject: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate
In-Reply-To: 
References: <53E1343E.7020805@googlemail.com> <53E13F72.8070501@uci.edu>
 <53E1452D.5090001@googlemail.com>
Message-ID: 

Hi,

On Tue, Aug 5, 2014 at 2:27 PM, Matthew Brett wrote:
> Hi,
>
> On Tue, Aug 5, 2014 at 1:57 PM, Julian Taylor
> wrote:
>> On 05.08.2014 22:32, Christoph Gohlke wrote:
>>> On 8/5/2014 12:45 PM, Julian Taylor wrote:
>>>> Hello,
>>>>
>>>> I am pleased to announce the first release candidate for numpy 1.8.2, a
>>>> pure bugfix release for the 1.8.x series.
>>>> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/
>>>>
>>>> If no regressions show up the final release is planned this weekend.
>>>> The upgrade is recommended for all users of the 1.8.x series.
>>>> >>>> Following issues have been fixed: >>>> * gh-4836: partition produces wrong results for multiple selections in >>>> equal ranges >>>> * gh-4656: Make fftpack._raw_fft threadsafe >>>> * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin >>>> * gh-4613: Fix lack of NULL check in array_richcompare >>>> * gh-4642: Hold GIL for converting dtypes types with fields >>>> * gh-4733: fix np.linalg.svd(b, compute_uv=False) >>>> * gh-4853: avoid unaligned simd load on reductions on i386 >>>> * gh-4774: avoid unaligned access for strided byteswap >>>> * gh-650: Prevent division by zero when creating arrays from some buffers >>>> * gh-4602: ifort has issues with optimization flag O2, use O1 >>>> >>>> Source tarballs, windows installers and release notes can be found at >>>> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ >>>> >>>> Cheers, >>>> Julian Taylor >>>> >>> >>> Hello, >>> >>> thank you. Looks good. All builds and tests pass on Windows (using >>> msvc/MKL). >>> >>> Any chance gh-4722 can make it into the release? >>> Fix seg fault converting empty string to object >>> >>> >> >> thanks, I missed that one, pretty simple, I'll add it to the final release. > > OSX wheels built and tested and uploaded OK : > > http://wheels.scikit-image.org > > https://travis-ci.org/matthew-brett/numpy-atlas-binaries/builds/31747958 OSX wheel tested OK against current scipy stack for system Python, python.org Python, homebrew, macports: https://travis-ci.org/matthew-brett/scipy-stack-osx-testing/builds/31756325 Cheers, Matthew From charlesr.harris at gmail.com Tue Aug 5 21:19:00 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Aug 2014 19:19:00 -0600 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ Message-ID: Hi All, I've been looking to implement the "@" operator from Python 3.5. Looking at the current implementation of the dot function, it only uses a vector inner product, which is either that defined in arraytypes.c.src or a version using cblas defined in _dotblas for the float, cfloat, double, cdouble types. I note that the versions defined in arraytypes.c.src include all the numeric types plus boolean, datetime, timedelta, and object. I'm not clear why datetime and timedelta should have dot products, except perhaps for scalar multiplication. The boolean version has the advantage that it can short circuit. I also note that all the operations proposed for "@" can easily be done with einsum except for objects. So I'm wondering if one easy way to implement the functions is to extend einsum to work with objects and make it use blas when available. Another thing that may be worth looking into would be some way to multiply by the complex conjugate, as that is easy to implement at the low level. I'd welcome any thoughts as to how that might be done. Anyway, I'm just looking for a discussion and ideas here. Any input is welcome. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Aug 6 08:31:36 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Aug 2014 13:31:36 +0100 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Wed, Aug 6, 2014 at 2:19 AM, Charles R Harris wrote: > Hi All, > > I've been looking to implement the "@" operator from Python 3.5. 
Looking at > the current implementation of the dot function, it only uses a vector inner > product, which is either that defined in arraytypes.c.src or a version using > cblas defined in _dotblas for the float, cfloat, double, cdouble types. I > note that the versions defined in arraytypes.c.src include all the numeric > types plus boolean, datetime, timedelta, and object. I'm not clear why > datetime and timedelta should have dot products, except perhaps for scalar > multiplication. I guess numeric @ timedelta is at least well-defined, but dot products on datetime make no sense -- datetimes do not support +! One thing we should keep in mind as well is how to allow user-defined dtypes to provide efficient matmul implementations. > The boolean version has the advantage that it can short > circuit. I also note that all the operations proposed for "@" can easily be > done with einsum except for objects. So I'm wondering if one easy way to > implement the functions is to extend einsum to work with objects and make it > use blas when available. Those do seem like nice features regardless of what we do for @ :-). I think the other obvious strategy to consider, is defining a 'dot' gufunc, with semantics identical to @. (This would be useful for backcompat as well: adding/dropping compatibility with older python versions would be as simple as mechanically replacing a @ b with newdot(a, b) or vice-versa.) This would require one new feature in the gufunc machinery: support for "optional core axes", to get the right semantics for 1d arrays. OTOH this would also be useful in general because there are other gufuncs that want to handle 1d arrays the same way @ does -- e.g., 'solve' variants. This would automatically solve both the user-defined dtype problem (ufuncs already allow for new loops to be registered) and the third-party array type problem (via __numpy_ufunc__). > Another thing that may be worth looking into would be some way to multiply > by the complex conjugate, as that is easy to implement at the low level. I'd > welcome any thoughts as to how that might be done. One idea that's come up before was to define a complex-conjugate dtype, which would allow .H to be a view on the original array. A simpler solution would be to define a specialized conjdot gufunc. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From jaime.frio at gmail.com Wed Aug 6 10:32:56 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 6 Aug 2014 07:32:56 -0700 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Wed, Aug 6, 2014 at 5:31 AM, Nathaniel Smith wrote: > I think the other obvious strategy to consider, is defining a 'dot' > gufunc, with semantics identical to @. (This would be useful for > backcompat as well: adding/dropping compatibility with older python > versions would be as simple as mechanically replacing a @ b with > newdot(a, b) or vice-versa.) This would require one new feature in the > gufunc machinery: support for "optional core axes", to get the right > semantics for 1d arrays. Can you elaborate on what those optional core axes would look like? If I am understanding you correctly, this is what now is solved by having more than one gufunc defined, and choosing which one to use based on the input's shapes in a thin Python wrapper. There are several examples in the linalg module you are certainly well aware of. 
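To make that concrete, a thin dispatch wrapper of the kind linalg uses
might look roughly like this -- a sketch only, with einsum one-liners
standing in for the four compiled gufuncs (none of these names exist in
numpy):

import numpy as np

# Stand-ins for four fixed-signature gufuncs; einsum reproduces the
# core semantics and broadcasts any leading (stacked) dimensions.
def matmat(a, b): return np.einsum('...ij,...jk->...ik', a, b)
def matvec(a, b): return np.einsum('...ij,...j->...i', a, b)
def vecmat(a, b): return np.einsum('...j,...jk->...k', a, b)
def vecvec(a, b): return np.einsum('...j,...j->...', a, b)

def newdot(a, b):
    # Dispatch on core dimensionality, '@'-style: a 1-d operand is
    # treated as a vector, everything else as a (stack of) matrices.
    a, b = np.asarray(a), np.asarray(b)
    if a.ndim == 0 or b.ndim == 0:
        raise TypeError("scalar operands are not allowed")
    if a.ndim == 1 and b.ndim == 1:
        return vecvec(a, b)
    if a.ndim == 1:
        return vecmat(a, b)
    if b.ndim == 1:
        return matvec(a, b)
    return matmat(a, b)

Note that matmat as written broadcasts the leading (stacked)
dimensions, which is the '@' behaviour; np.dot of two >2d arrays
instead pairs every combination of stacked axes, ufunc.outer-style.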
Say you could define the matmul signature as "(i)j,j(k)->(ik)", with dimensions in parenthesis being "optional." Say we modified the gufunc machinery to detect which optional core axes are present and which not. It seems to me that you would then still need to write 4 traditional gufuncs (ij,jk->ik, j,jk->k, ij,j->i, j,j->) and dispatch to one of them. I haven't thought it through, but are there really a set of universal dispatch rules that will apply to any optional core axes problem? Would we not be losing flexibility in doing so? When I looked into gufuncs several months ago, what I missed was a way of defining signatures like n,m->n*(n-1), which would come in very handy if computing all pairwise distances. You can work around this by making the signature n,m->p and always calling the gufunc from a Python wrapper that passes in an out parameter of the right shape. But if someone gets a hold of the gufunc handle and calls it directly without an out parameter, the p defaults to 1 and you are probably in for a big crash. So it would be nice if you could provide a pointer to a function to produce the output shape based on the inputs'. On my wish list for gufunc signatures there is also frozen dimensions, e.g. a gufunc to compute greater circle distances on a sphere can be defined as m,m->, but m has to be 2, and since you don't typically want to be raising errors in the kernel, a Python wrapper is once more necessary. And again an unwrapped call to the gufunc is potentially catastrophic. Sorry for hijacking the thread, but I wouldn't mind spending some time working on expanding this functionality to include the optional axes and my wish-list, if the whole thing makes sense. Jaime -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Aug 6 11:32:42 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Aug 2014 09:32:42 -0600 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Wed, Aug 6, 2014 at 8:32 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Wed, Aug 6, 2014 at 5:31 AM, Nathaniel Smith wrote: > >> I think the other obvious strategy to consider, is defining a 'dot' >> gufunc, with semantics identical to @. (This would be useful for >> backcompat as well: adding/dropping compatibility with older python >> versions would be as simple as mechanically replacing a @ b with >> newdot(a, b) or vice-versa.) This would require one new feature in the >> gufunc machinery: support for "optional core axes", to get the right >> semantics for 1d arrays. > > > Can you elaborate on what those optional core axes would look like? If I > am understanding you correctly, this is what now is solved by having more > than one gufunc defined, and choosing which one to use based on the input's > shapes in a thin Python wrapper. There are several examples in the linalg > module you are certainly well aware of. > > Say you could define the matmul signature as "(i)j,j(k)->(ik)", with > dimensions in parenthesis being "optional." Say we modified the gufunc > machinery to detect which optional core axes are present and which not. It > seems to me that you would then still need to write 4 traditional gufuncs > (ij,jk->ik, j,jk->k, ij,j->i, j,j->) and dispatch to one of them. I haven't > thought it through, but are there really a set of universal dispatch rules > that will apply to any optional core axes problem? 
Would we not be losing > flexibility in doing so? > > When I looked into gufuncs several months ago, what I missed was a way of > defining signatures like n,m->n*(n-1), which would come in very handy if > computing all pairwise distances. You can work around this by making the > signature n,m->p and always calling the gufunc from a Python wrapper that > passes in an out parameter of the right shape. But if someone gets a hold > of the gufunc handle and calls it directly without an out parameter, the p > defaults to 1 and you are probably in for a big crash. So it would be nice > if you could provide a pointer to a function to produce the output shape > based on the inputs'. > > On my wish list for gufunc signatures there is also frozen dimensions, > e.g. a gufunc to compute greater circle distances on a sphere can be > defined as m,m->, but m has to be 2, and since you don't typically want to > be raising errors in the kernel, a Python wrapper is once more necessary. > And again an unwrapped call to the gufunc is potentially catastrophic. > > Sorry for hijacking the thread, but I wouldn't mind spending some time > working on expanding this functionality to include the optional axes and my > wish-list, if the whole thing makes sense. > Should also mention that we don't have the ability to operate on stacked vectors because they can't be identified by dimension info. One workaround is to add dummy dimensions where needed, another is to add two flags, row and col, and set them appropriately. Two flags are needed for backward compatibility, i.e., both false is a traditional array. Note that adding dummy dimensions can lead to '[[...]]' scalars. Working with stacked vectors isn't part of the '@' PEP. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Aug 6 12:14:38 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Aug 2014 10:14:38 -0600 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Wed, Aug 6, 2014 at 9:32 AM, Charles R Harris wrote: > > > > On Wed, Aug 6, 2014 at 8:32 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Wed, Aug 6, 2014 at 5:31 AM, Nathaniel Smith wrote: >> >>> I think the other obvious strategy to consider, is defining a 'dot' >>> gufunc, with semantics identical to @. (This would be useful for >>> backcompat as well: adding/dropping compatibility with older python >>> versions would be as simple as mechanically replacing a @ b with >>> newdot(a, b) or vice-versa.) This would require one new feature in the >>> gufunc machinery: support for "optional core axes", to get the right >>> semantics for 1d arrays. >> >> >> Can you elaborate on what those optional core axes would look like? If I >> am understanding you correctly, this is what now is solved by having more >> than one gufunc defined, and choosing which one to use based on the input's >> shapes in a thin Python wrapper. There are several examples in the linalg >> module you are certainly well aware of. >> >> Say you could define the matmul signature as "(i)j,j(k)->(ik)", with >> dimensions in parenthesis being "optional." Say we modified the gufunc >> machinery to detect which optional core axes are present and which not. It >> seems to me that you would then still need to write 4 traditional gufuncs >> (ij,jk->ik, j,jk->k, ij,j->i, j,j->) and dispatch to one of them. 
I haven't
>> thought it through, but are there really a set of universal dispatch rules
>> that will apply to any optional core axes problem? Would we not be losing
>> flexibility in doing so?
>>
>> When I looked into gufuncs several months ago, what I missed was a way of
>> defining signatures like n,m->n*(n-1), which would come in very handy if
>> computing all pairwise distances. You can work around this by making the
>> signature n,m->p and always calling the gufunc from a Python wrapper that
>> passes in an out parameter of the right shape. But if someone gets a hold
>> of the gufunc handle and calls it directly without an out parameter, the p
>> defaults to 1 and you are probably in for a big crash. So it would be nice
>> if you could provide a pointer to a function to produce the output shape
>> based on the inputs'.
>>
>> On my wish list for gufunc signatures there is also frozen dimensions,
>> e.g. a gufunc to compute great circle distances on a sphere can be
>> defined as m,m->, but m has to be 2, and since you don't typically want to
>> be raising errors in the kernel, a Python wrapper is once more necessary.
>> And again an unwrapped call to the gufunc is potentially catastrophic.
>>
>> Sorry for hijacking the thread, but I wouldn't mind spending some time
>> working on expanding this functionality to include the optional axes and my
>> wish-list, if the whole thing makes sense.
>>
>
> Should also mention that we don't have the ability to operate on stacked
> vectors because they can't be identified by dimension info. One workaround
> is to add dummy dimensions where needed, another is to add two flags, row
> and col, and set them appropriately. Two flags are needed for backward
> compatibility, i.e., both false is a traditional array. Note that adding
> dummy dimensions can lead to '[[...]]' scalars. Working with stacked
> vectors isn't part of the '@' PEP.
>

Transpose doesn't work with stacked arrays, so it would also be useful to
have a function for that.
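In the meantime the usual idiom is to swap the last two axes, which
leaves the stacked dimensions alone; a quick sketch:

import numpy as np

a = np.arange(24).reshape(2, 3, 4)   # a stack of two 3x4 matrices
at = np.swapaxes(a, -1, -2)          # per-matrix transpose, shape (2, 4, 3)

Plain a.T would instead reverse all three axes to (4, 3, 2).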
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com Wed Aug 6 16:42:34 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 6 Aug 2014 14:42:34 -0600
Subject: [Numpy-discussion] How to give feedback to github
Message-ID: 

Does anyone know how to complain about features to github? The new author
selection list for PRs is practically useless as 1) it only lists authors
belonging to the project and 2) it doesn't list the number of PRs for each
author. The old list was far more useful.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov Wed Aug 6 17:05:19 2014
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 6 Aug 2014 14:05:19 -0700
Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 6, 2014 at 8:32 AM, Charles R Harris wrote:

> Should also mention that we don't have the ability to operate on stacked
> vectors because they can't be identified by dimension info. One workaround
> is to add dummy dimensions where needed, another is to add two flags, row
> and col, and set them appropriately.
>

I've thought for ages that if you want to naturally do linear algebra, you
need to capture the concept of a row and column vector as distinct from
each other and from (1,n) and (n,1) shape arrays. So:

+1

-Chris

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fperez.net at gmail.com Wed Aug 6 18:04:03 2014
From: fperez.net at gmail.com (Fernando Perez)
Date: Wed, 6 Aug 2014 15:04:03 -0700
Subject: [Numpy-discussion] How to give feedback to github
In-Reply-To: 
References: 
Message-ID: 

The form at:

https://github.com/contact

or simply email support at github.com are the options. I've used it a couple
of times and they've been responsive.

Cheers

f

On Wed, Aug 6, 2014 at 1:42 PM, Charles R Harris wrote:

> Does anyone know how to complain about features to github? The new author
> selection list for PRs is practically useless as 1) it only lists authors
> belonging to the project and 2) it doesn't list the number of PRs for each
> author. The old list was far more useful.
>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alan.isaac at gmail.com Wed Aug 6 18:45:30 2014
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Wed, 06 Aug 2014 18:45:30 -0400
Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__
In-Reply-To: 
References: 
Message-ID: <53E2B00A.4000801@gmail.com>

> On Wed, Aug 6, 2014 at 8:32 AM, Charles R Harris wrote:
>> Should also mention that we don't have the ability to
>> operate on stacked vectors because they can't be
>> identified by dimension info. One
>> workaround is to add dummy dimensions where needed,
>> another is to add two flags, row and col, and set them
>> appropriately.

On 8/6/2014 5:05 PM, Chris Barker wrote:
> I've thought for ages that if you want to naturally do
> linear algebra, you need to capture the concept of a row
> and column vector as distinct
> from each other and from (1,n) and (n,1) shape arrays. So:

It seems to me that although it may sound trivial to "add two flags",
this is a fundamental conceptual change, and I hope it will not go
forward without extensive discussion.

To aid users like me who might want to think about this, can you please
suggest for exploration a language that has adopted this approach?
(Ideally, where the decision is considered a good one.)

Thank you,
Alan Isaac

From njs at pobox.com Wed Aug 6 18:57:57 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 Aug 2014 23:57:57 +0100
Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 6, 2014 at 4:32 PM, Charles R Harris wrote:
> Should also mention that we don't have the ability to operate on stacked
> vectors because they can't be identified by dimension info. One workaround
> is to add dummy dimensions where needed, another is to add two flags, row
> and col, and set them appropriately. Two flags are needed for backward
> compatibility, i.e., both false is a traditional array.

It's possible I could be convinced to like this, but it would take a
substantial amount of convincing :-). It seems like a pretty big
violation of orthogonality/"one obvious way"/etc.
to have two totally different ways of representing row/column vectors. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Wed Aug 6 19:24:59 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Aug 2014 17:24:59 -0600 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Wed, Aug 6, 2014 at 4:57 PM, Nathaniel Smith wrote: > On Wed, Aug 6, 2014 at 4:32 PM, Charles R Harris > wrote: > > Should also mention that we don't have the ability to operate on stacked > > vectors because they can't be identified by dimension info. One > workaround > > is to add dummy dimensions where needed, another is to add two flags, row > > and col, and set them appropriately. Two flags are needed for backward > > compatibility, i.e., both false is a traditional array. > > It's possible I could be convinced to like this, but it would take a > substantial amount of convincing :-). It seems like a pretty big > violation of orthogonality/"one obvious way"/etc. to have two totally > different ways of representing row/column vectors. > > The '@' operator supports matrix stacks, so it would seem we also need to support vector stacks. The new addition would only be effective with the '@' operator. The main problem I see with flags is that adding them would require an extensive audit of the C code to make sure they were preserved. Another option, already supported to a large extent, is to have row and col classes inheriting from ndarray that add nothing, except for a possible new transpose type function/method. I did mock up such a class just for fun, and also added a 'dyad' function. If we really don't care to support stacked vectors we can get by without adding anything. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Aug 6 19:27:01 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Aug 2014 17:27:01 -0600 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Wed, Aug 6, 2014 at 5:24 PM, Charles R Harris wrote: > > > > On Wed, Aug 6, 2014 at 4:57 PM, Nathaniel Smith wrote: > >> On Wed, Aug 6, 2014 at 4:32 PM, Charles R Harris >> wrote: >> > Should also mention that we don't have the ability to operate on stacked >> > vectors because they can't be identified by dimension info. One >> workaround >> > is to add dummy dimensions where needed, another is to add two flags, >> row >> > and col, and set them appropriately. Two flags are needed for backward >> > compatibility, i.e., both false is a traditional array. >> >> It's possible I could be convinced to like this, but it would take a >> substantial amount of convincing :-). It seems like a pretty big >> violation of orthogonality/"one obvious way"/etc. to have two totally >> different ways of representing row/column vectors. >> >> > The '@' operator supports matrix stacks, so it would seem we also need to > support vector stacks. The new addition would only be effective with the > '@' operator. The main problem I see with flags is that adding them would > require an extensive audit of the C code to make sure they were preserved. > Another option, already supported to a large extent, is to have row and col > classes inheriting from ndarray that add nothing, except for a possible new > transpose type function/method. 
I did mock up such a class just for fun, > and also added a 'dyad' function. If we really don't care to support > stacked vectors we can get by without adding anything. > > Note that the '@' PEP is not compatible with current 'dot' for arrays with more than two dimensions and for scalars. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Aug 6 19:33:13 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 7 Aug 2014 00:33:13 +0100 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Thu, Aug 7, 2014 at 12:24 AM, Charles R Harris wrote: > > On Wed, Aug 6, 2014 at 4:57 PM, Nathaniel Smith wrote: >> >> On Wed, Aug 6, 2014 at 4:32 PM, Charles R Harris >> wrote: >> > Should also mention that we don't have the ability to operate on stacked >> > vectors because they can't be identified by dimension info. One >> > workaround >> > is to add dummy dimensions where needed, another is to add two flags, >> > row >> > and col, and set them appropriately. Two flags are needed for backward >> > compatibility, i.e., both false is a traditional array. >> >> It's possible I could be convinced to like this, but it would take a >> substantial amount of convincing :-). It seems like a pretty big >> violation of orthogonality/"one obvious way"/etc. to have two totally >> different ways of representing row/column vectors. >> > > The '@' operator supports matrix stacks, so it would seem we also need to > support vector stacks. The new addition would only be effective with the '@' > operator. The main problem I see with flags is that adding them would > require an extensive audit of the C code to make sure they were preserved. > Another option, already supported to a large extent, is to have row and col > classes inheriting from ndarray that add nothing, except for a possible new > transpose type function/method. I did mock up such a class just for fun, and > also added a 'dyad' function. If we really don't care to support stacked > vectors we can get by without adding anything. It's possible you could convince me that this is a good idea, but I'm starting at like -0.95 :-). Wouldn't it be vastly simpler to just have np.linalg.matvec, matmat, vecvec or something (each of which are single-liners in terms of @), rather than deal with two different ways of representing row/column vectors everywhere? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Wed Aug 6 19:41:46 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Aug 2014 17:41:46 -0600 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Wed, Aug 6, 2014 at 5:33 PM, Nathaniel Smith wrote: > On Thu, Aug 7, 2014 at 12:24 AM, Charles R Harris > wrote: > > > > On Wed, Aug 6, 2014 at 4:57 PM, Nathaniel Smith wrote: > >> > >> On Wed, Aug 6, 2014 at 4:32 PM, Charles R Harris > >> wrote: > >> > Should also mention that we don't have the ability to operate on > stacked > >> > vectors because they can't be identified by dimension info. One > >> > workaround > >> > is to add dummy dimensions where needed, another is to add two flags, > >> > row > >> > and col, and set them appropriately. Two flags are needed for backward > >> > compatibility, i.e., both false is a traditional array. 
> >> > >> It's possible I could be convinced to like this, but it would take a > >> substantial amount of convincing :-). It seems like a pretty big > >> violation of orthogonality/"one obvious way"/etc. to have two totally > >> different ways of representing row/column vectors. > >> > > > > The '@' operator supports matrix stacks, so it would seem we also need to > > support vector stacks. The new addition would only be effective with the > '@' > > operator. The main problem I see with flags is that adding them would > > require an extensive audit of the C code to make sure they were > preserved. > > Another option, already supported to a large extent, is to have row and > col > > classes inheriting from ndarray that add nothing, except for a possible > new > > transpose type function/method. I did mock up such a class just for fun, > and > > also added a 'dyad' function. If we really don't care to support stacked > > vectors we can get by without adding anything. > > It's possible you could convince me that this is a good idea, but I'm > starting at like -0.95 :-). Wouldn't it be vastly simpler to just have > np.linalg.matvec, matmat, vecvec or something (each of which are > single-liners in terms of @), rather than deal with two different ways > of representing row/column vectors everywhere? > > Sure, but matvec and vecvec would not be supported by '@' except when vec was 1d because there is no way to distinguish a stack of vectors from a matrix or a stack of matrices. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Aug 6 19:51:18 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 7 Aug 2014 00:51:18 +0100 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On 7 Aug 2014 00:41, "Charles R Harris" wrote: > > On Wed, Aug 6, 2014 at 5:33 PM, Nathaniel Smith wrote: >> >> On Thu, Aug 7, 2014 at 12:24 AM, Charles R Harris >> wrote: >> > >> > On Wed, Aug 6, 2014 at 4:57 PM, Nathaniel Smith wrote: >> >> >> >> On Wed, Aug 6, 2014 at 4:32 PM, Charles R Harris >> >> wrote: >> >> > Should also mention that we don't have the ability to operate on stacked >> >> > vectors because they can't be identified by dimension info. One >> >> > workaround >> >> > is to add dummy dimensions where needed, another is to add two flags, >> >> > row >> >> > and col, and set them appropriately. Two flags are needed for backward >> >> > compatibility, i.e., both false is a traditional array. >> >> >> >> It's possible I could be convinced to like this, but it would take a >> >> substantial amount of convincing :-). It seems like a pretty big >> >> violation of orthogonality/"one obvious way"/etc. to have two totally >> >> different ways of representing row/column vectors. >> >> >> > >> > The '@' operator supports matrix stacks, so it would seem we also need to >> > support vector stacks. The new addition would only be effective with the '@' >> > operator. The main problem I see with flags is that adding them would >> > require an extensive audit of the C code to make sure they were preserved. >> > Another option, already supported to a large extent, is to have row and col >> > classes inheriting from ndarray that add nothing, except for a possible new >> > transpose type function/method. I did mock up such a class just for fun, and >> > also added a 'dyad' function. If we really don't care to support stacked >> > vectors we can get by without adding anything. 
>> >> It's possible you could convince me that this is a good idea, but I'm >> starting at like -0.95 :-). Wouldn't it be vastly simpler to just have >> np.linalg.matvec, matmat, vecvec or something (each of which are >> single-liners in terms of @), rather than deal with two different ways >> of representing row/column vectors everywhere? >> > > Sure, but matvec and vecvec would not be supported by '@' except when vec was 1d because there is no way to distinguish a stack of vectors from a matrix or a stack of matrices. Yes. But @ can never be magic - either people will have to write something extra to flip these flags on their array objects, or they'll have to write something extra to describe which operation they want. @ was never intended to cover every case, just the simple-but-super-common ones that dot covers, plus a few more (simple broadcasting). We have np.add even though + exists too... -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Aug 6 20:00:52 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Aug 2014 18:00:52 -0600 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Wed, Aug 6, 2014 at 5:51 PM, Nathaniel Smith wrote: > On 7 Aug 2014 00:41, "Charles R Harris" wrote: > > > > On Wed, Aug 6, 2014 at 5:33 PM, Nathaniel Smith wrote: > >> > >> On Thu, Aug 7, 2014 at 12:24 AM, Charles R Harris > >> wrote: > >> > > >> > On Wed, Aug 6, 2014 at 4:57 PM, Nathaniel Smith > wrote: > >> >> > >> >> On Wed, Aug 6, 2014 at 4:32 PM, Charles R Harris > >> >> wrote: > >> >> > Should also mention that we don't have the ability to operate on > stacked > >> >> > vectors because they can't be identified by dimension info. One > >> >> > workaround > >> >> > is to add dummy dimensions where needed, another is to add two > flags, > >> >> > row > >> >> > and col, and set them appropriately. Two flags are needed for > backward > >> >> > compatibility, i.e., both false is a traditional array. > >> >> > >> >> It's possible I could be convinced to like this, but it would take a > >> >> substantial amount of convincing :-). It seems like a pretty big > >> >> violation of orthogonality/"one obvious way"/etc. to have two totally > >> >> different ways of representing row/column vectors. > >> >> > >> > > >> > The '@' operator supports matrix stacks, so it would seem we also > need to > >> > support vector stacks. The new addition would only be effective with > the '@' > >> > operator. The main problem I see with flags is that adding them would > >> > require an extensive audit of the C code to make sure they were > preserved. > >> > Another option, already supported to a large extent, is to have row > and col > >> > classes inheriting from ndarray that add nothing, except for a > possible new > >> > transpose type function/method. I did mock up such a class just for > fun, and > >> > also added a 'dyad' function. If we really don't care to support > stacked > >> > vectors we can get by without adding anything. > >> > >> It's possible you could convince me that this is a good idea, but I'm > >> starting at like -0.95 :-). Wouldn't it be vastly simpler to just have > >> np.linalg.matvec, matmat, vecvec or something (each of which are > >> single-liners in terms of @), rather than deal with two different ways > >> of representing row/column vectors everywhere? 
> >> > > > > Sure, but matvec and vecvec would not be supported by '@' except when > vec was 1d because there is no way to distinguish a stack of vectors from a > matrix or a stack of matrices. > > Yes. But @ can never be magic - either people will have to write something > extra to flip these flags on their array objects, or they'll have to write > something extra to describe which operation they want. @ was never intended > to cover every case, just the simple-but-super-common ones that dot covers, > plus a few more (simple broadcasting). We have np.add even though + exists > too... > I don't expect stacked matrices/vectors to be used often, although there are some areas that might make heavy use of them, so I think we could live with the simple implementation, it's just a bit of a wart when there is broadcasting of arrays. Just to be clear, the '@' broadcasting differs from the dot broadcasting, agreed? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Aug 6 20:05:22 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 7 Aug 2014 01:05:22 +0100 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: On Thu, Aug 7, 2014 at 1:00 AM, Charles R Harris wrote: > > Just to be clear, the '@' broadcasting differs from > the dot broadcasting, agreed? Right, np.dot does the equivalent of ufunc.outer (i.e., not broadcasting at all), while @ broadcasts. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From hoogendoorn.eelco at gmail.com Thu Aug 7 03:40:44 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 7 Aug 2014 09:40:44 +0200 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: I don't expect stacked matrices/vectors to be used often, although there are some areas that might make heavy use of them, so I think we could live with the simple implementation, it's just a bit of a wart when there is broadcasting of arrays. Just to be clear, the '@' broadcasting differs from the dot broadcasting, agreed? This lack of elegance and unity combined with frankly; a lack of utility, made me plead @ is a bad idea in the first place; but I guess I lost that debate... On Thu, Aug 7, 2014 at 2:00 AM, Charles R Harris wrote: > > > > On Wed, Aug 6, 2014 at 5:51 PM, Nathaniel Smith wrote: > >> On 7 Aug 2014 00:41, "Charles R Harris" >> wrote: >> > >> > On Wed, Aug 6, 2014 at 5:33 PM, Nathaniel Smith wrote: >> >> >> >> On Thu, Aug 7, 2014 at 12:24 AM, Charles R Harris >> >> wrote: >> >> > >> >> > On Wed, Aug 6, 2014 at 4:57 PM, Nathaniel Smith >> wrote: >> >> >> >> >> >> On Wed, Aug 6, 2014 at 4:32 PM, Charles R Harris >> >> >> wrote: >> >> >> > Should also mention that we don't have the ability to operate on >> stacked >> >> >> > vectors because they can't be identified by dimension info. One >> >> >> > workaround >> >> >> > is to add dummy dimensions where needed, another is to add two >> flags, >> >> >> > row >> >> >> > and col, and set them appropriately. Two flags are needed for >> backward >> >> >> > compatibility, i.e., both false is a traditional array. >> >> >> >> >> >> It's possible I could be convinced to like this, but it would take a >> >> >> substantial amount of convincing :-). It seems like a pretty big >> >> >> violation of orthogonality/"one obvious way"/etc. 
to have two >> totally >> >> >> different ways of representing row/column vectors. >> >> >> >> >> > >> >> > The '@' operator supports matrix stacks, so it would seem we also >> need to >> >> > support vector stacks. The new addition would only be effective with >> the '@' >> >> > operator. The main problem I see with flags is that adding them would >> >> > require an extensive audit of the C code to make sure they were >> preserved. >> >> > Another option, already supported to a large extent, is to have row >> and col >> >> > classes inheriting from ndarray that add nothing, except for a >> possible new >> >> > transpose type function/method. I did mock up such a class just for >> fun, and >> >> > also added a 'dyad' function. If we really don't care to support >> stacked >> >> > vectors we can get by without adding anything. >> >> >> >> It's possible you could convince me that this is a good idea, but I'm >> >> starting at like -0.95 :-). Wouldn't it be vastly simpler to just have >> >> np.linalg.matvec, matmat, vecvec or something (each of which are >> >> single-liners in terms of @), rather than deal with two different ways >> >> of representing row/column vectors everywhere? >> >> >> > >> > Sure, but matvec and vecvec would not be supported by '@' except when >> vec was 1d because there is no way to distinguish a stack of vectors from a >> matrix or a stack of matrices. >> >> Yes. But @ can never be magic - either people will have to write >> something extra to flip these flags on their array objects, or they'll have >> to write something extra to describe which operation they want. @ was never >> intended to cover every case, just the simple-but-super-common ones that >> dot covers, plus a few more (simple broadcasting). We have np.add even >> though + exists too... >> > I don't expect stacked matrices/vectors to be used often, although there > are some areas that might make heavy use of them, so I think we could live > with the simple implementation, it's just a bit of a wart when there is > broadcasting of arrays. Just to be clear, the '@' broadcasting differs from > the dot broadcasting, agreed? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Aug 7 05:40:53 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 07 Aug 2014 11:40:53 +0200 Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__ In-Reply-To: References: Message-ID: <1407404453.6447.3.camel@sebastian-t440> On Mi, 2014-08-06 at 14:05 -0700, Chris Barker wrote: > On Wed, Aug 6, 2014 at 8:32 AM, Charles R Harris > wrote: > Should also mention that we don't have the ability to operate > on stacked vectors because they can't be identified by > dimension info. One workaround is to add dummy dimensions > where needed, another is to add two flags, row and col, and > set them appropriately. > > > I've thought for ages that if you want to naturally do linear algebra, > you need to capture the concept of a row and column vector as distinct > from each-other and from (1,n) and (n,1) shape arrays. So: > As a first thought I am against flags. We have dot, and vdot, which ideally would at some point do stacked matrix-matrix and stacked vector-vector (albeit vdot does complex conjugation). 
vector-matrix and
matrix-vector would require the user to use (1, n) or (n, 1) matrices.
If someone can convince me that this is a big deal, flags might be the
only option, though...

- Sebastian

>
> +1
>
> -Chris
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From Nicolas.Rougier at inria.fr Thu Aug 7 07:16:13 2014
From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier)
Date: Thu, 7 Aug 2014 13:16:13 +0200
Subject: [Numpy-discussion] Inverted indices
Message-ID: 

Hi,

I've a small problem for which I cannot find a solution and I'm quite sure
there is an obvious one:

I've an array Z (any dtype) with some data.
I've a (sorted) array I (of integer, same size as Z) that tells me the
index of Z[i] (if necessary, the index can be stored in Z).

Now, I have an arbitrary sequence S of indices (in the sense of I), how do
I build the corresponding data ?

Here is a small example:

Z = [(0,0), (1,1), (2,2), (3,3), (4,4)]
I = [0, 20, 23, 24, 37]

S = [ 20,20,0,24]
-> Result should be [(1,1), (1,1), (0,0),(3,3)]

S = [15,15]
-> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]

Any idea ?

Nicolas

From stefan at sun.ac.za Thu Aug 7 07:31:12 2014
From: stefan at sun.ac.za (=?UTF-8?Q?St=C3=A9fan_van_der_Walt?=)
Date: Thu, 7 Aug 2014 13:31:12 +0200
Subject: [Numpy-discussion] Inverted indices
In-Reply-To: 
References: 
Message-ID: 

Hi Nicolas

On Thu, Aug 7, 2014 at 1:16 PM, Nicolas P. Rougier wrote:
> Here is a small example:
>
> Z = [(0,0), (1,1), (2,2), (3,3), (4,4)]
> I = [0, 20, 23, 24, 37]
>
> S = [ 20,20,0,24]
> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>
> S = [15,15]
> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]

First try:

Z = np.array([(0,0), (1,1), (2,2), (3,3), (4,4)])
I = np.array([0, 20, 23, 24, 37])
S = np.array([ 20,20,0,24,15])

out = np.zeros((len(S), len(Z[0])))
mask = (S[:, np.newaxis] == I)
item, coord = np.where(mask)
out[item, :] = Z[coord]

Perhaps there's a neater way of doing it!

Stéfan

From Nicolas.Rougier at inria.fr Thu Aug 7 07:54:10 2014
From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier)
Date: Thu, 7 Aug 2014 13:54:10 +0200
Subject: [Numpy-discussion] Inverted indices
In-Reply-To: 
References: 
Message-ID: <3C707DA6-D3B1-41BB-BA37-27EB524BA9AE@inria.fr>

Nice ! Thanks Stéfan. I will add it to the numpy 100 problems.

Nicolas

On 07 Aug 2014, at 13:31, Stéfan van der Walt wrote:

> Hi Nicolas
>
> On Thu, Aug 7, 2014 at 1:16 PM, Nicolas P. Rougier
> wrote:
>> Here is a small example:
>>
>> Z = [(0,0), (1,1), (2,2), (3,3), (4,4)]
>> I = [0, 20, 23, 24, 37]
>>
>> S = [ 20,20,0,24]
>> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>>
>> S = [15,15]
>> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]
>
> First try:
>
> Z = np.array([(0,0), (1,1), (2,2), (3,3), (4,4)])
> I = np.array([0, 20, 23, 24, 37])
> S = np.array([ 20,20,0,24,15])
>
> out = np.zeros((len(S), len(Z[0])))
> mask = (S[:, np.newaxis] == I)
> item, coord = np.where(mask)
> out[item, :] = Z[coord]
>
> Perhaps there's a neater way of doing it!
>
> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From gregor.thalhammer at gmail.com Thu Aug 7 07:59:05 2014
From: gregor.thalhammer at gmail.com (Gregor Thalhammer)
Date: Thu, 7 Aug 2014 13:59:05 +0200
Subject: [Numpy-discussion] Inverted indices
In-Reply-To: 
References: 
Message-ID: <4928D13D-0208-49A8-AA05-A5D4BAFEE950@gmail.com>

Am 07.08.2014 um 13:16 schrieb Nicolas P. Rougier :

>
> Hi,
>
> I've a small problem for which I cannot find a solution and I'm quite sure there is an obvious one:
>
> I've an array Z (any dtype) with some data.
> I've a (sorted) array I (of integer, same size as Z) that tells me the index of Z[i] (if necessary, the index can be stored in Z).
>
> Now, I have an arbitrary sequence S of indices (in the sense of I), how do I build the corresponding data ?
>
> Here is a small example:
>
> Z = [(0,0), (1,1), (2,2), (3,3), (4,4)]
> I = [0, 20, 23, 24, 37]
>
> S = [ 20,20,0,24]
> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>
> S = [15,15]
> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]
>
>
> Any idea ?
>

If I is sorted, I would propose to use a bisection algorithm, faster than linear search:

Z = array([(0,0), (1,1), (2,2), (3,3), (4,4)])
I = array([0, 20, 23, 24, 37])
S = array([ 20,20,0,24,15,27])

a = zeros(S.shape, dtype=int)
b = a + I.shape[0]-1
for i in range(int(log2(I.shape[0]))+2):
    c = (a+b)>>1
    sel = I[c]<=S
    a[sel] = c[sel]
    b[~sel] = c[~sel]

Z[c]

If I[c] != S, then there is no corresponding index entry in I to match S.

Gregor

From gregor.thalhammer at gmail.com Thu Aug 7 08:04:38 2014
From: gregor.thalhammer at gmail.com (Gregor Thalhammer)
Date: Thu, 7 Aug 2014 14:04:38 +0200
Subject: [Numpy-discussion] Inverted indices
In-Reply-To: <4928D13D-0208-49A8-AA05-A5D4BAFEE950@gmail.com>
References: <4928D13D-0208-49A8-AA05-A5D4BAFEE950@gmail.com>
Message-ID: <9988E53C-EC30-44A7-8E0E-B83AA707CB83@gmail.com>

Am 07.08.2014 um 13:59 schrieb Gregor Thalhammer :

>
> Am 07.08.2014 um 13:16 schrieb Nicolas P. Rougier :
>
>>
>> Hi,
>>
>> I've a small problem for which I cannot find a solution and I'm quite sure there is an obvious one:
>>
>> I've an array Z (any dtype) with some data.
>> I've a (sorted) array I (of integer, same size as Z) that tells me the index of Z[i] (if necessary, the index can be stored in Z).
>>
>> Now, I have an arbitrary sequence S of indices (in the sense of I), how do I build the corresponding data ?
>>
>> Here is a small example:
>>
>> Z = [(0,0), (1,1), (2,2), (3,3), (4,4)]
>> I = [0, 20, 23, 24, 37]
>>
>> S = [ 20,20,0,24]
>> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>>
>> S = [15,15]
>> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]
>>
>>
>> Any idea ?
>>
>
> If I is sorted, I would propose to use a bisection algorithm, faster than linear search:
>
> Z = array([(0,0), (1,1), (2,2), (3,3), (4,4)])
> I = array([0, 20, 23, 24, 37])
> S = array([ 20,20,0,24,15,27])
>
> a = zeros(S.shape, dtype=int)
> b = a + I.shape[0]-1
> for i in range(int(log2(I.shape[0]))+2):
>     c = (a+b)>>1
>     sel = I[c]<=S
>     a[sel] = c[sel]
>     b[~sel] = c[~sel]

or even simpler:
c = searchsorted(I, S)
Z[c]

Gregor

>
> Z[c]
>
> If I[c] != S, then there is no corresponding index entry in I to match S.
>
> Gregor
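To also get the fallback Nicolas asked for (unmatched entries mapping to
Z[0]), a small sketch on top of searchsorted -- assuming I is sorted and
Z[0] is the designated fallback value:

import numpy as np

Z = np.array([(0,0), (1,1), (2,2), (3,3), (4,4)])
I = np.array([0, 20, 23, 24, 37])
S = np.array([20, 20, 0, 24, 15])

c = np.searchsorted(I, S)
c = np.clip(c, 0, len(I) - 1)   # guard insertion points past the end
c[I[c] != S] = 0                # anything not found falls back to Z[0]
print(Z[c])                     # [[1 1] [1 1] [0 0] [3 3] [0 0]]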
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Nicolas.Rougier at inria.fr Thu Aug 7 08:09:43 2014
From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier)
Date: Thu, 7 Aug 2014 14:09:43 +0200
Subject: [Numpy-discussion] Inverted indices
In-Reply-To: <9988E53C-EC30-44A7-8E0E-B83AA707CB83@gmail.com>
References: <4928D13D-0208-49A8-AA05-A5D4BAFEE950@gmail.com>
 <9988E53C-EC30-44A7-8E0E-B83AA707CB83@gmail.com>
Message-ID: <8B49A50B-BF7E-418D-A2D3-73E16A9FF21F@inria.fr>

Oh thanks, I would never have imagined a one-line solution...

Here are the benchmarks:

In [2]: %timeit stefan(S)
100000 loops, best of 3: 10.8 µs per loop

In [3]: %timeit gregor(S)
10000 loops, best of 3: 48.1 µs per loop

In [4]: %timeit gregor2(S)
100000 loops, best of 3: 3.23 µs per loop

Nicolas

On 07 Aug 2014, at 14:04, Gregor Thalhammer wrote:

>
> Am 07.08.2014 um 13:59 schrieb Gregor Thalhammer :
>
>>
>> Am 07.08.2014 um 13:16 schrieb Nicolas P. Rougier :
>>
>>>
>>> Hi,
>>>
>>> I've a small problem for which I cannot find a solution and I'm quite sure there is an obvious one:
>>>
>>> I've an array Z (any dtype) with some data.
>>> I've a (sorted) array I (of integer, same size as Z) that tells me the index of Z[i] (if necessary, the index can be stored in Z).
>>>
>>> Now, I have an arbitrary sequence S of indices (in the sense of I), how do I build the corresponding data ?
>>>
>>> Here is a small example:
>>>
>>> Z = [(0,0), (1,1), (2,2), (3,3), (4,4)]
>>> I = [0, 20, 23, 24, 37]
>>>
>>> S = [ 20,20,0,24]
>>> -> Result should be [(1,1), (1,1), (0,0),(3,3)]
>>>
>>> S = [15,15]
>>> -> Wrong (15 not in I) but ideally, I would like this to be converted to [(0,0), (0,0)]
>>>
>>>
>>> Any idea ?
>>>
>>
>> If I is sorted, I would propose to use a bisection algorithm, faster than linear search:
>>
>> Z = array([(0,0), (1,1), (2,2), (3,3), (4,4)])
>> I = array([0, 20, 23, 24, 37])
>> S = array([ 20,20,0,24,15,27])
>>
>> a = zeros(S.shape, dtype=int)
>> b = a + I.shape[0]-1
>> for i in range(int(log2(I.shape[0]))+2):
>>     c = (a+b)>>1
>>     sel = I[c]<=S
>>     a[sel] = c[sel]
>>     b[~sel] = c[~sel]
>
> or even simpler:
> c = searchsorted(I, S)
> Z[c]
>
> Gregor
>
>> Z[c]
>>
>> If I[c] != S, then there is no corresponding index entry in I to match S.
>>
>> Gregor
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From cjw at ncf.ca Thu Aug 7 08:17:20 2014
From: cjw at ncf.ca (cjw)
Date: Thu, 07 Aug 2014 08:17:20 -0400
Subject: [Numpy-discussion] Preliminary thoughts on implementing __matmul__
In-Reply-To: <1407404453.6447.3.camel@sebastian-t440>
References: <1407404453.6447.3.camel@sebastian-t440>
Message-ID: <53E36E50.6020105@ncf.ca>

On 07/08/2014 5:40 AM, Sebastian Berg wrote:
> On Mi, 2014-08-06 at 14:05 -0700, Chris Barker wrote:
>> On Wed, Aug 6, 2014 at 8:32 AM, Charles R Harris
>> wrote:
>>         Should also mention that we don't have the ability to operate
>>         on stacked vectors because they can't be identified by
>>         dimension info. One workaround is to add dummy dimensions
>>         where needed, another is to add two flags, row and col, and
>>         set them appropriately.
>>
>>
>> I've thought for ages that if you want to naturally do linear algebra,
>> you need to capture the concept of a row and column vector as distinct
>> from each other and from (1,n) and (n,1) shape arrays. So:
It's a pity that these ideas weren't incorporated into the Numarray
implementation. The treatment of the scalar was also questionable.
There's time to fix these things.
>> > As a first thought I am against flags. We have dot, and vdot, which > ideally would at some point do stacked matrix-matrix and stacked > vector-vector (albeit vdot does complex conjugation). vector-matrix and > matrix-vector would require the user to use (1, n) or (n, 1) matrices. > If someone can convince me that this is a big deal, flags might be the > only option, though... That's a basic question: Is dot a big deal? Unfortunately, this wasn't examined carefully enough last Spring. > > - Sebastian Colin W. > >> +1 >> >> >> -Chris >> >> >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Thu Aug 7 12:54:38 2014 From: nouiz at nouiz.org (=?UTF-8?B?RnLDqWTDqXJpYyBCYXN0aWVu?=) Date: Thu, 7 Aug 2014 12:54:38 -0400 Subject: [Numpy-discussion] ANN: NumPy 1.8.2 release candidate In-Reply-To: References: <53E1343E.7020805@googlemail.com> <53E13F72.8070501@uci.edu> <53E1452D.5090001@googlemail.com> Message-ID: All Theano tests work. thanks! Fred On Tue, Aug 5, 2014 at 8:46 PM, Matthew Brett wrote: > Hi, > > On Tue, Aug 5, 2014 at 2:27 PM, Matthew Brett > wrote: > > Hi, > > > > On Tue, Aug 5, 2014 at 1:57 PM, Julian Taylor > > wrote: > >> On 05.08.2014 22:32, Christoph Gohlke wrote: > >>> On 8/5/2014 12:45 PM, Julian Taylor wrote: > >>>> Hello, > >>>> > >>>> I am pleased to announce the first release candidate for numpy 1.8.2, > a > >>>> pure bugfix release for the 1.8.x series. > >>>> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ > >>>> > >>>> If no regressions show up the final release is planned this weekend. > >>>> The upgrade is recommended for all users of the 1.8.x series. > >>>> > >>>> Following issues have been fixed: > >>>> * gh-4836: partition produces wrong results for multiple selections in > >>>> equal ranges > >>>> * gh-4656: Make fftpack._raw_fft threadsafe > >>>> * gh-4628: incorrect argument order to _copyto in in np.nanmax, > np.nanmin > >>>> * gh-4613: Fix lack of NULL check in array_richcompare > >>>> * gh-4642: Hold GIL for converting dtypes types with fields > >>>> * gh-4733: fix np.linalg.svd(b, compute_uv=False) > >>>> * gh-4853: avoid unaligned simd load on reductions on i386 > >>>> * gh-4774: avoid unaligned access for strided byteswap > >>>> * gh-650: Prevent division by zero when creating arrays from some > buffers > >>>> * gh-4602: ifort has issues with optimization flag O2, use O1 > >>>> > >>>> Source tarballs, windows installers and release notes can be found at > >>>> https://sourceforge.net/projects/numpy/files/NumPy/1.8.2rc1/ > >>>> > >>>> Cheers, > >>>> Julian Taylor > >>>> > >>> > >>> Hello, > >>> > >>> thank you. Looks good. All builds and tests pass on Windows (using > >>> msvc/MKL). > >>> > >>> Any chance gh-4722 can make it into the release? > >>> Fix seg fault converting empty string to object > >>> > >>> > >> > >> thanks, I missed that one, pretty simple, I'll add it to the final > release. 
> > > > OSX wheels built and tested and uploaded OK : > > > > http://wheels.scikit-image.org > > > > https://travis-ci.org/matthew-brett/numpy-atlas-binaries/builds/31747958 > > OSX wheel tested OK against current scipy stack for system Python, > python.org Python, homebrew, macports: > > https://travis-ci.org/matthew-brett/scipy-stack-osx-testing/builds/31756325 > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 7 22:32:21 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Aug 2014 20:32:21 -0600 Subject: [Numpy-discussion] OpenBLAS and dotblas Message-ID: Hi All, It looks like numpy dot only uses BLAS if ATLAS is present, see numpy/core/setup.py. Has anyone done the mods needed to use OpenBLAS? What is the current status of using OpenBLAS with numpy? Also, I'm thinking of moving linalg/lapack_lite/blas_lite.c down into the core. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Aug 7 23:42:15 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Aug 2014 21:42:15 -0600 Subject: [Numpy-discussion] OpenBLAS and dotblas In-Reply-To: References: Message-ID: On Thu, Aug 7, 2014 at 8:32 PM, Charles R Harris wrote: > Hi All, > > It looks like numpy dot only uses BLAS if ATLAS is present, see > numpy/core/setup.py. Has anyone done the mods needed to use OpenBLAS? What > is the current status of using OpenBLAS with numpy? > NVM, the comments and "NO_ATLAS_INFO" are respectively wrong and deceptive. Ugh. > > Also, I'm thinking of moving linalg/lapack_lite/blas_lite.c down into the > core. Thoughts? > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kikocorreoso at gmail.com Fri Aug 8 03:31:59 2014 From: kikocorreoso at gmail.com (Kiko) Date: Fri, 8 Aug 2014 09:31:59 +0200 Subject: [Numpy-discussion] Calculation of a hessian Message-ID: Hi all, I am trying to calculate a Hessian. I am using numdifftools for this ( https://pypi.python.org/pypi/Numdifftools). My question is, is it possible to make it using pure numpy?. The actual code is like this: *import numdifftools as nd* *import numpy as np* *def log_likelihood(params):* * sum1 = 0; sum2 = 0* * mu = params[0]; sigma = params[1]; xi = params[2]* * for z in data:* * x = 1 + xi * ((z-mu)/sigma)* * sum1 += np.log(x)* * sum2 += x**(-1.0/xi)* * return -((-len(data) * np.log(sigma)) - (1 + 1/xi)*sum1 - sum2) # negated so we can use 'minimum'* *kk = nd.Hessian(log_likelihood)* Thanks in advance. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgomezdans at gmail.com Fri Aug 8 05:51:41 2014 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Fri, 8 Aug 2014 10:51:41 +0100 Subject: [Numpy-discussion] Calculation of a hessian In-Reply-To: References: Message-ID: Your function looks fairly simple to differentiate by hand, but if you have access to the gradient (or you estimate it numerically using scipy...), this function might do the job: def hessian ( x, the_func, epsilon=1e-8): """Numerical approximation to the Hessian Parameters ------------ x: array-like The evaluation point the_func: function The function. 
We assume that the function returns the function value and the associated gradient as the second return element epsilon: float The size of the step """ N = x.size h = np.zeros((N,N)) df_0 = the_func ( x )[1] for i in xrange(N): xx0 = 1.*x[i] x[i] = xx0 + epsilon df_1 = the_func ( x )[1] h[i,:] = (df_1 - df_0)/epsilon x[i] = xx0 return h Jose On 8 August 2014 08:31, Kiko wrote: > Hi all, > > I am trying to calculate a Hessian. I am using numdifftools for this ( > https://pypi.python.org/pypi/Numdifftools). > > My question is, is it possible to make it using pure numpy?. > > The actual code is like this: > > > *import numdifftools as nd* > *import numpy as np* > > *def log_likelihood(params):* > * sum1 = 0; sum2 = 0* > * mu = params[0]; sigma = params[1]; xi = params[2]* > * for z in data:* > * x = 1 + xi * ((z-mu)/sigma)* > * sum1 += np.log(x)* > * sum2 += x**(-1.0/xi)* > * return -((-len(data) * np.log(sigma)) - (1 + 1/xi)*sum1 - sum2) # > negated so we can use 'minimum'* > > *kk = nd.Hessian(log_likelihood)* > > Thanks in advance. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Fri Aug 8 10:37:48 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 8 Aug 2014 16:37:48 +0200 Subject: [Numpy-discussion] Calculation of a hessian In-Reply-To: References: Message-ID: Do it in pure numpy? How about copying the source of numdifftools? What exactly is the obstacle to using numdifftools? There seem to be no licensing issues. In my experience, its a crafty piece of work; and calculating a hessian correctly, accounting for all kinds of nasty floating point issues, is no walk in the park. Even if an analytical derivative isn't too big a pain in the ass to implement, there is a good chance that what numdifftools does is more numerically stable (though in all likelihood much slower). The only good reason for a specialized solution I can think of is speed; but be aware what you are trading it in for. If speed is your major concern though, you really cant go wrong with Theano. http://deeplearning.net/software/theano/library/gradient.html#theano.gradient.hessian -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Fri Aug 8 18:31:09 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 8 Aug 2014 22:31:09 +0000 (UTC) Subject: [Numpy-discussion] OpenBLAS and dotblas References: Message-ID: <1675599749429229671.817278sturla.molden-gmail.com@news.gmane.org> Charles R Harris wrote: > It looks like numpy dot only uses BLAS if ATLAS is present, see > numpy/core/setup.py. Has anyone done the mods needed to use OpenBLAS? What > is the current status of using OpenBLAS with numpy? I thought it also uses BLAS if MKL or Accerate Framework is present, but I am not sure about OpenBLAS, ACML or Cray libsci. Sturla From matthew.brett at gmail.com Fri Aug 8 21:41:21 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 8 Aug 2014 18:41:21 -0700 Subject: [Numpy-discussion] Help - numpy / scipy binary compatibility Message-ID: Hi, I would be very happy of some help trying to work out a numpy package binary incompatibility. 
I'm trying to work out what's happening for this ticket: https://github.com/scipy/scipy/issues/3863 which I summarized at the end: https://github.com/scipy/scipy/issues/3863#issuecomment-51669861 but basically, we're getting these errors: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility I now realize I am lost in the world of numpy / scipy etc binary compatibility, I'd really like some advice. In this case numpy == 1.8.1 scipy == 0.14.0 - compiled against numpy 1.5.1 scikit-learn == 0.15.1 compiled against numpy 1.6.0 Can y'all see any potential problem with those dependencies in binary builds? The relevant scipy Cython c files seem to guard against raising this error by doing not-strict checks of the e.g. numpy dtype, so I am confused how these errors come about. Can anyone give any pointers? Cheers, Matthew From cournape at gmail.com Fri Aug 8 23:04:57 2014 From: cournape at gmail.com (David Cournapeau) Date: Sat, 9 Aug 2014 12:04:57 +0900 Subject: [Numpy-discussion] Help - numpy / scipy binary compatibility In-Reply-To: References: Message-ID: On Sat, Aug 9, 2014 at 10:41 AM, Matthew Brett wrote: > Hi, > > I would be very happy of some help trying to work out a numpy package > binary incompatibility. > > I'm trying to work out what's happening for this ticket: > > https://github.com/scipy/scipy/issues/3863 > > which I summarized at the end: > > https://github.com/scipy/scipy/issues/3863#issuecomment-51669861 > > but basically, we're getting these errors: > > RuntimeWarning: numpy.dtype size changed, may indicate binary > incompatibility > > I now realize I am lost in the world of numpy / scipy etc binary > compatibility, I'd really like some advice. In this case > > numpy == 1.8.1 > scipy == 0.14.0 - compiled against numpy 1.5.1 > scikit-learn == 0.15.1 compiled against numpy 1.6.0 > > Can y'all see any potential problem with those dependencies in binary > builds? > > The relevant scipy Cython c files seem to guard against raising this > error by doing not-strict checks of the e.g. numpy dtype, so I am > confused how these errors come about. Can anyone give any pointers? > Assuming the message is not bogus, I would try import von_mises with a venv containing numpy 1.5.1, then 1.6.0, etc... to detect when the change happened. David > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Sat Aug 9 08:15:16 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sat, 9 Aug 2014 14:15:16 +0200 Subject: [Numpy-discussion] OpenBLAS and dotblas In-Reply-To: <1675599749429229671.817278sturla.molden-gmail.com@news.gmane.org> References: <1675599749429229671.817278sturla.molden-gmail.com@news.gmane.org> Message-ID: numpy dot is using BLAS with OpenBLAS. Tested on Linux and Windows (see https://bitbucket.org/carlkl/mingw-w64-for-python/downloads) Regards Carl 2014-08-09 0:31 GMT+02:00 Sturla Molden : > Charles R Harris wrote: > > > It looks like numpy dot only uses BLAS if ATLAS is present, see > > numpy/core/setup.py. Has anyone done the mods needed to use OpenBLAS? > What > > is the current status of using OpenBLAS with numpy? > > I thought it also uses BLAS if MKL or Accerate Framework is present, but I > am not sure about OpenBLAS, ACML or Cray libsci. 
> > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Sat Aug 9 08:38:02 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 09 Aug 2014 14:38:02 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.8.2 bugfix release Message-ID: <53E6162A.8050809@googlemail.com> Hello, I am pleased to announce the release of NumPy 1.8.2, a pure bugfix release for the 1.8.x series. https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ The upgrade is recommended for all users of the 1.8.x series. Following issues have been fixed: * gh-4836: partition produces wrong results for multiple selections in equal ranges * gh-4656: Make fftpack._raw_fft threadsafe * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin * gh-4642: Hold GIL for converting dtypes types with fields * gh-4733: fix np.linalg.svd(b, compute_uv=False) * gh-4853: avoid unaligned simd load on reductions on i386 * gh-4722: Fix seg fault converting empty string to object * gh-4613: Fix lack of NULL check in array_richcompare * gh-4774: avoid unaligned access for strided byteswap * gh-650: Prevent division by zero when creating arrays from some buffers * gh-4602: ifort has issues with optimization flag O2, use O1 The source distributions have been uploaded to PyPI. The Windows installers, documentation and release notes can be found at: https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From charlesr.harris at gmail.com Sat Aug 9 10:28:54 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Aug 2014 08:28:54 -0600 Subject: [Numpy-discussion] OpenBLAS and dotblas In-Reply-To: References: <1675599749429229671.817278sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sat, Aug 9, 2014 at 6:15 AM, Carl Kleffner wrote: > numpy dot is using BLAS with OpenBLAS. Tested on Linux and Windows (see > https://bitbucket.org/carlkl/mingw-w64-for-python/downloads) > > Regards > > Carl > > > 2014-08-09 0:31 GMT+02:00 Sturla Molden : > > Charles R Harris wrote: >> >> > It looks like numpy dot only uses BLAS if ATLAS is present, see >> > numpy/core/setup.py. Has anyone done the mods needed to use OpenBLAS? >> What >> > is the current status of using OpenBLAS with numpy? >> >> I thought it also uses BLAS if MKL or Accerate Framework is present, but I >> am not sure about OpenBLAS, ACML or Cray libsci. >> >> Yeah, I figured that out, there is a comment in dotblas that says not, but checking how things are configured, it turns out they should be good. The original problem seems to have been that dotblas requires cblas and can't work with fortran blas. OTOH, linalg uses the f2c interface and, I presume, can link to fortran libraries. It would be nice to unify the two at some point. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Sat Aug 9 15:35:34 2014 From: matti.picus at gmail.com (Matti Picus) Date: Sat, 09 Aug 2014 22:35:34 +0300 Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas In-Reply-To: References: Message-ID: <53E67806.70602@gmail.com> Hi. 
I am working on numpy in pypy. It would be much more challenging for me if you merged more code into the core of numpy, that means even more must be duplicated when using a different underlying ndarray implementation. If you are thinking of touching linalg/lapack_lite/blas_lite, I would prefer a simpler model that uses ndarray as a simple storage container, does not use cpython-specifc capi extentions with reference counting, and interfaces with python via cffi[0]. Matti [0] https://pypi.python.org/pypi/cffi/ On 8/08/2014 8:00 PM, numpy-discussion-request at scipy.org wrote: > Message: 1 > Date: Thu, 7 Aug 2014 20:32:21 -0600 > From: Charles R Harris > Subject: [Numpy-discussion] OpenBLAS and dotblas > To: numpy-discussion > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Hi All, > > It looks like numpy dot only uses BLAS if ATLAS is present, see > numpy/core/setup.py. Has anyone done the mods needed to use OpenBLAS? What > is the current status of using OpenBLAS with numpy? > > Also, I'm thinking of moving linalg/lapack_lite/blas_lite.c down into the > core. Thoughts? > > Chuck > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20140807/848ccce2/attachment-0001.html > > From njs at pobox.com Sat Aug 9 16:11:19 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 9 Aug 2014 21:11:19 +0100 Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas In-Reply-To: <53E67806.70602@gmail.com> References: <53E67806.70602@gmail.com> Message-ID: On Sat, Aug 9, 2014 at 8:35 PM, Matti Picus wrote: > Hi. I am working on numpy in pypy. It would be much more challenging for > me if you merged more code into the core of numpy, Hi Matti, I can definitely see how numpy changes cause trouble for you, and sympathize. But, can you elaborate on what kind of changes would make your life easier *that also* help make numpy proper better in their own right? Because unfortunately, I don't see how we can reasonably pass up on improvements to numpy if the only justification is to make numpypy's life easier. (I'd also love to see pypy become usable for general numerical work, but not only is it not there now, I don't see how numpypy will ultimately get us there even if we do help it along -- almost none of the ecosystem can get by numpy's python-level APIs alone.) But obviously if there are changes that are mutually beneficial, well then, that's a lot easier to justify :-) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Sat Aug 9 16:11:29 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Aug 2014 14:11:29 -0600 Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas In-Reply-To: <53E67806.70602@gmail.com> References: <53E67806.70602@gmail.com> Message-ID: On Sat, Aug 9, 2014 at 1:35 PM, Matti Picus wrote: > Hi. I am working on numpy in pypy. It would be much more challenging for > me if you merged more code into the core of numpy, that means even more > must be duplicated when using a different underlying ndarray > implementation. > If you are thinking of touching linalg/lapack_lite/blas_lite, I would > prefer a simpler model that uses ndarray as a simple storage container, > does not use cpython-specifc capi extentions with reference counting, > and interfaces with python via cffi[0]. > Matti > > [0] https://pypi.python.org/pypi/cffi/ > Could you be more specific? 
Numpy is pretty tightly bound to the cpython interface, changing that would require a fundamental rethink/redesign/redo Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sat Aug 9 17:48:20 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 09 Aug 2014 23:48:20 +0200 Subject: [Numpy-discussion] OpenBLAS and dotblas In-Reply-To: References: <1675599749429229671.817278sturla.molden-gmail.com@news.gmane.org> Message-ID: On 09/08/14 16:28, Charles R Harris wrote: > Yeah, I figured that out, there is a comment in dotblas that says not, > but checking how things are configured, it turns out they should be > good. The original problem seems to have been that dotblas requires > cblas and can't work with fortran blas. OTOH, linalg uses the f2c > interface and, I presume, can link to fortran libraries. It would be > nice to unify the two at some point. Right. At least ACML does not implement cblas, so I guess this one is not used then. Does anyone have a comprehensive list of which BLAS libraries are actually used for _dotblas? What is used for numpy.linalg I also don't know. Ideally we should use BLAS and LAPACK if it is present. (I think it does, but I am not sure.) Then there is the issue of C vs. Fortran order input: _dotblas can handle this correctly, but I don't think numpy.linalg is fast for C ordered arrays. But it is often possible to avoid this algorithmically. E.g. call LQ instead of QR if numpy.linalg.qr is invoked with C arrays. (There is little hope of optimizing scipy.linalg this way because of f2py, but numpy.linalg is easier to fix.) And then there is the GIL. Should numpy.linalg release it? Why or why not? Usually we can assume BLAS and LAPACK to be thread-safe, but it depends on the Fortran compiler. (I am not sure about f2c'd lapack_lite and blas_lite, though. f2c is famous for generating code that is not re-entrant.) _dotblas just assumes the BLAS is reentrant, even though it might not be. But we could also just specify that NumPy requires a re-entrant BLAS and LAPACK, and those that don't have one are self to blame for any trouble caused by multithreading. As for Fortran BLAS: Accelerate does not officially support Fortran BLAS or Fortran LAPACK, but there are wrappers for it in SciPy. If I remember correctly it has CLAPACK (not LAPACKE) with f2c ABI. It seems there are quite few things that need fixing or improvement... Sturla From matthew.brett at gmail.com Sat Aug 9 20:23:54 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 9 Aug 2014 17:23:54 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.8.2 bugfix release In-Reply-To: <53E6162A.8050809@googlemail.com> References: <53E6162A.8050809@googlemail.com> Message-ID: On Sat, Aug 9, 2014 at 5:38 AM, Julian Taylor wrote: > Hello, > > I am pleased to announce the release of NumPy 1.8.2, a > pure bugfix release for the 1.8.x series. > https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ > The upgrade is recommended for all users of the 1.8.x series. 
> > Following issues have been fixed: > * gh-4836: partition produces wrong results for multiple selections in > equal ranges > * gh-4656: Make fftpack._raw_fft threadsafe > * gh-4628: incorrect argument order to _copyto in in np.nanmax, np.nanmin > * gh-4642: Hold GIL for converting dtypes types with fields > * gh-4733: fix np.linalg.svd(b, compute_uv=False) > * gh-4853: avoid unaligned simd load on reductions on i386 > * gh-4722: Fix seg fault converting empty string to object > * gh-4613: Fix lack of NULL check in array_richcompare > * gh-4774: avoid unaligned access for strided byteswap > * gh-650: Prevent division by zero when creating arrays from some buffers > * gh-4602: ifort has issues with optimization flag O2, use O1 > > > The source distributions have been uploaded to PyPI. The Windows > installers, documentation and release notes can be found at: > https://sourceforge.net/projects/numpy/files/NumPy/1.8.2/ OSX wheels now also up on pypi, please let us know of any problems, Cheers, Matthew From kikocorreoso at gmail.com Mon Aug 11 02:02:48 2014 From: kikocorreoso at gmail.com (Kiko) Date: Mon, 11 Aug 2014 08:02:48 +0200 Subject: [Numpy-discussion] Calculation of a hessian In-Reply-To: References: Message-ID: 2014-08-08 11:51 GMT+02:00 Jose Gomez-Dans : > Your function looks fairly simple to differentiate by hand, but if you > have access to the gradient (or you estimate it numerically using > scipy...), this function might do the job: > > def hessian ( x, the_func, epsilon=1e-8): > """Numerical approximation to the Hessian > Parameters > ------------ > x: array-like > The evaluation point > the_func: function > The function. We assume that the function returns the function > value and > the associated gradient as the second return element > epsilon: float > The size of the step > """ > > N = x.size > h = np.zeros((N,N)) > df_0 = the_func ( x )[1] > for i in xrange(N): > xx0 = 1.*x[i] > x[i] = xx0 + epsilon > df_1 = the_func ( x )[1] > h[i,:] = (df_1 - df_0)/epsilon > x[i] = xx0 > return h > > Jose > > Hi Jos?, Thanks for the answer. My idea would be to generalise the calculation of the Hessian, not just to differentiate the example I posted and I was wondering if Numpy/Scipy already had something similar to that provided by NumDiffTools. Thanks again. > > On 8 August 2014 08:31, Kiko wrote: > >> Hi all, >> >> I am trying to calculate a Hessian. I am using numdifftools for this ( >> https://pypi.python.org/pypi/Numdifftools). >> >> My question is, is it possible to make it using pure numpy?. >> >> The actual code is like this: >> >> >> *import numdifftools as nd* >> *import numpy as np* >> >> *def log_likelihood(params):* >> * sum1 = 0; sum2 = 0* >> * mu = params[0]; sigma = params[1]; xi = params[2]* >> * for z in data:* >> * x = 1 + xi * ((z-mu)/sigma)* >> * sum1 += np.log(x)* >> * sum2 += x**(-1.0/xi)* >> * return -((-len(data) * np.log(sigma)) - (1 + 1/xi)*sum1 - sum2) # >> negated so we can use 'minimum'* >> >> *kk = nd.Hessian(log_likelihood)* >> >> Thanks in advance. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kikocorreoso at gmail.com Mon Aug 11 02:05:14 2014 From: kikocorreoso at gmail.com (Kiko) Date: Mon, 11 Aug 2014 08:05:14 +0200 Subject: [Numpy-discussion] Calculation of a hessian In-Reply-To: References: Message-ID: 2014-08-08 16:37 GMT+02:00 Eelco Hoogendoorn : > Do it in pure numpy? How about copying the source of numdifftools? > Of course it is a solution. I was just wondering if it exist something similar in the numpy/scipy packages so I do not have to use a new third party library to do that. > What exactly is the obstacle to using numdifftools? There seem to be no > licensing issues. In my experience, its a crafty piece of work; and > calculating a hessian correctly, accounting for all kinds of nasty floating > point issues, is no walk in the park. Even if an analytical derivative > isn't too big a pain in the ass to implement, there is a good chance that > what numdifftools does is more numerically stable (though in all likelihood > much slower). > > The only good reason for a specialized solution I can think of is speed; > but be aware what you are trading it in for. If speed is your major concern > though, you really cant go wrong with Theano. > > > http://deeplearning.net/software/theano/library/gradient.html#theano.gradient.hessian > > Thanks, it seems that NumDiffTools is the way to go. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Aug 11 02:39:12 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 11 Aug 2014 08:39:12 +0200 Subject: [Numpy-discussion] Help - numpy / scipy binary compatibility In-Reply-To: References: Message-ID: On Sat, Aug 9, 2014 at 5:04 AM, David Cournapeau wrote: > > > > On Sat, Aug 9, 2014 at 10:41 AM, Matthew Brett > wrote: > >> Hi, >> >> I would be very happy of some help trying to work out a numpy package >> binary incompatibility. >> >> I'm trying to work out what's happening for this ticket: >> >> https://github.com/scipy/scipy/issues/3863 >> >> which I summarized at the end: >> >> https://github.com/scipy/scipy/issues/3863#issuecomment-51669861 >> >> but basically, we're getting these errors: >> >> RuntimeWarning: numpy.dtype size changed, may indicate binary >> incompatibility >> >> I now realize I am lost in the world of numpy / scipy etc binary >> compatibility, I'd really like some advice. In this case >> >> numpy == 1.8.1 >> scipy == 0.14.0 - compiled against numpy 1.5.1 >> scikit-learn == 0.15.1 compiled against numpy 1.6.0 >> >> Can y'all see any potential problem with those dependencies in binary >> builds? >> >> The relevant scipy Cython c files seem to guard against raising this >> error by doing not-strict checks of the e.g. numpy dtype, so I am >> confused how these errors come about. Can anyone give any pointers? >> > > Assuming the message is not bogus, I would try import von_mises with a > venv containing numpy 1.5.1, then 1.6.0, etc... to detect when the change > happened. > That should be a recent change in 1.8.1, either because the dtype size did actually change or because the silencing of this message in NoseTester (numpy.testing) is not effective anymore. Note that the warning is too agressive, because it triggers both on ABI breaks and on backwards-compatible extensions to dtype. That's why it's filtered in numpy.testing. 
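For completeness, a bare-bones pure-numpy Hessian built on the standard four-point central-difference stencil, needing only the objective function itself (no gradient). Treat it as a sketch under a fixed step size: it has none of the adaptive step-size selection and error control that make numdifftools robust, which is exactly the trade-off discussed elsewhere in this thread:

import numpy as np

def hessian_fd(f, x, eps=1e-5):
    # Central-difference Hessian of a scalar function f at point x.
    # Minimal sketch: fixed step, no error control, O(N**2) evaluations.
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            # d2f/dxi dxj ~ 4-point central-difference stencil
            h[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * eps**2)
    return h

Symmetrizing the result afterwards with (h + h.T) / 2 can knock down some of the floating-point noise.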
This was reported to Cython a while ago, not sure they've fixed the issue in the meantime or not. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Aug 11 02:41:56 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 11 Aug 2014 08:41:56 +0200 Subject: [Numpy-discussion] Help - numpy / scipy binary compatibility In-Reply-To: References: Message-ID: On Mon, Aug 11, 2014 at 8:39 AM, Ralf Gommers wrote: > > > > On Sat, Aug 9, 2014 at 5:04 AM, David Cournapeau > wrote: > >> >> >> >> On Sat, Aug 9, 2014 at 10:41 AM, Matthew Brett >> wrote: >> >>> Hi, >>> >>> I would be very happy of some help trying to work out a numpy package >>> binary incompatibility. >>> >>> I'm trying to work out what's happening for this ticket: >>> >>> https://github.com/scipy/scipy/issues/3863 >>> >>> which I summarized at the end: >>> >>> https://github.com/scipy/scipy/issues/3863#issuecomment-51669861 >>> >>> but basically, we're getting these errors: >>> >>> RuntimeWarning: numpy.dtype size changed, may indicate binary >>> incompatibility >>> >>> I now realize I am lost in the world of numpy / scipy etc binary >>> compatibility, I'd really like some advice. In this case >>> >>> numpy == 1.8.1 >>> scipy == 0.14.0 - compiled against numpy 1.5.1 >>> scikit-learn == 0.15.1 compiled against numpy 1.6.0 >>> >>> Can y'all see any potential problem with those dependencies in binary >>> builds? >>> >>> The relevant scipy Cython c files seem to guard against raising this >>> error by doing not-strict checks of the e.g. numpy dtype, so I am >>> confused how these errors come about. Can anyone give any pointers? >>> >> >> Assuming the message is not bogus, I would try import von_mises with a >> venv containing numpy 1.5.1, then 1.6.0, etc... to detect when the change >> happened. >> > > That should be a recent change in 1.8.1, either because the dtype size did > actually change or because the silencing of this message in NoseTester > (numpy.testing) is not effective anymore. > > Note that the warning is too agressive, because it triggers both on ABI > breaks and on backwards-compatible extensions to dtype. That's why it's > filtered in numpy.testing. This was reported to Cython a while ago, not > sure they've fixed the issue in the meantime or not. > Never mind, saw that you already figured out that it's due to scikit-learn: https://github.com/scipy/scipy/issues/3863 Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Aug 11 02:53:19 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 10 Aug 2014 23:53:19 -0700 Subject: [Numpy-discussion] Help - numpy / scipy binary compatibility In-Reply-To: References: Message-ID: Hi, On Sun, Aug 10, 2014 at 11:41 PM, Ralf Gommers wrote: > > > > On Mon, Aug 11, 2014 at 8:39 AM, Ralf Gommers > wrote: >> >> >> >> >> On Sat, Aug 9, 2014 at 5:04 AM, David Cournapeau >> wrote: >>> >>> >>> >>> >>> On Sat, Aug 9, 2014 at 10:41 AM, Matthew Brett >>> wrote: >>>> >>>> Hi, >>>> >>>> I would be very happy of some help trying to work out a numpy package >>>> binary incompatibility. 
>>>> >>>> I'm trying to work out what's happening for this ticket: >>>> >>>> https://github.com/scipy/scipy/issues/3863 >>>> >>>> which I summarized at the end: >>>> >>>> https://github.com/scipy/scipy/issues/3863#issuecomment-51669861 >>>> >>>> but basically, we're getting these errors: >>>> >>>> RuntimeWarning: numpy.dtype size changed, may indicate binary >>>> incompatibility >>>> >>>> I now realize I am lost in the world of numpy / scipy etc binary >>>> compatibility, I'd really like some advice. In this case >>>> >>>> numpy == 1.8.1 >>>> scipy == 0.14.0 - compiled against numpy 1.5.1 >>>> scikit-learn == 0.15.1 compiled against numpy 1.6.0 >>>> >>>> Can y'all see any potential problem with those dependencies in binary >>>> builds? >>>> >>>> The relevant scipy Cython c files seem to guard against raising this >>>> error by doing not-strict checks of the e.g. numpy dtype, so I am >>>> confused how these errors come about. Can anyone give any pointers? >>> >>> >>> Assuming the message is not bogus, I would try import von_mises with a >>> venv containing numpy 1.5.1, then 1.6.0, etc... to detect when the change >>> happened. >> >> >> That should be a recent change in 1.8.1, either because the dtype size did >> actually change or because the silencing of this message in NoseTester >> (numpy.testing) is not effective anymore. >> >> Note that the warning is too agressive, because it triggers both on ABI >> breaks and on backwards-compatible extensions to dtype. That's why it's >> filtered in numpy.testing. This was reported to Cython a while ago, not sure >> they've fixed the issue in the meantime or not. > > > Never mind, saw that you already figured out that it's due to scikit-learn: > https://github.com/scipy/scipy/issues/3863 Yes, sorry, I should have reported back to the list - the problem was that sklearn is removing all the warnings filters that numpy installs. For reference, no, Cython left the warnings are they were. As you implied, Cython raises these warnings (if not filtered by numpy) if the various numpy C structs have increased size since compile time, including numpy dtype ufuncs and ndarrays. The structs get bigger when we add to the entries in the struct; this is backwards compatible but not forwards compatible. Because I got very confused I wrote this little piece of code to show current compiled and in-memory sizes of dtypes and ufuncs: https://github.com/matthew-brett/npsizes Giving (result of ./npreport): Numpy version: 1.5.1 dtype: static size 80; memory size 80 ndarray: static size 80; memory size 80 ufunc: static size 144; memory size 144 Numpy version: 1.6.0 dtype: static size 80; memory size 80 ndarray: static size 80; memory size 80 ufunc: static size 144; memory size 144 Numpy version: 1.7.1 dtype: static size 88; memory size 88 ndarray: static size 80; memory size 80 ufunc: static size 176; memory size 176 Numpy version: 1.8.1 dtype: static size 88; memory size 88 ndarray: static size 80; memory size 80 ufunc: static size 192; memory size 192 on OSX.... 
Cheers, Matthew From jtaylor.debian at googlemail.com Mon Aug 11 03:30:30 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Mon, 11 Aug 2014 09:30:30 +0200 Subject: [Numpy-discussion] Help - numpy / scipy binary compatibility In-Reply-To: References: Message-ID: <53E87116.1040107@googlemail.com> On 11.08.2014 08:53, Matthew Brett wrote: > Hi, > > On Sun, Aug 10, 2014 at 11:41 PM, Ralf Gommers wrote: >> >> >> >> On Mon, Aug 11, 2014 at 8:39 AM, Ralf Gommers >> wrote: >>> >>> >>> >>> >>> On Sat, Aug 9, 2014 at 5:04 AM, David Cournapeau >>> wrote: >>>> >>>> >>>> >>>> >>>> On Sat, Aug 9, 2014 at 10:41 AM, Matthew Brett >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> I would be very happy of some help trying to work out a numpy package >>>>> binary incompatibility. >>>>> >>>>> I'm trying to work out what's happening for this ticket: >>>>> >>>>> https://github.com/scipy/scipy/issues/3863 >>>>> >>>>> which I summarized at the end: >>>>> >>>>> https://github.com/scipy/scipy/issues/3863#issuecomment-51669861 >>>>> >>>>> but basically, we're getting these errors: >>>>> >>>>> RuntimeWarning: numpy.dtype size changed, may indicate binary >>>>> incompatibility >>>>> >>>>> I now realize I am lost in the world of numpy / scipy etc binary >>>>> compatibility, I'd really like some advice. In this case >>>>> >>>>> numpy == 1.8.1 >>>>> scipy == 0.14.0 - compiled against numpy 1.5.1 >>>>> scikit-learn == 0.15.1 compiled against numpy 1.6.0 >>>>> >>>>> Can y'all see any potential problem with those dependencies in binary >>>>> builds? >>>>> >>>>> The relevant scipy Cython c files seem to guard against raising this >>>>> error by doing not-strict checks of the e.g. numpy dtype, so I am >>>>> confused how these errors come about. Can anyone give any pointers? >>>> >>>> >>>> Assuming the message is not bogus, I would try import von_mises with a >>>> venv containing numpy 1.5.1, then 1.6.0, etc... to detect when the change >>>> happened. >>> >>> >>> That should be a recent change in 1.8.1, either because the dtype size did >>> actually change or because the silencing of this message in NoseTester >>> (numpy.testing) is not effective anymore. >>> >>> Note that the warning is too agressive, because it triggers both on ABI >>> breaks and on backwards-compatible extensions to dtype. That's why it's >>> filtered in numpy.testing. This was reported to Cython a while ago, not sure >>> they've fixed the issue in the meantime or not. >> >> >> Never mind, saw that you already figured out that it's due to scikit-learn: >> https://github.com/scipy/scipy/issues/3863 > > Yes, sorry, I should have reported back to the list - the problem was > that sklearn is removing all the warnings filters that numpy installs. > > For reference, no, Cython left the warnings are they were. > > As you implied, Cython raises these warnings (if not filtered by > numpy) if the various numpy C structs have increased size since > compile time, including numpy dtype ufuncs and ndarrays. The structs > get bigger when we add to the entries in the struct; this is backwards > compatible but not forwards compatible. should we deprecate use of the ufunc and dtype structures? Or are the internals of them used too much outside of numpy? I am thinking about changing the ufunc size yet again for 1.10, and it already has far too many members third parties should probably have even seen. 
From kwmsmith at gmail.com Mon Aug 11 17:09:58 2014 From: kwmsmith at gmail.com (Kurt Smith) Date: Mon, 11 Aug 2014 16:09:58 -0500 Subject: [Numpy-discussion] ANN: DistArray 0.5 release Message-ID: =============================================== DistArray 0.5 release =============================================== **Mailing list:** distarray at googlegroups.com **Documentation:** http://distarray.readthedocs.org **License:** Three-clause BSD **Python versions:** 2.7, 3.3, and 3.4 **OS support:** \*nix and Mac OS X What is DistArray? ------------------ DistArray aims to bring the ease-of-use of NumPy to data-parallel high-performance computing. It provides distributed multi-dimensional NumPy arrays, distributed ufuncs, and distributed IO capabilities. It can efficiently interoperate with external distributed libraries like Trilinos. DistArray works with NumPy and builds on top of it in a flexible and natural way. 0.5 Release ----------- Noteworthy improvements in this release include: * closer alignment with NumPy's API, * support for Python 3.4 (existing support for Python 2.7 and 3.3), * a performance-oriented MPI-only mode for deployment on clusters and supercomputers, * a way to register user-defined functions to be callable locally on worker processes, * more consistent naming of sub-packages, * testing with MPICH2 (already tested against OpenMPI), * improved and expanded examples, * installed version testable via ``distarray.test()``, and * performance and scaling improvements. With this release, DistArray ready for real-world testing and deployment. The project is still evolving rapidly and we appreciate the continued input from the larger scientific-Python community. Existing features ----------------- DistArray: * supports NumPy-like slicing, reductions, and ufuncs on distributed multidimensional arrays; * has a client-engine process design -- data resides on the worker processes, commands are initiated from master; * allows full control over what is executed on the worker processes and integrates transparently with the master process; * allows direct communication between workers, bypassing the master process for scalability; * integrates with IPython.parallel for interactive creation and exploration of distributed data; * supports distributed ufuncs (currently without broadcasting); * builds on and leverages MPI via MPI4Py in a transparent and user-friendly way; * has basic support for unstructured arrays; * supports user-controllable array distributions across workers (block, cyclic, block-cyclic, and unstructured) on a per-axis basis; * has a straightforward API to control how an array is distributed; * has basic plotting support for visualization of array distributions; * separates the array?s distribution from the array?s data -- useful for slicing, reductions, redistribution, broadcasting, and other operations; * implements distributed random arrays; * supports ``.npy``-like flat-file IO and hdf5 parallel IO (via ``h5py``); leverages MPI-based IO parallelism in an easy-to-use and transparent way; and * supports the distributed array protocol [protocol]_, which allows independently developed parallel libraries to share distributed arrays without copying, analogous to the PEP-3118 new buffer protocol. 
Planned features and roadmap ---------------------------- Near-term features and improvements include: * array re-distribution capabilities; * lazy evaluation and deferred computation for latency hiding; * interoperation with Trilinos [Trilinos]_; and * distributed broadcasting support. The longer-term roadmap includes: * Integration with other packages [petsc]_ that subscribe to the distributed array protocol [protocol]_; * Distributed fancy indexing; * Out-of-core computations; * Support for distributed sorting and other non-trivial distributed algorithms; and * End-user control over communication and temporary array creation, and other performance aspects of distributed computations. History and funding ------------------- Brian Granger started DistArray as a NASA-funded SBIR project in 2008. Enthought picked it up as part of a DOE Phase II SBIR [SBIR]_ to provide a generally useful distributed array package. It builds on NumPy, MPI, MPI4Py, IPython, IPython.parallel, and interfaces with the Trilinos suite of distributed HPC solvers (via PyTrilinos [Trilinos]_). This material is based upon work supported by the Department of Energy under Award Number DE-SC0007699. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. .. [protocol] http://distributed-array-protocol.readthedocs.org/en/rel-0.10.0/ .. [Trilinos] http://trilinos.org/ .. [petsc] http://www.mcs.anl.gov/petsc/ .. [SBIR] http://www.sbir.gov/sbirsearch/detail/410257 -------------- next part -------------- An HTML attachment was scrubbed... URL: From matti.picus at gmail.com Mon Aug 11 17:46:19 2014 From: matti.picus at gmail.com (Matti Picus) Date: Tue, 12 Aug 2014 00:46:19 +0300 Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas In-Reply-To: References: Message-ID: <53E939AB.4010305@gmail.com> Hi Nathaniel. Thanks for your prompt reply. I think numpy is a wonderful project, and you all do a great job moving it forward. If you ask what would my vision for maturing numpy, I would like to see a grouping of linalg matrix-operation functionality into a python level package, exactly the opposite of more tightly tying linalg into the core of numpy. The orthagonality would allow goups like PyOpenCL to reuse the matrix operations on data located off the CPU's RAM, just to give one example; and make it easier for non-numpy developers to create a complete replacement of lapack with other implementations. Much of the linalg package would of course be implemented in c or fortran, but the interface to ndarray would use the well-established idea of contiguous matrices with shapes, strides, and a single memory store, supporting only numeric number types. 
I suggested cffi since it provides a convienent and efficient interface to ndarray. Thus python could remain as a thin wrapper over the calls out to c-based libraries much like lapack_lite does today, but at the python level rather that the capi level. Yes, a python-based interface would slows the code down a bit, but I would argue that 1. the current state of lapack_litemodule.c and umath_linalg.c.src, with its myriad of compile-time macros and complex code paths, scares people away from contributing to the ongoing maintenance of the library while tying the code very closely to the lapack routines, and 2. matrices larger than 3x3 or so should be spending most of the computation time in the underlying lapack/blas library irregardless of whether the interface is python-based or capi-based. Matti On 10/08/2014 8:00 PM, numpy-discussion-request at scipy.org wrote: > > Date: Sat, 9 Aug 2014 21:11:19 +0100 > From: Nathaniel Smith > Subject: Re: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas > To: Discussion of Numerical Python > > On Sat, Aug 9, 2014 at 8:35 PM, Matti Picus wrote: >> Hi. I am working on numpy in pypy. It would be much more challenging for >> me if you merged more code into the core of numpy, > Hi Matti, > > I can definitely see how numpy changes cause trouble for you, and > sympathize. But, can you elaborate on what kind of changes would make > your life easier *that also* help make numpy proper better in their > own right? Because unfortunately, I don't see how we can reasonably > pass up on improvements to numpy if the only justification is to make > numpypy's life easier. (I'd also love to see pypy become usable for > general numerical work, but not only is it not there now, I don't see > how numpypy will ultimately get us there even if we do help it along > -- almost none of the ecosystem can get by numpy's python-level APIs > alone.) But obviously if there are changes that are mutually > beneficial, well then, that's a lot easier to justify :-) > > -n From sturla.molden at gmail.com Tue Aug 12 10:06:15 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 12 Aug 2014 14:06:15 +0000 (UTC) Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas References: <53E939AB.4010305@gmail.com> Message-ID: <794697273429544473.299975sturla.molden-gmail.com@news.gmane.org> Matti Picus wrote: > Thanks for your prompt reply. I think numpy is a wonderful project, and > you all do a great job moving it forward. > If you ask what would my vision for maturing numpy, I would like to see > a grouping of linalg matrix-operation functionality into a python level > package, exactly the opposite of more tightly tying linalg into the core > of numpy. But with the @ operator in Python 3.5 it makes sence to have both matrix multiplication and linear algebra solvers in the core of NumP. Just consider: A @ B A.LazyInverse @ B A @ B.LazyInverse (Lazy matrix inversion with O(1) complexity does not exist right now, but could be added in the future.) To implement this efficiently, we need BLAS and LAPACK in the core of NumPy. It does not mean there would not be a linalg namespace for LU, SVD, et al. Sturla From njs at pobox.com Tue Aug 12 10:26:09 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 12 Aug 2014 15:26:09 +0100 Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas In-Reply-To: <53E939AB.4010305@gmail.com> References: <53E939AB.4010305@gmail.com> Message-ID: Hi Matt, On Mon, Aug 11, 2014 at 10:46 PM, Matti Picus wrote: > Hi Nathaniel. 
> Thanks for your prompt reply. I think numpy is a wonderful project, and you > all do a great job moving it forward. > If you ask what would my vision for maturing numpy, I would like to see a > grouping of linalg matrix-operation functionality into a python level > package, exactly the opposite of more tightly tying linalg into the core of > numpy. As I understood it (though I admit Chuck was pretty terse, maybe he'll correct me :-)), what he was proposing was basically just a build system reorganization -- it's much easier to call between C functions that are in the same Python module than C functions that are in different modules, so we end up with lots of boilerplate gunk for the latter. I don't think it would involve any tighter coupling than we already have in practice. > The orthagonality would allow goups like PyOpenCL to reuse the matrix > operations on data located off the CPU's RAM, just to give one example; and > make it easier for non-numpy developers to create a complete replacement of > lapack with other implementations. I guess I don't really understand what you're suggesting. If we have a separate package that is the same as current np.linalg, then how does that allow PyOpenCL to suddenly run the np.linalg code on the GPU? What kind of re-use are you envisioning? The important kind of re-use that comes to mind for me is that I should be able to write code that can accept either a RAM matrix or a GPU matrix and works the same. But the key feature to enable this is that there should be a single API that works on both types of objects -- e.g. np.dot(a, b) should work even if a, b are on the GPU. But this is exactly what __numpy_ufunc__ is designed to enable, and that has nothing to do with splitting linalg off into a separate package... And of course if someone has a better idea about how to implement lapack, then they should do that work in the numpy repo so everyone can benefit, not go off and reimplement their own version from scratch that no-one will use :-). > Much of the linalg package would of > course be implemented in c or fortran, but the interface to ndarray would > use the well-established idea of contiguous matrices with shapes, strides, > and a single memory store, supporting only numeric number types. It's actually possible today for third-party users to add support for third-party dtypes to most linalg operations, b/c most linalg operations are implemented using the numpy ufunc machinery. > I suggested cffi since it provides a convienent and efficient interface to > ndarray. Thus python could remain as a thin wrapper over the calls out to > c-based libraries much like lapack_lite does today, but at the python level > rather that the capi level. > Yes, a python-based interface would slows the code down a bit, but I would > argue that > 1. the current state of lapack_litemodule.c and umath_linalg.c.src, with its > myriad of compile-time macros and complex code paths, scares people away > from contributing to the ongoing maintenance of the library while tying the > code very closely to the lapack routines, and I agree that simple is better than complex, but I don't see how moving those macros and code paths into a separate package decreases complexity. If anything it would increase complexity, because now we have two repos instead of one, two release schedules instead of one, and n^2 combinations of (linalg version, numpy version) to test against. -n > 2. 
matrices larger than 3x3 or so should be spending most of the computation > time in the underlying lapack/blas library irregardless of whether the > interface is python-based or capi-based. > Matti > > On 10/08/2014 8:00 PM, numpy-discussion-request at scipy.org wrote: >> >> >> Date: Sat, 9 Aug 2014 21:11:19 +0100 >> From: Nathaniel Smith >> Subject: Re: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas >> To: Discussion of Numerical Python >> >> >> On Sat, Aug 9, 2014 at 8:35 PM, Matti Picus wrote: >>> >>> Hi. I am working on numpy in pypy. It would be much more challenging for >>> me if you merged more code into the core of numpy, >> >> Hi Matti, >> >> >> I can definitely see how numpy changes cause trouble for you, and >> sympathize. But, can you elaborate on what kind of changes would make >> your life easier *that also* help make numpy proper better in their >> own right? Because unfortunately, I don't see how we can reasonably >> pass up on improvements to numpy if the only justification is to make >> numpypy's life easier. (I'd also love to see pypy become usable for >> general numerical work, but not only is it not there now, I don't see >> how numpypy will ultimately get us there even if we do help it along >> -- almost none of the ecosystem can get by numpy's python-level APIs >> alone.) But obviously if there are changes that are mutually >> beneficial, well then, that's a lot easier to justify :-) >> >> -n > > -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From warren.weckesser at gmail.com Tue Aug 12 11:35:43 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Tue, 12 Aug 2014 11:35:43 -0400 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. Message-ID: I created a pull request (https://github.com/numpy/numpy/pull/4958) that defines the function `count_unique`. `count_unique` generates a contingency table from a collection of sequences. For example, In [7]: x = [1, 1, 1, 1, 2, 2, 2, 2, 2] In [8]: y = [3, 4, 3, 3, 3, 4, 5, 5, 5] In [9]: (xvals, yvals), counts = count_unique(x, y) In [10]: xvals Out[10]: array([1, 2]) In [11]: yvals Out[11]: array([3, 4, 5]) In [12]: counts Out[12]: array([[3, 1, 0], [1, 1, 3]]) It can be interpreted as a multi-argument generalization of `np.unique(x, return_counts=True)`. It overlaps with Pandas' `crosstab`, but I think this is a pretty fundamental counting operation that fits in numpy. Matlab's `crosstab` (http://www.mathworks.com/help/stats/crosstab.html) and R's `table` perform the same calculation (with a few more bells and whistles). For comparison, here's Pandas' `crosstab` (same `x` and `y` as above): In [28]: import pandas as pd In [29]: xs = pd.Series(x) In [30]: ys = pd.Series(y) In [31]: pd.crosstab(xs, ys) Out[31]: col_0 3 4 5 row_0 1 3 1 0 2 1 1 3 And here is R's `table`: > x <- c(1,1,1,1,2,2,2,2,2) > y <- c(3,4,3,3,3,4,5,5,5) > table(x, y) y x 3 4 5 1 3 1 0 2 1 1 3 Is there any interest in adding this (or some variation of it) to numpy? Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Tue Aug 12 11:57:57 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Tue, 12 Aug 2014 11:57:57 -0400 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. 
In-Reply-To:
References:
Message-ID:

On Tue, Aug 12, 2014 at 11:35 AM, Warren Weckesser <
warren.weckesser at gmail.com> wrote:

> I created a pull request (https://github.com/numpy/numpy/pull/4958) that
> defines the function `count_unique`.  `count_unique` generates a
> contingency table from a collection of sequences.  For example,
>
> In [7]: x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
>
> In [8]: y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
>
> In [9]: (xvals, yvals), counts = count_unique(x, y)
>
> In [10]: xvals
> Out[10]: array([1, 2])
>
> In [11]: yvals
> Out[11]: array([3, 4, 5])
>
> In [12]: counts
> Out[12]:
> array([[3, 1, 0],
>        [1, 1, 3]])
>
>
> It can be interpreted as a multi-argument generalization of `np.unique(x,
> return_counts=True)`.
>
> It overlaps with Pandas' `crosstab`, but I think this is a pretty
> fundamental counting operation that fits in numpy.
>
> Matlab's `crosstab` (http://www.mathworks.com/help/stats/crosstab.html)
> and R's `table` perform the same calculation (with a few more bells and
> whistles).
>
>
> For comparison, here's Pandas' `crosstab` (same `x` and `y` as above):
>
> In [28]: import pandas as pd
>
> In [29]: xs = pd.Series(x)
>
> In [30]: ys = pd.Series(y)
>
> In [31]: pd.crosstab(xs, ys)
> Out[31]:
> col_0  3  4  5
> row_0
> 1      3  1  0
> 2      1  1  3
>
>
> And here is R's `table`:
>
> > x <- c(1,1,1,1,2,2,2,2,2)
> > y <- c(3,4,3,3,3,4,5,5,5)
> > table(x, y)
>    y
> x   3 4 5
>   1 3 1 0
>   2 1 1 3
>
>
> Is there any interest in adding this (or some variation of it) to numpy?
>
>
> Warren
>

While searching StackOverflow in the numpy tag for "count unique", I just
discovered that I basically reinvented Eelco Hoogendoorn's code in his
answer to
http://stackoverflow.com/questions/10741346/numpy-frequency-counts-for-unique-values-in-an-array.
Nice one, Eelco!

Warren

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hoogendoorn.eelco at gmail.com  Tue Aug 12 12:17:09 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Tue, 12 Aug 2014 18:17:09 +0200
Subject: [Numpy-discussion] New function `count_unique` to generate
	contingency tables.
In-Reply-To:
References:
Message-ID:

Thanks. Prompted by that stackoverflow question, and similar problems I
had to deal with myself, I started working on a much more general
extension to numpy's functionality in this space. Like you noted, things
get a little panda-y, but I think there is a lot of pandas' functionality
that could or should be part of the numpy core, a robust set of grouping
operations in particular.

see pastebin here:
http://pastebin.com/c5WLWPbp

I've posted about it on this list before, but without apparent interest;
and I haven't gotten around to getting this up to professional standards
yet either. But there is a lot more that could be done in this direction.

Note that the count functionality in the stackoverflow answer is
relatively indirect and inefficient, using the inverse_index and such. A
much more efficient method is obtained by the code used here.

On Tue, Aug 12, 2014 at 5:57 PM, Warren Weckesser <
warren.weckesser at gmail.com> wrote:

>
>
>
> On Tue, Aug 12, 2014 at 11:35 AM, Warren Weckesser <
> warren.weckesser at gmail.com> wrote:
>
>> I created a pull request (https://github.com/numpy/numpy/pull/4958) that
>> defines the function `count_unique`.  `count_unique` generates a
>> contingency table from a collection of sequences.
For example, >> >> In [7]: x = [1, 1, 1, 1, 2, 2, 2, 2, 2] >> >> In [8]: y = [3, 4, 3, 3, 3, 4, 5, 5, 5] >> >> In [9]: (xvals, yvals), counts = count_unique(x, y) >> >> In [10]: xvals >> Out[10]: array([1, 2]) >> >> In [11]: yvals >> Out[11]: array([3, 4, 5]) >> >> In [12]: counts >> Out[12]: >> array([[3, 1, 0], >> [1, 1, 3]]) >> >> >> It can be interpreted as a multi-argument generalization of `np.unique(x, >> return_counts=True)`. >> >> It overlaps with Pandas' `crosstab`, but I think this is a pretty >> fundamental counting operation that fits in numpy. >> >> Matlab's `crosstab` (http://www.mathworks.com/help/stats/crosstab.html) >> and R's `table` perform the same calculation (with a few more bells and >> whistles). >> >> >> For comparison, here's Pandas' `crosstab` (same `x` and `y` as above): >> >> In [28]: import pandas as pd >> >> In [29]: xs = pd.Series(x) >> >> In [30]: ys = pd.Series(y) >> >> In [31]: pd.crosstab(xs, ys) >> Out[31]: >> col_0 3 4 5 >> row_0 >> 1 3 1 0 >> 2 1 1 3 >> >> >> And here is R's `table`: >> >> > x <- c(1,1,1,1,2,2,2,2,2) >> > y <- c(3,4,3,3,3,4,5,5,5) >> > table(x, y) >> y >> x 3 4 5 >> 1 3 1 0 >> 2 1 1 3 >> >> >> Is there any interest in adding this (or some variation of it) to numpy? >> >> >> Warren >> >> > > While searching StackOverflow in the numpy tag for "count unique", I just > discovered that I basically reinvented Eelco Hoogendoorn's code in his > answer to > http://stackoverflow.com/questions/10741346/numpy-frequency-counts-for-unique-values-in-an-array. > Nice one, Eelco! > > Warren > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From joferkington at gmail.com Tue Aug 12 12:33:16 2014 From: joferkington at gmail.com (Joe Kington) Date: Tue, 12 Aug 2014 11:33:16 -0500 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Thanks. Prompted by that stackoverflow question, and similar problems I > had to deal with myself, I started working on a much more general extension > to numpy's functionality in this space. Like you noted, things get a little > panda-y, but I think there is a lot of panda's functionality that could or > should be part of the numpy core, a robust set of grouping operations in > particular. > > see pastebin here: > http://pastebin.com/c5WLWPbp > On a side note, this is related to a pull request of mine from awhile back: https://github.com/numpy/numpy/pull/3584 There was a lot of disagreement on the mailing list about what to call a "unique slices along a given axis" function, so I wound up closing the pull request pending more discussion. At any rate, I think it's a useful thing to have in "base" numpy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Tue Aug 12 12:51:10 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Tue, 12 Aug 2014 18:51:10 +0200 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: ah yes, that's also an issue I was trying to deal with. 
The semantics I prefer in this type of operator is (as a default) to have
every array be treated as a sequence of keys, so when calling
unique(arr_2d), you'd get unique rows, unless you pass axis=None, in which
case the array is flattened.

I also agree that the extension you propose here is useful; but ideally,
with a little more discussion on these subjects we can converge on an even
more comprehensive overhaul.

On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington wrote:

>
>
>
> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn <
> hoogendoorn.eelco at gmail.com> wrote:
>
>> Thanks. Prompted by that stackoverflow question, and similar problems I
>> had to deal with myself, I started working on a much more general extension
>> to numpy's functionality in this space. Like you noted, things get a little
>> panda-y, but I think there is a lot of pandas' functionality that could or
>> should be part of the numpy core, a robust set of grouping operations in
>> particular.
>>
>> see pastebin here:
>> http://pastebin.com/c5WLWPbp
>>
>
> On a side note, this is related to a pull request of mine from awhile
> back: https://github.com/numpy/numpy/pull/3584
>
> There was a lot of disagreement on the mailing list about what to call a
> "unique slices along a given axis" function, so I wound up closing the pull
> request pending more discussion.
>
> At any rate, I think it's a useful thing to have in "base" numpy.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Tue Aug 12 13:50:21 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 12 Aug 2014 11:50:21 -0600
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
In-Reply-To:
References: <53E939AB.4010305@gmail.com>
Message-ID:

On Tue, Aug 12, 2014 at 8:26 AM, Nathaniel Smith wrote:

> Hi Matt,
>
> On Mon, Aug 11, 2014 at 10:46 PM, Matti Picus
> wrote:
> > Hi Nathaniel.
> > Thanks for your prompt reply. I think numpy is a wonderful project, and
> you
> > all do a great job moving it forward.
> > If you ask what my vision for maturing numpy would be, I would like to
> see a
> > grouping of linalg matrix-operation functionality into a python level
> > package, exactly the opposite of more tightly tying linalg into the core
> of
> > numpy.
>
> As I understood it (though I admit Chuck was pretty terse, maybe he'll
> correct me :-)), what he was proposing was basically just a build
> system reorganization -- it's much easier to call between C functions
> that are in the same Python module than C functions that are in
> different modules, so we end up with lots of boilerplate gunk for the
> latter. I don't think it would involve any tighter coupling than we
> already have in practice.
>

I'm trying to think of the correct sequence of moves. Here are my current
thoughts.

   - Move _dotblas down into multiarray
      1. When there is cblas, add cblas implementations of descr->f->dot.
      2. Reimplement API matrixproduct2
      3. Make ndarray.dot a first class method and use it for numpy.dot.
   - Implement matmul
      1. Add matrixmultiply (matmul?) to the numpy API
      2. Implement __matmul__ method.
      3. Add functions to linalg for stacked vectors.
      4. Make sure __matmul__ works with __numpy_ufunc__
   - Consider using blas_lite instead of cblas, but that is now independent
   of the previous steps.
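(As a rough illustration of what step 2 under "Implement matmul" means at
the Python level -- a toy sketch only, with a made-up wrapper class, and
assuming an interpreter that has grown the `@` operator per PEP 465; for
ndarray itself the binding would of course live at the C level, next to
wherever the dot implementation ends up after the move:)

import numpy as np

class Wrapped(object):
    """Hypothetical container, used only to illustrate the dispatch."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __matmul__(self, other):
        # `self @ other` dispatches here; delegate to the existing dot.
        return Wrapped(np.dot(self.data, getattr(other, 'data', other)))

    def __rmatmul__(self, other):
        # `other @ self`, used when `other` does not handle the operation.
        return Wrapped(np.dot(np.asarray(other), self.data))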
Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Tue Aug 12 14:08:47 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 12 Aug 2014 18:08:47 +0000 (UTC)
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
References: <53E939AB.4010305@gmail.com>
Message-ID: <1001481297429559625.786097sturla.molden-gmail.com@news.gmane.org>

Charles R Harris wrote:

> - Consider using blas_lite instead of cblas, but that is now independent
> of the previous steps.

It should also be possible to build reference cblas on top of blas_lite.
(Or just create a wrapper for the parts of cblas we need.)

Sturla

From sturla.molden at gmail.com  Tue Aug 12 15:24:28 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 12 Aug 2014 19:24:28 +0000 (UTC)
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
References: <53E939AB.4010305@gmail.com>
Message-ID: <1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>

Charles R Harris wrote:

> - Move _dotblas down into multiarray
>    1. When there is cblas, add cblas implementations of descr->f->dot.
>    2. Reimplement API matrixproduct2
>    3. Make ndarray.dot a first class method and use it for numpy.dot.
> - Implement matmul
>    1. Add matrixmultiply (matmul?) to the numpy API
>    2. Implement __matmul__ method.
>    3. Add functions to linalg for stacked vectors.
>    4. Make sure __matmul__ works with __numpy_ufunc__
> - Consider using blas_lite instead of cblas, but that is now independent
> of the previous steps.

We could consider having a linalg._linalg module that just stores BLAS and
LAPACK function pointer values as read-only integer attributes. This way we
could move _dotblas into the core without actually having linalg in the
core. linalg._linalg would just sit there and own BLAS and LAPACK, and no
other part of NumPy would need build dependencies on these libraries.

When _dotblas is imported it just imports linalg._linalg and reads whatever
function pointer value it needs. It would also make it possible to remove
BLAS and LAPACK build dependencies from SciPy, as long as we export most or
all of BLAS and LAPACK.

Sturla

From matthew.brett at gmail.com  Tue Aug 12 15:24:39 2014
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 12 Aug 2014 12:24:39 -0700
Subject: [Numpy-discussion] Fwd: We need help working on code coverage for Cython code
In-Reply-To:
References:
Message-ID:

Hi,

Sorry for those of you also on the scikit-image mailing list - but here
again I'm asking for help to get coverage working for Cython code.

Over on another mailing list, we've hit a big problem trying to work out
coverage on a large amount of Cython code.

As y'all probably know, there's no automated way of checking code coverage
on Cython code at the moment. The Cython developers have done some work on
this [1] but it is currently stalled for lack of developer time to work on
it.

We'd really like to get this working, and the Cython developers have
offered to help get this started.

Can anyone help us out by

a) joining an interactive discussion for 15 minutes or so with the Cython
developers to get us started
b) helping with a short burst of coding that will follow (we estimate a
few days)

I think this is something many of us need, and it would also be a thank
you to the Cython team for their work, which we all use so much.
Cheers,

Matthew

[1] http://trac.cython.org/cython_trac/ticket/815

From ralf.gommers at gmail.com  Tue Aug 12 15:32:58 2014
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 12 Aug 2014 21:32:58 +0200
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
In-Reply-To: <1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>
References: <53E939AB.4010305@gmail.com>
	<1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Tue, Aug 12, 2014 at 9:24 PM, Sturla Molden wrote:

> Charles R Harris wrote:
>
> > - Move _dotblas down into multiarray
> >    1. When there is cblas, add cblas implementations of descr->f->dot.
> >    2. Reimplement API matrixproduct2
> >    3. Make ndarray.dot a first class method and use it for numpy.dot.
> > - Implement matmul
> >    1. Add matrixmultiply (matmul?) to the numpy API
> >    2. Implement __matmul__ method.
> >    3. Add functions to linalg for stacked vectors.
> >    4. Make sure __matmul__ works with __numpy_ufunc__
> > - Consider using blas_lite instead of cblas, but that is now independent
> > of the previous steps.
>
> We could consider having a linalg._linalg module that just stores BLAS and
> LAPACK function pointer values as read-only integer attributes. This way we
> could move _dotblas into the core without actually having linalg in the
> core. linalg._linalg would just sit there and own BLAS and LAPACK, and no
> other part of NumPy would need build dependencies on these libraries.

Note that those dependencies are optional now.

> When _dotblas is imported it just imports linalg._linalg and reads whatever
> function pointer value it needs. It would also make it possible to remove
> BLAS and LAPACK build dependencies from SciPy, as long as we export most or
> all of BLAS and LAPACK.

That's not possible. The only way you can do that is move the hard
dependency on BLAS & LAPACK to numpy, which we don't want to do.

Ralf

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefan at sun.ac.za  Tue Aug 12 15:43:16 2014
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 12 Aug 2014 21:43:16 +0200
Subject: [Numpy-discussion] Fwd: We need help working on code coverage for Cython code
In-Reply-To:
References:
Message-ID:

Hi Matthew

On Tue, Aug 12, 2014 at 9:24 PM, Matthew Brett wrote:
> The Cython developers have done some work on this [1] but it is
> currently stalled for lack of developer time to work on it.

It looks like we can help them with the rest of the work once the
lnotab PR is merged; is that correct?

Stéfan

From matthew.brett at gmail.com  Tue Aug 12 15:49:04 2014
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 12 Aug 2014 12:49:04 -0700
Subject: [Numpy-discussion] Fwd: We need help working on code coverage for Cython code
In-Reply-To:
References:
Message-ID:

Hi,

On Tue, Aug 12, 2014 at 12:43 PM, Stéfan van der Walt wrote:
> Hi Matthew
>
> On Tue, Aug 12, 2014 at 9:24 PM, Matthew Brett wrote:
>> The Cython developers have done some work on this [1] but it is
>> currently stalled for lack of developer time to work on it.
>
> It looks like we can help them with the rest of the work once the
> lnotab PR is merged; is that correct?

My very vague impression is that Stefan B thinks of the lnotab PR as
part of the process of getting the work done, so that merging would
only be worthwhile if it was pretty clear that the rest of the work
would happen as well. We could ask again on the Cython list...
Cheers,

Matthew

From stefan at sun.ac.za  Tue Aug 12 15:52:36 2014
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 12 Aug 2014 21:52:36 +0200
Subject: [Numpy-discussion] Fwd: We need help working on code coverage for Cython code
In-Reply-To:
References:
Message-ID:

Hi Matthew

On Tue, Aug 12, 2014 at 9:49 PM, Matthew Brett wrote:
> My very vague impression is that Stefan B thinks of the lnotab PR as
> part of the process of getting the work done, so that merging would
> only be worthwhile if it was pretty clear that the rest of the work
> would happen as well. We could ask again on the Cython list...

I think a clear roadmap with small targets would help--few of us know
Cython in much depth, so it would help if we could avoid any duplicate
effort/research.

Thank you for putting this issue on the radar.

Stéfan

From matthew.brett at gmail.com  Tue Aug 12 16:15:36 2014
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 12 Aug 2014 13:15:36 -0700
Subject: [Numpy-discussion] Fwd: We need help working on code coverage for Cython code
In-Reply-To:
References:
Message-ID:

Hi,

On Tue, Aug 12, 2014 at 12:52 PM, Stéfan van der Walt wrote:
> Hi Matthew
>
> On Tue, Aug 12, 2014 at 9:49 PM, Matthew Brett wrote:
>> My very vague impression is that Stefan B thinks of the lnotab PR as
>> part of the process of getting the work done, so that merging would
>> only be worthwhile if it was pretty clear that the rest of the work
>> would happen as well. We could ask again on the Cython list...
>
> I think a clear roadmap with small targets would help--few of us know
> Cython in much depth, so it would help if we could avoid any duplicate
> effort/research.

The first step we thought of was having a group live conversation of
some sort with the Cython developers to get an idea of what work needs
doing. So, I think the first question is - who would be up for joining
that?

From stefan at sun.ac.za  Tue Aug 12 16:27:06 2014
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Tue, 12 Aug 2014 22:27:06 +0200
Subject: [Numpy-discussion] Fwd: We need help working on code coverage for Cython code
In-Reply-To:
References:
Message-ID:

On Tue, Aug 12, 2014 at 10:15 PM, Matthew Brett wrote:
> The first step we thought of was having a group live conversation of
> some sort with the Cython developers to get an idea of what work needs
> doing. So, I think the first question is - who would be up for
> joining that?

I'd be up for that. Also, perhaps some key Cython players would be at
EuroSciPy, then we can discuss it in person?

Stéfan

From sturla.molden at gmail.com  Tue Aug 12 16:53:13 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 12 Aug 2014 20:53:13 +0000 (UTC)
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
References: <53E939AB.4010305@gmail.com>
	<1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>
Message-ID: <458662559429569529.743536sturla.molden-gmail.com@news.gmane.org>

Ralf Gommers wrote:

> That's not possible. The only way you can do that is move the hard
> dependency on BLAS & LAPACK to numpy, which we don't want to do.

But NumPy already depends on BLAS and LAPACK, right?
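(For anyone following along who wants to see what their own NumPy build
actually linked against: np.show_config() prints the BLAS/LAPACK sections
detected at build time -- the exact output varies by build, and a build
without an optimized BLAS reports NOT AVAILABLE for those sections:)

import numpy as np

# Prints e.g. the atlas/openblas/mkl sections detected at build time, or
# NOT AVAILABLE where numpy fell back to its bundled lapack_lite code.
np.show_config()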
From ralf.gommers at gmail.com  Tue Aug 12 17:02:06 2014
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Tue, 12 Aug 2014 23:02:06 +0200
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
In-Reply-To: <458662559429569529.743536sturla.molden-gmail.com@news.gmane.org>
References: <53E939AB.4010305@gmail.com>
	<1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>
	<458662559429569529.743536sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Tue, Aug 12, 2014 at 10:53 PM, Sturla Molden wrote:

> Ralf Gommers wrote:
>
> > That's not possible. The only way you can do that is move the hard
> > dependency on BLAS & LAPACK to numpy, which we don't want to do.
>
> But NumPy already depends on BLAS and LAPACK, right?
>

No. Numpy uses those libs when they're detected, but it falls back on its
own dot implementation if they're not found. From the first bullet under
http://scipy.org/scipylib/building/linux.html#generic-instructions: "BLAS
and LAPACK libraries (optional but strongly recommended for NumPy,
required for SciPy)".

BLAS/LAPACK are heavy dependencies that often give problems, which is why
you don't want to require them for the casual user that only needs numpy
arrays to make some plots, for example.

Ralf

>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Tue Aug 12 17:35:56 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 12 Aug 2014 21:35:56 +0000 (UTC)
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
References: <53E939AB.4010305@gmail.com>
	<1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>
	<458662559429569529.743536sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1031917335429572037.628140sturla.molden-gmail.com@news.gmane.org>

Ralf Gommers wrote:

> No. Numpy uses those libs when they're detected, but it falls back on its
> own dot implementation if they're not found. From the first bullet under
> http://scipy.org/scipylib/building/linux.html#generic-instructions:
> "BLAS and LAPACK libraries (optional but strongly recommended for NumPy,
> required for SciPy)".
>
> BLAS/LAPACK are heavy dependencies that often give problems, which is why
> you don't want to require them for the casual user that only needs numpy
> arrays to make some plots, for example.

Maybe we are not talking about the same thing, but isn't blas_lite.c and
lapack_lite.c more or less f2c'd versions of reference BLAS and reference
LAPACK?

Sturla

From robert.kern at gmail.com  Tue Aug 12 18:14:16 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 12 Aug 2014 23:14:16 +0100
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
In-Reply-To: <1031917335429572037.628140sturla.molden-gmail.com@news.gmane.org>
References: <53E939AB.4010305@gmail.com>
	<1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>
	<458662559429569529.743536sturla.molden-gmail.com@news.gmane.org>
	<1031917335429572037.628140sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Tue, Aug 12, 2014 at 10:35 PM, Sturla Molden wrote:
> Ralf Gommers wrote:
>
>> No. Numpy uses those libs when they're detected, but it falls back on its
>> own dot implementation if they're not found. From the first bullet under
>> http://scipy.org/scipylib/building/linux.html#generic-instructions:
>> "BLAS and LAPACK libraries (optional but strongly recommended for NumPy,
>> required for SciPy)".
>>
>> BLAS/LAPACK are heavy dependencies that often give problems, which is why
>> you don't want to require them for the casual user that only needs numpy
>> arrays to make some plots, for example.
>
> Maybe we are not talking about the same thing, but isn't blas_lite.c and
> lapack_lite.c more or less f2c'd versions of reference BLAS and reference
> LAPACK?

Not all of them, no. Just the routines that numpy itself uses. Hence, "lite".

--
Robert Kern

From cjw at ncf.ca  Tue Aug 12 18:19:15 2014
From: cjw at ncf.ca (cjw)
Date: Tue, 12 Aug 2014 18:19:15 -0400
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
In-Reply-To:
References: <53E939AB.4010305@gmail.com>
Message-ID: <53EA92E3.7010600@ncf.ca>

Charles,

Nothing I've seen so far envisages disturbing the existing, in my opinion
flawed, Matrix Class. I trust that I have not missed anything.

Compilation is a complex process for a person unfamiliar with C. Anything
you could do to simplify that would be welcome.

Colin W.

On 12/08/2014 1:50 PM, Charles R Harris wrote:
>
> On Tue, Aug 12, 2014 at 8:26 AM, Nathaniel Smith
> wrote:
>
>     Hi Matt,
>
>     On Mon, Aug 11, 2014 at 10:46 PM, Matti Picus
>     wrote:
>     > Hi Nathaniel.
>     > Thanks for your prompt reply. I think numpy is a wonderful
>     project, and you
>     > all do a great job moving it forward.
>     > If you ask what my vision for maturing numpy would be, I would like
>     to see a
>     > grouping of linalg matrix-operation functionality into a python
>     level
>     > package, exactly the opposite of more tightly tying linalg into
>     the core of
>     > numpy.
>
>     As I understood it (though I admit Chuck was pretty terse, maybe he'll
>     correct me :-)), what he was proposing was basically just a build
>     system reorganization -- it's much easier to call between C functions
>     that are in the same Python module than C functions that are in
>     different modules, so we end up with lots of boilerplate gunk for the
>     latter. I don't think it would involve any tighter coupling than we
>     already have in practice.
>
> I'm trying to think of the correct sequence of moves. Here are my
> current thoughts.
>
>   * Move _dotblas down into multiarray
>      1. When there is cblas, add cblas implementations of descr->f->dot.
>      2. Reimplement API matrixproduct2
>      3. Make ndarray.dot a first class method and use it for numpy.dot.
>   * Implement matmul
>      1. Add matrixmultiply (matmul?) to the numpy API
>      2. Implement __matmul__ method.
>      3. Add functions to linalg for stacked vectors.
>      4. Make sure __matmul__ works with __numpy_ufunc__
>   * Consider using blas_lite instead of cblas, but that is now independent
>     of the previous steps.
>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Tue Aug 12 18:29:59 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 12 Aug 2014 16:29:59 -0600
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
In-Reply-To: <53EA92E3.7010600@ncf.ca>
References: <53E939AB.4010305@gmail.com> <53EA92E3.7010600@ncf.ca>
Message-ID:

On Tue, Aug 12, 2014 at 4:19 PM, cjw wrote:

> Charles,
>
> Nothing I've seen so far envisages disturbing the existing, in my opinion
> flawed, Matrix Class.
>
> I trust that I have not missed anything.
>
> Compilation is a complex process for a person unfamiliar with C. Anything
> you could do to simplify that would be welcome.
>

We aren't talking about the matrix class, but rather the new '@' operator
to be used with arrays. The implementation of that operator could depend
on routines defined in other modules, as does the current dot method, or
we can move the overall implementation down into the multiarray module.
The question of using blas_lite came up because some things would be a bit
simpler if we didn't need to make the code depend on whether or not there
was a cblas library.

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Tue Aug 12 19:47:53 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 12 Aug 2014 23:47:53 +0000 (UTC)
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
References: <1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>
	<458662559429569529.743536sturla.molden-gmail.com@news.gmane.org>
	<1031917335429572037.628140sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1793334435429579206.133284sturla.molden-gmail.com@news.gmane.org>

Robert Kern wrote:

>>> BLAS/LAPACK are heavy dependencies that often give problems, which is why
>>> you don't want to require them for the casual user that only needs numpy
>>> arrays to make some plots, for example.
>>
>> Maybe we are not talking about the same thing, but isn't blas_lite.c and
>> lapack_lite.c more or less f2c'd versions of reference BLAS and reference
>> LAPACK?
>
> Not all of them, no. Just the routines that numpy itself uses. Hence, "lite".

I thought it got the 'lite' name because Netlib up to LAPACK 3.1.1 had
packages named 'lapack-lite-3.1.1.tgz' in addition to 'lapack-3.1.1.tgz'.
(I am not sure what the differences between the packages were.)

The lapack_lite.c file looks rather complete, but it seems the build
process somehow extracts only parts of it.

Sturla

From robert.kern at gmail.com  Wed Aug 13 05:50:22 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 13 Aug 2014 10:50:22 +0100
Subject: [Numpy-discussion] NumPy-Discussion OpenBLAS and dotblas
In-Reply-To: <1793334435429579206.133284sturla.molden-gmail.com@news.gmane.org>
References: <1197348250429560368.571367sturla.molden-gmail.com@news.gmane.org>
	<458662559429569529.743536sturla.molden-gmail.com@news.gmane.org>
	<1031917335429572037.628140sturla.molden-gmail.com@news.gmane.org>
	<1793334435429579206.133284sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Wed, Aug 13, 2014 at 12:47 AM, Sturla Molden wrote:
> Robert Kern wrote:
>
>>>> BLAS/LAPACK are heavy dependencies that often give problems, which is why
>>>> you don't want to require them for the casual user that only needs numpy
>>>> arrays to make some plots, for example.
>>>
>>> Maybe we are not talking about the same thing, but isn't blas_lite.c and
>>> lapack_lite.c more or less f2c'd versions of reference BLAS and reference
>>> LAPACK?
>> >> Not all of them, no. Just the routines that numpy itself uses. Hence, "lite". > > I thought it got the 'lite' name because Netlib up to LAPACK 3.1.1 had > packages named 'lapack-lite-3.1.1.tgz' in addition to 'lapack-3.1.1.tgz'. > (I am not sure what the differences between the packages were.) No. https://github.com/numpy/numpy/blob/master/numpy/linalg/lapack_lite/README > The lapack_lite.c file looks rather complete, but it seems the build > process somehow extracts only parts of it. I assume you mean dlapack_lite.c? It is incomplete. It is the end product of taking the full LAPACK 3.0 distribution, stripping out the routines that are not used in numpy, and f2cing the subset. Go ahead and look for the routines in LAPACK 3.0 systematically, and you will find many of them missing. -- Robert Kern From warren.weckesser at gmail.com Wed Aug 13 16:57:07 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 13 Aug 2014 16:57:07 -0400 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. In-Reply-To: References: Message-ID: On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > ah yes, that's also an issue I was trying to deal with. the semantics I > prefer in these type of operators, is (as a default), to have every array > be treated as a sequence of keys, so if calling unique(arr_2d), youd get > unique rows, unless you pass axis=None, in which case the array is > flattened. > > I also agree that the extension you propose here is useful; but ideally, > with a little more discussion on these subjects we can converge on an > even more comprehensive overhaul > > > On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington > wrote: > >> >> >> >> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn < >> hoogendoorn.eelco at gmail.com> wrote: >> >>> Thanks. Prompted by that stackoverflow question, and similar problems I >>> had to deal with myself, I started working on a much more general extension >>> to numpy's functionality in this space. Like you noted, things get a little >>> panda-y, but I think there is a lot of panda's functionality that could or >>> should be part of the numpy core, a robust set of grouping operations in >>> particular. >>> >>> see pastebin here: >>> http://pastebin.com/c5WLWPbp >>> >> >> On a side note, this is related to a pull request of mine from awhile >> back: https://github.com/numpy/numpy/pull/3584 >> >> There was a lot of disagreement on the mailing list about what to call a >> "unique slices along a given axis" function, so I wound up closing the pull >> request pending more discussion. >> >> At any rate, I think it's a useful thing to have in "base" numpy. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Update: I renamed the function to `table` in the pull request: https://github.com/numpy/numpy/pull/4958 Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Aug 13 17:15:27 2014 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 13 Aug 2014 17:15:27 -0400 Subject: [Numpy-discussion] New function `count_unique` to generate contingency tables. 
In-Reply-To:
References:
Message-ID:

The ever-wonderful pylab mode in matplotlib has a table function for
plotting a table of text in a plot. If I remember correctly, what would
happen is that matplotlib's table() function would simply obliterate
numpy's table function. This isn't a show-stopper, I just wanted to point
that out.

Personally, while I wasn't a particular fan of "count_unique" because I
wouldn't necessarily think of it when needing a contingency table, I do
like that it is verb-ish. "table()", in this sense, is not a verb. That
said, I am perfectly fine with it if you are fine with the name collision
in pylab mode.

On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser <
warren.weckesser at gmail.com> wrote:

>
>
>
> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn <
> hoogendoorn.eelco at gmail.com> wrote:
>
>> ah yes, that's also an issue I was trying to deal with. The semantics I
>> prefer in this type of operator is (as a default) to have every array
>> be treated as a sequence of keys, so when calling unique(arr_2d), you'd get
>> unique rows, unless you pass axis=None, in which case the array is
>> flattened.
>>
>> I also agree that the extension you propose here is useful; but ideally,
>> with a little more discussion on these subjects we can converge on an
>> even more comprehensive overhaul.
>>
>>
>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington
>> wrote:
>>
>>>
>>>
>>>
>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn <
>>> hoogendoorn.eelco at gmail.com> wrote:
>>>
>>>> Thanks. Prompted by that stackoverflow question, and similar problems I
>>>> had to deal with myself, I started working on a much more general extension
>>>> to numpy's functionality in this space. Like you noted, things get a little
>>>> panda-y, but I think there is a lot of pandas' functionality that could or
>>>> should be part of the numpy core, a robust set of grouping operations in
>>>> particular.
>>>>
>>>> see pastebin here:
>>>> http://pastebin.com/c5WLWPbp
>>>>
>>>
>>> On a side note, this is related to a pull request of mine from awhile
>>> back: https://github.com/numpy/numpy/pull/3584
>>>
>>> There was a lot of disagreement on the mailing list about what to call a
>>> "unique slices along a given axis" function, so I wound up closing the pull
>>> request pending more discussion.
>>>
>>> At any rate, I think it's a useful thing to have in "base" numpy.
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> Update: I renamed the function to `table` in the pull request:
> https://github.com/numpy/numpy/pull/4958
>
>
> Warren
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From warren.weckesser at gmail.com  Wed Aug 13 17:25:35 2014
From: warren.weckesser at gmail.com (Warren Weckesser)
Date: Wed, 13 Aug 2014 17:25:35 -0400
Subject: [Numpy-discussion] New function `count_unique` to generate
	contingency tables.
In-Reply-To: References: Message-ID: On Wed, Aug 13, 2014 at 5:15 PM, Benjamin Root wrote: > The ever-wonderful pylab mode in matplotlib has a table function for > plotting a table of text in a plot. If I remember correctly, what would > happen is that matplotlib's table() function will simply obliterate the > numpy's table function. This isn't a show-stopper, I just wanted to point > that out. > > Personally, while I wasn't a particular fan of "count_unique" because I > wouldn't necessarially think of it when needing a contingency table, I do > like that it is verb-ish. "table()", in this sense, is not a verb. That > said, I am perfectly fine with it if you are fine with the name collision > in pylab mode. > > Thanks for pointing that out. I only changed it to have something that sounded more table-ish, like the Pandas, R and Matlab functions. I won't update it right now, but if there is interest in putting it into numpy, I'll rename it to avoid the pylab conflict. Anything along the lines of `crosstab`, `xtable`, etc., would be fine with me. Warren > On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> >> >> >> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn < >> hoogendoorn.eelco at gmail.com> wrote: >> >>> ah yes, that's also an issue I was trying to deal with. the semantics I >>> prefer in these type of operators, is (as a default), to have every array >>> be treated as a sequence of keys, so if calling unique(arr_2d), youd get >>> unique rows, unless you pass axis=None, in which case the array is >>> flattened. >>> >>> I also agree that the extension you propose here is useful; but ideally, >>> with a little more discussion on these subjects we can converge on an >>> even more comprehensive overhaul >>> >>> >>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington >>> wrote: >>> >>>> >>>> >>>> >>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn < >>>> hoogendoorn.eelco at gmail.com> wrote: >>>> >>>>> Thanks. Prompted by that stackoverflow question, and similar problems >>>>> I had to deal with myself, I started working on a much more general >>>>> extension to numpy's functionality in this space. Like you noted, things >>>>> get a little panda-y, but I think there is a lot of panda's functionality >>>>> that could or should be part of the numpy core, a robust set of grouping >>>>> operations in particular. >>>>> >>>>> see pastebin here: >>>>> http://pastebin.com/c5WLWPbp >>>>> >>>> >>>> On a side note, this is related to a pull request of mine from awhile >>>> back: https://github.com/numpy/numpy/pull/3584 >>>> >>>> There was a lot of disagreement on the mailing list about what to call >>>> a "unique slices along a given axis" function, so I wound up closing the >>>> pull request pending more discussion. >>>> >>>> At any rate, I think it's a useful thing to have in "base" numpy. 
>>>>
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion at scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>> Update: I renamed the function to `table` in the pull request:
>> https://github.com/numpy/numpy/pull/4958
>>
>> Warren
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hoogendoorn.eelco at gmail.com  Wed Aug 13 18:17:37 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Thu, 14 Aug 2014 00:17:37 +0200
Subject: [Numpy-discussion] New function `count_unique` to generate
	contingency tables.
In-Reply-To:
References:
Message-ID:

It's pretty easy to implement this table functionality and more on top of
the code I linked above. I still think such a comprehensive overhaul of
arraysetops is worth discussing.

import numpy as np
import grouping

x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
z = np.random.randint(0, 2, (9, 2))

def table(*keys):
    """
    desired table implementation, building on the index object
    cleaner, and more functionality
    performance should be the same
    """
    indices  = [grouping.as_index(k, axis=0) for k in keys]
    uniques  = [i.unique  for i in indices]
    inverses = [i.inverse for i in indices]
    shape    = [i.groups  for i in indices]
    t = np.zeros(shape, np.int)
    # scatter a 1 into t at each joint inverse index
    np.add.at(t, inverses, 1)
    return tuple(uniques), t

# here is how to use
print table(x, y)
# but we can use fancy keys as well; here a composite key and a row-key
print table((x, y), z)
# this effectively creates a sparse matrix equivalent of your desired table
print grouping.count((x, y))

On Wed, Aug 13, 2014 at 11:25 PM, Warren Weckesser <
warren.weckesser at gmail.com> wrote:

>
>
>
> On Wed, Aug 13, 2014 at 5:15 PM, Benjamin Root wrote:
>
>> The ever-wonderful pylab mode in matplotlib has a table function for
>> plotting a table of text in a plot. If I remember correctly, what would
>> happen is that matplotlib's table() function would simply obliterate
>> numpy's table function. This isn't a show-stopper, I just wanted to point
>> that out.
>>
>> Personally, while I wasn't a particular fan of "count_unique" because I
>> wouldn't necessarily think of it when needing a contingency table, I do
>> like that it is verb-ish. "table()", in this sense, is not a verb. That
>> said, I am perfectly fine with it if you are fine with the name collision
>> in pylab mode.
>>
>
> Thanks for pointing that out. I only changed it to have something that
> sounded more table-ish, like the Pandas, R and Matlab functions. I won't
> update it right now, but if there is interest in putting it into numpy,
> I'll rename it to avoid the pylab conflict. Anything along the lines of
> `crosstab`, `xtable`, etc., would be fine with me.
> > Warren > > > >> On Wed, Aug 13, 2014 at 4:57 PM, Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> >>> >>> >>> On Tue, Aug 12, 2014 at 12:51 PM, Eelco Hoogendoorn < >>> hoogendoorn.eelco at gmail.com> wrote: >>> >>>> ah yes, that's also an issue I was trying to deal with. the semantics I >>>> prefer in these type of operators, is (as a default), to have every array >>>> be treated as a sequence of keys, so if calling unique(arr_2d), youd get >>>> unique rows, unless you pass axis=None, in which case the array is >>>> flattened. >>>> >>>> I also agree that the extension you propose here is useful; but >>>> ideally, with a little more discussion on these subjects we can converge on >>>> an even more comprehensive overhaul >>>> >>>> >>>> On Tue, Aug 12, 2014 at 6:33 PM, Joe Kington >>>> wrote: >>>> >>>>> >>>>> >>>>> >>>>> On Tue, Aug 12, 2014 at 11:17 AM, Eelco Hoogendoorn < >>>>> hoogendoorn.eelco at gmail.com> wrote: >>>>> >>>>>> Thanks. Prompted by that stackoverflow question, and similar problems >>>>>> I had to deal with myself, I started working on a much more general >>>>>> extension to numpy's functionality in this space. Like you noted, things >>>>>> get a little panda-y, but I think there is a lot of panda's functionality >>>>>> that could or should be part of the numpy core, a robust set of grouping >>>>>> operations in particular. >>>>>> >>>>>> see pastebin here: >>>>>> http://pastebin.com/c5WLWPbp >>>>>> >>>>> >>>>> On a side note, this is related to a pull request of mine from awhile >>>>> back: https://github.com/numpy/numpy/pull/3584 >>>>> >>>>> There was a lot of disagreement on the mailing list about what to call >>>>> a "unique slices along a given axis" function, so I wound up closing the >>>>> pull request pending more discussion. >>>>> >>>>> At any rate, I think it's a useful thing to have in "base" numpy. >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> Update: I renamed the function to `table` in the pull request: >>> https://github.com/numpy/numpy/pull/4958 >>> >>> >>> Warren >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From ralf.gommers at gmail.com  Thu Aug 14 13:02:14 2014
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Thu, 14 Aug 2014 19:02:14 +0200
Subject: [Numpy-discussion] Fwd: We need help working on code coverage for Cython code
In-Reply-To:
References:
Message-ID:

On Tue, Aug 12, 2014 at 10:27 PM, Stéfan van der Walt wrote:

> On Tue, Aug 12, 2014 at 10:15 PM, Matthew Brett
> wrote:
> > The first step we thought of was having a group live conversation of
> > some sort with the Cython developers to get an idea of what work needs
> > doing. So, I think the first question is - who would be up for
> > joining that?
>
> I'd be up for that. Also, perhaps some key Cython players would be at
> EuroSciPy, then we can discuss it in person?
>

There are no Cython devs presenting, so likely they're not there at all.
If there's a clear todo-list / roadmap then EuroSciPy may be a good place
to find help with implementing though.

Ralf

> Stéfan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From noel.pierre.andre at gmail.com  Thu Aug 14 15:07:38 2014
From: noel.pierre.andre at gmail.com (Pierre-Andre Noel)
Date: Thu, 14 Aug 2014 15:07:38 -0400
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated
	output subscripts as diagonal
Message-ID: <53ED08FA.7040407@gmail.com>

(I created issue 4965 earlier today on this topic, and I have been
advised to email this mailing list to discuss whether it is a good
idea or not. I include my original post as-is, followed by additional
comments.)

I think that the following new feature would make `numpy.einsum` even
more powerful/useful/awesome than it already is. Moreover, the change
should not interfere with existing code, it would preserve the
"minimalistic" spirit of `numpy.einsum`, and the new functionality would
integrate in a seamless/intuitive manner for the users.

In short, the new feature would allow for repeated subscripts to appear
in the "output" part of the `subscripts` parameter (i.e., on the
right-hand side of `->`). The corresponding dimensions in the resulting
`ndarray` would only be filled along their diagonal, leaving the off
diagonal entries to the default value for this `dtype` (typically zero).
Note that the current behavior is to raise an exception when repeated
output subscripts are being used.

This is simplest to describe with an example involving the dual behavior
of `numpy.diag`.

```python
# Extracting the diagonal of a 2-D array.
A = arange(16).reshape(4,4)
print(diag(A))  # Output: [ 0 5 10 15 ]
print(einsum('ii->i', A))  # Same as previous line (current behavior).

# Constructing a diagonal 2-D array.
v = arange(4)
print(diag(v))  # Output: [[0 0 0 0] [0 1 0 0] [0 0 2 0] [0 0 0 3]]
print(einsum('i->ii', v))  # New behavior would be same as previous line.
# The current behavior of the previous line is to raise an exception.
```

In contrast to `numpy.diag`, the approach generalizes to higher
dimensions: `einsum('iii->i', A)` extracts the diagonal of a 3-D array,
and `einsum('i->iii', v)` would build a diagonal 3-D array.

The proposed behavior really starts to shine in more intricate cases.

```python
# Dummy values, these should be probabilities to make sense below.
P_w_ab = arange(24).reshape(3,2,4)
P_y_wxab = arange(144).reshape(3,3,2,2,4)

# With the proposed behavior, the following two lines should be equivalent.
P_xyz_ab = einsum('wab,xa,ywxab,zy->xyzab', P_w_ab, eye(2), P_y_wxab,
eye(3))
also_P_xyz_ab = einsum('wab,ywaab->ayyab', P_w_ab, P_y_wxab)
```

If this is not convincing enough, replace `eye(2)` by
`eye(P_w_ab.shape[1])` and replace `eye(3)` by `eye(P_y_wxab.shape[0])`,
then imagine more dimensions and repeated indices... The new notation
would allow for crisper code and reduce the opportunities for dumb
mistakes.

For those who wonder, the above computation amounts to
$P(X=x,Y=y,Z=z|A=a,B=b) = \sum_w P(W=w|A=a,B=b) P(X=x|A=a)
P(Y=y|W=w,X=x,A=a,B=b) P(Z=z|Y=y)$ with $P(X=x|A=a)=\delta_{xa}$ and
$P(Z=z|Y=y)=\delta_{zy}$ (using LaTeX notation, and $\delta_{ij}$ is
[Kronecker's delta](http://en.wikipedia.org/wiki/Kronecker_delta)).

(End of original post.)

I have been told by @jaimefrio that "The best way of getting a new
feature into numpy is putting it in yourself." Hence, if discussions
here do reveal that this is a good idea, then I may give it a try at
coding it myself. However, I currently know nothing of the inner
workings of numpy/ndarray/einsum, and I have higher priorities right
now. This means that it could take a long while before I contribute any
code, if I ever do. Hence, if anyone feels like doing it, feel free to
do so!

Also, I am aware that storing a lot of zeros in an `ndarray` may not, a
priori, be a desirable avenue. However, there are times when you have
to do it: think of `numpy.eye` as an example. In my case of application,
I use such diagonal structures in the initialization of an `ndarray`
which is later updated through an iterative process. After these
iterations, most of the zeros will be gone. Do other people see a use
for such capabilities?

Thank you for your time and have a nice day.

Sincerely,

Pierre-André Noël

From ben.root at ou.edu  Thu Aug 14 15:21:11 2014
From: ben.root at ou.edu (Benjamin Root)
Date: Thu, 14 Aug 2014 15:21:11 -0400
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated
	output subscripts as diagonal
In-Reply-To: <53ED08FA.7040407@gmail.com>
References: <53ED08FA.7040407@gmail.com>
Message-ID:

You had me at Kronecker delta... :-) +1

On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel <
noel.pierre.andre at gmail.com> wrote:

> (I created issue 4965 earlier today on this topic, and I have been
> advised to email this mailing list to discuss whether it is a good
> idea or not. I include my original post as-is, followed by additional
> comments.)
>
> I think that the following new feature would make `numpy.einsum` even
> more powerful/useful/awesome than it already is. Moreover, the change
> should not interfere with existing code, it would preserve the
> "minimalistic" spirit of `numpy.einsum`, and the new functionality would
> integrate in a seamless/intuitive manner for the users.
>
> In short, the new feature would allow for repeated subscripts to appear
> in the "output" part of the `subscripts` parameter (i.e., on the
> right-hand side of `->`). The corresponding dimensions in the resulting
> `ndarray` would only be filled along their diagonal, leaving the off
> diagonal entries to the default value for this `dtype` (typically zero).
> Note that the current behavior is to raise an exception when repeated
> output subscripts are being used.
>
> This is simplest to describe with an example involving the dual behavior
> of `numpy.diag`.
>
> ```python
> # Extracting the diagonal of a 2-D array.
> A = arange(16).reshape(4,4)
> print(diag(A))  # Output: [ 0 5 10 15 ]
> print(einsum('ii->i', A))  # Same as previous line (current behavior).
>
> # Constructing a diagonal 2-D array.
> v = arange(4)
> print(diag(v))  # Output: [[0 0 0 0] [0 1 0 0] [0 0 2 0] [0 0 0 3]]
> print(einsum('i->ii', v))  # New behavior would be same as previous line.
> # The current behavior of the previous line is to raise an exception.
> ```
>
> In contrast to `numpy.diag`, the approach generalizes to higher
> dimensions: `einsum('iii->i', A)` extracts the diagonal of a 3-D array,
> and `einsum('i->iii', v)` would build a diagonal 3-D array.
>
> The proposed behavior really starts to shine in more intricate cases.
>
> ```python
> # Dummy values, these should be probabilities to make sense below.
> P_w_ab = arange(24).reshape(3,2,4)
> P_y_wxab = arange(144).reshape(3,3,2,2,4)
>
> # With the proposed behavior, the following two lines should be
> equivalent.
> P_xyz_ab = einsum('wab,xa,ywxab,zy->xyzab', P_w_ab, eye(2), P_y_wxab,
> eye(3))
> also_P_xyz_ab = einsum('wab,ywaab->ayyab', P_w_ab, P_y_wxab)
> ```
>
> If this is not convincing enough, replace `eye(2)` by
> `eye(P_w_ab.shape[1])` and replace `eye(3)` by `eye(P_y_wxab.shape[0])`,
> then imagine more dimensions and repeated indices... The new notation
> would allow for crisper code and reduce the opportunities for dumb
> mistakes.
>
> For those who wonder, the above computation amounts to
> $P(X=x,Y=y,Z=z|A=a,B=b) = \sum_w P(W=w|A=a,B=b) P(X=x|A=a)
> P(Y=y|W=w,X=x,A=a,B=b) P(Z=z|Y=y)$ with $P(X=x|A=a)=\delta_{xa}$ and
> $P(Z=z|Y=y)=\delta_{zy}$ (using LaTeX notation, and $\delta_{ij}$ is
> [Kronecker's delta](http://en.wikipedia.org/wiki/Kronecker_delta)).
>
> (End of original post.)
>
> I have been told by @jaimefrio that "The best way of getting a new
> feature into numpy is putting it in yourself." Hence, if discussions
> here do reveal that this is a good idea, then I may give it a try at
> coding it myself. However, I currently know nothing of the inner
> workings of numpy/ndarray/einsum, and I have higher priorities right
> now. This means that it could take a long while before I contribute any
> code, if I ever do. Hence, if anyone feels like doing it, feel free to
> do so!
>
> Also, I am aware that storing a lot of zeros in an `ndarray` may not, a
> priori, be a desirable avenue. However, there are times when you have
> to do it: think of `numpy.eye` as an example. In my case of application,
> I use such diagonal structures in the initialization of an `ndarray`
> which is later updated through an iterative process. After these
> iterations, most of the zeros will be gone. Do other people see a use
> for such capabilities?
>
> Thank you for your time and have a nice day.
>
> Sincerely,
>
> Pierre-André Noël
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shoyer at gmail.com  Thu Aug 14 15:42:51 2014
From: shoyer at gmail.com (Stephan Hoyer)
Date: Thu, 14 Aug 2014 12:42:51 -0700
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated
	output subscripts as diagonal
In-Reply-To:
References: <53ED08FA.7040407@gmail.com>
Message-ID:

I think this would be a very nice addition.
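(In the meantime, the construction can be emulated with a small stride
trick; a sketch, with a hypothetical helper name, that just spells out the
flat-index arithmetic for the main diagonal of an N-d cube:)

import numpy as np

def diag_nd(v, ndim):
    # Build the ndim-dimensional diagonal array that the proposed
    # einsum('i->ii...i', v) would return.
    v = np.asarray(v)
    n = len(v)
    out = np.zeros((n,) * ndim, dtype=v.dtype)
    # Element (i, i, ..., i) sits at flat index i * (1 + n + ... + n**(ndim-1)).
    step = (n**ndim - 1) // (n - 1) if n > 1 else 1
    out.reshape(-1)[::step] = v
    return out

# diag_nd(np.arange(4), 2) matches np.diag(np.arange(4)), and
# diag_nd(np.arange(4), 3) is the 3-D array that einsum('i->iii', v)
# would build.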
:-) +1 > > > On Thu, Aug 14, 2014 at 3:07 PM, Pierre-Andre Noel < > noel.pierre.andre at gmail.com> wrote: > >> (I created issue 4965 earlier today on this topic, and I have been >> advised to email to this mailing list to discuss whether it is a good >> idea or not. I include my original post as-is, followed by additional >> comments.) >> >> I think that the following new feature would make `numpy.einsum` even >> more powerful/useful/awesome than it already is. Moreover, the change >> should not interfere with existing code, it would preserve the >> "minimalistic" spirit of `numpy.einsum`, and the new functionality would >> integrate in a seamless/intuitive manner for the users. >> >> In short, the new feature would allow for repeated subscripts to appear >> in the "output" part of the `subscripts` parameter (i.e., on the >> right-hand side of `->`). The corresponding dimensions in the resulting >> `ndarray` would only be filled along their diagonal, leaving the off >> diagonal entries to the default value for this `dtype` (typically zero). >> Note that the current behavior is to raise an exception when repeated >> output subscripts are being used. >> >> This is simplest to describe with an example involving the dual behavior >> of `numpy.diag`. >> >> ```python >> # Extracting the diagonal of a 2-D array. >> A = arange(16).reshape(4,4) >> print(diag(A)) # Output: [ 0 5 10 15 ] >> print(einsum('ii->i', A)) # Same as previous line (current behavior). >> >> # Constructing a diagonal 2-D array. >> v = arange(4) >> print(diag(v)) # Output: [[0 0 0 0] [0 1 0 0] [0 0 2 0] [0 0 0 3]] >> print(einsum('i->ii', v)) # New behavior would be same as previous line. >> # The current behavior of the previous line is to raise an exception. >> ``` >> >> By opposition to `numpy.diag`, the approach generalizes to higher >> dimensions: `einsum('iii->i', A)` extracts the diagonal of a 3-D array, >> and `einsum('i->iii', v)` would build a diagonal 3-D array. >> >> The proposed behavior really starts to shine in more intricate cases. >> >> ```python >> # Dummy values, these should be probabilities to make sense below. >> P_w_ab = arange(24).reshape(3,2,4) >> P_y_wxab = arange(144).reshape(3,3,2,2,4) >> >> # With the proposed behavior, the following two lines should be >> equivalent. >> P_xyz_ab = einsum('wab,xa,ywxab,zy->xyzab', P_w_ab, eye(2), P_y_wxab, >> eye(3)) >> also_P_xyz_ab = einsum('wab,ywaab->ayyab', P_w_ab, P_y_wxab) >> ``` >> >> If this is not convincing enough, replace `eye(2)` by >> `eye(P_w_ab.shape[1])` and replace `eye(3)` by `eye(P_y_wxab.shape[0])`, >> then imagine more dimensions and repeated indices... The new notation >> would allow for crisper codes and reduce the opportunities for dumb >> mistakes. >> >> For those who wonder, the above computation amounts to >> $P(X=x,Y=y,Z=z|A=a,B=b) = \sum_w P(W=w|A=a,B=b) P(X=x|A=a) >> P(Y=y|W=w,X=x,A=a,B=b) P(Z=z|Y=y)$ with $P(X=x|A=a)=\delta_{xa}$ and >> $P(Z=z|Y=y)=\delta_{zy}$ (using LaTeX notation, and $\delta_{ij}$ is >> [Kronecker's delta](http://en.wikipedia.org/wiki/Kronecker_delta)). >> >> (End of original post.) >> >> I have been told by @jaimefrio that "The best way of getting a new >> feature into numpy is putting it in yourself." Hence, if discussions >> here do reveal that this is a good idea, then I may give a try at coding >> it myself. However, I currently know nothing of the inner workings of >> numpy/ndarray/einsum, and I have higher priorities right now. 
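For readers trying the examples, the `eye()`-based line already runs on current einsum; only the repeated-output-subscript spelling is new. A quick shape check, reusing the dummy values from the proposal:

```python
import numpy as np

P_w_ab = np.arange(24.).reshape(3, 2, 4)
P_y_wxab = np.arange(144.).reshape(3, 3, 2, 2, 4)

# The eye()-based spelling works today; 'wab,ywaab->ayyab' is the
# form the proposal would make equivalent to it.
P_xyz_ab = np.einsum('wab,xa,ywxab,zy->xyzab',
                     P_w_ab, np.eye(2), P_y_wxab, np.eye(3))
print(P_xyz_ab.shape)  # (2, 3, 3, 2, 4)
```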
From sebastian at sipsolutions.net  Fri Aug 15 09:46:33 2014
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 15 Aug 2014 15:46:33 +0200
Subject: [Numpy-discussion] Weighted Covariance/correlation
Message-ID: <1408110393.20638.16.camel@sebastian-t440>

Hi all,

Tom Poole has opened pull request https://github.com/numpy/numpy/pull/4960 to implement weights in np.cov (correlation can be added), somewhat picking up the effort started by Noel Dawe in https://github.com/numpy/numpy/pull/3864.

The pull request currently implements an accuracy-type `weights` keyword argument as the default, with a `repeat_weights` switch to use repeat-type weights instead (frequency-type weights are a special case of this, I think).

As far as I can see, the code is in a state where it can be tested. But since it is a new feature, the names and defaults are up for discussion, so maybe someone who might use such a feature has a preference. I know we had a short discussion about this before, but it was a while ago. For example, another option would be to have the two weights as two keyword arguments instead of a boolean switch.

Regards,

Sebastian

From sebastian at sipsolutions.net  Fri Aug 15 09:53:23 2014
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 15 Aug 2014 15:53:23 +0200
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal
In-Reply-To:
References: <53ED08FA.7040407@gmail.com>
Message-ID: <1408110803.20638.18.camel@sebastian-t440>

On Do, 2014-08-14 at 12:42 -0700, Stephan Hoyer wrote:
> I think this would be a very nice addition.
>
> On Thu, Aug 14, 2014 at 12:21 PM, Benjamin Root <ben.root at ou.edu> wrote:
>         You had me at Kronecker delta... :-) +1
> [...]

Sounds good to me. I don't see a reason for not relaxing the restriction, unless there is some technical issue, but I doubt that.

- Sebastian

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From hoogendoorn.eelco at gmail.com  Fri Aug 15 10:42:09 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Fri, 15 Aug 2014 16:42:09 +0200
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal
In-Reply-To: <1408110803.20638.18.camel@sebastian-t440>
References: <53ED08FA.7040407@gmail.com> <1408110803.20638.18.camel@sebastian-t440>
Message-ID:

Agreed; this addition occurred to me as well. Note that the implementation should be straightforward: just allocate an enlarged array, use some striding logic to construct the relevant view, and let einsum's internals act on the view. Hopefully you won't even have to touch the guts of einsum at the C level, because I'd say that isn't for the faint of heart...

On Fri, Aug 15, 2014 at 3:53 PM, Sebastian Berg <sebastian at sipsolutions.net> wrote:

> Sounds good to me. I don't see a reason for not relaxing the
> restriction, unless there is some technical issue, but I doubt that.
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
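A minimal sketch of the approach described above, for the simplest proposed case `einsum('i->ii', v)`: allocate the zeroed result and write the contraction result through a strided view of its diagonal (the helper name is illustrative, not an actual numpy API):

```python
import numpy as np

def einsum_i_to_ii(v):  # hypothetical helper, not part of numpy
    # Emulates the proposed einsum('i->ii', v): zeros everywhere
    # except the diagonal, which receives the contraction result.
    n = v.shape[0]
    out = np.zeros((n, n), dtype=v.dtype)
    # In a C-contiguous (n, n) array, every (n + 1)-th element of the
    # flat view is a diagonal entry; ravel() is a writable view here.
    out.ravel()[::n + 1] = v
    return out

v = np.arange(4)
assert np.array_equal(einsum_i_to_ii(v), np.diag(v))
```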
From sebastian at sipsolutions.net  Fri Aug 15 11:01:46 2014
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Fri, 15 Aug 2014 17:01:46 +0200
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal
In-Reply-To:
References: <53ED08FA.7040407@gmail.com> <1408110803.20638.18.camel@sebastian-t440>
Message-ID: <1408114906.20638.20.camel@sebastian-t440>

On Fr, 2014-08-15 at 16:42 +0200, Eelco Hoogendoorn wrote:
> Agreed; this addition occurred to me as well. Note that the
> implementation should be straightforward: just allocate an enlarged
> array, use some striding logic to construct the relevant view, and let
> einsum's internals act on the view.
> [...]

I am not sure that einsum isn't pure C :). But even if it is, it should be doing something identical already for duplicate indices on the inputs...

- Sebastian

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
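For reference, the asymmetry under discussion as it stands at the time of writing: repeated subscripts are already accepted on the input side, while repeating an output subscript raises (a ValueError in current releases):

```python
import numpy as np

A = np.arange(16).reshape(4, 4)
print(np.einsum('ii->i', A))   # [ 0  5 10 15], same as np.diag(A)

v = np.arange(4)
try:
    np.einsum('i->ii', v)      # the proposed extension
except ValueError as e:
    print('raises:', e)        # current behavior
```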
From hoogendoorn.eelco at gmail.com  Fri Aug 15 11:20:16 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Fri, 15 Aug 2014 17:20:16 +0200
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal
In-Reply-To: <1408114906.20638.20.camel@sebastian-t440>
References: <53ED08FA.7040407@gmail.com> <1408110803.20638.18.camel@sebastian-t440> <1408114906.20638.20.camel@sebastian-t440>
Message-ID:

Well, there is the numpy-API C level, and then there is the arcane macro C level. The two might as well be completely different languages.

Indeed, it should be doing something similar for the inputs. Actually, I think I once wrote a wrapper around einsum/numexpr that performed this kind of generalized indexing... I'll see if I can dig that up.

On Fri, Aug 15, 2014 at 5:01 PM, Sebastian Berg <sebastian at sipsolutions.net> wrote:

> I am not sure that einsum isn't pure C :). But even if it is, it should
> be doing something identical already for duplicate indices on the
> inputs...
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com  Fri Aug 15 13:03:27 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 15 Aug 2014 11:03:27 -0600
Subject: [Numpy-discussion] Question about OpenBLAS affinity setting.
Message-ID:

Hi All,

Seeking some discussion about setting the OpenBLAS affinity. Currently this is set when numeric.py is imported:

    try:
        # disables openblas affinity setting of the main thread that limits
        # python threads or processes to one core
        if 'OPENBLAS_MAIN_FREE' not in os.environ:
            os.environ['OPENBLAS_MAIN_FREE'] = '1'
        if 'GOTOBLAS_MAIN_FREE' not in os.environ:
            os.environ['GOTOBLAS_MAIN_FREE'] = '1'
        from ._dotblas import dot, vdot, inner
    except ImportError:
        ...

Note that the affinity is set whether or not the import of _dotblas fails, which it will if cblas was not found at build time. This all seems a bit hinky to me. If we are always going to set the affinity, it should be in the core/__init__.py file. If the setting is moved after the import, it will still happen if, say, ATLAS cblas is present. It seems to me that there should be a better, more transparent way to do this, especially as I'm in the process of moving some of the cblas implementations down into multiarray. One option is to make this part of the multiarray module import, but without a bit of work that would still depend only on cblas being detected during the build process, rather than on OpenBLAS.

Also, is GotoBLAS still a viable option?

Thoughts?

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
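A minimal sketch of one of the options above, assuming the setting simply moves next to the extension import (placement and wording are illustrative, not a patch against numpy):

```python
# Hypothetical top of numpy/core/__init__.py: set the affinity switches
# unconditionally, *before* any extension module that loads the BLAS
# shared library is imported.
import os

for _var in ('OPENBLAS_MAIN_FREE', 'GOTOBLAS_MAIN_FREE'):
    os.environ.setdefault(_var, '1')  # keep any user-provided value

from . import multiarray  # the BLAS reads the environment when loaded
```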
From fperez.net at gmail.com  Fri Aug 15 14:32:26 2014
From: fperez.net at gmail.com (Fernando Perez)
Date: Fri, 15 Aug 2014 11:32:26 -0700
Subject: [Numpy-discussion] Heads-up: Guido van Rossum on mypy-style type annotations...
Message-ID:

Hi folks,

[x-posting to the numba and numpy lists; discussion should be on the python-ideas list, where the python core folks actually reside]

Just a quick note that Guido has proposed to adopt the mypy model for type annotations in the language:

http://thread.gmane.org/gmane.comp.python.ideas/28619

This same heads-up was also sent out by Stefan Behnel to the cython ML; please feel free to pass it on to other communities that might be interested in this problem and can provide feedback to python-core while this is in the discussion phase.

Cheers,

f

--
Fernando Perez (@fperez_org; http://fperez.org)
fperez.net-at-gmail: mailing lists only (I ignore this when swamped!)
fernando.perez-at-berkeley: contact me here for any direct mail

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hoogendoorn.eelco at gmail.com  Fri Aug 15 15:09:02 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Fri, 15 Aug 2014 21:09:02 +0200
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal
In-Reply-To:
References: <53ED08FA.7040407@gmail.com> <1408110803.20638.18.camel@sebastian-t440> <1408114906.20638.20.camel@sebastian-t440>
Message-ID:

Here is a snippet I extracted from a project with similar aims (integrating the functionality of einsum and numexpr, actually). Not much to it, but in case someone needs a reminder on how to use striding tricks:

http://pastebin.com/kQNySjcj

On Fri, Aug 15, 2014 at 5:20 PM, Eelco Hoogendoorn <hoogendoorn.eelco at gmail.com> wrote:

> Well, there is the numpy-API C level, and then there is the arcane macro
> C level. The two might as well be completely different languages.
> [...]

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From andrea.gavana at gmail.com  Fri Aug 15 17:07:58 2014
From: andrea.gavana at gmail.com (Andrea Gavana)
Date: Fri, 15 Aug 2014 23:07:58 +0200
Subject: [Numpy-discussion] Heads-up: Guido van Rossum on mypy-style type annotations...
In-Reply-To:
References:
Message-ID:

On 15 August 2014 20:32, Fernando Perez wrote:

> Just a quick note that Guido has proposed to adopt the mypy model for
> type annotations in the language:
>
> http://thread.gmane.org/gmane.comp.python.ideas/28619
> [...]

And I thought it was a joke... Luckily it's optional :-) But the thread on Python-ideas is a good read, thanks for the info.

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://www.infinity77.net

# ------------------------------------------------------------- #
def ask_mailing_list_support(email):

    if mention_platform_and_version() and include_sample_app():
        send_message(email)
    else:
        install_malware()
        erase_hard_drives()
# ------------------------------------------------------------- #

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From andrea.gavana at gmail.com  Tue Aug 19 10:23:10 2014
From: andrea.gavana at gmail.com (Andrea Gavana)
Date: Tue, 19 Aug 2014 16:23:10 +0200
Subject: [Numpy-discussion] Numpy/Fortran puzzle (?)
Message-ID:

Hi All,

I have the following (very ugly) line of code:

    all_results = np.asarray([transm_hist[date_idx, :, idx_main_set[date_idx]]*main_flow[date_idx, 0:n_fluids]
                              for date_idx in xrange(n_dates)])

where transm_hist.shape = (n_dates, n_fluids, n_nodes), main_flow.shape = (n_dates, n_fluids), and idx_main_set is an array of integer indices with idx_main_set.shape = (n_dates,). The resulting array has all_results.shape = (n_dates, n_fluids).

Since that line of code is relatively slow when executed repeatedly, I thought I'd be smart and rewrite it in Fortran, then use f2py to wrap the subroutine. So I wrote this:

    subroutine matmul(transm_hist, idx_main_set, main_flow, all_results, &
                      n_dates, n_fluids, n_nodes)

        implicit none

        integer ( kind = 4 ), intent(in)  :: n_dates, n_fluids, n_nodes
        real    ( kind = 4 ), intent(in)  :: transm_hist(n_dates, n_fluids, n_nodes)
        real    ( kind = 4 ), intent(in)  :: main_flow(n_dates, n_fluids)
        integer ( kind = 4 ), intent(in)  :: idx_main_set(n_dates)
        real    ( kind = 4 ), intent(out) :: all_results(n_dates, n_fluids)

        integer ( kind = 4 ) i, node

        do i = 1, n_dates
            node = int(idx_main_set(i))
            all_results(i, :) = transm_hist(i, 1:n_fluids, node)*main_flow(i, 1:n_fluids)
        enddo

    end

Unfortunately, it appears that I am not getting quite the same results... I know it's a bit of a stretch with so little information, but does anyone have a suggestion on where the culprit might be? Maybe the elementwise multiplication is done differently in NumPy and Fortran, or I am misunderstanding what np.asarray is doing with the list comprehension above?

I appreciate any suggestion, including suggestions for improving the code. Thank you in advance.

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://www.infinity77.net

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
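Two things may be worth double-checking in the snippet above (observations, not a confirmed diagnosis): `idx_main_set` presumably holds 0-based NumPy indices while the Fortran arrays here are 1-based, so `node` may need a `+ 1`; and `real (kind = 4)` is single precision while NumPy defaults to float64, which alone can produce small discrepancies. On the speed side, a sketch of a pure-NumPy alternative that removes the Python loop entirely (dummy shapes and data, just to check the equivalence):

```python
import numpy as np

n_dates, n_fluids, n_nodes = 5, 3, 7
transm_hist = np.random.rand(n_dates, n_fluids, n_nodes)
main_flow = np.random.rand(n_dates, n_fluids)
idx_main_set = np.random.randint(0, n_nodes, size=n_dates)

# Fancy indexing broadcasts (n_dates, 1) against (1, n_fluids) to pick
# transm_hist[d, f, idx_main_set[d]] for every date d and fluid f.
dates = np.arange(n_dates)[:, None]
fluids = np.arange(n_fluids)[None, :]
all_results = transm_hist[dates, fluids, idx_main_set[:, None]] * main_flow

# same values as the list-comprehension version from the post
check = np.asarray([transm_hist[d, :, idx_main_set[d]] * main_flow[d, :n_fluids]
                    for d in range(n_dates)])
assert np.allclose(all_results, check)
```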
URL:

From noel.pierre.andre at gmail.com  Wed Aug 20 09:26:09 2014
From: noel.pierre.andre at gmail.com (Pierre-Andre Noel)
Date: Wed, 20 Aug 2014 09:26:09 -0400
Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal
In-Reply-To:
References: <53ED08FA.7040407@gmail.com> <1408110803.20638.18.camel@sebastian-t440> <1408114906.20638.20.camel@sebastian-t440>
Message-ID: <53F4A1F1.6070305@gmail.com>

Thanks all for the feedback!

So there appears to be interest in this feature, and I think that I can implement it. However, it may take a while before I do so: I have other priorities right now.

In view of jaimefrio's comment on https://github.com/numpy/numpy/issues/4965 as well as Eelco Hoogendoorn's reply above, here is how I currently intend to implement the feature.

1. Implement a `diag_view` function that uses strides to make a view. The function would use subscripts in a way very similar to `einsum`, except that no commas are allowed and all indices appearing on one side of `->` must also appear on the other side. Like the current `einsum`, indices on the right-hand side of `->` cannot be repeated. For example, `B = diag_view('iij->ij', A)` returns a 2D view `B` of the 3D array `A` where the off-diagonal elements in the first two dimensions of `A` are inaccessible in `B`.

2. The edits to `einsum` itself should be minimal. For the purpose of the following, suppose that the subscripts have the form `lhs+'->'+rhs`, where `lhs` and `rhs` are character strings. To make sure that the current behavior of `einsum` is neither slowed down nor broken by the new functionality, I intend to limit edits to the point where an error would currently be raised due to repeated indices in `rhs`. The following outlines what would replace the current error-raising.

2.1. Extract from `rhs` the first occurrence of each index; call that `rhs_first_oc`.

2.2. If no `out` has been provided to `einsum`, allocate a zeroed-out `ndarray` of the appropriate size, including off-diagonal entries; call that `full_out`. If an `out` was provided to `einsum`, set `full_out = out`.

2.3. Set `diag_out = diag_view(rhs+'->'+rhs_first_oc, full_out)`.

2.4. Call `einsum(lhs+'->'+rhs_first_oc, [...], out=diag_out)`. This call is recursive, but the recursion should stop there.

2.5. Return `full_out`.

Note that if an `out` is provided to `einsum`, the off-diagonal entries are not zeroed out. This should be a documented "feature" of `einsum`.

A disadvantage of this approach is that the subscripts are parsed 2-4 times, depending how you count. However, for a large `ndarray` the bottleneck won't be there anyway.

Thanks again!

Pierre-André Noël

On 08/15/2014 03:09 PM, Eelco Hoogendoorn wrote:
> here is a snippet I extracted from a project with similar aims
> (integrating the functionality of einsum and numexpr, actually)
>
> Not much to it, but in case someone needs a reminder on how to use
> striding tricks: http://pastebin.com/kQNySjcj
> [...]

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
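A minimal sketch of step 1 of the plan above, assuming an `as_strided`-based implementation with a deliberately simplified parser (error handling omitted; only the name `diag_view` comes from the plan):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def diag_view(subscripts, arr):
    # Repeated lhs subscripts are collapsed onto their diagonal by
    # summing the corresponding strides into a single output axis.
    lhs, rhs = subscripts.split('->')
    assert arr.ndim == len(lhs) and set(lhs) == set(rhs)
    shape, strides = [], []
    for c in rhs:
        axes = [i for i, l in enumerate(lhs) if l == c]
        assert len({arr.shape[i] for i in axes}) == 1  # equal lengths
        shape.append(arr.shape[axes[0]])
        strides.append(sum(arr.strides[i] for i in axes))
    return as_strided(arr, shape=shape, strides=strides)

A = np.arange(27).reshape(3, 3, 3)
print(diag_view('iij->ij', A))  # row i is A[i, i, :]
v2 = np.arange(16).reshape(4, 4)
assert np.array_equal(diag_view('ii->i', v2), np.diag(v2))
```

The same view, being writable for a writable input, is what step 2.3 would hand to einsum as `diag_out`.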
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jorisvandenbossche at gmail.com Mon Aug 18 07:18:21 2014
From: jorisvandenbossche at gmail.com (Joris Van den Bossche)
Date: Mon, 18 Aug 2014 13:18:21 +0200
Subject: [Numpy-discussion] Docs website down?
Message-ID: 

It seems the docs website of numpy and scipy (http://docs.scipy.org/doc/) is down. Is anyone looking at this? There is even already a stackoverflow question about it.

Best regards,
Joris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From arman.eshaghi at gmail.com Tue Aug 19 04:18:28 2014
From: arman.eshaghi at gmail.com (Arman Eshaghi)
Date: Tue, 19 Aug 2014 12:48:28 +0430
Subject: [Numpy-discussion] Scipy.org has been unaccessible from UK and US
Message-ID: 

Hi everyone, I apologize if this is not the right place to report this: for some reason docs.scipy.org, including the numpy documentation, has been unavailable since yesterday. I have tried to access it from servers in the UK and US, and the problem seems to be on the scipy servers. scipy.org works fine though. Just wanted to give a heads up.

Thanks
Arman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From dave.hirschfeld at gmail.com Wed Aug 20 04:39:05 2014
From: dave.hirschfeld at gmail.com (Dave Hirschfeld)
Date: Wed, 20 Aug 2014 08:39:05 +0000 (UTC)
Subject: [Numpy-discussion] Website down!
Message-ID: 

It seems that the docs website is down?
http://docs.scipy.org/doc/

-Dave

From charlesr.harris at gmail.com Sun Aug 17 17:37:07 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 17 Aug 2014 15:37:07 -0600
Subject: [Numpy-discussion] What to do about vdot?
Message-ID: 

Hi All,

I've moved the cblas implementations of the dot and inner functions from _dotblas down into multiarray. The only reason to retain the _dotblas file at this point is the vdot function. I think that vdot belongs in the linalg module, but another option is to make it part of multiarray, where it sort of complements the inner function. Opinions?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From crist042 at umn.edu Wed Aug 20 21:34:05 2014 From: crist042 at umn.edu (James Crist) Date: Wed, 20 Aug 2014 20:34:05 -0500 Subject: [Numpy-discussion] Best way to broadcast a function from C Message-ID: All, I have a C function func that takes in scalar arguments, and an array of fixed dimension that is modified in place to provide the output. The prototype is something like: `void func(double a, double b, double c, double *arr);` I've wrapped this in Cython and called it from python with no problem. What I'd like to do now though is get it set up to broadcast over the input arguments and return a 3 dimensional array of the results. By this I mean a = array([1, 2, 3]) b = array([2.0, 3.0, 4.0]) c = array([3, 4, 5]) func(a, b, c) -> a 3d array containing the results of func for (a, b, c) = (1, 2.0, 3), (2, 3.0, 4), (3, 4.0, 5) I'm not sure if this would qualify as a ufunc, as the result of one function call isn't a scalar but an array, but the effect I'm looking for is similar. Ideally it would handle datatype conversions (in the above `a` and `c` aren't double, but `func` takes in double). It would also be awesome to allow an argument to be a scalar and not an array, and have it be broadcast as if it were. I'm just wondering what the best way for me to hook my code up to the internals of numpy and get this kind of behavior in an efficient way. I've read the "writing your own ufunc" part of the docs, but am unsure if what I'm looking for qualifies. Note that I can change the inner workings of `func` if this is required to achieve this behavior. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Thu Aug 21 01:24:22 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 20 Aug 2014 22:24:22 -0700 Subject: [Numpy-discussion] Proposed new feature for numpy.einsum: repeated output subscripts as diagonal In-Reply-To: <53F4A1F1.6070305@gmail.com> References: <53ED08FA.7040407@gmail.com> <1408110803.20638.18.camel@sebastian-t440> <1408114906.20638.20.camel@sebastian-t440> <53F4A1F1.6070305@gmail.com> Message-ID: On Wed, Aug 20, 2014 at 6:26 AM, Pierre-Andre Noel < noel.pierre.andre at gmail.com> wrote: > Thanks all for the feedback! > > So there appears to be interest for this feature, and I think that I can > implement it. However, it may take a while before I do so: I have other > priorities right now. > > In view of jaimefrio's comment on > https://github.com/numpy/numpy/issues/4965 as well as Eelco Hoogendoorn's > reply above, here is how I currently intend to implement the feature. > > 1. Implement a `diag_view` function that uses strides to make a view. The > function would use subscripts in a way very similar to `einsum`, except > that no commas are allowed and all indices appearing on one side of `->` > must also appear on the other side. Like the current `einsum`, indices on > the right-hand side of `->` cannot be repeated. For example, > `B=diag_view('iij->ij',A)` returns a 2D view `B` of the 3D array `A` where > the off-diagonal elements in the first two dimensions of `A` are > inaccessible in `B`. > > 2. The edits to `einsum` itself should be minimal. For the purpose of the > following, suppose that the indices have the form `lhs+'->'+rhs`, where > `lhs` and `rhs` are character strings. 
To make sure that the current > behavior of `einsum` is not slowed down nor broken by the new > functionality, I intend to limit edits to the point where an error would be > raised due to repeated indices in `rhs`. The following outlines what would > replace the current error-raising. > > 2.1 Extract from `rhs` the first occurrences of each indices; call > that `rhs_first_oc`. > > 2.2 If no `out` has been provided to `einsum`, allocate a zeroed out > `ndarray` of appropriate size, including off-diagonal entries; call that > `full_out`. If an `out` was provided to `einsum`, set `full_out=out`. > > 2.3 Set `diag_out=diag_view(rhs+'->'+rhs_first_oc,full_out)`. > > 2.4 Call `einsum(lhs+'->'+rhs_first_oc, [...], out=diag_out)`. This > call is recursive, but the recursion should stop there. > > 2.5 Return `full_out`. > I have looked a little into this, and I think there is an additional complication: if I understood the structure of the code correctly, `einsum`'s current entry point is the function `array_einsum` in `multiarraymodule.c`, which accepts two different input methods: the subscript one we have been discussing here, and another one that uses lists of axes after each operand. This second method gets translated into subscript notation by several functions in that same module: `einsum_sub_op_from_str`, `einsum_list_to_subscripts` and `einsum_sub_op_from_lists`, and then the C API einsum function, `PyArray_EinsteinSum` in `einsum.c.src`, which only understands the subscript notation, gets called. The simplest place to implement the changes you propose without any major rearchitecturing is therefore in `PyArray_EinsteinSum`. And while the flow you propose seems to me be correct, doing that at the C level will probably look somewhat different, e.g. you would probably let the iterator create an array with all the axes, and then remove the repeated ones from the iterator and modify the strides, instead of passing in a strided view with fewer axes. If you were planning on writing your code in a Python wrapper, you need to figure out how to keep the alternative syntax code path. Haven't given it much thought, but it doesn't look easy without rewriting a lot of stuff. I see either solution as way too much complication for the reward. And still see writing a function that does the opposite of your `diag_view`, and expecting the end user to chain a call to it to the call to einsum, as the simplest way of providing this functionality. Although if you can find the time and the motivation to do the big change, I am perfectly OK with it, of course! Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaakko.luttinen at aalto.fi Thu Aug 21 07:58:30 2014 From: jaakko.luttinen at aalto.fi (Jaakko Luttinen) Date: Thu, 21 Aug 2014 14:58:30 +0300 Subject: [Numpy-discussion] ANN: BayesPy 0.2 Message-ID: <53F5DEE6.5050204@aalto.fi> Dear all, I am pleased to announce the release of BayesPy version 0.2. BayesPy provides tools for Bayesian inference in Python. In particular, it implements variational message passing framework, which enables modular and efficient way to construct models and perform approximate posterior inference. Download: https://pypi.python.org/pypi/bayespy/ Documentation: http://www.bayespy.org Repository: https://github.com/bayespy/bayespy Comments, feedback and contributions are welcome. 
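For a flavor of the modular model construction described above, here is a minimal Gaussian example in the style of the project's introductory documentation (the node and method names are recalled from the 0.2-era docs and should be treated as assumptions, not a verified listing):

```python
import numpy as np
from bayespy.nodes import GaussianARD, Gamma
from bayespy.inference import VB

# Unknown mean and precision of a 1-D Gaussian, learned by variational
# message passing from ten observations.
data = np.random.normal(5, 10, size=(10,))
mu = GaussianARD(0, 1e-6)               # vague prior on the mean
tau = Gamma(1e-6, 1e-6)                 # vague prior on the precision
y = GaussianARD(mu, tau, plates=(10,))  # one plate per observation
y.observe(data)

Q = VB(y, mu, tau)    # variational Bayesian inference engine
Q.update(repeat=20)   # iterate the message-passing updates
```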
Best regards, Jaakko From xabart at gmail.com Thu Aug 21 15:26:20 2014 From: xabart at gmail.com (Xavier Barthelemy) Date: Thu, 21 Aug 2014 20:26:20 +0100 Subject: [Numpy-discussion] Numpy/Fortran puzzle (?) In-Reply-To: References: Message-ID: Hi Andrea You should add a dimension argument in your Fortran code, also you should write a f2py header in the same Fortran code. Remember, numpy memory is C order wise. You can specify in numpy the ordering of the matrices you pass when you create them. F2py automatically deals with matrices , but tends to mix dimensions when there are too many matrices. Manual declaration of dimensions should do the trick Xavier On 21/08/2014 2:07 am, "Andrea Gavana" wrote: > Hi All, > > I have the following (very ugly) line of code: > > all_results = np.asarray([transm_hist[date_idx, :, idx_main_set[date_idx] > ]*main_flow[date_idx, 0:n_fluids] for date_idx in xrange(n_dates)]) > > where transm_hist.shape = (n_dates, n_fluids, n_nodes), main_flow.shape = > (n_dates, n_fluids) and idx_main_set is an array containing integer indices > with idx_main_set.shape = (n_dates, ) . The resulting variable > all_results.shape = (n_dates, n_fluids) > > Since that line of code is relatively slow if done repeatedly, I thought > I'd be smart to rewrite it in Fortran and then use f2py to wrap the > subroutine. So I wrote this: > > subroutine matmul(transm_hist, idx_main_set, main_flow, all_results, & > n_dates, n_fluids, n_nodes) > > implicit none > > integer ( kind = 4 ), intent(in) :: n_dates, n_fluids, n_nodes > > real ( kind = 4 ), intent(in) :: transm_hist(n_dates, n_fluids, > n_nodes) > real ( kind = 4 ), intent(in) :: main_flow(n_dates, n_fluids) > integer ( kind = 4 ), intent(in) :: idx_main_set(n_dates) > real ( kind = 4 ), intent(out):: all_results(n_dates, n_fluids) > > integer (kind = 4) i, node > > do i = 1, n_dates > node = int(idx_main_set(i)) > all_results(i, :) = transm_hist(i, 1:n_fluids, node)*main_flow(i, > 1:n_fluids) > enddo > > end > > > Unfortunately, it appears that I am not getting out quite the same > results... I know it's a bit of a stretch with so little information, but > does anyone have a suggestion on where the culprit might be? Maybe the > elementwise multiplication is done differently in Numpy and Fortran, or I > am misunderstanding what the np.asarray is doing with the list > comprehension above? > > I appreciate any suggestion, which can also be related to improvement in > the code. Thank you in advance. > > Andrea. > > "Imagination Is The Only Weapon In The War Against Reality." > http://www.infinity77.net > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Aug 21 17:50:03 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 21 Aug 2014 22:50:03 +0100 Subject: [Numpy-discussion] Best way to broadcast a function from C In-Reply-To: References: Message-ID: On Thu, Aug 21, 2014 at 2:34 AM, James Crist wrote: > All, > > I have a C function func that takes in scalar arguments, and an array of > fixed dimension that is modified in place to provide the output. The > prototype is something like: > > `void func(double a, double b, double c, double *arr);` > > I've wrapped this in Cython and called it from python with no problem. 
What > I'd like to do now though is get it set up to broadcast over the input > arguments and return a 3 dimensional array of the results. By this I mean > > a = array([1, 2, 3]) > b = array([2.0, 3.0, 4.0]) > c = array([3, 4, 5]) > > func(a, b, c) -> a 3d array containing the results of func for (a, b, c) = > (1, 2.0, 3), (2, 3.0, 4), (3, 4.0, 5) > > I'm not sure if this would qualify as a ufunc, as the result of one function > call isn't a scalar but an array, but the effect I'm looking for is similar. > Ideally it would handle datatype conversions (in the above `a` and `c` > aren't double, but `func` takes in double). It would also be awesome to > allow an argument to be a scalar and not an array, and have it be broadcast > as if it were. > > I'm just wondering what the best way for me to hook my code up to the > internals of numpy and get this kind of behavior in an efficient way. I've > read the "writing your own ufunc" part of the docs, but am unsure if what > I'm looking for qualifies. Note that I can change the inner workings of > `func` if this is required to achieve this behavior. I don't think it's currently possible to write a ufunc that maps scalars to fixed-length arrays. There's probably some not-too-terrible way to do this using the C nditer interface, but IIRC the docs aren't for that aren't very helpful. The simplest approach is just to do a bit of work at the Python level using the standard numpy API to do error-checking and set up your arrays in the way you want, and then pass them to the C implementation. This is not the best way to get a Right solution that will handle all the funky corner cases that ufuncs handle, but it's by far the fastest way to get something that's good enough for whatever you need Right Now. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From lists at onerussian.com Thu Aug 21 22:07:38 2014 From: lists at onerussian.com (Yaroslav Halchenko) Date: Thu, 21 Aug 2014 22:07:38 -0400 Subject: [Numpy-discussion] Just FYI: numpy-vbench was moved to another box, benchmarks are re-estimating Message-ID: <20140822020738.GI8145@onerussian.com> I have no stats on either anyone is looking at http://yarikoptic.github.io/numpy-vbench besides me at times, so I might be just crying into the wild: I have moved running of numpy-vbench on a bit newer/more powerful box, and that is why benchmark results are being reestimated (thus you might still find some spikes and more noise than before for a bit longer). It is a dual Quad-Core AMD Opteron(tm) Processor 2384 with 32GB of RAM, so if someone is keen on pushing the benchmarks limits -- be my guest ;) -- Yaroslav O. Halchenko, Ph.D. http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org Research Scientist, Psychological and Brain Sciences Dept. Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755 Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419 WWW: http://www.linkedin.com/in/yarik From Nicolas.Rougier at inria.fr Fri Aug 22 09:20:54 2014 From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier) Date: Fri, 22 Aug 2014 15:20:54 +0200 Subject: [Numpy-discussion] np.unique with structured arrays Message-ID: <70ACF181-CA5A-4775-ADC1-B6C5E82032BF@inria.fr> Hello, I've found a strange behavior or I'm missing something obvious (or np.unique is not supposed to work with structured arrays). I'm trying to extract unique values from a simple structured array but it does not seem to work as expected. 
Here is a minimal script showing the problem: import numpy as np V = np.zeros(4, dtype=[("v", np.float32, 3)]) V["v"] = [ [0.5, 0.0, 1.0], [0.5, -1.e-16, 1.0], # [0.5, +1.e-16, 1.0] works [0.5, 0.0, -1.0], [0.5, -1.e-16, -1.0]] # [0.5, +1.e-16, -1.0]] works V_ = np.zeros_like(V) V_["v"][:,0] = V["v"][:,0].round(decimals=3) V_["v"][:,1] = V["v"][:,1].round(decimals=3) V_["v"][:,2] = V["v"][:,2].round(decimals=3) print np.unique(V_) [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],) ([0.5, -0.0, 1.0],) ([0.5, -0.0, -1.0],)] While I would have expected: [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],)] Can anyone confirm ? Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Aug 22 10:22:38 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 22 Aug 2014 07:22:38 -0700 Subject: [Numpy-discussion] np.unique with structured arrays In-Reply-To: <70ACF181-CA5A-4775-ADC1-B6C5E82032BF@inria.fr> References: <70ACF181-CA5A-4775-ADC1-B6C5E82032BF@inria.fr> Message-ID: I can confirm, the issue seems to be in sorting: >>> np.sort(V_) array([([0.5, 0.0, 1.0],), ([0.5, 0.0, -1.0],), ([0.5, -0.0, 1.0],), ([0.5, -0.0, -1.0],)], dtype=[('v', ' wrote: > > Hello, > > I've found a strange behavior or I'm missing something obvious (or > np.unique is not supposed to work with structured arrays). > > I'm trying to extract unique values from a simple structured array but it > does not seem to work as expected. > Here is a minimal script showing the problem: > > import numpy as np > > V = np.zeros(4, dtype=[("v", np.float32, 3)]) > V["v"] = [ [0.5, 0.0, 1.0], > [0.5, -1.e-16, 1.0], # [0.5, +1.e-16, 1.0] works > [0.5, 0.0, -1.0], > [0.5, -1.e-16, -1.0]] # [0.5, +1.e-16, -1.0]] works > V_ = np.zeros_like(V) > V_["v"][:,0] = V["v"][:,0].round(decimals=3) > V_["v"][:,1] = V["v"][:,1].round(decimals=3) > V_["v"][:,2] = V["v"][:,2].round(decimals=3) > > print np.unique(V_) > [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],) ([0.5, -0.0, 1.0],) ([0.5, -0.0, > -1.0],)] > > > While I would have expected: > > [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],)] > > > Can anyone confirm ? > > > Nicolas > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Fri Aug 22 10:52:07 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 22 Aug 2014 16:52:07 +0200 Subject: [Numpy-discussion] np.unique with structured arrays In-Reply-To: <70ACF181-CA5A-4775-ADC1-B6C5E82032BF@inria.fr> References: <70ACF181-CA5A-4775-ADC1-B6C5E82032BF@inria.fr> Message-ID: <53f758f0.c90cc30a.0d14.06c0@mx.google.com> It does not sound like an issue with unique, but rather like a matter of floating point equality and representation. Do the ' identical' elements pass an equality test? -----Original Message----- From: "Nicolas P. Rougier" Sent: ?22-?8-?2014 15:21 To: "Discussion of Numerical Python" Subject: [Numpy-discussion] np.unique with structured arrays Hello, I've found a strange behavior or I'm missing something obvious (or np.unique is not supposed to work with structured arrays). I'm trying to extract unique values from a simple structured array but it does not seem to work as expected. 
Here is a minimal script showing the problem: import numpy as np V = np.zeros(4, dtype=[("v", np.float32, 3)]) V["v"] = [ [0.5, 0.0, 1.0], [0.5, -1.e-16, 1.0], # [0.5, +1.e-16, 1.0] works [0.5, 0.0, -1.0], [0.5, -1.e-16, -1.0]] # [0.5, +1.e-16, -1.0]] works V_ = np.zeros_like(V) V_["v"][:,0] = V["v"][:,0].round(decimals=3) V_["v"][:,1] = V["v"][:,1].round(decimals=3) V_["v"][:,2] = V["v"][:,2].round(decimals=3) print np.unique(V_) [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],) ([0.5, -0.0, 1.0],) ([0.5, -0.0, -1.0],)] While I would have expected: [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],)] Can anyone confirm ? Nicolas -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Fri Aug 22 10:54:50 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 22 Aug 2014 16:54:50 +0200 Subject: [Numpy-discussion] np.unique with structured arrays In-Reply-To: References: <70ACF181-CA5A-4775-ADC1-B6C5E82032BF@inria.fr> Message-ID: <53f75993.eaeec20a.7dd4.0356@mx.google.com> Oh yeah this could be. Floating point equality and bitwise equality are not the same thing. -----Original Message----- From: "Jaime Fern?ndez del R?o" Sent: ?22-?8-?2014 16:22 To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] np.unique with structured arrays I can confirm, the issue seems to be in sorting: >>> np.sort(V_) array([([0.5, 0.0, 1.0],), ([0.5, 0.0, -1.0],), ([0.5, -0.0, 1.0],), ([0.5, -0.0, -1.0],)], dtype=[('v', ' wrote: Hello, I've found a strange behavior or I'm missing something obvious (or np.unique is not supposed to work with structured arrays). I'm trying to extract unique values from a simple structured array but it does not seem to work as expected. Here is a minimal script showing the problem: import numpy as np V = np.zeros(4, dtype=[("v", np.float32, 3)]) V["v"] = [ [0.5, 0.0, 1.0], [0.5, -1.e-16, 1.0], # [0.5, +1.e-16, 1.0] works [0.5, 0.0, -1.0], [0.5, -1.e-16, -1.0]] # [0.5, +1.e-16, -1.0]] works V_ = np.zeros_like(V) V_["v"][:,0] = V["v"][:,0].round(decimals=3) V_["v"][:,1] = V["v"][:,1].round(decimals=3) V_["v"][:,2] = V["v"][:,2].round(decimals=3) print np.unique(V_) [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],) ([0.5, -0.0, 1.0],) ([0.5, -0.0, -1.0],)] While I would have expected: [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],)] Can anyone confirm ? Nicolas _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaime.frio at gmail.com Fri Aug 22 13:43:35 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 22 Aug 2014 10:43:35 -0700 Subject: [Numpy-discussion] np.unique with structured arrays In-Reply-To: <53f75993.eaeec20a.7dd4.0356@mx.google.com> References: <70ACF181-CA5A-4775-ADC1-B6C5E82032BF@inria.fr> <53f75993.eaeec20a.7dd4.0356@mx.google.com> Message-ID: structured arrays are of VOID dtype, but with a non-None names attribute: >>> V_.dtype.num 20 >>> V_.dtype.names ('v',) >>> V_.view(np.void).dtype.num 20 >>> V_.view(np.void).dtype.names >>> The comparison function uses the STRING comparison function if names is None, or a proper field by field comparison if not, see here: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/arraytypes.c.src#L2675 With a quick look at the source, the only fishy thing I see is that the original array has the sort axis moved to the end of the shape tuple, and is then copied into a contiguous array here: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/item_selection.c#L1151 But that new array should preserve the dtype unchanged, and hence the right compare function should be called. If no one with a better understanding of the internals spots it, I will try to further debug it over the weekend. Jaime On Fri, Aug 22, 2014 at 7:54 AM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Oh yeah this could be. Floating point equality and bitwise equality are > not the same thing. > ------------------------------ > From: Jaime Fern?ndez del R?o > Sent: ?22-?8-?2014 16:22 > > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] np.unique with structured arrays > > I can confirm, the issue seems to be in sorting: > > >>> np.sort(V_) > array([([0.5, 0.0, 1.0],), ([0.5, 0.0, -1.0],), ([0.5, -0.0, 1.0],), > ([0.5, -0.0, -1.0],)], > dtype=[('v', ' > These I think are handled by the generic sort functions, and it looks like > the comparison function being used is the one for a VOID dtype with no > fields, so it is being done byte-wise, hence the problems with 0.0 and > -0.0. Not sure where exactly the bug is, though... > > Jaime > > > > On Fri, Aug 22, 2014 at 6:20 AM, Nicolas P. Rougier < > Nicolas.Rougier at inria.fr> wrote: > >> >> Hello, >> >> I've found a strange behavior or I'm missing something obvious (or >> np.unique is not supposed to work with structured arrays). >> >> I'm trying to extract unique values from a simple structured array but it >> does not seem to work as expected. >> Here is a minimal script showing the problem: >> >> import numpy as np >> >> V = np.zeros(4, dtype=[("v", np.float32, 3)]) >> V["v"] = [ [0.5, 0.0, 1.0], >> [0.5, -1.e-16, 1.0], # [0.5, +1.e-16, 1.0] works >> [0.5, 0.0, -1.0], >> [0.5, -1.e-16, -1.0]] # [0.5, +1.e-16, -1.0]] works >> V_ = np.zeros_like(V) >> V_["v"][:,0] = V["v"][:,0].round(decimals=3) >> V_["v"][:,1] = V["v"][:,1].round(decimals=3) >> V_["v"][:,2] = V["v"][:,2].round(decimals=3) >> >> print np.unique(V_) >> [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],) ([0.5, -0.0, 1.0],) ([0.5, -0.0, >> -1.0],)] >> >> >> While I would have expected: >> >> [([0.5, 0.0, 1.0],) ([0.5, 0.0, -1.0],)] >> >> >> Can anyone confirm ? >> >> >> Nicolas >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. 
Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From crist042 at umn.edu Fri Aug 22 19:40:06 2014 From: crist042 at umn.edu (James Crist) Date: Fri, 22 Aug 2014 18:40:06 -0500 Subject: [Numpy-discussion] Best way to broadcast a function from C In-Reply-To: References: Message-ID: I suspected as much. This is actually part of my work on numerical evaluation in SymPy. In its current state compilation to C and autowrapping *works*, but I think it could definitely be more versatile/efficient. Since numpy seemed to have solved the broadcasting/datatype issues internally I hoped I could reuse this. I'll look into nditer (as this is for a library, it's better to do it *right* than do it now), but right now it's looking like I may need to implement this functionality myself... Thanks, -Jim On Thu, Aug 21, 2014 at 4:50 PM, Nathaniel Smith wrote: > On Thu, Aug 21, 2014 at 2:34 AM, James Crist wrote: > > All, > > > > I have a C function func that takes in scalar arguments, and an array of > > fixed dimension that is modified in place to provide the output. The > > prototype is something like: > > > > `void func(double a, double b, double c, double *arr);` > > > > I've wrapped this in Cython and called it from python with no problem. > What > > I'd like to do now though is get it set up to broadcast over the input > > arguments and return a 3 dimensional array of the results. By this I mean > > > > a = array([1, 2, 3]) > > b = array([2.0, 3.0, 4.0]) > > c = array([3, 4, 5]) > > > > func(a, b, c) -> a 3d array containing the results of func for (a, b, c) > = > > (1, 2.0, 3), (2, 3.0, 4), (3, 4.0, 5) > > > > I'm not sure if this would qualify as a ufunc, as the result of one > function > > call isn't a scalar but an array, but the effect I'm looking for is > similar. > > Ideally it would handle datatype conversions (in the above `a` and `c` > > aren't double, but `func` takes in double). It would also be awesome to > > allow an argument to be a scalar and not an array, and have it be > broadcast > > as if it were. > > > > I'm just wondering what the best way for me to hook my code up to the > > internals of numpy and get this kind of behavior in an efficient way. > I've > > read the "writing your own ufunc" part of the docs, but am unsure if what > > I'm looking for qualifies. Note that I can change the inner workings of > > `func` if this is required to achieve this behavior. > > I don't think it's currently possible to write a ufunc that maps > scalars to fixed-length arrays. > > There's probably some not-too-terrible way to do this using the C > nditer interface, but IIRC the docs aren't for that aren't very > helpful. > > The simplest approach is just to do a bit of work at the Python level > using the standard numpy API to do error-checking and set up your > arrays in the way you want, and then pass them to the C > implementation. This is not the best way to get a Right solution that > will handle all the funky corner cases that ufuncs handle, but it's by > far the fastest way to get something that's good enough for whatever > you need Right Now. > > -n > > -- > Nathaniel J. 
Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Fri Aug 22 20:50:26 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 22 Aug 2014 17:50:26 -0700 Subject: [Numpy-discussion] Best way to broadcast a function from C In-Reply-To: References: Message-ID: You can always write your own gufunc with signature '(),(),()->(a, a)', and write a Python wrapper that always call it with an `out=` parameter of shape (..., 3, 3), something along the lines of: def my_wrapper(a, b, c, out=None): if out is None: out = np.empty(np.broadcast(a,b,c).shape + (3, 3)) if out.shape[-2:] != (3, 3): raise ValueError("Wrong shape for 'out'") return my_gufunc(a, b, c, out=out) Writing your own gufunc is a little challenging, but not that hard with all the examples now available in the numpy.linalg code base, and it is a terribly powerful tool. Jaime On Fri, Aug 22, 2014 at 4:40 PM, James Crist wrote: > I suspected as much. This is actually part of my work on numerical > evaluation in SymPy. In its current state compilation to C and autowrapping > *works*, but I think it could definitely be more versatile/efficient. Since > numpy seemed to have solved the broadcasting/datatype issues internally I > hoped I could reuse this. > > I'll look into nditer (as this is for a library, it's better to do it > *right* than do it now), but right now it's looking like I may need to > implement this functionality myself... > > Thanks, > > -Jim > > > On Thu, Aug 21, 2014 at 4:50 PM, Nathaniel Smith wrote: > >> On Thu, Aug 21, 2014 at 2:34 AM, James Crist wrote: >> > All, >> > >> > I have a C function func that takes in scalar arguments, and an array of >> > fixed dimension that is modified in place to provide the output. The >> > prototype is something like: >> > >> > `void func(double a, double b, double c, double *arr);` >> > >> > I've wrapped this in Cython and called it from python with no problem. >> What >> > I'd like to do now though is get it set up to broadcast over the input >> > arguments and return a 3 dimensional array of the results. By this I >> mean >> > >> > a = array([1, 2, 3]) >> > b = array([2.0, 3.0, 4.0]) >> > c = array([3, 4, 5]) >> > >> > func(a, b, c) -> a 3d array containing the results of func for (a, b, >> c) = >> > (1, 2.0, 3), (2, 3.0, 4), (3, 4.0, 5) >> > >> > I'm not sure if this would qualify as a ufunc, as the result of one >> function >> > call isn't a scalar but an array, but the effect I'm looking for is >> similar. >> > Ideally it would handle datatype conversions (in the above `a` and `c` >> > aren't double, but `func` takes in double). It would also be awesome to >> > allow an argument to be a scalar and not an array, and have it be >> broadcast >> > as if it were. >> > >> > I'm just wondering what the best way for me to hook my code up to the >> > internals of numpy and get this kind of behavior in an efficient way. >> I've >> > read the "writing your own ufunc" part of the docs, but am unsure if >> what >> > I'm looking for qualifies. Note that I can change the inner workings of >> > `func` if this is required to achieve this behavior. 
>> >> I don't think it's currently possible to write a ufunc that maps >> scalars to fixed-length arrays. >> >> There's probably some not-too-terrible way to do this using the C >> nditer interface, but IIRC the docs aren't for that aren't very >> helpful. >> >> The simplest approach is just to do a bit of work at the Python level >> using the standard numpy API to do error-checking and set up your >> arrays in the way you want, and then pass them to the C >> implementation. This is not the best way to get a Right solution that >> will handle all the funky corner cases that ufuncs handle, but it's by >> far the fastest way to get something that's good enough for whatever >> you need Right Now. >> >> -n >> >> -- >> Nathaniel J. Smith >> Postdoctoral researcher - Informatics - University of Edinburgh >> http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Aug 22 21:42:52 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 23 Aug 2014 02:42:52 +0100 Subject: [Numpy-discussion] Best way to broadcast a function from C In-Reply-To: References: Message-ID: On Sat, Aug 23, 2014 at 12:40 AM, James Crist wrote: > I suspected as much. This is actually part of my work on numerical > evaluation in SymPy. In its current state compilation to C and autowrapping > *works*, but I think it could definitely be more versatile/efficient. Since > numpy seemed to have solved the broadcasting/datatype issues internally I > hoped I could reuse this. > > I'll look into nditer (as this is for a library, it's better to do it > *right* than do it now), but right now it's looking like I may need to > implement this functionality myself... Ah, I see. Right, if this isn't a one-off for some specific project then disregard my advice from before :-). nditer might be perfect -- or might not. If you figure it out then we'd certainly appreciate any additions to the docs that you feel up to contributing! And similarly, we'd be happy to merge any enhancements you come up with back into numpy itself, e.g. to allow gufunc signatures to handle fixed-size output arrays. (This would also be useful for other cases, e.g. np.cross, which is currently not a gufunc but it would be nice if it were...) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From cleo21drakos at gmail.com Sat Aug 23 02:14:54 2014 From: cleo21drakos at gmail.com (Cleo Drakos) Date: Sat, 23 Aug 2014 15:14:54 +0900 Subject: [Numpy-discussion] Changing the numpy array into required shape Message-ID: Hello numpy users: I have 2d numpy array of 480 rows and 1440 columns as named by 'data' below: The first element belongs to (49.875S,179.875W), the second element belongs to (49.625S,179.625W),and the last element belongs to (49.875N,179.875E). 
import os, glob, gdal, numpy as np fname = '3B42RT.2014010606.7.bin' with open(fname, 'rb') as fi: fi.seek(2880,0) data = np.fromfile(fi,dtype=np.uint16,count=480*1440) data = data.byteswap() data = data.reshape(1440,480) How can I convert this numpy array so that its first element belongs to (49.875N,179.625W), i.e., upper left latitude and longitude respectively; and the last element belong to (49.625S,179.875E), i.e., lower right latitute and longitude respectively. I tried to rotate it, but I do not think it is correct. data = np.rot90(data,1) Have some of you experienced with this type of problem? The binary file I am using is here: ftp://trmmopen.gsfc.nasa.gov/pub/merged/3B42RT/3B42RT.2014010606.7.bin.gz cleo -------------- next part -------------- An HTML attachment was scrubbed... URL: From john_ladasky at sbcglobal.net Sat Aug 23 03:02:55 2014 From: john_ladasky at sbcglobal.net (John Ladasky) Date: Sat, 23 Aug 2014 00:02:55 -0700 Subject: [Numpy-discussion] Changing the numpy array into required shape In-Reply-To: References: Message-ID: <53F83C9F.8070202@sbcglobal.net> On 08/22/2014 11:14 PM, Cleo Drakos wrote: > > How can I convert this numpy array so that its first element belongs > to (49.875N,179.625W), i.e., upper left latitude and longitude > respectively; and the last element belong to (49.625S,179.875E), i.e., > lower right latitute and longitude respectively. > > I tried to rotate it, but I do not think it is correct. > I think that you want to use the numpy function flipud(). From nadavh at visionsense.com Sat Aug 23 03:30:55 2014 From: nadavh at visionsense.com (Nadav Horesh) Date: Sat, 23 Aug 2014 07:30:55 +0000 Subject: [Numpy-discussion] Changing the numpy array into required shape In-Reply-To: References: Message-ID: Replace data = data.byteswap() By data = data.byteswap()[::-1] Nadav On 23 Aug 2014 09:15, Cleo Drakos wrote: Hello numpy users: I have 2d numpy array of 480 rows and 1440 columns as named by 'data' below: The first element belongs to (49.875S,179.875W), the second element belongs to (49.625S,179.625W), and the last element belongs to (49.875N,179.875E). import os, glob, gdal, numpy as np fname = '3B42RT.2014010606.7.bin' with open(fname, 'rb') as fi: fi.seek(2880,0) data = np.fromfile(fi,dtype=np.uint16,count=480*1440) data = data.byteswap() data = data.reshape(1440,480) How can I convert this numpy array so that its first element belongs to (49.875N,179.625W), i.e., upper left latitude and longitude respectively; and the last element belong to (49.625S,179.875E), i.e., lower right latitute and longitude respectively. I tried to rotate it, but I do not think it is correct. data = np.rot90(data,1) Have some of you experienced with this type of problem? The binary file I am using is here:ftp://trmmopen.gsfc.nasa.gov/pub/merged/3B42RT/3B42RT.2014010606.7.bin.gz cleo -------------- next part -------------- An HTML attachment was scrubbed... URL: From t.b.poole at gmail.com Sun Aug 24 16:05:45 2014 From: t.b.poole at gmail.com (Tom Poole) Date: Sun, 24 Aug 2014 21:05:45 +0100 Subject: [Numpy-discussion] Weighted Covariance/correlation In-Reply-To: <1408110393.20638.16.camel@sebastian-t440> References: <1408110393.20638.16.camel@sebastian-t440> Message-ID: <196A73F5-AD29-42F0-875B-AAE170FCC6D7@gmail.com> Hi all, Any input to this? Last time it generated a fair bit of discussion, which I?ll summarise here. 
It's currently possible to calculate a weighted average using np.average, but the corresponding functionality does not exist for (co)variance or corrcoef calculations. In this case it's less straightforward, and we need to worry about what type of information the weights contain.

Repeat type weights are the easiest to explain. Here the variances of [x1, x2, x3] with weights [2, 1, 3] and of [x1, x1, x2, x3, x3, x3] are identical. For the Bessel correction the total number of samples is obtained by summing the weights. These weights do not have to be integer, and in this case the only important assumption is that their sum represents the total sample size.

The second type of weights are importances or accuracies. Here the weights represent the relative strength of contributions from each of the associated samples. Because this is a purely relative relation, there's no concrete information about the total number of samples. This has to be obtained from the effective sample size, given by (sum(weights)^2)/sum(weights^2).

I think the clearest way of providing both options is to have a boolean switch indicating if the weights represent repeat/frequency type information. I can't immediately see a good motivation for allowing both concurrently, and think this could cause confusion.

Tom

On 15 Aug 2014, at 14:46, Sebastian Berg wrote:

> Hi all,
>
> Tom Poole has opened pull request
> https://github.com/numpy/numpy/pull/4960 to implement weights into
> np.cov (correlation can be added), somewhat picking up the effort
> started by Noel Dawe in https://github.com/numpy/numpy/pull/3864.
>
> The pull request would currently implement an accuracy type `weights`
> keyword argument as default, but have a switch `repeat_weights` to use
> repeat type weights instead (frequency type are a special case of this
> I think).
>
> As far as I can see, the code is in a state that it can be tested. But
> since it is a new feature, the names/defaults are up for discussion, so
> maybe someone who might use such a feature has a preference. I know we
> had a short discussion about this before, but it was a while ago. For
> example another option would be to have the two weights as two keyword
> arguments, instead of a boolean switch.
>
> Regards,
>
> Sebastian
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From adrian.altenhoff at inf.ethz.ch Mon Aug 25 03:02:16 2014
From: adrian.altenhoff at inf.ethz.ch (Adrian Altenhoff)
Date: Mon, 25 Aug 2014 09:02:16 +0200
Subject: [Numpy-discussion] Bug in genfromtxt with usecols and converters
Message-ID: <53FADF78.6080507@inf.ethz.ch>

Hi,

I tried to load data from a csv file into numpy using genfromtxt. I need only a subset of the columns and want to apply some conversions to the data. Attached is a minimal script showing the error. In brief, I want to load columns 1, 2 and 4. But in the converter function for the 4th column, I get the 3rd value. The issue does not occur if I also load the 3rd column. Did I somehow misunderstand how the function is supposed to work, or is this indeed a bug? I'm using python 3.3.1 with numpy 1.8.1.

Regards
Adrian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 492 bytes
Desc: not available
URL: 

From derek at astro.physik.uni-goettingen.de Tue Aug 26 10:26:32 2014
From: derek at astro.physik.uni-goettingen.de (Derek Homeier)
Date: Tue, 26 Aug 2014 16:26:32 +0200
Subject: [Numpy-discussion] Bug in genfromtxt with usecols and converters
In-Reply-To: <53FADF78.6080507@inf.ethz.ch>
References: <53FADF78.6080507@inf.ethz.ch>
Message-ID: <8E2101C9-98D0-42FA-94D4-BD1FE530E858@astro.physik.uni-goettingen.de>

Hi Adrian,

> I tried to load data from a csv file into numpy using genfromtxt. I need
> only a subset of the columns and want to apply some conversions to the
> data. Attached is a minimal script showing the error.
> In brief, I want to load columns 1, 2 and 4. But in the converter
> function for the 4th column, I get the 3rd value. The issue does not
> occur if I also load the 3rd column.
> Did I somehow misunderstand how the function is supposed to work, or is
> this indeed a bug?

not sure whether to call it a bug; the error seems to arise before reading any actual data (even on reading from an empty string); when genfromtxt is checking the filling_values used to substitute missing or invalid data it is apparently testing on default testing values of 1 or -1 which your conversion scheme does not know about. Although I think it is rather the user's responsibility to provide valid converters, probably the documentation should at least be updated to make them aware of this requirement.
I see two possible fixes/workarounds: provide a keyword argument

filling_values=[0, 0, '1:1']

or add the default filling values to your relEnum dictionary, e.g.

{ ..., '-1': -1, '1': -1 }

Could you check if this works for your case?

HTH,
Derek

From adrian.altenhoff at inf.ethz.ch Tue Aug 26 12:21:30 2014
From: adrian.altenhoff at inf.ethz.ch (Adrian Altenhoff)
Date: Tue, 26 Aug 2014 18:21:30 +0200
Subject: [Numpy-discussion] Bug in genfromtxt with usecols and converters
In-Reply-To: <8E2101C9-98D0-42FA-94D4-BD1FE530E858@astro.physik.uni-goettingen.de>
References: <53FADF78.6080507@inf.ethz.ch> <8E2101C9-98D0-42FA-94D4-BD1FE530E858@astro.physik.uni-goettingen.de>
Message-ID: <53FCB40A.1040003@inf.ethz.ch>

Hi Derek,

thanks for your answer.
> not sure whether to call it a bug; the error seems to arise before reading any actual data
> (even on reading from an empty string); when genfromtxt is checking the filling_values used
> to substitute missing or invalid data it is apparently testing on default testing values of 1 or -1
> which your conversion scheme does not know about. Although I think it is rather the user's
> responsibility to provide valid converters, probably the documentation should at least be
> updated to make them aware of this requirement.
> I see two possible fixes/workarounds: provide a keyword argument
>
> filling_values=[0, 0, '1:1']
This workaround seems to work, but I doubt that the actual problem is the converter function I pass. The '-1' which is used as the testing value is the first_values entry from the 3rd column (line 1574 in npyio.py), but the converter is defined for column 4. By setting the filling_values to an array of length 3, this obviously makes the problem disappear. But I think if the first row is used, it should also use the values from the column for which the converter is defined.
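A minimal sketch reproducing the class of failure discussed in this thread; the data layout and `relEnum` mapping are invented stand-ins, since the actual test.py attachment was scrubbed from the archive:

```python
import numpy as np
from io import BytesIO

# Invented stand-in for the scrubbed test.py: the real column layout and
# relEnum mapping may differ.
relEnum = {'1:1': 0, '1:n': 1, 'n:1': 2}

data = BytesIO(b"a\t10\t-1\t1:1\n"
               b"b\t20\t-1\t1:n\n")

# usecols drops column 2, but the converter is registered for column 3.
# On numpy 1.8 the filling-value check probed the converter with column
# 2's first value ('-1'), raising KeyError: '-1'; passing filling_values
# explicitly (Derek's workaround) sidesteps that probe.
out = np.genfromtxt(data, delimiter='\t', usecols=(0, 1, 3), dtype=None,
                    converters={3: lambda rel: relEnum[rel.decode()]},
                    filling_values=[0, 0, '1:1'])
print(out)
```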
Best
Adrian

From derek at astro.physik.uni-goettingen.de Tue Aug 26 12:56:47 2014
From: derek at astro.physik.uni-goettingen.de (Derek Homeier)
Date: Tue, 26 Aug 2014 18:56:47 +0200
Subject: [Numpy-discussion] Bug in genfromtxt with usecols and converters
In-Reply-To: <53FCB40A.1040003@inf.ethz.ch>
References: <53FADF78.6080507@inf.ethz.ch> <8E2101C9-98D0-42FA-94D4-BD1FE530E858@astro.physik.uni-goettingen.de> <53FCB40A.1040003@inf.ethz.ch>
Message-ID: 

Hi Adrian,

>> not sure whether to call it a bug; the error seems to arise before reading any actual data
>> (even on reading from an empty string); when genfromtxt is checking the filling_values used
>> to substitute missing or invalid data it is apparently testing on default testing values of 1 or -1
>> which your conversion scheme does not know about. Although I think it is rather the user's
>> responsibility to provide valid converters, probably the documentation should at least be
>> updated to make them aware of this requirement.
>> I see two possible fixes/workarounds: provide a keyword argument
>>
>> filling_values=[0, 0, '1:1']
> This workaround seems to work, but I doubt that the actual problem is
> the converter function I pass. The '-1' which is used as the testing
> value is the first_values entry from the 3rd column (line 1574 in
> npyio.py), but the converter is defined for column 4. By setting the
> filling_values to an array of length 3, this obviously makes the
> problem disappear. But I think if the first row is used, it should
> also use the values from the column for which the converter is defined.

it is certainly related to the converter function, because a KeyError for the dictionary you provide is raised:

  File "test.py", line 13, in <module>
    3: lambda rel: relEnum[rel.decode()]})
  File "/sw/lib/python3.4/site-packages/numpy/lib/npyio.py", line 1581, in genfromtxt
    missing_values=missing_values[i],)
  File "/sw/lib/python3.4/site-packages/numpy/lib/_iotools.py", line 784, in update
    tester = func(testing_value or asbytes('1'))
  File "test.py", line 13, in <lambda>
    3: lambda rel: relEnum[rel.decode()]})
KeyError: '-1'

But you are right that the problem with using the first_values, which should of course be valid, somehow stems from the use of usecols: it seems that in the loop

for (i, conv) in user_converters.items():

i in user_converters and in usecols get out of sync. This certainly looks like a bug; the entire way of modifying i inside the loop appears a bit dangerous to me. I'll have a look if I can make this safer.

As long as your data don't actually contain any missing values, you might also simply use np.loadtxt.

Cheers,
Derek

From adrian.altenhoff at inf.ethz.ch Tue Aug 26 15:05:55 2014
From: adrian.altenhoff at inf.ethz.ch (Adrian Altenhoff)
Date: Tue, 26 Aug 2014 21:05:55 +0200
Subject: [Numpy-discussion] Bug in genfromtxt with usecols and converters
In-Reply-To: 
References: <53FADF78.6080507@inf.ethz.ch> <8E2101C9-98D0-42FA-94D4-BD1FE530E858@astro.physik.uni-goettingen.de> <53FCB40A.1040003@inf.ethz.ch>
Message-ID: <53FCDA93.7090700@inf.ethz.ch>

Hi Derek,

> But you are right that the problem with using the first_values, which should of course be valid,
> somehow stems from the use of usecols: it seems that in the loop
>
> for (i, conv) in user_converters.items():
>
> i in user_converters and in usecols get out of sync. This certainly looks like a bug; the entire way of
> modifying i inside the loop appears a bit dangerous to me. I'll have a look if I can make this safer.
Thanks.
> As long as your data don't actually contain any missing values, you might also simply use np.loadtxt.
Ok, wasn't aware of that function so far. I will try that!

Best wishes
Adrian

From scopatz at gmail.com Tue Aug 26 16:27:48 2014
From: scopatz at gmail.com (Anthony Scopatz)
Date: Tue, 26 Aug 2014 15:27:48 -0500
Subject: [Numpy-discussion] Return item rather than scalar for user defined types
Message-ID: 

Hello All,

Yesterday I opened PR #4889 to solve a problem I have been having w.r.t. xdress, and Nathaniel asked me to bring the issue up here. The PR itself is quite small (6 lines?) and is easy to review. The opening text of my PR is pasted below because I believe that it is a pretty good description of the issue.

But briefly, pulling user defined dtypes out of an array does not behave idiomatically, because you get a numpy scalar rather than a more representative Python object. For user-defined dtypes - which are typically more complex and possibly stateful than the builtin dtypes - I believe that it makes much more sense to get the actual Python representation back, a la the getitem() function. In fact, I think that this case also applies to the object dtype. However, changing that usage would likely break downstream code and would be inconsistent with how other builtin types are returned. In future major versions of numpy it would be ideal if the dtypes themselves could flag how they wished to be returned - either as a scalar or as the Python item.

Thoughts?

Be Well
Anthony

This updates what is effectively the __getitem__() method. For arrays whose dtype is a user defined type, you receive the return of that dtype's getitem() rather than a numpy scalar of the dtype. This allows the custom type to present a single Python API as well as an associated dtype. It also prevents users from having to subclass ndarray to get the appropriate behaviour.

For example, suppose that we have a dtype representing a C++ std::vector<int> and we had a numpy array of this dtype. From Python, it might look like:

>>> arr
array([array([0, 0, 0, 0, 0], dtype=int32),
       array([0, 1, 2, 3, 4], dtype=int32),
       array([0, 2, 4, 6, 8], dtype=int32)], dtype='xd_vector_int')

Without this PR, you'd have to do the following to access the most deeply nested elements:

>>> arr.item(2)[4]
8

This is because you cannot index a scalar:

>>> arr[2][4]
IndexError: invalid index to scalar variable

With this PR, the idiomatic expression is now allowable because arr[2] is the associated Python type:

>>> arr[2][4]
8

This is a pretty big deal for xdress, which creates many custom dtypes and provides a Python interface into those. See xdress/xdress#265 for what prompted this.

Thanks for considering!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From derek at astro.physik.uni-goettingen.de Wed Aug 27 08:11:32 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Wed, 27 Aug 2014 14:11:32 +0200 Subject: [Numpy-discussion] Bug in genfromtxt with usecols and converters In-Reply-To: <53FCDA93.7090700@inf.ethz.ch> References: <53FADF78.6080507@inf.ethz.ch> <8E2101C9-98D0-42FA-94D4-BD1FE530E858@astro.physik.uni-goettingen.de> <53FCB40A.1040003@inf.ethz.ch> <53FCDA93.7090700@inf.ethz.ch> Message-ID: <1EB811D8-710F-4997-BEF9-19385AE948D4@astro.physik.uni-goettingen.de> On 26 Aug 2014, at 09:05 pm, Adrian Altenhoff wrote: >> But you are right that the problem with using the first_values, which should of course be valid, >> somehow stems from the use of usecols, it seems that in that loop >> >> for (i, conv) in user_converters.items(): >> >> i in user_converters and in usecols get out of sync. This certainly looks like a bug, the entire way of >> modifying i inside the loop appears a bit dangerous to me. I?ll have look if I can make this safer. > Thanks. >> >> As long as your data don?t actually contain any missing values you might also simply use np.loadtxt. > Ok, wasn't aware of that function so far. I will try that! > It was first_values that needs to be addressed by the original indices. I have created a short test from your case and submitted a fix at https://github.com/numpy/numpy/pull/5006 Cheers, Derek From dphinnstuart at gmail.com Wed Aug 27 11:08:43 2014 From: dphinnstuart at gmail.com (phinn stuart) Date: Thu, 28 Aug 2014 00:08:43 +0900 Subject: [Numpy-discussion] Convert 3d NumPy array into 2d Message-ID: Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array into (480L, 1440L)? Thanks in the advance. phinn -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sebastian.Wagner.fl at ait.ac.at Wed Aug 27 11:12:05 2014 From: Sebastian.Wagner.fl at ait.ac.at (Wagner Sebastian) Date: Wed, 27 Aug 2014 15:12:05 +0000 Subject: [Numpy-discussion] Convert 3d NumPy array into 2d In-Reply-To: References: Message-ID: Hi, Our short example-data: >>> np.arange(10).reshape(1,5,2) array([[[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]]) Shape is (1,5,2) Two possibilies: >>> data.reshape(5,2) array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) Or just: >>> data[0] array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of phinn stuart Sent: Mittwoch, 27. August 2014 17:09 To: python-list at python.org; scipy-user at scipy.org; numpy-discussion at scipy.org Subject: [Numpy-discussion] Convert 3d NumPy array into 2d Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array into (480L, 1440L)? Thanks in the advance. phinn -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Aug 27 11:15:54 2014 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 27 Aug 2014 11:15:54 -0400 Subject: [Numpy-discussion] Convert 3d NumPy array into 2d In-Reply-To: References: Message-ID: There is also np.squeeze(), which will eliminate any singleton dimensions (but I personally hate using it because it can accidentally squeeze out dimensions that you didn't intend to squeeze when you have arbitrary input data). 
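A short sketch of that pitfall: a size-1 axis that carries real data disappears just like a packaging axis, whereas explicit indexing states which axis to drop:

```python
import numpy as np

batch = np.ones((1, 480, 1440))   # leading 1 is just packaging
single_row = np.ones((1, 1440))   # leading 1 is one real row of data

print(np.squeeze(batch).shape)       # (480, 1440) -- intended
print(np.squeeze(single_row).shape)  # (1440,)     -- a real axis vanished

print(batch[0].shape)                # (480, 1440) -- drops exactly one axis
```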
Ben Root On Wed, Aug 27, 2014 at 11:12 AM, Wagner Sebastian < Sebastian.Wagner.fl at ait.ac.at> wrote: > Hi, > > > > Our short example-data: > > >>> np.arange(10).reshape(1,5,2) > > array([[[0, 1], > > [2, 3], > > [4, 5], > > [6, 7], > > [8, 9]]]) > > Shape is (1,5,2) > > > > Two possibilies: > > >>> data.reshape(5,2) > > array([[0, 1], > > [2, 3], > > [4, 5], > > [6, 7], > > [8, 9]]) > > > > Or just: > > >>> data[0] > > array([[0, 1], > > [2, 3], > > [4, 5], > > [6, 7], > > [8, 9]]) > > > > > > *From:* numpy-discussion-bounces at scipy.org [mailto: > numpy-discussion-bounces at scipy.org] *On Behalf Of *phinn stuart > *Sent:* Mittwoch, 27. August 2014 17:09 > *To:* python-list at python.org; scipy-user at scipy.org; > numpy-discussion at scipy.org > *Subject:* [Numpy-discussion] Convert 3d NumPy array into 2d > > > > Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array into > (480L, 1440L)? > > > > Thanks in the advance. > > > > phinn > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Aug 27 11:16:19 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 27 Aug 2014 17:16:19 +0200 Subject: [Numpy-discussion] Convert 3d NumPy array into 2d In-Reply-To: References: Message-ID: <53FDF643.4050408@googlemail.com> On 27.08.2014 17:08, phinn stuart wrote: > Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array into > (480L, 1440L)? > > Thanks in the advance. np.squeeze removes empty dimensions: In [2]: np.squeeze(np.ones((1,23,232))).shape Out[2]: (23, 232) From sebastian at sipsolutions.net Wed Aug 27 11:16:35 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 27 Aug 2014 16:16:35 +0100 Subject: [Numpy-discussion] Convert 3d NumPy array into 2d In-Reply-To: References: Message-ID: <1409152595.13186.0.camel@sebastian-t440> On Do, 2014-08-28 at 00:08 +0900, phinn stuart wrote: > Hi everyone, how can I convert (1L, 480L, 1440L) shaped numpy array > into (480L, 1440L)? > Just slice it arr[0, ...] will do the trick. If you are daring, np.squeeze also works, or of course np.reshape. - Sebastian > > Thanks in the advance. > > phinn > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jaime.frio at gmail.com Wed Aug 27 12:44:58 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 27 Aug 2014 09:44:58 -0700 Subject: [Numpy-discussion] Should concatenate broadcast shapes? 
Message-ID:

After reading this stackoverflow question:

http://stackoverflow.com/questions/25530223/append-a-list-at-the-end-of-each-row-of-2d-array

I was reminded that the `np.concatenate` family of functions does not broadcast the shapes of their inputs:

>>> import numpy as np
>>> a = np.arange(6).reshape(3, 2)
>>> b = np.arange(6, 8)
>>> np.concatenate((a, b), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: all the input arrays must have same number of dimensions
>>> np.concatenate((a, b[None]), axis=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: all the input array dimensions except for the concatenation axis
must match exactly
>>> np.concatenate((a, np.tile(b[None], (a.shape[0], 1))), axis=1)
array([[0, 1, 6, 7],
       [2, 3, 6, 7],
       [4, 5, 6, 7]])

But there doesn't seem to be any fundamental reason why they shouldn't:

>>> from numpy.lib.stride_tricks import as_strided
>>> b_ = as_strided(b, (a.shape[0],)+b.shape, (0,)+b.strides)
>>> np.concatenate((a, b_), axis=1)
array([[0, 1, 6, 7],
       [2, 3, 6, 7],
       [4, 5, 6, 7]])

Is there any fundamental interface design reason why things are the way they are? Or is it simply that no one has implemented broadcasting for these functions? Without thinking much about it, I am +1 on doing this... At the least, it would probably be good to add a note to the docs explaining why broadcasting is not implemented.

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

From robert.kern at gmail.com Wed Aug 27 13:01:15 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 27 Aug 2014 18:01:15 +0100
Subject: [Numpy-discussion] Should concatenate broadcast shapes?
In-Reply-To: References:
Message-ID:

On Wed, Aug 27, 2014 at 5:44 PM, Jaime Fernández del Río wrote:
> I was reminded that the `np.concatenate` family of functions does not
> broadcast the shapes of their inputs

In my experience, when I get that ValueError, it has usually been a legitimate error on my part and broadcasting would not have accomplished what I wanted. Typically, I forgot to transpose something. If we allowed broadcasting, my most common source of errors using these functions would silently do something unintended.

a = np.arange(6).reshape(3, 2)
b = np.arange(6, 9)  # b.shape == (3,)
# I *intend* to append b as a new column, but forget to make b.shape == (3, 1)
c = np.hstack([a, b])
# If hstack() doesn't broadcast, that will fail and show me my error.
# If it does broadcast, it "succeeds" but gives me something I didn't want:
array([[0, 1, 6, 7, 8],
       [2, 3, 6, 7, 8],
       [4, 5, 6, 7, 8]])

--
Robert Kern

From jaime.frio at gmail.com Wed Aug 27 13:02:52 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 27 Aug 2014 10:02:52 -0700
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
Message-ID:

A request was opened in github to add a `merge` function to numpy that would merge two sorted 1d arrays into a single sorted 1d array. I have been playing around with that idea for a while, and have a branch in my numpy fork that adds a `mergesorted` function to `numpy.lib`:

https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a

I drew inspiration from C++ STL algorithms, and merged into a single function what merge, set_union, set_intersection, set_difference and set_symmetric_difference do there.

My first thought when implementing this was to not make it a public function, but use it under the hood to speed up some of the functions of `arraysetops.py`, which are now merging two already sorted arrays by doing `np.sort(np.concatenate((a, b)))`. I would need to revisit my testing, but the speed-ups weren't that great.

One other thing I saw value in for some of the `arraysetops.py` functions, but couldn't fully figure out, was in providing extra output aside from the merged arrays, either in the form of indices, or of boolean masks, indicating which items of the original arrays made it into the merged one, and/or where they ended up in it.

Since there is at least one other person out there that likes it, is there any more interest in such a function? If yes, any comments on what the proper interface for extra output should be? Although perhaps the best is to leave that out for starters and see what use people make of it, if any.

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

From jtaylor.debian at googlemail.com Wed Aug 27 13:07:24 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Wed, 27 Aug 2014 19:07:24 +0200
Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available
Message-ID: <53FE104C.2020006@googlemail.com>

Hello,

Almost punctually for EuroScipy we have finally managed to release the first release candidate of NumPy 1.9. We intend to only fix bugs until the final release, which we plan to do in the next 1-2 weeks.

In this release numerous performance improvements have been added. Most significantly, the indexing code has been rewritten to be several times faster for most cases, and the performance of using small arrays and scalars has almost doubled. Plenty of other functions have been improved too; nonzero, where, count_nonzero, floating point min/max, boolean argmin/argmax, searchsorted, triu/tril and masked sorting can be expected to perform significantly better in many cases.

Also, NumPy now releases the GIL for more functions. Most notably, the indexing now releases it, and the random module's state object has a private lock instead of using the GIL. This allows leveraging pure python threads more efficiently.

In order to make working with arrays containing NaN values easier, nanmedian and nanpercentile have been added, which ignore these values.
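For example (a quick illustration, not taken from the release notes):

import numpy as np

a = np.array([[1., np.nan, 3.],
              [4., 5., np.nan]])

np.nanmedian(a)                  # 3.5, the NaNs are ignored
np.nanpercentile(a, 50, axis=1)  # array([ 2. ,  4.5]), per-row medians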
These functions and the regular median and percentile now also support the generalized axis arguments that ufuncs already have; these allow reducing along multiple axes in one call.

Please see the release notes for all the details. Please also take note of the many small compatibility notes and deprecations in the notes.
https://github.com/numpy/numpy/blob/maintenance/1.9.x/doc/release/1.9.0-notes.rst

The source tarballs and win32 binaries can be downloaded here:
https://sourceforge.net/projects/numpy/files/NumPy/1.9.0rc1

Cheers,
Julian Taylor

From jaime.frio at gmail.com Wed Aug 27 13:12:59 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 27 Aug 2014 10:12:59 -0700
Subject: [Numpy-discussion] Should concatenate broadcast shapes?
In-Reply-To: References:
Message-ID:

On Wed, Aug 27, 2014 at 10:01 AM, Robert Kern wrote:
> In my experience, when I get that ValueError, it has usually been a
> legitimate error on my part and broadcasting would not have
> accomplished what I wanted.

That makes sense, I kind of figured there had to be a reason. So though it may be beating a dead horse, perhaps adding a `broadcast=False` argument to the function would do the trick? No side effects unless you ask for them, in which case you had it coming...

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

From hoogendoorn.eelco at gmail.com Wed Aug 27 13:35:40 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Wed, 27 Aug 2014 19:35:40 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References:
Message-ID:

It wouldn't hurt to have this function, but my intuition is that its use will be minimal. If you are already working with sorted arrays, you already have a flop cost of that order of magnitude, and the optimized merge saves you a factor of two at the very most. Using numpy means you are sacrificing factors of two and beyond relative to pure C left, right and center anyway, so if this kind of thing matters to you, you probably won't be working in numpy in the first place.
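For reference, a vectorized merge of two sorted arrays is already expressible with existing primitives, which bounds what a dedicated C implementation can add; a rough sketch (the function name and the tie handling are illustrative only):

import numpy as np

def merge_sorted(a, b):
    # insertion positions of b's elements within the merged output
    idx = np.searchsorted(a, b) + np.arange(len(b))
    out = np.empty(len(a) + len(b), dtype=np.result_type(a, b))
    out[idx] = b
    mask = np.ones(len(out), dtype=bool)
    mask[idx] = False
    out[mask] = a    # the remaining slots are filled by a, in order
    return out

merge_sorted(np.array([1, 3, 5, 7]), np.array([2, 3, 6]))
# array([1, 2, 3, 3, 5, 6, 7])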
That said, I share your interest in overhauling arraysetops. I see many opportunities for expanding its functionality. There is a question that amounts to 'how do I do group-by in numpy' on stackoverflow almost every week. That would have my top priority; but also things like extending np.unique to graph edges, or other more complex input, are very often useful to me.

I've written up a draft a while ago which accomplishes all of the above and more. It reimplements functions like np.unique around a common Index object. This index object encapsulates the precomputation (sorting) required for efficient set-ops on different datatypes, and provides a common interface to obtain the kind of information you are talking about (which is used extensively internally in the implementation of group_by, for instance).

i.e., this functionality allows you to write neat things like

group_by(randint(0,9,(100,2))).median(rand(100))

But I have the feeling much more could be done in this direction, and I feel this draft could really use a bit of back and forth. If we are going to completely rewrite arraysetops, we might as well do it right.
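For comparison, the same computation in plain numpy requires collapsing the composite keys to scalar ids first (a sketch; the packing trick assumes the key entries are below 9):

import numpy as np
from numpy.random import randint, rand

keys = randint(0, 9, (100, 2))   # composite keys: each row is one key
vals = rand(100)

# collapse each row to a single scalar id so np.unique can group them
ids = keys[:, 0] * 9 + keys[:, 1]
uniq, inv = np.unique(ids, return_inverse=True)
medians = np.array([np.median(vals[inv == i]) for i in range(len(uniq))])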
From jaime.frio at gmail.com Wed Aug 27 14:29:27 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 27 Aug 2014 11:29:27 -0700
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References:
Message-ID:

Hi Eelco,

I took a deeper look into your code a couple of weeks back. I don't think I have fully grasped what it allows completely, but I agree that some form of what you have there is highly desirable. Along the same lines, for some time I have been thinking that the right place for a `groupby` in numpy is as a method of ufuncs, so that `np.add.groupby(arr, groups)` would do a multidimensional version of `np.bincount(groups, weights=arr)`. You would then need a more powerful version of `np.unique` to produce the `groups`, but that is something that Joe Kington's old PR was very close to achieving, and it should probably be resurrected as well. But yes, there seems to be material for a NEP here, and some guidance from one of the numpy devs would be helpful in getting this somewhere.
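Something along these lines can already be emulated today; a minimal sketch of the 1d case (the `np.add.groupby` spelling itself is hypothetical):

import numpy as np

groups = np.array([0, 1, 0, 2, 1, 0])
arr = np.array([1., 2., 3., 4., 5., 6.])

np.bincount(groups, weights=arr)        # array([ 10.,   7.,   4.])

# the same via argsort + reduceat, which generalizes to any reducing ufunc
order = np.argsort(groups, kind='mergesort')   # stable sort keeps group order
g = groups[order]
starts = np.flatnonzero(np.r_[True, g[1:] != g[:-1]])
np.add.reduceat(arr[order], starts)     # array([ 10.,   7.,   4.])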
Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

From hoogendoorn.eelco at gmail.com Wed Aug 27 15:27:06 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Wed, 27 Aug 2014 21:27:06 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References:
Message-ID:

If I understand you correctly, the current implementation supports these operations. All reductions over groups (except for median) are performed through the corresponding ufunc (see GroupBy.reduce). This works on multidimensional arrays as well, although this broadcasting over the non-grouping axes is accomplished using np.vectorize. Actual vectorization only happens over the axis being grouped over, but this is usually a long axis. If it isn't, it is more efficient to perform a reduction by means of splitting the array by its groups first, and then map the iterable of groups over some reduction operation (as noted in the docstring of GroupBy.reduce).
From hoogendoorn.eelco at gmail.com Wed Aug 27 15:29:49 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Wed, 27 Aug 2014 21:29:49 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References:
Message-ID:

f.i., this works as expected as well (100 keys of 1d int arrays and 100 values of 1d float arrays):

group_by(randint(0,4,(100,2))).mean(rand(100,2))
From hoogendoorn.eelco at gmail.com Wed Aug 27 15:38:47 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Wed, 27 Aug 2014 21:38:47 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References:
Message-ID:

i.e., if the grouped axis is small but the other axes are not, you could write this, which avoids the python loop over the long axis that np.vectorize would otherwise perform:

import numpy as np
from grouping import group_by

keys = np.random.randint(0,4,10)
values = np.random.rand(10,2000)
for k,g in zip(*group_by(keys)(values)):
    print k, g.mean(0)
From jaime.frio at gmail.com Wed Aug 27 17:32:37 2014
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Wed, 27 Aug 2014 14:32:37 -0700
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References:
Message-ID:

Yes, I was aware of that. But the point would be to provide true vectorization on those operations. The way I see it, numpy may not have to have a GroupBy implementation, but it should at least enable implementing one that is fast and efficient over any axis.
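For what it's worth, `ufunc.reduceat` can already be pointed along an arbitrary axis; a rough sketch of a grouped sum over axis 1:

import numpy as np

values = np.random.rand(5, 1000)                 # grouping along the long axis
keys = np.sort(np.random.randint(0, 4, 1000))    # contiguous groups
starts = np.flatnonzero(np.r_[True, keys[1:] != keys[:-1]])

sums = np.add.reduceat(values, starts, axis=1)   # shape (5, number_of_groups)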
Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.

From orion at cora.nwra.com Wed Aug 27 17:52:18 2014
From: orion at cora.nwra.com (Orion Poplawski)
Date: Wed, 27 Aug 2014 15:52:18 -0600
Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available
In-Reply-To: <53FE104C.2020006@googlemail.com>
References: <53FE104C.2020006@googlemail.com>
Message-ID: <53FE5312.6070808@cora.nwra.com>

On 08/27/2014 11:07 AM, Julian Taylor wrote:
> Almost punctually for EuroScipy we have finally managed to release the
> first release candidate of NumPy 1.9.

I'm seeing the following errors from setup.py:

non-existing path in 'numpy/f2py': 'docs'
non-existing path in 'numpy/f2py': 'f2py.1'
non-existing path in 'numpy/lib': 'benchmarks'

It would be nice if f2py.1 was installed in /usr/share/man/man1/f2py.1.
Filed as https://github.com/numpy/numpy/issues/5010

--
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA, Boulder/CoRA Office             FAX: 303-415-9702
3380 Mitchell Lane                       orion at nwra.com
Boulder, CO 80301                   http://www.nwra.com

From davidmenhur at gmail.com Wed Aug 27 19:11:43 2014
From: davidmenhur at gmail.com (Daπid)
Date: Thu, 28 Aug 2014 01:11:43 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References:
Message-ID:

On 27 August 2014 19:02, Jaime Fernández del Río wrote:
> Since there is at least one other person out there that likes it, is there
> any more interest in such a function? If yes, any comments on what the
> proper interface for extra output should be? Although perhaps the best is
> to leave that out for starters and see what use people make of it, if any.

I think a perhaps more useful thing would be to implement timsort. I understand it is capable of taking full advantage of partially sorted arrays, with the extra safety of not making the assumption that the individual arrays are sorted. This will also be useful for other real-world cases where the data is already partially sorted.

From hoogendoorn.eelco at gmail.com Wed Aug 27 19:49:19 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Thu, 28 Aug 2014 01:49:19 +0200
Subject: [Numpy-discussion] Does a `mergesorted` function make sense?
In-Reply-To: References:
Message-ID:

I just checked the docs on ufuncs, and it appears that's a solved problem now, since ufunc.reduceat now comes with an axis argument. Or maybe it already did when I wrote that, but I simply wasn't paying attention. Either way, the code is fully vectorized now, in both grouped and non-grouped axes. It's a lot of code, but all that happens for a grouping, other than some O(1) and O(n) stuff, is an argsort of the keys, and then the reduction itself, all fully vectorized.

Note that I sort the values first, and then use ufunc.reduceat on the groups. It would seem to me that ufunc.at should be more efficient, by avoiding this indirection, but testing very much revealed the opposite, for reasons unclear to me. Perhaps that's changed now as well.
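A minimal sketch of the two approaches being compared, for a grouped sum:

import numpy as np

keys = np.random.randint(0, 10, 100000)
vals = np.random.rand(100000)

# unbuffered scatter-add: no sorting needed, but historically slow
out_at = np.zeros(10)
np.add.at(out_at, keys, vals)

# argsort + reduceat: sort once, then one contiguous reduction per group
order = np.argsort(keys)
k = keys[order]
starts = np.flatnonzero(np.r_[True, k[1:] != k[:-1]])
out_reduceat = np.add.reduceat(vals[order], starts)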
>>>>> >>>>> _______________________________________________
>>>>> NumPy-Discussion mailing list
>>>>> NumPy-Discussion at scipy.org
>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>>
>>>>>
>>>>
>>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
> (\__/)
> ( O.o)
> ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
> de dominación mundial.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com Thu Aug 28 01:00:23 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 27 Aug 2014 23:00:23 -0600
Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available
In-Reply-To: <53FE5312.6070808@cora.nwra.com>
References: <53FE104C.2020006@googlemail.com> <53FE5312.6070808@cora.nwra.com>
Message-ID: 

On Wed, Aug 27, 2014 at 3:52 PM, Orion Poplawski wrote:

> On 08/27/2014 11:07 AM, Julian Taylor wrote:
> > Hello,
> >
> > Almost punctually for EuroScipy we have finally managed to release the
> > first release candidate of NumPy 1.9.
> > We intend to only fix bugs until the final release which we plan to do
> > in the next 1-2 weeks.
>
> I'm seeing the following errors from setup.py:
>
> non-existing path in 'numpy/f2py': 'docs'
> non-existing path in 'numpy/f2py': 'f2py.1'
> non-existing path in 'numpy/lib': 'benchmarks'
>

Hmm, benchmarks is long gone, and the f2py files also no longer exist.
Since this causes no problems on my system, I suspect something else. How
are you doing the install?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mogus.mochena at famu.edu Thu Aug 28 04:19:09 2014
From: mogus.mochena at famu.edu (Mochena, Mogus D.)
Date: Thu, 28 Aug 2014 08:19:09 +0000
Subject: [Numpy-discussion] Unsubscribe
Message-ID: 

Please unsubscribe me from this list.

________________________________
From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] on behalf of Charles R Harris [charlesr.harris at gmail.com]
Sent: Thursday, August 28, 2014 1:00 AM
To: Discussion of Numerical Python
Subject: Re: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available

On Wed, Aug 27, 2014 at 3:52 PM, Orion Poplawski > wrote:
On 08/27/2014 11:07 AM, Julian Taylor wrote:
> Hello,
>
> Almost punctually for EuroScipy we have finally managed to release the
> first release candidate of NumPy 1.9.
> We intend to only fix bugs until the final release which we plan to do
> in the next 1-2 weeks.

I'm seeing the following errors from setup.py:

non-existing path in 'numpy/f2py': 'docs'
non-existing path in 'numpy/f2py': 'f2py.1'
non-existing path in 'numpy/lib': 'benchmarks'

Hmm, benchmarks is long gone, and the f2py files also no longer exist. Since this causes no problems on my system, I suspect something else. How are you doing the install?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From cmkleffner at gmail.com Thu Aug 28 16:08:22 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 28 Aug 2014 22:08:22 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available In-Reply-To: References: <53FE104C.2020006@googlemail.com> <53FE5312.6070808@cora.nwra.com> Message-ID: I put 4 wheels for numpy-1.9.0rc1 on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads . All wheels are compiled with the mingw-w64 compiler and makes use of OpenBLAS (latest github version). The 32 bit versions still have some testing errors on corner cases for atan2 and hypot. Carl 2014-08-28 7:00 GMT+02:00 Charles R Harris : > > > > On Wed, Aug 27, 2014 at 3:52 PM, Orion Poplawski > wrote: > >> On 08/27/2014 11:07 AM, Julian Taylor wrote: >> > Hello, >> > >> > Almost punctually for EuroScipy we have finally managed to release the >> > first release candidate of NumPy 1.9. >> > We intend to only fix bugs until the final release which we plan to do >> > in the next 1-2 weeks. >> >> >> I'm seeing the following errors from setup.py: >> >> >> non-existing path in 'numpy/f2py': 'docs' >> non-existing path in 'numpy/f2py': 'f2py.1' >> >> non-existing path in 'numpy/lib': 'benchmarks' >> > > Hmm, benchmarks is long gone, and the f2py files also no longer exist. > Since this causes no problems on my system, I suspect something else. How > are you doing the install? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Thu Aug 28 20:14:41 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 28 Aug 2014 17:14:41 -0700 Subject: [Numpy-discussion] PR added: frozen dimensions in gufunc signatures Message-ID: Hi, I have just sent a PR (https://github.com/numpy/numpy/pull/5015), adding the possibility of having frozen dimensions in gufunc signatures. As a proof of concept, I have added a `cross1d` gufunc to `numpy.core.umath_tests`: In [1]: import numpy as np In [2]: from numpy.core.umath_tests import cross1d In [3]: cross1d.signature Out[3]: '(3),(3)->(3)' In [4]: a = np.random.rand(1000, 3) In [5]: b = np.random.rand(1000, 3) In [6]: np.allclose(np.cross(a, b), cross1d(a, b)) Out[6]: True In [7]: %timeit np.cross(a, b) 10000 loops, best of 3: 76.1 us per loop In [8]: %timeit cross1d(a, b) 100000 loops, best of 3: 13.1 us per loop In [9]: c = np.random.rand(1000, 2) In [10]: d = np.random.rand(1000, 2) In [11]: cross1d(c, d) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) in () ----> 1 cross1d(c, d) ValueError: cross1d: Operand 0 has a mismatch in its core dimension 0, with gufunc signature (3),(3)->(3) (size 2 is different from 3) The speed up over `np.cross` is nice, and while `np.cross` is not the best of examples, as it needs to handle more sizes, in many cases this will allow producing gufuncs that work without a Python wrapper redoing checks that are best left to the iterator, such as dimension sizes. It still needs tests, but before embarking on fully developing those, I wanted to make sure that there is an interest on this. I would also like to further enhance gufuncs providing computed dimensions, e.g. making it possible to e.g. 
define `pairwise_cross` with signature '(n, 3)->($m, 3)', where the $ indicates that m is a computed dimension, that would have to be calculated by a function passed to the gufunc constructor and stored in the gufunc object, based on the other core dimensions. In this case it would make $m be n*(n-1), so that all pairwise cross products between 3D vectors could be computed. The syntax with '$' is kind of crappy, so any suggestions on how to better express this in the signature are more than welcome, as well as any feedback on the merits (or lack of them) of implementing this. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Aug 28 20:40:06 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 29 Aug 2014 01:40:06 +0100 Subject: [Numpy-discussion] PR added: frozen dimensions in gufunc signatures In-Reply-To: References: Message-ID: On Fri, Aug 29, 2014 at 1:14 AM, Jaime Fern?ndez del R?o wrote: > Hi, > > I have just sent a PR (https://github.com/numpy/numpy/pull/5015), adding the > possibility of having frozen dimensions in gufunc signatures. As a proof of > concept, I have added a `cross1d` gufunc to `numpy.core.umath_tests`: > > In [1]: import numpy as np > In [2]: from numpy.core.umath_tests import cross1d > > In [3]: cross1d.signature > Out[3]: '(3),(3)->(3)' > > In [4]: a = np.random.rand(1000, 3) > In [5]: b = np.random.rand(1000, 3) > > In [6]: np.allclose(np.cross(a, b), cross1d(a, b)) > Out[6]: True > > In [7]: %timeit np.cross(a, b) > 10000 loops, best of 3: 76.1 us per loop > > In [8]: %timeit cross1d(a, b) > 100000 loops, best of 3: 13.1 us per loop > > In [9]: c = np.random.rand(1000, 2) > In [10]: d = np.random.rand(1000, 2) > > In [11]: cross1d(c, d) > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > in () > ----> 1 cross1d(c, d) > > ValueError: cross1d: Operand 0 has a mismatch in its core dimension 0, with > gufunc signature (3),(3)->(3) (size 2 is different from 3) > > The speed up over `np.cross` is nice, and while `np.cross` is not the best > of examples, as it needs to handle more sizes, in many cases this will allow > producing gufuncs that work without a Python wrapper redoing checks that are > best left to the iterator, such as dimension sizes. > > It still needs tests, but before embarking on fully developing those, I > wanted to make sure that there is an interest on this. > > I would also like to further enhance gufuncs providing computed dimensions, > e.g. making it possible to e.g. define `pairwise_cross` with signature '(n, > 3)->($m, 3)', where the $ indicates that m is a computed dimension, that > would have to be calculated by a function passed to the gufunc constructor > and stored in the gufunc object, based on the other core dimensions. In this > case it would make $m be n*(n-1), so that all pairwise cross products > between 3D vectors could be computed. > > The syntax with '$' is kind of crappy, so any suggestions on how to better > express this in the signature are more than welcome, as well as any feedback > on the merits (or lack of them) of implementing this. Some thoughts: When I first saw the PR my first reaction was that maybe we should be allowing more general hooks for a gufunc to choose its core dimensions. 
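In Python terms, such a hook might look something like the sketch
below (hypothetical, purely to illustrate the idea; nothing like the
actual C API in the PR):

def pairwise_cross_core_dims(input_core_shapes):
    # Hypothetical shape hook: given the core shapes of the inputs,
    # return the core shapes of the outputs. For the pairwise-cross
    # example with signature (n,3)->($m,3) it computes m = n*(n-1).
    (n, three), = input_core_shapes  # one input, core shape (n, 3)
    assert three == 3
    return [(n * (n - 1), 3)]  # one output core shape

print(pairwise_cross_core_dims([(4, 3)]))  # [(12, 3)]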
Reading the code convinced me that this is a relatively minimal enhancement over what we're currently doing, so your current PR looks fine to me. But, for your computed dimension idea I'm wondering if what we should do instead is just let a gufunc provide a C callback that looks at the input array dimensions and explicitly says somehow which dimensions it wants to treat as the core dimensions and what its output shapes will be. There's no rule that we have to extend the signature mini-language to be Turing complete, we can just use C :-). It would be good to have a better motivation for computed gufunc dimensions, though. Your "all pairwise cross products" example would be *much* better handled by implementing the .outer method for binary gufuncs: pairwise_cross(a) == cross.outer(a, a). This would make gufuncs more consistent with ufuncs, plus let you do all-pairwise-cross-products between two different sets of cross products, plus give us all-pairwise-matrix-products for free, etc. While you're messing around with the gufunc dimension matching logic, any chance we can tempt you to implement the "optional dimensions" needed to handle '@', solve, etc. elegantly? The rule would be that you can write something like (n?,k),(k,m?)->(n?,m?) and the ? dimensions are allowed to take on an additional value "nothing at all". If there's no dimension available in the input, then we act like it was reshaped to add a dimension with shape 1, and then in the output we squeeze this dimension out again. I guess the rules would be that (1) in the input, you can have ? dimensions at the beginning or the end of your shape, but not both at the same time, (2) any dimension that has a ? in one place must have it in all places, (3) when checking argument conformity, "nothing at all" only matches against "nothing at all", not against 1; this is because if we allowed (n?,m),(n?,m)->(n?,m) to be applied to two arrays with shapes (5,) and (1, 5), then it would be ambiguous whether the output should have shape (5,) or (1, 5). -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From jaime.frio at gmail.com Fri Aug 29 04:55:00 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 29 Aug 2014 01:55:00 -0700 Subject: [Numpy-discussion] PR added: frozen dimensions in gufunc signatures In-Reply-To: References: Message-ID: On Thu, Aug 28, 2014 at 5:40 PM, Nathaniel Smith wrote: > Some thoughts: > > But, for your computed dimension idea I'm wondering if what we should > do instead is just let a gufunc provide a C callback that looks at the > input array dimensions and explicitly says somehow which dimensions it > wants to treat as the core dimensions and what its output shapes will > be. There's no rule that we have to extend the signature mini-language > to be Turing complete, we can just use C :-). > > It would be good to have a better motivation for computed gufunc > dimensions, though. Your "all pairwise cross products" example would > be *much* better handled by implementing the .outer method for binary > gufuncs: pairwise_cross(a) == cross.outer(a, a). This would make > gufuncs more consistent with ufuncs, plus let you do > all-pairwise-cross-products between two different sets of cross > products, plus give us all-pairwise-matrix-products for free, etc. > The outer for binary gufuncs sounds like a good idea. A reduce for binary gufuncs that allow it (like square matrix multiplication) would also be nice. 
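For what it's worth, the intended semantics of such an outer method
can already be sketched with broadcasting (illustrative only, relying
on the broadcasting np.cross rather than an actual gufunc method):

import numpy as np

# All pairwise cross products between two sets of 3-vectors, i.e.
# what a hypothetical cross.outer(a, b) could return.
a = np.random.rand(4, 3)
b = np.random.rand(5, 3)
pairwise = np.cross(a[:, None, :], b[None, :, :])  # shape (4, 5, 3)

assert pairwise.shape == (4, 5, 3)
assert np.allclose(pairwise[1, 2], np.cross(a[1], b[2]))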
But going back to the original question, the pairwise whatevers were just an example: one could come up with several others, e.g.: (m),(n)->($p),($q) with $p = m - n and $q = n - 1, could be (I think) the signature of a polynomial division gufunc (m),(n)->($p), with $p = m - n + 1, could be the signature of a convolution or correlation gufunc (m)->($n), with $n = m / 2, could be some form of downsampling gufunc > While you're messing around with the gufunc dimension matching logic, > any chance we can tempt you to implement the "optional dimensions" > needed to handle '@', solve, etc. elegantly? The rule would be that > you can write something like > (n?,k),(k,m?)->(n?,m?) > and the ? dimensions are allowed to take on an additional value > "nothing at all". If there's no dimension available in the input, then > we act like it was reshaped to add a dimension with shape 1, and then > in the output we squeeze this dimension out again. I guess the rules > would be that (1) in the input, you can have ? dimensions at the > beginning or the end of your shape, but not both at the same time, (2) > any dimension that has a ? in one place must have it in all places, > (3) when checking argument conformity, "nothing at all" only matches > against "nothing at all", not against 1; this is because if we allowed > (n?,m),(n?,m)->(n?,m) to be applied to two arrays with shapes (5,) and > (1, 5), then it would be ambiguous whether the output should have > shape (5,) or (1, 5). > I definitely do not mind taking a look into it. I need to think a little more about the rules to convince myself that there is a consistent set of them that we can use. I also thought there may be a performance concern, that you may want to have different implementations when dimensions are missing, not automatically add a 1 and then remove it. It doesn't seem to be the case with neither `np.dot` nor `np.solve`, so maybe I am being overly cautious. Thanks for your comments and ideas. I have a feeling there are some nice features hidden in here, but I can't seem to figure out what should they be on my own. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Fri Aug 29 08:31:29 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 29 Aug 2014 08:31:29 -0400 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available References: <53FE104C.2020006@googlemail.com> Message-ID: How do I run tests? python setup.py --help-commands claims 'test' is a command, but doesn't seem to work: python setup.py test Running from numpy source directory. /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'test_suite' warnings.warn(msg) usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: setup.py --help [cmd1 cmd2 ...] or: setup.py --help-commands or: setup.py cmd --help -- Those who don't understand recursion are doomed to repeat it From sebastian at sipsolutions.net Fri Aug 29 08:39:08 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 29 Aug 2014 13:39:08 +0100 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available In-Reply-To: References: <53FE104C.2020006@googlemail.com> Message-ID: <1409315948.10779.1.camel@sebastian-t440> On Fr, 2014-08-29 at 08:31 -0400, Neal Becker wrote: > How do I run tests? 
> > python setup.py --help-commands claims 'test' is a command, but doesn't seem to
> work:
>

There is a runtests script you can use, it should do the building, too.
Or just install and then run `np.test()` (or run nosetest manually on
the installed stuff).

- Sebastian

> python setup.py test
> Running from numpy source directory.
> /usr/lib64/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution
> option: 'test_suite'
> warnings.warn(msg)
> usage: setup.py [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
> or: setup.py --help [cmd1 cmd2 ...]
> or: setup.py --help-commands
> or: setup.py cmd --help
>
>
> -- Those who don't understand recursion are doomed to repeat it
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From ndbecker2 at gmail.com Fri Aug 29 08:45:56 2014
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 29 Aug 2014 08:45:56 -0400
Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available
References: <53FE104C.2020006@googlemail.com>
Message-ID: 

doesn't seem to work on fedora 20 x86_64
After python setup.py install --user, import fails:

python
Python 2.7.5 (default, Jun 25 2014, 10:19:55)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
File "", line 1, in
File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/__init__.py", line 153, in
from . import add_newdocs
File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in
from numpy.lib import add_newdoc
File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in
from .type_check import *
File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in
import numpy.core.numeric as _nx
File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/core/__init__.py", line 11, in
from . import numeric
File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 2721, in
_setdef()
File "/home/nbecker/.local/lib/python2.7/site-packages/numpy/core/numeric.py", line 2717, in _setdef
defval = [UFUNC_BUFSIZE_DEFAULT, ERR_DEFAULT2, None]
NameError: global name 'ERR_DEFAULT2' is not defined

--
-- Those who don't understand recursion are doomed to repeat it

From ndbecker2 at gmail.com Fri Aug 29 08:50:12 2014
From: ndbecker2 at gmail.com (Neal Becker)
Date: Fri, 29 Aug 2014 08:50:12 -0400
Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available
References: <53FE104C.2020006@googlemail.com>
Message-ID: 

OK, it's fixed by doing:

rm -rf ~/.local/lib/python2.7/site-packages/numpy*
python setup.py install --user

I guess something was not cleaned out from previous packages

From ben.root at ou.edu Fri Aug 29 09:26:47 2014
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 29 Aug 2014 09:26:47 -0400
Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available
In-Reply-To: 
References: <53FE104C.2020006@googlemail.com>
Message-ID: 

It is generally a good idea when switching between releases to execute "git
clean -fxd" prior to rebuilding. Admittedly, I don't know how cleaning out
that directory in .local could have impacted things. Go figure. Cheers!
Ben Root On Fri, Aug 29, 2014 at 8:50 AM, Neal Becker wrote: > OK, it's fixed by doing: > > rm -rf ~/.local/lib/python2.7/site-packages/numpy* > python setup.py install --user > > I guess something was not cleaned out from previous packages > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From orion at cora.nwra.com Fri Aug 29 12:34:33 2014 From: orion at cora.nwra.com (Orion Poplawski) Date: Fri, 29 Aug 2014 10:34:33 -0600 Subject: [Numpy-discussion] ANN: NumPy 1.9.0 release candidate 1 available In-Reply-To: References: <53FE104C.2020006@googlemail.com> <53FE5312.6070808@cora.nwra.com> Message-ID: <5400AB99.6010509@cora.nwra.com> On 08/27/2014 11:00 PM, Charles R Harris wrote: > > > > On Wed, Aug 27, 2014 at 3:52 PM, Orion Poplawski > wrote: > > On 08/27/2014 11:07 AM, Julian Taylor wrote: > > Hello, > > > > Almost punctually for EuroScipy we have finally managed to release the > > first release candidate of NumPy 1.9. > > We intend to only fix bugs until the final release which we plan to do > > in the next 1-2 weeks. > > > I'm seeing the following errors from setup.py: > > > non-existing path in 'numpy/f2py': 'docs' > non-existing path in 'numpy/f2py': 'f2py.1' > > non-existing path in 'numpy/lib': 'benchmarks' > > > Hmm, benchmarks is long gone, and the f2py files also no longer exist. Since > this causes no problems on my system, I suspect something else. How are you > doing the install? > > Chuck The references need to be removed from the appropriate setup.py files. f2py.1 is now in doc/f2py. -- Orion Poplawski Technical Manager 303-415-9701 x222 NWRA, Boulder/CoRA Office FAX: 303-415-9702 3380 Mitchell Lane orion at nwra.com Boulder, CO 80301 http://www.nwra.com From ben.root at ou.edu Fri Aug 29 22:10:34 2014 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 29 Aug 2014 22:10:34 -0400 Subject: [Numpy-discussion] Can't seem to use np.insert() or np.append() for structured arrays Message-ID: Consider the following: a = np.array([(1, 'a'), (2, 'b'), (3, 'c')], dtype=[('foo', 'i'), ('bar', 'a1')]) b = np.append(a, (4, 'd')) Traceback (most recent call last): File "", line 1, in File "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3555, in append return concatenate((arr, values), axis=axis) TypeError: invalid type promotion b = np.insert(a, 4, (4, 'd')) Traceback (most recent call last): File "", line 1, in File "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3464, in insert new[slobj] = values ValueError: could not convert string to float: d In my original code snippet I was developing which has a more involved dtype, I actually got a different exception: b = np.append(a, c) Traceback (most recent call last): File "", line 1, in File "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3553, in append values = ravel(values) File "/home/ben/miniconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 1367, in ravel return asarray(a).ravel(order) File "/home/ben/miniconda/lib/python2.7/site-packages/numpy/core/numeric.py", line 460, in asarray return array(a, dtype, copy=False, order=order) ValueError: setting an array element with a sequence. 
Luckily, this works as a work-around: >>> b = np.append(a, np.array([(4, 'd')], dtype=a.dtype)) >>> b array([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')], dtype=[('foo', 'i'), ('bar', 'S1')]) The same happens whether I enclose the value with square bracket or not. I suspect that this array type just wasn't considered when its checking logic was developed. This is with 1.8.2 from miniconda. Should we consider this a bug or are structured arrays just not expected to be modified like this? Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 29 22:29:47 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Aug 2014 20:29:47 -0600 Subject: [Numpy-discussion] Can't seem to use np.insert() or np.append() for structured arrays In-Reply-To: References: Message-ID: On Fri, Aug 29, 2014 at 8:10 PM, Benjamin Root wrote: > Consider the following: > > a = np.array([(1, 'a'), (2, 'b'), (3, 'c')], dtype=[('foo', 'i'), ('bar', > 'a1')]) > b = np.append(a, (4, 'd')) > Traceback (most recent call last): > File "", line 1, in > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", > line 3555, in append > return concatenate((arr, values), axis=axis) > TypeError: invalid type promotion > b = np.insert(a, 4, (4, 'd')) > Traceback (most recent call last): > File "", line 1, in > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", > line 3464, in insert > new[slobj] = values > ValueError: could not convert string to float: d > > In my original code snippet I was developing which has a more involved > dtype, I actually got a different exception: > b = np.append(a, c) > Traceback (most recent call last): > File "", line 1, in > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", > line 3553, in append > values = ravel(values) > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", > line 1367, in ravel > return asarray(a).ravel(order) > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/core/numeric.py", > line 460, in asarray > return array(a, dtype, copy=False, order=order) > ValueError: setting an array element with a sequence. > > Luckily, this works as a work-around: > >>> b = np.append(a, np.array([(4, 'd')], dtype=a.dtype)) > >>> b > array([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')], > dtype=[('foo', 'i'), ('bar', 'S1')]) > > The same happens whether I enclose the value with square bracket or not. I > suspect that this array type just wasn't considered when its checking logic > was developed. This is with 1.8.2 from miniconda. Should we consider this a > bug or are structured arrays just not expected to be modified like this? > > Could be one of many bug reports related to assignment to structured types. Can you try using `x`? In [25]: x = array([(4, 'd')], dt)[0] In [26]: type(x) Out[26]: numpy.void In [27]: x Out[27]: (4, 'd') Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Sat Aug 30 04:04:40 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 30 Aug 2014 09:04:40 +0100 Subject: [Numpy-discussion] Can't seem to use np.insert() or np.append() for structured arrays In-Reply-To: References: Message-ID: <1409385880.17692.1.camel@sebastian-t440> On Fr, 2014-08-29 at 22:10 -0400, Benjamin Root wrote: > Consider the following: > > a = np.array([(1, 'a'), (2, 'b'), (3, 'c')], dtype=[('foo', 'i'), > ('bar', 'a1')]) > > b = np.append(a, (4, 'd')) > Traceback (most recent call last): > File "", line 1, in > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3555, in append > return concatenate((arr, values), axis=axis) > TypeError: invalid type promotion > b = np.insert(a, 4, (4, 'd')) > Traceback (most recent call last): > File "", line 1, in > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3464, in insert > new[slobj] = values > ValueError: could not convert string to float: d > Ooops, nice bug in there, might have been me :) (will open a PR). - Sebastian > > In my original code snippet I was developing which has a more involved > dtype, I actually got a different exception: > b = np.append(a, c) > Traceback (most recent call last): > File "", line 1, in > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3553, in append > values = ravel(values) > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 1367, in ravel > return asarray(a).ravel(order) > File > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/core/numeric.py", line 460, in asarray > return array(a, dtype, copy=False, order=order) > ValueError: setting an array element with a sequence. > > > Luckily, this works as a work-around: > >>> b = np.append(a, np.array([(4, 'd')], dtype=a.dtype)) > >>> b > array([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')], > dtype=[('foo', 'i'), ('bar', 'S1')]) > > > > The same happens whether I enclose the value with square bracket or > not. I suspect that this array type just wasn't considered when its > checking logic was developed. This is with 1.8.2 from miniconda. > Should we consider this a bug or are structured arrays just not > expected to be modified like this? > > > Cheers! 
> > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Sat Aug 30 05:05:11 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 30 Aug 2014 10:05:11 +0100 Subject: [Numpy-discussion] Can't seem to use np.insert() or np.append() for structured arrays In-Reply-To: <1409385880.17692.1.camel@sebastian-t440> References: <1409385880.17692.1.camel@sebastian-t440> Message-ID: <1409389511.23364.1.camel@sebastian-t440> On Sa, 2014-08-30 at 09:04 +0100, Sebastian Berg wrote: > On Fr, 2014-08-29 at 22:10 -0400, Benjamin Root wrote: > > Consider the following: > > > > a = np.array([(1, 'a'), (2, 'b'), (3, 'c')], dtype=[('foo', 'i'), > > ('bar', 'a1')]) > > > > b = np.append(a, (4, 'd')) > > Traceback (most recent call last): > > File "", line 1, in > > File > > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3555, in append > > return concatenate((arr, values), axis=axis) > > TypeError: invalid type promotion > > b = np.insert(a, 4, (4, 'd')) > > Traceback (most recent call last): > > File "", line 1, in > > File > > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3464, in insert > > new[slobj] = values > > ValueError: could not convert string to float: d > > > Actually, for insert it is easy to fix (https://github.com/numpy/numpy/pull/5022), for append there are some difficulties, because the dtype is not forced to be the arrays dtype, but gotten from both the original and the appended value currently. - Sebastian > Ooops, nice bug in there, might have been me :) (will open a PR). > > - Sebastian > > > > > In my original code snippet I was developing which has a more involved > > dtype, I actually got a different exception: > > b = np.append(a, c) > > Traceback (most recent call last): > > File "", line 1, in > > File > > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 3553, in append > > values = ravel(values) > > File > > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 1367, in ravel > > return asarray(a).ravel(order) > > File > > "/home/ben/miniconda/lib/python2.7/site-packages/numpy/core/numeric.py", line 460, in asarray > > return array(a, dtype, copy=False, order=order) > > ValueError: setting an array element with a sequence. > > > > > > Luckily, this works as a work-around: > > >>> b = np.append(a, np.array([(4, 'd')], dtype=a.dtype)) > > >>> b > > array([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')], > > dtype=[('foo', 'i'), ('bar', 'S1')]) > > > > > > > > The same happens whether I enclose the value with square bracket or > > not. I suspect that this array type just wasn't considered when its > > checking logic was developed. This is with 1.8.2 from miniconda. > > Should we consider this a bug or are structured arrays just not > > expected to be modified like this? > > > > > > Cheers! 
> > > > Ben Root > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Sat Aug 30 13:43:27 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 30 Aug 2014 13:43:27 -0400 Subject: [Numpy-discussion] inplace unary operations? Message-ID: Is there a way to negate a boolean, or to change the sign of a float inplace ? Josef random thoughts -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Aug 30 13:45:47 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 30 Aug 2014 18:45:47 +0100 Subject: [Numpy-discussion] inplace unary operations? In-Reply-To: References: Message-ID: On Sat, Aug 30, 2014 at 6:43 PM, wrote: > Is there a way to negate a boolean, or to change the sign of a float inplace > ? np.logical_not(arr, out=arr) np.negative(arr, out=arr) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ben.root at ou.edu Sat Aug 30 14:39:33 2014 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 30 Aug 2014 14:39:33 -0400 Subject: [Numpy-discussion] inplace unary operations? In-Reply-To: References: Message-ID: Random thoughts are the best kinds of thoughts! I didn't even know there was a np.negative() function! I will keep this card up my sleeve at work for one of those save-the-day moments in optimization. Cheers! Ben Root On Sat, Aug 30, 2014 at 1:45 PM, Nathaniel Smith wrote: > On Sat, Aug 30, 2014 at 6:43 PM, wrote: > > Is there a way to negate a boolean, or to change the sign of a float > inplace > > ? > > np.logical_not(arr, out=arr) > np.negative(arr, out=arr) > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Aug 30 16:33:46 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 30 Aug 2014 21:33:46 +0100 Subject: [Numpy-discussion] inplace unary operations? In-Reply-To: References: Message-ID: On Sat, Aug 30, 2014 at 7:39 PM, Benjamin Root wrote: > Random thoughts are the best kinds of thoughts! I didn't even know there was > a np.negative() function! Me neither, I had to look it up :-) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Sun Aug 31 09:31:33 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 31 Aug 2014 09:31:33 -0400 Subject: [Numpy-discussion] inplace unary operations? In-Reply-To: References: Message-ID: On Sat, Aug 30, 2014 at 1:45 PM, Nathaniel Smith wrote: > On Sat, Aug 30, 2014 at 6:43 PM, wrote: > > Is there a way to negate a boolean, or to change the sign of a float > inplace > > ? > > np.logical_not(arr, out=arr) > np.negative(arr, out=arr) > Thanks Nathaniel. np.negative might save a bit of memory and time when we have to negate the loglikelihood all the time. Josef > > -n > > -- > Nathaniel J. 
Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pierre at barbierdereuille.net Sun Aug 31 11:04:25 2014 From: pierre at barbierdereuille.net (Pierre Barbier de Reuille) Date: Sun, 31 Aug 2014 17:04:25 +0200 Subject: [Numpy-discussion] inplace unary operations? In-Reply-To: References: Message-ID: Just to point out another solution to change the sign: >>> arr *= -1 Both solutions take the same time on my computer. However, the boolean equivalent: >>> arr ^= True is a lot slower than using negative. My two cents ... -- Dr. Barbier de Reuille, Pierre Institute of Plant Sciences Altenbergrain 21, CH-3013 Bern, Switzerland http://www.botany.unibe.ch/associated/systemsx/index.php On 31 August 2014 15:31, wrote: > > > > On Sat, Aug 30, 2014 at 1:45 PM, Nathaniel Smith wrote: > >> On Sat, Aug 30, 2014 at 6:43 PM, wrote: >> > Is there a way to negate a boolean, or to change the sign of a float >> inplace >> > ? >> >> np.logical_not(arr, out=arr) >> np.negative(arr, out=arr) >> > > Thanks Nathaniel. > > np.negative might save a bit of memory and time when we have to negate the > loglikelihood all the time. > > Josef > > > >> >> -n >> >> -- >> Nathaniel J. Smith >> Postdoctoral researcher - Informatics - University of Edinburgh >> http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Sun Aug 31 15:48:33 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sun, 31 Aug 2014 21:48:33 +0200 Subject: [Numpy-discussion] Does a `mergesorted` function make sense? In-Reply-To: References: Message-ID: Ive organized all code I had relating to this subject in a github repository . That should facilitate shooting around ideas. Ive also added more documentation and structure to make it easier to see what is going on. Hopefully we can converge on a common vision, and then improve the documentation and testing to make it worthy of including in the numpy master. Note that there is also a complete rewrite of the classic numpy.arraysetops, such that they are also generalized to more complex input, such as finding unique graph edges, and so on. You mentioned getting the numpy core developers involved; are they not subscribed to this mailing list? I wouldn't be surprised; youd hope there is a channel of discussion concerning development with higher signal to noise.... On Thu, Aug 28, 2014 at 1:49 AM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > I just checked the docs on ufuncs, and it appears that's a solved problem > now, since ufunc.reduceat now comes with an axis argument. Or maybe it > already did when I wrote that, but I simply wasn't paying attention. Either > way, the code is fully vectorized now, in both grouped and non-grouped > axes. 
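A minimal, self-contained sketch of the argsort-plus-ufunc.reduceat
grouping pattern Eelco describes above (illustrative only; this is
not the API of the linked grouping repository):

import numpy as np

keys = np.random.randint(0, 4, 10)
values = np.random.rand(10, 2000)

# Stable sort by key, so that each group occupies a contiguous run.
order = np.argsort(keys, kind='mergesort')
sorted_keys = keys[order]
sorted_values = values[order]

# Index where each run of equal keys starts.
starts = np.flatnonzero(
    np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1])))
unique_keys = sorted_keys[starts]

# One vectorized reduction per group along the grouped axis.
sums = np.add.reduceat(sorted_values, starts, axis=0)
counts = np.diff(np.append(starts, len(sorted_keys)))
means = sums / counts[:, None]

for k, m in zip(unique_keys, means):
    assert np.allclose(m, values[keys == k].mean(0))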
Its a lot of code, but all that happens for a grouping other than > some O(1) and O(n) stuff is an argsort of the keys, and then the reduction > itself, all fully vectorized. > > Note that I sort the values first, and then use ufunc.reduceat on the > groups. It would seem to me that ufunc.at should be more efficient, by > avoiding this indirection, but testing very much revealed the opposite, for > reasons unclear to me. Perhaps that's changed now as well. > > > On Wed, Aug 27, 2014 at 11:32 PM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> Yes, I was aware of that. But the point would be to provide true >> vectorization on those operations. >> >> The way I see it, numpy may not have to have a GroupBy implementation, >> but it should at least enable implementing one that is fast and efficient >> over any axis. >> >> >> On Wed, Aug 27, 2014 at 12:38 PM, Eelco Hoogendoorn < >> hoogendoorn.eelco at gmail.com> wrote: >> >>> i.e, if the grouped axis is small but the other axes are not, you could >>> write this, which avoids the python loop over the long axis that >>> np.vectorize would otherwise perform. >>> >>> import numpy as np >>> from grouping import group_by >>> keys = np.random.randint(0,4,10) >>> values = np.random.rand(10,2000) >>> for k,g in zip(*group_by(keys)(values)): >>> print k, g.mean(0) >>> >>> >>> >>> >>> On Wed, Aug 27, 2014 at 9:29 PM, Eelco Hoogendoorn < >>> hoogendoorn.eelco at gmail.com> wrote: >>> >>>> f.i., this works as expected as well (100 keys of 1d int arrays and 100 >>>> values of 1d float arrays): >>>> >>>> group_by(randint(0,4,(100,2))).mean(rand(100,2)) >>>> >>>> >>>> On Wed, Aug 27, 2014 at 9:27 PM, Eelco Hoogendoorn < >>>> hoogendoorn.eelco at gmail.com> wrote: >>>> >>>>> If I understand you correctly, the current implementation supports >>>>> these operations. All reductions over groups (except for median) are >>>>> performed through the corresponding ufunc (see GroupBy.reduce). This works >>>>> on multidimensional arrays as well, although this broadcasting over the >>>>> non-grouping axes is accomplished using np.vectorize. Actual vectorization >>>>> only happens over the axis being grouped over, but this is usually a long >>>>> axis. If it isn't, it is more efficient to perform a reduction by means of >>>>> splitting the array by its groups first, and then map the iterable of >>>>> groups over some reduction operation (as noted in the docstring of >>>>> GroupBy.reduce). >>>>> >>>>> >>>>> On Wed, Aug 27, 2014 at 8:29 PM, Jaime Fern?ndez del R?o < >>>>> jaime.frio at gmail.com> wrote: >>>>> >>>>>> Hi Eelco, >>>>>> >>>>>> I took a deeper look into your code a couple of weeks back. I don't >>>>>> think I have fully grasped what it allows completely, but I agree that some >>>>>> form of what you have there is highly desirable. Along the same lines, for >>>>>> sometime I have been thinking that the right place for a `groupby` in numpy >>>>>> is as a method of ufuncs, so that `np.add.groupby(arr, groups)` would do a >>>>>> multidimensional version of `np.bincount(groups, weights=arr)`. You would >>>>>> then need a more powerful version of `np.unique` to produce the `groups`, >>>>>> but that is something that Joe Kington's old PR was very close to >>>>>> achieving, that should probably be resurrected as well. But yes, there >>>>>> seems to be material for a NEP here, and some guidance from one of the >>>>>> numpy devs would be helpful in getting this somewhere. 
>>>>>> >>>>>> Jaime >>>>>> >>>>>> >>>>>> On Wed, Aug 27, 2014 at 10:35 AM, Eelco Hoogendoorn < >>>>>> hoogendoorn.eelco at gmail.com> wrote: >>>>>> >>>>>>> It wouldn't hurt to have this function, but my intuition is that its >>>>>>> use will be minimal. If you are already working with sorted arrays, you >>>>>>> already have a flop cost on that order of magnitude, and the optimized >>>>>>> merge saves you a factor two at the very most. Using numpy means you are >>>>>>> sacrificing factors of two and beyond relative to pure C left right and >>>>>>> center anyway, so if this kind of thing matters to you, you probably wont >>>>>>> be working in numpy in the first place. >>>>>>> >>>>>>> That said, I share your interest in overhauling arraysetops. I see >>>>>>> many opportunities for expanding its functionality. There is a question >>>>>>> that amounts to 'how do I do group-by in numpy' on stackoverflow almost >>>>>>> every week. That would have my top priority, but also things like extending >>>>>>> np.unique to things like graph edges, or other more complex input, is very >>>>>>> often useful to me. >>>>>>> >>>>>>> Ive written up a draft a while ago >>>>>>> which accomplishes all of the above and more. It reimplements functions >>>>>>> like np.unique around a common Index object. This index object encapsulates >>>>>>> the precomputation (sorting) required for efficient set-ops on different >>>>>>> datatypes, and provides a common interface to obtain the kind of >>>>>>> information you are talking about (which is used extensively internally in >>>>>>> the implementation of group_by, for instance). >>>>>>> >>>>>>> ie, this functionality allows you to write neat things like >>>>>>> group_by(randint(0,9,(100,2))).median(rand(100)) >>>>>>> >>>>>>> But I have the feeling much more could be done in this direction, >>>>>>> and I feel this draft could really use a bit of back and forth. If we are >>>>>>> going to completely rewrite arraysetops, we might as well do it right. >>>>>>> >>>>>>> >>>>>>> On Wed, Aug 27, 2014 at 7:02 PM, Jaime Fern?ndez del R?o < >>>>>>> jaime.frio at gmail.com> wrote: >>>>>>> >>>>>>>> A request was open in github to add a `merge` function to numpy >>>>>>>> that would merge two sorted 1d arrays into a single sorted 1d array. I have >>>>>>>> been playing around with that idea for a while, and have a branch in my >>>>>>>> numpy fork that adds a `mergesorted` function to `numpy.lib`: >>>>>>>> >>>>>>>> >>>>>>>> https://github.com/jaimefrio/numpy/commit/ce5d480afecc989a36e5d2bf4ea1d1ba58a83b0a >>>>>>>> >>>>>>>> I drew inspiration from C++ STL algorithms, and merged into a >>>>>>>> single function what merge, set_union, set_intersection, set_difference and >>>>>>>> set_symmetric_difference do there. >>>>>>>> >>>>>>>> My first thought when implementing this was to not make it a public >>>>>>>> function, but use it under the hood to speed-up some of the functions of >>>>>>>> `arraysetops.py`, which are now merging two already sorted functions by >>>>>>>> doing `np.sort(np.concatenate((a, b)))`. I would need to revisit my >>>>>>>> testing, but the speed-ups weren't that great. >>>>>>>> >>>>>>>> One other thing I saw value in for some of the `arraysetops.py` >>>>>>>> functions, but couldn't fully figure out, was in providing extra output >>>>>>>> aside from the merged arrays, either in the form of indices, or of boolean >>>>>>>> masks, indicating which items of the original arrays made it into the >>>>>>>> merged one, and/or where did they end up in it. 
>>>>>>>> >>>>>>>> Since there is at least one other person out there that likes it, >>>>>>>> is there any more interest in such a function? If yes, any comments on what >>>>>>>> the proper interface for extra output should be? Although perhaps the best >>>>>>>> is to leave that out for starters and see what use people make of it, if >>>>>>>> any. >>>>>>>> >>>>>>>> Jaime >>>>>>>> >>>>>>>> -- >>>>>>>> (\__/) >>>>>>>> ( O.o) >>>>>>>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus >>>>>>>> planes de dominaci?n mundial. >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> NumPy-Discussion mailing list >>>>>>>> NumPy-Discussion at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> NumPy-Discussion mailing list >>>>>>> NumPy-Discussion at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> (\__/) >>>>>> ( O.o) >>>>>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus >>>>>> planes de dominaci?n mundial. >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>>> >>>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Aug 31 22:36:28 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 31 Aug 2014 20:36:28 -0600 Subject: [Numpy-discussion] Does a `mergesorted` function make sense? In-Reply-To: References: Message-ID: On Sun, Aug 31, 2014 at 1:48 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Ive organized all code I had relating to this subject in a github > repository . > That should facilitate shooting around ideas. Ive also added more > documentation and structure to make it easier to see what is going on. > > Hopefully we can converge on a common vision, and then improve the > documentation and testing to make it worthy of including in the numpy > master. > > Note that there is also a complete rewrite of the classic > numpy.arraysetops, such that they are also generalized to more complex > input, such as finding unique graph edges, and so on. > > You mentioned getting the numpy core developers involved; are they not > subscribed to this mailing list? I wouldn't be surprised; youd hope there > is a channel of discussion concerning development with higher signal to > noise.... > > There are only about 2.5 of us at the moment. Those for whom this is an itch that need scratching should hash things out and make a PR. The main question for me is if it belongs in numpy, scipy, or somewhere else. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: