From sebastian at sipsolutions.net Sun Nov 1 10:53:26 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 1 Nov 2015 15:53:26 +0000 Subject: [Numpy-discussion] Commit rights for Jonathan J. Helmus In-Reply-To: References: <563187E7.10801@gmail.com> <56338AA4.5080308@gmail.com> Message-ID: Congrats, both of you ;). On Sun Nov 1 04:30:27 2015 GMT+0330, Jaime Fern?ndez del R?o wrote: > "Gruetzi!", as I just found out we say in Switzerland... > On Oct 30, 2015 8:20 AM, "Jonathan Helmus" wrote: > > > On 10/28/2015 09:43 PM, Allan Haldane wrote: > > > On 10/28/2015 05:27 PM, Nathaniel Smith wrote: > > >> Hi all, > > >> > > >> Jonathan J. Helmus (@jjhelmus) has been given commit rights -- let's all > > >> welcome him aboard. > > >> > > >> -n > > > > > > Welcome Jonathan, happy to have you on the team! > > > > > > Allan > > > > > > > Thanks you everyone for the kind welcome. I'm looking forwarding to > > being part of them team. > > > > - Jonathan Helmus > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > From ralf.gommers at gmail.com Sun Nov 1 18:16:27 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 2 Nov 2015 00:16:27 +0100 Subject: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead In-Reply-To: References: Message-ID: On Sun, Nov 1, 2015 at 1:59 AM, Ralf Gommers wrote: > > > On Sun, Nov 1, 2015 at 1:54 AM, Ralf Gommers > wrote: > >> >> >> >> On Thu, Oct 29, 2015 at 8:11 PM, Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> >>> >>> On Tue, Oct 27, 2015 at 12:31 AM, Nathaniel Smith wrote: >>> >>>> Hi all, >>>> >>>> Apparently it is not well known that if you have a Python project >>>> source tree (e.g., a numpy checkout), then the correct way to install >>>> it is NOT to type >>>> >>>> python setup.py install # bad and broken! >>>> >>>> but rather to type >>>> >>>> pip install . >>>> >>>> >>> >>> FWIW, I don't see any mention of this in the numpy docs, but I do see a >>> lot of instructions involving `setup.py build` and `setup.py install`. >>> See, for example, INSTALL.txt. Also see >>> >>> http://docs.scipy.org/doc/numpy/user/install.html#building-from-source >>> So I guess it is not surprising that it is not well known. >>> >> >> Indeed, install docs are always hopelessly outdated. And we have too many >> of them. There's duplicate info in INSTALL.txt and >> http://scipy.org/scipylib/building/index.html for example. We should >> probably just empty out INSTALL.txt and simply put a link in it to the html >> docs. >> >> I've created an issue with a long todo list and a bunch of links: >> https://github.com/numpy/numpy/issues/6599. Feel free to add stuff. Or >> to go fix something:) >> > > Oh, and: looking at this thread there haven't been serious unanswered > concerns (at least in my perception), so without more discussion I'd > interpret the current status as "go ahead". > Hmm, after some more testing I'm going to have to bring up a few concerns myself: 1. ``pip install .`` still has a clear bug; it starts by copying everything (including .git/ !) to a tempdir with shutil, which is very slow. And the fix for that will go via ``setup.py sdist``, which is still slow. 2. 
``pip install .`` silences build output, which may make sense for some usecases, but for numpy it just sits there for minutes with no output after printing "Running setup.py install for numpy". Users will think it hangs and Ctrl-C it. https://github.com/pypa/pip/issues/2732 3. ``pip install .`` refuses to upgrade an already installed development version. For released versions that makes sense, but if I'm in a git tree then I don't want it to refuse because 1.11.0.dev0+githash1 compares equal to 1.11.0.dev0+githash2. Especially after waiting a few minutes, see (1). I've sent a (incomplete) fix for the shutil thing ( https://github.com/pypa/pip/pull/3219) and will comment on some open issues on the pip tracker. But I'm thinking that for now we should go with some printed message first. Something like "please use ``pip install .`` if you want reliable uninstall behavior. See for more details". Pip has worked quite well for me in the past, but the above makes me thing it's not much of an improvement over use of setuptools..... Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Nov 1 19:12:33 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 1 Nov 2015 17:12:33 -0700 Subject: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead In-Reply-To: References: Message-ID: On Sun, Nov 1, 2015 at 4:16 PM, Ralf Gommers wrote: > > > On Sun, Nov 1, 2015 at 1:59 AM, Ralf Gommers > wrote: > >> >> >> On Sun, Nov 1, 2015 at 1:54 AM, Ralf Gommers >> wrote: >> >>> >>> >>> >>> On Thu, Oct 29, 2015 at 8:11 PM, Warren Weckesser < >>> warren.weckesser at gmail.com> wrote: >>> >>>> >>>> >>>> On Tue, Oct 27, 2015 at 12:31 AM, Nathaniel Smith >>>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> Apparently it is not well known that if you have a Python project >>>>> source tree (e.g., a numpy checkout), then the correct way to install >>>>> it is NOT to type >>>>> >>>>> python setup.py install # bad and broken! >>>>> >>>>> but rather to type >>>>> >>>>> pip install . >>>>> >>>>> >>>> >>>> FWIW, I don't see any mention of this in the numpy docs, but I do see a >>>> lot of instructions involving `setup.py build` and `setup.py install`. >>>> See, for example, INSTALL.txt. Also see >>>> >>>> http://docs.scipy.org/doc/numpy/user/install.html#building-from-source >>>> So I guess it is not surprising that it is not well known. >>>> >>> >>> Indeed, install docs are always hopelessly outdated. And we have too >>> many of them. There's duplicate info in INSTALL.txt and >>> http://scipy.org/scipylib/building/index.html for example. We should >>> probably just empty out INSTALL.txt and simply put a link in it to the html >>> docs. >>> >>> I've created an issue with a long todo list and a bunch of links: >>> https://github.com/numpy/numpy/issues/6599. Feel free to add stuff. Or >>> to go fix something:) >>> >> >> Oh, and: looking at this thread there haven't been serious unanswered >> concerns (at least in my perception), so without more discussion I'd >> interpret the current status as "go ahead". >> > > Hmm, after some more testing I'm going to have to bring up a few concerns > myself: > > 1. ``pip install .`` still has a clear bug; it starts by copying > everything (including .git/ !) to a tempdir with shutil, which is very > slow. And the fix for that will go via ``setup.py sdist``, which is still > slow. > > 2. 
``pip install .`` silences build output, which may make sense for some > usecases, but for numpy it just sits there for minutes with no output after > printing "Running setup.py install for numpy". Users will think it hangs > and Ctrl-C it. https://github.com/pypa/pip/issues/2732 > > 3. ``pip install .`` refuses to upgrade an already installed development > version. For released versions that makes sense, but if I'm in a git tree > then I don't want it to refuse because 1.11.0.dev0+githash1 compares equal > to 1.11.0.dev0+githash2. Especially after waiting a few minutes, see (1). > > > I've sent a (incomplete) fix for the shutil thing ( > https://github.com/pypa/pip/pull/3219) and will comment on some open > issues on the pip tracker. But I'm thinking that for now we should go with > some printed message first. Something like "please use ``pip install .`` if > you want reliable uninstall behavior. See for more details". > > Pip has worked quite well for me in the past, but the above makes me thing > it's not much of an improvement over use of setuptools..... > Which version of pip? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Nov 2 00:22:01 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 2 Nov 2015 05:22:01 +0000 (UTC) Subject: [Numpy-discussion] isfortran compatibility in numpy 1.10. References: Message-ID: <1351855729468134001.145125sturla.molden-gmail.com@news.gmane.org> Charles R Harris wrote: > 1. Return `a.flags.f_contiguous`. This differs for 1-D arrays, but is > most consistent with the name isfortran. If the idea is to determine if an array can safely be passed to Fortran, this is the correct one. > 2. Return `a.flags.f_contiguous and a.ndim > 1`, which would be backward > compatible. This one is just wrong. A compromize might be to raise an exception in the case of a.ndim<2. Sturla From ralf.gommers at gmail.com Mon Nov 2 01:47:34 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 2 Nov 2015 07:47:34 +0100 Subject: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead In-Reply-To: References: Message-ID: On Mon, Nov 2, 2015 at 1:12 AM, Charles R Harris wrote: > > > On Sun, Nov 1, 2015 at 4:16 PM, Ralf Gommers > wrote: > >> >> >> On Sun, Nov 1, 2015 at 1:59 AM, Ralf Gommers >> wrote: >> >>> >>> >>> On Sun, Nov 1, 2015 at 1:54 AM, Ralf Gommers >>> wrote: >>> >>>> >>>> >>>> >>>> On Thu, Oct 29, 2015 at 8:11 PM, Warren Weckesser < >>>> warren.weckesser at gmail.com> wrote: >>>> >>>>> >>>>> >>>>> On Tue, Oct 27, 2015 at 12:31 AM, Nathaniel Smith >>>>> wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> Apparently it is not well known that if you have a Python project >>>>>> source tree (e.g., a numpy checkout), then the correct way to install >>>>>> it is NOT to type >>>>>> >>>>>> python setup.py install # bad and broken! >>>>>> >>>>>> but rather to type >>>>>> >>>>>> pip install . >>>>>> >>>>>> >>>>> >>>>> FWIW, I don't see any mention of this in the numpy docs, but I do see >>>>> a lot of instructions involving `setup.py build` and `setup.py install`. >>>>> See, for example, INSTALL.txt. Also see >>>>> >>>>> http://docs.scipy.org/doc/numpy/user/install.html#building-from-source >>>>> So I guess it is not surprising that it is not well known. >>>>> >>>> >>>> Indeed, install docs are always hopelessly outdated. And we have too >>>> many of them. 
There's duplicate info in INSTALL.txt and >>>> http://scipy.org/scipylib/building/index.html for example. We should >>>> probably just empty out INSTALL.txt and simply put a link in it to the html >>>> docs. >>>> >>>> I've created an issue with a long todo list and a bunch of links: >>>> https://github.com/numpy/numpy/issues/6599. Feel free to add stuff. Or >>>> to go fix something:) >>>> >>> >>> Oh, and: looking at this thread there haven't been serious unanswered >>> concerns (at least in my perception), so without more discussion I'd >>> interpret the current status as "go ahead". >>> >> >> Hmm, after some more testing I'm going to have to bring up a few concerns >> myself: >> >> 1. ``pip install .`` still has a clear bug; it starts by copying >> everything (including .git/ !) to a tempdir with shutil, which is very >> slow. And the fix for that will go via ``setup.py sdist``, which is still >> slow. >> >> 2. ``pip install .`` silences build output, which may make sense for some >> usecases, but for numpy it just sits there for minutes with no output after >> printing "Running setup.py install for numpy". Users will think it hangs >> and Ctrl-C it. https://github.com/pypa/pip/issues/2732 >> >> 3. ``pip install .`` refuses to upgrade an already installed development >> version. For released versions that makes sense, but if I'm in a git tree >> then I don't want it to refuse because 1.11.0.dev0+githash1 compares equal >> to 1.11.0.dev0+githash2. Especially after waiting a few minutes, see (1). >> >> >> I've sent a (incomplete) fix for the shutil thing ( >> https://github.com/pypa/pip/pull/3219) and will comment on some open >> issues on the pip tracker. But I'm thinking that for now we should go with >> some printed message first. Something like "please use ``pip install .`` if >> you want reliable uninstall behavior. See for more details". >> >> Pip has worked quite well for me in the past, but the above makes me >> thing it's not much of an improvement over use of setuptools..... >> > > Which version of pip? > Latest master (it's 'develop' branch). Recent released versions will be the same, because there are open issues for these things. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Mon Nov 2 05:09:05 2015 From: faltet at gmail.com (Francesc Alted) Date: Mon, 2 Nov 2015 11:09:05 +0100 Subject: [Numpy-discussion] ANN: numexpr 2.4.5 released Message-ID: ========================= Announcing Numexpr 2.4.5 ========================= Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for a some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring more heavy dependencies. What's new ========== This is a maintenance release where an important bug in multithreading code has been fixed (#185 Benedikt Reinartz, Francesc Alted). 
Also, many harmless warnings (overflow/underflow, divide by zero and others) in the test suite have been silenced (#183, Francesc Alted). In case you want to know more in detail what has changed in this version, see: https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst Where I can find Numexpr? ========================= The project is hosted at GitHub in: https://github.com/pydata/numexpr You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy data! -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From scollis.acrf at gmail.com Mon Nov 2 11:40:25 2015 From: scollis.acrf at gmail.com (Scott Collis) Date: Mon, 02 Nov 2015 10:40:25 -0600 Subject: [Numpy-discussion] Argonne is hiring a postdoc in radar forward modelling using Python Message-ID: <563791F9.6060807@gmail.com> Dear Numpy Users, Argonne National Lab is hiring a postdoc working with the team behind Py-ART. Please take a look and use this link to apply and direct any questions towards me. http://careers.peopleclick.com/careerscp/client_argonnelab/post_doc/en_US/gateway.do?functionName=viewFromLink&localeCode=en-us&jobPostId=3702&source=Facebook&sourceType=NETWORKING_SITE Long shot I know, but we found our key developer using this list last time :) Cheers, Scott -- -- Dr Scott Collis ARM Precipitation Radar Translator Environmental Science Division Argonne National Laboratory Mb: +1 630 235 8025 Of: +1 630 252 0550 Become a Py-ART user today! http://arm-doe.github.io/pyart/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Nov 2 13:28:23 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 2 Nov 2015 18:28:23 +0000 Subject: [Numpy-discussion] isfortran compatibility in numpy 1.10. In-Reply-To: <1351855729468134001.145125sturla.molden-gmail.com@news.gmane.org> References: <1351855729468134001.145125sturla.molden-gmail.com@news.gmane.org> Message-ID: I bet it has all been said already, but to note just in case. In numpy itself we use it mostly to determine the memory order of the *output* and not for safty purpose. That is the macro of course and I think yelling people to use flags.fnc in python is better. - Sebastian On Mon Nov 2 08:52:01 2015 GMT+0330, Sturla Molden wrote: > Charles R Harris wrote: > > > 1. Return `a.flags.f_contiguous`. This differs for 1-D arrays, but is > > most consistent with the name isfortran. > > If the idea is to determine if an array can safely be passed to Fortran, > this is the correct one. > > > 2. Return `a.flags.f_contiguous and a.ndim > 1`, which would be backward > > compatible. > > This one is just wrong. > > A compromize might be to raise an exception in the case of a.ndim<2. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Mon Nov 2 13:49:27 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 2 Nov 2015 11:49:27 -0700 Subject: [Numpy-discussion] isfortran compatibility in numpy 1.10. 
In-Reply-To: References: <1351855729468134001.145125sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Nov 2, 2015 at 11:28 AM, Sebastian Berg wrote: > I bet it has all been said already, but to note just in case. In numpy > itself we use it mostly to determine the memory order of the *output* and > not for safty purpose. That is the macro of course and I think yelling > people to use flags.fnc in python is better. > Probably all the Numpy uses of `PyArray_ISFORTRAN` should be audited. My guess is that it will be found to be incorrect in some (most?) of the places. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Mon Nov 2 14:13:46 2015 From: faltet at gmail.com (Francesc Alted) Date: Mon, 2 Nov 2015 20:13:46 +0100 Subject: [Numpy-discussion] ANN: numexpr 2.4.6 released Message-ID: Hi, This is a quick release fixing some reported problems in the 2.4.5 version that I announced a few hours ago. Hope I have fixed the main issues now. Now, the official announcement: ========================= Announcing Numexpr 2.4.6 ========================= Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for a some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring more heavy dependencies. What's new ========== This is a quick maintenance version that offers better handling of MSVC symbols (#168, Francesc Alted), as well as fising some UserWarnings in Solaris (#189, Graham Jones). In case you want to know more in detail what has changed in this version, see: https://github.com/pydata/numexpr/blob/master/RELEASE_NOTES.rst Where I can find Numexpr? ========================= The project is hosted at GitHub in: https://github.com/pydata/numexpr You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy data! -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Nov 2 18:04:54 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 3 Nov 2015 00:04:54 +0100 Subject: [Numpy-discussion] Numpy style docstring support in Sphinx and PyCharm Message-ID: Hi all, Just noticed this: http://sphinx-doc.org/latest/ext/napoleon.html http://www.jetbrains.com/pycharm/whatsnew/index.html#GDocstrings Slowly conquering the docstring world:) Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Nov 2 18:44:06 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 2 Nov 2015 15:44:06 -0800 Subject: [Numpy-discussion] deprecate fromstring() for text reading? 
In-Reply-To: References: <2283704104052164280@unknownmsgid> <-1464708838107245522@unknownmsgid> Message-ID: On Tue, Oct 27, 2015 at 7:30 AM, Benjamin Root wrote: > FWIW, when I needed a fast Fixed Width reader > was there potentially no whitespace between fields in that case? In which case, it really isn a different use-case than delimited text -- if it's at all common, a version written in C would be nice and fast. and nat hard to do. But fromstring never would have helped you with that anyway :-) -CHB > for a very large dataset last year, I found that np.genfromtext() was > faster than pandas' read_fwf(). IIRC, pandas' text reading code fell back > to pure python for fixed width scenarios. > > On Fri, Oct 23, 2015 at 8:22 PM, Chris Barker - NOAA Federal < > chris.barker at noaa.gov> wrote: > >> Grabbing the pandas csv reader would be great, and I hope it happens >> sooner than later, though alas, I haven't the spare cycles for it either. >> >> In the meantime though, can we put a deprecation Warning in when using >> fromstring() on text files? It's really pretty broken. >> >> -Chris >> >> On Oct 23, 2015, at 4:02 PM, Jeff Reback wrote: >> >> >> >> On Oct 23, 2015, at 6:49 PM, Nathaniel Smith wrote: >> >> On Oct 23, 2015 3:30 PM, "Jeff Reback" wrote: >> > >> > On Oct 23, 2015, at 6:13 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> > >> >> >> >> >> >> On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal < >> chris.barker at noaa.gov> wrote: >> >>> >> >>> >> >>>> I think it would be good to keep the usage to read binary data at >> least. >> >>> >> >>> >> >>> Agreed -- it's only the text file reading I'm proposing to deprecate. >> It was kind of weird to cram it in there in the first place. >> >>> >> >>> Oh, fromfile() has the same issues. >> >>> >> >>> Chris >> >>> >> >>> >> >>>> Or is there a good alternative to `np.fromstring(, >> dtype=...)`? -- Marten >> >>>> >> >>>> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker >> wrote: >> >>>>> >> >>>>> There was just a question about a bug/issue with scipy.fromstring >> (which is numpy.fromstring) when used to read integers from a text file. >> >>>>> >> >>>>> >> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html >> >>>>> >> >>>>> fromstring() is bugging and inflexible for reading text files -- >> and it is a very, very ugly mess of code. I dug into it a while back, and >> gave up -- just to much of a mess! >> >>>>> >> >>>>> So we really should completely re-implement it, or deprecate it. I >> doubt anyone is going to do a big refactor, so that means deprecating it. >> >>>>> >> >>>>> Also -- if we do want a fast read numbers from text files function >> (which would be nice, actually), it really should get a new name anyway. >> >>>>> >> >>>>> (and the hopefully coming new dtype system would make it easier to >> write cleanly) >> >>>>> >> >>>>> I'm not sure what deprecating something means, though -- have it >> raise a deprecation warning in the next version? >> >>>>> >> >> >> >> There was discussion at SciPy 2015 of separating out the text reading >> abilities of Pandas so that numpy could include it. We should contact Jeff >> Rebeck and see about moving that forward. >> > >> > >> > IIRC Thomas Caswell was interested in doing this :) >> >> When he was in Berkeley a few weeks ago he assured me that every night >> since SciPy he has dutifully been feeling guilty about not having done it >> yet. I think this week his paltry excuse is that he's "on his honeymoon" or >> something. 
>> >> ...which is to say that if someone has some spare cycles to take this >> over then I think that might be a nice wedding present for him :-). >> >> (The basic idea is to take the text reading backend behind >> pandas.read_csv and extract it into a standalone package that pandas could >> depend on, and that could also be used by other packages like numpy (among >> others -- I thing dato's SFrame package has a fork of this code as well?)) >> >> -n >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> I can certainly provide guidance on how/what to extract but don't have >> spare cycles myself for this :( >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Nov 2 18:55:46 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 2 Nov 2015 15:55:46 -0800 Subject: [Numpy-discussion] [NumPy/Swig] Return NumPy array with same size as input array (no additional length argument) In-Reply-To: <1446272109262-41601.post@n7.nabble.com> References: <1446272109262-41601.post@n7.nabble.com> Message-ID: On Fri, Oct 30, 2015 at 11:15 PM, laurentes wrote: > Using Swig, I don't manage to (properly) create the Python Binding for the > following C-like function: > > void add_array(double* input_array1, double* input_array2, double* > output_array, int length); > > where the three arrays have all the same length. > do you have to use SWIG? this would be really easy in Cython.... cdef cdef extern from "your_header.h": void add_array(double* input_array1, double* input_array2, double* output_array, int length) def py_add_array( np.ndarray[double, ndim=1] arr1, np.ndarray[double, ndim=1] arr2): cdef int length if arr1.shape != arr2.shape: raise ValueError("Arrays must be the same size") length = arr1.shape[0] cdef np.ndarray[double, ndim=1] out_arr = np.empty((length), dtype=np.float64) add_array(&arr1[0], &arr2[0], &out_arr[0], length) return out_arr Untested and from memory -- but you get the idea. -CHB > > > > This is similar to this thread > > < > http://numpy-discussion.10968.n7.nabble.com/Numpy-SWIG-td25709.html#a25710 > > > > , which has never been fully addressed online. > > > > From Python, I would like to be able to call: > > > add_array(input_array1, input_array2) > > which would return me a newly allocated NumPy array (output_array) with the > result. 
> > In my Swig file, I've first used the wrapper function trick described here > < > http://web.mit.edu/6.863/spring2011/packages/numpy_src/doc/swig/doc/numpy_swig.html#a-common-example > > > , that is: > > %apply (double* IN_ARRAY1, int DIM1) {(double* input_array1, int length1), > (double* input_array2, int length2)}; > %apply (double* ARGOUT_ARRAY1, int DIM1) {(double* output_array, int > length3)}; > > %rename (add_array) my_add_array; > %exception my_add_array { > $action > if (PyErr_Occurred()) SWIG_fail; > } > %inline %{ > void my_add_array(double* input_array1, int length1, double* input_array2, > int length2, double* output_array, int length3) { > if (length1 != length2 || length1 != length3) { > PyErr_Format(PyExc_ValueError, > "Arrays of lengths (%d,%d,%d) given", > length1, length2, length3); > } > else { > add_array(input_array1, input_array2, output_array, length1); > } > } > %} > > This allows me to call the function from Python using > add_array(input_array1, input_array2, length). But the third argument of > this function is useless and this function does not look 'Pythonic'. > > Could someone help me to modify my Swig file, such that only the first two > arguments are required for the Python API? > > Thanks a lot, > Laurent > > > > -- > View this message in context: > http://numpy-discussion.10968.n7.nabble.com/NumPy-Swig-Return-NumPy-array-with-same-size-as-input-array-no-additional-length-argument-tp41601.html > Sent from the Numpy-discussion mailing list archive at Nabble.com. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Nov 2 19:04:17 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 2 Nov 2015 16:04:17 -0800 Subject: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead In-Reply-To: References: <562F80A5.6060004@gmail.com> Message-ID: On Tue, Oct 27, 2015 at 8:25 AM, Nathan Goldbaum wrote: > Interestingly, conda actually does "setup.py install" in the recipe for > numpy: > indeed -- many, many conda packages do setup.py install, whihc doesn't mean it's a good idea --personally, I'm trying hard to switch them all to: pip install ./ :-) Which reminds me, the conda skelaton command craes a pip install build.sh file -- I really need to submit a PR for that ... There are two cases where a 'pip install' run might go off and start >> downloading packages without asking you: >> > for my part, regular old setup.py isntall oftem goes off and istalls sutff too - using easy_install, which really sucks... This is making me want a setuptools-lite again -- see the distutils SIG if you're curious. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Mon Nov 2 20:57:35 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Nov 2015 17:57:35 -0800 Subject: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead In-Reply-To: References: Message-ID: [Adding distutils-sig to the CC as a heads-up. The context is that numpy is looking at deprecating the use of 'python setup.py install' and enforcing the use of 'pip install .' instead, and running into some issues that will probably need to be addressed if 'pip install .' is going to become the standard interface to work with source trees.] On Sun, Nov 1, 2015 at 3:16 PM, Ralf Gommers wrote: [...] > Hmm, after some more testing I'm going to have to bring up a few concerns > myself: > > 1. ``pip install .`` still has a clear bug; it starts by copying everything > (including .git/ !) to a tempdir with shutil, which is very slow. And the > fix for that will go via ``setup.py sdist``, which is still slow. Ugh. If 'pip (install/wheel) .' is supposed to become the standard way to build things, then it should probably build in-place by default. Working in a temp dir makes perfect sense for 'pip install ' or 'pip install ', but if the user supplies an actual named on-disk directory then presumably the user is expecting this directory to be used, and to be able to take advantage of incremental rebuilds etc., no? > 2. ``pip install .`` silences build output, which may make sense for some > usecases, but for numpy it just sits there for minutes with no output after > printing "Running setup.py install for numpy". Users will think it hangs and > Ctrl-C it. https://github.com/pypa/pip/issues/2732 I tend to agree with the commentary there that for end users this is different but no worse than the current situation where we spit out pages of "errors" that don't mean anything :-). I posted a suggestion on that bug that might help with the apparent hanging problem. > 3. ``pip install .`` refuses to upgrade an already installed development > version. For released versions that makes sense, but if I'm in a git tree > then I don't want it to refuse because 1.11.0.dev0+githash1 compares equal > to 1.11.0.dev0+githash2. Especially after waiting a few minutes, see (1). Ugh, this is clearly just a bug -- `pip install .` should always unconditionally install, IMO. (Did you file a bug yet?) At least the workaround is just 'pip uninstall numpy; pip install .', which is still better the running 'setup.py install' and having it blithely overwrite some files and not others. The first and last issue seem like ones that will mostly only affect developers, who should mostly have the ability to deal with these weird issues (or just use setup.py install --force if that's what they prefer)? This still seems like a reasonable trade-off to me if it also has the effect of reducing the number of weird broken installs among our thousands-of-times-larger userbase. -n -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Mon Nov 2 22:02:30 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 2 Nov 2015 19:02:30 -0800 Subject: [Numpy-discussion] [Distutils] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead In-Reply-To: References: Message-ID: On Nov 2, 2015 6:51 PM, "Robert Collins" wrote: > > On 3 November 2015 at 14:57, Nathaniel Smith wrote: > > [Adding distutils-sig to the CC as a heads-up. 
The context is that > > numpy is looking at deprecating the use of 'python setup.py install' > > and enforcing the use of 'pip install .' instead, and running into > > some issues that will probably need to be addressed if 'pip install .' > > is going to become the standard interface to work with source trees.] > > > > On Sun, Nov 1, 2015 at 3:16 PM, Ralf Gommers wrote: > > [...] > >> Hmm, after some more testing I'm going to have to bring up a few concerns > >> myself: > >> > >> 1. ``pip install .`` still has a clear bug; it starts by copying everything > >> (including .git/ !) to a tempdir with shutil, which is very slow. And the > >> fix for that will go via ``setup.py sdist``, which is still slow. > > > > Ugh. If 'pip (install/wheel) .' is supposed to become the standard way > > to build things, then it should probably build in-place by default. > > Working in a temp dir makes perfect sense for 'pip install > > ' or 'pip install ', but if the user supplies an > > actual named on-disk directory then presumably the user is expecting > > this directory to be used, and to be able to take advantage of > > incremental rebuilds etc., no? > > Thats what 'pip install -e .' does. 'setup.py develop' -> 'pip install -e .' I'm not talking about in place installs, I'm talking about e.g. building a wheel and then tweaking one file and rebuilding -- traditionally build systems go to some effort to keep track of intermediate artifacts and reuse them across builds when possible, but if you always copy the source tree into a temporary directory before building then there's not much the build system can do. > >> 3. ``pip install .`` refuses to upgrade an already installed development > >> version. For released versions that makes sense, but if I'm in a git tree > >> then I don't want it to refuse because 1.11.0.dev0+githash1 compares equal > >> to 1.11.0.dev0+githash2. Especially after waiting a few minutes, see (1). > > > > Ugh, this is clearly just a bug -- `pip install .` should always > > unconditionally install, IMO. (Did you file a bug yet?) At least the > > workaround is just 'pip uninstall numpy; pip install .', which is > > still better the running 'setup.py install' and having it blithely > > overwrite some files and not others. > > There is a bug open. https://github.com/pypa/pip/issues/536 Thanks! -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From lzkelley at gmail.com Tue Nov 3 09:40:31 2015 From: lzkelley at gmail.com (Luke Zoltan Kelley) Date: Tue, 3 Nov 2015 09:40:31 -0500 Subject: [Numpy-discussion] histogram gives meaningless results with non-finite range Message-ID: <33129B70-E5C4-4F7E-A953-E47D4391690E@gmail.com> This came up in [a matplotlib issue](https://github.com/matplotlib/matplotlib/issues/5221): >>> np.histogram(np.arange(10), range=(0.0, np.inf)) (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), array([ nan, inf, inf, inf, inf, inf, inf, inf, inf, inf, inf])) >>> np.histogram(np.arange(10), range=(0.0, np.nan)) (array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), array([ nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])) Clearly the behavior is undefined for those arguments, but perhaps there should be an assertion that the given range must be finite? Happy to make a PR for this. Luke -------------- next part -------------- An HTML attachment was scrubbed... 
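To make the suggestion concrete, here is a minimal sketch of the kind of check being proposed. This is a hypothetical wrapper purely for illustration, not the actual PR, and the name histogram_finite is made up:

import numpy as np

def histogram_finite(a, bins=10, range=None):
    # Hypothetical wrapper: reject non-finite range endpoints up front
    # instead of silently returning all-zero counts and nan/inf bin edges.
    if range is not None:
        lo, hi = range
        if not (np.isfinite(lo) and np.isfinite(hi)):
            raise ValueError("range parameter must be finite, got %r" % (range,))
    return np.histogram(a, bins=bins, range=range)

>>> histogram_finite(np.arange(10), range=(0.0, np.inf))
Traceback (most recent call last):
  ...
ValueError: range parameter must be finite, got (0.0, inf)

With a finite range it just defers to np.histogram unchanged.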
URL: From ben.v.root at gmail.com Tue Nov 3 09:59:59 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Tue, 3 Nov 2015 09:59:59 -0500 Subject: [Numpy-discussion] deprecate fromstring() for text reading? In-Reply-To: References: <2283704104052164280@unknownmsgid> <-1464708838107245522@unknownmsgid> Message-ID: Correct, there were entries that would sometimes take up their entire width. The delimited text readers could not read this particular dataset. The dataset I am referring to is the processed ISD data: https://www.ncdc.noaa.gov/isd As for fromstring() not being able to help there, I didn't mean to imply that it would. I was more aiming to point out a situation where the NumPy's text file reader was significantly better than the Pandas version, so we would want to make sure that we properly benchmark any significant changes to NumPy's text reading code. Who knows where else NumPy beats Pandas? Ben On Mon, Nov 2, 2015 at 6:44 PM, Chris Barker wrote: > On Tue, Oct 27, 2015 at 7:30 AM, Benjamin Root > wrote: > >> FWIW, when I needed a fast Fixed Width reader >> > > was there potentially no whitespace between fields in that case? In which > case, it really isn a different use-case than delimited text -- if it's at > all common, a version written in C would be nice and fast. and nat hard to > do. > > But fromstring never would have helped you with that anyway :-) > > -CHB > > > >> for a very large dataset last year, I found that np.genfromtext() was >> faster than pandas' read_fwf(). IIRC, pandas' text reading code fell back >> to pure python for fixed width scenarios. >> >> On Fri, Oct 23, 2015 at 8:22 PM, Chris Barker - NOAA Federal < >> chris.barker at noaa.gov> wrote: >> >>> Grabbing the pandas csv reader would be great, and I hope it happens >>> sooner than later, though alas, I haven't the spare cycles for it either. >>> >>> In the meantime though, can we put a deprecation Warning in when using >>> fromstring() on text files? It's really pretty broken. >>> >>> -Chris >>> >>> On Oct 23, 2015, at 4:02 PM, Jeff Reback wrote: >>> >>> >>> >>> On Oct 23, 2015, at 6:49 PM, Nathaniel Smith wrote: >>> >>> On Oct 23, 2015 3:30 PM, "Jeff Reback" wrote: >>> > >>> > On Oct 23, 2015, at 6:13 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> > >>> >> >>> >> >>> >> On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal < >>> chris.barker at noaa.gov> wrote: >>> >>> >>> >>> >>> >>>> I think it would be good to keep the usage to read binary data at >>> least. >>> >>> >>> >>> >>> >>> Agreed -- it's only the text file reading I'm proposing to >>> deprecate. It was kind of weird to cram it in there in the first place. >>> >>> >>> >>> Oh, fromfile() has the same issues. >>> >>> >>> >>> Chris >>> >>> >>> >>> >>> >>>> Or is there a good alternative to `np.fromstring(, >>> dtype=...)`? -- Marten >>> >>>> >>> >>>> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker < >>> chris.barker at noaa.gov> wrote: >>> >>>>> >>> >>>>> There was just a question about a bug/issue with scipy.fromstring >>> (which is numpy.fromstring) when used to read integers from a text file. >>> >>>>> >>> >>>>> >>> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html >>> >>>>> >>> >>>>> fromstring() is bugging and inflexible for reading text files -- >>> and it is a very, very ugly mess of code. I dug into it a while back, and >>> gave up -- just to much of a mess! >>> >>>>> >>> >>>>> So we really should completely re-implement it, or deprecate it. 
I >>> doubt anyone is going to do a big refactor, so that means deprecating it. >>> >>>>> >>> >>>>> Also -- if we do want a fast read numbers from text files function >>> (which would be nice, actually), it really should get a new name anyway. >>> >>>>> >>> >>>>> (and the hopefully coming new dtype system would make it easier to >>> write cleanly) >>> >>>>> >>> >>>>> I'm not sure what deprecating something means, though -- have it >>> raise a deprecation warning in the next version? >>> >>>>> >>> >> >>> >> There was discussion at SciPy 2015 of separating out the text reading >>> abilities of Pandas so that numpy could include it. We should contact Jeff >>> Rebeck and see about moving that forward. >>> > >>> > >>> > IIRC Thomas Caswell was interested in doing this :) >>> >>> When he was in Berkeley a few weeks ago he assured me that every night >>> since SciPy he has dutifully been feeling guilty about not having done it >>> yet. I think this week his paltry excuse is that he's "on his honeymoon" or >>> something. >>> >>> ...which is to say that if someone has some spare cycles to take this >>> over then I think that might be a nice wedding present for him :-). >>> >>> (The basic idea is to take the text reading backend behind >>> pandas.read_csv and extract it into a standalone package that pandas could >>> depend on, and that could also be used by other packages like numpy (among >>> others -- I thing dato's SFrame package has a fork of this code as well?)) >>> >>> -n >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> I can certainly provide guidance on how/what to extract but don't have >>> spare cycles myself for this :( >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Nov 3 12:03:01 2015 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 3 Nov 2015 09:03:01 -0800 Subject: [Numpy-discussion] deprecate fromstring() for text reading? In-Reply-To: References: <2283704104052164280@unknownmsgid> <-1464708838107245522@unknownmsgid> Message-ID: <2168677754684929763@unknownmsgid> I was more aiming to point out a situation where the NumPy's text file reader was significantly better than the Pandas version, so we would want to make sure that we properly benchmark any significant changes to NumPy's text reading code. Who knows where else NumPy beats Pandas? Indeed. 
For this example, I think a fixed-with reader really is a different animal, and it's probably a good idea to have a high performance one in Numpy. Among other things, you wouldn't want it to try to auto-determine data types or anything like that. I think what's on the table now is to bring in a new delimited reader -- I.e. CSV in its various flavors. CHB Ben On Mon, Nov 2, 2015 at 6:44 PM, Chris Barker wrote: > On Tue, Oct 27, 2015 at 7:30 AM, Benjamin Root > wrote: > >> FWIW, when I needed a fast Fixed Width reader >> > > was there potentially no whitespace between fields in that case? In which > case, it really isn a different use-case than delimited text -- if it's at > all common, a version written in C would be nice and fast. and nat hard to > do. > > But fromstring never would have helped you with that anyway :-) > > -CHB > > > >> for a very large dataset last year, I found that np.genfromtext() was >> faster than pandas' read_fwf(). IIRC, pandas' text reading code fell back >> to pure python for fixed width scenarios. >> >> On Fri, Oct 23, 2015 at 8:22 PM, Chris Barker - NOAA Federal < >> chris.barker at noaa.gov> wrote: >> >>> Grabbing the pandas csv reader would be great, and I hope it happens >>> sooner than later, though alas, I haven't the spare cycles for it either. >>> >>> In the meantime though, can we put a deprecation Warning in when using >>> fromstring() on text files? It's really pretty broken. >>> >>> -Chris >>> >>> On Oct 23, 2015, at 4:02 PM, Jeff Reback wrote: >>> >>> >>> >>> On Oct 23, 2015, at 6:49 PM, Nathaniel Smith wrote: >>> >>> On Oct 23, 2015 3:30 PM, "Jeff Reback" wrote: >>> > >>> > On Oct 23, 2015, at 6:13 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> > >>> >> >>> >> >>> >> On Thu, Oct 22, 2015 at 5:47 PM, Chris Barker - NOAA Federal < >>> chris.barker at noaa.gov> wrote: >>> >>> >>> >>> >>> >>>> I think it would be good to keep the usage to read binary data at >>> least. >>> >>> >>> >>> >>> >>> Agreed -- it's only the text file reading I'm proposing to >>> deprecate. It was kind of weird to cram it in there in the first place. >>> >>> >>> >>> Oh, fromfile() has the same issues. >>> >>> >>> >>> Chris >>> >>> >>> >>> >>> >>>> Or is there a good alternative to `np.fromstring(, >>> dtype=...)`? -- Marten >>> >>>> >>> >>>> On Thu, Oct 22, 2015 at 1:03 PM, Chris Barker < >>> chris.barker at noaa.gov> wrote: >>> >>>>> >>> >>>>> There was just a question about a bug/issue with scipy.fromstring >>> (which is numpy.fromstring) when used to read integers from a text file. >>> >>>>> >>> >>>>> >>> https://mail.scipy.org/pipermail/scipy-user/2015-October/036746.html >>> >>>>> >>> >>>>> fromstring() is bugging and inflexible for reading text files -- >>> and it is a very, very ugly mess of code. I dug into it a while back, and >>> gave up -- just to much of a mess! >>> >>>>> >>> >>>>> So we really should completely re-implement it, or deprecate it. I >>> doubt anyone is going to do a big refactor, so that means deprecating it. >>> >>>>> >>> >>>>> Also -- if we do want a fast read numbers from text files function >>> (which would be nice, actually), it really should get a new name anyway. >>> >>>>> >>> >>>>> (and the hopefully coming new dtype system would make it easier to >>> write cleanly) >>> >>>>> >>> >>>>> I'm not sure what deprecating something means, though -- have it >>> raise a deprecation warning in the next version? 
>>> >>>>> >>> >> >>> >> There was discussion at SciPy 2015 of separating out the text reading >>> abilities of Pandas so that numpy could include it. We should contact Jeff >>> Rebeck and see about moving that forward. >>> > >>> > >>> > IIRC Thomas Caswell was interested in doing this :) >>> >>> When he was in Berkeley a few weeks ago he assured me that every night >>> since SciPy he has dutifully been feeling guilty about not having done it >>> yet. I think this week his paltry excuse is that he's "on his honeymoon" or >>> something. >>> >>> ...which is to say that if someone has some spare cycles to take this >>> over then I think that might be a nice wedding present for him :-). >>> >>> (The basic idea is to take the text reading backend behind >>> pandas.read_csv and extract it into a standalone package that pandas could >>> depend on, and that could also be used by other packages like numpy (among >>> others -- I thing dato's SFrame package has a fork of this code as well?)) >>> >>> -n >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> I can certainly provide guidance on how/what to extract but don't have >>> spare cycles myself for this :( >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Nov 3 12:10:16 2015 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 3 Nov 2015 09:10:16 -0800 Subject: [Numpy-discussion] [Distutils] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead In-Reply-To: References: Message-ID: <561702487663958680@unknownmsgid> >> I'm not talking about in place installs, I'm talking about e.g. building a >> wheel and then tweaking one file and rebuilding -- traditionally build >> systems go to some effort to keep track of intermediate artifacts and reuse >> them across builds when possible, but if you always copy the source tree >> into a temporary directory before building then there's not much the build >> system can do. This strikes me as an optimization -- is it an important one? If I'm doing a lot of tweaking and re-running, I'm usually in develop mode. 
I can see that when you build a wheel, you may build it, test it, discover an wheel-specific error, and then need to repeat the cycle -- but is that a major use-case? That being said, I have been pretty frustrated debugging conda-build scripts -- there is a lot of overhead setting up the build environment each time you do a build... But with wheel building there is much less overhead, and far fewer complications requiring the edit-build cycle. And couldn't make-style this-has-already-been-done checking happen with a copy anyway? CHB > Ah yes. So I don't think pip should do what it does. It a violation of > the abstractions we all want to see within it. However its not me you > need to convince ;). > > -Rob > > -- > Robert Collins > Distinguished Technologist > HP Converged Cloud > _______________________________________________ > Distutils-SIG maillist - Distutils-SIG at python.org > https://mail.python.org/mailman/listinfo/distutils-sig From charlesr.harris at gmail.com Wed Nov 4 14:28:48 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 4 Nov 2015 12:28:48 -0700 Subject: [Numpy-discussion] New behavior of allclose Message-ID: Hi All, This is to open a discussion of a change of behavior of `np.allclose`. That function uses `isclose` in numpy 1.10 with the result that array subtypes are preserved whereas before they were not. In particular, memmaps are returned when at least one of the inputs is a memmap. By and large I think this is a good thing, OTOH, it is a change in behavior. It is easy to fix, just run `np.array(result, copy=False)` on the current `result`, but I thought I'd raise the topic on the list in case there is a good argument to change things. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Wed Nov 4 14:36:01 2015 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 4 Nov 2015 13:36:01 -0600 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: I actually brought this up before 1.10 came out: https://github.com/numpy/numpy/issues/6196 The behavior change brought out a bug in our use of allclose, so while it was annoying in the sense that our test suite started failing in a new way, it was good in that our tests are now more correct. On Wed, Nov 4, 2015 at 1:28 PM, Charles R Harris wrote: > Hi All, > > This is to open a discussion of a change of behavior of `np.allclose`. > That function uses `isclose` in numpy 1.10 with the result that array > subtypes are preserved whereas before they were not. In particular, memmaps > are returned when at least one of the inputs is a memmap. By and large I > think this is a good thing, OTOH, it is a change in behavior. It is easy to > fix, just run `np.array(result, copy=False)` on the current `result`, but I > thought I'd raise the topic on the list in case there is a good argument to > change things. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Wed Nov 4 14:40:12 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Wed, 4 Nov 2015 14:40:12 -0500 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: I am not sure I understand what you mean. Specifically that np.isclose will return a memmap if one of the inputs is a memmap. 
The result is a brand new array, right? So, what is that result memmapping from? Also, how does this impact np.allclose()? That function returns a scalar True/False, so what is the change in behavior there? By the way, the docs for isclose in 1.10.1 does not mention any behavior changes. Ben Root On Wed, Nov 4, 2015 at 2:28 PM, Charles R Harris wrote: > Hi All, > > This is to open a discussion of a change of behavior of `np.allclose`. > That function uses `isclose` in numpy 1.10 with the result that array > subtypes are preserved whereas before they were not. In particular, memmaps > are returned when at least one of the inputs is a memmap. By and large I > think this is a good thing, OTOH, it is a change in behavior. It is easy to > fix, just run `np.array(result, copy=False)` on the current `result`, but I > thought I'd raise the topic on the list in case there is a good argument to > change things. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Wed Nov 4 14:42:28 2015 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 4 Nov 2015 13:42:28 -0600 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: Oh oops, this is about np.allcose, not np.assert_allclose. Sorry for the noise... On Wed, Nov 4, 2015 at 1:36 PM, Nathan Goldbaum wrote: > I actually brought this up before 1.10 came out: > https://github.com/numpy/numpy/issues/6196 > > The behavior change brought out a bug in our use of allclose, so while it > was annoying in the sense that our test suite started failing in a new way, > it was good in that our tests are now more correct. > > On Wed, Nov 4, 2015 at 1:28 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> This is to open a discussion of a change of behavior of `np.allclose`. >> That function uses `isclose` in numpy 1.10 with the result that array >> subtypes are preserved whereas before they were not. In particular, memmaps >> are returned when at least one of the inputs is a memmap. By and large I >> think this is a good thing, OTOH, it is a change in behavior. It is easy to >> fix, just run `np.array(result, copy=False)` on the current `result`, but I >> thought I'd raise the topic on the list in case there is a good argument to >> change things. >> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Nov 4 14:43:48 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 4 Nov 2015 12:43:48 -0700 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: On Wed, Nov 4, 2015 at 12:40 PM, Benjamin Root wrote: > I am not sure I understand what you mean. Specifically that np.isclose > will return a memmap if one of the inputs is a memmap. The result is a > brand new array, right? So, what is that result memmapping from? Also, how > does this impact np.allclose()? That function returns a scalar True/False, > so what is the change in behavior there? > > By the way, the docs for isclose in 1.10.1 does not mention any behavior > changes. 
> Yep, it is a new issue, see #6475 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Nov 4 14:45:06 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 4 Nov 2015 12:45:06 -0700 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: On Wed, Nov 4, 2015 at 12:42 PM, Nathan Goldbaum wrote: > Oh oops, this is about np.allcose, not np.assert_allclose. Sorry for the > noise... > Probably related ;) Did you open an issue for it? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nathan12343 at gmail.com Wed Nov 4 14:47:43 2015 From: nathan12343 at gmail.com (Nathan Goldbaum) Date: Wed, 4 Nov 2015 13:47:43 -0600 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: Yup, https://github.com/numpy/numpy/issues/6196 On Wed, Nov 4, 2015 at 1:45 PM, Charles R Harris wrote: > > > On Wed, Nov 4, 2015 at 12:42 PM, Nathan Goldbaum > wrote: > >> Oh oops, this is about np.allcose, not np.assert_allclose. Sorry for the >> noise... >> > > Probably related ;) Did you open an issue for it? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Wed Nov 4 15:00:19 2015 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Wed, 4 Nov 2015 21:00:19 +0100 Subject: [Numpy-discussion] deprecate fromstring() for text reading? In-Reply-To: <2168677754684929763@unknownmsgid> References: <2283704104052164280@unknownmsgid> <-1464708838107245522@unknownmsgid> <2168677754684929763@unknownmsgid> Message-ID: <58E03A91-E801-4A53-AC4D-538C9A7DBBC1@astro.physik.uni-goettingen.de> On 3 Nov 2015, at 6:03 pm, Chris Barker - NOAA Federal wrote: > > I was more aiming to point out a situation where the NumPy's text file reader was significantly better than the Pandas version, so we would want to make sure that we properly benchmark any significant changes to NumPy's text reading code. Who knows where else NumPy beats Pandas? > Indeed. For this example, I think a fixed-with reader really is a different animal, and it's probably a good idea to have a high performance one in Numpy. Among other things, you wouldn't want it to try to auto-determine data types or anything like that. > > I think what's on the table now is to bring in a new delimited reader -- I.e. CSV in its various flavors. > To add my own handful of change or at least another data point, I had been looking into both the pandas and the Astropy fast readers as a fast loadtxt/genfromtxt replacement; at the time I found the Astropy cparser source somewhat easier to dig into, although looking now Pandas' parser.pyx seems clear enough as well. Some comparison of the two can be found at http://astropy.readthedocs.org/en/stable/io/ascii/fast_ascii_io.html#speed-gains Unfortunately the Astropy fast reader currently does not support fixed-width format either, and adding this functionality would require modifications to the tokenizer C code - not sure how extensive. 
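For what it's worth, the pure-Python np.genfromtxt can already read
fixed-width fields today if you pass the field widths as `delimiter`; it is
slow, but it shows roughly what a fast fixed-width mode would need to cover
(the widths and values below are made up for illustration):

    import io
    import numpy as np

    # two fixed-width fields: 3 characters, then 6 characters
    data = io.BytesIO(b"  1  2.50\n 10 13.75\n")
    arr = np.genfromtxt(data, delimiter=[3, 6])
    # arr -> array([[  1.  ,   2.5 ],
    #               [ 10.  ,  13.75]])
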
Cheers, Derek From stefan at seefeld.name Wed Nov 4 19:40:11 2015 From: stefan at seefeld.name (Stefan Seefeld) Date: Wed, 4 Nov 2015 19:40:11 -0500 Subject: [Numpy-discussion] querying backend information Message-ID: <563AA56B.5020207@seefeld.name> Hello, is there a way to query Numpy for information about backends (BLAS, LAPACK, etc.) that it was compiled against, including compiler / linker flags that were used ? Consider the use-case where instead of calling a function such as numpy.dot() I may want to call the appropriate backend directly using the C API as an optimization technique. Is there a straight-forward way to do that ? In a somewhat related line of thought: Is there a way to see what backends are available during Numpy compile-time ? I'm looking for a list of flags to pick ATLAS/OpenBLAS/LAPACK/MKL or any other backend that might be available, combined with variables (compiler and linker flags, notably) I might have to set. Is that information available at all ? Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin... From njs at pobox.com Wed Nov 4 23:11:38 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 4 Nov 2015 20:11:38 -0800 Subject: [Numpy-discussion] querying backend information In-Reply-To: <563AA56B.5020207@seefeld.name> References: <563AA56B.5020207@seefeld.name> Message-ID: On Wed, Nov 4, 2015 at 4:40 PM, Stefan Seefeld wrote: > Hello, > > is there a way to query Numpy for information about backends (BLAS, > LAPACK, etc.) that it was compiled against, including compiler / linker > flags that were used ? > Consider the use-case where instead of calling a function such as > numpy.dot() I may want to call the appropriate backend directly using > the C API as an optimization technique. Is there a straight-forward way > to do that ? > > In a somewhat related line of thought: Is there a way to see what > backends are available during Numpy compile-time ? I'm looking for a > list of flags to pick ATLAS/OpenBLAS/LAPACK/MKL or any other backend > that might be available, combined with variables (compiler and linker > flags, notably) I might have to set. Is that information available at all ? NumPy does reveal some information about its configuration and numpy.distutils does provide helper methods, but I'm not super familiar with it so I'll let others answer that part. Regarding the idea of "cutting out the middleman" and calling directly into the appropriate backend via the C API, NumPy doesn't currently expose any interface for doing this. There are some discussions with Antoine from a few months back about this (and given that you work at the same place I'm guessing the motivation is the same? :-)). For some reason I'm failing to find the archives now, but the summary from off the top of my head is: SciPy does expose an interface for this (via cython and its PyCapsule tricks -- see [1]), NumPy is unlikely to because we're wary of adding extra public interfaces and can't guarantee that we even have a full BLAS/LAPACK available (sometimes we fall back on a minimal vendored subset that's just enough for our needs), you probably don't want to try and get into the business of dynamically hunting down BLAS/LAPACK because it will be brittle and expose you to all kinds of cross-platform linker issues, and if you want to pull the clever stuff that scipy is doing out of scipy and put it into its own dedicated blas/lapack package, then well, we need one of those anyway [2]. 
-n [1] https://github.com/scipy-conference/scipy_proceedings_2015/blob/master/papers/ian_henriksen/cython_blas_lapack_api.rst [2] e.g. https://mail.scipy.org/pipermail/numpy-discussion/2015-January/072123.html -- Nathaniel J. Smith -- http://vorpus.org From ralf.gommers at gmail.com Thu Nov 5 01:37:41 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 5 Nov 2015 07:37:41 +0100 Subject: [Numpy-discussion] querying backend information In-Reply-To: References: <563AA56B.5020207@seefeld.name> Message-ID: On Thu, Nov 5, 2015 at 5:11 AM, Nathaniel Smith wrote: > On Wed, Nov 4, 2015 at 4:40 PM, Stefan Seefeld > wrote: > > Hello, > > > > is there a way to query Numpy for information about backends (BLAS, > > LAPACK, etc.) that it was compiled against, including compiler / linker > > flags that were used ? > > Consider the use-case where instead of calling a function such as > > numpy.dot() I may want to call the appropriate backend directly using > > the C API as an optimization technique. Is there a straight-forward way > > to do that ? > > > > In a somewhat related line of thought: Is there a way to see what > > backends are available during Numpy compile-time ? I'm looking for a > > list of flags to pick ATLAS/OpenBLAS/LAPACK/MKL or any other backend > > that might be available, combined with variables (compiler and linker > > flags, notably) I might have to set. Is that information available at > all ? > > NumPy does reveal some information about its configuration and > numpy.distutils does provide helper methods, but I'm not super > familiar with it so I'll let others answer that part. > np.show_config() Gives: lapack_opt_info: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/usr/lib/atlas-base/atlas', '/usr/lib/atlas-base'] define_macros = [('NO_ATLAS_INFO', -1)] language = f77 include_dirs = ['/usr/include/atlas'] openblas_lapack_info: NOT AVAILABLE .... It's a function with no docstring and not in the html docs (I think), so that can certainly be improved. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Thu Nov 5 02:42:15 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 4 Nov 2015 23:42:15 -0800 Subject: [Numpy-discussion] Proposal for a new function: np.moveaxis Message-ID: I've put up a pull request implementing a new function, np.moveaxis, as an alternative to np.transpose and np.rollaxis: https://github.com/numpy/numpy/pull/6630 This functionality has been discussed (even the exact function name) several times over the years, but it never made it into a pull request. The most pressing issue is that the behavior of np.rollaxis is not intuitive to most users: https://mail.scipy.org/pipermail/numpy-discussion/2010-September/052882.html https://github.com/numpy/numpy/issues/2039 http://stackoverflow.com/questions/29891583/reason-why-numpy-rollaxis-is-so-confusing In this pull request, I also allow the source and destination axes to be sequences as well as scalars. This does not add much complexity to the code, solves some additional use cases and makes np.moveaxis a proper generalization of the other axes manipulation routines (see the pull requests for details). Best of all, it already works on ndarray duck types (like masked array and dask.array), because they have already implemented transpose. I think np.moveaxis would be a useful addition to NumPy -- I've found myself writing helper functions with a subset of its functionality several times over the past few years. 
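To make the proposed semantics concrete, a minimal example of what the pull
request implements (shapes only):

    import numpy as np

    x = np.zeros((3, 4, 5))

    # move axis 0 to the last position
    np.moveaxis(x, 0, -1).shape    # -> (4, 5, 3)

    # the rollaxis spelling of the same operation needs the non-obvious
    # "start" argument
    np.rollaxis(x, 0, 3).shape     # -> (4, 5, 3)
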
What do you think? Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jni.soma at gmail.com Thu Nov 5 03:26:24 2015 From: jni.soma at gmail.com (Juan Nunez-Iglesias) Date: Thu, 05 Nov 2015 00:26:24 -0800 (PST) Subject: [Numpy-discussion] Proposal for a new function: np.moveaxis In-Reply-To: References: Message-ID: <1446711984088.9f2d3d97@Nodemailer> I'm just a lowly user, but I'm a fan of this. +1! On Thu, Nov 5, 2015 at 6:42 PM, Stephan Hoyer wrote: > I've put up a pull request implementing a new function, np.moveaxis, as an > alternative to np.transpose and np.rollaxis: > https://github.com/numpy/numpy/pull/6630 > This functionality has been discussed (even the exact function name) > several times over the years, but it never made it into a pull request. The > most pressing issue is that the behavior of np.rollaxis is not intuitive to > most users: > https://mail.scipy.org/pipermail/numpy-discussion/2010-September/052882.html > https://github.com/numpy/numpy/issues/2039 > http://stackoverflow.com/questions/29891583/reason-why-numpy-rollaxis-is-so-confusing > In this pull request, I also allow the source and destination axes to be > sequences as well as scalars. This does not add much complexity to the > code, solves some additional use cases and makes np.moveaxis a proper > generalization of the other axes manipulation routines (see the pull > requests for details). > Best of all, it already works on ndarray duck types (like masked array and > dask.array), because they have already implemented transpose. > I think np.moveaxis would be a useful addition to NumPy -- I've found > myself writing helper functions with a subset of its functionality several > times over the past few years. What do you think? > Cheers, > Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Thu Nov 5 08:12:36 2015 From: ewm at redtetrahedron.org (Eric Moore) Date: Thu, 5 Nov 2015 08:12:36 -0500 Subject: [Numpy-discussion] querying backend information In-Reply-To: References: <563AA56B.5020207@seefeld.name> Message-ID: On Thu, Nov 5, 2015 at 1:37 AM, Ralf Gommers wrote: > > > On Thu, Nov 5, 2015 at 5:11 AM, Nathaniel Smith wrote: > >> On Wed, Nov 4, 2015 at 4:40 PM, Stefan Seefeld >> wrote: >> > Hello, >> > >> > is there a way to query Numpy for information about backends (BLAS, >> > LAPACK, etc.) that it was compiled against, including compiler / linker >> > flags that were used ? >> > Consider the use-case where instead of calling a function such as >> > numpy.dot() I may want to call the appropriate backend directly using >> > the C API as an optimization technique. Is there a straight-forward way >> > to do that ? >> > >> > In a somewhat related line of thought: Is there a way to see what >> > backends are available during Numpy compile-time ? I'm looking for a >> > list of flags to pick ATLAS/OpenBLAS/LAPACK/MKL or any other backend >> > that might be available, combined with variables (compiler and linker >> > flags, notably) I might have to set. Is that information available at >> all ? >> >> NumPy does reveal some information about its configuration and >> numpy.distutils does provide helper methods, but I'm not super >> familiar with it so I'll let others answer that part. 
>> > > np.show_config() > > Gives: > > lapack_opt_info: > libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] > library_dirs = ['/usr/lib/atlas-base/atlas', '/usr/lib/atlas-base'] > define_macros = [('NO_ATLAS_INFO', -1)] > language = f77 > include_dirs = ['/usr/include/atlas'] > openblas_lapack_info: > NOT AVAILABLE > .... > > > It's a function with no docstring and not in the html docs (I think), so > that can certainly be improved. > > Ralf > I don't think that show_config is what you want. Those are built time values that aren't necessarily true at run time. For instance, numpy from conda references directories that are not on my machine. Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From jjhelmus at gmail.com Thu Nov 5 10:18:23 2015 From: jjhelmus at gmail.com (Jonathan Helmus) Date: Thu, 5 Nov 2015 09:18:23 -0600 Subject: [Numpy-discussion] Proposal for a new function: np.moveaxis In-Reply-To: <1446711984088.9f2d3d97@Nodemailer> References: <1446711984088.9f2d3d97@Nodemailer> Message-ID: <563B733F.1090903@gmail.com> Also a +1 from me. I've had to (re-)learn how exactly np.transpose works more times then I care to admit. - Jonathan Helmus On 11/05/2015 02:26 AM, Juan Nunez-Iglesias wrote: > I'm just a lowly user, but I'm a fan of this. +1! > > > > > On Thu, Nov 5, 2015 at 6:42 PM, Stephan Hoyer > wrote: > > I've put up a pull request implementing a new function, > np.moveaxis, as an alternative to np.transpose and np.rollaxis: > https://github.com/numpy/numpy/pull/6630 > > This functionality has been discussed (even the exact function > name) several times over the years, but it never made it into a > pull request. The most pressing issue is that the behavior of > np.rollaxis is not intuitive to most users: > https://mail.scipy.org/pipermail/numpy-discussion/2010-September/052882.html > https://github.com/numpy/numpy/issues/2039 > http://stackoverflow.com/questions/29891583/reason-why-numpy-rollaxis-is-so-confusing > > In this pull request, I also allow the source and destination axes > to be sequences as well as scalars. This does not add much > complexity to the code, solves some additional use cases and makes > np.moveaxis a proper generalization of the other axes manipulation > routines (see the pull requests for details). > > Best of all, it already works on ndarray duck types (like masked > array and dask.array), because they have already implemented > transpose. > > I think np.moveaxis would be a useful addition to NumPy -- I've > found myself writing helper functions with a subset of its > functionality several times over the past few years. What do you > think? > > Cheers, > Stephan > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From hbar1054571 at gmail.com Thu Nov 5 11:26:18 2015 From: hbar1054571 at gmail.com (Johan) Date: Thu, 5 Nov 2015 16:26:18 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Compilation_problems_npy=5Ffloat64?= Message-ID: Hello, I searched the forum, but couldn't find a post related to my problem. 
I am installing scipy via pip in cygwin environment pip install scipy Note: numpy version 1.10.1 was installed with pip install -U numpy /usr/bin/gfortran -Wall -g -Wall -g -shared -Wl,-gc-sections -Wl,-s build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/geom2.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/geom.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/global.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/io.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/libqhull.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/mem.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/merge.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/poly2.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/poly.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/qset.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/random.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/rboxlib.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/stat.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/user.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/usermem.o build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/qhull/src/userprintf.o build/temp.cygwin-2.2.1-x86_64- 2.7/scipy/spatial/qhull/src/userprintf_rbox.o -L/usr/lib - L/usr/lib/gcc/x86_64-pc-cygwin/4.9.3 -L/usr/lib/python2.7/config - L/usr/lib -Lbuild/temp.cygwin-2.2.1-x86_64-2.7 -llapack -lblas - lpython2.7 -lgfortran -o build/lib.cygwin-2.2.1-x86_64- 2.7/scipy/spatial/qhull.dll building 'scipy.spatial.ckdtree' extension compiling C++ sources C compiler: g++ -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit- function-declaration -fdebug-prefix-map=/usr/src/ports/python/python- 2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix- map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python- 2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall creating build/temp.cygwin-2.2.1-x86_64-2.7/scipy/spatial/ckdtree creating build/temp.cygwin-2.2.1-x86_64- 2.7/scipy/spatial/ckdtree/src compile options: '-I/usr/include/python2.7 - I/usr/lib/python2.7/site-packages/numpy/core/include - Iscipy/spatial/ckdtree/src -I/usr/lib/python2.7/site- packages/numpy/core/include -I/usr/include/python2.7 -c' g++: scipy/spatial/ckdtree/src/ckdtree_cpp_exc.cxx cc1plus: warning: command line option ?-Wimplicit-function- declaration? is valid for C/ObjC but not for C++ g++: scipy/spatial/ckdtree/src/ckdtree_query.cxx cc1plus: warning: command line option ?-Wimplicit-function- declaration? is valid for C/ObjC but not for C++ In file included from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/ndarraytypes.h:1781:0, from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/ndarrayobject.h:18, from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/arrayobject.h:4, from scipy/spatial/ckdtree/src/ckdtree_query.cxx:15: /usr/lib/python2.7/site- packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp] #warning "Using deprecated NumPy API, disable it by " \ ^ In file included from scipy/spatial/ckdtree/src/ckdtree_query.cxx:31:0: scipy/spatial/ckdtree/src/ckdtree_cpp_methods.h:12:20: error: ?npy_float64 infinity? 
redeclared as different kind of symbol extern npy_float64 infinity; ^ In file included from /usr/include/python2.7/pyport.h:325:0, from /usr/include/python2.7/Python.h:58, from scipy/spatial/ckdtree/src/ckdtree_query.cxx:14: /usr/include/math.h:263:15: note: previous declaration ?double infinity()? extern double infinity _PARAMS((void)); ^ In file included from scipy/spatial/ckdtree/src/ckdtree_query.cxx:31:0: scipy/spatial/ckdtree/src/ckdtree_cpp_methods.h: In function ?npy_float64 _distance_p(const npy_float64*, const npy_float64*, npy_float64, npy_intp, npy_float64)?: scipy/spatial/ckdtree/src/ckdtree_cpp_methods.h:139:17: error: invalid operands of types ?const npy_float64 {aka const double}? and ?double()? to binary ?operator==? else if (p==infinity) { ^ scipy/spatial/ckdtree/src/ckdtree_query.cxx: In function ?PyObject* query_knn(const ckdtree*, npy_float64*, npy_intp*, const npy_float64*, npy_intp, npy_intp, npy_float64, npy_float64, npy_float64)?: scipy/spatial/ckdtree/src/ckdtree_query.cxx:431:111: error: cannot convert ?double (*)()? to ?npy_float64 {aka double}? for argument ?9? to ?void __query_single_point(const ckdtree*, npy_float64*, npy_intp*, const npy_float64*, npy_intp, npy_float64, npy_float64, npy_float64, npy_float64)? __query_single_point(self, dd_row, ii_row, xx_row, k, eps, p, distance_upper_bound, ::infinity); ^ In file included from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/ndarrayobject.h:27:0, from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/arrayobject.h:4, from scipy/spatial/ckdtree/src/ckdtree_query.cxx:15: /usr/lib/python2.7/site- packages/numpy/core/include/numpy/__multiarray_api.h: At global scope: /usr/lib/python2.7/site- packages/numpy/core/include/numpy/__multiarray_api.h:1634:1: warning: ?int _import_array()? defined but not used [-Wunused-function] _import_array(void) ^ cc1plus: warning: command line option ?-Wimplicit-function- declaration? is valid for C/ObjC but not for C++ In file included from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/ndarraytypes.h:1781:0, from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/ndarrayobject.h:18, from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/arrayobject.h:4, from scipy/spatial/ckdtree/src/ckdtree_query.cxx:15: /usr/lib/python2.7/site- packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp] #warning "Using deprecated NumPy API, disable it by " \ ^ In file included from scipy/spatial/ckdtree/src/ckdtree_query.cxx:31:0: scipy/spatial/ckdtree/src/ckdtree_cpp_methods.h:12:20: error: ?npy_float64 infinity? redeclared as different kind of symbol extern npy_float64 infinity; ^ In file included from /usr/include/python2.7/pyport.h:325:0, from /usr/include/python2.7/Python.h:58, from scipy/spatial/ckdtree/src/ckdtree_query.cxx:14: /usr/include/math.h:263:15: note: previous declaration ?double infinity()? extern double infinity _PARAMS((void)); ^ In file included from scipy/spatial/ckdtree/src/ckdtree_query.cxx:31:0: scipy/spatial/ckdtree/src/ckdtree_cpp_methods.h: In function ?npy_float64 _distance_p(const npy_float64*, const npy_float64*, npy_float64, npy_intp, npy_float64)?: scipy/spatial/ckdtree/src/ckdtree_cpp_methods.h:139:17: error: invalid operands of types ?const npy_float64 {aka const double}? and ?double()? to binary ?operator==? 
else if (p==infinity) { ^ scipy/spatial/ckdtree/src/ckdtree_query.cxx: In function ?PyObject* query_knn(const ckdtree*, npy_float64*, npy_intp*, const npy_float64*, npy_intp, npy_intp, npy_float64, npy_float64, npy_float64)?: scipy/spatial/ckdtree/src/ckdtree_query.cxx:431:111: error: cannot convert ?double (*)()? to ?npy_float64 {aka double}? for argument ?9? to ?void __query_single_point(const ckdtree*, npy_float64*, npy_intp*, const npy_float64*, npy_intp, npy_float64, npy_float64, npy_float64, npy_float64)? __query_single_point(self, dd_row, ii_row, xx_row, k, eps, p, distance_upper_bound, ::infinity); ^ In file included from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/ndarrayobject.h:27:0, from /usr/lib/python2.7/site- packages/numpy/core/include/numpy/arrayobject.h:4, from scipy/spatial/ckdtree/src/ckdtree_query.cxx:15: /usr/lib/python2.7/site- packages/numpy/core/include/numpy/__multiarray_api.h: At global scope: /usr/lib/python2.7/site- packages/numpy/core/include/numpy/__multiarray_api.h:1634:1: warning: ?int _import_array()? defined but not used [-Wunused-function] _import_array(void) ^ error: Command "g++ -fno-strict-aliasing -ggdb -O2 -pipe -Wimplicit- function-declaration -fdebug-prefix-map=/usr/src/ports/python/python- 2.7.10-1.x86_64/build=/usr/src/debug/python-2.7.10-1 -fdebug-prefix- map=/usr/src/ports/python/python-2.7.10-1.x86_64/src/Python- 2.7.10=/usr/src/debug/python-2.7.10-1 -DNDEBUG -g -fwrapv -O3 -Wall - I/usr/include/python2.7 -I/usr/lib/python2.7/site- packages/numpy/core/include -Iscipy/spatial/ckdtree/src - I/usr/lib/python2.7/site-packages/numpy/core/include - I/usr/include/python2.7 -c scipy/spatial/ckdtree/src/ckdtree_query.cxx - o build/temp.cygwin-2.2.1-x86_64- 2.7/scipy/spatial/ckdtree/src/ckdtree_query.o" failed with exit status 1 ---------------------------------------- Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip-build- vAliRx/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open) (__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install -- record /tmp/pip-gxrCbK-record/install-record.txt --single-version- externally-managed --compile" failed with error code 1 in /tmp/pip- build-vAliRx/scipy From pav at iki.fi Thu Nov 5 15:07:51 2015 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 5 Nov 2015 20:07:51 +0000 (UTC) Subject: [Numpy-discussion] Compilation problems npy_float64 References: Message-ID: Thu, 05 Nov 2015 16:26:18 +0000, Johan kirjoitti: > Hello, I searched the forum, but couldn't find a post related to my > problem. I am installing scipy via pip in cygwin environment [clip] > /usr/include/math.h:263:15: note: previous declaration ?double > infinity()? > extern double infinity _PARAMS((void)); > ^ [clip] This looks like some Cygwin weirdness --- a variable called "infinity" is apparently there declared by math.h, and thus a reserved name. This was fixed by (but not for this reason) https://github.com/scipy/scipy/commit/832baa20f0b5 so you may have better luck with the dev version. -- Pauli Virtanen From ralf.gommers at gmail.com Thu Nov 5 16:50:49 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 5 Nov 2015 22:50:49 +0100 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: On Wed, Nov 4, 2015 at 8:28 PM, Charles R Harris wrote: > Hi All, > > This is to open a discussion of a change of behavior of `np.allclose`. 
> That function uses `isclose` in numpy 1.10 with the result that array > subtypes are preserved whereas before they were not. In particular, memmaps > are returned when at least one of the inputs is a memmap. By and large I > think this is a good thing, OTOH, it is a change in behavior. It is easy to > fix, just run `np.array(result, copy=False)` on the current `result`, but I > thought I'd raise the topic on the list in case there is a good argument to > change things. > Why would it be good to return a memmap? And am I confused or does your just merged PR [1] revert the behavior you say here is a good thing? Ralf [1] https://github.com/numpy/numpy/pull/6628 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Thu Nov 5 17:00:34 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Thu, 5 Nov 2015 17:00:34 -0500 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: allclose() needs to return a bool so that one can do "if np.allclose(foo, bar) is True" or some such. The "good behavior" is for np.isclose() to return a memmap, which still confuses the heck out of me, but I am not a memmap expert. On Thu, Nov 5, 2015 at 4:50 PM, Ralf Gommers wrote: > > > On Wed, Nov 4, 2015 at 8:28 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> This is to open a discussion of a change of behavior of `np.allclose`. >> That function uses `isclose` in numpy 1.10 with the result that array >> subtypes are preserved whereas before they were not. In particular, memmaps >> are returned when at least one of the inputs is a memmap. By and large I >> think this is a good thing, OTOH, it is a change in behavior. It is easy to >> fix, just run `np.array(result, copy=False)` on the current `result`, but I >> thought I'd raise the topic on the list in case there is a good argument to >> change things. >> > > Why would it be good to return a memmap? And am I confused or does your > just merged PR [1] revert the behavior you say here is a good thing? > > Ralf > > [1] https://github.com/numpy/numpy/pull/6628 > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Nov 5 17:15:26 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 5 Nov 2015 15:15:26 -0700 Subject: [Numpy-discussion] New behavior of allclose In-Reply-To: References: Message-ID: On Thu, Nov 5, 2015 at 2:50 PM, Ralf Gommers wrote: > > > On Wed, Nov 4, 2015 at 8:28 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> This is to open a discussion of a change of behavior of `np.allclose`. >> That function uses `isclose` in numpy 1.10 with the result that array >> subtypes are preserved whereas before they were not. In particular, memmaps >> are returned when at least one of the inputs is a memmap. By and large I >> think this is a good thing, OTOH, it is a change in behavior. It is easy to >> fix, just run `np.array(result, copy=False)` on the current `result`, but I >> thought I'd raise the topic on the list in case there is a good argument to >> change things. >> > > Why would it be good to return a memmap? And am I confused or does your > just merged PR [1] revert the behavior you say here is a good thing? > Good thing for isclose, not allclose. 
I was thinking of very large files that might exceed memory in the isclose case, but an argument could be made for other subtypes. Allclose, OTOH, always returns a scalar. I went ahead with boolean for allclose because 1) it is backward compatible, 2) Nathaniel tended in that direction, 3) the conversation here is tending in that direction, 4) I tend in that direction, and finally, I want to get 1.10.2rc1 out this weekend ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefanv at berkeley.edu Fri Nov 6 12:32:53 2015 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Fri, 06 Nov 2015 09:32:53 -0800 Subject: [Numpy-discussion] Help wanted: implementation of 3D medial axis skeletonization References: <87vb9khwbd.fsf@berkeley.edu> Message-ID: <87oaf7f2yi.fsf@berkeley.edu> Hi all, I have been approached by a group that is interested in sponsoring the development of 3D skeletonization in scikit-image. One potential starting place would be: http://www.insight-journal.org/browse/publication/181 Is anyone interested in working on this? Please get in touch either on the scikit-image mailing list or by mailing me directly. Thanks! St?fan From njs at pobox.com Fri Nov 6 16:56:45 2015 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 6 Nov 2015 13:56:45 -0800 Subject: [Numpy-discussion] Proposal: stop supporting 'setup.py install'; start requiring 'pip install .' instead In-Reply-To: References: Message-ID: On Mon, Nov 2, 2015 at 5:57 PM, Nathaniel Smith wrote: > On Sun, Nov 1, 2015 at 3:16 PM, Ralf Gommers wrote: >> 2. ``pip install .`` silences build output, which may make sense for some >> usecases, but for numpy it just sits there for minutes with no output after >> printing "Running setup.py install for numpy". Users will think it hangs and >> Ctrl-C it. https://github.com/pypa/pip/issues/2732 > > I tend to agree with the commentary there that for end users this is > different but no worse than the current situation where we spit out > pages of "errors" that don't mean anything :-). I posted a suggestion > on that bug that might help with the apparent hanging problem. For the record, this is now fixed in pip's "develop" branch and should be in the next release. For commands like 'setup.py install', pip now displays a spinner that ticks over whenever the underlying process prints to stdout/stderr. So if the underlying process hangs, then the spinner will stop (it's not just lying to you), but normally it works nicely. https://github.com/pypa/pip/pull/3224 -n -- Nathaniel J. Smith -- http://vorpus.org From pythondev1 at aerojockey.com Sat Nov 7 16:18:22 2015 From: pythondev1 at aerojockey.com (aerojockey) Date: Sat, 7 Nov 2015 14:18:22 -0700 (MST) Subject: [Numpy-discussion] Question about structure arrays Message-ID: <1446931102879-41653.post@n7.nabble.com> Hello, Recently I made some changes to a program I'm working on, and found that the changes made it four times slower than before. After some digging, I found out that one of the new costs was that I added structure arrays. Inside a low-level loop, I create a structure array, populate it Python, then turn it over to some handwritten C code for processing. It turned out that, when passed a structure array as a dtype, numpy has to parse the dtype, which included calls to re.match and eval. Now, this is not a big deal for me to work around by using ordinary slicing and such, and also I can improve things by reusing arrays. 
Since this is inner loop stuff, sacrificing readability for speed is an appropriate tradeoff. Nevertheless, I was curious if there was a way (or any plans for there to be a way) to compile a struture array dtype. I realize it's not the bread-and-butter of numpy, but it turned out to be a very convenient feature for my use case (populating an array of structures to pass off to C). Thanks -- View this message in context: http://numpy-discussion.10968.n7.nabble.com/Question-about-structure-arrays-tp41653.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From njs at pobox.com Sat Nov 7 18:49:22 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 7 Nov 2015 15:49:22 -0800 Subject: [Numpy-discussion] Question about structure arrays In-Reply-To: <1446931102879-41653.post@n7.nabble.com> References: <1446931102879-41653.post@n7.nabble.com> Message-ID: On Sat, Nov 7, 2015 at 1:18 PM, aerojockey wrote: > Hello, > > Recently I made some changes to a program I'm working on, and found that the > changes made it four times slower than before. After some digging, I found > out that one of the new costs was that I added structure arrays. Inside a > low-level loop, I create a structure array, populate it Python, then turn it > over to some handwritten C code for processing. It turned out that, when > passed a structure array as a dtype, numpy has to parse the dtype, which > included calls to re.match and eval. > > Now, this is not a big deal for me to work around by using ordinary slicing > and such, and also I can improve things by reusing arrays. Since this is > inner loop stuff, sacrificing readability for speed is an appropriate > tradeoff. > > Nevertheless, I was curious if there was a way (or any plans for there to be > a way) to compile a struture array dtype. I realize it's not the > bread-and-butter of numpy, but it turned out to be a very convenient feature > for my use case (populating an array of structures to pass off to C). Does it help to turn your dtype string into a dtype object and then pass the dtype object around? E.g. In [1]: dt = np.dtype("i4,i4") In [2]: np.zeros(2, dtype=dt) Out[2]: array([(0, 0), (0, 0)], dtype=[('f0', ' Message-ID: <641995422468645328.080949sturla.molden-gmail.com@news.gmane.org> Johan wrote: > Hello, I searched the forum, but couldn't find a post related to my > problem. I am installing scipy via pip in cygwin environment I think I introduced this error when moving a global variable from the Cython module to a C++ module. The name collision with math.h was silent on Linux, Mac, and Windows (MinGW and MSVC) -- or not even present --, and thus went under the radar. But it eventually showed up on SunOS, and now also on Cygwin. :-( My apologies. Anyhow, it should be gone now. Try SciPy master. Sturla From charlesr.harris at gmail.com Sun Nov 8 20:46:17 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 8 Nov 2015 18:46:17 -0700 Subject: [Numpy-discussion] Feedback on new argument positions for ma.dot and MaskedArray.dot Message-ID: Hi All, I'd like some feedback for the position of the `strict` and `out` arguments for masked arrays. See gh-6653 for the PR in question. Current status without #6652 1. ma.dot(a, b, strict=False) -- established 2. a.dot(b, out=None) -- new in 1.10 Note that 1. requires adding `out` to the end for backward compatibility. OTOH, 2. is new(ish). 
We can either keep it compatible with ndarray.dot and add `strict` to the end and have it incompatible with 1., or, slightly changing it in 1.10.2, make it compatible with with 1. but incompatible with ndarray. We will face the same sort of problem with adding newer ndarray arguments other existing ma functions that have their own specialized arguments, so having a policy up front will be helpful. My own inclination here is to keep 1. and 2. compatible, and then perhaps at some point following a future warning, make both `strict` and `out` keyword arguments only. Another possiblitly is to make that transition immediate for the method. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Nov 8 21:00:25 2015 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 8 Nov 2015 16:00:25 -1000 Subject: [Numpy-discussion] Feedback on new argument positions for ma.dot and MaskedArray.dot In-Reply-To: References: Message-ID: <563FFE39.7060202@hawaii.edu> On 2015/11/08 3:46 PM, Charles R Harris wrote: > Hi All, > > I'd like some feedback for the position of the `strict` and `out` > arguments for masked arrays. See gh-6653 > for the PR in question. > > Current status without #6652 > > 1. ma.dot(a, b, strict=False) -- established > 2. a.dot(b, out=None) -- new in 1.10 > > > Note that 1. requires adding `out` to the end for backward > compatibility. OTOH, 2. is new(ish). We can either keep it compatible > with ndarray.dot and add `strict` to the end and have it incompatible > with 1., or, slightly changing it in 1.10.2, make it compatible with > with 1. but incompatible with ndarray. We will face the same sort of > problem with adding newer ndarray arguments other existing ma functions > that have their own specialized arguments, so having a policy up front > will be helpful. My own inclination here is to keep 1. and 2. > compatible, and then perhaps at some point following a future warning, > make both `strict` and `out` keyword arguments only. Another possiblitly > is to make that transition immediate for the method. I'm not sure about the best sequence, but I like the strategy of moving to keyword-only arguments. It is good for readability, and for flexibility. I also prefer that there be a single convention: either the "out" kwarg is the end of the every signature, or it is the first kwarg in every signature. It's a very special and unusual kwarg, so it should have a standard location. Eric > > Thoughts? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Sun Nov 8 22:43:35 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 8 Nov 2015 19:43:35 -0800 Subject: [Numpy-discussion] Feedback on new argument positions for ma.dot and MaskedArray.dot In-Reply-To: <563FFE39.7060202@hawaii.edu> References: <563FFE39.7060202@hawaii.edu> Message-ID: On Nov 8, 2015 6:00 PM, "Eric Firing" wrote: > > I also prefer that there be a single convention: either the "out" kwarg is the end of the every signature, or it is the first kwarg in every signature. It's a very special and unusual kwarg, so it should have a standard location. For all ufuncs, out arguments come first immediately after in arguments, so +1 for doing that for consistency. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bugreports2005 at cs.tut.fi Mon Nov 9 01:11:13 2015 From: bugreports2005 at cs.tut.fi (Lintula) Date: Mon, 9 Nov 2015 08:11:13 +0200 Subject: [Numpy-discussion] Failed numpy.test() with numpy-1.10.1 on RHEL 6 Message-ID: <56403901.50301@cs.tut.fi> Hello, I'm setting up numpy 1.10.1 on RHEL6 (python 2.6.6, atlas-3.8.4, lapack-3.2.1, gcc-4.4.7), and this test fails for me. I notice that someone else has had the same at https://github.com/numpy/numpy/issues/6063 in July. Is this harmless or is it of concern? ====================================================================== FAIL: test_umath.TestComplexFunctions.test_branch_cuts(, [-1, 0.5], [1j, 1j], 1, -1, True) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in runTest self.test(*self.arg) File "/usr/lib64/python2.6/site-packages/numpy/core/tests/test_umath.py", line 1748, in _check_branch_cut assert_(np.all(np.absolute(y0.imag - yp.imag) < atol), (y0, yp)) File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line 53, in assert_ raise AssertionError(smsg) AssertionError: (array([ 0.00000000e+00+3.14159265j, 1.11022302e-16-1.04719755j]), array([ 4.71216091e-07+3.14159218j, 1.28119737e-13+1.04719755j])) ---------------------------------------------------------------------- Ran 5955 tests in 64.284s FAILED (KNOWNFAIL=3, SKIP=2, failures=1) From irvin.probst at ensta-bretagne.fr Mon Nov 9 04:15:04 2015 From: irvin.probst at ensta-bretagne.fr (Irvin Probst) Date: Mon, 9 Nov 2015 10:15:04 +0100 Subject: [Numpy-discussion] loadtxt and usecols Message-ID: <56406418.1010500@ensta-bretagne.fr> Hi, I've recently seen many students, coming from Matlab, struggling against the usecols argument of loadtxt. Most of them tried something like: loadtxt("foo.bar", usecols=2) or the ones with better documentation reading skills tried loadtxt("foo.bar", usecols=(2)) but none of them understood they had to write usecols=[2] or usecols=(2,). Is there a policy in numpy stating that this kind of arguments must be sequences ? I think that being able to an int or a sequence when a single column is needed would make this function a bit more user friendly for beginners. I would gladly submit a PR if noone disagrees. Regards. -- Irvin From ewm at redtetrahedron.org Mon Nov 9 08:24:51 2015 From: ewm at redtetrahedron.org (Eric Moore) Date: Mon, 9 Nov 2015 08:24:51 -0500 Subject: [Numpy-discussion] Failed numpy.test() with numpy-1.10.1 on RHEL 6 In-Reply-To: <56403901.50301@cs.tut.fi> References: <56403901.50301@cs.tut.fi> Message-ID: This fails because numpy uses the function `cacosh` from the libm and on RHEL6 this function has a bug. As long as you don't care about getting the sign right at the branch cut in this function, then it's harmless. If you do care, the easiest solution will be to install something like anaconda that does not link against the relatively old libm that RHEL6 ships. On Mon, Nov 9, 2015 at 1:11 AM, Lintula wrote: > Hello, > > I'm setting up numpy 1.10.1 on RHEL6 (python 2.6.6, atlas-3.8.4, > lapack-3.2.1, gcc-4.4.7), and this test fails for me. I notice that > someone else has had the same at > https://github.com/numpy/numpy/issues/6063 in July. > > Is this harmless or is it of concern? 
> > > ====================================================================== > FAIL: test_umath.TestComplexFunctions.test_branch_cuts( 'arccosh'>, [-1, 0.5], [1j, 1j], 1, -1, True) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "/usr/lib/python2.6/site-packages/nose/case.py", line 182, in > runTest > self.test(*self.arg) > File > "/usr/lib64/python2.6/site-packages/numpy/core/tests/test_umath.py", > line 1748, in _check_branch_cut > assert_(np.all(np.absolute(y0.imag - yp.imag) < atol), (y0, yp)) > File "/usr/lib64/python2.6/site-packages/numpy/testing/utils.py", line > 53, in assert_ > raise AssertionError(smsg) > AssertionError: (array([ 0.00000000e+00+3.14159265j, > 1.11022302e-16-1.04719755j]), array([ 4.71216091e-07+3.14159218j, > 1.28119737e-13+1.04719755j])) > > ---------------------------------------------------------------------- > Ran 5955 tests in 64.284s > > FAILED (KNOWNFAIL=3, SKIP=2, failures=1) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From othalan at othalan.net Mon Nov 9 09:27:19 2015 From: othalan at othalan.net (David Morris) Date: Mon, 9 Nov 2015 07:27:19 -0700 Subject: [Numpy-discussion] Question about structure arrays In-Reply-To: <1446931102879-41653.post@n7.nabble.com> References: <1446931102879-41653.post@n7.nabble.com> Message-ID: On Nov 7, 2015 2:58 PM, "aerojockey" wrote: > > Hello, > > Recently I made some changes to a program I'm working on, and found that the > changes made it four times slower than before. After some digging, I found > out that one of the new costs was that I added structure arrays. Inside a > low-level loop, I create a structure array, populate it Python, then turn it > over to some handwritten C code for processing. It turned out that, when > passed a structure array as a dtype, numpy has to parse the dtype, which > included calls to re.match and eval. > > Now, this is not a big deal for me to work around by using ordinary slicing > and such, and also I can improve things by reusing arrays. Since this is > inner loop stuff, sacrificing readability for speed is an appropriate > tradeoff. > > Nevertheless, I was curious if there was a way (or any plans for there to be > a way) to compile a struture array dtype. I realize it's not the > bread-and-butter of numpy, but it turned out to be a very convenient feature > for my use case (populating an array of structures to pass off to C). I was just looking into structured arrays. In case it is relevant: Are you using certain 1.10? They are apparently a LOT slower than 1.9.3, an issue which will be fixed in a future version. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Mon Nov 9 13:42:49 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Mon, 9 Nov 2015 13:42:49 -0500 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: <56406418.1010500@ensta-bretagne.fr> References: <56406418.1010500@ensta-bretagne.fr> Message-ID: My personal rule for flexible inputs like that is that it should be encouraged so long as it does not introduce ambiguity. Furthermore, Allowing a scalar as an input doesn't add a congitive disconnect on the user on how to specify multiple columns. Therefore, I'd give this a +1. 
On Mon, Nov 9, 2015 at 4:15 AM, Irvin Probst wrote: > Hi, > I've recently seen many students, coming from Matlab, struggling against > the usecols argument of loadtxt. Most of them tried something like: > loadtxt("foo.bar", usecols=2) or the ones with better documentation > reading skills tried loadtxt("foo.bar", usecols=(2)) but none of them > understood they had to write usecols=[2] or usecols=(2,). > > Is there a policy in numpy stating that this kind of arguments must be > sequences ? I think that being able to an int or a sequence when a single > column is needed would make this function a bit more user friendly for > beginners. I would gladly submit a PR if noone disagrees. > > Regards. > > -- > Irvin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Nov 9 14:36:57 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 9 Nov 2015 20:36:57 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: References: <56406418.1010500@ensta-bretagne.fr> Message-ID: On Mon, Nov 9, 2015 at 7:42 PM, Benjamin Root wrote: > My personal rule for flexible inputs like that is that it should be > encouraged so long as it does not introduce ambiguity. Furthermore, > Allowing a scalar as an input doesn't add a congitive disconnect on the > user on how to specify multiple columns. Therefore, I'd give this a +1. > > On Mon, Nov 9, 2015 at 4:15 AM, Irvin Probst < > irvin.probst at ensta-bretagne.fr> wrote: > >> Hi, >> I've recently seen many students, coming from Matlab, struggling against >> the usecols argument of loadtxt. Most of them tried something like: >> loadtxt("foo.bar", usecols=2) or the ones with better documentation >> reading skills tried loadtxt("foo.bar", usecols=(2)) but none of them >> understood they had to write usecols=[2] or usecols=(2,). >> >> Is there a policy in numpy stating that this kind of arguments must be >> sequences ? > > There isn't. In many/most cases it's array_like, which means scalar, sequence or array. > I think that being able to an int or a sequence when a single column is >> needed would make this function a bit more user friendly for beginners. I >> would gladly submit a PR if noone disagrees. >> > +1 Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Nov 9 18:53:03 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 9 Nov 2015 15:53:03 -0800 Subject: [Numpy-discussion] Question about structure arrays In-Reply-To: <1446931102879-41653.post@n7.nabble.com> References: <1446931102879-41653.post@n7.nabble.com> Message-ID: On Sat, Nov 7, 2015 at 1:18 PM, aerojockey wrote: > Inside a > low-level loop, I create a structure array, populate it Python, then turn > it > over to some handwritten C code for processing. can you do that inside bit of the low-level loop in C (or cython?) you often want to put the guts of your loop in C anyway... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Nov 9 19:43:37 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 9 Nov 2015 17:43:37 -0700 Subject: [Numpy-discussion] Feedback on new argument positions for ma.dot and MaskedArray.dot In-Reply-To: References: <563FFE39.7060202@hawaii.edu> Message-ID: On Sun, Nov 8, 2015 at 8:43 PM, Nathaniel Smith wrote: > On Nov 8, 2015 6:00 PM, "Eric Firing" wrote: > > > > I also prefer that there be a single convention: either the "out" kwarg > is the end of the every signature, or it is the first kwarg in every > signature. It's a very special and unusual kwarg, so it should have a > standard location. > > For all ufuncs, out arguments come first immediately after in arguments, > so +1 for doing that for consistency. > Agree that that is what to shoot for. The particular problem with `ma.dot` is that it already has the `strict` argument where the new `out` argument should go. I propose the following steps. 1. For backward compatibility, start by adding new arguments to the end 2. Later raise FutureWarning on positional arguments that are out of place 3. Then make all but early arguments keyword only Once we have keyword only for a while, it would be possible to add some arguments back as positional arguments, but it might be best to keep them as keyword only as suggested above. For the current PR, this means that the dot method will have positional arguments in a different order than ma.dot. Alternatively, out could be made keyword only in both, although that would require fixing up some tests. There is really no magical solution that avoids all difficulties that I can see. Unless a consensus develops otherwise, I will pursue step 1. and go for a 1.10.2rc tomorrow. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Nov 9 19:54:01 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 9 Nov 2015 16:54:01 -0800 Subject: [Numpy-discussion] Feedback on new argument positions for ma.dot and MaskedArray.dot In-Reply-To: References: <563FFE39.7060202@hawaii.edu> Message-ID: On Mon, Nov 9, 2015 at 4:43 PM, Charles R Harris wrote: > > > On Sun, Nov 8, 2015 at 8:43 PM, Nathaniel Smith wrote: >> >> On Nov 8, 2015 6:00 PM, "Eric Firing" wrote: >> > >> > I also prefer that there be a single convention: either the "out" kwarg >> > is the end of the every signature, or it is the first kwarg in every >> > signature. It's a very special and unusual kwarg, so it should have a >> > standard location. >> >> For all ufuncs, out arguments come first immediately after in arguments, >> so +1 for doing that for consistency. > > > Agree that that is what to shoot for. The particular problem with `ma.dot` > is that it already has the `strict` argument where the new `out` argument > should go. I propose the following steps. > > 1. For backward compatibility, start by adding new arguments to the end > 2. Later raise FutureWarning on positional arguments that are out of place > 3. Then make all but early arguments keyword only > > Once we have keyword only for a while, it would be possible to add some > arguments back as positional arguments, but it might be best to keep them as > keyword only as suggested above. > > For the current PR, this means that the dot method will have positional > arguments in a different order than ma.dot. Alternatively, out could be made > keyword only in both, although that would require fixing up some tests. 
> There is really no magical solution that avoids all difficulties that I can > see. > > Unless a consensus develops otherwise, I will pursue step 1. and go for a > 1.10.2rc tomorrow. If we're adding it in a funny place to ma.dot now (the end of the arglist) with the plan of changing it later, then why not make it kwarg-only in ma.dot now to start with? If this turns out to be annoying somehow then go ahead with whatever as far I'm concerned -- I don't want to hold up 1.10.2 by trying to micro-optimize the transition path for an obscure corner of np.ma :-). -n -- Nathaniel J. Smith -- http://vorpus.org From sebastian at sipsolutions.net Tue Nov 10 03:19:33 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 10 Nov 2015 09:19:33 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: References: <56406418.1010500@ensta-bretagne.fr> Message-ID: <1447143573.2487.9.camel@sipsolutions.net> On Mo, 2015-11-09 at 20:36 +0100, Ralf Gommers wrote: > > > On Mon, Nov 9, 2015 at 7:42 PM, Benjamin Root > wrote: > My personal rule for flexible inputs like that is that it > should be encouraged so long as it does not introduce > ambiguity. Furthermore, Allowing a scalar as an input doesn't > add a congitive disconnect on the user on how to specify > multiple columns. Therefore, I'd give this a +1. > > > On Mon, Nov 9, 2015 at 4:15 AM, Irvin Probst > wrote: > Hi, > I've recently seen many students, coming from Matlab, > struggling against the usecols argument of loadtxt. > Most of them tried something like: > loadtxt("foo.bar", usecols=2) or the ones with better > documentation reading skills tried loadtxt("foo.bar", > usecols=(2)) but none of them understood they had to > write usecols=[2] or usecols=(2,). > > Is there a policy in numpy stating that this kind of > arguments must be sequences ? > > > There isn't. In many/most cases it's array_like, which means scalar, > sequence or array. > Agree, I think we have, or should have, to types of things there (well, three since we certainly have "must be sequence"). Args such as "axes" which is typically just one, so we allow scalar, but can often be generalized to a sequence. And things that are array-likes (and broadcasting). So, if this is an array-like, however, the "correct" result could be different by broadcasting between `1` and `(1,)` analogous to indexing the full array with usecols: usecols=1 result: array([2, 3, 4, 5]) usecols=(1,) result [1]: array([[2, 3, 4, 5]]) since a scalar row (so just one row) is read and not a 2D array. I tend to say it should be an array-like argument and not a generalized sequence argument, just wanted to note that, since I am not sure what matlab does. - Sebastian [1] could go further and do `usecols=[[1]]` and get `array([[[2, 3, 4, 5]]])` > > I think that being able to an int or a sequence when a > single column is needed would make this function a bit > more user friendly for beginners. I would gladly > submit a PR if noone disagrees. > > +1 > > > Ralf > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From irvin.probst at ensta-bretagne.fr Tue Nov 10 04:24:57 2015 From: irvin.probst at ensta-bretagne.fr (Irvin Probst) Date: Tue, 10 Nov 2015 10:24:57 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: <1447143573.2487.9.camel@sipsolutions.net> References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> Message-ID: <5641B7E9.2090802@ensta-bretagne.fr> On 10/11/2015 09:19, Sebastian Berg wrote: > since a scalar row (so just one row) is read and not a 2D array. I tend > to say it should be an array-like argument and not a generalized > sequence argument, just wanted to note that, since I am not sure what > matlab does. Hi, By default Matlab reads everything, silently fails on what can't be converted into a float and the user has to guess what was read or not. Say you have a file like this: 2010-01-01 00:00:00 3.026 2010-01-01 01:00:00 4.049 2010-01-01 02:00:00 4.865 >> M=load('CONCARNEAU_2010.txt'); >> M(1:3,:) ans = 1.0e+03 * 2.0100 0 0.0030 2.0100 0.0010 0.0040 2.0100 0.0020 0.0049 I think this is a terrible way of doing it even if newcomers might find this handy. There are of course optionnal arguments (even regexps !) but to my knowledge almost no Matlab user even knows these arguments are there. Anyway, I made a PR here https://github.com/numpy/numpy/pull/6656 with usecols as an array-like. Regards. From sebastian at sipsolutions.net Tue Nov 10 08:17:32 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 10 Nov 2015 14:17:32 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: <5641B7E9.2090802@ensta-bretagne.fr> References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> <5641B7E9.2090802@ensta-bretagne.fr> Message-ID: <1447161452.2487.15.camel@sipsolutions.net> On Di, 2015-11-10 at 10:24 +0100, Irvin Probst wrote: > On 10/11/2015 09:19, Sebastian Berg wrote: > > since a scalar row (so just one row) is read and not a 2D array. I tend > > to say it should be an array-like argument and not a generalized > > sequence argument, just wanted to note that, since I am not sure what > > matlab does. > > Hi, > By default Matlab reads everything, silently fails on what can't be > converted into a float and the user has to guess what was read or not. > Say you have a file like this: > > 2010-01-01 00:00:00 3.026 > 2010-01-01 01:00:00 4.049 > 2010-01-01 02:00:00 4.865 > > > >> M=load('CONCARNEAU_2010.txt'); > >> M(1:3,:) > > ans = > > 1.0e+03 * > > 2.0100 0 0.0030 > 2.0100 0.0010 0.0040 > 2.0100 0.0020 0.0049 > > > I think this is a terrible way of doing it even if newcomers might find > this handy. There are of course optionnal arguments (even regexps !) but > to my knowledge almost no Matlab user even knows these arguments are there. > > Anyway, I made a PR here https://github.com/numpy/numpy/pull/6656 with > usecols as an array-like. > Actually, it is the "sequence special case" type ;). (matlab does not have this, since matlab always returns 2-D I realized). As I said, if usecols is like indexing, the result should mimic: arr = np.loadtxt(f) arr = arr[usecols] in which case a 1-D array is returned if you put in a scalar into usecols (and you could even generalize usecols to higher dimensional array-likes). 
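For concreteness, a minimal sketch of the two readings on an already-parsed array (plain indexing only, not the actual loadtxt code):

import numpy as np

arr = np.arange(12).reshape(3, 4)   # stand-in for a parsed file: 3 rows, 4 columns

# "array-like"/indexing reading: the shape of the index decides the result shape
arr[:, 2].shape     # (3,)   -- a scalar column index drops that axis
arr[:, [2]].shape   # (3, 1) -- a length-1 sequence keeps the 2-D result

# "sequence of ints, scalar allowed for convenience" reading:
# usecols=2 would simply mean usecols=(2,), so both spellings give the same shape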
The way you implemented it -- which is fine, but I want to stress that there is a real decision being made here --, you always see it as a sequence but allow a scalar for convenience (i.e. always return a 2-D array). It is a `sequence of ints or int` type argument and not an array-like argument in my opinion. - Sebastian > Regards. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From irvin.probst at ensta-bretagne.fr Tue Nov 10 10:07:13 2015 From: irvin.probst at ensta-bretagne.fr (Irvin Probst) Date: Tue, 10 Nov 2015 16:07:13 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: <1447161452.2487.15.camel@sipsolutions.net> References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> <5641B7E9.2090802@ensta-bretagne.fr> <1447161452.2487.15.camel@sipsolutions.net> Message-ID: <56420821.6010805@ensta-bretagne.fr> On 10/11/2015 14:17, Sebastian Berg wrote: > Actually, it is the "sequence special case" type ;). (matlab does not > have this, since matlab always returns 2-D I realized). > > As I said, if usecols is like indexing, the result should mimic: > > arr = np.loadtxt(f) > arr = arr[usecols] > > in which case a 1-D array is returned if you put in a scalar into > usecols (and you could even generalize usecols to higher dimensional > array-likes). > The way you implemented it -- which is fine, but I want to stress that > there is a real decision being made here --, you always see it as a > sequence but allow a scalar for convenience (i.e. always return a 2-D > array). It is a `sequence of ints or int` type argument and not an > array-like argument in my opinion. I think we have two separate problems here: The first one is whether loadtxt should always return a 2D array or should it match the shape of the usecol argument. From a CS guy point of view I do understand your concern here. Now from a teacher point of view I know many people expect to get a "matrix" (thank you Matlab...) and the "purity" of matching the dimension of the usecol variable will be seen by many people [1] as a nerdy useless heavyness noone cares of (no offense). So whatever you, seadoned numpy devs from this mailing list, decide I think it should be explained in the docstring with a very clear wording. My own opinion on this first problem is that loadtxt() should always return a 2D array, no less, no more. If I write np.loadtxt(f)[42] it means I want to read the whole file and then I explicitely ask for transforming the 2-D array loadtxt() returned into a 1-D array. Otoh if I write loadtxt(f, usecol=42) it means I don't want to read the other columns and I want only this one, but it does not mean that I want to change the returned array from 2-D to 1-D. I know this new behavior might break a lot of existing code as usecol=(42,) used to return a 1-D array, but usecol=((((42,)))) also returns a 1-D array so the current behavior is not consistent imho. The second problem is about the wording in the docstring, when I see "sequence of int or int" I uderstand I will have to cast into a 1-D python list whatever wicked N-dimensional object I use to store my column indexes, or hope list(my_object) will do it fine. 
On the other hand when I read "array-like" the function is telling me I don't have to worry about my object, as long as numpy knows how to cast it into an array it will be fine. Anyway I think something like that: import numpy as np a=[[[2,],[],[],],[],[],[]] foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a) should just work and return me a 2-D (or 1-D if you like) array with the data I asked for and I don't think "a" here is an int or a sequence of int (but it's a good example of why loadtxt() should not match the shape of the usecol argument). To make it short, let the reading function read the data in a consistent and predictible way and then let the user explicitely change the data's shape into anything he likes. Regards. [1] read non CS people trying to switch to numpy/scipy From ben.v.root at gmail.com Tue Nov 10 10:24:40 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Tue, 10 Nov 2015 10:24:40 -0500 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: <56420821.6010805@ensta-bretagne.fr> References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> <5641B7E9.2090802@ensta-bretagne.fr> <1447161452.2487.15.camel@sipsolutions.net> <56420821.6010805@ensta-bretagne.fr> Message-ID: Just pointing out np.loadtxt(..., ndmin=2) will always return a 2D array. Notice that without that option, the result is effectively squeezed. So if you don't specify that option, and you load up a CSV file with only one row, you will get a very differently shaped array than if you load up a CSV file with two rows. Ben Root On Tue, Nov 10, 2015 at 10:07 AM, Irvin Probst < irvin.probst at ensta-bretagne.fr> wrote: > On 10/11/2015 14:17, Sebastian Berg wrote: > >> Actually, it is the "sequence special case" type ;). (matlab does not >> have this, since matlab always returns 2-D I realized). >> >> As I said, if usecols is like indexing, the result should mimic: >> >> arr = np.loadtxt(f) >> arr = arr[usecols] >> >> in which case a 1-D array is returned if you put in a scalar into >> usecols (and you could even generalize usecols to higher dimensional >> array-likes). >> The way you implemented it -- which is fine, but I want to stress that >> there is a real decision being made here --, you always see it as a >> sequence but allow a scalar for convenience (i.e. always return a 2-D >> array). It is a `sequence of ints or int` type argument and not an >> array-like argument in my opinion. >> > > I think we have two separate problems here: > > The first one is whether loadtxt should always return a 2D array or should > it match the shape of the usecol argument. From a CS guy point of view I do > understand your concern here. Now from a teacher point of view I know many > people expect to get a "matrix" (thank you Matlab...) and the "purity" of > matching the dimension of the usecol variable will be seen by many people > [1] as a nerdy useless heavyness noone cares of (no offense). So whatever > you, seadoned numpy devs from this mailing list, decide I think it should > be explained in the docstring with a very clear wording. > > My own opinion on this first problem is that loadtxt() should always > return a 2D array, no less, no more. If I write np.loadtxt(f)[42] it means > I want to read the whole file and then I explicitely ask for transforming > the 2-D array loadtxt() returned into a 1-D array. 
Otoh if I write > loadtxt(f, usecol=42) it means I don't want to read the other columns and I > want only this one, but it does not mean that I want to change the returned > array from 2-D to 1-D. I know this new behavior might break a lot of > existing code as usecol=(42,) used to return a 1-D array, but > usecol=((((42,)))) also returns a 1-D array so the current behavior is not > consistent imho. > > The second problem is about the wording in the docstring, when I see > "sequence of int or int" I uderstand I will have to cast into a 1-D python > list whatever wicked N-dimensional object I use to store my column indexes, > or hope list(my_object) will do it fine. On the other hand when I read > "array-like" the function is telling me I don't have to worry about my > object, as long as numpy knows how to cast it into an array it will be fine. > > Anyway I think something like that: > > import numpy as np > a=[[[2,],[],[],],[],[],[]] > foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a) > > should just work and return me a 2-D (or 1-D if you like) array with the > data I asked for and I don't think "a" here is an int or a sequence of int > (but it's a good example of why loadtxt() should not match the shape of the > usecol argument). > > To make it short, let the reading function read the data in a consistent > and predictible way and then let the user explicitely change the data's > shape into anything he likes. > > Regards. > > [1] read non CS people trying to switch to numpy/scipy > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Tue Nov 10 10:52:52 2015 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Tue, 10 Nov 2015 16:52:52 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: <56420821.6010805@ensta-bretagne.fr> References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> <5641B7E9.2090802@ensta-bretagne.fr> <1447161452.2487.15.camel@sipsolutions.net> <56420821.6010805@ensta-bretagne.fr> Message-ID: On 10 November 2015 at 16:07, Irvin Probst wrote: > I know this new behavior might break a lot of existing code as > usecol=(42,) used to return a 1-D array, but usecol=((((42,)))) also > returns a 1-D array so the current behavior is not consistent imho. ((((42,)))) is exactly the same as (42,) If you want a tuple of tuples, you have to do ((42,),), but then it raises: TypeError: list indices must be integers, not tuple. What numpy cares about is that whatever object you give it is iterable, and its entries are ints, so usecol={0:'a', 5:'b'} is perfectly valid. I think loadtxt should be a tool to read text files in the least surprising fashion, and a text file is a 1 or 2D container, so it shouldn't return any other shapes. Any fancy stuff one may want to do with the output should be done with the typical indexing tricks. If I want a single column, I would first be very surprised if I got a 2D array (I was bitten by this design in MATLAB many many times). For the rare cases where I do want a "fake" 2D array, I can make it explicit by expanding it with arr[:, np.newaxis], and then I know that the shape will be (N, 1) and not (1, N). Thus, usecols should be int or sequence of ints, and the result 1 or 2D. 
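A tiny sketch of that last point (shapes only, nothing loadtxt-specific):

import numpy as np

col = np.arange(5.0)        # a single requested column comes back 1-D: shape (5,)
col[:, np.newaxis].shape    # (5, 1) -- an explicit column vector when one is wanted, never (1, 5)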
In your example: a=[[[2,],[],[],],[],[],[]] foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a) What would the shape of foo be? /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Nov 10 10:57:26 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 10 Nov 2015 16:57:26 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> <5641B7E9.2090802@ensta-bretagne.fr> <1447161452.2487.15.camel@sipsolutions.net> <56420821.6010805@ensta-bretagne.fr> Message-ID: <1447171046.2487.22.camel@sipsolutions.net> On Di, 2015-11-10 at 10:24 -0500, Benjamin Root wrote: > Just pointing out np.loadtxt(..., ndmin=2) will always return a 2D > array. Notice that without that option, the result is effectively > squeezed. So if you don't specify that option, and you load up a CSV > file with only one row, you will get a very differently shaped array > than if you load up a CSV file with two rows. > Oh, well I personally think that default squeeze is an abomination :). Anyway, I just wanted to point out that it is two different possible logics, and we have to pick one. I have a slight preference for the indexing/array-like interpretation, but I am aware that from a usage point of view the sequence one is likely better. I could throw in another option: Throw an explicit error instead of the general. Anyway, I *really* do not have an opinion about what is better. Array-like would only suggest that you also accept buffer interface objects or array_interface stuff. Which in this case is really unnecessary I think. - Sebastian > > Ben Root > > > On Tue, Nov 10, 2015 at 10:07 AM, Irvin Probst > wrote: > On 10/11/2015 14:17, Sebastian Berg wrote: > Actually, it is the "sequence special case" type ;). > (matlab does not > have this, since matlab always returns 2-D I > realized). > > As I said, if usecols is like indexing, the result > should mimic: > > arr = np.loadtxt(f) > arr = arr[usecols] > > in which case a 1-D array is returned if you put in a > scalar into > usecols (and you could even generalize usecols to > higher dimensional > array-likes). > The way you implemented it -- which is fine, but I > want to stress that > there is a real decision being made here --, you > always see it as a > sequence but allow a scalar for convenience (i.e. > always return a 2-D > array). It is a `sequence of ints or int` type > argument and not an > array-like argument in my opinion. > > I think we have two separate problems here: > > The first one is whether loadtxt should always return a 2D > array or should it match the shape of the usecol argument. > From a CS guy point of view I do understand your concern here. > Now from a teacher point of view I know many people expect to > get a "matrix" (thank you Matlab...) and the "purity" of > matching the dimension of the usecol variable will be seen by > many people [1] as a nerdy useless heavyness noone cares of > (no offense). So whatever you, seadoned numpy devs from this > mailing list, decide I think it should be explained in the > docstring with a very clear wording. > > My own opinion on this first problem is that loadtxt() should > always return a 2D array, no less, no more. If I write > np.loadtxt(f)[42] it means I want to read the whole file and > then I explicitely ask for transforming the 2-D array > loadtxt() returned into a 1-D array. 
Otoh if I write > loadtxt(f, usecol=42) it means I don't want to read the other > columns and I want only this one, but it does not mean that I > want to change the returned array from 2-D to 1-D. I know this > new behavior might break a lot of existing code as > usecol=(42,) used to return a 1-D array, but > usecol=((((42,)))) also returns a 1-D array so the current > behavior is not consistent imho. > > The second problem is about the wording in the docstring, when > I see "sequence of int or int" I uderstand I will have to cast > into a 1-D python list whatever wicked N-dimensional object I > use to store my column indexes, or hope list(my_object) will > do it fine. On the other hand when I read "array-like" the > function is telling me I don't have to worry about my object, > as long as numpy knows how to cast it into an array it will be > fine. > > Anyway I think something like that: > > import numpy as np > a=[[[2,],[],[],],[],[],[]] > foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a) > > should just work and return me a 2-D (or 1-D if you like) > array with the data I asked for and I don't think "a" here is > an int or a sequence of int (but it's a good example of why > loadtxt() should not match the shape of the usecol argument). > > To make it short, let the reading function read the data in a > consistent and predictible way and then let the user > explicitely change the data's shape into anything he likes. > > Regards. > > [1] read non CS people trying to switch to numpy/scipy > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From irvin.probst at ensta-bretagne.fr Tue Nov 10 11:39:05 2015 From: irvin.probst at ensta-bretagne.fr (Irvin Probst) Date: Tue, 10 Nov 2015 17:39:05 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> <5641B7E9.2090802@ensta-bretagne.fr> <1447161452.2487.15.camel@sipsolutions.net> <56420821.6010805@ensta-bretagne.fr> Message-ID: <56421DA9.8030308@ensta-bretagne.fr> On 10/11/2015 16:52, Da?id wrote: > ((((42,)))) is exactly the same as (42,) If you want a tuple of > tuples, you have to do ((42,),), but then it raises: TypeError: list > indices must be integers, not tuple. My bad, I wrote that too fast, please forget this. > I think loadtxt should be a tool to read text files in the least > surprising fashion, and a text file is a 1 or 2D container, so it > shouldn't return any other shapes. And I *do* agree with the "shouldn't return any other shapes" part of your phrase. What I was trying to say, admitedly with a very bogus example, is that either loadtxt() should always output an array whose shape matches the shape of the object passed to usecol or it should never do it, and I'm if favor of never. 
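For reference, a small sketch of the current behaviour under discussion (using a list of lines as a stand-in for a two-row, three-column text file):

import numpy as np

lines = ["1 2 3", "4 5 6"]

np.loadtxt(lines, usecols=(2,)).shape           # (2,)   -- a single column comes back squeezed to 1-D
np.loadtxt(lines, usecols=(2,), ndmin=2).shape  # (2, 1) -- kept 2-D, as Ben pointed out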
I'm perfectly aware that what I suggest would break the current behavior of usecols=(2,) so I know it does not have the slightest probability of being accepted but still, I think that the "least surprising fashion" is to always return an 2-D array because for many, many, many people a text data file has N lines and M columns and N=1 or M=1 is not a specific case. Anyway I will of course modify my PR according to any decision made here. In your example: > > a=[[[2,],[],[],],[],[],[]] > foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a) > > What would the shape of foo be? As I said in my previous email: > should just work and return me a 2-D (or 1-D if you like) array with the data I asked for So, 1-D or 2-D it is up to you, but as long as there is no ambiguity in which columns the user is asking for it should imho work. Regards. From pythondev1 at aerojockey.com Wed Nov 11 00:40:32 2015 From: pythondev1 at aerojockey.com (aerojockey) Date: Tue, 10 Nov 2015 22:40:32 -0700 (MST) Subject: [Numpy-discussion] Question about structure arrays In-Reply-To: References: <1446931102879-41653.post@n7.nabble.com> Message-ID: <1447220432593-41676.post@n7.nabble.com> Nathaniel Smith wrote > On Sat, Nov 7, 2015 at 1:18 PM, aerojockey < > pythondev1@ > > wrote: >> Hello, >> >> Recently I made some changes to a program I'm working on, and found that >> the >> changes made it four times slower than before. After some digging, I >> found >> out that one of the new costs was that I added structure arrays. Inside >> a >> low-level loop, I create a structure array, populate it Python, then turn >> it >> over to some handwritten C code for processing. It turned out that, when >> passed a structure array as a dtype, numpy has to parse the dtype, which >> included calls to re.match and eval. >> >> Now, this is not a big deal for me to work around by using ordinary >> slicing >> and such, and also I can improve things by reusing arrays. Since this is >> inner loop stuff, sacrificing readability for speed is an appropriate >> tradeoff. >> >> Nevertheless, I was curious if there was a way (or any plans for there to >> be >> a way) to compile a struture array dtype. I realize it's not the >> bread-and-butter of numpy, but it turned out to be a very convenient >> feature >> for my use case (populating an array of structures to pass off to C). > > Does it help to turn your dtype string into a dtype object and then > pass the dtype object around? E.g. > > In [1]: dt = np.dtype("i4,i4") > > In [2]: np.zeros(2, dtype=dt) > Out[2]: > array([(0, 0), (0, 0)], > dtype=[('f0', '<i4'), ('f1', '<i4')]) > > -n I actually don't know, since I removed the structure array part about ten minutes after I posted. However, I did a quick test of your suggestion, and indeed numpy calls exec and re.match only when creating the dtype object, not when creating the array. So certainly it would have helped. I wasn't actually aware you could do that with dtypes. In fact, I was only vaguely that there were dtype types at all. Thanks for the suggestion. Carl Banks -- View this message in context: http://numpy-discussion.10968.n7.nabble.com/Question-about-structure-arrays-tp41653p41676.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
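A minimal sketch of the pattern discussed above -- build the dtype object once and reuse it inside the hot loop (the field names and types here are made up for illustration):

import numpy as np

# Parsing a dtype string (e.g. "f8,f8,i4") is what calls re.match/eval; building
# the dtype object once -- from a string or, as here, a field list -- means that
# cost is not paid on every array allocation inside the loop.
point_dt = np.dtype([('x', np.float64), ('y', np.float64), ('id', np.int32)])

def make_batch(n):
    # reuse the pre-built dtype object for every array created in the loop
    rec = np.zeros(n, dtype=point_dt)
    rec['x'] = np.arange(n, dtype=np.float64)
    rec['y'] = rec['x'] ** 2
    rec['id'] = np.arange(n, dtype=np.int32)
    return rec

batch = make_batch(1024)   # e.g. hand this off to the C routine for processing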
From sebastian at sipsolutions.net Wed Nov 11 05:02:50 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 11 Nov 2015 11:02:50 +0100 Subject: [Numpy-discussion] Indexing NEP draft Message-ID: <1447236170.2487.43.camel@sipsolutions.net> Hi all, at scipy discussing with Nathaniel and others, we thought that maybe we can push for orthogonal type indexing into numpy. Now with the new version out and some other discussions done, I thought it is time to pick it up :). The basic ideas are twofold. First make indexing easier and less confusing for starters (and advanced users also), and second improve interoperability with projects such as xray for whom orthogonal/outer type indexing makes more sense. I have started working on: 1. A preliminary draft of an NEP you can view at https://github.com/numpy/numpy/pull/6256/files?short_path=01e4dd9#diff-01e4dd9d2ecf994b24e5883f98f789e6 or at the end of this mail. 2. A preliminary implementation of `oindex` attribute with orthogonal/outer style indexing in https://github.com/numpy/numpy/pull/6075 which you can try out by cloning numpy and then running from the source dir: git fetch upstream pull/6075/head:pr-6075 && git checkout pr-6075; python runtests.py --ipython This will fetch my PR, switch to the branch and open an interactive ipython shell where you will be able to do arr.oindex[]. Note that I consider the NEP quite preliminary in many parts, and it may still be very confusing unless you are well versed with current advanced indexing. There are some longer examples comparing the different styles and another "example" which tries to show a "use case" example going from simpler to more complex indexing operations. Any comments are very welcome, and if it is "I don't understand a word" :). I know it is probably too short and, at least without examples, not easy to understand. Best, Sebastian ================================================================================== The current NEP draft: ========================================================== Implementing intuitive and full featured advanced indexing ========================================================== :Author: Sebastian Berg :Date: 2015-08-27 :Status: draft Executive summary ================= Advanced indexing with multiple array indices is typically confusing to both new, and in many cases even old, users of NumPy. To avoid this problem and allow for more and clearer features, we propose to: 1. Introduce ``arr.oindex[indices]`` which allows advanced indices, but uses outer indexing logic. 2. Introduce ``arr.vindex[indices]`` which use the current "vectorized"/broadcasted logic but with two differences from fancy indexing: 1. Boolean indices always use the outer indexing logic. (Multi dimensional booleans should be allowed). 2. The integer index result dimensions are always the first axes of the result array. No transpose is done, even for a single integer array index. 3. Vanilla indexing on the array will only give warnings and eventually errors either: * when there is ambiguity between legacy fancy and outer indexing (note that ``arr[[1, 2], :, 0]`` is such a case, an integer can be the "second" integer index array), * when any integer index array is present (possibly additional for more then one boolean index array). These constraints are sufficient for making indexing generally consistent with expectations and providing a less surprising learning curve with ``oindex``. Note that all things mentioned here apply both for assignment as well as subscription. 
Understanding these details is *not* easy. The `Examples` section gives code examples. And the hopefully easier `Motivational Example` provides some motivational use-cases for the general ideas and is likely a good start for anyone not intimately familiar with advanced indexing. Motivation ========== Old style advanced indexing with multiple array (boolean or integer) indices, also called "fancy indexing", tends to be very confusing for new users. While fancy (or legacy) indexing is useful in many cases one would naively assume that the result of multiple 1-d ranges is analogous to multiple slices along each dimension (also called "outer indexing"). However, legacy fancy indexing with multiple arrays broadcasts these arrays into a single index over multiple dimensions. There are three main points of confusion when multiple array indices are involved: 1. Most new users will usually expect outer indexing (consistent with slicing). This is also the most common way of handling this in other packages or languages. 2. The axes introduced by the array indices are at the front, unless all array indices are consecutive, in which case one can deduce where the user "expects" them to be: * `arr[:, [0, 1], :, [0, 1]]` will have the first dimension shaped 2. * `arr[:, [0, 1], [0, 1]]` will have the second dimension shaped 2. 3. When a boolean array index is mixed with another boolean or integer array, the result is very hard to understand (the boolean array is converted to integer array indices and then broadcast), and hardly useful. There is no well defined broadcast for booleans, so that boolean indices are logically always "``outer``" type indices. Proposed rules ============== From the three problems noted above some expectations for NumPy can be deduced: 1. There should be a prominent outer/orthogonal indexing method such as ``arr.oindex[indices]``. 2. Considering how confusing fancy indexing can be, it should only occur explicitly (e.g. ``arr.vindex[indices]``) 3. A new ``arr.vindex[indices]`` method, would not be tied to the confusing transpose rules of fancy indexing (which is for example needed for the simple case of a single advanced index). Thus, it no transposing should be done. The axes of the advanced indices are always inserted at the front, even for a single index. 4. Boolean indexing is conceptionally outer indexing. A broadcasting together with other advanced indices in the manner of legacy "fancy indexing" is generally not helpful or well defined. A user who wishes the "``nonzero``" plus broadcast behaviour can thus be expected to do this manually. Using this rule, a single boolean index can index into multiple dimensions at once. 5. An ``arr.lindex`` or ``arr.findex`` should likely be implemented to allow legacy fancy indexing indefinetly. This also gives a simple way to update fancy indexing code making deprecations to vanilla indexing easier. 6. Vanilla indexing ``arr[...]`` could return an error for ambiguous cases. For the beginning, this probably means cases where ``arr[ind]`` and ``arr.oindex[ind]`` return different results gives deprecation warnings. However, the exact rules for this (especially the final behaviour) are not quite clear in cases such as ``arr[0, :, index_arr]``. All other rules for indexing are identical. Open Questions ============== 1. Especially for the new indexing attributes ``oindex`` and ``vindex``, a case could be made to not implicitly add an ``Ellipsis`` index if necessary. This helps finding bugs since a too high dimensional array can be caught. 
(I am in favor for this, but doubt we should think about this for vanilla indexing.) 2. The names ``oindex`` and ``vindex`` are just suggestions at the time of writing this, another name NumPy has used for something like ``oindex`` is ``np.ix_``. See also below. 3. It would be possible to limit the use of boolean indices in ``vindex``, assuming that they are rare and to some degree special. (This would make implementation simpler, but I do not see a big reason.) 4. ``oindex`` and ``vindex`` could always return copies, even when no array operation occurs. One argument for using the same rules is that this way ``oindex`` can be used as a general index replacement. (There is likely no big reason for this, however, there is one reason: ``arr.vindex[array_scalar, ...]`` can occur, where ``arr_scalar`` should be a 0-D array. Copying always "fixes" the possible inconsistency.) 5. The final state to morph indexing in is not fixed in this PEP. It is for example possible that `arr[index]`` will be equivalent to ``arr.oindex`` at some point in the future. Since such a change will take years, it seems unnecessary to make specific decisions now. 6. Proposed changes to vanilla indexing could be postponed indefinetly or not taken in order to not break or force fixing of existing code bases. 7. Possible the ``vindex`` combination with boolean indexing could be rethought or not allowed at all for simplicity. Necessary changes to NumPy ========================== Implement ``arr.oindex`` and ``arr.vindex`` objects to allow these indexing operations and create warnings (and eventually deprecate) ambiguous direct indexing operations on arrays. Alternative Names ================= Possible names suggested (more suggestions will be added). ============== ======== ======= **Orthogonal** oindex oix **Vectorized** vindex fix **Legacy** l/findex ============== ======== ======= Examples ======== Since the various kinds of indexing is hard to grasp in many cases, these examples hopefully give some more insights. Note that they are all in terms of shape. All original dimensions start with 5, advanced indexing inserts less long dimensions. (Note that ``...`` or ``Ellipsis`` mostly inserts as many slices as needed to index the full array). These examples may be hard to grasp without working knowledge of advanced indexing as of NumPy 1.9. Example array:: >>> arr = np.ones((5, 6, 7, 8)) Legacy fancy indexing --------------------- Single index is transposed (this is the same for all indexing types):: >>> arr[[0], ...].shape (1, 6, 7, 8) >>> arr[:, [0], ...].shape (5, 1, 7, 8) Multiple indices are transposed *if* consecutive:: >>> arr[:, [0], [0], :].shape # future error (5, 1, 7) >>> arr[:, [0], :, [0]].shape # future error (1, 5, 6) It is important to note that a scalar *is* integer array index in this sense (and gets broadcasted with the other advanced index):: >>> arr[:, [0], 0, :].shape # future error (scalar is "fancy") (5, 1, 7) >>> arr[:, [0], :, 0].shape # future error (scalar is "fancy") (1, 5, 6) Single boolean index can act on multiple dimensions (especially the whole array). It has to match (as of 1.10. a deprecation warning) the dimensions. 
The boolean index is otherwise identical to (multiple consecutive) integer array indices:: >>> # Create boolean index with one True value for the last two dimensions: >>> bindx = np.zeros((7, 8), dtype=np.bool_) >>> bindx[[0, 0]] = True >>> arr[:, 0, bindx].shape (5, 1) >>> arr[0, :, bindx].shape (1, 6) The combination with anything that is not a scalar is confusing, e.g.:: >>> arr[[0], :, bindx].shape # bindx result broadcasts with [0] (1, 6) >>> arr[:, [0, 1], bindx] # IndexError Outer indexing -------------- Multiple indices are "orthogonal" and their result axes are inserted at the same place (they are not broadcasted):: >>> arr.oindex[:, [0], [0, 1], :].shape (5, 1, 2, 8) >>> arr.oindex[:, [0], :, [0, 1]].shape (5, 1, 7, 2) >>> arr.oindex[:, [0], 0, :].shape (5, 1, 8) >>> arr.oindex[:, [0], :, 0].shape (5, 1, 7) Boolean indices results are always inserted where the index is:: >>> # Create boolean index with one True value for the last two dimensions: >>> bindx = np.zeros((7, 8), dtype=np.bool_) >>> bindx[[0, 0]] = True >>> arr.oindex[:, 0, bindx].shape (5, 1) >>> arr.oindex[0, :, bindx].shape (6, 1) Nothing changed in the presence of other advanced indices since:: >>> arr.oindex[[0], :, bindx].shape (1, 6, 1) >>> arr.oindex[:, [0, 1], bindx] (5, 2, 1) Vectorized/inner indexing ------------------------- Multiple indices are broadcasted and iterated as one like fancy indexing, but the new axes area always inserted at the front:: >>> arr.vindex[:, [0], [0, 1], :].shape (2, 5, 8) >>> arr.vindex[:, [0], :, [0, 1]].shape (2, 5, 7) >>> arr.vindex[:, [0], 0, :].shape (1, 5, 8) >>> arr.vindex[:, [0], :, 0].shape (1, 5, 7) Boolean indices results are always inserted where the index is, exactly as in ``oindex`` given how specific they are to the axes they operate on:: >>> # Create boolean index with one True value for the last two dimensions: >>> bindx = np.zeros((7, 8), dtype=np.bool_) >>> bindx[[0, 0]] = True >>> arr.vindex[:, 0, bindx].shape (5, 1) >>> arr.vindex[0, :, bindx].shape (6, 1) But other advanced indices are again transposed to the front:: >>> arr.vindex[[0], :, bindx].shape (1, 6, 1) >>> arr.vindex[:, [0, 1], bindx] (2, 5, 1) Related Questions ================= There exist a further indexing or indexing like method. That is the inverse of a command such as ``np.argmin(arr, axis=axis)``, to pick the specific elements *along* an axis given an array of (at least typically) the same size. Doing such a thing with the indexing notation is not quite straight forward since the axis on which to pick elements has to be supplied. One plausible solution would be to create a function (calling it pick here for simplicity):: np.pick(arr, index_arr, axis=axis) where ``index_arr`` has to be the same shape as ``arr`` except along ``axis``. One could imagine that this can be useful together with other indexing types, but such a function may be sufficient and extra information needed seems easier to pass using a function convention. Another option would be to allow an argument such as ``compress_axes=None`` (just to have some name) which maps the axes from the index array to the new array with ``None`` signaling a new axis. Also keepdims could be added as a simple default. (Note that the use of axis is not compatible to ``np.take`` for an ``index_arr`` which is not zero or one dimensional.) Another solution is to provide functions or features to the ``arg*``functions to map this to the equivalent ``vindex`` indexing operation. 
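For orientation, a small sketch of what such a ``pick`` amounts to today with plain fancy indexing, in the simple 2-D, ``axis=1`` case (a sketch only, not a proposed API)::

    >>> arr = np.arange(12).reshape(4, 3)
    >>> index_arr = np.argmin(arr, axis=1)
    >>> picked = arr[np.arange(arr.shape[0]), index_arr]
    >>> np.array_equal(picked, arr.min(axis=1))
    True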
Motivational Example ==================== Imagine having a data acquisition software storing ``D`` channels and ``N`` datapoints along the time. She stores this into an ``(N, D)`` shaped array. During data analysis, we needs to fetch a pool of channels, for example to calculate a mean over them. This data can be faked using:: >>> arr = np.random.random((100, 10)) Now one may remember indexing with an integer array and find the correct code:: >>> group = arr[:, [2, 5]] >>> mean_value = arr.mean() However, assume that there were some specific time points (first dimension of the data) that need to be specially considered. These time points are already known and given by:: >>> interesting_times = np.array([1, 5, 8, 10], dtype=np.intp) Now to fetch them, we may try to modify the previous code:: >>> group_at_it = arr[interesting_times, [2, 5]] IndexError: Ambiguous index, use `.oindex` or `.vindex` An error such as this will point to read up the indexing documentation. This should make it clear, that ``oindex`` behaves more like slicing. So, out of the different methods it is the obvious choice (for now, this is a shape mismatch, but that could possibly also mention ``oindex``):: >>> group_at_it = arr.oindex[interesting_times, [2, 5]] Now of course one could also have used ``vindex``, but it is much less obvious how to achieve the right thing!:: >>> reshaped_times = interesting_times[:, np.newaxis] >>> group_at_it = arr.vindex[reshaped_times, [2, 5]] One may find, that for example our data is corrupt in some places. So, we need to replace these values by zero (or anything else) for these times. The first column may for example give the necessary information, so that changing the values becomes easy remembering boolean indexing:: >>> bad_data = arr[0] > 0.5 >>> arr[bad_data, :] = 0 Again, however, the columns may need to be handled more individually (but in groups), and the ``oindex`` attribute works well:: >>> arr.oindex[bad_data, [2, 5]] = 0 Note that it would be very hard to do this using legacy fancy indexing. The only way would be to create an integer array first:: >>> bad_data_indx = np.nonzero(bad_data)[0] >>> bad_data_indx_reshaped = bad_data_indx[:, np.newaxis] >>> arr[bad_data_indx_reshaped, [2, 5]] In any case we can use only ``oindex`` to do all of this without getting into any trouble or confused by the whole complexity of advanced indexing. But, some new features are added to the data acquisition. Different sensors have to be used depending on the times. Let us assume we already have created an array of indices:: >>> correct_sensors = np.random.randint(10, size=(100, 2)) Which lists for each time the two correct sensors in an ``(N, 2)`` array. A first try to achieve this may be ``arr[:, correct_sensors]`` and this does not work. It should be clear quickly that slicing cannot achieve the desired thing. But hopefully users will remember that there is ``vindex`` as a more powerful and flexible approach to advanced indexing. One may, if trying ``vindex`` randomly, be confused about:: >>> new_arr = arr.vindex[:, correct_sensors] which is neither the same, nor the correct result (see transposing rules)! This is because slicing works still the same in ``vindex``. 
However, reading the documentation and examples, one can hopefully quickly find the desired solution:: >>> rows = np.arange(len(arr)) >>> rows = rows[:, np.newaxis] # make shape fit with correct_sensors >>> new_arr = arr.vindex[rows, correct_sensors] At this point we have left the straight forward world of ``oindex`` but can do random picking of any element from the array. Note that in the last example a method such as mentioned in the ``Related Questions`` section could be more straight forward. But this approach is even more flexible, since ``rows`` does not have to be a simple ``arange``, but could be ``intersting_times``:: >>> correct_sensors_at_it = correct_sensors[interesting_times, :] >>> interesting_times_reshaped = interesting_times[:, np.newaxis] >>> new_arr_it = arr[interesting_times_reshaped, correct_sensors_at_it] Truly complex situation would arise now if you would for example pool ``L`` experiments into an array shaped ``(L, N, D)``. But for ``oindex`` this should not result into surprises. ``vindex``, being more powerful, will quite certainly create some confusion in this case but also cover pretty much all eventualities. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Wed Nov 11 12:38:50 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 11 Nov 2015 18:38:50 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: <56421DA9.8030308@ensta-bretagne.fr> References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> <5641B7E9.2090802@ensta-bretagne.fr> <1447161452.2487.15.camel@sipsolutions.net> <56420821.6010805@ensta-bretagne.fr> <56421DA9.8030308@ensta-bretagne.fr> Message-ID: <1447263530.2487.54.camel@sipsolutions.net> On Di, 2015-11-10 at 17:39 +0100, Irvin Probst wrote: > On 10/11/2015 16:52, Da?id wrote: > > ((((42,)))) is exactly the same as (42,) If you want a tuple of > > tuples, you have to do ((42,),), but then it raises: TypeError: list > > indices must be integers, not tuple. > > My bad, I wrote that too fast, please forget this. > > > I think loadtxt should be a tool to read text files in the least > > surprising fashion, and a text file is a 1 or 2D container, so it > > shouldn't return any other shapes. > > And I *do* agree with the "shouldn't return any other shapes" part of > your phrase. What I was trying to say, admitedly with a very bogus > example, is that either loadtxt() should always output an array whose > shape matches the shape of the object passed to usecol or it should > never do it, and I'm if favor of never. Sounds fine to me, and considering the squeeze logic (which I think is unfortunate, but it is not something you can easily change), I would be for simply adding logic to accept a single integral argument and otherwise not change anything. I am personally against the flattening and even the array-like logic [1] currently in the PR, it seems like arbitrary generality for my taste without any obvious application. As said before, the other/additional thing that might be very helpful is trying to give a more useful error message. - Sebastian [1] Almost all 1-d array-likes will be sequences/iterables in any case, those that are not are so obscure that there is no point in explicitly supporting them. 
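A rough sketch of the kind of argument handling meant here (a hypothetical helper for illustration, not the code in the PR):

import operator

def _normalize_usecols(usecols):
    # accept a single integer for convenience, otherwise insist on a
    # sequence of integers and fail with a clear message
    if usecols is None:
        return None
    try:
        return [operator.index(usecols)]
    except TypeError:
        pass
    try:
        return [operator.index(col) for col in usecols]
    except TypeError:
        raise TypeError(
            "usecols must be an int or a sequence of ints, got %r" % (usecols,))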
> I'm perfectly aware that what I suggest would break the current behavior > of usecols=(2,) so I know it does not have the slightest probability of > being accepted but still, I think that the "least surprising fashion" is > to always return an 2-D array because for many, many, many people a text > data file has N lines and M columns and N=1 or M=1 is not a specific case. > > Anyway I will of course modify my PR according to any decision made here. > > In your example: > > > > a=[[[2,],[],[],],[],[],[]] > > foo=np.loadtxt("CONCARNEAU_2010.txt", usecols=a) > > > > What would the shape of foo be? > > As I said in my previous email: > > > should just work and return me a 2-D (or 1-D if you like) array with > the data I asked for > > So, 1-D or 2-D it is up to you, but as long as there is no ambiguity in > which columns the user is asking for it should imho work. > > Regards. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Thu Nov 12 16:11:14 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 12 Nov 2015 14:11:14 -0700 Subject: [Numpy-discussion] Numpy 1.10.2rc1 Message-ID: Hi All, I am pleased to announce the release of Numpy 1.10.2rc1. This release should fix the problems exposed in 1.10.1, which is not to say there are no remaining problems. Please test this thoroughly, exspecially if you experienced problems with 1.10.1. Julian Taylor has opened an issue relating to cblas detection on Debian (and probably Debian derived distributions) that is not dealt with in this release. Hopefully a solution will be available before the final. To all who reported issues with 1.10.1 and to those who helped close them, a big thank you. Source and binary files may be found on Sourceforge . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From irvin.probst at ensta-bretagne.fr Fri Nov 13 05:51:54 2015 From: irvin.probst at ensta-bretagne.fr (Irvin Probst) Date: Fri, 13 Nov 2015 11:51:54 +0100 Subject: [Numpy-discussion] loadtxt and usecols In-Reply-To: <1447263530.2487.54.camel@sipsolutions.net> References: <56406418.1010500@ensta-bretagne.fr> <1447143573.2487.9.camel@sipsolutions.net> <5641B7E9.2090802@ensta-bretagne.fr> <1447161452.2487.15.camel@sipsolutions.net> <56420821.6010805@ensta-bretagne.fr> <56421DA9.8030308@ensta-bretagne.fr> <1447263530.2487.54.camel@sipsolutions.net> Message-ID: <5645C0CA.1060506@ensta-bretagne.fr> On 11/11/2015 18:38, Sebastian Berg wrote: > > Sounds fine to me, and considering the squeeze logic (which I think is > unfortunate, but it is not something you can easily change), I would be > for simply adding logic to accept a single integral argument and > otherwise not change anything. > [...] > > As said before, the other/additional thing that might be very helpful is > trying to give a more useful error message. > I've modified my PR to (hopefully) match these requests. https://github.com/numpy/numpy/pull/6656 Regards. 
-- Irvin From charlesr.harris at gmail.com Fri Nov 13 13:06:37 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 13 Nov 2015 11:06:37 -0700 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi Message-ID: Hi All, I think 1.10.0 and 1.10.1 are sufficiently buggy that they should be removed from circulation as soon as 1.10.2 comes out. The inner product bug for non contiguous arrays is particularly egregious. It is not customary to remove outdated Numpy releases from sourceforge and pypi, but I'd like to make an exception for those two. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Nov 13 13:21:51 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 13 Nov 2015 10:21:51 -0800 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi In-Reply-To: References: Message-ID: Hi, On Fri, Nov 13, 2015 at 10:06 AM, Charles R Harris wrote: > Hi All, > > I think 1.10.0 and 1.10.1 are sufficiently buggy that they should be removed > from circulation as soon as 1.10.2 comes out. The inner product bug for non > contiguous arrays is particularly egregious. It is not customary to remove > outdated Numpy releases from sourceforge and pypi, but I'd like to make an > exception for those two. First pass, that doesn't seem like a good idea to me. No-one will get these releases unless they ask for them specifically, once 1.10.2 is out. Imagine for example that someone does have one of these versions already and is making a bug report, we might want to test against those versions to replicate the bug. Cheers, Matthew From manolo at austrohungaro.com Fri Nov 13 13:22:32 2015 From: manolo at austrohungaro.com (Manolo =?iso-8859-1?Q?Mart=EDnez?=) Date: Fri, 13 Nov 2015 19:22:32 +0100 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi In-Reply-To: References: Message-ID: <20151113182232.GA6748@beagle.localdomain> On 11/13/15 at 11:06am, Charles R Harris wrote: > The inner product bug > for non contiguous arrays is particularly egregious. Could you please post a link to the related issue? I have been seeing very strange things with scipy.integrate.odeint, and I wonder if they are related. Thanks, Manolo From charlesr.harris at gmail.com Fri Nov 13 14:16:09 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 13 Nov 2015 12:16:09 -0700 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi In-Reply-To: <20151113182232.GA6748@beagle.localdomain> References: <20151113182232.GA6748@beagle.localdomain> Message-ID: On Fri, Nov 13, 2015 at 11:22 AM, Manolo Mart?nez wrote: > On 11/13/15 at 11:06am, Charles R Harris wrote: > > The inner product bug > > for non contiguous arrays is particularly egregious. > > Could you please post a link to the related issue? I have been seeing > very strange things with scipy.integrate.odeint, and I wonder if they > are related. > Here you go: https://github.com/numpy/numpy/issues/6532 . Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From manolo at austrohungaro.com Fri Nov 13 14:26:01 2015 From: manolo at austrohungaro.com (Manolo =?iso-8859-1?Q?Mart=EDnez?=) Date: Fri, 13 Nov 2015 20:26:01 +0100 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi In-Reply-To: References: <20151113182232.GA6748@beagle.localdomain> Message-ID: <20151113192601.GA8191@beagle.localdomain> On 11/13/15 at 12:16pm, Charles R Harris wrote: > On Fri, Nov 13, 2015 at 11:22 AM, Manolo Mart?nez > wrote: > > > On 11/13/15 at 11:06am, Charles R Harris wrote: > > > The inner product bug > > > for non contiguous arrays is particularly egregious. > > > > Could you please post a link to the related issue? I have been seeing > > very strange things with scipy.integrate.odeint, and I wonder if they > > are related. > > > > Here you go: https://github.com/numpy/numpy/issues/6532 > . > Thanks! M > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -- From njs at pobox.com Fri Nov 13 14:50:23 2015 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 13 Nov 2015 11:50:23 -0800 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi In-Reply-To: References: Message-ID: On Nov 13, 2015 10:06 AM, "Charles R Harris" wrote: > > Hi All, > > I think 1.10.0 and 1.10.1 are sufficiently buggy that they should be removed from circulation as soon as 1.10.2 comes out. The inner product bug for non contiguous arrays is particularly egregious. It is not customary to remove outdated Numpy releases from sourceforge and pypi, but I'd like to make an exception for those two. > Can you elaborate on what you're trying to accomplish? Like Matthew says, they'll effectively be removed from circulation once 1.10.2 is released, regardless of whether we actually delete the files. But deleting the files does make it difficult to do legitimate things. (Example: rerunning an analysis with both 1.10.1 and 1.10.2 to check whether some published results were affected by one of the bugs in 1.10.1.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Nov 13 15:04:36 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 13 Nov 2015 13:04:36 -0700 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi In-Reply-To: References: Message-ID: On Fri, Nov 13, 2015 at 12:50 PM, Nathaniel Smith wrote: > On Nov 13, 2015 10:06 AM, "Charles R Harris" > wrote: > > > > Hi All, > > > > I think 1.10.0 and 1.10.1 are sufficiently buggy that they should be > removed from circulation as soon as 1.10.2 comes out. The inner product bug > for non contiguous arrays is particularly egregious. It is not customary to > remove outdated Numpy releases from sourceforge and pypi, but I'd like to > make an exception for those two. > > > > Can you elaborate on what you're trying to accomplish? Like Matthew says, > they'll effectively be removed from circulation once 1.10.2 is released, > regardless of whether we actually delete the files. But deleting the files > does make it difficult to do legitimate things. (Example: rerunning an > analysis with both 1.10.1 and 1.10.2 to check whether some published > results were affected by one of the bugs in 1.10.1.) > Basically, they should never be used, but they will always be tagged in the repo. That said, if the consensus is to leave them up I won't be bothered much. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Nov 14 04:21:36 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 14 Nov 2015 10:21:36 +0100 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi In-Reply-To: References: Message-ID: On Fri, Nov 13, 2015 at 9:04 PM, Charles R Harris wrote: > > > On Fri, Nov 13, 2015 at 12:50 PM, Nathaniel Smith wrote: > >> On Nov 13, 2015 10:06 AM, "Charles R Harris" >> wrote: >> > >> > Hi All, >> > >> > I think 1.10.0 and 1.10.1 are sufficiently buggy that they should be >> removed from circulation as soon as 1.10.2 comes out. The inner product bug >> for non contiguous arrays is particularly egregious. It is not customary to >> remove outdated Numpy releases from sourceforge and pypi, but I'd like to >> make an exception for those two. >> > >> >> Can you elaborate on what you're trying to accomplish? Like Matthew says, >> they'll effectively be removed from circulation once 1.10.2 is released, >> regardless of whether we actually delete the files. But deleting the files >> does make it difficult to do legitimate things. (Example: rerunning an >> analysis with both 1.10.1 and 1.10.2 to check whether some published >> results were affected by one of the bugs in 1.10.1.) >> > > Basically, they should never be used, but they will always be tagged in > the repo. That said, if the consensus is to leave them up I won't be > bothered much. > I'd also vote for leaving them up. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From tcaswell at gmail.com Sat Nov 14 18:34:38 2015 From: tcaswell at gmail.com (Thomas Caswell) Date: Sat, 14 Nov 2015 23:34:38 +0000 Subject: [Numpy-discussion] Removiing 1.10.0 and 1.10.1 from sourceforge and pypi In-Reply-To: References: Message-ID: I would also vote for leaving them up. On Sat, Nov 14, 2015 at 4:21 AM Ralf Gommers wrote: > On Fri, Nov 13, 2015 at 9:04 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Nov 13, 2015 at 12:50 PM, Nathaniel Smith wrote: >> >>> On Nov 13, 2015 10:06 AM, "Charles R Harris" >>> wrote: >>> > >>> > Hi All, >>> > >>> > I think 1.10.0 and 1.10.1 are sufficiently buggy that they should be >>> removed from circulation as soon as 1.10.2 comes out. The inner product bug >>> for non contiguous arrays is particularly egregious. It is not customary to >>> remove outdated Numpy releases from sourceforge and pypi, but I'd like to >>> make an exception for those two. >>> > >>> >>> Can you elaborate on what you're trying to accomplish? Like Matthew >>> says, they'll effectively be removed from circulation once 1.10.2 is >>> released, regardless of whether we actually delete the files. But deleting >>> the files does make it difficult to do legitimate things. (Example: >>> rerunning an analysis with both 1.10.1 and 1.10.2 to check whether some >>> published results were affected by one of the bugs in 1.10.1.) >>> >> >> Basically, they should never be used, but they will always be tagged in >> the repo. That said, if the consensus is to leave them up I won't be >> bothered much. >> > > I'd also vote for leaving them up. > > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From faltet at gmail.com Mon Nov 16 08:02:01 2015 From: faltet at gmail.com (Francesc Alted) Date: Mon, 16 Nov 2015 14:02:01 +0100 Subject: [Numpy-discussion] ANN: bcolz 0.12.0 released Message-ID: ======================= Announcing bcolz 0.12.0 ======================= What's new ========== This release copes with some compatibility issues with NumPy 1.10. Also, several improvements have happened in the installation procedure, allowing for a smoother process. Last but not least, the tutorials haven been migrated to the IPython notebook format (a huge thank you to Francesc Elies for this!). This will hopefully will allow users to better exercise the different features of bcolz. For a more detailed change log, see: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst What it is ========== *bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Together, bcolz and the Blosc compressor, are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/) the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel) which you can read more about by pointing your browser at the links below. 
* Visualfabriq: * *bquery*, A query and aggregation framework for Bcolz: * https://github.com/visualfabriq/bquery * Blaze: * Notebooks showing Blaze + Pandas + BColz interaction: * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb * Quantopian: * Using compressed data containers for faster backtesting at scale: * https://quantopian.github.io/talks/NeedForSpeed/slides.html * Scikit-Allel * Provides an alternative backend to work with compressed arrays * https://scikit-allel.readthedocs.org/en/latest/model/bcolz.html Installing ========== bcolz is in the PyPI repository, so installing it is easy:: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst ---- **Enjoy data!** -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Nov 17 10:48:44 2015 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 17 Nov 2015 10:48:44 -0500 Subject: [Numpy-discussion] reshaping array question Message-ID: I have an array of shape (7, 24, 2, 1024) I'd like an array of (7, 24, 2048) such that the elements on the last dimension are interleaving the elements from the 3rd dimension [0,0,0,0] -> [0,0,0] [0,0,1,0] -> [0,0,1] [0,0,0,1] -> [0,0,2] [0,0,1,1] -> [0,0,3] ... What might be the simplest way to do this? ------------ A different question, suppose I just want to stack them [0,0,0,0] -> [0,0,0] [0,0,0,1] -> [0,0,1] [0,0,0,2] -> [0,0,2] ... [0,0,1,0] -> [0,0,1024] [0,0,1,1] -> [0,0,1025] [0,0,1,2] -> [0,0,1026] ... From robert.kern at gmail.com Tue Nov 17 11:20:25 2015 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2015 16:20:25 +0000 Subject: [Numpy-discussion] reshaping array question In-Reply-To: References: Message-ID: On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker wrote: > > I have an array of shape > (7, 24, 2, 1024) > > I'd like an array of > (7, 24, 2048) > > such that the elements on the last dimension are interleaving the elements > from the 3rd dimension > > [0,0,0,0] -> [0,0,0] > [0,0,1,0] -> [0,0,1] > [0,0,0,1] -> [0,0,2] > [0,0,1,1] -> [0,0,3] > ... > > What might be the simplest way to do this? np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,)) > ------------ > A different question, suppose I just want to stack them > > [0,0,0,0] -> [0,0,0] > [0,0,0,1] -> [0,0,1] > [0,0,0,2] -> [0,0,2] > ... > [0,0,1,0] -> [0,0,1024] > [0,0,1,1] -> [0,0,1025] > [0,0,1,2] -> [0,0,1026] > ... A.reshape(A.shape[:-2] + (-1,)) -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
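For reference, here is a minimal sketch (not from the thread) of both reshapes asked about above, run on a small made-up array so the result can be checked by eye. It uses np.swapaxes followed by a reshape for the interleaved case; np.transpose would need a full axes tuple such as (0, 1, 3, 2), which is what the follow-ups below end up clarifying.

```python
import numpy as np

# Small stand-in for the (7, 24, 2, 1024) array from the question.
A = np.arange(2 * 3 * 2 * 4).reshape(2, 3, 2, 4)

# Interleave the last two axes: swap them, then collapse into one axis.
interleaved = np.swapaxes(A, -2, -1).reshape(A.shape[:-2] + (-1,))

# Plain stacking needs no swap; a reshape alone lays the trailing blocks
# end to end.
stacked = A.reshape(A.shape[:-2] + (-1,))

print(interleaved.shape, stacked.shape)  # (2, 3, 8) (2, 3, 8)
print(interleaved[0, 0])                 # [0 4 1 5 2 6 3 7]
print(stacked[0, 0])                     # [0 1 2 3 4 5 6 7]
```

Note that the swapped array is no longer contiguous, so the reshape in the interleaved case makes a copy, while the stacked case is a plain view.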
URL: From sebastian at sipsolutions.net Tue Nov 17 11:23:45 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 17 Nov 2015 17:23:45 +0100 Subject: [Numpy-discussion] reshaping array question In-Reply-To: References: Message-ID: <1447777425.2734.1.camel@sipsolutions.net> On Di, 2015-11-17 at 10:48 -0500, Neal Becker wrote: > I have an array of shape > (7, 24, 2, 1024) > > I'd like an array of > (7, 24, 2048) > > such that the elements on the last dimension are interleaving the elements > from the 3rd dimension > Which basically means you want to reshape with the earlier index varying faster. This is fortran order (in the simplest case for two axes being reshaped). So you can do: arr.reshape((7, 24, -1), order="F") otherwise, if order seems too confusing or dangerous. Just transpose the two axes first: arr_r = arr.transpose((0, 1, 3, 2)) arr_r = arr_r.reshape((7, 24, -1)) - Sebastian > [0,0,0,0] -> [0,0,0] > [0,0,1,0] -> [0,0,1] > [0,0,0,1] -> [0,0,2] > [0,0,1,1] -> [0,0,3] > ... > > What might be the simplest way to do this? > > ------------ > A different question, suppose I just want to stack them > > [0,0,0,0] -> [0,0,0] > [0,0,0,1] -> [0,0,1] > [0,0,0,2] -> [0,0,2] > ... > [0,0,1,0] -> [0,0,1024] > [0,0,1,1] -> [0,0,1025] > [0,0,1,2] -> [0,0,1026] > ... > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ndbecker2 at gmail.com Tue Nov 17 13:49:21 2015 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 17 Nov 2015 13:49:21 -0500 Subject: [Numpy-discussion] reshaping array question References: Message-ID: Robert Kern wrote: > On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker wrote: >> >> I have an array of shape >> (7, 24, 2, 1024) >> >> I'd like an array of >> (7, 24, 2048) >> >> such that the elements on the last dimension are interleaving the >> elements from the 3rd dimension >> >> [0,0,0,0] -> [0,0,0] >> [0,0,1,0] -> [0,0,1] >> [0,0,0,1] -> [0,0,2] >> [0,0,1,1] -> [0,0,3] >> ... >> >> What might be the simplest way to do this? > > np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,)) I get an error on that 1st transpose: here, 'A' is 'fftouts' print (fftouts.shape) print (np.transpose (fftouts, (-2,-1)).shape) (4, 24, 2, 1024) <<< fftouts.shape prints this Traceback (most recent call last): File "test_uw2.py", line 194, in run_line (sys.argv) File "test_uw2.py", line 190, in run_line run (opt) File "test_uw2.py", line 103, in run print (np.transpose (fftouts, (-2,-1)).shape) File "/home/nbecker/.local/lib/python2.7/site- packages/numpy/core/fromnumeric.py", line 551, in transpose return transpose(axes) ValueError: axes don't match array > >> ------------ >> A different question, suppose I just want to stack them >> >> [0,0,0,0] -> [0,0,0] >> [0,0,0,1] -> [0,0,1] >> [0,0,0,2] -> [0,0,2] >> ... >> [0,0,1,0] -> [0,0,1024] >> [0,0,1,1] -> [0,0,1025] >> [0,0,1,2] -> [0,0,1026] >> ... 
> > A.reshape(A.shape[:-2] + (-1,)) > > -- > Robert Kern From sebastian at sipsolutions.net Tue Nov 17 13:53:34 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 17 Nov 2015 19:53:34 +0100 Subject: [Numpy-discussion] reshaping array question In-Reply-To: References: Message-ID: <1447786414.2734.3.camel@sipsolutions.net> On Di, 2015-11-17 at 13:49 -0500, Neal Becker wrote: > Robert Kern wrote: > > > On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker wrote: > >> > >> I have an array of shape > >> (7, 24, 2, 1024) > >> > >> I'd like an array of > >> (7, 24, 2048) > >> > >> such that the elements on the last dimension are interleaving the > >> elements from the 3rd dimension > >> > >> [0,0,0,0] -> [0,0,0] > >> [0,0,1,0] -> [0,0,1] > >> [0,0,0,1] -> [0,0,2] > >> [0,0,1,1] -> [0,0,3] > >> ... > >> > >> What might be the simplest way to do this? > > > > np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,)) > > I get an error on that 1st transpose: > Transpose needs a slightly different input. If you look at the help, it should be clear. The help might also point to np.swapaxes, which may be a bit more straight forward for this exact case. > here, 'A' is 'fftouts' > > print (fftouts.shape) > print (np.transpose (fftouts, (-2,-1)).shape) > > (4, 24, 2, 1024) <<< fftouts.shape prints this > Traceback (most recent call last): > File "test_uw2.py", line 194, in > run_line (sys.argv) > File "test_uw2.py", line 190, in run_line > run (opt) > File "test_uw2.py", line 103, in run > print (np.transpose (fftouts, (-2,-1)).shape) > File "/home/nbecker/.local/lib/python2.7/site- > packages/numpy/core/fromnumeric.py", line 551, in transpose > return transpose(axes) > ValueError: axes don't match array > > > > >> ------------ > >> A different question, suppose I just want to stack them > >> > >> [0,0,0,0] -> [0,0,0] > >> [0,0,0,1] -> [0,0,1] > >> [0,0,0,2] -> [0,0,2] > >> ... > >> [0,0,1,0] -> [0,0,1024] > >> [0,0,1,1] -> [0,0,1025] > >> [0,0,1,2] -> [0,0,1026] > >> ... > > > > A.reshape(A.shape[:-2] + (-1,)) > > > > -- > > Robert Kern > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From robert.kern at gmail.com Tue Nov 17 14:05:09 2015 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 Nov 2015 19:05:09 +0000 Subject: [Numpy-discussion] reshaping array question In-Reply-To: <1447786414.2734.3.camel@sipsolutions.net> References: <1447786414.2734.3.camel@sipsolutions.net> Message-ID: On Nov 17, 2015 6:53 PM, "Sebastian Berg" wrote: > > On Di, 2015-11-17 at 13:49 -0500, Neal Becker wrote: > > Robert Kern wrote: > > > > > On Tue, Nov 17, 2015 at 3:48 PM, Neal Becker wrote: > > >> > > >> I have an array of shape > > >> (7, 24, 2, 1024) > > >> > > >> I'd like an array of > > >> (7, 24, 2048) > > >> > > >> such that the elements on the last dimension are interleaving the > > >> elements from the 3rd dimension > > >> > > >> [0,0,0,0] -> [0,0,0] > > >> [0,0,1,0] -> [0,0,1] > > >> [0,0,0,1] -> [0,0,2] > > >> [0,0,1,1] -> [0,0,3] > > >> ... > > >> > > >> What might be the simplest way to do this? > > > > > > np.transpose(A, (-2, -1)).reshape(A.shape[:-2] + (-1,)) > > > > I get an error on that 1st transpose: > > > > Transpose needs a slightly different input. 
If you look at the help, it > should be clear. The help might also point to np.swapaxes, which may be > a bit more straight forward for this exact case. Sorry about that. Was in a rush and working from a faulty memory. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Thu Nov 19 16:53:13 2015 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 19 Nov 2015 13:53:13 -0800 Subject: [Numpy-discussion] [JOB] Project Jupyter is hiring two postdoctoral fellows @ UC Berkeley Message-ID: Hi all, We are delighted to announce today that Project Jupyter/IPython has two postdoctoral fellowships open at UC Berkeley, open immediately. Interested candidates can apply here: https://aprecruit.berkeley.edu/apply/JPF00899 We hope to find candidates who will work on a number of challenging questions over the next few years, as described in our grant proposal here: http://blog.jupyter.org/2015/07/07/project-jupyter-computational-narratives-as-the-engine-of-collaborative-data-science/ Interested candidates should carefully read that proposal before applying to familiarize themselves with the full scope of the questions we intend to tackle. We'd like to thank the support of the Helmsley Trust, the Gordon and Betty Moore Foundation and the Alfred P. Sloan Foundation. Cheers, Brian Granger and Fernando Perez. -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Nov 20 15:40:16 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 20 Nov 2015 15:40:16 -0500 Subject: [Numpy-discussion] asarray(sparse) -> object Message-ID: Is this intentional? >>> exog <50x5 sparse matrix of type '' with 50 stored elements in Compressed Sparse Column format> >>> np.asarray(exog) array(<50x5 sparse matrix of type '' with 50 stored elements in Compressed Sparse Column format>, dtype=object) I'm just a newbie who thought to use the usual pattern. .... >>> np.asarray(exog).dot(beta) array([ <50x5 sparse matrix of type '' with 50 stored elements in Compressed Sparse Column format>, <50x5 sparse matrix of type '' with 50 stored elements in Compressed Sparse Column format>, <50x5 sparse matrix of type '' with 50 stored elements in Compressed Sparse Column format>, <50x5 sparse matrix of type '' with 50 stored elements in Compressed Sparse Column format>, <50x5 sparse matrix of type '' with 50 stored elements in Compressed Sparse Column format>], dtype=object) C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\sparse\compressed.py:306: SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is inefficient, using <, >, or !=, instead. "using <, >, or !=, instead.", SparseEfficiencyWarning) seems to warn only once >>> y = np.asarray(exog).dot(beta) >>> y.shape (5,) >>> np.__version__ '1.9.2rc1' >>> scipy.__version__ '0.15.1' Josef -------------- next part -------------- An HTML attachment was scrubbed... URL: From orion at cora.nwra.com Fri Nov 20 15:42:11 2015 From: orion at cora.nwra.com (Orion Poplawski) Date: Fri, 20 Nov 2015 13:42:11 -0700 Subject: [Numpy-discussion] Numpy 1.10.2rc1 In-Reply-To: References: Message-ID: <564F85A3.20204@cora.nwra.com> On 11/12/2015 02:11 PM, Charles R Harris wrote: > Hi All, > > I am pleased to announce the release of Numpy 1.10.2rc1. 
This release should > fix the problems exposed in 1.10.1, which is not to say there are no remaining > problems. Please test this thoroughly, exspecially if you experienced problems > with 1.10.1. Julian Taylor has opened an issue relating to cblas detection on > Debian (and probably Debian derived distributions) that is not dealt with in > this release. Hopefully a solution will be available before the final. So, this fails: File "setup.py", line 427, in fortran_extensionlists if StrictVersion(np.version.version) > StrictVersion("1.6.1"): File "/usr/lib64/python2.7/distutils/version.py", line 40, in __init__ self.parse(vstring) File "/usr/lib64/python2.7/distutils/version.py", line 107, in parse raise ValueError, "invalid version number '%s'" % vstring ValueError: invalid version number '1.10.2rc1' But I'm not sure numpy has made any contracts to follow the distutils StrictVersion format: http://epydoc.sourceforge.net/stdlib/distutils.version.StrictVersion-class.html Any thoughts? -- Orion Poplawski Technical Manager 303-415-9701 x222 NWRA, Boulder/CoRA Office FAX: 303-415-9702 3380 Mitchell Lane orion at nwra.com Boulder, CO 80301 http://www.nwra.com From charlesr.harris at gmail.com Fri Nov 20 16:00:37 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 20 Nov 2015 14:00:37 -0700 Subject: [Numpy-discussion] Numpy 1.10.2rc1 In-Reply-To: <564F85A3.20204@cora.nwra.com> References: <564F85A3.20204@cora.nwra.com> Message-ID: On Fri, Nov 20, 2015 at 1:42 PM, Orion Poplawski wrote: > On 11/12/2015 02:11 PM, Charles R Harris wrote: > > Hi All, > > > > I am pleased to announce the release of Numpy 1.10.2rc1. This release > should > > fix the problems exposed in 1.10.1, which is not to say there are no > remaining > > problems. Please test this thoroughly, exspecially if you experienced > problems > > with 1.10.1. Julian Taylor has opened an issue relating to cblas > detection on > > Debian (and probably Debian derived distributions) that is not dealt > with in > > this release. Hopefully a solution will be available before the final. > > So, this fails: > > File "setup.py", line 427, in fortran_extensionlists > if StrictVersion(np.version.version) > StrictVersion("1.6.1"): > File "/usr/lib64/python2.7/distutils/version.py", line 40, in __init__ > self.parse(vstring) > File "/usr/lib64/python2.7/distutils/version.py", line 107, in parse > raise ValueError, "invalid version number '%s'" % vstring > ValueError: invalid version number '1.10.2rc1' > > But I'm not sure numpy has made any contracts to follow the distutils > StrictVersion format: > > http://epydoc.sourceforge.net/stdlib/distutils.version.StrictVersion-class.html No, we don't support StrictVersion nor does Scipy. """Utility to compare (Numpy) version strings. The NumpyVersion class allows properly comparing numpy version strings. The LooseVersion and StrictVersion classes that distutils provides don't work; they don't recognize anything like alpha/beta/rc/dev versions. """ Looks like `numpy/distutils/mingw32ccompiler.py` needs fixing. Could you open an issue? The import of StrictVersion dates back to 2005, so not sure why this is turning up now. Maybe it is specific to compiling Fortran and we haven't done that with rc's before. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
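For anyone hitting the same error, a short sketch of how such a version check could be written with NumpyVersion instead, which (as the docstring quoted above says) understands alpha/beta/rc/dev suffixes that distutils' StrictVersion rejects:

```python
import numpy as np
from numpy.lib import NumpyVersion

# StrictVersion('1.10.2rc1') raises ValueError; NumpyVersion parses it and
# compares correctly against plain release strings.
if NumpyVersion(np.version.version) > '1.6.1':
    print("running something newer than 1.6.1")

print(NumpyVersion('1.10.2rc1') < '1.10.2')  # True, rc sorts before the final release
print(NumpyVersion('1.10.2rc1') > '1.10.1')  # True
```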
URL: From charlesr.harris at gmail.com Fri Nov 20 16:37:33 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 20 Nov 2015 14:37:33 -0700 Subject: [Numpy-discussion] Numpy 1.10.2rc1 In-Reply-To: References: <564F85A3.20204@cora.nwra.com> Message-ID: On Fri, Nov 20, 2015 at 2:00 PM, Charles R Harris wrote: > > > On Fri, Nov 20, 2015 at 1:42 PM, Orion Poplawski > wrote: > >> On 11/12/2015 02:11 PM, Charles R Harris wrote: >> > Hi All, >> > >> > I am pleased to announce the release of Numpy 1.10.2rc1. This release >> should >> > fix the problems exposed in 1.10.1, which is not to say there are no >> remaining >> > problems. Please test this thoroughly, exspecially if you experienced >> problems >> > with 1.10.1. Julian Taylor has opened an issue relating to cblas >> detection on >> > Debian (and probably Debian derived distributions) that is not dealt >> with in >> > this release. Hopefully a solution will be available before the final. >> >> So, this fails: >> >> File "setup.py", line 427, in fortran_extensionlists >> if StrictVersion(np.version.version) > StrictVersion("1.6.1"): >> File "/usr/lib64/python2.7/distutils/version.py", line 40, in __init__ >> self.parse(vstring) >> File "/usr/lib64/python2.7/distutils/version.py", line 107, in parse >> raise ValueError, "invalid version number '%s'" % vstring >> ValueError: invalid version number '1.10.2rc1' >> >> But I'm not sure numpy has made any contracts to follow the distutils >> StrictVersion format: >> >> http://epydoc.sourceforge.net/stdlib/distutils.version.StrictVersion-class.html > > > No, we don't support StrictVersion nor does Scipy. > > """Utility to compare (Numpy) version strings. > > The NumpyVersion class allows properly comparing numpy version strings. > The LooseVersion and StrictVersion classes that distutils provides don't > work; they don't recognize anything like alpha/beta/rc/dev versions. > > """ > > Looks like `numpy/distutils/mingw32ccompiler.py` needs fixing. Could you > open an issue? The import of StrictVersion dates back to 2005, so not sure > why this is turning up now. Maybe it is specific to compiling Fortran and > we haven't done that with rc's before. > In fact, I don't see where the call is coming from. Is this something specific to your project? If so, you want to use NumpyVersion which you can import from `numpy.lib`. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From perimosocordiae at gmail.com Fri Nov 20 18:29:08 2015 From: perimosocordiae at gmail.com (CJ Carey) Date: Fri, 20 Nov 2015 17:29:08 -0600 Subject: [Numpy-discussion] asarray(sparse) -> object In-Reply-To: References: Message-ID: The short answer is: "kind of". These two Github issues explain what's going on more in-depth: https://github.com/scipy/scipy/issues/3995 https://github.com/scipy/scipy/issues/4239 As for the warning only showing once, that's Python's default behavior for warnings: http://stackoverflow.com/q/22661745/10601 -CJ On Fri, Nov 20, 2015 at 2:40 PM, wrote: > Is this intentional? > > > >>> exog > <50x5 sparse matrix of type '' > with 50 stored elements in Compressed Sparse Column format> > > >>> np.asarray(exog) > array(<50x5 sparse matrix of type '' > with 50 stored elements in Compressed Sparse Column format>, dtype=object) > > > I'm just a newbie who thought to use the usual pattern. > > > .... 
> > >>> np.asarray(exog).dot(beta) > array([ <50x5 sparse matrix of type '' > with 50 stored elements in Compressed Sparse Column format>, > <50x5 sparse matrix of type '' > with 50 stored elements in Compressed Sparse Column format>, > <50x5 sparse matrix of type '' > with 50 stored elements in Compressed Sparse Column format>, > <50x5 sparse matrix of type '' > with 50 stored elements in Compressed Sparse Column format>, > <50x5 sparse matrix of type '' > with 50 stored elements in Compressed Sparse Column format>], dtype=object) > C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\sparse\compressed.py:306: > SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is > inefficient, using <, >, or !=, instead. > "using <, >, or !=, instead.", SparseEfficiencyWarning) > > seems to warn only once > > >>> y = np.asarray(exog).dot(beta) > >>> y.shape > (5,) > > > >>> np.__version__ > '1.9.2rc1' > > >>> scipy.__version__ > '0.15.1' > > > > Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Nov 20 18:57:08 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 20 Nov 2015 18:57:08 -0500 Subject: [Numpy-discussion] asarray(sparse) -> object In-Reply-To: References: Message-ID: On Fri, Nov 20, 2015 at 6:29 PM, CJ Carey wrote: > The short answer is: "kind of". > > These two Github issues explain what's going on more in-depth: > https://github.com/scipy/scipy/issues/3995 > https://github.com/scipy/scipy/issues/4239 > Thanks, I didn't pay attention to those issues, or only very superficially. +1 for doing anything else than converting to object arrays. > > > As for the warning only showing once, that's Python's default behavior for > warnings: http://stackoverflow.com/q/22661745/10601 > The default should be overwritten for warnings that are always relevant. I usually don't use sparse arrays, and don't know if this should always warn. Josef > > -CJ > > On Fri, Nov 20, 2015 at 2:40 PM, wrote: > >> Is this intentional? >> >> >> >>> exog >> <50x5 sparse matrix of type '' >> with 50 stored elements in Compressed Sparse Column format> >> >> >>> np.asarray(exog) >> array(<50x5 sparse matrix of type '' >> with 50 stored elements in Compressed Sparse Column format>, dtype=object) >> >> >> I'm just a newbie who thought to use the usual pattern. >> >> >> .... >> >> >>> np.asarray(exog).dot(beta) >> array([ <50x5 sparse matrix of type '' >> with 50 stored elements in Compressed Sparse Column format>, >> <50x5 sparse matrix of type '' >> with 50 stored elements in Compressed Sparse Column format>, >> <50x5 sparse matrix of type '' >> with 50 stored elements in Compressed Sparse Column format>, >> <50x5 sparse matrix of type '' >> with 50 stored elements in Compressed Sparse Column format>, >> <50x5 sparse matrix of type '' >> with 50 stored elements in Compressed Sparse Column format>], >> dtype=object) >> C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\scipy\sparse\compressed.py:306: >> SparseEfficiencyWarning: Comparing sparse matrices using >= and <= is >> inefficient, using <, >, or !=, instead. 
>> "using <, >, or !=, instead.", SparseEfficiencyWarning) >> >> seems to warn only once >> >> >>> y = np.asarray(exog).dot(beta) >> >>> y.shape >> (5,) >> >> >> >>> np.__version__ >> '1.9.2rc1' >> >> >>> scipy.__version__ >> '0.15.1' >> >> >> >> Josef >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Sat Nov 21 08:43:52 2015 From: jeffreback at gmail.com (Jeff Reback) Date: Sat, 21 Nov 2015 08:43:52 -0500 Subject: [Numpy-discussion] ANN: pandas v0.17.1 Released Message-ID: Hi, We are proud to announce that *pandas* has become a sponsored project of the NUMFocus organization This will help ensure the success of development of *pandas* as a world-class open-source project. This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version. This was a release of 5 weeks with 176 commits by 61 authors encompassing 84 issues and 128 pull-requests. *What is it:* *pandas* is a Python package providing fast, flexible, and expressive data structures designed to make working with ?relational? or ?labeled? data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. *Highlights*: - Support for Conditional HTML Formatting, see here - Releasing the GIL on the csv reader & other ops, see here - Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values see Issue 11376 See the Whatsnew for much more information and the full Documentation link. *How to get it:* Source tarballs, windows wheels, and macosx wheels are available on PyPI Installation via conda is: - conda install pandas windows wheels are courtesy of Christoph Gohlke and are built on Numpy 1.9 macosx wheels are courtesy of Matthew Brett *Issues:* Please report any issues on our issue tracker : Jeff *Thanks to all of the contributors* * - Aleksandr Drozd - Alex Chase - Anthonios Partheniou - BrenBarn - Brian J. 
McGuirk - Chris - Christian Berendt - Christian Perez - Cody Piersall - Data & Code Expert Experimenting with Code on Data - DrIrv - Evan Wright - Guillaume Gay - Hamed Saljooghinejad - Iblis Lin - Jake VanderPlas - Jan Schulz - Jean-Mathieu Deschenes - Jeff Reback - Jimmy Callin - Joris Van den Bossche - K.-Michael Aye - Ka Wo Chen - Lo?c S?guin-C - Luo Yicheng - Magnus J?ud - Manuel Leonhardt - Matthew Gilbert - Maximilian Roos - Michael - Nicholas Stahl - Nicolas Bonnotte - Pastafarianist - Petra Chong - Phil Schaf - Philipp A - Rob deCarvalho - Roman Khomenko - R?my L?one - Sebastian Bank - Thierry Moisan - Tom Augspurger - Tux1 - Varun - Wieland Hoffmann - Winterflower - Yoav Ram - Younggun Kim - Zeke - ajcr - azuranski - behzad nouri - cel4 - emilydolson - hironow - lexual - llllllllll - rockg - silentquasar - sinhrks - taeold * -------------- next part -------------- An HTML attachment was scrubbed... URL: From brad.reisfeld at gmail.com Sat Nov 21 09:30:46 2015 From: brad.reisfeld at gmail.com (Brad Reisfeld) Date: Sat, 21 Nov 2015 06:30:46 -0800 (PST) Subject: [Numpy-discussion] ANN: pandas v0.17.1 Released In-Reply-To: References: Message-ID: <10c8bcba-8046-432d-a135-299597052898@googlegroups.com> To Jeff and all of the contributors, Thank you for your hard and dedicated work on pandas! It is an awesome package that gets better with every release. -Brad On Saturday, November 21, 2015 at 6:44:02 AM UTC-7, Jeff wrote: > > Hi, > > We are proud to announce that *pandas* has become a sponsored project of > the NUMFocus organization > > - private > > This will help ensure the success of development of *pandas* as a > world-class open-source project. > > This is a minor bug-fix release from 0.17.0 and includes a large number of > bug fixes along several new features, enhancements, and performance > improvements. > We recommend that all users upgrade to this version. > > This was a release of 5 weeks with 176 commits by 61 authors encompassing > 84 issues and 128 pull-requests. > > > *What is it:* > > *pandas* is a Python package providing fast, flexible, and expressive data > structures designed to make working with ?relational? or ?labeled? data > both > easy and intuitive. It aims to be the fundamental high-level building > block for > doing practical, real world data analysis in Python. Additionally, it has > the > broader goal of becoming the most powerful and flexible open source data > analysis / manipulation tool available in any language. > > *Highlights*: > > > - Support for Conditional HTML Formatting, see here > > - private > > - Releasing the GIL on the csv reader & other ops, see here > > - private > > - Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing > incorrect results on integer values see Issue 11376 > > > See the Whatsnew > - > private > for > much more information and the full Documentation > - private > link. > > *How to get it:* > > Source tarballs, windows wheels, and macosx wheels are available on PyPI > - private > > > Installation via conda is: > > - conda install pandas > > windows wheels are courtesy of Christoph Gohlke and are built on Numpy > 1.9 > macosx wheels are courtesy of Matthew Brett > > *Issues:* > > Please report any issues on our issue tracker > - private > : > > Jeff > > *Thanks to all of the contributors* > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > * - Aleksandr Drozd - Alex Chase - Anthonios Partheniou - BrenBarn - Brian > J. 
McGuirk - Chris - Christian Berendt - Christian Perez - Cody Piersall - > Data & Code Expert Experimenting with Code on Data - DrIrv - Evan Wright - > Guillaume Gay - Hamed Saljooghinejad - Iblis Lin - Jake VanderPlas - Jan > Schulz - Jean-Mathieu Deschenes - Jeff Reback - Jimmy Callin - Joris Van > den Bossche - K.-Michael Aye - Ka Wo Chen - Lo?c S?guin-C - Luo Yicheng - > Magnus J?ud - Manuel Leonhardt - Matthew Gilbert - Maximilian Roos - > Michael - Nicholas Stahl - Nicolas Bonnotte - Pastafarianist - Petra Chong > - Phil Schaf - Philipp A - Rob deCarvalho - Roman Khomenko - R?my L?one - > Sebastian Bank - Thierry Moisan - Tom Augspurger - Tux1 - Varun - Wieland > Hoffmann - Winterflower - Yoav Ram - Younggun Kim - Zeke - ajcr - azuranski > - behzad nouri - cel4 - emilydolson - hironow - lexual - llllllllll - rockg > - silentquasar - sinhrks - taeold * > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From glenn.caltech at gmail.com Sat Nov 21 22:54:27 2015 From: glenn.caltech at gmail.com (G Jones) Date: Sat, 21 Nov 2015 22:54:27 -0500 Subject: [Numpy-discussion] record array performance issue / bug Message-ID: Hi, Using the latest numpy from anaconda (1.10.1) on Python 2.7, I found that the following code works OK if npackets = 2, but acts bizarrely if npackets is large (2**12): ----------- npackets = 2**12 dlen=2048 PacketType = np.dtype([('timestamp','float64'), ('pkts',np.dtype(('int8',(npackets,dlen)))), ('data',np.dtype(('int8',(npackets*dlen,)))), ]) b = np.zeros((1,),dtype=PacketType) b['timestamp'] # Should return array([0.0]) ---------------- Specifically, if npackets is large, i.e. 2**12 or 2**16, trying to access b['timestamp'] results in 100% CPU usage while the memory consumption is increasing by hundreds of MB per second. When I interrupt, I find the traceback in numpy/core/_internal.pyc : _get_all_field_offsets Since it seems to work for small values of npackets, I suspect that if I had the memory and time, the access to b['timestamp'] would eventually return, so I think the issue is that the algorithm doesn't scale well with record dtypes made up of lots of bytes. Looking on Github, I can see this code has been in flux recently, but I can't quite tell if the issue I'm seeing is addressed by the issues being discussed and tackled there. Thanks, Glenn -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Nov 22 11:52:04 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 22 Nov 2015 09:52:04 -0700 Subject: [Numpy-discussion] record array performance issue / bug In-Reply-To: References: Message-ID: On Sat, Nov 21, 2015 at 8:54 PM, G Jones wrote: > Hi, > Using the latest numpy from anaconda (1.10.1) on Python 2.7, I found that > the following code works OK if npackets = 2, but acts bizarrely if npackets > is large (2**12): > > ----------- > > npackets = 2**12 > dlen=2048 > PacketType = np.dtype([('timestamp','float64'), > ('pkts',np.dtype(('int8',(npackets,dlen)))), > ('data',np.dtype(('int8',(npackets*dlen,)))), > ]) > > b = np.zeros((1,),dtype=PacketType) > > b['timestamp'] # Should return array([0.0]) > > ---------------- > > Specifically, if npackets is large, i.e. 2**12 or 2**16, trying to access > b['timestamp'] results in 100% CPU usage while the memory consumption is > increasing by hundreds of MB per second. 
When I interrupt, I find the > traceback in numpy/core/_internal.pyc : _get_all_field_offsets > Since it seems to work for small values of npackets, I suspect that if I > had the memory and time, the access to b['timestamp'] would eventually > return, so I think the issue is that the algorithm doesn't scale well with > record dtypes made up of lots of bytes. > Looking on Github, I can see this code has been in flux recently, but I > can't quite tell if the issue I'm seeing is addressed by the issues being > discussed and tackled there. > This should be fixed in 1.10.2. 1.10.2rc1 is up on sourceforge if you want to test it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at gmail.com Mon Nov 23 16:31:36 2015 From: eliben at gmail.com (Eli Bendersky) Date: Mon, 23 Nov 2015 13:31:36 -0800 Subject: [Numpy-discussion] understanding buffering done when broadcasting Message-ID: Hello, I'm trying to understand the buffering done by the Numpy iterator interface (the new post 1.6-one) when running ufuncs on arrays that require broadcasting. Consider this simple case: In [35]: m = np.arange(16).reshape(4,4) In [37]: n = np.arange(4) In [39]: m + n Out[39]: array([[ 0, 2, 4, 6], [ 4, 6, 8, 10], [ 8, 10, 12, 14], [12, 14, 16, 18]]) If I instrument Numpy (setting NPY_IT_DBG_TRACING and such), I see that when the add() ufunc is called, 'n' is copied into a temporary buffer by the iterator. The ufunc then gets the buffer as its data. My question is: why is this buffering needed? It seems wasteful, since no casting is required here, no special alignment problems and also 'n' is contiguously laid out in memory. It seems that it would be more efficient to just use 'n' in the ufunc instead of passing in the buffer. What am I missing? Thanks in advance, Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon Nov 23 17:09:53 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 23 Nov 2015 23:09:53 +0100 Subject: [Numpy-discussion] understanding buffering done when broadcasting In-Reply-To: References: Message-ID: <1448316593.1604.33.camel@sipsolutions.net> On Mo, 2015-11-23 at 13:31 -0800, Eli Bendersky wrote: > Hello, > > > I'm trying to understand the buffering done by the Numpy iterator > interface (the new post 1.6-one) when running ufuncs on arrays that > require broadcasting. Consider this simple case: > > In [35]: m = np.arange(16).reshape(4,4) > In [37]: n = np.arange(4) > In [39]: m + n > Out[39]: > array([[ 0, 2, 4, 6], > [ 4, 6, 8, 10], > [ 8, 10, 12, 14], > [12, 14, 16, 18]]) > > On first sight this seems true. However, there is one other point to consider here. The inner ufunc loop can only handle a single stride. The contiguous array `n` has to be iterated as if it had the strides `(0, 8)`, which is not the strides of the contiguous array `m` which can be "unrolled" to 1-D. Those effective strides are thus not contiguous for the inner ufunc loop and cannot be unrolled into a single ufunc call! The optimization (which might kick in a bit more broadly maybe), is thus that the number of inner loop calls is minimized, whether that is worth it, I am not sure, it may well be that there is some worthy optimization possible here. 
Note however, that this does not occur for large inner loop sizes (though I think you can find some "bad" sizes): ``` In [18]: n = np.arange(40000) In [19]: m = np.arange(160000).reshape(4,40000) In [20]: o = m + n Iterator: Checking casting for operand 0 op: dtype('int64'), iter: dtype('int64') Iterator: Checking casting for operand 1 op: dtype('int64'), iter: dtype('int64') Iterator: Checking casting for operand 2 op: , iter: dtype('int64') Iterator: Setting allocated stride 1 for iterator dimension 0 to 8 Iterator: Setting allocated stride 0 for iterator dimension 1 to 320000 Iterator: Copying inputs to buffers Iterator: Expanding inner loop size from 8192 to 40000 since buffering wasn't needed Any buffering needed: 0 Iterator: Finished copying inputs to buffers (buffered size is 40000) ``` Anyway, feel free to have a look ;). The code is not the most read one in NumPy, and it would not surprise me a lot if you can find something to tweak. - Sebastian > > If I instrument Numpy (setting NPY_IT_DBG_TRACING and such), I see > that when the add() ufunc is called, 'n' is copied into a temporary > buffer by the iterator. The ufunc then gets the buffer as its data. > > > My question is: why is this buffering needed? It seems wasteful, since > no casting is required here, no special alignment problems and also > 'n' is contiguously laid out in memory. It seems that it would be more > efficient to just use 'n' in the ufunc instead of passing in the > buffer. What am I missing? > > > Thanks in advance, > > Eli > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From p.e.creasey.00 at googlemail.com Tue Nov 24 14:42:19 2015 From: p.e.creasey.00 at googlemail.com (Peter Creasey) Date: Tue, 24 Nov 2015 11:42:19 -0800 Subject: [Numpy-discussion] Misleading/erroneous TypeError message Message-ID: Hi, I just upgraded my numpy and started to received a TypeError from one of my codes that relied on the old, less strict, casting behaviour. The error message, however, left me scratching my head when trying to debug something like this: >>> a = array([0],dtype=uint64) >>> a += array([1],dtype=int64) TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('uint64') with casting rule 'same_kind' Where does the 'float64' come from?!?! Peter PS Thanks for all the great work guys, numpy is a fantastic tool and has been a lot of help to me over the years! -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakirkham at gmail.com Tue Nov 24 14:57:00 2015 From: jakirkham at gmail.com (John Kirkham) Date: Tue, 24 Nov 2015 14:57:00 -0500 Subject: [Numpy-discussion] ENH: Add the function 'expand_view' Message-ID: <57E05793-75DE-47A8-896F-F864C6915577@gmail.com> Takes an array and tacks on arbitrary dimensions on either side, which is returned as a view always. Here are the relevant features: * Creates a view of the array that has the dimensions before and after tacked on to it. * Takes the before and after arguments independent of each other and the current shape. * Allows for read and write access to the underlying array. To see an example of what this would look like, see this PR ( https://github.com/numpy/numpy/pull/6713 ). 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Nov 24 15:39:20 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 24 Nov 2015 13:39:20 -0700 Subject: [Numpy-discussion] Misleading/erroneous TypeError message In-Reply-To: References: Message-ID: On Tue, Nov 24, 2015 at 12:42 PM, Peter Creasey < p.e.creasey.00 at googlemail.com> wrote: > Hi, > > I just upgraded my numpy and started to received a TypeError from one of > my codes that relied on the old, less strict, casting behaviour. The error > message, however, left me scratching my head when trying to debug something > like this: > > >>> a = array([0],dtype=uint64) > >>> a += array([1],dtype=int64) > TypeError: Cannot cast ufunc add output from dtype('float64') to > dtype('uint64') with casting rule 'same_kind' > > Where does the 'float64' come from?!?! > The combination of uint64 and int64 leads to promotion to float64 as the best option for the combination of signed and unsigned. To fix things, you can either use `np.add` with an output argument and `casting='unsafe'` or just be careful about using unsigned types. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Nov 24 19:13:07 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 24 Nov 2015 16:13:07 -0800 Subject: [Numpy-discussion] ENH: Add the function 'expand_view' In-Reply-To: <57E05793-75DE-47A8-896F-F864C6915577@gmail.com> References: <57E05793-75DE-47A8-896F-F864C6915577@gmail.com> Message-ID: On Nov 24, 2015 11:57 AM, "John Kirkham" wrote: > > Takes an array and tacks on arbitrary dimensions on either side, which is returned as a view always. Here are the relevant features: > > * Creates a view of the array that has the dimensions before and after tacked on to it. > * Takes the before and after arguments independent of each other and the current shape. > * Allows for read and write access to the underlying array. Can you expand this with some discussion of why you want this function, and why you chose these specific features? (E.g. as mentioned in the PR comments already, the reason broadcast_to returns a read-only array is that it was decided that this was less confusing for users, not because of any technical issue.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at gmail.com Tue Nov 24 19:49:44 2015 From: eliben at gmail.com (Eli Bendersky) Date: Tue, 24 Nov 2015 16:49:44 -0800 Subject: [Numpy-discussion] understanding buffering done when broadcasting In-Reply-To: <1448316593.1604.33.camel@sipsolutions.net> References: <1448316593.1604.33.camel@sipsolutions.net> Message-ID: On Mon, Nov 23, 2015 at 2:09 PM, Sebastian Berg wrote: > On Mo, 2015-11-23 at 13:31 -0800, Eli Bendersky wrote: > > Hello, > > > > > > I'm trying to understand the buffering done by the Numpy iterator > > interface (the new post 1.6-one) when running ufuncs on arrays that > > require broadcasting. Consider this simple case: > > > > In [35]: m = np.arange(16).reshape(4,4) > > In [37]: n = np.arange(4) > > In [39]: m + n > > Out[39]: > > array([[ 0, 2, 4, 6], > > [ 4, 6, 8, 10], > > [ 8, 10, 12, 14], > > [12, 14, 16, 18]]) > > > > > > > On first sight this seems true. However, there is one other point to > consider here. The inner ufunc loop can only handle a single stride. 
The > contiguous array `n` has to be iterated as if it had the strides > `(0, 8)`, which is not the strides of the contiguous array `m` which can > be "unrolled" to 1-D. Those effective strides are thus not contiguous > for the inner ufunc loop and cannot be unrolled into a single ufunc > call! > > The optimization (which might kick in a bit more broadly maybe), is thus > that the number of inner loop calls is minimized, whether that is worth > it, I am not sure, it may well be that there is some worthy optimization > possible here. > Note however, that this does not occur for large inner loop sizes > (though I think you can find some "bad" sizes): > > ``` > In [18]: n = np.arange(40000) > > In [19]: m = np.arange(160000).reshape(4,40000) > > In [20]: o = m + n > Iterator: Checking casting for operand 0 > op: dtype('int64'), iter: dtype('int64') > Iterator: Checking casting for operand 1 > op: dtype('int64'), iter: dtype('int64') > Iterator: Checking casting for operand 2 > op: , iter: dtype('int64') > Iterator: Setting allocated stride 1 for iterator dimension 0 to 8 > Iterator: Setting allocated stride 0 for iterator dimension 1 to 320000 > Iterator: Copying inputs to buffers > Iterator: Expanding inner loop size from 8192 to 40000 since buffering > wasn't needed > Any buffering needed: 0 > Iterator: Finished copying inputs to buffers (buffered size is 40000) > ``` > The heuristic in the code says that if we can use a single stride and that's larger than the buffer size (which I assume is the default buffer size, and can change) then it's "is_onestride" and no buffering is done. So this led me to explore around this threshold (8192 items by default on my machine), and indeed we can notice funny behavior: In [51]: %%timeit n = arange(8192); m = np.arange(8192*40).reshape(40,8192) o = m + n ....: 1000 loops, best of 3: 274 ?s per loop In [52]: %%timeit n = arange(8292); m = np.arange(8292*40).reshape(40,8292) o = m + n ....: 1000 loops, best of 3: 229 ?s per loop So, given this, it's not very clear why the "optimization" kicks in. Buffering for small sizes seems like a mistake. Eli > > Anyway, feel free to have a look ;). The code is not the most read one > in NumPy, and it would not surprise me a lot if you can find something > to tweak. > > - Sebastian > > > > > > If I instrument Numpy (setting NPY_IT_DBG_TRACING and such), I see > > that when the add() ufunc is called, 'n' is copied into a temporary > > buffer by the iterator. The ufunc then gets the buffer as its data. > > > > > > My question is: why is this buffering needed? It seems wasteful, since > > no casting is required here, no special alignment problems and also > > 'n' is contiguously laid out in memory. It seems that it would be more > > efficient to just use 'n' in the ufunc instead of passing in the > > buffer. What am I missing? > > > > > > Thanks in advance, > > > > Eli > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
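A standalone version of the timing comparison above, for anyone who wants to reproduce it outside IPython. The 8192 figure is the default ufunc buffer size (see np.getbufsize()), so inner loop sizes just below and above it exercise different iterator code paths:

```python
import timeit
import numpy as np

def bench(inner, repeats=1000):
    n = np.arange(inner)
    m = np.arange(inner * 40).reshape(40, inner)
    t = timeit.timeit(lambda: m + n, number=repeats)
    print("inner size %5d: %6.1f us per add" % (inner, 1e6 * t / repeats))

for inner in (8192, 8292):
    bench(inner)
```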
URL: From p.e.creasey.00 at googlemail.com Tue Nov 24 20:42:51 2015 From: p.e.creasey.00 at googlemail.com (Peter Creasey) Date: Tue, 24 Nov 2015 17:42:51 -0800 Subject: [Numpy-discussion] Misleading/erroneous TypeError message Message-ID: > > I just upgraded my numpy and started to received a TypeError from one of > > my codes that relied on the old, less strict, casting behaviour. The error > > message, however, left me scratching my head when trying to debug something > > like this: > > > > >>> a = array([0],dtype=uint64) > > >>> a += array([1],dtype=int64) > > TypeError: Cannot cast ufunc add output from dtype('float64') to > > dtype('uint64') with casting rule 'same_kind' > > > > Where does the 'float64' come from?!?! > > > > The combination of uint64 and int64 leads to promotion to float64 as the > best option for the combination of signed and unsigned. To fix things, you > can either use `np.add` with an output argument and `casting='unsafe'` or > just be careful about using unsigned types. Thanks for the quick response. I understand there are reasons for the promotion to float64 (although my expectation would usually be that Numpy is going to follow C conventions), however the I found the error a little unhelpful. In particular Numpy is complaining about a dtype (float64) that it silently promoted to, rather than the dtype that the user provided, which generally seems like a bad idea. Could Numpy somehow complain about the original dtypes in this case? Or at least give a warning about the first promotion (e.g. loss of precision)? Peter From josef.pktd at gmail.com Tue Nov 24 21:13:47 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 24 Nov 2015 21:13:47 -0500 Subject: [Numpy-discussion] ENH: Add the function 'expand_view' In-Reply-To: References: <57E05793-75DE-47A8-896F-F864C6915577@gmail.com> Message-ID: On Tue, Nov 24, 2015 at 7:13 PM, Nathaniel Smith wrote: > On Nov 24, 2015 11:57 AM, "John Kirkham" wrote: > > > > Takes an array and tacks on arbitrary dimensions on either side, which > is returned as a view always. Here are the relevant features: > > > > * Creates a view of the array that has the dimensions before and after > tacked on to it. > > * Takes the before and after arguments independent of each other and the > current shape. > > * Allows for read and write access to the underlying array. > > Can you expand this with some discussion of why you want this function, > and why you chose these specific features? (E.g. as mentioned in the PR > comments already, the reason broadcast_to returns a read-only array is that > it was decided that this was less confusing for users, not because of any > technical issue.) > Why is this a stride_trick? I thought this looks similar to expand_dims and could maybe be implemented with some extra options there. Josef > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
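Going back to the uint64/int64 TypeError discussed above, a minimal sketch of the two workarounds that were suggested (an explicit output array with casting='unsafe', or avoiding the mixed signed/unsigned combination in the first place):

```python
import numpy as np

a = np.array([0], dtype=np.uint64)
b = np.array([1], dtype=np.int64)

# uint64 + int64 promotes to float64, so `a += b` is rejected under the
# 'same_kind' casting rule. An explicit output with unsafe casting keeps
# the result in uint64:
np.add(a, b, out=a, casting='unsafe')
print(a, a.dtype)        # [1] uint64

# Or sidestep the promotion entirely by staying within one signedness:
a += b.astype(np.uint64)
print(a, a.dtype)        # [2] uint64
```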
URL: From sebastian at sipsolutions.net Wed Nov 25 03:21:25 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 25 Nov 2015 09:21:25 +0100 Subject: [Numpy-discussion] understanding buffering done when broadcasting In-Reply-To: References: <1448316593.1604.33.camel@sipsolutions.net> Message-ID: <1448439685.16828.14.camel@sipsolutions.net> On Di, 2015-11-24 at 16:49 -0800, Eli Bendersky wrote: > > > On Mon, Nov 23, 2015 at 2:09 PM, Sebastian Berg > wrote: > On Mo, 2015-11-23 at 13:31 -0800, Eli Bendersky wrote: > > Hello, > > > > > > I'm trying to understand the buffering done by the Numpy > iterator > > interface (the new post 1.6-one) when running ufuncs on > arrays that > > require broadcasting. Consider this simple case: > > > > In [35]: m = np.arange(16).reshape(4,4) > > In [37]: n = np.arange(4) > > In [39]: m + n > > Out[39]: > > array([[ 0, 2, 4, 6], > > [ 4, 6, 8, 10], > > [ 8, 10, 12, 14], > > [12, 14, 16, 18]]) > > > > > > > On first sight this seems true. However, there is one other > point to > consider here. The inner ufunc loop can only handle a single > stride. The > contiguous array `n` has to be iterated as if it had the > strides > `(0, 8)`, which is not the strides of the contiguous array `m` > which can > be "unrolled" to 1-D. Those effective strides are thus not > contiguous > for the inner ufunc loop and cannot be unrolled into a single > ufunc > call! > > The optimization (which might kick in a bit more broadly > maybe), is thus > that the number of inner loop calls is minimized, whether that > is worth > it, I am not sure, it may well be that there is some worthy > optimization > possible here. > Note however, that this does not occur for large inner loop > sizes > (though I think you can find some "bad" sizes): > > ``` > In [18]: n = np.arange(40000) > > In [19]: m = np.arange(160000).reshape(4,40000) > > In [20]: o = m + n > Iterator: Checking casting for operand 0 > op: dtype('int64'), iter: dtype('int64') > Iterator: Checking casting for operand 1 > op: dtype('int64'), iter: dtype('int64') > Iterator: Checking casting for operand 2 > op: , iter: dtype('int64') > Iterator: Setting allocated stride 1 for iterator dimension 0 > to 8 > Iterator: Setting allocated stride 0 for iterator dimension 1 > to 320000 > Iterator: Copying inputs to buffers > Iterator: Expanding inner loop size from 8192 to 40000 since > buffering > wasn't needed > Any buffering needed: 0 > Iterator: Finished copying inputs to buffers (buffered size is > 40000) > ``` > > > The heuristic in the code says that if we can use a single stride and > that's larger than the buffer size (which I assume is the default > buffer size, and can change) then it's "is_onestride" and no buffering > is done. > > > So this led me to explore around this threshold (8192 items by default > on my machine), and indeed we can notice funny behavior: > > In [51]: %%timeit n = arange(8192); m = > np.arange(8192*40).reshape(40,8192) > o = m + n > ....: > 1000 loops, best of 3: 274 ?s per loop > > In [52]: %%timeit n = arange(8292); m = > np.arange(8292*40).reshape(40,8292) > o = m + n > ....: > 1000 loops, best of 3: 229 ?s per loop > > > So, given this, it's not very clear why the "optimization" kicks in. > Buffering for small sizes seems like a mistake. > I am pretty sure it is not generally a mistake. Consider the case of an 10000x3 array (note that shrinking the buffer can have great advantage though, I am not sure if this is done usually). 
If you have (10000, 3) + (3,) arrays, then the ufunc outer loop would have 10000x overhead. Doing the buffering (which I believe has some extra code to be faster), will lower this to a few ufunc inner loop calls. I have not timed it, but I would be a bit surprised if it was not faster in this case at least. Even calling a C function (and looping) for an inner loop of 3 elements, should be quite a bit of overhead, and my guess is more overhead than the buffering. - Sebastian > > Eli > > > > > Anyway, feel free to have a look ;). The code is not the most > read one > in NumPy, and it would not surprise me a lot if you can find > something > to tweak. > > - Sebastian > > > > > > If I instrument Numpy (setting NPY_IT_DBG_TRACING and such), > I see > > that when the add() ufunc is called, 'n' is copied into a > temporary > > buffer by the iterator. The ufunc then gets the buffer as > its data. > > > > > > My question is: why is this buffering needed? It seems > wasteful, since > > no casting is required here, no special alignment problems > and also > > 'n' is contiguously laid out in memory. It seems that it > would be more > > efficient to just use 'n' in the ufunc instead of passing in > the > > buffer. What am I missing? > > > > > > Thanks in advance, > > > > Eli > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From antlarac at gmail.com Wed Nov 25 17:31:47 2015 From: antlarac at gmail.com (Antonio Lara) Date: Wed, 25 Nov 2015 23:31:47 +0100 Subject: [Numpy-discussion] New functions added in pull request Message-ID: Hello, I have added three new functions to the file function_base.py in the numpy/lib folder. These are divergence, curl and laplacian (for the moment, laplacian of a scalar field, maybe in the future I will try laplacian for a vector field). The calculation method is based in the existing one for numpy.gradient, with central differences. The changes are in this pull request: https://github.com/numpy/numpy/pull/6727 Thank you, Antonio -------------- next part -------------- An HTML attachment was scrubbed... URL: From manolo at austrohungaro.com Thu Nov 26 10:18:01 2015 From: manolo at austrohungaro.com (Manolo =?iso-8859-1?Q?Mart=EDnez?=) Date: Thu, 26 Nov 2015 16:18:01 +0100 Subject: [Numpy-discussion] Recognizing a cycle in a vector Message-ID: <20151126151801.GA12553@beagle> Dear all, Suppose that I have a vector with the numerical solution of a differential equation -- more concretely, I am working with evolutionary game theory models, and the solutions are frequencies of types in a population that follows the replicator dynamics; but this is probably irrelevant. Sometimes these solutions are cyclical, yet I sample at points which do not correspond with the period of the cycle, so that np.allclose() cannot be directly applied. Is there any way to check for cycles in this situation? 
Thanks for any advice, Manolo From ben.v.root at gmail.com Thu Nov 26 10:30:57 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Thu, 26 Nov 2015 10:30:57 -0500 Subject: [Numpy-discussion] ENH: Add the function 'expand_view' In-Reply-To: References: <57E05793-75DE-47A8-896F-F864C6915577@gmail.com> Message-ID: How is this different from using np.newaxis and broadcasting? Or am I misunderstanding this? Ben Root On Tue, Nov 24, 2015 at 9:13 PM, wrote: > > > On Tue, Nov 24, 2015 at 7:13 PM, Nathaniel Smith wrote: > >> On Nov 24, 2015 11:57 AM, "John Kirkham" wrote: >> > >> > Takes an array and tacks on arbitrary dimensions on either side, which >> is returned as a view always. Here are the relevant features: >> > >> > * Creates a view of the array that has the dimensions before and after >> tacked on to it. >> > * Takes the before and after arguments independent of each other and >> the current shape. >> > * Allows for read and write access to the underlying array. >> >> Can you expand this with some discussion of why you want this function, >> and why you chose these specific features? (E.g. as mentioned in the PR >> comments already, the reason broadcast_to returns a read-only array is that >> it was decided that this was less confusing for users, not because of any >> technical issue.) >> > > Why is this a stride_trick? > > I thought this looks similar to expand_dims and could maybe be implemented > with some extra options there. > > > > Josef > > > >> -n >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.v.root at gmail.com Thu Nov 26 10:32:32 2015 From: ben.v.root at gmail.com (Benjamin Root) Date: Thu, 26 Nov 2015 10:32:32 -0500 Subject: [Numpy-discussion] New functions added in pull request In-Reply-To: References: Message-ID: Oooh, this will be nice to have. This would be one of the few times I would love to see unicode versions of these function names supplied, too. On Wed, Nov 25, 2015 at 5:31 PM, Antonio Lara wrote: > Hello, I have added three new functions to the file function_base.py in > the numpy/lib folder. These are divergence, curl and laplacian (for the > moment, laplacian of a scalar field, maybe in the future I will try > laplacian for a vector field). The calculation method is based in the > existing one for numpy.gradient, with central differences. > The changes are in this pull request: > > https://github.com/numpy/numpy/pull/6727 > > Thank you, > > Antonio > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Permafacture at gmail.com Thu Nov 26 11:32:05 2015 From: Permafacture at gmail.com (Elliot Hallmark) Date: Thu, 26 Nov 2015 10:32:05 -0600 Subject: [Numpy-discussion] Recognizing a cycle in a vector In-Reply-To: <20151126151801.GA12553@beagle> References: <20151126151801.GA12553@beagle> Message-ID: Fast fourier transform (fft)? 
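For instance, something along these lines gives a quick period estimate from the dominant frequency (just an illustrative sketch: the helper name estimate_period, the dt and min_peak_ratio parameters, and the threshold value are all made-up choices here, and it assumes a uniformly sampled 1-D signal):

```
import numpy as np

def estimate_period(x, dt=1.0, min_peak_ratio=10.0):
    # Rough cycle detection via the discrete Fourier transform.
    # Assumes x is a 1-D real-valued signal sampled at uniform steps dt.
    # min_peak_ratio (peak power over mean nonzero-frequency power) is an
    # arbitrary threshold for deciding the signal looks cyclical at all.
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # drop the constant offset
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=dt)
    k = 1 + np.argmax(power[1:])          # dominant nonzero-frequency bin
    if power[k] < min_peak_ratio * power[1:].mean():
        return None                       # no clear cycle
    return 1.0 / freqs[k]                 # period, up to bin resolution

# Quick check on a noisy cycle of period ~7.3, sampled every 0.5 time units:
t = np.arange(0.0, 400.0, 0.5)
x = np.sin(2 * np.pi * t / 7.3) + 0.05 * np.random.randn(t.size)
print(estimate_period(x, dt=0.5))         # prints something close to 7.3
```

Since the sample spacing will generally not divide the period exactly, the peak spreads over neighbouring bins (leakage), so the estimate is only good to about one bin width unless you window the data or interpolate around the peak.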
On Nov 26, 2015 9:21 AM, "Manolo Mart?nez" wrote: > Dear all, > > Suppose that I have a vector with the numerical solution of a > differential equation -- more concretely, I am working with evolutionary > game theory models, and the solutions are frequencies of types in a > population that follows the replicator dynamics; but this is probably > irrelevant. > > Sometimes these solutions are cyclical, yet I sample at points which do > not correspond with the period of the cycle, so that np.allclose() > cannot be directly applied. > > Is there any way to check for cycles in this situation? > > Thanks for any advice, > Manolo > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sank.daniel at gmail.com Thu Nov 26 11:38:40 2015 From: sank.daniel at gmail.com (Daniel Sank) Date: Thu, 26 Nov 2015 08:38:40 -0800 Subject: [Numpy-discussion] Recognizing a cycle in a vector In-Reply-To: References: <20151126151801.GA12553@beagle> Message-ID: Manolo, >> Is there any way to check for cycles in this situation? > Fast fourier transform (fft)? +1 For using a discrete Fourier transform, as implemented by numpy.fft.fft. You mentioned that you sample at points which do not correspond with the period of the signal; this introduces a slight complexity in how the Fourier transform reflects information about the original signal. I attach two documents to this email with details about those (and other) complexities. There is also much information on this topic online and in signal processing books. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: spectral_leakage.pdf Type: application/pdf Size: 185839 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dtft_aliasing.pdf Type: application/pdf Size: 268775 bytes Desc: not available URL: From sank.daniel at gmail.com Thu Nov 26 11:40:44 2015 From: sank.daniel at gmail.com (Daniel Sank) Date: Thu, 26 Nov 2015 08:40:44 -0800 Subject: [Numpy-discussion] Recognizing a cycle in a vector In-Reply-To: <20151126151801.GA12553@beagle> References: <20151126151801.GA12553@beagle> Message-ID: Oops, that leakage document is incomplete. Guess I should finish it up. On Thu, Nov 26, 2015 at 7:18 AM, Manolo Mart?nez wrote: > Dear all, > > Suppose that I have a vector with the numerical solution of a > differential equation -- more concretely, I am working with evolutionary > game theory models, and the solutions are frequencies of types in a > population that follows the replicator dynamics; but this is probably > irrelevant. > > Sometimes these solutions are cyclical, yet I sample at points which do > not correspond with the period of the cycle, so that np.allclose() > cannot be directly applied. > > Is there any way to check for cycles in this situation? > > Thanks for any advice, > Manolo > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Daniel Sank -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From manolo at austrohungaro.com Thu Nov 26 16:59:58 2015 From: manolo at austrohungaro.com (Manolo =?iso-8859-1?Q?Mart=EDnez?=) Date: Thu, 26 Nov 2015 22:59:58 +0100 Subject: [Numpy-discussion] Recognizing a cycle in a vector In-Reply-To: References: <20151126151801.GA12553@beagle> Message-ID: <20151126215958.GA28958@beagle>
> >> Is there any way to check for cycles in this situation?
> > Fast fourier transform (fft)?
> +1 For using a discrete Fourier transform, as implemented by numpy.fft.fft.
> You mentioned that you sample at points which do not correspond with the
> period of the signal; this introduces a slight complexity in how the
> Fourier transform reflects information about the original signal. I attach
> two documents to this email with details about those (and other)
> complexities. There is also much information on this topic online and in
> signal processing books.
Dear Elliot, Daniel, Thanks a lot for that. Off to read! M From antlarac at gmail.com Fri Nov 27 05:11:10 2015 From: antlarac at gmail.com (Antonio Lara) Date: Fri, 27 Nov 2015 11:11:10 +0100 Subject: [Numpy-discussion] ENH: added vector operators: divergence, curl and laplacian #6727 Message-ID: Hello, I have corrected the errors in my previous pull request that includes the new functions divergence, curl and laplacian. https://github.com/numpy/numpy/pull/6727 Thank you, Antonio -------------- next part -------------- An HTML attachment was scrubbed... URL: From Stephan.Sahm at gmx.de Fri Nov 27 05:37:17 2015 From: Stephan.Sahm at gmx.de (Stephan Sahm) Date: Fri, 27 Nov 2015 11:37:17 +0100 Subject: [Numpy-discussion] FeatureRequest: support for array construction from iterators Message-ID: [ this request/discussion refers to numpy issue #5863: https://github.com/numpy/numpy/pull/5863#issuecomment-159738368 ] Dear all, As far as I can think, the expected functionality of np.array(...) would be np.array(list(...)) or something even nicer. Therefore, I would like to request generator/iterator support for np.array(...) as far as list(...) supports it. A more detailed reasoning follows. In general it seems possible to identify iterators/generators as needed for this purpose: - someone actually implemented this feature already (see #5863) - there are ``types.GeneratorType`` and ``collections.abc.Iterator`` for ``isinstance(...)`` checks - numpy can already distinguish them from all other types, which get translated well into a numpy array Given this, I think the general argument goes roughly like the following: PROS (affects maybe 10% of numpy users or more): - more intuitive overall behaviour, array(...) = array(list(...)) roughly - python3 compatibility (see e.g. #5951) - compatibility with the analogous ``__builtin__`` functions (see e.g. #5756) - all of the above make numpy easier to use in an interactive style (e.g. ipython --pylab), where coding time matters more than computation time CONS (affects less than 0.1% of numpy users, I would guess): - might break existing code which in total, at least for me at this stage, speaks in favour of merging the already existing feature branch (see #5863) or something similar into numpy master. Discussion, please! cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.isaac at gmail.com Fri Nov 27 08:18:59 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 27 Nov 2015 08:18:59 -0500 Subject: [Numpy-discussion] FeatureRequest: support for array construction from iterators In-Reply-To: References: Message-ID: <56585843.80103@gmail.com> On 11/27/2015 5:37 AM, Stephan Sahm wrote: > I like to request a generator/iterator support for np.array(...) as far as list(...) supports it. http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromiter.html hth, Alan Isaac From eliben at gmail.com Fri Nov 27 11:13:52 2015 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 27 Nov 2015 08:13:52 -0800 Subject: [Numpy-discussion] understanding buffering done when broadcasting In-Reply-To: <1448439685.16828.14.camel@sipsolutions.net> References: <1448316593.1604.33.camel@sipsolutions.net> <1448439685.16828.14.camel@sipsolutions.net> Message-ID: On Wed, Nov 25, 2015 at 12:21 AM, Sebastian Berg wrote: > On Di, 2015-11-24 at 16:49 -0800, Eli Bendersky wrote: > > > > > > On Mon, Nov 23, 2015 at 2:09 PM, Sebastian Berg > > wrote: > > On Mo, 2015-11-23 at 13:31 -0800, Eli Bendersky wrote: > > > Hello, > > > > > > > > > I'm trying to understand the buffering done by the Numpy > > iterator > > > interface (the new post 1.6-one) when running ufuncs on > > arrays that > > > require broadcasting. Consider this simple case: > > > > > > In [35]: m = np.arange(16).reshape(4,4) > > > In [37]: n = np.arange(4) > > > In [39]: m + n > > > Out[39]: > > > array([[ 0, 2, 4, 6], > > > [ 4, 6, 8, 10], > > > [ 8, 10, 12, 14], > > > [12, 14, 16, 18]]) > > > > > > > > > > > > On first sight this seems true. However, there is one other > > point to > > consider here. The inner ufunc loop can only handle a single > > stride. The > > contiguous array `n` has to be iterated as if it had the > > strides > > `(0, 8)`, which is not the strides of the contiguous array `m` > > which can > > be "unrolled" to 1-D. Those effective strides are thus not > > contiguous > > for the inner ufunc loop and cannot be unrolled into a single > > ufunc > > call! > > > > The optimization (which might kick in a bit more broadly > > maybe), is thus > > that the number of inner loop calls is minimized, whether that > > is worth > > it, I am not sure, it may well be that there is some worthy > > optimization > > possible here. > > Note however, that this does not occur for large inner loop > > sizes > > (though I think you can find some "bad" sizes): > > > > ``` > > In [18]: n = np.arange(40000) > > > > In [19]: m = np.arange(160000).reshape(4,40000) > > > > In [20]: o = m + n > > Iterator: Checking casting for operand 0 > > op: dtype('int64'), iter: dtype('int64') > > Iterator: Checking casting for operand 1 > > op: dtype('int64'), iter: dtype('int64') > > Iterator: Checking casting for operand 2 > > op: , iter: dtype('int64') > > Iterator: Setting allocated stride 1 for iterator dimension 0 > > to 8 > > Iterator: Setting allocated stride 0 for iterator dimension 1 > > to 320000 > > Iterator: Copying inputs to buffers > > Iterator: Expanding inner loop size from 8192 to 40000 since > > buffering > > wasn't needed > > Any buffering needed: 0 > > Iterator: Finished copying inputs to buffers (buffered size is > > 40000) > > ``` > > > > > > The heuristic in the code says that if we can use a single stride and > > that's larger than the buffer size (which I assume is the default > > buffer size, and can change) then it's "is_onestride" and no buffering > > is done. 
> > > > > > So this led me to explore around this threshold (8192 items by default > > on my machine), and indeed we can notice funny behavior: > > > > In [51]: %%timeit n = arange(8192); m = > > np.arange(8192*40).reshape(40,8192) > > o = m + n > > ....: > > 1000 loops, best of 3: 274 ?s per loop > > > > In [52]: %%timeit n = arange(8292); m = > > np.arange(8292*40).reshape(40,8292) > > o = m + n > > ....: > > 1000 loops, best of 3: 229 ?s per loop > > > > > > So, given this, it's not very clear why the "optimization" kicks in. > > Buffering for small sizes seems like a mistake. > > > > I am pretty sure it is not generally a mistake. Consider the case of an > 10000x3 array (note that shrinking the buffer can have great advantage > though, I am not sure if this is done usually). > If you have (10000, 3) + (3,) arrays, then the ufunc outer loop would > have 10000x overhead. Doing the buffering (which I believe has some > extra code to be faster), will lower this to a few ufunc inner loop > calls. > I have not timed it, but I would be a bit surprised if it was not faster > in this case at least. Even calling a C function (and looping) for an > inner loop of 3 elements, should be quite a bit of overhead, and my > guess is more overhead than the buffering. > Yes, that's a good point for arrays shaped like this. I guess all this leaves us with is a realization that the heuristic *could* be tuned somewhat for arrays where the inner dimension is large - as in the case I demonstrated above it's nonsensical to have a computation be 20% faster when the array size increases over an arbitrary threshold. I'll see if I can find some time to dig more into this and figure out where the knobs to tweak the heuristic are. Thanks for the enlightening discussion, Sebastian Eli > - Sebastian > > > > > > Eli > > > > > > > > > > Anyway, feel free to have a look ;). The code is not the most > > read one > > in NumPy, and it would not surprise me a lot if you can find > > something > > to tweak. > > > > - Sebastian > > > > > > > > > > If I instrument Numpy (setting NPY_IT_DBG_TRACING and such), > > I see > > > that when the add() ufunc is called, 'n' is copied into a > > temporary > > > buffer by the iterator. The ufunc then gets the buffer as > > its data. > > > > > > > > > My question is: why is this buffering needed? It seems > > wasteful, since > > > no casting is required here, no special alignment problems > > and also > > > 'n' is contiguously laid out in memory. It seems that it > > would be more > > > efficient to just use 'n' in the ufunc instead of passing in > > the > > > buffer. What am I missing? > > > > > > > > > Thanks in advance, > > > > > > Eli > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Sun Nov 29 13:56:05 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 29 Nov 2015 19:56:05 +0100 Subject: [Numpy-discussion] Indexing NEP draft In-Reply-To: <1447236170.2487.43.camel@sipsolutions.net> References: <1447236170.2487.43.camel@sipsolutions.net> Message-ID: <1448823365.17293.6.camel@sipsolutions.net> Hey, small update on this. The NEP draft has not changed much, but you can now try the full power of the proposed new indexing types [1]: * arr.oindex[...] # orthogonal/outer indexing * arr.vindex[...] # vectorized (like fancy, but different ;)) * arr.lindex[...] # legacy/fancy indexing with my pull request at https://github.com/numpy/numpy/pull/6075 You can try it locally by cloning the numpy github repository and then running from the source dir: git fetch upstream pull/6075/head:pr-6075 && git checkout pr-6075; python runtests.py --ipython # Inside ipython: import warnings; warnings.simplefilter("always") The examples from the NEP at should all run fine, you can find the NEP draft at: https://github.com/numpy/numpy/pull/6256/files?short_path=01e4dd9#diff-01e4dd9d2ecf994b24e5883f98f789e6 I would be most happy about any comments or suggestions! - Sebastian [1] Modulo possible bugs, there is not test suit yet.... On Mi, 2015-11-11 at 11:02 +0100, Sebastian Berg wrote: > Hi all, > > at scipy discussing with Nathaniel and others, we thought that maybe we > can push for orthogonal type indexing into numpy. Now with the new > version out and some other discussions done, I thought it is time to > pick it up :). > > The basic ideas are twofold. First make indexing easier and less > confusing for starters (and advanced users also), and second improve > interoperability with projects such as xray for whom orthogonal/outer > type indexing makes more sense. > > I have started working on: > > 1. A preliminary draft of an NEP you can view at > https://github.com/numpy/numpy/pull/6256/files?short_path=01e4dd9#diff-01e4dd9d2ecf994b24e5883f98f789e6 > or at the end of this mail. > > 2. A preliminary implementation of `oindex` attribute with > orthogonal/outer style indexing in > https://github.com/numpy/numpy/pull/6075 which you can try out by > cloning numpy and then running from the source dir: > > git fetch upstream pull/6075/head:pr-6075 && git checkout pr-6075; > python runtests.py --ipython > > This will fetch my PR, switch to the branch and open an interactive > ipython shell where you will be able to do arr.oindex[]. > > > Note that I consider the NEP quite preliminary in many parts, and it may > still be very confusing unless you are well versed with current advanced > indexing. There are some longer examples comparing the different styles > and another "example" which tries to show a "use case" example going > from simpler to more complex indexing operations. > Any comments are very welcome, and if it is "I don't understand a > word" :). I know it is probably too short and, at least without > examples, not easy to understand. 
> > Best, > > Sebastian > > > ================================================================================== > The current NEP draft: > > > ========================================================== > Implementing intuitive and full featured advanced indexing > ========================================================== > > :Author: Sebastian Berg > :Date: 2015-08-27 > :Status: draft > > > Executive summary > ================= > > Advanced indexing with multiple array indices is typically confusing to > both new, and in many cases even old, users of NumPy. To avoid this > problem > and allow for more and clearer features, we propose to: > > 1. Introduce ``arr.oindex[indices]`` which allows advanced indices, but > uses outer indexing logic. > 2. Introduce ``arr.vindex[indices]`` which use the current > "vectorized"/broadcasted logic but with two differences from > fancy indexing: > > 1. Boolean indices always use the outer indexing logic. > (Multi dimensional booleans should be allowed). > 2. The integer index result dimensions are always the first axes > of the result array. No transpose is done, even for a single > integer array index. > > 3. Vanilla indexing on the array will only give warnings and eventually > errors either: > > * when there is ambiguity between legacy fancy and outer indexing > (note that ``arr[[1, 2], :, 0]`` is such a case, an integer > can be the "second" integer index array), > * when any integer index array is present (possibly additional for > more then one boolean index array). > > These constraints are sufficient for making indexing generally > consistent > with expectations and providing a less surprising learning curve with > ``oindex``. > > Note that all things mentioned here apply both for assignment as well as > subscription. > > Understanding these details is *not* easy. The `Examples` section gives > code > examples. And the hopefully easier `Motivational Example` provides some > motivational use-cases for the general ideas and is likely a good start > for > anyone not intimately familiar with advanced indexing. > > > Motivation > ========== > > Old style advanced indexing with multiple array (boolean or integer) > indices, > also called "fancy indexing", tends to be very confusing for new users. > While fancy (or legacy) indexing is useful in many cases one would > naively > assume that the result of multiple 1-d ranges is analogous to multiple > slices along each dimension (also called "outer indexing"). > > However, legacy fancy indexing with multiple arrays broadcasts these > arrays > into a single index over multiple dimensions. There are three main > points > of confusion when multiple array indices are involved: > > 1. Most new users will usually expect outer indexing (consistent with > slicing). This is also the most common way of handling this in other > packages or languages. > 2. The axes introduced by the array indices are at the front, unless > all array indices are consecutive, in which case one can deduce where > the user "expects" them to be: > > * `arr[:, [0, 1], :, [0, 1]]` will have the first dimension shaped 2. > * `arr[:, [0, 1], [0, 1]]` will have the second dimension shaped 2. > > 3. When a boolean array index is mixed with another boolean or integer > array, the result is very hard to understand (the boolean array is > converted to integer array indices and then broadcast), and hardly > useful. > There is no well defined broadcast for booleans, so that boolean > indices are logically always "``outer``" type indices. 
> > > Proposed rules > ============== > > From the three problems noted above some expectations for NumPy can > be deduced: > > 1. There should be a prominent outer/orthogonal indexing method such as > ``arr.oindex[indices]``. > 2. Considering how confusing fancy indexing can be, it should only > occur explicitly (e.g. ``arr.vindex[indices]``) > 3. A new ``arr.vindex[indices]`` method, would not be tied to the > confusing transpose rules of fancy indexing (which is for example > needed for the simple case of a single advanced index). Thus, it > no transposing should be done. The axes of the advanced indices are > always inserted at the front, even for a single index. > 4. Boolean indexing is conceptionally outer indexing. A broadcasting > together with other advanced indices in the manner of legacy > "fancy indexing" is generally not helpful or well defined. > A user who wishes the "``nonzero``" plus broadcast behaviour can thus > be expected to do this manually. > Using this rule, a single boolean index can index into multiple > dimensions at once. > 5. An ``arr.lindex`` or ``arr.findex`` should likely be implemented to > allow > legacy fancy indexing indefinetly. This also gives a simple way to > update fancy indexing code making deprecations to vanilla indexing > easier. > 6. Vanilla indexing ``arr[...]`` could return an error for ambiguous > cases. > For the beginning, this probably means cases where ``arr[ind]`` and > ``arr.oindex[ind]`` return different results gives deprecation > warnings. > However, the exact rules for this (especially the final behaviour) > are not > quite clear in cases such as ``arr[0, :, index_arr]``. > > All other rules for indexing are identical. > > > Open Questions > ============== > > 1. Especially for the new indexing attributes ``oindex`` and ``vindex``, > a case could be made to not implicitly add an ``Ellipsis`` index if > necessary. > This helps finding bugs since a too high dimensional array can be > caught. > (I am in favor for this, but doubt we should think about this for > vanilla > indexing.) > > 2. The names ``oindex`` and ``vindex`` are just suggestions at the time > of > writing this, another name NumPy has used for something like > ``oindex`` > is ``np.ix_``. See also below. > > 3. It would be possible to limit the use of boolean indices in > ``vindex``, > assuming that they are rare and to some degree special. > (This would make implementation simpler, but I do not see a big > reason.) > > 4. ``oindex`` and ``vindex`` could always return copies, even when no > array > operation occurs. One argument for using the same rules is that this > way > ``oindex`` can be used as a general index replacement. > (There is likely no big reason for this, however, there is one > reason: > ``arr.vindex[array_scalar, ...]`` can occur, where ``arr_scalar`` > should be a 0-D array. Copying always "fixes" the possible > inconsistency.) > > 5. The final state to morph indexing in is not fixed in this PEP. It is > for > example possible that `arr[index]`` will be equivalent to > ``arr.oindex`` > at some point in the future. Since such a change will take years, it > seems unnecessary to make specific decisions now. > > 6. Proposed changes to vanilla indexing could be postponed indefinetly > or > not taken in order to not break or force fixing of existing code > bases. > > 7. Possible the ``vindex`` combination with boolean indexing could be > rethought or not allowed at all for simplicity. 
> > > Necessary changes to NumPy > ========================== > > Implement ``arr.oindex`` and ``arr.vindex`` objects to allow these > indexing > operations and create warnings (and eventually deprecate) ambiguous > direct > indexing operations on arrays. > > > Alternative Names > ================= > > Possible names suggested (more suggestions will be added). > > ============== ======== ======= > **Orthogonal** oindex oix > **Vectorized** vindex fix > **Legacy** l/findex > ============== ======== ======= > > > Examples > ======== > > Since the various kinds of indexing is hard to grasp in many cases, > these > examples hopefully give some more insights. Note that they are all in > terms > of shape. All original dimensions start with 5, advanced indexing > inserts less long dimensions. (Note that ``...`` or ``Ellipsis`` mostly > inserts as many slices as needed to index the full array). These > examples > may be hard to grasp without working knowledge of advanced indexing as > of > NumPy 1.9. > > Example array:: > > >>> arr = np.ones((5, 6, 7, 8)) > > > Legacy fancy indexing > --------------------- > > Single index is transposed (this is the same for all indexing types):: > > >>> arr[[0], ...].shape > (1, 6, 7, 8) > >>> arr[:, [0], ...].shape > (5, 1, 7, 8) > > > Multiple indices are transposed *if* consecutive:: > > >>> arr[:, [0], [0], :].shape # future error > (5, 1, 7) > >>> arr[:, [0], :, [0]].shape # future error > (1, 5, 6) > > > It is important to note that a scalar *is* integer array index in this > sense > (and gets broadcasted with the other advanced index):: > > >>> arr[:, [0], 0, :].shape # future error (scalar is "fancy") > (5, 1, 7) > >>> arr[:, [0], :, 0].shape # future error (scalar is "fancy") > (1, 5, 6) > > > Single boolean index can act on multiple dimensions (especially the > whole > array). It has to match (as of 1.10. a deprecation warning) the > dimensions. 
> The boolean index is otherwise identical to (multiple consecutive) > integer > array indices:: > > >>> # Create boolean index with one True value for the last two > dimensions: > >>> bindx = np.zeros((7, 8), dtype=np.bool_) > >>> bindx[[0, 0]] = True > >>> arr[:, 0, bindx].shape > (5, 1) > >>> arr[0, :, bindx].shape > (1, 6) > > > The combination with anything that is not a scalar is confusing, e.g.:: > > >>> arr[[0], :, bindx].shape # bindx result broadcasts with [0] > (1, 6) > >>> arr[:, [0, 1], bindx] # IndexError > > > Outer indexing > -------------- > > Multiple indices are "orthogonal" and their result axes are inserted > at the same place (they are not broadcasted):: > > >>> arr.oindex[:, [0], [0, 1], :].shape > (5, 1, 2, 8) > >>> arr.oindex[:, [0], :, [0, 1]].shape > (5, 1, 7, 2) > >>> arr.oindex[:, [0], 0, :].shape > (5, 1, 8) > >>> arr.oindex[:, [0], :, 0].shape > (5, 1, 7) > > > Boolean indices results are always inserted where the index is:: > > >>> # Create boolean index with one True value for the last two > dimensions: > >>> bindx = np.zeros((7, 8), dtype=np.bool_) > >>> bindx[[0, 0]] = True > >>> arr.oindex[:, 0, bindx].shape > (5, 1) > >>> arr.oindex[0, :, bindx].shape > (6, 1) > > > Nothing changed in the presence of other advanced indices since:: > > >>> arr.oindex[[0], :, bindx].shape > (1, 6, 1) > >>> arr.oindex[:, [0, 1], bindx] > (5, 2, 1) > > > Vectorized/inner indexing > ------------------------- > > Multiple indices are broadcasted and iterated as one like fancy > indexing, > but the new axes area always inserted at the front:: > > >>> arr.vindex[:, [0], [0, 1], :].shape > (2, 5, 8) > >>> arr.vindex[:, [0], :, [0, 1]].shape > (2, 5, 7) > >>> arr.vindex[:, [0], 0, :].shape > (1, 5, 8) > >>> arr.vindex[:, [0], :, 0].shape > (1, 5, 7) > > > Boolean indices results are always inserted where the index is, exactly > as in ``oindex`` given how specific they are to the axes they operate > on:: > > >>> # Create boolean index with one True value for the last two > dimensions: > >>> bindx = np.zeros((7, 8), dtype=np.bool_) > >>> bindx[[0, 0]] = True > >>> arr.vindex[:, 0, bindx].shape > (5, 1) > >>> arr.vindex[0, :, bindx].shape > (6, 1) > > > But other advanced indices are again transposed to the front:: > > >>> arr.vindex[[0], :, bindx].shape > (1, 6, 1) > >>> arr.vindex[:, [0, 1], bindx] > (2, 5, 1) > > > Related Questions > ================= > > There exist a further indexing or indexing like method. That is the > inverse of a command such as ``np.argmin(arr, axis=axis)``, to pick > the specific elements *along* an axis given an array of (at least > typically) the same size. > > Doing such a thing with the indexing notation is not quite straight > forward > since the axis on which to pick elements has to be supplied. One > plausible > solution would be to create a function (calling it pick here for > simplicity):: > > np.pick(arr, index_arr, axis=axis) > > where ``index_arr`` has to be the same shape as ``arr`` except along > ``axis``. > One could imagine that this can be useful together with other indexing > types, > but such a function may be sufficient and extra information needed seems > easier > to pass using a function convention. Another option would be to allow an > argument > such as ``compress_axes=None`` (just to have some name) which maps the > axes from > the index array to the new array with ``None`` signaling a new axis. > Also keepdims could be added as a simple default. 
(Note that the use of > axis is not > compatible to ``np.take`` for an ``index_arr`` which is not zero or one > dimensional.) > > Another solution is to provide functions or features to the > ``arg*``functions > to map this to the equivalent ``vindex`` indexing operation. > > > Motivational Example > ==================== > > Imagine having a data acquisition software storing ``D`` channels and > ``N`` datapoints along the time. She stores this into an ``(N, D)`` > shaped > array. During data analysis, we needs to fetch a pool of channels, for > example > to calculate a mean over them. > > This data can be faked using:: > > >>> arr = np.random.random((100, 10)) > > Now one may remember indexing with an integer array and find the correct > code:: > > >>> group = arr[:, [2, 5]] > >>> mean_value = arr.mean() > > However, assume that there were some specific time points (first > dimension > of the data) that need to be specially considered. These time points are > already known and given by:: > > >>> interesting_times = np.array([1, 5, 8, 10], dtype=np.intp) > > Now to fetch them, we may try to modify the previous code:: > > >>> group_at_it = arr[interesting_times, [2, 5]] > IndexError: Ambiguous index, use `.oindex` or `.vindex` > > An error such as this will point to read up the indexing documentation. > This should make it clear, that ``oindex`` behaves more like slicing. > So, out of the different methods it is the obvious choice > (for now, this is a shape mismatch, but that could possibly also mention > ``oindex``):: > > >>> group_at_it = arr.oindex[interesting_times, [2, 5]] > > Now of course one could also have used ``vindex``, but it is much less > obvious how to achieve the right thing!:: > > >>> reshaped_times = interesting_times[:, np.newaxis] > >>> group_at_it = arr.vindex[reshaped_times, [2, 5]] > > > One may find, that for example our data is corrupt in some places. > So, we need to replace these values by zero (or anything else) for these > times. The first column may for example give the necessary information, > so that changing the values becomes easy remembering boolean indexing:: > > >>> bad_data = arr[0] > 0.5 > >>> arr[bad_data, :] = 0 > > Again, however, the columns may need to be handled more individually > (but in > groups), and the ``oindex`` attribute works well:: > > >>> arr.oindex[bad_data, [2, 5]] = 0 > > Note that it would be very hard to do this using legacy fancy indexing. > The only way would be to create an integer array first:: > > >>> bad_data_indx = np.nonzero(bad_data)[0] > >>> bad_data_indx_reshaped = bad_data_indx[:, np.newaxis] > >>> arr[bad_data_indx_reshaped, [2, 5]] > > In any case we can use only ``oindex`` to do all of this without getting > into any trouble or confused by the whole complexity of advanced > indexing. > > But, some new features are added to the data acquisition. Different > sensors > have to be used depending on the times. Let us assume we already have > created an array of indices:: > > >>> correct_sensors = np.random.randint(10, size=(100, 2)) > > Which lists for each time the two correct sensors in an ``(N, 2)`` > array. > > A first try to achieve this may be ``arr[:, correct_sensors]`` and this > does > not work. It should be clear quickly that slicing cannot achieve the > desired > thing. But hopefully users will remember that there is ``vindex`` as a > more > powerful and flexible approach to advanced indexing. 
> One may, if trying ``vindex`` randomly, be confused about:: > > >>> new_arr = arr.vindex[:, correct_sensors] > > which is neither the same, nor the correct result (see transposing > rules)! > This is because slicing works still the same in ``vindex``. However, > reading > the documentation and examples, one can hopefully quickly find the > desired > solution:: > > >>> rows = np.arange(len(arr)) > >>> rows = rows[:, np.newaxis] # make shape fit with > correct_sensors > >>> new_arr = arr.vindex[rows, correct_sensors] > > At this point we have left the straight forward world of ``oindex`` but > can > do random picking of any element from the array. Note that in the last > example > a method such as mentioned in the ``Related Questions`` section could be > more > straight forward. But this approach is even more flexible, since > ``rows`` > does not have to be a simple ``arange``, but could be > ``intersting_times``:: > > >>> correct_sensors_at_it = correct_sensors[interesting_times, :] > >>> interesting_times_reshaped = interesting_times[:, np.newaxis] > >>> new_arr_it = arr[interesting_times_reshaped, > correct_sensors_at_it] > > Truly complex situation would arise now if you would for example pool > ``L`` > experiments into an array shaped ``(L, N, D)``. But for ``oindex`` this > should > not result into surprises. ``vindex``, being more powerful, will quite > certainly create some confusion in this case but also cover pretty much > all > eventualities. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Sun Nov 29 15:28:47 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 29 Nov 2015 13:28:47 -0700 Subject: [Numpy-discussion] Python development on fedora 23. Message-ID: Hi Fedora users, Python distutils on fedora 23 is configured for hardening, hence `redhat-rpm-config` is a dependency if you want to build numpy, scipy, etc. The symptom is a "broken toolchain" error. A bug is open for this and it might get fixed on the Python end, but I don't expect anything soon. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From vilanova at ac.upc.edu Mon Nov 30 12:42:06 2015 From: vilanova at ac.upc.edu (=?utf-8?Q?Llu=C3=ADs_Vilanova?=) Date: Mon, 30 Nov 2015 18:42:06 +0100 Subject: [Numpy-discussion] Inconsistent/unexpected indexing semantics Message-ID: <87zixvqt8x.fsf@fimbulvetr.bsc.es> Hi, TL;DR: There's a pending pull request deprecating some behaviour I find unexpected. Does anyone object? Some time ago I noticed that numpy yields unexpected results in some very specific cases. 
An array can be used to index multiple elements of a single dimension: >>> a = np.arange(8).reshape((2,2,2)) >>> a[ np.array([[0], [0]]) ] array([[[[0, 1], [2, 3]]], [[[0, 1], [2, 3]]]]) Nonetheless, if a list is used instead, it is (unexpectedly) transformed into a tuple, resulting in indexing across multiple dimensions: >>> a[ [[0], [0]] ] array([[0, 1]]) I.e., it is interpeted as: >>> a[ [0], [0] ] array([[0, 1]]) Or what is the same: >>> a[( [0], [0] )] array([[0, 1]]) I've been informed that there's a pending pull request that deprecates this behaviour [1], which could in the future be reverted to what is expected (at least what I expect) from the documents (except for an obscure note in [2]). The discussion leading to this mail can be found here [3]. [1] https://github.com/numpy/numpy/pull/4434 [2] http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing [3] https://github.com/numpy/numpy/issues/6564 Thanks, Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From sebastian at sipsolutions.net Mon Nov 30 15:19:45 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 30 Nov 2015 21:19:45 +0100 Subject: [Numpy-discussion] Inconsistent/unexpected indexing semantics In-Reply-To: <87zixvqt8x.fsf@fimbulvetr.bsc.es> References: <87zixvqt8x.fsf@fimbulvetr.bsc.es> Message-ID: <1448914785.7789.11.camel@sipsolutions.net> On Mo, 2015-11-30 at 18:42 +0100, Llu?s Vilanova wrote: > Hi, > > TL;DR: There's a pending pull request deprecating some behaviour I find > unexpected. Does anyone object? > > Some time ago I noticed that numpy yields unexpected results in some very > specific cases. An array can be used to index multiple elements of a single > dimension: > > >>> a = np.arange(8).reshape((2,2,2)) > >>> a[ np.array([[0], [0]]) ] > array([[[[0, 1], > [2, 3]]], > [[[0, 1], > [2, 3]]]]) > > Nonetheless, if a list is used instead, it is (unexpectedly) transformed into a > tuple, resulting in indexing across multiple dimensions: > > >>> a[ [[0], [0]] ] > array([[0, 1]]) > > I.e., it is interpeted as: > > >>> a[ [0], [0] ] > array([[0, 1]]) > > Or what is the same: > > >>> a[( [0], [0] )] > array([[0, 1]]) > > > I've been informed that there's a pending pull request that deprecates this > behaviour [1], which could in the future be reverted to what is expected (at > least what I expect) from the documents (except for an obscure note in [2]). > Obviously, I am not against this ;). I have to admit it worries me a bit, because there is quite a bit of code doing things like: >>> slice_object = [slice(None)] * 5 >>> slice_object[2] = 3 >>> arr[slice_object] and all of this code (numpy also has a lot of it), will probably have to change the last line to be: >>> arr[tuple(slice_object)] So the implication of this might actually be more farther reaching then one might think at first; or at least require quite a lot of code to be touched (inside numpy that is no problem, but outside). - Sebastian > The discussion leading to this mail can be found here [3]. > > [1] https://github.com/numpy/numpy/pull/4434 > [2] http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing > [3] https://github.com/numpy/numpy/issues/6564 > > > Thanks, > Lluis > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Mon Nov 30 20:10:34 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 30 Nov 2015 17:10:34 -0800 Subject: [Numpy-discussion] Inconsistent/unexpected indexing semantics In-Reply-To: <1448914785.7789.11.camel@sipsolutions.net> References: <87zixvqt8x.fsf@fimbulvetr.bsc.es> <1448914785.7789.11.camel@sipsolutions.net> Message-ID: On Nov 30, 2015 12:19 PM, "Sebastian Berg" wrote: > > On Mo, 2015-11-30 at 18:42 +0100, Llu?s Vilanova wrote: > > Hi, > > > > TL;DR: There's a pending pull request deprecating some behaviour I find > > unexpected. Does anyone object? > > > > Some time ago I noticed that numpy yields unexpected results in some very > > specific cases. An array can be used to index multiple elements of a single > > dimension: > > > > >>> a = np.arange(8).reshape((2,2,2)) > > >>> a[ np.array([[0], [0]]) ] > > array([[[[0, 1], > > [2, 3]]], > > [[[0, 1], > > [2, 3]]]]) > > > > Nonetheless, if a list is used instead, it is (unexpectedly) transformed into a > > tuple, resulting in indexing across multiple dimensions: > > > > >>> a[ [[0], [0]] ] > > array([[0, 1]]) > > > > I.e., it is interpeted as: > > > > >>> a[ [0], [0] ] > > array([[0, 1]]) > > > > Or what is the same: > > > > >>> a[( [0], [0] )] > > array([[0, 1]]) > > > > > > I've been informed that there's a pending pull request that deprecates this > > behaviour [1], which could in the future be reverted to what is expected (at > > least what I expect) from the documents (except for an obscure note in [2]). > > > > > Obviously, I am not against this ;). I have to admit it worries me a > bit, because there is quite a bit of code doing things like: > > >>> slice_object = [slice(None)] * 5 > >>> slice_object[2] = 3 > >>> arr[slice_object] > > and all of this code (numpy also has a lot of it), will probably have to > change the last line to be: > > >>> arr[tuple(slice_object)] This seems like an improvement to me, so I'm +1 on deprecating. I agree that it might be a very long time before we can actually change the behavior though. I think it would make sense to split it into two separate, parallel deprecations: one for lists like in Llu?s's example that are coerceable to an integer array, and one for lists like in your example that contain slices and stuff. The first case needs a FutureWarning, is incredibly confusing, and is unlikely to be used on purpose; the second only needs a DeprecationWarning, is less confusing, and is probably in broader use, so might want a longer deprecation period. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: