From othalan at othalan.net Mon May 2 05:51:37 2016
From: othalan at othalan.net (David Morris)
Date: Mon, 2 May 2016 16:51:37 +0700
Subject: [Numpy-discussion] Unable to pickle numpy array on iOS
Message-ID: 

I have an application running on iOS where I pickle a numpy array in
order to save it for later use.  However, I receive the following error:

pickle.dumps(arr)
...
_pickle.PicklingError: Can't pickle : import of module 'multiarray' failed

On a desktop system (OSX), there is no problem dumping the array.  I am
using NumPy v1.9.3.

Any ideas on why this might be happening?

Thank you,

David

From honi at brandeis.edu Tue May 3 11:43:18 2016
From: honi at brandeis.edu (Honi Sanders)
Date: Tue, 3 May 2016 11:43:18 -0400
Subject: [Numpy-discussion] Cross-correlation PR stuck in limbo
Message-ID: <8D8A0029-A28D-4E4E-A0D5-437FDC85AA73@brandeis.edu>

Hello all,

I have completed a pull request to add a "maxlag" functionality to
numpy.correlate.  See here: https://github.com/numpy/numpy/pull/5978 .
This pull request has passed all tests and has been ready to be merged
for around six months.  Several people have commented on Stack Overflow,
this list, and GitHub requesting that it be included.  Can someone
please let me know what needs to be done, or can it be merged?

Here is some background.  What was troubling me is that numpy.correlate
does not have a maxlag feature.  This means that even if I only want to
see correlations between two time series with lags between -100 and +100
ms, for example, it will still calculate the correlation for every lag
between -20000 and +20000 ms (which is the length of the time series).
This (theoretically) gives a 200x performance hit!  I have raised this
question as a numpy issue, a scipy issue, and on the scipy-dev list.  It
seems the best place to start is with numpy.correlate, so that is what I
am requesting.  Previous discussion of this functionality can be found
in another discussion on numpy correlate (and convolution).  Other
issues related to the correlate functions include "ENH: Fold fftconvolve
into convolve/correlate functions as a parameter" #2651, "Use FFT in
np.correlate/convolve? (Trac #1260)" #1858, and "normalized
cross-correlation (Trac #1714)" #2310.

The new implementation allows new types for the "mode" argument: an int
value, which defines the maximum lag for which the cross-correlation
should be calculated, or a tuple, which defines the minlag, maxlag, and
lagstep to be used, in the same format as the arguments to numpy.arange.

Please let me know what should be done to move this pull request forward.

Honi

From pierre.haessig at crans.org Wed May 4 08:07:43 2016
From: pierre.haessig at crans.org (Pierre Haessig)
Date: Wed, 4 May 2016 14:07:43 +0200
Subject: [Numpy-discussion] Cross-correlation PR stuck in limbo
In-Reply-To: <8D8A0029-A28D-4E4E-A0D5-437FDC85AA73@brandeis.edu>
References: <8D8A0029-A28D-4E4E-A0D5-437FDC85AA73@brandeis.edu>
Message-ID: <5729E60F.1040102@crans.org>

Hi,

I don't know how to push the PR forward, but all I can say is that this
maxlag feature would be a major improvement for using Numpy in time
series analysis!  Immediate benefits downstream for Matplotlib and
statsmodels.  Thanks Honi for having taken the time to implement this!

best,
Pierre
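For anyone who needs lag-limited output before the PR lands, a minimal
workaround sketch with the current np.correlate API follows: it trims
the full result to +/-maxlag, so it keeps the convenient signature but
gives none of the PR's performance benefit, since every lag is still
computed.  The function name correlate_maxlag is illustrative only, not
an existing numpy function.

    import numpy as np

    def correlate_maxlag(x, y, maxlag):
        # Full cross-correlation of two equal-length 1-d sequences,
        # trimmed to the lags in [-maxlag, +maxlag].  Every lag is
        # still computed, so this is only a convenience wrapper, not
        # the speedup the PR implements.
        full = np.correlate(x, y, mode='full')
        mid = len(full) // 2          # index of zero lag
        return full[mid - maxlag : mid + maxlag + 1]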
From jeffreback at gmail.com Wed May 4 20:30:00 2016
From: jeffreback at gmail.com (Jeff Reback)
Date: Wed, 4 May 2016 20:30:00 -0400
Subject: [Numpy-discussion] ANN: v0.18.1 pandas Released
Message-ID: 

This is a minor bug-fix release from 0.18.0 and includes a large number
of bug fixes along with several new features, enhancements, and
performance improvements.  We recommend that all users upgrade to this
version.  This release covers 6 weeks of development, with 210 commits
by 60 authors encompassing 142 issues and 164 pull requests.

*What is it:*

*pandas* is a Python package providing fast, flexible, and expressive
data structures designed to make working with "relational" or "labeled"
data both easy and intuitive.  It aims to be the fundamental high-level
building block for doing practical, real-world data analysis in Python.
Additionally, it has the broader goal of becoming the most powerful and
flexible open source data analysis / manipulation tool available in any
language.

*Highlights*:

- .groupby(...) has been enhanced to provide convenient syntax when
  working with .rolling(..), .expanding(..) and .resample(..) per
  group, see here
- pd.to_datetime() has gained the ability to assemble dates from a
  DataFrame, see here
- Method chaining improvements, see here
- Custom business hour offset, see here
- Many bug fixes in the handling of sparse, see here
- Expanded the Tutorials section with a feature on modern pandas,
  courtesy of @TomAugspurger .

See the Whatsnew for much more information, and the full Documentation
link.

*How to get it:*

Source tarballs, windows wheels, and macosx wheels are available on
PyPI.  Windows wheels are courtesy of Christoph Gohlke, and are built
on Numpy 1.10.  Macosx wheels are courtesy of Matthew Brett.

Installation via conda is:

  conda install pandas

Currently it is available via the conda-forge channel:

  conda install pandas -c conda-forge

It will be available on the main channel shortly.

Please report any issues on our issue tracker:

Jeff Reback

*Thanks to all of the contributors*

- Andrew Fiore-Gartland
- Bastiaan
- Benoît Vinot
- Brandon Rhodes
- DaCoEx
- Drew Fustin
- Ernesto Freitas
- Filip Ter
- Gregory Livschitz
- Gábor Lipták
- Hassan Kibirige
- Iblis Lin
- Israel Saeta Pérez
- Jason Wolosonovich
- Jeff Reback
- Joe Jevnik
- Joris Van den Bossche
- Joshua Storck
- Ka Wo Chen
- Kerby Shedden
- Kieran O'Mahony
- Leif Walsh
- Mahmoud Lababidi
- Maoyuan Liu
- Mark Roth
- Matt Wittmann
- MaxU
- Maximilian Roos
- Michael Droettboom
- Nick Eubank
- Nicolas Bonnotte
- OXPHOS
- Pauli Virtanen
- Peter Waller
- Pietro Battiston
- Prabhjot Singh
- Robin Wilson
- Roger Thomas
- Sebastian Bank
- Stephen Hoover
- Tim Hopper
- Tom Augspurger
- WANG Aiyong
- Wes Turner
- Winand
- Xbar
- Yan Facai
- adneu
- ajenkins-cargometrics
- behzad nouri
- chinskiy
- gfyoung
- jeps-journal
- jonaslb
- kotrfa
- nileracecrew
- onesandzeroes
- rs2
- sinhrks
- tsdlovell

From oysteijo at gmail.com Thu May 5 05:38:51 2016
From: oysteijo at gmail.com (Øystein Schønning-Johansen)
Date: Thu, 5 May 2016 11:38:51 +0200
Subject: [Numpy-discussion] Calling C code that assumes SIMD aligned data.
Message-ID: 

Hi!
I've written a little bit of numpy code that does a neural network
feedforward calculation:

def feedforward(self, x):
    for activation, w, b in zip(self.activations, self.weights,
                                self.biases):
        x = activation(np.dot(w, x) + b)

This works fine when my activation functions are in Python; however,
I've wrapped the activation functions from a C implementation that
requires the array to be memory aligned (due to SIMD instructions in
the C implementation).  So I need the operation np.dot(w, x) + b to
return an ndarray where the data pointer is aligned.  How can I do
that?  Is it possible at all?

(BTW: the function works correctly about 20% of the time I run it, and
otherwise it segfaults on the SIMD instruction in the C function)

Thanks,
-Øystein

From faltet at gmail.com Thu May 5 07:55:36 2016
From: faltet at gmail.com (Francesc Alted)
Date: Thu, 5 May 2016 13:55:36 +0200
Subject: [Numpy-discussion] Calling C code that assumes SIMD aligned data.
In-Reply-To: 
References: 
Message-ID: 

2016-05-05 11:38 GMT+02:00 Øystein Schønning-Johansen :

> [...]  So I need the operation np.dot(w, x) + b to return an ndarray
> where the data pointer is aligned.  How can I do that?  Is it possible
> at all?

Yes.  np.dot() does accept an `out` parameter where you can pass your
aligned array.  The way to test whether numpy is returning you an
aligned array is easy:

In [15]: x = np.arange(6).reshape(2,3)

In [16]: x.ctypes.data % 16
Out[16]: 0

but:

In [17]: x.ctypes.data % 32
Out[17]: 16

so, in this case NumPy returned a 16-byte aligned array, which should
be enough for 128-bit SIMD (the SSE family).  This kind of alignment is
pretty common in modern computers.  If you need 256-bit (32-byte)
alignment then you will need to build your container manually.  See
here for an example:
http://stackoverflow.com/questions/9895787/memory-alignment-for-fast-fft-in-python-using-shared-arrrays

Francesc

From oysteijo at gmail.com Thu May 5 16:10:32 2016
From: oysteijo at gmail.com (Øystein Schønning-Johansen)
Date: Thu, 5 May 2016 22:10:32 +0200
Subject: [Numpy-discussion] Calling C code that assumes SIMD aligned data.
In-Reply-To: 
References: 
Message-ID: 

Thanks for your answer, Francesc.  Knowing that there is no numpy
solution saves the work of searching for this.  I've not tried the
solution described at SO, but it looks like a real performance killer.
I'd rather try to override malloc with glibc's malloc_hooks or
LD_PRELOAD tricks.  Do you think that will do it?  I'll try it and
report back.
Thanks,
-Øystein

On Thu, May 5, 2016 at 1:55 PM, Francesc Alted wrote:

> Yes.  np.dot() does accept an `out` parameter where you can pass your
> aligned array.  [...]  If you need 256-bit (32-byte) alignment then
> you will need to build your container manually.

From charlesr.harris at gmail.com Thu May 5 16:32:46 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 5 May 2016 14:32:46 -0600
Subject: [Numpy-discussion] Calling C code that assumes SIMD aligned data.
In-Reply-To: 
References: 
Message-ID: 

On Thu, May 5, 2016 at 2:10 PM, Øystein Schønning-Johansen <
oysteijo at gmail.com> wrote:

> Thanks for your answer, Francesc.  Knowing that there is no numpy
> solution saves the work of searching for this.  I've not tried the
> solution described at SO, but it looks like a real performance killer.
> I'd rather try to override malloc with glibc's malloc_hooks or
> LD_PRELOAD tricks.  Do you think that will do it?  I'll try it and
> report back.

Might take a look at how numpy handles this in
`numpy/core/src/umath/simd.inc.src`.

Chuck
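As a concrete illustration of Francesc's `out=` suggestion, the sketch
below preallocates a result buffer once and lets np.dot write into it
on every call.  The helper name and the alignment check are
illustrative assumptions; obtaining the aligned buffer itself is left
to the techniques discussed in the rest of the thread.

    import numpy as np

    def dot_plus_bias_into(w, x, b, out, alignment=16):
        # `out` is a preallocated, suitably aligned buffer; np.dot
        # writes into it instead of allocating a fresh (and possibly
        # unaligned) result array on every call.
        assert out.ctypes.data % alignment == 0, "buffer not aligned"
        np.dot(w, x, out=out)
        out += b
        return out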
From faltet at gmail.com Fri May 6 09:01:32 2016
From: faltet at gmail.com (Francesc Alted)
Date: Fri, 6 May 2016 15:01:32 +0200
Subject: [Numpy-discussion] Calling C code that assumes SIMD aligned data.
In-Reply-To: 
References: 
Message-ID: 

2016-05-05 22:10 GMT+02:00 Øystein Schønning-Johansen :

> Thanks for your answer, Francesc.  Knowing that there is no numpy
> solution saves the work of searching for this.  I've not tried the
> solution described at SO, but it looks like a real performance killer.
> I'd rather try to override malloc with glibc's malloc_hooks or
> LD_PRELOAD tricks.  Do you think that will do it?  I'll try it and
> report back.

I don't think you need that much weaponry.  Just create an array with
some spare space for alignment.  Say that you want a 64-byte aligned
double precision array.  Create your desired array + 64 additional
bytes (8 doubles):

In [92]: a = np.zeros(int(1e6) + 8)

In [93]: a.ctypes.data % 64
Out[93]: 16

and compute the number of elements to shift by:

In [94]: shift = (64 / a.itemsize) - (a.ctypes.data % 64) / a.itemsize

In [95]: shift
Out[95]: 6

now, create a view that is shorter by that many elements:

In [98]: b = a[shift:-((64 / a.itemsize)-shift)]

In [99]: len(b)
Out[99]: 1000000

In [100]: b.ctypes.data % 64
Out[100]: 0

and voila, b is now aligned to 64 bytes.  As the view is a copy-free
operation, this is fast, and you only wasted 64 bytes.  Pretty cheap
indeed.

Francesc
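Francesc's recipe, wrapped up as a reusable helper for arbitrary
alignments; a sketch only (the name aligned_zeros is not numpy API),
and it assumes, as in his example, that the base allocation is itself
aligned to a multiple of the itemsize.

    import numpy as np

    def aligned_zeros(n, alignment=64, dtype=np.float64):
        # Over-allocate by one alignment's worth of elements, then
        # return a zero-copy view whose data pointer sits on the
        # requested boundary.
        itemsize = np.dtype(dtype).itemsize
        buf = np.zeros(n + alignment // itemsize, dtype=dtype)
        shift = ((alignment - buf.ctypes.data % alignment)
                 % alignment) // itemsize
        out = buf[shift:shift + n]
        assert out.ctypes.data % alignment == 0
        return out   # the view keeps the base allocation alive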
From jtaylor.debian at googlemail.com Fri May 6 16:22:38 2016
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Fri, 6 May 2016 22:22:38 +0200
Subject: [Numpy-discussion] Calling C code that assumes SIMD aligned data.
In-Reply-To: 
References: 
Message-ID: <572CFD0E.1080301@googlemail.com>

Note that anything larger than 16-byte alignment is unnecessary for
SIMD purposes on current hardware (>= Haswell).  16 bytes is the
default malloc alignment on amd64, and even on older CPUs (Sandy
Bridge) the penalty is pretty minor.

On 05.05.2016 22:32, Charles R Harris wrote:
> On Thu, May 5, 2016 at 2:10 PM, Øystein Schønning-Johansen
> <oysteijo at gmail.com> wrote:
>
>     Thanks for your answer, Francesc.  [...]  I'll try it and report
>     back.
>
> Might take a look at how numpy handles this in
> `numpy/core/src/umath/simd.inc.src`.
>
> Chuck

From solipsis at pitrou.net Sat May 7 07:02:14 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 7 May 2016 13:02:14 +0200
Subject: [Numpy-discussion] Calling C code that assumes SIMD aligned data.
References: 
Message-ID: <20160507130214.3ccf5af8@fsol>

Here's an obligatory plug for the two following PRs:

https://github.com/numpy/numpy/pull/5457
https://github.com/numpy/numpy/pull/5470

Regards

Antoine.

On Fri, 6 May 2016 15:01:32 +0200
Francesc Alted wrote:

> I don't think you need that much weaponry.  Just create an array with
> some spare space for alignment.  [...]  and voila, b is now aligned
> to 64 bytes.  As the view is a copy-free operation, this is fast, and
> you only wasted 64 bytes.  Pretty cheap indeed.
From zbyszek at in.waw.pl Tue May 10 13:29:35 2016
From: zbyszek at in.waw.pl (Zbigniew Jędrzejewski-Szmek)
Date: Tue, 10 May 2016 17:29:35 +0000
Subject: [Numpy-discussion] [ANN] Reminder: Summer School "Advanced
 Scientific Programming in Python" in Reading, UK, September 5–11, 2016
Message-ID: <20160510172935.GA3290@in.waw.pl>

Reminder: Deadline for application is 23:59 UTC, May 15, 2016.

Advanced Scientific Programming in Python
=========================================
a Summer School by the G-Node, and the Centre for Integrative
Neuroscience and Neurodynamics, School of Psychology and Clinical
Language Sciences, University of Reading, UK

Scientists spend more and more time writing, maintaining, and debugging
software.  While techniques for doing this efficiently have evolved,
only a few scientists have been trained to use them.  As a result,
instead of doing their research, they spend far too much time writing
deficient code and reinventing the wheel.
In this course we will present a selection of advanced programming
techniques and best practices which are standard in the industry, but
especially tailored to the needs of a programming scientist.  Lectures
are devised to be interactive and to give the students enough time to
acquire direct hands-on experience with the materials.  Students will
work in pairs throughout the school and will team up to practice the
newly learned skills in a real programming project – an entertaining
computer game.

We use the Python programming language for the entire course.  Python
works as a simple programming language for beginners, but more
importantly, it also works great in scientific simulations and data
analysis.  We show how clean language design, ease of extensibility,
and the great wealth of open source libraries for scientific computing
and data visualization are driving Python to become a standard tool for
the programming scientist.

This school is targeted at Master or PhD students and Post-docs from
all areas of science.  Competence in Python or in another language such
as Java, C/C++, MATLAB, or Mathematica is absolutely required.  Basic
knowledge of Python and of a version control system such as git,
subversion, mercurial, or bazaar is assumed.  Participants without any
prior experience with Python and/or git should work through the
proposed introductory material before the course.

We are striving hard to get a pool of students which is international
and gender-balanced.

You can apply online: https://python.g-node.org

Application deadline: 23:59 UTC, May 15, 2016.
Be sure to read the FAQ before applying.

Participation is for free, i.e. no fee is charged!  Participants
however should take care of travel, living, and accommodation expenses
by themselves.  Travel grants may be available.

Date & Location
===============
September 5–11, 2016.  Reading, UK

Program
=======
- Best Programming Practices
  • Best practices for scientific programming
  • Version control with git and how to contribute to open source
    projects with GitHub
  • Best practices in data visualization
- Software Carpentry
  • Test-driven development
  • Debugging with a debugger
  • Profiling code
- Scientific Tools for Python
  • Advanced NumPy
- Advanced Python
  • Decorators
  • Context managers
  • Generators
- The Quest for Speed
  • Writing parallel applications
  • Interfacing to C with Cython
  • Memory-bound problems and memory profiling
  • Data containers: storage and fast access to large data
- Practical Software Development
  • Group project

Faculty
=======
• Francesc Alted, freelance consultant, author of Blosc, Spain
• Pietro Berkes, Enthought Inc., Cambridge, UK
• Zbigniew Jędrzejewski-Szmek, Krasnow Institute, George Mason
  University, Fairfax, VA, USA
• Eilif Muller, Blue Brain Project, École Polytechnique Fédérale de
  Lausanne, Switzerland
• Rike-Benjamin Schuppner, Institute for Theoretical Biology,
  Humboldt-Universität zu Berlin, Germany
• Bartosz Teleńczuk, European Institute for Theoretical Neuroscience,
  CNRS, Paris, France
• Stéfan van der Walt, Berkeley Institute for Data Science, UC
  Berkeley, CA, USA
• Nelle Varoquaux, Centre for Computational Biology Mines ParisTech,
  Institut Curie, U900 INSERM, Paris, France
• Tiziano Zito, freelance consultant, Germany

Organizers
==========
For the German Neuroinformatics Node of the INCF (G-Node) Germany:
• Tiziano Zito, freelance consultant, Germany
• Zbigniew Jędrzejewski-Szmek, Krasnow Institute, George Mason
  University, Fairfax, USA
• Jakob Jordan, Institute of Neuroscience and Medicine (INM-6),
  Forschungszentrum Jülich GmbH, Germany

For the Centre for Integrative Neuroscience and Neurodynamics, School
of Psychology and Clinical Language Sciences, University of Reading UK:
• Etienne Roesch, Centre for Integrative Neuroscience and
  Neurodynamics, University of Reading, UK

Website: https://python.g-node.org
Contact: python-info at g-node.org

From sturla.molden at gmail.com Wed May 11 04:29:02 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 11 May 2016 08:29:02 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
References: 
Message-ID: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>

I did some work on this some years ago.  I have more or less concluded
that it was a waste of effort.  But first let me explain why the
suggested approach does not work.  As it uses memory mapping to create
shared memory (i.e. shared segments are not named), the segments must
be created ahead of spawning processes.  But if you really want this to
work smoothly, you want named shared memory (Sys V IPC or posix
shm_open), so that shared arrays can be created in the spawned
processes and passed back.

Now for the reason I don't care about shared memory arrays anymore, and
what I am currently working on instead:

1. I have come across very few cases where threaded code cannot be used
in numerical computing.  In fact, multithreading nearly always happens
in the code where I write pure C or Fortran anyway.  Most often it
happens in library code that is already multithreaded (Intel MKL, Apple
Accelerate Framework, OpenBLAS, etc.), which means using it requires no
extra effort from my side.  A multithreaded LAPACK library is not less
multithreaded if I call it from Python.

2. Getting shared memory right can be difficult because of hierarchical
memory and false sharing.  You might not see it if you only have a
multicore CPU with a shared cache.  But your code might not scale up on
computers with more than one physical processor.  False sharing acts
like the GIL, except it happens in hardware and affects your C code
invisibly without any explicit locking you can pinpoint.  This is also
why MPI code tends to scale much better than OpenMP code.  If nothing
is shared there will be no false sharing.

3. Raw C level IPC is cheap – very, very cheap.  Even if you use pipes
or sockets instead of shared memory it is cheap.  There are very few
cases where the IPC tends to be a bottleneck.

4. The reason IPC appears expensive with NumPy is because
multiprocessing pickles the arrays.  It is pickle that is slow, not the
IPC.  Some would say that the pickle overhead is an integral part of
the IPC overhead, but I will argue that it is not.  The slowness of
pickle is a separate problem altogether.

5. Shared memory does not improve on the pickle overhead because NumPy
arrays with shared memory must also be pickled.  Multiprocessing can
bypass pickling the RawArray object, but the rest of the NumPy array is
pickled.  Using shared memory arrays has no speed advantage over normal
NumPy arrays when we use multiprocessing.

6. It is much easier to write concurrent code that uses queues for
message passing than anything else.  That is why using a Queue object
has been the popular Pythonic approach to both multithreading and
multiprocessing.  I would like this to continue.

I am therefore focusing my effort on the multiprocessing.Queue object.
If you understand the six points I listed you will see where this is
going: What we really need is a specialized queue that has knowledge
about NumPy arrays and can bypass pickle.  I am therefore focusing my
efforts on creating a NumPy-aware queue object.

We are not doing the users a favor by encouraging the use of shared
memory arrays.  They help with nothing.


Sturla Molden


Matěj Týč wrote:

> Dear Numpy developers,
> I propose a pull request https://github.com/numpy/numpy/pull/7533 that
> features numpy arrays that can be shared among processes (with some
> effort).
>
> Why:
> In CPython, multiprocessing is the only way of how to exploit
> multi-core CPUs if your parallel code can't avoid creating Python
> objects.  In that case, CPython's GIL makes threads unusable.
> However, unlike with threading, sharing data among processes is
> something that is non-trivial and platform-dependent.
>
> Although numpy (and certainly some other packages) implement some
> operations in a way that GIL is not a concern, consider another case:
> You have a large amount of data in the form of a numpy array and you
> want to pass it to a function of an arbitrary Python module that also
> expects a numpy array (e.g. a list of vertex coordinates as an input
> and an array of the corresponding polygons as an output).  Here, it
> is clear that the GIL is an issue, and since you want a numpy array
> on both ends, you would have to copy your numpy array to a
> multiprocessing.Array (to pass the data) and then convert it back to
> an ndarray in the worker process.
> This contribution would streamline it a bit - you would create an
> array as you are used to, pass it to the subprocess as you would do
> with the multiprocessing.Array, and the process can work with a numpy
> array right away.
>
> How:
> The idea is to create a numpy array in a buffer that can be shared
> among processes.  Python has support for this in its standard
> library, so the current solution creates a multiprocessing.Array and
> then passes it as the "buffer" to ndarray.__new__.  That would be it
> on Unixes, but on Windows, there has to be a custom pickle method,
> otherwise the array "forgets" that its buffer is that special and the
> sharing doesn't work.
>
> Some of what has been said in the pull request & my answer to that:
>
> * ... I do see some value in providing a canonical right way to
> construct shared memory arrays in NumPy, but I'm not very happy with
> this solution, ... terrible code organization (with the global
> variables):
> * I understand that, however this is a pattern of Python
> multiprocessing and everybody who wants to use the Pool and shared
> data either is familiar with this approach or has to become familiar
> with it [2, 3].  The good compromise is to have a separate module for
> each parallel calculation, so global variables are not a problem.
>
> * Can you explain why the ndarray subclass is needed?  Subclasses can
> be rather annoying to get right, and also for other reasons.
> * The shmarray class needs the custom pickler (but only on Windows).
>
> * If there's some way we can paper over the boilerplate such that
> users can use it without understanding the arcana of multiprocessing,
> then yes, that would be great.  But otherwise I'm not sure there's
> anything to be gained by putting it in a library rather than
> referring users to the examples on StackOverflow [1] [2].
> * What about telling users: "You can use numpy with multiprocessing.
> Remember the multiprocessing.Value and multiprocessing.Array classes?
> numpy.shm works exactly the same way, which means that it shares
> their limitations.  Refer to an example: ."  Notice that although
> those SO links contain all of the information, it was very difficult
> to get it up and running for a newcomer like me a few years ago.
>
> * This needs tests and justification for custom pickling methods,
> which are not used in any of the current examples. ...
> * I am sorry, but I don't fully understand that point.  The custom
> pickling method of shmarray has to be there on Windows, but users
> don't have to know about it at all.  As noted earlier, the global
> variable is the only way of using the standard Python
> multiprocessing.Pool with shared objects.
>
> [1]:
> http://stackoverflow.com/questions/10721915/shared-memory-objects-in-python-multiprocessing
> [2]:
> http://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing
> [3]:
> http://stackoverflow.com/questions/1675766/how-to-combine-pool-map-with-array-shared-memory-in-python-multiprocessing

From Permafacture at gmail.com Wed May 11 10:41:46 2016
From: Permafacture at gmail.com (Elliot Hallmark)
Date: Wed, 11 May 2016 09:41:46 -0500
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
Message-ID: 

Sturla, this sounds brilliant!  To be clear, you're talking about
serializing the numpy array and reconstructing it in a way that's
faster than pickle?  Or using shared memory and signaling array
creation around that shared memory rather than using pickle?

For what it's worth, I have used shared memory with numpy arrays as IPC
(no queue), with one process writing to it and one process reading from
it, and liked it.  Your point #5 did not apply because I was reusing
the shared memory.

Do you have a public repo where you are working on this?

Thanks!

Elliot

On Wed, May 11, 2016 at 3:29 AM, Sturla Molden wrote:

> I did some work on this some years ago.  I have more or less concluded
> that it was a waste of effort.  [...]  What we really need is a
> specialized queue that has knowledge about NumPy arrays and can bypass
> pickle.  I am therefore focusing my efforts on creating a NumPy-aware
> queue object.
>
> We are not doing the users a favor by encouraging the use of shared
> memory arrays.  They help with nothing.
>
>
> Sturla Molden
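For readers who have not seen the global-variable Pool pattern that the
quoted proposal and its SO links [2, 3] refer to, here is a minimal
sketch of it, assuming the Unix fork start method; the names
shared_arr, init and work are illustrative, not part of the PR.

    import numpy as np
    from multiprocessing import Pool, RawArray

    shared_arr = None   # module-level global, inherited by the workers

    def init(raw):
        # Runs once in each worker; wraps the shared buffer in an
        # ndarray view without copying it.
        global shared_arr
        shared_arr = np.frombuffer(raw, dtype=np.float64)

    def work(i):
        shared_arr[i] = i * i      # writes are visible to the parent
        return i

    if __name__ == '__main__':
        raw = RawArray('d', 100)
        with Pool(4, initializer=init, initargs=(raw,)) as pool:
            pool.map(work, range(100))
        print(np.frombuffer(raw, dtype=np.float64)[:5])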
From allanhaldane at gmail.com Wed May 11 14:01:02 2016
From: allanhaldane at gmail.com (Allan Haldane)
Date: Wed, 11 May 2016 14:01:02 -0400
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
In-Reply-To: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
Message-ID: <5733735E.2040309@gmail.com>

On 05/11/2016 04:29 AM, Sturla Molden wrote:
> 4. The reason IPC appears expensive with NumPy is because
> multiprocessing pickles the arrays.  It is pickle that is slow, not
> the IPC.  Some would say that the pickle overhead is an integral part
> of the IPC overhead, but I will argue that it is not.  The slowness
> of pickle is a separate problem altogether.

That's interesting.  I've also used multiprocessing with numpy and
didn't realize that.  Is this true in python3 too?

In python2 it appears that multiprocessing uses pickle protocol 0 which
must cause a big slowdown (a factor of 100) relative to protocol 2, and
uses pickle instead of cPickle.

a = np.arange(40*40)

%timeit pickle.dumps(a)
1000 loops, best of 3: 1.63 ms per loop

%timeit cPickle.dumps(a)
1000 loops, best of 3: 1.56 ms per loop

%timeit cPickle.dumps(a, protocol=2)
100000 loops, best of 3: 18.9 µs per loop

Python 3 uses protocol 3 by default:

%timeit pickle.dumps(a)
10000 loops, best of 3: 20 µs per loop

> 5. Shared memory does not improve on the pickle overhead because
> NumPy arrays with shared memory must also be pickled.  [...]
>
> 6. It is much easier to write concurrent code that uses queues for
> message passing than anything else.  [...]
>
> We are not doing the users a favor by encouraging the use of shared
> memory arrays.  They help with nothing.
>
>
> Sturla Molden
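A self-contained version of Allan's comparison, runnable with the
standard library alone (the timings above were taken with IPython's
%timeit); note that with number=1000 the total time in seconds is
numerically equal to the mean milliseconds per dump.

    import pickle
    import timeit
    import numpy as np

    a = np.arange(40 * 40)
    for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
        dt = timeit.timeit(lambda: pickle.dumps(a, protocol=proto),
                           number=1000)
        size = len(pickle.dumps(a, protocol=proto))
        # protocol 0 is text-based: both larger and far slower
        print("protocol %s: %6d bytes, %.3f ms per dump"
              % (proto, size, dt))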
From ben.v.root at gmail.com Wed May 11 14:22:54 2016
From: ben.v.root at gmail.com (Benjamin Root)
Date: Wed, 11 May 2016 14:22:54 -0400
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
In-Reply-To: <5733735E.2040309@gmail.com>
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <5733735E.2040309@gmail.com>
Message-ID: 

Oftentimes, if one needs to share numpy arrays for multiprocessing, I
would imagine that it is because the array is huge, right?  So, the
pickling approach would copy that array for each process, which defeats
the purpose, right?

Ben Root

On Wed, May 11, 2016 at 2:01 PM, Allan Haldane wrote:

> That's interesting.  I've also used multiprocessing with numpy and
> didn't realize that.  Is this true in python3 too?
> [...]
From rainwoodman at gmail.com Wed May 11 18:38:15 2016
From: rainwoodman at gmail.com (Feng Yu)
Date: Wed, 11 May 2016 15:38:15 -0700
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
In-Reply-To: 
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <5733735E.2040309@gmail.com>
Message-ID: 

Hi,

I've been thinking about and exploring this for some time.  If we are
to start some effort I'd like to help.  Here are my comments, mostly
regarding Sturla's comments.

1. If we are talking about shared memory and copy-on-write inheritance,
then we are using 'fork'.  If we are free to use fork, then a large
chunk of the concerns regarding the python std library multiprocessing
is no longer relevant.  Especially the "functions must be in a module"
limitation that tends to impose a special requirement on the software
design.

2. Pickling of inherited shared memory arrays can be done minimally by
just pickling the array_interface and the pointer address.  This works
because the child process and the parent share the same address space
layout, guaranteed by the fork call.

3. The RawArray and RawValue implementation in std multiprocessing has
its own memory allocator for managing small variables.  It is a huge
overkill (in terms of implementation) if we only care about very large
memory chunks.

4. There are hidden synchronization costs on multi-CPU (NUMA?) systems.
A choice is to defer the responsibility of avoiding races to the
developer.  Simple structs for working on slices of an array in
parallel can cover a huge fraction of use cases and fully avoid this
issue.

5. Whether to delegate parallelism to the underlying low-level
implementation, or to implement the parallelism in Python while keeping
the underlying low-level implementation sequential, probably depends on
the problem.  It may be convenient, given the current state of
parallelism support in Python, to delegate, but will that forever be
the case?  For example, after the MPI FFTW binding was stuck for a long
time, someone wrote a parallel python FFT package
(https://github.com/spectralDNS/mpiFFT4py) that uses FFTW for the
sequential parts and writes all of the parallel semantics in Python
with mpi4py, and it uses a more efficient domain decomposition.

6. If we are to define a set of operations I would recommend taking a
look at OpenMP as a reference -- it has been out there for decades and
is used widely.  An equivalent to the 'omp parallel for' construct in
Python would be a very good starting point and immediately useful.

- Yu

On Wed, May 11, 2016 at 11:22 AM, Benjamin Root wrote:

> Oftentimes, if one needs to share numpy arrays for multiprocessing, I
> would imagine that it is because the array is huge, right?  So, the
> pickling approach would copy that array for each process, which
> defeats the purpose, right?
> [...]
From joferkington at gmail.com Wed May 11 18:39:49 2016
From: joferkington at gmail.com (Joe Kington)
Date: Wed, 11 May 2016 15:39:49 -0700
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
In-Reply-To: <5733735E.2040309@gmail.com>
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <5733735E.2040309@gmail.com>
Message-ID: 

> In python2 it appears that multiprocessing uses pickle protocol 0
> which must cause a big slowdown (a factor of 100) relative to
> protocol 2, and uses pickle instead of cPickle.

Even on Python 2.x, multiprocessing uses protocol 2, not protocol 0.
The default for the `pickle` module changed, but multiprocessing has
always used a binary pickle protocol to communicate between processes.
Have a look at multiprocessing's forking.py in Python 2.7.

As some context here for folks that may not be aware, Sturla is
referring to his earlier shared memory implementation he wrote that
avoids actually pickling the data, and instead essentially pickles a
pointer to an array in shared memory.  As Sturla very nicely summed up,
it saves memory usage, but doesn't help the deeper issues.  You're far
better off just communicating between processes as opposed to using
shared memory.
From sturla.molden at gmail.com Wed May 11 18:48:22 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 11 May 2016 22:48:22 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
References: 
Message-ID: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org>

Elliot Hallmark wrote:
> Sturla, this sounds brilliant!  To be clear, you're talking about
> serializing the numpy array and reconstructing it in a way that's
> faster than pickle?

Yes.  We know the binary format of NumPy arrays.  We don't need to
invoke the machinery of pickle to serialize an array and write the
bytes to some IPC mechanism (pipe, tcp socket, unix socket, shared
memory).  The choice of IPC mechanism might not even be relevant, and
could even be deferred to a library like ZeroMQ.  The point is that if
multiple processes are to cooperate efficiently, we need a way to let
them communicate NumPy arrays quickly.  That is where using
multiprocessing hurts today, and shared memory does not help here.

Sturla

From sturla.molden at gmail.com Wed May 11 18:48:23 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 11 May 2016 22:48:23 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <5733735E.2040309@gmail.com>
Message-ID: <1865553010484698763.952860sturla.molden-gmail.com@news.gmane.org>

Allan Haldane wrote:
> That's interesting.  I've also used multiprocessing with numpy and
> didn't realize that.  Is this true in python3 too?

I am not sure.  As you have noticed, pickle is faster by two orders of
magnitude on Python 3.  But several microseconds is also a lot,
particularly if we are going to do this often during a computation.

Sturla

From sturla.molden at gmail.com Wed May 11 18:48:23 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 11 May 2016 22:48:23 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <5733735E.2040309@gmail.com>
Message-ID: <1393378874484697930.449371sturla.molden-gmail.com@news.gmane.org>

Benjamin Root wrote:
> Oftentimes, if one needs to share numpy arrays for multiprocessing, I
> would imagine that it is because the array is huge, right?

That is a case for shared memory, but what I was talking about is more
common than this.  In order for processes to cooperate, they must
communicate.  So we need a way to pass around NumPy arrays quickly.
Sometimes we want to use shared memory because of the size of the data,
but more often it is just used as a form of inexpensive IPC.

> So, the pickling approach would copy that array for each process,
> which defeats the purpose, right?

I am not sure what you mean.  When I made shared memory arrays I used
named segments, and made sure only the names of the segments were
pickled, not the contents of the buffers.

Sturla
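One possible sketch of what Sturla describes: pickle only a tiny
(dtype, shape) header and ship the array payload as raw bytes through a
multiprocessing.Pipe via the buffer protocol.  The helper names
send_array/recv_array are illustrative, not an existing API, and the
received array is read-only since it wraps the received bytes.

    import numpy as np
    from multiprocessing import Pipe, Process

    def send_array(conn, a):
        a = np.ascontiguousarray(a)
        conn.send((a.dtype.str, a.shape))  # tiny header, cheap to pickle
        conn.send_bytes(a)                 # raw payload, no pickle involved

    def recv_array(conn):
        dtype, shape = conn.recv()
        return np.frombuffer(conn.recv_bytes(), dtype=dtype).reshape(shape)

    def _worker(conn):
        send_array(conn, np.arange(12.0).reshape(3, 4))

    if __name__ == '__main__':
        parent, child = Pipe()
        p = Process(target=_worker, args=(child,))
        p.start()
        print(recv_array(parent))
        p.join()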
From sturla.molden at gmail.com Wed May 11 19:02:05 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 11 May 2016 23:02:05 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <5733735E.2040309@gmail.com>
Message-ID: <771832558484700460.287024sturla.molden-gmail.com@news.gmane.org>

Joe Kington wrote:
> You're far better off just communicating between processes as opposed
> to using shared memory.

Yes.

From sturla.molden at gmail.com Wed May 11 19:02:06 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 11 May 2016 23:02:06 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related
 processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <5733735E.2040309@gmail.com>
Message-ID: <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org>

Feng Yu wrote:

> 1. If we are talking about shared memory and copy-on-write
> inheritance, then we are using 'fork'.

Not available on Windows.  On Unix it only allows one-way
communication, from parent to child.

> 2. Pickling of inherited shared memory arrays can be done minimally
> by just pickling the array_interface and the pointer address.

Again, not everyone uses Unix.  And on Unix it is not trivial to pass
data back from the child process.  I solved that problem with Sys V IPC
(pickling the name of the segment).

> 6. If we are to define a set of operations I would recommend taking a
> look at OpenMP as a reference

If you are on Unix, you can just use a context manager.  Call os.fork
in __enter__ and os.waitpid in __exit__.

Sturla
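A bare-bones sketch of the context manager Sturla outlines, under the
assumptions he states (Unix only; the parent blocks until the child
exits; error handling omitted).  The class name forked is illustrative.

    import os

    class forked(object):
        # Usage:
        #     with forked() as is_child:
        #         if is_child:
        #             do_work()      # runs only in the child
        def __enter__(self):
            self.pid = os.fork()   # 0 in the child, child's pid in parent
            return self.pid == 0
        def __exit__(self, exc_type, exc_value, tb):
            if self.pid == 0:
                os._exit(0)        # the child never leaves the block
            os.waitpid(self.pid, 0)
            return False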
From allanhaldane at gmail.com Wed May 11 19:30:14 2016
From: allanhaldane at gmail.com (Allan Haldane)
Date: Wed, 11 May 2016 19:30:14 -0400
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
In-Reply-To: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org>
References: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org>
Message-ID: <5733C086.9020403@gmail.com>

On 05/11/2016 06:48 PM, Sturla Molden wrote:
> Elliot Hallmark wrote:
>> Sturla, this sounds brilliant! To be clear, you're talking about
>> serializing the numpy array and reconstructing it in a way that's faster
>> than pickle?
>
> Yes. We know the binary format of NumPy arrays. We don't need to invoke the
> machinery of pickle to serialize an array and write the bytes to some IPC
> mechanism (pipe, TCP socket, Unix socket, shared memory). The choice of IPC
> mechanism might not even be relevant, and could even be deferred to a
> library like ZeroMQ. The point is that if multiple processes are to
> cooperate efficiently, we need a way to let them communicate NumPy arrays
> quickly. That is where using multiprocessing hurts today, and shared memory
> does not help here.
>
> Sturla

You probably already know this, but I just wanted to note that the
mpi4py module has worked around pickle too. They discuss how they
efficiently transfer numpy arrays in mpi messages here:
http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data

Of course not everyone is able to install mpi easily.

From sturla.molden at gmail.com Thu May 12 02:27:43 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Thu, 12 May 2016 06:27:43 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org> <5733C086.9020403@gmail.com>
Message-ID: <732428194484726233.271444sturla.molden-gmail.com@news.gmane.org>

Allan Haldane wrote:

> You probably already know this, but I just wanted to note that the
> mpi4py module has worked around pickle too. They discuss how they
> efficiently transfer numpy arrays in mpi messages here:
> http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data

Unless I am mistaken, they use the PEP 3118 buffer interface to support
NumPy as well as a number of other Python objects. However, this protocol
makes buffer acquisition an expensive operation. You can see this in Cython
if you use typed memory views. Assigning a NumPy array to a typed
memoryview (i.e., buffer acquisition) is slow.

They are correct that avoiding pickle means we save some memory. It also
avoids creating and destroying temporary Python objects, and associated
reference counting. However, because of the expensive buffer acquisition, I
am not sure how much faster their approach will be. I prefer to use the
NumPy C API, and bypass any unnecessary overhead. The idea is to make IPC
of NumPy arrays fast, and then we cannot have an expensive buffer
acquisition in there.
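A rough sketch of what such a no-pickle wire format can look like (my own illustration, not Sturla's actual code): a tiny handcrafted header followed by the raw bytes, which any byte-oriented IPC channel can carry.

```
import struct
import numpy as np

def pack(arr):
    # header: dtype-string length, dtype string, ndim, shape; then raw bytes
    arr = np.ascontiguousarray(arr)
    dt = arr.dtype.str.encode()
    header = struct.pack('<B', len(dt)) + dt
    header += struct.pack('<B', arr.ndim)
    header += struct.pack('<%dq' % arr.ndim, *arr.shape)
    return header + arr.tobytes()

def unpack(buf):
    n = buf[0]
    dt = buf[1:1 + n].decode()
    pos = 1 + n
    ndim = buf[pos]
    pos += 1
    shape = struct.unpack_from('<%dq' % ndim, buf, pos)
    pos += 8 * ndim
    count = int(np.prod(shape)) if ndim else 1
    # frombuffer is zero-copy, but read-only over a bytes object
    return np.frombuffer(buf, dtype=dt, count=count, offset=pos).reshape(shape)

x = np.arange(12.0).reshape(3, 4)
assert (unpack(pack(x)) == x).all()
```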
Sturla From niki.spahiev at gmail.com Thu May 12 03:06:27 2016 From: niki.spahiev at gmail.com (Niki Spahiev) Date: Thu, 12 May 2016 10:06:27 +0300 Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533) In-Reply-To: <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org> References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org> <5733735E.2040309@gmail.com> <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org> Message-ID: On 12.05.2016 02:02, Sturla Molden wrote: > Feng Yu wrote: > >> 1. If we are talking about shared memory and copy-on-write >> inheritance, then we are using 'fork'. > > Not available on Windows. On Unix it only allows one-way communication, > from parent to child. Apparently next Win10 will have fork as part of bash integration. Niki From cimrman3 at ntc.zcu.cz Thu May 12 03:51:35 2016 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 12 May 2016 09:51:35 +0200 Subject: [Numpy-discussion] ANN: SfePy 2016.2 Message-ID: <57343607.2020100@ntc.zcu.cz> I am pleased to announce release 2016.2 of SfePy. Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method or by the isogeometric analysis (preliminary support). It is distributed under the new BSD license. Home page: http://sfepy.org Mailing list: http://groups.google.com/group/sfepy-devel Git (source) repository, issue tracker, wiki: http://github.com/sfepy Highlights of this release -------------------------- - partial shell10x element implementation - parallel computation of homogenized coefficients - clean up of elastic terms - read support for msh file mesh format of gmsh For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Best regards, Robert Cimrman on behalf of the SfePy development team --- Contributors to this release in alphabetical order: Robert Cimrman Vladimir Lukes From evgeny.burovskiy at gmail.com Thu May 12 09:02:03 2016 From: evgeny.burovskiy at gmail.com (Evgeni Burovski) Date: Thu, 12 May 2016 14:02:03 +0100 Subject: [Numpy-discussion] scipy 0.17.1 release Message-ID: Hi, On behalf of the scipy development team, I'm pleased to announce the availability of scipy 0.17.1. This is a bugfix release with no new features compared to 0.17.0. Source tarballs and OS X wheels are available from PyPI or from GitHub releases at https://github.com/scipy/scipy/releases/tag/v0.17.1 We recommend that all users upgrade from scipy 0.17.0. Cheers, Evgeni ========================== SciPy 0.17.1 Release Notes ========================== SciPy 0.17.1 is a bug-fix release with no new features compared to 0.17.0. Issues closed for 0.17.1 ------------------------ - `#5817 `__: BUG: skew, kurtosis return np.nan instead of "propagate" - `#5850 `__: Test failed with sgelsy - `#5898 `__: interpolate.interp1d crashes using float128 - `#5953 `__: Massive performance regression in cKDTree.query with L_inf distance... 
- `#6062 `__: mannwhitneyu breaks backward compatibility in 0.17.0
- `#6134 `__: T test does not handle nans

Pull requests for 0.17.1
------------------------

- `#5902 `__: BUG: interpolate: make interp1d handle np.float128 again
- `#5957 `__: BUG: slow down with p=np.inf in 0.17 cKDTree.query
- `#5970 `__: Actually propagate nans through stats functions with nan_policy="propagate"
- `#5971 `__: BUG: linalg: fix lwork check in *gelsy
- `#6074 `__: BUG: special: fixed violation of strict aliasing rules.
- `#6083 `__: BUG: Fix dtype for sum of linear operators
- `#6100 `__: BUG: Fix mannwhitneyu to be backward compatible
- `#6135 `__: Don't pass null pointers to LAPACK, even during workspace queries.
- `#6148 `__: stats: fix handling of nan values in T tests and kendalltau

From solipsis at pitrou.net Thu May 12 11:38:10 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 12 May 2016 17:38:10 +0200
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org> <5733C086.9020403@gmail.com> <732428194484726233.271444sturla.molden-gmail.com@news.gmane.org>
Message-ID: <20160512173810.79fb21b5@fsol>

On Thu, 12 May 2016 06:27:43 +0000 (UTC)
Sturla Molden wrote:
> Allan Haldane wrote:
>
> > You probably already know this, but I just wanted to note that the
> > mpi4py module has worked around pickle too. They discuss how they
> > efficiently transfer numpy arrays in mpi messages here:
> > http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data
>
> Unless I am mistaken, they use the PEP 3118 buffer interface to support
> NumPy as well as a number of other Python objects. However, this protocol
> makes buffer acquisition an expensive operation.

Can you define "expensive"?

> You can see this in Cython
> if you use typed memory views. Assigning a NumPy array to a typed
> memoryview (i.e., buffer acquisition) is slow.

You're assuming this is the cost of "buffer acquisition", while most
likely it's the cost of creating the memoryview object itself.

Buffer acquisition itself only calls a single C callback and uses a
stack-allocated C structure. It shouldn't be "expensive".

Regards

Antoine.

From rainwoodman at gmail.com Thu May 12 16:46:14 2016
From: rainwoodman at gmail.com (Feng Yu)
Date: Thu, 12 May 2016 13:46:14 -0700
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
In-Reply-To: <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org>
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org> <5733735E.2040309@gmail.com> <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org>
Message-ID: 

> Again, not everyone uses Unix.
>
> And on Unix it is not trivial to pass data back from the child process. I
> solved that problem with Sys V IPC (pickling the name of the segment).

I wonder if it is necessary to insist on being able to pass large amounts
of data back from the child to the parent process. In most (half?)
situations the result can be directly written back via a preallocated
shared array before workers are spawned. Then there is no need to pass data
back with named segments.

Here I am just doodling some possible use cases along the OpenMP line. The
sample would just copy the data from s to r, in two different ways. On
systems that do not support multiprocess + fork, the semantics is still
well preserved if threading is used.

```
import ...... as mp

# the access attribute of inherited variables is at least 'privatecopy'
# but with threading backend it becomes 'shared'
s = numpy.arange(10000)

with mp.parallel(num_threads=8) as section:
    # variables defined via section.empty will always be 'shared'
    r = section.empty(10000)

    def work():
        # variables defined in the body are 'private'
        tid = section.get_thread_num()
        size = section.get_num_threads()
        sl = slice(tid * r.size // size, (tid + 1) * r.size // size)
        r[sl] = s[sl]

    status = section.run(work)
    assert not any(status.errors)

    # support for the following could be implemented with section.run
    chunksize = 1000

    def work(i):
        sl = slice(i, i + chunksize)
        r[sl] = s[sl]
        return s[sl].sum()

    status = section.loop(work, range(0, r.size, chunksize), schedule='static')
    assert not any(status.errors)
    total = sum(status.results)
```

>> 6. If we are to define a set of operations I would recommend taking a
>> look at OpenMP as a reference -- it has been out there for decades and
>> is used widely. An equivalent to the 'omp parallel for' construct in
>> Python would be a very good starting point and immediately useful.
>
> If you are on Unix, you can just use a context manager. Call os.fork in
> __enter__ and os.waitpid in __exit__.
>
> Sturla
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

From sturla.molden at gmail.com Thu May 12 19:14:36 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Thu, 12 May 2016 23:14:36 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org> <5733C086.9020403@gmail.com> <732428194484726233.271444sturla.molden-gmail.com@news.gmane.org> <20160512173810.79fb21b5@fsol>
Message-ID: <115932182484786870.778106sturla.molden-gmail.com@news.gmane.org>

Antoine Pitrou wrote:

> Can you define "expensive"?

Slow enough to cause complaints on the Cython mailing list.

> You're assuming this is the cost of "buffer acquisition", while most
> likely it's the cost of creating the memoryview object itself.

Constructing a typed memoryview from a typed memoryview or a slice is fast.
Numerical code doing this intensively is still within 80-90% of the speed
of plain C code using pointer arithmetic.

> Buffer acquisition itself only calls a single C callback and uses a
> stack-allocated C structure. It shouldn't be "expensive".

I don't know the reason, only that buffer acquisition from NumPy arrays
with typed memoryviews is very expensive compared to assigning a typed
memoryview to another or slicing a typed memoryview.

Sturla

From sturla.molden at gmail.com Thu May 12 19:14:35 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Thu, 12 May 2016 23:14:35 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org> <5733735E.2040309@gmail.com> <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1352160884484787255.075220sturla.molden-gmail.com@news.gmane.org>

Niki Spahiev wrote:

> Apparently next Win10 will have fork as part of bash integration.

That would be great. The lack of fork on Windows is very annoying.
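To make the fork-based context manager suggested above concrete, here is a minimal sketch (Unix only; the class name and the is_child guard are invented for this illustration):

```
import os

class forked:
    """Run the guarded part of a with-block in a child process."""
    def __enter__(self):
        self.pid = os.fork()  # child sees a copy-on-write view of parent memory
        self.is_child = (self.pid == 0)
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        if self.is_child:
            # _exit avoids running the parent's cleanup code in the child
            os._exit(0 if exc_type is None else 1)
        os.waitpid(self.pid, 0)  # parent blocks until the child finishes
        return False

# The with-body executes in both processes; the guard decides who works.
with forked() as ctx:
    if ctx.is_child:
        pass  # heavy work on inherited (copy-on-write) arrays goes here
```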
Sturla

From dave.hirschfeld at gmail.com Thu May 12 19:25:55 2016
From: dave.hirschfeld at gmail.com (Dave)
Date: Thu, 12 May 2016 23:25:55 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org> <5733C086.9020403@gmail.com> <732428194484726233.271444sturla.molden-gmail.com@news.gmane.org> <20160512173810.79fb21b5@fsol>
Message-ID: 

Antoine Pitrou <solipsis at pitrou.net> writes:

> On Thu, 12 May 2016 06:27:43 +0000 (UTC)
> Sturla Molden <sturla.molden at gmail.com> wrote:
>
> > Allan Haldane <allanhaldane at gmail.com> wrote:
> >
> > > You probably already know this, but I just wanted to note that the
> > > mpi4py module has worked around pickle too. They discuss how they
> > > efficiently transfer numpy arrays in mpi messages here:
> > > http://pythonhosted.org/mpi4py/usrman/overview.html#communicating-python-objects-and-array-data
> >
> > Unless I am mistaken, they use the PEP 3118 buffer interface to support
> > NumPy as well as a number of other Python objects. However, this protocol
> > makes buffer acquisition an expensive operation.
>
> Can you define "expensive"?
>
> > You can see this in Cython
> > if you use typed memory views. Assigning a NumPy array to a typed
> > memoryview (i.e., buffer acquisition) is slow.
>
> You're assuming this is the cost of "buffer acquisition", while most
> likely it's the cost of creating the memoryview object itself.
>
> Buffer acquisition itself only calls a single C callback and uses a
> stack-allocated C structure. It shouldn't be "expensive".
>
> Regards
>
> Antoine.

When I looked at it, using a typed memoryview was between 7-50 times slower
than using numpy directly:
http://thread.gmane.org/gmane.comp.python.cython.devel/14626

It looks like there was some improvement since then:
https://github.com/numpy/numpy/pull/3779

...and repeating my experiment shows the deficit is down to 3-11 times slower.

In [5]: x = randn(10000)

In [6]: %timeit echo_memview(x)
The slowest run took 14.98 times longer than the fastest. This could mean
that an intermediate result is being cached.
100000 loops, best of 3: 5.31 µs per loop

In [7]: %timeit echo_memview_nocast(x)
The slowest run took 10.80 times longer than the fastest. This could mean
that an intermediate result is being cached.
1000000 loops, best of 3: 1.58 µs per loop

In [8]: %timeit echo_numpy(x)
The slowest run took 58.81 times longer than the fastest. This could mean
that an intermediate result is being cached.
1000000 loops, best of 3: 474 ns per loop

-Dave

From sturla.molden at gmail.com Thu May 12 19:32:35 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Thu, 12 May 2016 23:32:35 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org> <5733735E.2040309@gmail.com> <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org>
Message-ID: <864303179484787737.086354sturla.molden-gmail.com@news.gmane.org>

Feng Yu wrote:

> In most (half?) situations the result can be directly written back via a
> preallocated shared array before workers are spawned. Then there is no
> need to pass data back with named segments.

You can work around it in various ways, this being one of them.

Personally I prefer a parallel programming style with queues,
either to scatter arrays to workers and collect arrays from workers, or to
chain workers together in a pipeline (without using coroutines). But
exactly how you program is a matter of taste. I want to make it as
inexpensive as possible to pass a NumPy array through a queue. If anyone
else wants to help improve parallel programming with NumPy using a
different paradigm, that is fine too. I just wanted to clarify why I
stopped working on shared memory arrays.

(As for the implementation, I am also experimenting with platform dependent
asynchronous I/O (IOCP, GCD or kqueue, epoll) to pass NumPy arrays through
a queue as inexpensively and scalably as possible. And no, there is no
public repo, as I like to experiment with my pet project undisturbed before
I let it out in the wild.)

Sturla

From sturla.molden at gmail.com Fri May 13 13:29:13 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Fri, 13 May 2016 17:29:13 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org> <5733735E.2040309@gmail.com> <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org>
Message-ID: <444540179484853062.734900sturla.molden-gmail.com@news.gmane.org>

Niki Spahiev wrote:

> Apparently next Win10 will have fork as part of bash integration.

It is Interix/SUA rebranded "Subsystem for Linux". It remains to be seen
how long it will stay this time. Also, a Python built for this subsystem
will not run on the Win32 subsystem, so there is no graphics. And it will
not be installed by default, just like SUA.

From rainwoodman at gmail.com Fri May 13 15:44:34 2016
From: rainwoodman at gmail.com (Feng Yu)
Date: Fri, 13 May 2016 12:44:34 -0700
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
In-Reply-To: <864303179484787737.086354sturla.molden-gmail.com@news.gmane.org>
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org> <5733735E.2040309@gmail.com> <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org> <864303179484787737.086354sturla.molden-gmail.com@news.gmane.org>
Message-ID: 

> Personally I prefer a parallel programming style with queues, either to
> scatter arrays to workers and collect arrays from workers, or to chain
> workers together in a pipeline (without using coroutines). But exactly how
> you program is a matter of taste. I want to make it as inexpensive as
> possible to pass a NumPy array through a queue. If anyone else wants to
> help improve parallel programming with NumPy using a different paradigm,
> that is fine too. I just wanted to clarify why I stopped working on shared
> memory arrays.

Even though I am not very obsessed with functional style and queues, I have
to agree with you that queues tend to produce more readable and less
verbose code -- if there is the right tool.

> (As for the implementation, I am also experimenting with platform dependent
> asynchronous I/O (IOCP, GCD or kqueue, epoll) to pass NumPy arrays through
> a queue as inexpensively and scalably as possible. And no, there is no
> public repo, as I like to experiment with my pet project undisturbed before
> I let it out in the wild.)

It would be wonderful if there were a way to pass numpy arrays around
without a huge dependency list. After all, we know the address of the array
and, in principle, we are able to find the physical pages and map them on
the receiver side.

Also, did you check out http://zeromq.org/blog:zero-copy ?
ZeroMQ is a dependency of Jupyter, so it is quite available.

- Yu

> > Sturla
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > https://mail.scipy.org/mailman/listinfo/numpy-discussion

From sturla.molden at gmail.com Fri May 13 19:15:59 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Fri, 13 May 2016 23:15:59 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org> <5733735E.2040309@gmail.com> <1338223165484699842.726170sturla.molden-gmail.com@news.gmane.org> <864303179484787737.086354sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1163774665484873595.652105sturla.molden-gmail.com@news.gmane.org>

Feng Yu wrote:

> Also, did you check out http://zeromq.org/blog:zero-copy ?
> ZeroMQ is a dependency of Jupyter, so it is quite available.

ZeroMQ is great, but it lacks some crucial features. In particular it does
not support IPC on Windows. Ideally one should e.g. use Unix domain sockets
on Linux and named pipes on Windows. Most MPI implementations seem to
prefer shared memory over these mechanisms, though.

Also, I am not sure about ZeroMQ and async I/O. I would e.g. like to use
IOCP on Windows, GCD on Mac, and a threadpool plus epoll on Linux.

Sturla

From phillip.m.feldman at gmail.com Sat May 14 03:23:07 2016
From: phillip.m.feldman at gmail.com (Phillip Feldman)
Date: Sat, 14 May 2016 00:23:07 -0700
Subject: [Numpy-discussion] three-way comparisons
Message-ID: 

I often find a need to do the type of comparison done by the function shown
below. I suspect that this would be more efficient for large arrays if
implemented directly in C. Is there any possibility of adding something
like this to NumPy?

def three_way(x, y):
    """
    This function performs a 3-way comparison on `x` and `y`, which must be
    either lists or arrays of compatible shape. Each pair of items or
    elements--let's call them x[i] and y[i]--are compared. The corresponding
    element in the output array is 1 if `x[i]` is greater than `y[i]`, -1 if
    `x[i]` is less, and zero if the two are equal.
    """
    return numpy.greater(x, y).astype(int) - numpy.less(x, y).astype(int)

Phillip
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yellowhat46 at gmail.com Sat May 14 09:27:14 2016
From: yellowhat46 at gmail.com (Vasco Gervasi)
Date: Sat, 14 May 2016 15:27:14 +0200
Subject: [Numpy-discussion] FFT and reconstruct
Message-ID: 

Hi all,
I am trying to understand how FFT works, so I wrote the attached script.
The idea is to extract the amplitude and phase of a signal and then
reconstruct it using the amplitude and phase information.
As you can see, I create some cosine curves on the interval t0-t1.
Let's start with t0=1.0 and t1=3.0 and consider just y['1'] =
cos(1.0*omega*t); the signal is:
[image: Immagine incorporata 1]
The amplitude and phase for each order are:
[image: Immagine incorporata 2]
But if I try to reconstruct the signal using amplitude and phase:
[image: Immagine incorporata 3]
So as you can see there is a shift of 180 deg.

Now let's consider another case, t0=2 and t1=3; the signal is
y['Signal'] = 1.0*cos(1.0*omega*t) + 2.0*cos(2.0*omega*t) +
3.0*cos(3.0*omega*t + pi/4) + 4.0*cos(4.0*omega*t) + 5.0*cos(5.0*omega*t) +
1.0
The reconstructed signal is very similar to the initial one:
[image: Immagine incorporata 4]
but is not exactly the same.
Any advice?

Thanks
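A small self-contained sketch of what is going on here (my own reconstruction, not the attached script): the FFT phases are measured relative to t[0], so for t0=1 the fundamental picks up a phase of omega*t0 = pi, which is exactly the apparent 180-degree shift. Reconstructing against (t - t0) recovers the signal.

```
import numpy as np

n = 1024
t0, t1 = 1.0, 3.0
t = t0 + (t1 - t0) * np.arange(n) / n   # exactly one period, endpoint excluded
omega = 2 * np.pi / (t1 - t0)
y = np.cos(1.0 * omega * t)

c = np.fft.rfft(y)
amp = 2 * np.abs(c) / n                 # one-sided amplitude spectrum
amp[0] /= 2                             # DC bin is not doubled
# (for even n the Nyquist bin should not be doubled either; it is zero here)
phase = np.angle(c)                     # phase[1] == omega * t0 == pi, i.e. 180 deg

y_rec = sum(amp[k] * np.cos(k * omega * (t - t0) + phase[k])
            for k in range(len(c)))
assert np.allclose(y, y_rec)
```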
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 138857 bytes Desc: not available URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 128252 bytes Desc: not available URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 160672 bytes Desc: not available URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 153579 bytes Desc: not available URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FFT.py Type: application/octet-stream Size: 3190 bytes Desc: not available URL: 

From josef.pktd at gmail.com Sat May 14 11:03:29 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sat, 14 May 2016 11:03:29 -0400
Subject: [Numpy-discussion] three-way comparisons
In-Reply-To: 
References: 
Message-ID: 

On Sat, May 14, 2016 at 3:23 AM, Phillip Feldman
<phillip.m.feldman at gmail.com> wrote:

> I often find a need to do the type of comparison done by the function shown
> below. I suspect that this would be more efficient for large arrays if
> implemented directly in C. Is there any possibility of adding something
> like this to NumPy?
>
> def three_way(x, y):
>     """
>     This function performs a 3-way comparison on `x` and `y`, which must be
>     either lists or arrays of compatible shape. Each pair of items or
>     elements--let's call them x[i] and y[i]--are compared. The corresponding
>     element in the output array is 1 if `x[i]` is greater than `y[i]`, -1 if
>     `x[i]` is less, and zero if the two are equal.
>     """
>     return numpy.greater(x, y).astype(int) - numpy.less(x, y).astype(int)

Isn't that the same as sign(x - y)?

Josef

> Phillip
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Permafacture at gmail.com Sat May 14 14:25:44 2016
From: Permafacture at gmail.com (Elliot Hallmark)
Date: Sat, 14 May 2016 13:25:44 -0500
Subject: [Numpy-discussion] Why is this old bug still present?
Message-ID: 

Sorry for the noob question.

On numpy 10.4.1, I am bit by this:
https://github.com/numpy/numpy/issues/4185

But it has been fixed 6 months ago:
https://github.com/numpy/numpy/issues/6740

Do I need to compile numpy to get this fix on debian sid? Would anaconda be
up to date enough?

Elliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sean at mehta.io Sun May 15 10:30:39 2016
From: sean at mehta.io (Sean Mehta)
Date: Sun, 15 May 2016 10:30:39 -0400
Subject: [Numpy-discussion] Why is this old bug still present?
In-Reply-To: 
References: 
Message-ID: <1463322639.3418552.608328433.5AA59FD4@webmail.messagingengine.com>

On Sat, May 14, 2016, at 14:25, Elliot Hallmark wrote:
> On numpy 10.4.1, I am bit by this:
> https://github.com/numpy/numpy/issues/4185

I assume you mean 1.10.4?

> But it has been fixed 6 months ago:
> https://github.com/numpy/numpy/issues/6740
> Do I need to compile numpy to get this fix on debian sid? Would anaconda
> be up to date enough?

This fix was included in 1.11.0.
My Anaconda installation hasn't updated to 1.11.0 yet, although their package list [0] includes the 1.11.0 packages. Although I don't use Debian, a quick search in the Debian unstable package list [1] seems to suggest that 1.11.0 is available. I ended up building 1.11.0 manually for myself. [0] http://repo.continuum.io/pkgs/free/linux-64/ [1] https://packages.debian.org/search?keywords=numpy&searchon=names&suite=unstable§ion=all ? From Permafacture at gmail.com Sun May 15 21:42:58 2016 From: Permafacture at gmail.com (Elliot Hallmark) Date: Sun, 15 May 2016 20:42:58 -0500 Subject: [Numpy-discussion] Why is this old bug still present? In-Reply-To: <1463322639.3418552.608328433.5AA59FD4@webmail.messagingengine.com> References: <1463322639.3418552.608328433.5AA59FD4@webmail.messagingengine.com> Message-ID: Thanks! Man, I got every one of those numbers in the wrong place... -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainwoodman at gmail.com Mon May 16 02:46:18 2016 From: rainwoodman at gmail.com (Feng Yu) Date: Sun, 15 May 2016 23:46:18 -0700 Subject: [Numpy-discussion] FFT and reconstruct In-Reply-To: References: Message-ID: Hi Vasco, It looks slightly strange that you are using cos instead of exp in the reconstruction of the signal. I'd recommend you take a look at http://docs.scipy.org/doc/numpy-1.10.1/reference/routines.fft.html Also the documents of fftfreq, fftshift, and ifft. Best, - Yu On Sat, May 14, 2016 at 6:27 AM, Vasco Gervasi wrote: > Hi all, > I am trying to understand how FFT work, so I wrote the attached script. > The idea is to extract amplitude and phase of a signal and then > reconstruct using amplitude and phase information. > As you can see, I create some cosine curve on the interval t0-t1. > Let's start with t0=1.0 and t1=3.0 and consider just y['1'] > = cos(1.0*omega*t), the signal is. > [image: Immagine incorporata 1] > The amplitude and phase for each order are: > [image: Immagine incorporata 2] > But if I try to reconstruct the signal using amplitude and phase: > [image: Immagine incorporata 3] > So as you can see there is a shifting of 180 deg. > > Now let's consider another case, t0=2 and t1=3, the signal is > y['Signal'] = 1.0*cos(1.0*omega*t) + 2.0*cos(2.0*omega*t) + > 3.0*cos(3.0*omega*t + pi/4) + 4.0*cos(4.0*omega*t) + 5.0*cos(5.0*omega*t) + > 1.0 > The reconstructed signal is very similar to the initial one: > [image: Immagine incorporata 4] > but is not exactly the same. > Any advice? > > Thanks > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 160672 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 153579 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image.png Type: image/png Size: 138857 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: image.png Type: image/png Size: 128252 bytes Desc: not available URL: 

From mailinglists at xgm.de Mon May 16 10:08:07 2016
From: mailinglists at xgm.de (Florian Lindner)
Date: Mon, 16 May 2016 16:08:07 +0200
Subject: [Numpy-discussion] Remove a random sample from array
Message-ID: <3366973.oDrVMeR9Kb@horus>

Hello,

I have an array of shape (n, 2) from which I want to extract a random sample
of 20% of the rows. The chosen samples should be removed from the original
array and moved to a new array with the same two-column shape.

What is the most clever way to do this with numpy?

Thanks,
Florian

From Permafacture at gmail.com Mon May 16 11:01:38 2016
From: Permafacture at gmail.com (Elliot Hallmark)
Date: Mon, 16 May 2016 10:01:38 -0500
Subject: [Numpy-discussion] Remove a random sample from array
In-Reply-To: <3366973.oDrVMeR9Kb@horus>
References: <3366973.oDrVMeR9Kb@horus>
Message-ID: 

What do you mean remove them from the array? Replace with zero or NaN?

On May 16, 2016 9:08 AM, "Florian Lindner" wrote:

> Hello,
>
> I have an array of shape (n, 2) from which I want to extract a random
> sample of 20% of the rows. The chosen samples should be removed from the
> original array and moved to a new array with the same two-column shape.
>
> What is the most clever way to do this with numpy?
>
> Thanks,
> Florian
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mailinglists at xgm.de Mon May 16 11:45:04 2016
From: mailinglists at xgm.de (Florian Lindner)
Date: Mon, 16 May 2016 17:45:04 +0200
Subject: [Numpy-discussion] Remove a random sample from array
In-Reply-To: 
References: <3366973.oDrVMeR9Kb@horus>
Message-ID: <1689041.8OOJlUXQ1U@horus>

On Monday, 16 May 2016, 10:01:38 CEST, Elliot Hallmark wrote:
> What do you mean remove them from the array? Replace with zero or NaN?

Removed like when 10 samples are taken from a (100, 2) array it becomes a
(90, 2) array. Copying the array is no problem if removing in place is not
possible.

Best,
Florian

From martin.noblia at openmailbox.org Mon May 16 12:04:41 2016
From: martin.noblia at openmailbox.org (Martin Noblia)
Date: Mon, 16 May 2016 13:04:41 -0300
Subject: [Numpy-discussion] Remove a random sample from array
In-Reply-To: <3366973.oDrVMeR9Kb@horus>
References: <3366973.oDrVMeR9Kb@horus>
Message-ID: <5739EF99.2030304@openmailbox.org>

I think with `np.random.choice`:

http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html

On 05/16/2016 11:08 AM, Florian Lindner wrote:
> Hello,
>
> I have an array of shape (n, 2) from which I want to extract a random sample
> of 20% of the rows. The chosen samples should be removed from the original
> array and moved to a new array with the same two-column shape.
>
> What is the most clever way to do this with numpy?
> Thanks,
> Florian
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

--
*Martin Noblia*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Permafacture at gmail.com Mon May 16 12:24:55 2016
From: Permafacture at gmail.com (Elliot Hallmark)
Date: Mon, 16 May 2016 11:24:55 -0500
Subject: [Numpy-discussion] Remove a random sample from array
In-Reply-To: <5739EF99.2030304@openmailbox.org>
References: <3366973.oDrVMeR9Kb@horus> <5739EF99.2030304@openmailbox.org>
Message-ID: 

Use `random.shuffle(range(len(arr)))` to make a list of indices. Use slices
to get your 20/80. Convert to integer arrays and index your original array
with them. Use sorted on the 80% list if you need to preserve the order.

-Elliot

On Mon, May 16, 2016 at 11:04 AM, Martin Noblia
<martin.noblia at openmailbox.org> wrote:

> I think with `np.random.choice`:
>
> http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html
>
> On 05/16/2016 11:08 AM, Florian Lindner wrote:
>
> Hello,
>
> I have an array of shape (n, 2) from which I want to extract a random sample
> of 20% of the rows. The chosen samples should be removed from the original
> array and moved to a new array with the same two-column shape.
>
> What is the most clever way to do this with numpy?
>
> Thanks,
> Florian
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> --
> *Martin Noblia*
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mdboom at gmail.com Mon May 16 13:49:43 2016
From: mdboom at gmail.com (Michael Droettboom)
Date: Mon, 16 May 2016 17:49:43 +0000
Subject: [Numpy-discussion] Scipy John Hunter Plotting Contest: DEADLINE EXTENDED TO JUNE 3
Message-ID: 

The Scipy John Hunter Excellence in Plotting Contest is a great opportunity
to showcase advancement in the state of the art of visualization. Entries
are still welcome, as the deadline has been extended to June 3, 2016.

Entry Instructions

- Participants are invited to submit scientific plots to be judged by a
panel.
- Entries must be submitted by June 3, 2016 via e-mail to
plotting-contest at scipy.org
- Plots may be produced with any combination of Python-based tools. (It is
not required that they use matplotlib, for example.)
- Source code for the plot must be provided, in the form of Python code
and/or IPython notebook, along with a rendering of the plot in PDF format.
If the original data cannot be shared for reasons of size or licensing,
"fake" data may be substituted, along with an image of the plot using real
data.
- Each entry must include a 300-500 word abstract describing the plot and
its scientific importance for a general scientific audience.
- Entries will be judged on their clarity, innovation and aesthetics, but
most importantly for their effectiveness in illuminating real scientific
work. Entrants are encouraged to submit plots that were used during the
course of research, rather than merely being hypothetical.
- SciPy reserves the right to display any and all entries, whether
prize-winning or not, at the conference, and to use them in any materials
or on its website, with attribution to the original author(s).

Michael Droettboom, chair
Jacob Vanderplas
Phil Elson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From yellowhat46 at gmail.com Mon May 16 15:27:27 2016
From: yellowhat46 at gmail.com (Vasco Gervasi)
Date: Mon, 16 May 2016 21:27:27 +0200
Subject: [Numpy-discussion] FFT and reconstruct
In-Reply-To: 
References: 
Message-ID: 

I would like to use cos to reconstruct the signal, to verify the phase
angle. I will try ifft.

Thanks,

2016-05-16 8:46 GMT+02:00 Feng Yu :

> Hi Vasco,
>
> It looks slightly strange that you are using cos instead of exp in the
> reconstruction of the signal.
>
> I'd recommend you take a look at
> http://docs.scipy.org/doc/numpy-1.10.1/reference/routines.fft.html
>
> Also the documents of fftfreq, fftshift, and ifft.
>
> Best,
>
> - Yu
>
> On Sat, May 14, 2016 at 6:27 AM, Vasco Gervasi wrote:
>
>> Hi all,
>> I am trying to understand how FFT works, so I wrote the attached script.
>> The idea is to extract the amplitude and phase of a signal and then
>> reconstruct it using the amplitude and phase information.
>> As you can see, I create some cosine curves on the interval t0-t1.
>> Let's start with t0=1.0 and t1=3.0 and consider just y['1'] =
>> cos(1.0*omega*t); the signal is:
>> [image: Immagine incorporata 1]
>> The amplitude and phase for each order are:
>> [image: Immagine incorporata 2]
>> But if I try to reconstruct the signal using amplitude and phase:
>> [image: Immagine incorporata 3]
>> So as you can see there is a shift of 180 deg.
>>
>> Now let's consider another case, t0=2 and t1=3; the signal is
>> y['Signal'] = 1.0*cos(1.0*omega*t) + 2.0*cos(2.0*omega*t) +
>> 3.0*cos(3.0*omega*t + pi/4) + 4.0*cos(4.0*omega*t) + 5.0*cos(5.0*omega*t) +
>> 1.0
>> The reconstructed signal is very similar to the initial one:
>> [image: Immagine incorporata 4]
>> but is not exactly the same.
>> Any advice?
>>
>> Thanks
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 160672 bytes Desc: not available URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 138857 bytes Desc: not available URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 128252 bytes Desc: not available URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png Type: image/png Size: 153579 bytes Desc: not available URL: 

From josef.pktd at gmail.com Mon May 16 23:00:07 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 16 May 2016 23:00:07 -0400
Subject: [Numpy-discussion] Remove a random sample from array
In-Reply-To: 
References: <3366973.oDrVMeR9Kb@horus> <5739EF99.2030304@openmailbox.org>
Message-ID: 

On Mon, May 16, 2016 at 12:24 PM, Elliot Hallmark wrote:

> Use `random.shuffle(range(len(arr)))` to make a list of indices. Use slices
> to get your 20/80. Convert to integer arrays and index your original array
> with them. Use sorted on the 80% list if you need to preserve the order.

Similar, but simpler: you can just randomly permute/shuffle an array with
20% ones (True) and 80% zeros (False) and use it as a mask to select from
the original array.

Josef

> -Elliot
>
> On Mon, May 16, 2016 at 11:04 AM, Martin Noblia
> <martin.noblia at openmailbox.org> wrote:
>
>> I think with `np.random.choice`:
>>
>> http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.random.choice.html
>>
>> On 05/16/2016 11:08 AM, Florian Lindner wrote:
>>
>> Hello,
>>
>> I have an array of shape (n, 2) from which I want to extract a random sample
>> of 20% of the rows. The chosen samples should be removed from the original
>> array and moved to a new array with the same two-column shape.
>>
>> What is the most clever way to do this with numpy?
>>
>> Thanks,
>> Florian
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> --
>> *Martin Noblia*
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com Mon May 16 23:54:29 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 16 May 2016 20:54:29 -0700
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
Message-ID: 

I have recently encountered several use cases for randomly generating
random number seeds:

1. When writing a library of stochastic functions that take a seed as an
input argument, and some of these functions call multiple other such
stochastic functions. Dask is one such example [1].

2. When a library needs to produce results that are reproducible after
calling numpy.random.seed, but that does not want to use the functions in
numpy.random directly. This came up recently in a pandas pull request [2],
because we want to allow using RandomState objects as an alternative to
global state in numpy.random. A major advantage of this approach is that it
provides an obvious alternative to reusing the private numpy.random._mtrand
[3].

The implementation of this function (and the corresponding method on
RandomState) is almost trivial, and I've already written such a utility for
my code:

def random_seed():
    # numpy.random uses uint32 seeds
    return np.random.randint(2 ** 32)

The advantage of adding a new method is that it avoids the need for
explanation by making the intent of code using this pattern obvious. So I
think it is a good candidate for inclusion in numpy.random.

Any opinions?

[1] https://github.com/dask/dask/blob/e0b246221957c4bd618e57246f3a7ccc8863c494/dask/utils.py#L336
[2] https://github.com/pydata/pandas/pull/13161
[3] On a side note, if there's no longer a good reason to keep this object
private, perhaps we should expose it in our public API. It would certainly
be useful -- scikit-learn is already using it (see links in the pandas PR
above).
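A hypothetical usage sketch of the proposal (random_seed here is the helper above, extended to accept an optional RandomState; the variable names are invented): child seeds derived from one parent state keep each sub-computation reproducible yet independent of its siblings.

```
import numpy as np

def random_seed(rs=np.random):
    # numpy.random uses uint32 seeds
    return rs.randint(2 ** 32)

parent = np.random.RandomState(42)
seed_a = random_seed(parent)
seed_b = random_seed(parent)

# Changing how one part consumes randomness no longer perturbs the other.
part_a = np.random.RandomState(seed_a).normal(size=3)
part_b = np.random.RandomState(seed_b).normal(size=3)
```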
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com Tue May 17 00:32:35 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 16 May 2016 21:32:35 -0700
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

Looking at the dask helper function again reminds me of an important caveat
to this approach, which was pointed out to me by Clark Fitzgerald. If you
generate a moderately large number of random seeds in this fashion, you are
quite likely to have collisions due to the Birthday Paradox. For example,
you have a 50% chance of encountering at least one collision if you
generate only 77,000 seeds:
https://en.wikipedia.org/wiki/Birthday_attack

The docstring for this function should document this limitation of the
approach, which is still appropriate for a small number of seeds. Our
implementation can also encourage creating these seeds in a single
vectorized call to random_seed, which can significantly reduce the
likelihood of collisions between seeds generated in a single call to
random_seed with something like the following:

def random_seed(size):
    base = np.random.randint(2 ** 32)
    offset = np.arange(size)
    return (base + offset) % (2 ** 32)

In principle, I believe this could generate the full 2 ** 32 unique seeds
without any collisions. Cryptography experts, please speak up if I'm
mistaken here.

On Mon, May 16, 2016 at 8:54 PM, Stephan Hoyer wrote:

> I have recently encountered several use cases for randomly generating
> random number seeds:
>
> 1. When writing a library of stochastic functions that take a seed as an
> input argument, and some of these functions call multiple other such
> stochastic functions. Dask is one such example [1].
>
> 2. When a library needs to produce results that are reproducible after
> calling numpy.random.seed, but that does not want to use the functions in
> numpy.random directly. This came up recently in a pandas pull request [2],
> because we want to allow using RandomState objects as an alternative to
> global state in numpy.random. A major advantage of this approach is that it
> provides an obvious alternative to reusing the private numpy.random._mtrand
> [3].
>
> The implementation of this function (and the corresponding method on
> RandomState) is almost trivial, and I've already written such a utility for
> my code:
>
> def random_seed():
>     # numpy.random uses uint32 seeds
>     return np.random.randint(2 ** 32)
>
> The advantage of adding a new method is that it avoids the need for
> explanation by making the intent of code using this pattern obvious. So I
> think it is a good candidate for inclusion in numpy.random.
>
> Any opinions?
>
> [1] https://github.com/dask/dask/blob/e0b246221957c4bd618e57246f3a7ccc8863c494/dask/utils.py#L336
> [2] https://github.com/pydata/pandas/pull/13161
> [3] On a side note, if there's no longer a good reason to keep this object
> private, perhaps we should expose it in our public API. It would certainly
> be useful -- scikit-learn is already using it (see links in the pandas PR
> above).
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: 

From robert.kern at gmail.com Tue May 17 03:18:52 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 17 May 2016 08:18:52 +0100
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer wrote:
>
> I have recently encountered several use cases for randomly generating
random number seeds:
>
> 1. When writing a library of stochastic functions that take a seed as an
input argument, and some of these functions call multiple other such
stochastic functions. Dask is one such example [1].

Can you clarify the use case here? I don't really know what you are doing
here, but I'm pretty sure this is not the right approach.

> 2. When a library needs to produce results that are reproducible after
calling numpy.random.seed, but that does not want to use the functions in
numpy.random directly. This came up recently in a pandas pull request [2],
because we want to allow using RandomState objects as an alternative to
global state in numpy.random. A major advantage of this approach is that it
provides an obvious alternative to reusing the private numpy.random._mtrand
[3].

It's only pseudo-private. This is an authorized use of it.

However, for this case, I usually just pass around the numpy.random module
itself and let duck-typing take care of the rest.

> [3] On a side note, if there's no longer a good reason to keep this
object private, perhaps we should expose it in our public API. It would
certainly be useful -- scikit-learn is already using it (see links in the
pandas PR above).

Adding a public get_global_random_state() function might be in order.
Originally, I wanted there to be *some* barrier to entry, but just grabbing
it to use as a default RandomState object is definitely an intended use of
it. It's not going to disappear.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shoyer at gmail.com Tue May 17 04:09:28 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 17 May 2016 01:09:28 -0700
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

On Tue, May 17, 2016 at 12:18 AM, Robert Kern wrote:

> On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer wrote:
> > 1. When writing a library of stochastic functions that take a seed as an
> input argument, and some of these functions call multiple other such
> stochastic functions. Dask is one such example [1].
>
> Can you clarify the use case here? I don't really know what you are doing
> here, but I'm pretty sure this is not the right approach.

Here's a contrived example. Suppose I've written a simulator for cars that
consists of a number of loosely connected components (e.g., an engine,
brakes, etc.). The behavior of each component of our simulator is
stochastic, but we want everything to be fully reproducible, so we need to
use seeds or RandomState objects.

We might write our simulate_car function like the following:

def simulate_car(engine_config, brakes_config, seed=None):
    rs = np.random.RandomState(seed)
    engine = simulate_engine(engine_config, seed=rs.random_seed())
    brakes = simulate_brakes(brakes_config, seed=rs.random_seed())
    ...
The problem with passing the same RandomState object (either explicitly or
dropping the seed argument entirely and using the global state) to both
simulate_engine and simulate_brakes is that it breaks encapsulation -- if I
change what I do inside simulate_engine, it also affects the brakes.

The dask use case is actually pretty different -- the intent is to create
many random numbers in parallel using multiple threads or processes
(possibly in a distributed fashion). I know that skipping ahead is the
standard way to get independent number streams for parallel sampling, but
that isn't exposed in numpy.random, and setting distinct seeds seems like a
reasonable alternative for scientific computing use cases.

> It's only pseudo-private. This is an authorized use of it.
>
> However, for this case, I usually just pass around the numpy.random
> module itself and let duck-typing take care of the rest.

I like the duck-typing approach. That's very elegant. If this is an
authorized use of the global RandomState object, let's document it!
Otherwise cautious library maintainers like myself will discourage using
it :).

> > [3] On a side note, if there's no longer a good reason to keep this
> > object private, perhaps we should expose it in our public API. It would
> > certainly be useful -- scikit-learn is already using it (see links in
> > the pandas PR above).
>
> Adding a public get_global_random_state() function might be in order.

Yes, possibly.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Tue May 17 04:49:45 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Tue, 17 May 2016 09:49:45 +0100
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

On Tue, May 17, 2016 at 9:09 AM, Stephan Hoyer wrote:
>
> On Tue, May 17, 2016 at 12:18 AM, Robert Kern wrote:
>>
>> On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer wrote:
>> > 1. When writing a library of stochastic functions that take a seed as
an input argument, and some of these functions call multiple other such
stochastic functions. Dask is one such example [1].
>>
>> Can you clarify the use case here? I don't really know what you are
doing here, but I'm pretty sure this is not the right approach.
>
> Here's a contrived example. Suppose I've written a simulator for cars
that consists of a number of loosely connected components (e.g., an engine,
brakes, etc.). The behavior of each component of our simulator is
stochastic, but we want everything to be fully reproducible, so we need to
use seeds or RandomState objects.
>
> We might write our simulate_car function like the following:
>
> def simulate_car(engine_config, brakes_config, seed=None):
>     rs = np.random.RandomState(seed)
>     engine = simulate_engine(engine_config, seed=rs.random_seed())
>     brakes = simulate_brakes(brakes_config, seed=rs.random_seed())
>     ...
>
> The problem with passing the same RandomState object (either explicitly
or dropping the seed argument entirely and using the global state) to both
simulate_engine and simulate_brakes is that it breaks encapsulation -- if I
change what I do inside simulate_engine, it also affects the brakes.

That's a little too contrived, IMO. In most such simulations, the
different components interact with each other in the normal course of the
simulation; that's why they are both joined together in the same
simulation instead of being two separate runs.
Unless the components are being run across a process or thread boundary (a
la dask below), where true nondeterminism comes into play, I don't think
you want these semi-independent streams. This seems to be the advice du
jour from the agent-based modeling community.

> The dask use case is actually pretty different -- the intent is to create
many random numbers in parallel using multiple threads or processes
(possibly in a distributed fashion). I know that skipping ahead is the
standard way to get independent number streams for parallel sampling, but
that isn't exposed in numpy.random, and setting distinct seeds seems like a
reasonable alternative for scientific computing use cases.

Forget about integer seeds. Those are for human convenience. If you're not
jotting them down in your lab notebook in pen, you don't want an integer
seed.

What you want is a function that returns many RandomState objects that are
hopefully spread around the MT19937 space enough that they are essentially
independent (in the absence of true jumpahead). The better implementation
of such a function would look something like this:

def spread_out_prngs(n, root_prng=None):
    if root_prng is None:
        root_prng = np.random
    elif not isinstance(root_prng, np.random.RandomState):
        root_prng = np.random.RandomState(root_prng)
    sprouted_prngs = []
    for i in range(n):
        seed_array = root_prng.randint(1 << 32, size=624)  # dtype=np.uint32 under 1.11
        sprouted_prngs.append(np.random.RandomState(seed_array))
    return sprouted_prngs

Internally, this generates seed arrays of about the size of the MT19937
state to make sure that you can access more of the state space. That will
at least make the chance of collision tiny. And it can be easily rewritten
to take advantage of one of the newer PRNGs that have true independent
streams:

https://github.com/bashtage/ng-numpy-randomstate

--
Robert Kern
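A short, hypothetical usage sketch of the function above: hand each worker its own well-separated stream instead of an integer seed.

```
# assumes numpy imported as np and spread_out_prngs defined as above
prngs = spread_out_prngs(4, root_prng=12345)
draws = [prng.standard_normal(5) for prng in prngs]  # one stream per worker
```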
So, what I would really like to see is some kind of numpy documentation on how to approach parallel computing with numpy arrays (depending on what kind of task one wants to achieve). Maybe just using the queue is good enough, or there are those 3rd-party modules with known limitations? Plenty of people start off with numpy, so some kind of overview should be part of numpy docs.

From sturla.molden at gmail.com  Tue May 17 08:13:42 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 17 May 2016 12:13:42 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <9de8219e-79de-a474-79df-647921f3344a@gmail.com>
Message-ID: <1502440745485178811.176557sturla.molden-gmail.com@news.gmane.org>

Matěj Týč wrote:

> - Parallel processing of HUGE data, and

This is mainly a Windows problem, as copy-on-write fork() will solve this on any other platform. I am more in favor of asking Microsoft to fix their broken OS.

Also observe that the usefulness of shared memory is very limited on Windows, as we in practice never get the same base address in a spawned process. This prevents sharing data structures with pointers and Python objects. Anything more complex than an array cannot be shared.

What this means is that shared memory is seldom useful for sharing huge data, even on Windows. It is only useful for this on Unix/Linux, where base addresses can stay the same. But on non-Windows platforms, the COW will in 99.99% of the cases be sufficient, thus making shared memory superfluous anyway. We don't need shared memory to scatter large data on Linux, only fork.

As I see it, shared memory is mostly useful as a means to construct an inter-process communication (IPC) protocol.

Sturla

From josef.pktd at gmail.com  Tue May 17 08:34:26 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Tue, 17 May 2016 08:34:26 -0400
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

On Tue, May 17, 2016 at 4:49 AM, Robert Kern wrote:

> On Tue, May 17, 2016 at 9:09 AM, Stephan Hoyer wrote:
> >
> > On Tue, May 17, 2016 at 12:18 AM, Robert Kern wrote:
> >>
> >> On Tue, May 17, 2016 at 4:54 AM, Stephan Hoyer wrote:
> >> > 1. When writing a library of stochastic functions that take a seed as an input argument, and some of these functions call multiple other such stochastic functions. Dask is one such example [1].
> >>
> >> Can you clarify the use case here? I don't really know what you are doing here, but I'm pretty sure this is not the right approach.
> >
> > Here's a contrived example. Suppose I've written a simulator for cars that consists of a number of loosely connected components (e.g., an engine, brakes, etc.). The behavior of each component of our simulator is stochastic, but we want everything to be fully reproducible, so we need to use seeds or RandomState objects.
> >
> > We might write our simulate_car function like the following:
> >
> > def simulate_car(engine_config, brakes_config, seed=None):
> >     rs = np.random.RandomState(seed)
> >     engine = simulate_engine(engine_config, seed=rs.random_seed())
> >     brakes = simulate_brakes(brakes_config, seed=rs.random_seed())
> >     ...
> >
> > The problem with passing the same RandomState object (either explicitly or dropping the seed argument entirely and using the global state) to both simulate_engine and simulate_brakes is that it breaks encapsulation -- if I change what I do inside simulate_engine, it also affects the brakes.
>
> That's a little too contrived, IMO. In most such simulations, the different components interact with each other in the normal course of the simulation; that's why they are both joined together in the same simulation instead of being two separate runs. Unless the components are being run across a process or thread boundary (a la dask below) where true nondeterminism comes into play, I don't think you want these semi-independent streams. This seems to be the advice du jour from the agent-based modeling community.

A similar use case where I had to switch to using several RandomStates: in a Monte Carlo experiment with increasing sample size, I want two random variables, x, y, to have the same draws in the common initial observations. If I draw x and y sequentially, and then increase the number of observations for the simulation, then it completely changes the draws for the second variable if they use a common RandomState. With separate random states, increasing from 1000 to 1200 observations leaves the first 1000 draws unchanged.

(This reduces the Monte Carlo noise, for example when calculating the power of a hypothesis test as a function of the sample size.)

Josef

> > The dask use case is actually pretty different -- the intent is to create many random numbers in parallel using multiple threads or processes (possibly in a distributed fashion). I know that skipping ahead is the standard way to get independent number streams for parallel sampling, but that isn't exposed in numpy.random, and setting distinct seeds seems like a reasonable alternative for scientific computing use cases.
>
> Forget about integer seeds. Those are for human convenience. If you're not jotting them down in your lab notebook in pen, you don't want an integer seed.
>
> What you want is a function that returns many RandomState objects that are hopefully spread around the MT19937 space enough that they are essentially independent (in the absence of true jumpahead). The better implementation of such a function would look something like this:
>
> def spread_out_prngs(n, root_prng=None):
>     if root_prng is None:
>         root_prng = np.random
>     elif not isinstance(root_prng, np.random.RandomState):
>         root_prng = np.random.RandomState(root_prng)
>     sprouted_prngs = []
>     for i in range(n):
>         seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
>         sprouted_prngs.append(np.random.RandomState(seed_array))
>     return sprouted_prngs
>
> Internally, this generates seed arrays of about the size of the MT19937 state to make sure that you can access more of the state space. That will at least make the chance of collision tiny. And it can be easily rewritten to take advantage of one of the newer PRNGs that have true independent streams:
>
> https://github.com/bashtage/ng-numpy-randomstate
>
> --
> Robert Kern
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From sturla.molden at gmail.com Tue May 17 09:40:56 2016 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 17 May 2016 13:40:56 +0000 (UTC) Subject: [Numpy-discussion] Proposal: numpy.random.random_seed References: Message-ID: <1591394023485184927.488471sturla.molden-gmail.com@news.gmane.org> Stephan Hoyer wrote: > I have recently encountered several use cases for randomly generate random > number seeds: > > 1. When writing a library of stochastic functions that take a seed as an > input argument, and some of these functions call multiple other such > stochastic functions. Dask is one such example [1]. > > 2. When a library needs to produce results that are reproducible after > calling numpy.random.seed, but that do not want to use the functions in > numpy.random directly. This came up recently in a pandas pull request [2], > because we want to allow using RandomState objects as an alternative to > global state in numpy.random. A major advantage of this approach is that it > provides an obvious alternative to reusing the private numpy.random._mtrand > [3]. What about making numpy.random a finite state machine, and keeping a stack of RandomState seeds? That is, something similar to what OpenGL does for its matrices? Then we get two functions, numpy.random.push_seed and numpy.random.pop_seed. Sturla From robert.kern at gmail.com Tue May 17 09:48:45 2016 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 May 2016 14:48:45 +0100 Subject: [Numpy-discussion] Proposal: numpy.random.random_seed In-Reply-To: <1591394023485184927.488471sturla.molden-gmail.com@news.gmane.org> References: <1591394023485184927.488471sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, May 17, 2016 at 2:40 PM, Sturla Molden wrote: > > Stephan Hoyer wrote: > > I have recently encountered several use cases for randomly generate random > > number seeds: > > > > 1. When writing a library of stochastic functions that take a seed as an > > input argument, and some of these functions call multiple other such > > stochastic functions. Dask is one such example [1]. > > > > 2. When a library needs to produce results that are reproducible after > > calling numpy.random.seed, but that do not want to use the functions in > > numpy.random directly. This came up recently in a pandas pull request [2], > > because we want to allow using RandomState objects as an alternative to > > global state in numpy.random. A major advantage of this approach is that it > > provides an obvious alternative to reusing the private numpy.random._mtrand > > [3]. > > What about making numpy.random a finite state machine, and keeping a stack > of RandomState seeds? That is, something similar to what OpenGL does for > its matrices? Then we get two functions, numpy.random.push_seed and > numpy.random.pop_seed. I don't think that addresses the issues brought up here. It's just more global state to worry about. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From ewm at redtetrahedron.org  Tue May 17 09:50:41 2016
From: ewm at redtetrahedron.org (Eric Moore)
Date: Tue, 17 May 2016 09:50:41 -0400
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: <1591394023485184927.488471sturla.molden-gmail.com@news.gmane.org>
References: 
 <1591394023485184927.488471sturla.molden-gmail.com@news.gmane.org>
Message-ID: 

On Tue, May 17, 2016 at 9:40 AM, Sturla Molden wrote:

> Stephan Hoyer wrote:
> > I have recently encountered several use cases for randomly generating random number seeds:
> >
> > 1. When writing a library of stochastic functions that take a seed as an input argument, and some of these functions call multiple other such stochastic functions. Dask is one such example [1].
> >
> > 2. When a library needs to produce results that are reproducible after calling numpy.random.seed, but that do not want to use the functions in numpy.random directly. This came up recently in a pandas pull request [2], because we want to allow using RandomState objects as an alternative to global state in numpy.random. A major advantage of this approach is that it provides an obvious alternative to reusing the private numpy.random._mtrand [3].
>
> What about making numpy.random a finite state machine, and keeping a stack of RandomState seeds? That is, something similar to what OpenGL does for its matrices? Then we get two functions, numpy.random.push_seed and numpy.random.pop_seed.

I don't like the idea of adding this kind of internal state. Having it built into the module means that it is shared by all callers, libraries, user code, etc.

That's not the right choice when a stack of seeds could be easily built around the RandomState object if that is really what someone needs.

Eric
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com  Tue May 17 13:24:05 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 17 May 2016 10:24:05 -0700
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

On May 17, 2016 1:50 AM, "Robert Kern" wrote:
>
[...]
> What you want is a function that returns many RandomState objects that are hopefully spread around the MT19937 space enough that they are essentially independent (in the absence of true jumpahead). The better implementation of such a function would look something like this:
>
> def spread_out_prngs(n, root_prng=None):
>     if root_prng is None:
>         root_prng = np.random
>     elif not isinstance(root_prng, np.random.RandomState):
>         root_prng = np.random.RandomState(root_prng)
>     sprouted_prngs = []
>     for i in range(n):
>         seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
>         sprouted_prngs.append(np.random.RandomState(seed_array))
>     return sprouted_prngs

Maybe a nice way to encapsulate this in the RandomState interface would be a method RandomState.random_state() that generates and returns a new child RandomState.

> Internally, this generates seed arrays of about the size of the MT19937 state to make sure that you can access more of the state space. That will at least make the chance of collision tiny. And it can be easily rewritten to take advantage of one of the newer PRNGs that have true independent streams:
>
> https://github.com/bashtage/ng-numpy-randomstate

...
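To make that concrete, here's a minimal sketch of what I have in mind (the subclass and method name are made up for illustration -- nothing like this exists in numpy today -- and it just reuses the seed-array trick from spread_out_prngs):

import numpy as np

class TreeRandomState(np.random.RandomState):
    def random_state(self):
        # Sprout a child PRNG seeded from this one's output stream.
        seed_array = self.randint(1 << 32, size=624)  # ~fills the MT19937 state
        return TreeRandomState(seed_array)

root = TreeRandomState(12345)
child = root.random_state()        # gets its own stream
grandchild = child.random_state()  # children compose, no global registry needed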
But unfortunately I'm not sure how to make my interface suggestion above work on top of one of these RNGs, because for RandomState.random_state you really want a tree of independent RNGs and the fancy new PRNGs only provide a single flat namespace :-/. And even more annoyingly, the tree API is actually a nicer API, because with a flat namespace you have to know up front about all possible RNGs your code will use, which is an unfortunate global coupling that makes it difficult to compose programs out of independent pieces, while the RandomState.random_state approach composes beautifully. Maybe there's some clever way to allocate a 64-bit namespace to make it look tree-like? I'm not sure 64 bits is really enough... -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue May 17 13:41:31 2016 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 17 May 2016 18:41:31 +0100 Subject: [Numpy-discussion] Proposal: numpy.random.random_seed In-Reply-To: References: Message-ID: On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith wrote: > > On May 17, 2016 1:50 AM, "Robert Kern" wrote: > > > [...] > > What you want is a function that returns many RandomState objects that are hopefully spread around the MT19937 space enough that they are essentially independent (in the absence of true jumpahead). The better implementation of such a function would look something like this: > > > > def spread_out_prngs(n, root_prng=None): > > if root_prng is None: > > root_prng = np.random > > elif not isinstance(root_prng, np.random.RandomState): > > root_prng = np.random.RandomState(root_prng) > > sprouted_prngs = [] > > for i in range(n): > > seed_array = root_prng.randint(1<<32, size=624) # dtype=np.uint32 under 1.11 > > sprouted_prngs.append(np.random.RandomState(seed_array)) > > return spourted_prngs > > Maybe a nice way to encapsulate this in the RandomState interface would be a method RandomState.random_state() that generates and returns a new child RandomState. I disagree. This is a workaround in the absence of proper jumpahead or guaranteed-independent streams. I would not encourage it. > > Internally, this generates seed arrays of about the size of the MT19937 state so make sure that you can access more of the state space. That will at least make the chance of collision tiny. And it can be easily rewritten to take advantage of one of the newer PRNGs that have true independent streams: > > > > https://github.com/bashtage/ng-numpy-randomstate > > ... But unfortunately I'm not sure how to make my interface suggestion above work on top of one of these RNGs, because for RandomState.random_state you really want a tree of independent RNGs and the fancy new PRNGs only provide a single flat namespace :-/. And even more annoyingly, the tree API is actually a nicer API, because with a flat namespace you have to know up front about all possible RNGs your code will use, which is an unfortunate global coupling that makes it difficult to compose programs out of independent pieces, while the RandomState.random_state approach composes beautifully. Maybe there's some clever way to allocate a 64-bit namespace to make it look tree-like? I'm not sure 64 bits is really enough... MT19937 doesn't have a "tree" any more than the others. It's the same flat state space. You are just getting the illusion of a tree by hoping that you never collide. You ought to think about precisely the same global coupling issues with MT19937 as you do with guaranteed-independent streams. 
Hope-and-prayer isn't really a substitute for properly engineering your problem. It's just a moral hazard to promote this method to the main API.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matej.tyc at gmail.com  Tue May 17 16:49:20 2016
From: matej.tyc at gmail.com (=?UTF-8?B?TWF0xJtqIFTDvcSN?=)
Date: Tue, 17 May 2016 22:49:20 +0200
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
In-Reply-To: <1502440745485178811.176557sturla.molden-gmail.com@news.gmane.org>
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <9de8219e-79de-a474-79df-647921f3344a@gmail.com>
 <1502440745485178811.176557sturla.molden-gmail.com@news.gmane.org>
Message-ID: <00ae9046-1228-5209-0c14-984deb528643@gmail.com>

On 17.5.2016 14:13, Sturla Molden wrote:
> Matěj Týč wrote:
>
>> - Parallel processing of HUGE data, and
> This is mainly a Windows problem, as copy-on-write fork() will solve this
> on any other platform. ...
That sounds interesting, could you elaborate on it a bit? Does it mean that if you pass the numpy array to the child process using Queue, no significant amount of data will flow through it? Or I shouldn't pass it using Queue at all and just rely on inheritance? Finally, I assume that passing it as an argument to the Process class is the worst option, because it will be pickled and unpickled. Or maybe you refer to modules such as joblib that use this functionality and expose only a nice interface?

And finally, COW means that returning large arrays still involves data moving between processes, whereas the shm approach has the workaround that you can preallocate the result array in the parent process, which the worker process can then write to.

> What this means is that shared memory is seldom useful for sharing huge
> data, even on Windows. It is only useful for this on Unix/Linux, where base
> addresses can stay the same. But on non-Windows platforms, the COW will in
> 99.99% of the cases be sufficient, thus making shared memory superfluous
> anyway. We don't need shared memory to scatter large data on Linux, only
> fork.

I am actually quite comfortable with sharing numpy arrays only. It is a nice format for sharing large amounts of numbers, which is what I want and what many modules accept as input (e.g. the "shapely" module).

From sturla.molden at gmail.com  Tue May 17 17:03:04 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 17 May 2016 21:03:04 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <1624973999484645450.688294sturla.molden-gmail.com@news.gmane.org>
 <9de8219e-79de-a474-79df-647921f3344a@gmail.com>
 <1502440745485178811.176557sturla.molden-gmail.com@news.gmane.org>
 <00ae9046-1228-5209-0c14-984deb528643@gmail.com>
Message-ID: <1126141486485211291.448744sturla.molden-gmail.com@news.gmane.org>

Matěj Týč wrote:

> Does it mean that if you pass the numpy array to the child process using Queue, no significant amount of data will flow through it?

This is what my shared memory arrays do.

> Or I shouldn't pass it using Queue at all and just rely on inheritance?

This is what David Baddeley's shared memory arrays do.

> Finally, I assume that passing it as an argument to the Process class is the worst option, because it will be pickled and unpickled.

My shared memory arrays only pickle the metadata, and can be used in this way.

> Or maybe you refer to modules such as joblib that use this functionality and expose only a nice interface?
Joblib creates "shared memory" by memory mapping a temporary file, which is backed by RAM on Linux (tmpfs). It is backed by a physical file on disk on Mac and Windows. In this respect, joblib is much better on Linux than on Mac or Windows.

> And finally, COW means that returning large arrays still involves data moving between processes, whereas the shm approach has the workaround that you can preallocate the result array in the parent process, which the worker process can then write to.

My shared memory arrays need no workaround for this. They also allow shared memory arrays to be returned to the parent process. No preallocation is needed.

Sturla

From njs at pobox.com  Tue May 17 20:14:33 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 17 May 2016 17:14:33 -0700
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

On Tue, May 17, 2016 at 10:41 AM, Robert Kern wrote:
> On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith wrote:
>>
>> On May 17, 2016 1:50 AM, "Robert Kern" wrote:
>> >
>> [...]
>> > What you want is a function that returns many RandomState objects that are hopefully spread around the MT19937 space enough that they are essentially independent (in the absence of true jumpahead). The better implementation of such a function would look something like this:
>> >
>> > def spread_out_prngs(n, root_prng=None):
>> >     if root_prng is None:
>> >         root_prng = np.random
>> >     elif not isinstance(root_prng, np.random.RandomState):
>> >         root_prng = np.random.RandomState(root_prng)
>> >     sprouted_prngs = []
>> >     for i in range(n):
>> >         seed_array = root_prng.randint(1<<32, size=624)  # dtype=np.uint32 under 1.11
>> >         sprouted_prngs.append(np.random.RandomState(seed_array))
>> >     return sprouted_prngs
>>
>> Maybe a nice way to encapsulate this in the RandomState interface would be a method RandomState.random_state() that generates and returns a new child RandomState.
>
> I disagree. This is a workaround in the absence of proper jumpahead or guaranteed-independent streams. I would not encourage it.
>
>> > Internally, this generates seed arrays of about the size of the MT19937 state to make sure that you can access more of the state space. That will at least make the chance of collision tiny. And it can be easily rewritten to take advantage of one of the newer PRNGs that have true independent streams:
>> >
>> > https://github.com/bashtage/ng-numpy-randomstate
>>
>> ... But unfortunately I'm not sure how to make my interface suggestion above work on top of one of these RNGs, because for RandomState.random_state you really want a tree of independent RNGs and the fancy new PRNGs only provide a single flat namespace :-/. And even more annoyingly, the tree API is actually a nicer API, because with a flat namespace you have to know up front about all possible RNGs your code will use, which is an unfortunate global coupling that makes it difficult to compose programs out of independent pieces, while the RandomState.random_state approach composes beautifully. Maybe there's some clever way to allocate a 64-bit namespace to make it look tree-like? I'm not sure 64 bits is really enough...
>
> MT19937 doesn't have a "tree" any more than the others. It's the same flat state space. You are just getting the illusion of a tree by hoping that you never collide.
> You ought to think about precisely the same global coupling issues with MT19937 as you do with guaranteed-independent streams. Hope-and-prayer isn't really a substitute for properly engineering your problem. It's just a moral hazard to promote this method to the main API.

Nonsense.

If your definition of "hope and prayer" includes assuming that we won't encounter a random collision in a 2**19937 state space, then literally all engineering is hope-and-prayer. A collision could happen, but if it does it's overwhelmingly more likely to happen because of a flaw in the mathematical analysis, or a bug in the implementation, or because random quantum fluctuations caused you and your program to suddenly be transported to a parallel world where 1 + 1 = 1, than that you just got unlucky with your random state. And all of these hazards apply equally to both MT19937 and more modern PRNGs.

...anyway, the real reason I'm a bit grumpy is because there are solid engineering reasons why users *want* this API, so whether or not it turns out to be possible I think we should at least be allowed to have a discussion about whether there's some way to give it to them. It's not even 100% out of the question that we conclude that existing PRNGs are buggy because they don't take this use case into account -- it would be far from the first time that numpy found itself going beyond the limits of older numerical tools that weren't designed to build the kind of large composable systems that numpy gets used for.

MT19937's state space is large enough that you could explicitly encode a "tree seed" into it, even if you don't trust the laws of probability -- e.g., you start with a RandomState with id [], then its children have id [0], [1], [2], ..., and their children have ids [0, 0], [0, 1], ..., [1, 0], ..., and you write these into the state (probably after sending them through some bit-mixing permutation), to guarantee non-collision.

Or if you do trust the laws of probability, then the randomly-sample-a-PRNG approach is not 100% out of the question even for more modern PRNGs. For example, let's take a PRNG with a 64-bit stream id and a 64-bit state space per stream. Suppose that we know that in our application each PRNG will be used to draw 2**48 64-bit samples (= 2 pebibytes of random data), and that we will use 2**32 PRNGs (= total of 8 yobibytes of random data). If we randomly initialize each PRNG with a 128-bit (stream id, state) pair, then the chance that two of our streams will overlap is about the chance of getting a collision in an 80-bit state space (128 - 48) on 2**32 draws, which can be approximated[1] as

    1 - exp(-(2**32)**2 / (2 * 2**80))
      = 1 - exp(-2**64 / 2**81)
      = 1 - exp(-1/2**17)
      = 7.6-chances-in-a-million

To put this number in some kind of context, Sandia says that on 128 pebibyte clusters (which I assume is the kind of system you use for a simulation that needs multiple yobibytes of random data!) they see a double-bit (!) DRAM fault ~every 4 minutes [2].

Also, by the time we've drawn 2**48 samples from a single 64-bit stream, we've already violated independence, because in 2**48 draws you should see a repeated value with probability ~1.0 (by the same birthday approximation as above), but we see them with probability 0.
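These numbers are easy to double-check, by the way -- the approximation from [1] is a one-liner in plain Python (my own throwaway helper, nothing numpy-specific):

import math

def p_collision(n_streams, free_bits):
    # Birthday approximation: P(collision) ~= 1 - exp(-n**2 / (2 * d)),
    # with d = 2**free_bits; expm1 keeps precision when the result is tiny.
    return -math.expm1(-n_streams**2 / (2 * 2**free_bits))

print(p_collision(2**32, 128 - 48))   # ~7.6e-06, the number above
print(p_collision(2**70, 256 - 64))   # ~1.1e-16, the 256-bit case below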
Alternatively, for a PRNG with a 256 bit state space, or 128-bit stream id + 128-bit state space, then one can draw 2**64 64-bit samples without violating independence, *and* do this for, say, 2**70 randomly initialized streams, and the probability of collision would still be on the order of 10**-16. Unless I totally messed up these calculations, I'd conclude that almost everyone would be totally fine with a .random_state() method combined with any reasonable generator, and for some reasonable generators (those with more than, like, ~192 bits of state?) it's just unconditionally safe for all physically realistic problem sizes. -n [1] https://en.wikipedia.org/wiki/Birthday_problem#Approximations [2] http://www.fiala.me/pubs/papers/sc12-redmpi.pdf -- Nathaniel J. Smith -- https://vorpus.org From robert.kern at gmail.com Wed May 18 08:07:30 2016 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 May 2016 13:07:30 +0100 Subject: [Numpy-discussion] Proposal: numpy.random.random_seed In-Reply-To: References: Message-ID: On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith wrote: > > On Tue, May 17, 2016 at 10:41 AM, Robert Kern wrote: > > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith wrote: > >> > >> On May 17, 2016 1:50 AM, "Robert Kern" wrote: > >> > > >> [...] > >> > What you want is a function that returns many RandomState objects that > >> > are hopefully spread around the MT19937 space enough that they are > >> > essentially independent (in the absence of true jumpahead). The better > >> > implementation of such a function would look something like this: > >> > > >> > def spread_out_prngs(n, root_prng=None): > >> > if root_prng is None: > >> > root_prng = np.random > >> > elif not isinstance(root_prng, np.random.RandomState): > >> > root_prng = np.random.RandomState(root_prng) > >> > sprouted_prngs = [] > >> > for i in range(n): > >> > seed_array = root_prng.randint(1<<32, size=624) # > >> > dtype=np.uint32 under 1.11 > >> > sprouted_prngs.append(np.random.RandomState(seed_array)) > >> > return spourted_prngs > >> > >> Maybe a nice way to encapsulate this in the RandomState interface would be > >> a method RandomState.random_state() that generates and returns a new child > >> RandomState. > > > > I disagree. This is a workaround in the absence of proper jumpahead or > > guaranteed-independent streams. I would not encourage it. > > > >> > Internally, this generates seed arrays of about the size of the MT19937 > >> > state so make sure that you can access more of the state space. That will at > >> > least make the chance of collision tiny. And it can be easily rewritten to > >> > take advantage of one of the newer PRNGs that have true independent streams: > >> > > >> > https://github.com/bashtage/ng-numpy-randomstate > >> > >> ... But unfortunately I'm not sure how to make my interface suggestion > >> above work on top of one of these RNGs, because for RandomState.random_state > >> you really want a tree of independent RNGs and the fancy new PRNGs only > >> provide a single flat namespace :-/. And even more annoyingly, the tree API > >> is actually a nicer API, because with a flat namespace you have to know up > >> front about all possible RNGs your code will use, which is an unfortunate > >> global coupling that makes it difficult to compose programs out of > >> independent pieces, while the RandomState.random_state approach composes > >> beautifully. Maybe there's some clever way to allocate a 64-bit namespace to > >> make it look tree-like? I'm not sure 64 bits is really enough... 
> > > > MT19937 doesn't have a "tree" any more than the others. It's the same flat > > state space. You are just getting the illusion of a tree by hoping that you > > never collide. You ought to think about precisely the same global coupling > > issues with MT19937 as you do with guaranteed-independent streams. > > Hope-and-prayer isn't really a substitute for properly engineering your > > problem. It's just a moral hazard to promote this method to the main API. > > Nonsense. > > If your definition of "hope and prayer" includes assuming that we > won't encounter a random collision in a 2**19937 state space, then > literally all engineering is hope-and-prayer. A collision could > happen, but if it does it's overwhelmingly more likely to happen > because of a flaw in the mathematical analysis, or a bug in the > implementation, or because random quantum fluctuations caused you and > your program to suddenly be transported to a parallel world where 1 + > 1 = 1, than that you just got unlucky with your random state. And all > of these hazards apply equally to both MT19937 and more modern PRNGs. Granted. > ...anyway, the real reason I'm a bit grumpy is because there are solid > engineering reasons why users *want* this API, I remain unconvinced on this mark. Grumpily. > so whether or not it > turns out to be possible I think we should at least be allowed to have > a discussion about whether there's some way to give it to them. I'm not shutting down discussion of the option. I *implemented* the option. I think that discussing whether it should be part of the main API is premature. There probably ought to be a paper or three out there supporting its safety and utility first. Let the utility function version flourish first. > It's > not even 100% out of the question that we conclude that existing PRNGs > are buggy because they don't take this use case into account -- it > would be far from the first time that numpy found itself going beyond > the limits of older numerical tools that weren't designed to build the > kind of large composable systems that numpy gets used for. > > MT19937's state space is large enough that you could explicitly encode > a "tree seed" into it, even if you don't trust the laws of probability > -- e.g., you start with a RandomState with id [], then its children > have id [0], [1], [2], ..., and their children have ids [0, 0], [0, > 1], ..., [1, 0], ..., and you write these into the state (probably > after sending them through some bit-mixing permutation), to guarantee > non-collision. Sure. Not entirely sure if that can be done without preallocating the branching factor or depth, but I'm sure there's some fancy combinatoric representation of an unbounded tree that could be exploited here. It seems likely to me that such a thing could be done with the stream IDs of PRNGs that support that. > Or if you do trust the laws of probability, then the > randomly-sample-a-PRNG approach is not 100% out of the question even > for more modern PRNGs. Tread carefully here. PRNGs are not RNGs. Using the root PRNG to generate N new seeds for the same PRNG does not necessarily generate N good, uncorrelated streams with high probability. Overlap is not the only factor in play. Long-term correlations happen even in the case where the N streams do not overlap, and the structure of the root PRNG could well generate N such correlated seeds. http://www.adv-radio-sci.net/12/75/2014/ See section 4.4. 
It does look like one can get good MT19937 streams with sequential integer seeds, when expanded by the standard sub-PRNG that we use for that purpose (yay!). However, if you use a different (but otherwise good-quality and general purpose) sub-PRNG to expand the key, you get increasing numbers of failures as you increase the number of parallel streams. It is not explored in the paper exactly why this is the case, but I suspect that the similarity of the second sub-PRNG algorithm to that of the MT itself is part of the problem.

It is *likely* that the scheme I implemented will be okay (our array-seed expansion algorithm is similar to the integer-seed expansion and *probably* insulates the sprouted MTs from their MT generating source), but that remains to be demonstrated.

--
Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Wed May 18 11:50:25 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 18 May 2016 08:50:25 -0700
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

> > ...anyway, the real reason I'm a bit grumpy is because there are solid
> > engineering reasons why users *want* this API,

Honestly, I am lost in the math -- but like any good engineer, I want to accomplish something anyway :-) I trust you guys to get this right -- or at least document what's "wrong" with it.

But, if I'm reading the use case that started all this correctly, it closely matches my use-case. That is, I have a complex model with multiple independent "random" processes. And we want to be able to reproduce simulations EXACTLY -- our users get confused when the results are "different" even if in a statistically insignificant way.

At the moment we are using one RNG, with one seed for everything. So we get reproducible results, but if one thing is changed, then the entire simulation is different -- which is OK, but it would be nicer to have each process using its own RNG stream with its own seed. However, it matters not one whit if those seeds are independent -- the processes are different, you'd never notice if they were using the same PRN stream -- because they are used differently. So a "fairly low probability of a clash" would be totally fine. Granted, in a Monte Carlo simulation, it could be disastrous... :-)

I guess the point is -- do something reasonable, and document its limitations, and we're all fine :-)

And thanks for giving your attention to this.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com  Wed May 18 12:01:44 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Wed, 18 May 2016 17:01:44 +0100
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote:
>>
>> > ...anyway, the real reason I'm a bit grumpy is because there are solid
>> > engineering reasons why users *want* this API,
>
> Honestly, I am lost in the math -- but like any good engineer, I want to accomplish something anyway :-) I trust you guys to get this right -- or at least document what's "wrong" with it.
>
> But, if I'm reading the use case that started all this correctly, it closely matches my use-case.
> That is, I have a complex model with multiple independent "random" processes. And we want to be able to reproduce simulations EXACTLY -- our users get confused when the results are "different" even if in a statistically insignificant way.
>
> At the moment we are using one RNG, with one seed for everything. So we get reproducible results, but if one thing is changed, then the entire simulation is different -- which is OK, but it would be nicer to have each process using its own RNG stream with its own seed. However, it matters not one whit if those seeds are independent -- the processes are different, you'd never notice if they were using the same PRN stream -- because they are used differently. So a "fairly low probability of a clash" would be totally fine.

Well, the main question is: do you need to be able to spawn dependent streams at arbitrary points to an arbitrary depth without coordination between processes? The necessity for multiple independent streams per se is not contentious.

--
Robert Kern

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From josef.pktd at gmail.com  Wed May 18 13:20:57 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 18 May 2016 13:20:57 -0400
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: 
References: 
Message-ID: 

On Wed, May 18, 2016 at 12:01 PM, Robert Kern wrote:

> On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote:
> >>
> >> > ...anyway, the real reason I'm a bit grumpy is because there are solid
> >> > engineering reasons why users *want* this API,
> >
> > Honestly, I am lost in the math -- but like any good engineer, I want to accomplish something anyway :-) I trust you guys to get this right -- or at least document what's "wrong" with it.
> >
> > But, if I'm reading the use case that started all this correctly, it closely matches my use-case. That is, I have a complex model with multiple independent "random" processes. And we want to be able to reproduce simulations EXACTLY -- our users get confused when the results are "different" even if in a statistically insignificant way.
> >
> > At the moment we are using one RNG, with one seed for everything. So we get reproducible results, but if one thing is changed, then the entire simulation is different -- which is OK, but it would be nicer to have each process using its own RNG stream with its own seed. However, it matters not one whit if those seeds are independent -- the processes are different, you'd never notice if they were using the same PRN stream -- because they are used differently. So a "fairly low probability of a clash" would be totally fine.
>
> Well, the main question is: do you need to be able to spawn dependent streams at arbitrary points to an arbitrary depth without coordination between processes? The necessity for multiple independent streams per se is not contentious.
>

I'm similar to Chris, and didn't try to figure out the details of what you are talking about.

However, if there are functions getting into numpy that help in using a best practice even if it's not bulletproof, then it's still better than homemade approaches.

If it gets in soon, then we can use it in a few years (given dependency lag). At that point there should be more distributed, nested simulation-based algorithms where we don't know in advance how far we have to go to get reliable numbers or convergence.

(But I don't see anything like that right now.)
Josef > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed May 18 13:56:50 2016 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 18 May 2016 18:56:50 +0100 Subject: [Numpy-discussion] Proposal: numpy.random.random_seed In-Reply-To: References: Message-ID: On Wed, May 18, 2016 at 6:20 PM, wrote: > > On Wed, May 18, 2016 at 12:01 PM, Robert Kern wrote: >> >> On Wed, May 18, 2016 at 4:50 PM, Chris Barker wrote: >> >> >> >> > ...anyway, the real reason I'm a bit grumpy is because there are solid >> >> > engineering reasons why users *want* this API, >> > >> > Honestly, I am lost in the math -- but like any good engineer, I want to accomplish something anyway :-) I trust you guys to get this right -- or at least document what's "wrong" with it. >> > >> > But, if I'm reading the use case that started all this correctly, it closely matches my use-case. That is, I have a complex model with multiple independent "random" processes. And we want to be able to re-produce EXACTLY simulations -- our users get confused when the results are "different" even if in a statistically insignificant way. >> > >> > At the moment we are using one RNG, with one seed for everything. So we get reproducible results, but if one thing is changed, then the entire simulation is different -- which is OK, but it would be nicer to have each process using its own RNG stream with it's own seed. However, it matters not one whit if those seeds are independent -- the processes are different, you'd never notice if they were using the same PRN stream -- because they are used differently. So a "fairly low probability of a clash" would be totally fine. >> >> Well, the main question is: do you need to be able to spawn dependent streams at arbitrary points to an arbitrary depth without coordination between processes? The necessity for multiple independent streams per se is not contentious. > > I'm similar to Chris, and didn't try to figure out the details of what you are talking about. > > However, if there are functions getting into numpy that help in using a best practice even if it's not bullet proof, then it's still better than home made approaches. > If it get's in soon, then we can use it in a few years (given dependency lag). At that point there should be more distributed, nested simulation based algorithms where we don't know in advance how far we have to go to get reliable numbers or convergence. > > (But I don't see anything like that right now.) Current best practice is to use PRNGs with settable streams (or fixed jumpahead for those PRNGs cursed to not have settable streams but blessed to have super-long periods). The way to get those into numpy is to help Kevin Sheppard finish: https://github.com/bashtage/ng-numpy-randomstate He's done nearly all of the hard work already. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Wed May 18 14:56:17 2016 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 18 May 2016 11:56:17 -0700 Subject: [Numpy-discussion] Proposal: numpy.random.random_seed In-Reply-To: References: Message-ID: On Wed, May 18, 2016 at 5:07 AM, Robert Kern wrote: > On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith wrote: >> >> On Tue, May 17, 2016 at 10:41 AM, Robert Kern >> wrote: >> > On Tue, May 17, 2016 at 6:24 PM, Nathaniel Smith wrote: >> >> >> >> On May 17, 2016 1:50 AM, "Robert Kern" wrote: >> >> > >> >> [...] >> >> > What you want is a function that returns many RandomState objects >> >> > that >> >> > are hopefully spread around the MT19937 space enough that they are >> >> > essentially independent (in the absence of true jumpahead). The >> >> > better >> >> > implementation of such a function would look something like this: >> >> > >> >> > def spread_out_prngs(n, root_prng=None): >> >> > if root_prng is None: >> >> > root_prng = np.random >> >> > elif not isinstance(root_prng, np.random.RandomState): >> >> > root_prng = np.random.RandomState(root_prng) >> >> > sprouted_prngs = [] >> >> > for i in range(n): >> >> > seed_array = root_prng.randint(1<<32, size=624) # >> >> > dtype=np.uint32 under 1.11 >> >> > sprouted_prngs.append(np.random.RandomState(seed_array)) >> >> > return spourted_prngs >> >> >> >> Maybe a nice way to encapsulate this in the RandomState interface would >> >> be >> >> a method RandomState.random_state() that generates and returns a new >> >> child >> >> RandomState. >> > >> > I disagree. This is a workaround in the absence of proper jumpahead or >> > guaranteed-independent streams. I would not encourage it. >> > >> >> > Internally, this generates seed arrays of about the size of the >> >> > MT19937 >> >> > state so make sure that you can access more of the state space. That >> >> > will at >> >> > least make the chance of collision tiny. And it can be easily >> >> > rewritten to >> >> > take advantage of one of the newer PRNGs that have true independent >> >> > streams: >> >> > >> >> > https://github.com/bashtage/ng-numpy-randomstate >> >> >> >> ... But unfortunately I'm not sure how to make my interface suggestion >> >> above work on top of one of these RNGs, because for >> >> RandomState.random_state >> >> you really want a tree of independent RNGs and the fancy new PRNGs only >> >> provide a single flat namespace :-/. And even more annoyingly, the tree >> >> API >> >> is actually a nicer API, because with a flat namespace you have to know >> >> up >> >> front about all possible RNGs your code will use, which is an >> >> unfortunate >> >> global coupling that makes it difficult to compose programs out of >> >> independent pieces, while the RandomState.random_state approach >> >> composes >> >> beautifully. Maybe there's some clever way to allocate a 64-bit >> >> namespace to >> >> make it look tree-like? I'm not sure 64 bits is really enough... >> > >> > MT19937 doesn't have a "tree" any more than the others. It's the same >> > flat >> > state space. You are just getting the illusion of a tree by hoping that >> > you >> > never collide. You ought to think about precisely the same global >> > coupling >> > issues with MT19937 as you do with guaranteed-independent streams. >> > Hope-and-prayer isn't really a substitute for properly engineering your >> > problem. It's just a moral hazard to promote this method to the main >> > API. >> >> Nonsense. 
>>
>> If your definition of "hope and prayer" includes assuming that we won't encounter a random collision in a 2**19937 state space, then literally all engineering is hope-and-prayer. A collision could happen, but if it does it's overwhelmingly more likely to happen because of a flaw in the mathematical analysis, or a bug in the implementation, or because random quantum fluctuations caused you and your program to suddenly be transported to a parallel world where 1 + 1 = 1, than that you just got unlucky with your random state. And all of these hazards apply equally to both MT19937 and more modern PRNGs.
>
> Granted.
>
>> ...anyway, the real reason I'm a bit grumpy is because there are solid engineering reasons why users *want* this API,
>
> I remain unconvinced on this mark. Grumpily.

Sorry for getting grumpy :-). The engineering reasons seem pretty obvious to me though? If you have any use case for independent streams at all, and you're writing code that's intended to live inside a library's abstraction barrier, then you need some way to choose your streams to avoid colliding with arbitrary other code that the end-user might assemble alongside yours as part of their final program. So AFAICT you have two options: either you need a "tree-style" API for allocating these streams, or else you need to add some explicit API to your library that lets the end-user control in detail which streams you use. Both are possible, but the latter is obviously undesirable if you can avoid it, since it breaks the abstraction barrier, making your library more complicated to use and harder to evolve.

>> so whether or not it turns out to be possible I think we should at least be allowed to have a discussion about whether there's some way to give it to them.
>
> I'm not shutting down discussion of the option. I *implemented* the option. I think that discussing whether it should be part of the main API is premature. There probably ought to be a paper or three out there supporting its safety and utility first. Let the utility function version flourish first.

OK -- I guess this particularly makes sense given how extra-tightly-constrained we currently are in fixing mistakes in np.random. But I feel like in the end the right place for this really is inside the RandomState interface, because the person implementing RandomState is the one best placed to understand (a) the gnarly technical details here, and (b) how those change depending on the particular PRNG in use. I don't want to end up with a bunch of subtly-buggy utility functions in non-specialist libraries like dask -- so we should be trying to help downstream users figure out how to actually get this into np.random?

>> It's not even 100% out of the question that we conclude that existing PRNGs are buggy because they don't take this use case into account -- it would be far from the first time that numpy found itself going beyond the limits of older numerical tools that weren't designed to build the kind of large composable systems that numpy gets used for.
>>
>> MT19937's state space is large enough that you could explicitly encode a "tree seed" into it, even if you don't trust the laws of probability -- e.g., you start with a RandomState with id [], then its children have id [0], [1], [2], ..., and their children have ids [0, 0], [0, 1], ..., [1, 0], ..., and you write these into the state (probably after sending them through some bit-mixing permutation), to guarantee non-collision.
> > Sure. Not entirely sure if that can be done without preallocating the > branching factor or depth, but I'm sure there's some fancy combinatoric > representation of an unbounded tree that could be exploited here. It seems > likely to me that such a thing could be done with the stream IDs of PRNGs > that support that. I'm pretty sure you do have to preallocate both the branching factor and depth, since getting the collision-free guarantee requires that across the universe of possible tree addresses, each state id gets used at most once -- and there are finitely many state ids. But as a practical matter, saying "you can only sprout up to 2**32 new states out of any given state, and you can only nest them 600 deep, and exceeding these bounds is an error" would still be "composable enough" for ~all practical purposes. >> Or if you do trust the laws of probability, then the >> randomly-sample-a-PRNG approach is not 100% out of the question even >> for more modern PRNGs. > > Tread carefully here. PRNGs are not RNGs. Using the root PRNG to generate N > new seeds for the same PRNG does not necessarily generate N good, > uncorrelated streams with high probability. Overlap is not the only factor > in play. Long-term correlations happen even in the case where the N streams > do not overlap, and the structure of the root PRNG could well generate N > such correlated seeds. > > http://www.adv-radio-sci.net/12/75/2014/ > > See section 4.4. It does look like one can get good MT19937 streams with > sequential integer seeds, when expanded by the standard sub-PRNG that we use > for that purpose (yay!). However, if you use a different (but otherwise > good-quality and general purpose) sub-PRNG to expand the key, you get > increasing numbers of failures as you increase the number of parallel > streams. It is not explored in the paper exactly why this is the case, but I > suspect that the similarity of the second sub-PRNG algorithm to that of the > MT itself is part of the problem. > > It is *likely* that the scheme I implemented will be okay (our array-seed > expansion algorithm is similar to the integer-seed expansion and *probably* > insulates the sprouted MTs from their MT generating source), but that > remains to be demonstrated. Right... I think the way to think about this is to split it into two pieces. First, there's the issue that correlated streams exist at all. That they do for MT is annoying and closely related to why we prefer other PRNGs these days. When doing the analysis of the collision probability of randomly sprouted generators, this effectively reduces the size of the space and increases the risk of collision. For MT this by itself isn't a big deal because the space is so ludicrously large, but for the more modern PRNGs with smaller state spaces you definitely want these correlated streams not to exist at all. Fortunately I believe this is a design criterion that modern PRNGs take into account, but yeah, one needs to check this. (This kind of issue is also why I have a fondness for PRNGs designed on the "get a bigger hammer" principle, like AES-CTR :-).) Second, there's the issue that that paper talks about, where *if* correlated streams exist, then the structure of your PRNG might be such that when sampling a new seed from an old PRNG, you're unreasonably likely to hit one of them. 
(The most obvious example of this would be if you use an archaic PRNG design that works by outputting its internal state and then mixing it -- if you use the output of this PRNG to seed a new PRNG then the two will be identical modulo some lag!) If your sprouting procedure is subject to this kind of phenomenon, then it totally invalidates the use of simple probability theory to analyze the chance of collisions. But, I ignored this in my analysis because it's definitely solvable :-). All we need is some deterministic function that we can apply to the output of our original PRNG that will give us back something "effectively random" to use as our sprouted seed, totally destroying all correlations. And this problem of destroying correlations is well-studied in the crypto world (basically destroying these correlations *is* the entire problem of cryptographic primitive design), so even if we have doubts about the specific initialization functions usually used with MT, we have available to us lots of primitives that we're *very* confident will do a *very* thorough job, and sprouting isn't a performance-sensitive operation, so it doesn't matter if it takes a little longer using some crypto hash instead of the classic MT seeding algorithm. And then the simple birthday calculations are appropriate again.

-n

--
Nathaniel J. Smith -- https://vorpus.org

From charlesr.harris at gmail.com  Wed May 18 17:09:36 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed, 18 May 2016 15:09:36 -0600
Subject: [Numpy-discussion] Scipy 2016 attending
Message-ID: 

Hi All,

Out of curiosity, who all here intends to be at Scipy 2016?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov  Wed May 18 18:02:24 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Wed, 18 May 2016 15:02:24 -0700
Subject: [Numpy-discussion] Scipy 2016 attending
In-Reply-To: 
References: 
Message-ID: 

I'll be there.

-CHB

On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote:

> Hi All,
>
> Out of curiosity, who all here intends to be at Scipy 2016?
>
> Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From njs at pobox.com  Wed May 18 18:03:31 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 18 May 2016 15:03:31 -0700
Subject: [Numpy-discussion] Scipy 2016 attending
In-Reply-To: 
References: 
Message-ID: 

Me too.

On Wed, May 18, 2016 at 3:02 PM, Chris Barker wrote:
> I'll be there.
>
> -CHB
>
>
> On Wed, May 18, 2016 at 2:09 PM, Charles R Harris
> wrote:
>>
>> Hi All,
>>
>> Out of curiosity, who all here intends to be at Scipy 2016?
>>
>> Chuck
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> --
>
> Christopher Barker, Ph.D.
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Nathaniel J. Smith -- https://vorpus.org From waterbug at pangalactic.us Wed May 18 19:04:15 2016 From: waterbug at pangalactic.us (Steve Waterbury) Date: Wed, 18 May 2016 19:04:15 -0400 Subject: [Numpy-discussion] Scipy 2016 attending In-Reply-To: References: Message-ID: <573CF4EF.4000306@pangalactic.us> Me 3! ;) Steve On 05/18/2016 06:03 PM, Nathaniel Smith wrote: > Me too. > > On Wed, May 18, 2016 at 3:02 PM, Chris Barker wrote: >> I'll be there. >> >> -CHB >> >> >> On Wed, May 18, 2016 at 2:09 PM, Charles R Harris >> wrote: >>> Hi All, >>> >>> Out of curiosity, who all here intends to be at Scipy 2016? >>> >>> Chuck >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> https://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Wed May 18 21:45:26 2016 From: rmay31 at gmail.com (Ryan May) Date: Wed, 18 May 2016 19:45:26 -0600 Subject: [Numpy-discussion] Scipy 2016 attending In-Reply-To: <573CF4EF.4000306@pangalactic.us> References: <573CF4EF.4000306@pangalactic.us> Message-ID: Yup. On Wed, May 18, 2016 at 5:04 PM, Steve Waterbury wrote: > Me 3! ;) > > Steve > > > On 05/18/2016 06:03 PM, Nathaniel Smith wrote: > > Me too. > > On Wed, May 18, 2016 at 3:02 PM, Chris Barker wrote: > > I'll be there. > > -CHB > > > On Wed, May 18, 2016 at 2:09 PM, Charles R Harris wrote: > > Hi All, > > Out of curiosity, who all here intends to be at Scipy 2016? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttps://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttps://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Ryan May -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From cgodshall at enthought.com Thu May 19 14:07:45 2016
From: cgodshall at enthought.com (Courtenay Godshall (Enthought))
Date: Thu, 19 May 2016 13:07:45 -0500
Subject: [Numpy-discussion] ANN: SciPy 2016 Conference (Scientific Computing with Python): Tutorials and Talks Announced
Message-ID: <010c01d1b1f9$58adc010$0a094030$@enthought.com>

**ANN: SciPy 2016 Conference (Scientific Computing with Python): Tutorials and Talks Announced**

We're excited to announce this year's accepted Talks & Posters and Tutorial Schedule! This year's 3 major talk tracks include Python in Data Science, High Performance Computing, and general Scientific Computing. Our six mini-symposia include: Earth and Space Science, Engineering, Medicine and Biology, Case Studies in Industry, Education, and Reproducibility. For tutorials, you can choose from 18 different SciPy tutorials, including a 1-day Software Carpentry Scientific Python course that assumes some programming experience but no Python knowledge, or a 2-day Software Carpentry Instructor Training. We hope you'll join us - early bird registration ENDS May 22, 2016. Register at: http://scipy2016.scipy.org/ehome/146062/332936/

About SciPy 2016

SciPy 2016, the 15th annual Scientific Computing with Python conference, will be held July 11-17, 2016 in Austin, Texas. SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conference brings together over 650 participants from industry, academia, and government to showcase their latest projects, learn from skilled users and developers, and collaborate on code development. The full program will consist of 2 days of tutorials (July 11-12), 3 days of talks (July 13-15), and 2 days of developer sprints (July 16-17). More info is available on the conference website at http://scipy2016.scipy.org (where you can sign up for the mailing list); or follow @scipyconf on Twitter.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Thu May 19 17:37:05 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 19 May 2016 15:37:05 -0600
Subject: [Numpy-discussion] Integers to integer powers
Message-ID:

Hi All,

There are currently several pull requests apropos integer arrays/scalars to integer powers and, because the area is messy and involves tradeoffs, I'd like to see some discussion here on the list before proceeding.

*Scalars in 1.10*

In [1]: 1 ** -1
Out[1]: 1.0

In [2]: int16(1) ** -1
Out[2]: 1

In [3]: int32(1) ** -1
Out[3]: 1

In [4]: int64(1) ** -1
Out[4]: 1.0

In [5]: 2 ** -1
Out[5]: 0.5

In [6]: int16(2) ** -1
Out[6]: 0

In [7]: int32(2) ** -1
Out[7]: 0

In [8]: int64(2) ** -1
Out[8]: 0.5

In [9]: 0 ** -1
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
in ()
----> 1 0 ** -1

ZeroDivisionError: 0.0 cannot be raised to a negative power

In [10]: int16(0) ** -1
/home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero encountered in power
  #!/usr/bin/python
/home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value encountered in power
  #!/usr/bin/python
Out[10]: -9223372036854775808

In [11]: int32(0) ** -1
Out[11]: -9223372036854775808

In [12]: int64(0) ** -1
/home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero encountered in long_scalars
  #!/usr/bin/python
Out[12]: inf

Proposed

- for non-zero numbers the return type should be float.
- for zero numbers a zero division error should be raised.

*Scalar Arrays in 1.10*

In [1]: array(1, dtype=int16) ** -1
Out[1]: 1

In [2]: array(1, dtype=int32) ** -1
Out[2]: 1

In [3]: array(1, dtype=int64) ** -1
Out[3]: 1

In [4]: array(2, dtype=int16) ** -1
Out[4]: 0

In [5]: array(2, dtype=int32) ** -1
Out[5]: 0

In [6]: array(2, dtype=int64) ** -1
Out[6]: 0

In [7]: array(0, dtype=int16) ** -1
/home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero encountered in power
  #!/usr/bin/python
/home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value encountered in power
  #!/usr/bin/python
Out[7]: -9223372036854775808

In [8]: array(0, dtype=int32) ** -1
Out[8]: -9223372036854775808

In [9]: array(0, dtype=int64) ** -1
Out[9]: -9223372036854775808

In [10]: type(array(1, dtype=int64) ** -1)
Out[10]: numpy.int64

In [11]: type(array(1, dtype=int32) ** -1)
Out[11]: numpy.int64

In [12]: type(array(1, dtype=int16) ** -1)
Out[12]: numpy.int64

Note that the return type is always int64 in all these cases. However, type is preserved in non-scalar arrays, although the value of int16 is not compatible with int32 and int64 for zero division.

In [22]: array([0]*2, dtype=int16) ** -1
Out[22]: array([0, 0], dtype=int16)

In [23]: array([0]*2, dtype=int32) ** -1
Out[23]: array([-2147483648, -2147483648], dtype=int32)

In [24]: array([0]*2, dtype=int64) ** -1
Out[24]: array([-9223372036854775808, -9223372036854775808])

Proposed:

- Raise a ZeroDivisionError for zero division, that is, in the ufunc.
- Scalar arrays to return scalar arrays

Thoughts?

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com Thu May 19 22:16:35 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 19 May 2016 22:16:35 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To:
References:
Message-ID:

On Thu, May 19, 2016 at 5:37 PM, Charles R Harris wrote:

> Hi All,
>
> There are currently several pull requests apropos integer arrays/scalars
> to integer powers and, because the area is messy and involves tradeoffs,
> I'd like to see some discussion here on the list before proceeding.
>
> *Scalars in 1.10*
>
> In [1]: 1 ** -1
> Out[1]: 1.0
>
> In [2]: int16(1) ** -1
> Out[2]: 1
>
> In [3]: int32(1) ** -1
> Out[3]: 1
>
> In [4]: int64(1) ** -1
> Out[4]: 1.0
>
> In [5]: 2 ** -1
> Out[5]: 0.5
>
> In [6]: int16(2) ** -1
> Out[6]: 0
>
> In [7]: int32(2) ** -1
> Out[7]: 0
>
> In [8]: int64(2) ** -1
> Out[8]: 0.5
>
> In [9]: 0 ** -1
> ---------------------------------------------------------------------------
> ZeroDivisionError Traceback (most recent call last)
> in ()
> ----> 1 0 ** -1
>
> ZeroDivisionError: 0.0 cannot be raised to a negative power
>
> In [10]: int16(0) ** -1
> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
> encountered in power
> #!/usr/bin/python
> /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value
> encountered in power
> #!/usr/bin/python
> Out[10]: -9223372036854775808
>
> In [11]: int32(0) ** -1
> Out[11]: -9223372036854775808
>
> In [12]: int64(0) ** -1
> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
> encountered in long_scalars
> #!/usr/bin/python
> Out[12]: inf
>
> Proposed
>
> - for non-zero numbers the return type should be float.
> - for zero numbers a zero division error should be raised.
> > > > > *Scalar Arrays in 1.10* > In [1]: array(1, dtype=int16) ** -1 > Out[1]: 1 > > In [2]: array(1, dtype=int32) ** -1 > Out[2]: 1 > > In [3]: array(1, dtype=int64) ** -1 > Out[3]: 1 > > In [4]: array(2, dtype=int16) ** -1 > Out[4]: 0 > > In [5]: array(2, dtype=int32) ** -1 > Out[5]: 0 > > In [6]: array(2, dtype=int64) ** -1 > Out[6]: 0 > > In [7]: array(0, dtype=int16) ** -1 > /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero > encountered in power > #!/usr/bin/python > /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value > encountered in power > #!/usr/bin/python > Out[7]: -9223372036854775808 > > In [8]: array(0, dtype=int32) ** -1 > Out[8]: -9223372036854775808 > > In [9]: array(0, dtype=int64) ** -1 > Out[9]: -9223372036854775808 > > In [10]: type(array(1, dtype=int64) ** -1) > Out[10]: numpy.int64 > > In [11]: type(array(1, dtype=int32) ** -1) > Out[11]: numpy.int64 > > In [12]: type(array(1, dtype=int16) ** -1) > Out[12]: numpy.int64 > > Note that the return type is always int64 in all these cases. However, > type is preserved in non-scalar arrays, although the value of int16 is not > compatible with int32 and int64 for zero division. > > In [22]: array([0]*2, dtype=int16) ** -1 > Out[22]: array([0, 0], dtype=int16) > > In [23]: array([0]*2, dtype=int32) ** -1 > Out[23]: array([-2147483648, -2147483648], dtype=int32) > > In [24]: array([0]*2, dtype=int64) ** -1 > Out[24]: array([-9223372036854775808, -9223372036854775808]) > > Proposed: > > - Raise an ZeroDivisionError for zero division, that is, in the ufunc. > - Scalar arrays to return scalar arrays > > > Thoughts? > Why does negative exponent not upcast to float like division? sounds like python 2 to me Josef > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu May 19 22:17:54 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 19 May 2016 22:17:54 -0400 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: Message-ID: On Thu, May 19, 2016 at 10:16 PM, wrote: > > > On Thu, May 19, 2016 at 5:37 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> There are currently several pull requests apropos integer arrays/scalars >> to integer powers and, because the area is messy and involves tradeoffs, >> I'd like to see some discussion here on the list before proceeding. 
>> >> *Scalars in 1.10* >> >> In [1]: 1 ** -1 >> Out[1]: 1.0 >> >> In [2]: int16(1) ** -1 >> Out[2]: 1 >> >> In [3]: int32(1) ** -1 >> Out[3]: 1 >> >> In [4]: int64(1) ** -1 >> Out[4]: 1.0 >> >> In [5]: 2 ** -1 >> Out[5]: 0.5 >> >> In [6]: int16(2) ** -1 >> Out[6]: 0 >> >> In [7]: int32(2) ** -1 >> Out[7]: 0 >> >> In [8]: int64(2) ** -1 >> Out[8]: 0.5 >> >> In [9]: 0 ** -1 >> >> --------------------------------------------------------------------------- >> ZeroDivisionError Traceback (most recent call >> last) >> in () >> ----> 1 0 ** -1 >> >> ZeroDivisionError: 0.0 cannot be raised to a negative power >> >> In [10]: int16(0) ** -1 >> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero >> encountered in power >> #!/usr/bin/python >> /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value >> encountered in power >> #!/usr/bin/python >> Out[10]: -9223372036854775808 >> >> In [11]: int32(0) ** -1 >> Out[11]: -9223372036854775808 >> >> In [12]: int64(0) ** -1 >> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero >> encountered in long_scalars >> #!/usr/bin/python >> Out[12]: inf >> >> Proposed >> >> - for non-zero numbers the return type should be float. >> - for zero numbers a zero division error should be raised. >> >> >> >> >> *Scalar Arrays in 1.10* >> In [1]: array(1, dtype=int16) ** -1 >> Out[1]: 1 >> >> In [2]: array(1, dtype=int32) ** -1 >> Out[2]: 1 >> >> In [3]: array(1, dtype=int64) ** -1 >> Out[3]: 1 >> >> In [4]: array(2, dtype=int16) ** -1 >> Out[4]: 0 >> >> In [5]: array(2, dtype=int32) ** -1 >> Out[5]: 0 >> >> In [6]: array(2, dtype=int64) ** -1 >> Out[6]: 0 >> >> In [7]: array(0, dtype=int16) ** -1 >> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero >> encountered in power >> #!/usr/bin/python >> /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value >> encountered in power >> #!/usr/bin/python >> Out[7]: -9223372036854775808 >> >> In [8]: array(0, dtype=int32) ** -1 >> Out[8]: -9223372036854775808 >> >> In [9]: array(0, dtype=int64) ** -1 >> Out[9]: -9223372036854775808 >> >> In [10]: type(array(1, dtype=int64) ** -1) >> Out[10]: numpy.int64 >> >> In [11]: type(array(1, dtype=int32) ** -1) >> Out[11]: numpy.int64 >> >> In [12]: type(array(1, dtype=int16) ** -1) >> Out[12]: numpy.int64 >> >> Note that the return type is always int64 in all these cases. However, >> type is preserved in non-scalar arrays, although the value of int16 is not >> compatible with int32 and int64 for zero division. >> >> In [22]: array([0]*2, dtype=int16) ** -1 >> Out[22]: array([0, 0], dtype=int16) >> >> In [23]: array([0]*2, dtype=int32) ** -1 >> Out[23]: array([-2147483648, -2147483648], dtype=int32) >> >> In [24]: array([0]*2, dtype=int64) ** -1 >> Out[24]: array([-9223372036854775808, -9223372036854775808]) >> >> Proposed: >> >> - Raise an ZeroDivisionError for zero division, that is, in the ufunc. >> - Scalar arrays to return scalar arrays >> >> >> Thoughts? >> > Why does negative exponent not upcast to float like division? > sounds like python 2 to me > from __future__ import negative_power Josef > > Josef > > > >> Chuck >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From njs at pobox.com Thu May 19 23:30:40 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 19 May 2016 20:30:40 -0700
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To:
References:
Message-ID:

So I guess what makes this tricky is that:

- We want the behavior to be the same for multiple-element arrays,
single-element arrays, zero-dimensional arrays, and scalars -- the
shape of the data shouldn't affect the semantics of **

- We also want the numpy scalar behavior to match the Python scalar
behavior

- For Python scalars, int ** (positive int) returns an int, but int **
(negative int) returns a float.

- For arrays, int ** (positive int) and int ** (negative int) _have_
to return the same type, because in general output types are always a
function of the input types and *can't* look at the specific values
involved, and in specific because if you do array([2, 3]) ** array([2,
-2]) you can't return an array where the first element is int and the
second is float.

Given these immutable and contradictory constraints, the least bad
option IMHO would be that we make int ** (negative int) an error in
all cases, and the error message can suggest that instead of writing

    np.array(2) ** -2

they should instead write

    np.array(2) ** -2.0

(And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.)

Definitely annoying, but all the other options seem even more
inconsistent and confusing, and likely to encourage the writing of
subtly buggy code...

(I especially have in mind numpy's habit of silently switching between
scalars and zero-dimensional arrays -- so it's easy to write code that
you think handles arbitrary array dimensions, and it even passes all
your tests, but then it fails when someone passes in a different shape
data and triggers some scalar/array inconsistency. E.g. if we make **
-2 work for scalars but not arrays, then this code:

    def f(arr):
        return np.sum(arr, axis=0) ** -2

works as expected for 1-d input, tests pass, everyone's happy... but
as soon as you try to pass in higher dimensional integer input it will
fail.)

-n

On Thu, May 19, 2016 at 2:37 PM, Charles R Harris wrote:
> Hi All,
>
> There are currently several pull requests apropos integer arrays/scalars to
> integer powers and, because the area is messy and involves tradeoffs, I'd
> like to see some discussion here on the list before proceeding.
>
> Scalars in 1.10
>
> In [1]: 1 ** -1
> Out[1]: 1.0
>
> In [2]: int16(1) ** -1
> Out[2]: 1
>
> In [3]: int32(1) ** -1
> Out[3]: 1
>
> In [4]: int64(1) ** -1
> Out[4]: 1.0
>
> In [5]: 2 ** -1
> Out[5]: 0.5
>
> In [6]: int16(2) ** -1
> Out[6]: 0
>
> In [7]: int32(2) ** -1
> Out[7]: 0
>
> In [8]: int64(2) ** -1
> Out[8]: 0.5
>
> In [9]: 0 ** -1
> ---------------------------------------------------------------------------
> ZeroDivisionError Traceback (most recent call last)
> in ()
> ----> 1 0 ** -1
>
> ZeroDivisionError: 0.0 cannot be raised to a negative power
>
> In [10]: int16(0) ** -1
> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
> encountered in power
> #!/usr/bin/python
> /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value
> encountered in power
> #!/usr/bin/python
> Out[10]: -9223372036854775808
>
> In [11]: int32(0) ** -1
> Out[11]: -9223372036854775808
>
> In [12]: int64(0) ** -1
> /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero
> encountered in long_scalars
> #!/usr/bin/python
> Out[12]: inf
>
> Proposed
>
> for non-zero numbers the return type should be float.
> for zero numbers a zero division error should be raised. > > > Scalar Arrays in 1.10 > > In [1]: array(1, dtype=int16) ** -1 > Out[1]: 1 > > In [2]: array(1, dtype=int32) ** -1 > Out[2]: 1 > > In [3]: array(1, dtype=int64) ** -1 > Out[3]: 1 > > In [4]: array(2, dtype=int16) ** -1 > Out[4]: 0 > > In [5]: array(2, dtype=int32) ** -1 > Out[5]: 0 > > In [6]: array(2, dtype=int64) ** -1 > Out[6]: 0 > > In [7]: array(0, dtype=int16) ** -1 > /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero > encountered in power > #!/usr/bin/python > /home/charris/.local/bin/ipython:1: RuntimeWarning: invalid value > encountered in power > #!/usr/bin/python > Out[7]: -9223372036854775808 > > In [8]: array(0, dtype=int32) ** -1 > Out[8]: -9223372036854775808 > > In [9]: array(0, dtype=int64) ** -1 > Out[9]: -9223372036854775808 > > In [10]: type(array(1, dtype=int64) ** -1) > Out[10]: numpy.int64 > > In [11]: type(array(1, dtype=int32) ** -1) > Out[11]: numpy.int64 > > In [12]: type(array(1, dtype=int16) ** -1) > Out[12]: numpy.int64 > > Note that the return type is always int64 in all these cases. However, type > is preserved in non-scalar arrays, although the value of int16 is not > compatible with int32 and int64 for zero division. > > In [22]: array([0]*2, dtype=int16) ** -1 > Out[22]: array([0, 0], dtype=int16) > > In [23]: array([0]*2, dtype=int32) ** -1 > Out[23]: array([-2147483648, -2147483648], dtype=int32) > > In [24]: array([0]*2, dtype=int64) ** -1 > Out[24]: array([-9223372036854775808, -9223372036854775808]) > > Proposed: > > Raise an ZeroDivisionError for zero division, that is, in the ufunc. > Scalar arrays to return scalar arrays > > > Thoughts? > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Fri May 20 14:35:47 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 20 May 2016 12:35:47 -0600 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: Message-ID: On Thu, May 19, 2016 at 9:30 PM, Nathaniel Smith wrote: > So I guess what makes this tricky is that: > > - We want the behavior to the same for multiple-element arrays, > single-element arrays, zero-dimensional arrays, and scalars -- the > shape of the data shouldn't affect the semantics of ** > > - We also want the numpy scalar behavior to match the Python scalar > behavior > > - For Python scalars, int ** (positive int) returns an int, but int ** > (negative int) returns a float. > > - For arrays, int ** (positive int) and int ** (negative int) _have_ > to return the same type, because in general output types are always a > function of the input types and *can't* look at the specific values > involved, and in specific because if you do array([2, 3]) ** array([2, > -2]) you can't return an array where the first element is int and the > second is float. > > Given these immutable and contradictory constraints, the last bad > option IMHO would be that we make int ** (negative int) an error in > all cases, and the error message can suggest that instead of writing > > np.array(2) ** -2 > > they should instead write > > np.array(2) ** -2.0 > > (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.) > > Definitely annoying, but all the other options seem even more > inconsistent and confusing, and likely to encourage the writing of > subtly buggy code... 
>
> (I especially have in mind numpy's habit of silently switching between
> scalars and zero-dimensional arrays -- so it's easy to write code that
> you think handles arbitrary array dimensions, and it even passes all
> your tests, but then it fails when someone passes in a different shape
> data and triggers some scalar/array inconsistency. E.g. if we make **
> -2 work for scalars but not arrays, then this code:
>
> def f(arr):
> return np.sum(arr, axis=0) ** -2
>
> works as expected for 1-d input, tests pass, everyone's happy... but
> as soon as you try to pass in higher dimensional integer input it will
> fail.)
>

Hmm, the Alexandrian solution. The main difficulty with this solution is that it will likely break working code. We could try it, or take the safe route of raising a (Visible)DeprecationWarning. The other option is to simply treat the negative power case uniformly as floor division and raise an error on zero division, but the difference from Python power would be highly confusing. I think I would vote for the second option with a DeprecationWarning.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From charlesr.harris at gmail.com Fri May 20 14:48:29 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 20 May 2016 12:48:29 -0600
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To:
References:
Message-ID:

On Fri, May 20, 2016 at 12:35 PM, Charles R Harris <
charlesr.harris at gmail.com> wrote:

>
>
> On Thu, May 19, 2016 at 9:30 PM, Nathaniel Smith wrote:
>
>> So I guess what makes this tricky is that:
>>
>> - We want the behavior to the same for multiple-element arrays,
>> single-element arrays, zero-dimensional arrays, and scalars -- the
>> shape of the data shouldn't affect the semantics of **
>>
>> - We also want the numpy scalar behavior to match the Python scalar
>> behavior
>>
>> - For Python scalars, int ** (positive int) returns an int, but int **
>> (negative int) returns a float.
>>
>> - For arrays, int ** (positive int) and int ** (negative int) _have_
>> to return the same type, because in general output types are always a
>> function of the input types and *can't* look at the specific values
>> involved, and in specific because if you do array([2, 3]) ** array([2,
>> -2]) you can't return an array where the first element is int and the
>> second is float.
>>
>> Given these immutable and contradictory constraints, the last bad
>> option IMHO would be that we make int ** (negative int) an error in
>> all cases, and the error message can suggest that instead of writing
>>
>> np.array(2) ** -2
>>
>> they should instead write
>>
>> np.array(2) ** -2.0
>>
>> (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.)
>>
>> Definitely annoying, but all the other options seem even more
>> inconsistent and confusing, and likely to encourage the writing of
>> subtly buggy code...
>>
>> (I especially have in mind numpy's habit of silently switching between
>> scalars and zero-dimensional arrays -- so it's easy to write code that
>> you think handles arbitrary array dimensions, and it even passes all
>> your tests, but then it fails when someone passes in a different shape
>> data and triggers some scalar/array inconsistency. E.g. if we make **
>> -2 work for scalars but not arrays, then this code:
>>
>> def f(arr):
>> return np.sum(arr, axis=0) ** -2
>>
>> works as expected for 1-d input, tests pass, everyone's happy...
but >> as soon as you try to pass in higher dimensional integer input it will >> fail.) >> >> > Hmm, the Alexandrian solution. The main difficulty with this solution that > this will likely to break working code. We could try it, or take the safe > route of raising a (Visible)DeprecationWarning. The other option is to > simply treat the negative power case uniformly as floor division and raise > an error on zero division, but the difference from Python power would be > highly confusing. I think I would vote for the second option with a > DeprecationWarning. > > I suspect that the different behavior of int64 on my system is due to inheritance from Python 2.7 int In [1]: isinstance(int64(1), int) Out[1]: True That different behavior is also carried over for Python 3. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri May 20 15:15:55 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 20 May 2016 12:15:55 -0700 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: Message-ID: On Fri, May 20, 2016 at 11:35 AM, Charles R Harris wrote: > > > On Thu, May 19, 2016 at 9:30 PM, Nathaniel Smith wrote: >> >> So I guess what makes this tricky is that: >> >> - We want the behavior to the same for multiple-element arrays, >> single-element arrays, zero-dimensional arrays, and scalars -- the >> shape of the data shouldn't affect the semantics of ** >> >> - We also want the numpy scalar behavior to match the Python scalar >> behavior >> >> - For Python scalars, int ** (positive int) returns an int, but int ** >> (negative int) returns a float. >> >> - For arrays, int ** (positive int) and int ** (negative int) _have_ >> to return the same type, because in general output types are always a >> function of the input types and *can't* look at the specific values >> involved, and in specific because if you do array([2, 3]) ** array([2, >> -2]) you can't return an array where the first element is int and the >> second is float. >> >> Given these immutable and contradictory constraints, the last bad >> option IMHO would be that we make int ** (negative int) an error in >> all cases, and the error message can suggest that instead of writing >> >> np.array(2) ** -2 >> >> they should instead write >> >> np.array(2) ** -2.0 >> >> (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.) >> >> Definitely annoying, but all the other options seem even more >> inconsistent and confusing, and likely to encourage the writing of >> subtly buggy code... >> >> (I especially have in mind numpy's habit of silently switching between >> scalars and zero-dimensional arrays -- so it's easy to write code that >> you think handles arbitrary array dimensions, and it even passes all >> your tests, but then it fails when someone passes in a different shape >> data and triggers some scalar/array inconsistency. E.g. if we make ** >> -2 work for scalars but not arrays, then this code: >> >> def f(arr): >> return np.sum(arr, axis=0) ** -2 >> >> works as expected for 1-d input, tests pass, everyone's happy... but >> as soon as you try to pass in higher dimensional integer input it will >> fail.) >> > > Hmm, the Alexandrian solution. The main difficulty with this solution that > this will likely to break working code. We could try it, or take the safe > route of raising a (Visible)DeprecationWarning. Right, sorry, I was talking about the end goal -- there's a separate question of how we get there. 
Pretty much any solution is going to require some sort of deprecation cycle though I guess, and at least the deprecate -> error transition is a lot easier than the working -> working different transition. > The other option is to > simply treat the negative power case uniformly as floor division and raise > an error on zero division, but the difference from Python power would be > highly confusing. I think I would vote for the second option with a > DeprecationWarning. So "floor division" here would mean that k ** -n == 0 for all k and n except for k == 1, right? In addition to the consistency issue, that doesn't seem like a behavior that's very useful to anyone... -n -- Nathaniel J. Smith -- https://vorpus.org From charlesr.harris at gmail.com Fri May 20 15:23:39 2016 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 20 May 2016 13:23:39 -0600 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: Message-ID: On Fri, May 20, 2016 at 1:15 PM, Nathaniel Smith wrote: > On Fri, May 20, 2016 at 11:35 AM, Charles R Harris > wrote: > > > > > > On Thu, May 19, 2016 at 9:30 PM, Nathaniel Smith wrote: > >> > >> So I guess what makes this tricky is that: > >> > >> - We want the behavior to the same for multiple-element arrays, > >> single-element arrays, zero-dimensional arrays, and scalars -- the > >> shape of the data shouldn't affect the semantics of ** > >> > >> - We also want the numpy scalar behavior to match the Python scalar > >> behavior > >> > >> - For Python scalars, int ** (positive int) returns an int, but int ** > >> (negative int) returns a float. > >> > >> - For arrays, int ** (positive int) and int ** (negative int) _have_ > >> to return the same type, because in general output types are always a > >> function of the input types and *can't* look at the specific values > >> involved, and in specific because if you do array([2, 3]) ** array([2, > >> -2]) you can't return an array where the first element is int and the > >> second is float. > >> > >> Given these immutable and contradictory constraints, the last bad > >> option IMHO would be that we make int ** (negative int) an error in > >> all cases, and the error message can suggest that instead of writing > >> > >> np.array(2) ** -2 > >> > >> they should instead write > >> > >> np.array(2) ** -2.0 > >> > >> (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.) > >> > >> Definitely annoying, but all the other options seem even more > >> inconsistent and confusing, and likely to encourage the writing of > >> subtly buggy code... > >> > >> (I especially have in mind numpy's habit of silently switching between > >> scalars and zero-dimensional arrays -- so it's easy to write code that > >> you think handles arbitrary array dimensions, and it even passes all > >> your tests, but then it fails when someone passes in a different shape > >> data and triggers some scalar/array inconsistency. E.g. if we make ** > >> -2 work for scalars but not arrays, then this code: > >> > >> def f(arr): > >> return np.sum(arr, axis=0) ** -2 > >> > >> works as expected for 1-d input, tests pass, everyone's happy... but > >> as soon as you try to pass in higher dimensional integer input it will > >> fail.) > >> > > > > Hmm, the Alexandrian solution. The main difficulty with this solution > that > > this will likely to break working code. We could try it, or take the safe > > route of raising a (Visible)DeprecationWarning. 
>
> Right, sorry, I was talking about the end goal -- there's a separate
> question of how we get there. Pretty much any solution is going to
> require some sort of deprecation cycle though I guess, and at least
> the deprecate -> error transition is a lot easier than the working ->
> working different transition.
>
> > The other option is to
> > simply treat the negative power case uniformly as floor division and
> raise
> > an error on zero division, but the difference from Python power would be
> > highly confusing. I think I would vote for the second option with a
> > DeprecationWarning.
>
> So "floor division" here would mean that k ** -n == 0 for all k and n
> except for k == 1, right? In addition to the consistency issue, that
> doesn't seem like a behavior that's very useful to anyone...
>

And -1 as well. The virtue is consistency while deprecating. Or we could just back out the current changes in master and throw in deprecation warnings. That has the virtue of simplicity and not introducing possible code breaks.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From josef.pktd at gmail.com Fri May 20 15:44:52 2016
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Fri, 20 May 2016 15:44:52 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To:
References:
Message-ID:

On Fri, May 20, 2016 at 3:23 PM, Charles R Harris wrote:

>
>
> On Fri, May 20, 2016 at 1:15 PM, Nathaniel Smith wrote:
>
>> On Fri, May 20, 2016 at 11:35 AM, Charles R Harris
>> wrote:
>> >
>> >
>> > On Thu, May 19, 2016 at 9:30 PM, Nathaniel Smith wrote:
>> >>
>> >> So I guess what makes this tricky is that:
>> >>
>> >> - We want the behavior to the same for multiple-element arrays,
>> >> single-element arrays, zero-dimensional arrays, and scalars -- the
>> >> shape of the data shouldn't affect the semantics of **
>> >>
>> >> - We also want the numpy scalar behavior to match the Python scalar
>> >> behavior
>> >>
>> >> - For Python scalars, int ** (positive int) returns an int, but int **
>> >> (negative int) returns a float.
>> >>
>> >> - For arrays, int ** (positive int) and int ** (negative int) _have_
>> >> to return the same type, because in general output types are always a
>> >> function of the input types and *can't* look at the specific values
>> >> involved, and in specific because if you do array([2, 3]) ** array([2,
>> >> -2]) you can't return an array where the first element is int and the
>> >> second is float.
>> >>
>> >> Given these immutable and contradictory constraints, the last bad
>> >> option IMHO would be that we make int ** (negative int) an error in
>> >> all cases, and the error message can suggest that instead of writing
>> >>
>> >> np.array(2) ** -2
>> >>
>> >> they should instead write
>> >>
>> >> np.array(2) ** -2.0
>> >>
>> >> (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.)
>> >>
>> >> Definitely annoying, but all the other options seem even more
>> >> inconsistent and confusing, and likely to encourage the writing of
>> >> subtly buggy code...
>> >>
>> >> (I especially have in mind numpy's habit of silently switching between
>> >> scalars and zero-dimensional arrays -- so it's easy to write code that
>> >> you think handles arbitrary array dimensions, and it even passes all
>> >> your tests, but then it fails when someone passes in a different shape
>> >> data and triggers some scalar/array inconsistency. E.g.
if we make ** >> >> -2 work for scalars but not arrays, then this code: >> >> >> >> def f(arr): >> >> return np.sum(arr, axis=0) ** -2 >> >> >> >> works as expected for 1-d input, tests pass, everyone's happy... but >> >> as soon as you try to pass in higher dimensional integer input it will >> >> fail.) >> >> >> > >> > Hmm, the Alexandrian solution. The main difficulty with this solution >> that >> > this will likely to break working code. We could try it, or take the >> safe >> > route of raising a (Visible)DeprecationWarning. >> >> Right, sorry, I was talking about the end goal -- there's a separate >> question of how we get there. Pretty much any solution is going to >> require some sort of deprecation cycle though I guess, and at least >> the deprecate -> error transition is a lot easier than the working -> >> working different transition. >> >> > The other option is to >> > simply treat the negative power case uniformly as floor division and >> raise >> > an error on zero division, but the difference from Python power would be >> > highly confusing. I think I would vote for the second option with a >> > DeprecationWarning. >> >> So "floor division" here would mean that k ** -n == 0 for all k and n >> except for k == 1, right? In addition to the consistency issue, that >> doesn't seem like a behavior that's very useful to anyone... >> > > And -1 as well. The virtue is consistancy while deprecating. Or we could > just back out the current changes in master and throw in deprecation > warnings. That has the virtue of simplicity and not introducing possible > code breaks. > can numpy cast to float by default for power or **? At least then we always get correct numbers. Are there dominant usecases that require default return dtype int? AFAICS, it's always possible to choose the return dtype in np.power. Josef > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Fri May 20 16:22:05 2016 From: alan.isaac at gmail.com (Alan Isaac) Date: Fri, 20 May 2016 16:22:05 -0400 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: Message-ID: <61702e3a-b62e-8d87-ab58-d664b573aea3@gmail.com> On 5/19/2016 11:30 PM, Nathaniel Smith wrote: > the last bad > option IMHO would be that we make int ** (negative int) an error in > all cases, and the error message can suggest that instead of writing > > np.array(2) ** -2 > > they should instead write > > np.array(2) ** -2.0 > > (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.) Fwiw, Haskell has three exponentiation operators because of such ambiguities. I don't use C, but I think the contrasting decision there was to always return a double, which has a clear attraction since for any fixed-width integral type, most of the possible input pairs overflow the type. My core inclination would be to use (what I understand to be) the C convention that integer exponentiation always produces a double, but to support dtype-specific exponentiation with a function. But this is just a user's perspective. 
Cheers, Alan Isaac From warren.weckesser at gmail.com Fri May 20 16:27:19 2016 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Fri, 20 May 2016 16:27:19 -0400 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: <61702e3a-b62e-8d87-ab58-d664b573aea3@gmail.com> References: <61702e3a-b62e-8d87-ab58-d664b573aea3@gmail.com> Message-ID: On Fri, May 20, 2016 at 4:22 PM, Alan Isaac wrote: > On 5/19/2016 11:30 PM, Nathaniel Smith wrote: > >> the last bad >> option IMHO would be that we make int ** (negative int) an error in >> all cases, and the error message can suggest that instead of writing >> >> np.array(2) ** -2 >> >> they should instead write >> >> np.array(2) ** -2.0 >> >> (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.) >> > > > > Fwiw, Haskell has three exponentiation operators > because of such ambiguities. I don't use C, but > I think the contrasting decision there was to > always return a double, which has a clear attraction > since for any fixed-width integral type, most of the > possible input pairs overflow the type. > > My core inclination would be to use (what I understand to be) > the C convention that integer exponentiation always produces > a double, but to support dtype-specific exponentiation with > a function. C doesn't have an exponentiation operator. The C math library has pow, powf and powl, which (like any C functions) are explicitly typed. Warren But this is just a user's perspective. > > Cheers, > Alan Isaac > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 20 16:35:05 2016 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 20 May 2016 16:35:05 -0400 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: <61702e3a-b62e-8d87-ab58-d664b573aea3@gmail.com> Message-ID: On Fri, May 20, 2016 at 4:27 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > > On Fri, May 20, 2016 at 4:22 PM, Alan Isaac wrote: > >> On 5/19/2016 11:30 PM, Nathaniel Smith wrote: >> >>> the last bad >>> option IMHO would be that we make int ** (negative int) an error in >>> all cases, and the error message can suggest that instead of writing >>> >>> np.array(2) ** -2 >>> >>> they should instead write >>> >>> np.array(2) ** -2.0 >>> >>> (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.) >>> >> >> >> >> Fwiw, Haskell has three exponentiation operators >> because of such ambiguities. I don't use C, but >> I think the contrasting decision there was to >> always return a double, which has a clear attraction >> since for any fixed-width integral type, most of the >> possible input pairs overflow the type. >> >> My core inclination would be to use (what I understand to be) >> the C convention that integer exponentiation always produces >> a double, but to support dtype-specific exponentiation with >> a function. > > > > C doesn't have an exponentiation operator. The C math library has pow, > powf and powl, which (like any C functions) are explicitly typed. > > Warren > > > But this is just a user's perspective. 
>>
>> Cheers,
>> Alan Isaac
>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

Another question: uints are positive, so the division problem doesn't show up there. So that could still handle a use case for ints.

I'm coming down more strongly in favor of float, because raising an exception (or worse, returning nonsense) on half of the parameter space sounds ... (maybe kind of silly)

Josef
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From m.h.vankerkwijk at gmail.com Fri May 20 16:44:43 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Fri, 20 May 2016 16:44:43 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To:
References:
Message-ID:

Hi All,

As a final desired state, always returning float seems the best idea. It seems quite similar to division in this regard, where integer division works for some values, but not for all. This means not being quite consistent with python, but as Nathaniel pointed out, one cannot have value-dependent dtypes for arrays (and scalars should indeed behave the same way). If so, then like for division, in-place power should raise a `TypeError`. Obviously, one could have a specific function for integers (like `//` for division) for cases where it is really needed.

Now, how to get there... Maybe we can learn from division? At least, I guess at some point `np.divide` became equivalent to `np.true_divide`?

All the best,

Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From alan.isaac at gmail.com Fri May 20 17:33:20 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Fri, 20 May 2016 17:33:20 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To:
References:
Message-ID: <7815f2b0-1049-115c-c205-d8894a876397@gmail.com>

Yes, I was referring to `pow`, but I had in mind the C++ version, which is overloaded:
http://www.cplusplus.com/reference/cmath/pow/

Cheers,
Alan

On 5/20/2016 4:27 PM, Warren Weckesser wrote:
> C doesn't have an exponentiation operator. The C math library has pow, powf and powl, which (like any C functions) are explicitly typed.

From njs at pobox.com Fri May 20 18:54:41 2016
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 20 May 2016 15:54:41 -0700
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To:
References:
Message-ID:

On May 20, 2016 12:44 PM, wrote:
[...]
>
> can numpy cast to float by default for power or **?

Maybe? The question is whether there are any valid use cases for getting ints back:

>>> np.array([1, 2, 3]) ** 2
array([1, 4, 9])

It's not 100% obvious to me but intuitively this seems like an operation that we probably want to support? Especially since there's a reasonable range of int64 values that can't be represented exactly as floats.

-n
-------------- next part --------------
An HTML attachment was scrubbed...
(Julia dispatch seems to make it easier to construct new types. e.g. we could have a flexible dtype that is free to upcast to whatever is required for the calculation. just guessing) practicality: going from int to float is a common usecase and we would expect getting the correct numbers 2**(-2) -> promote complex is in most fields an unusual outcome for integer or float calculations (e.g. box-cox transformation for x>0 ) having suddenly complex numbers is weird, getting nans is standard float response -> don't promote I'm still largely in the python 2.x habit of adding a decimal points to numbers for float or a redundant `* 1.` in my code to avoid integer division or other weirdness. So, I never realized that ** in numpy doesn't always promote to float which I kind of thought it did. Maybe it's not yet time to drop all the decimal points or `* 1.` from the code? Josef > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri May 20 21:40:09 2016 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 20 May 2016 18:40:09 -0700 Subject: [Numpy-discussion] A numpy based Entity-Component-System In-Reply-To: References: Message-ID: On May 20, 2016 4:24 PM, "Elliot Hallmark" wrote: > > I have a Data Oriented programing library I'm writing that uses the Entity-Component-System model. > > https://github.com/Permafacture/data-oriented-pyglet > > I have initially called it Numpy-ECS but I don't know if that name is okay. The numpy license says: > > Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission. > > but doesn't say anything about using the name of NumPy. Is this okay? Legally speaking, I'm pretty sure no one is going to sue you. That clause in the license is generally taken to be mostly meaningless, because lying and claiming that so-and-so endorses my project when they don't is generally going to get me into legal trouble anyway, so the license text is redundant. The main thing that would legally control use of the name "numpy" is if we had a trademark on it, and I'm pretty sure no one has claimed or registered one of those. But legal issues aside, it'd probably be better to give your software a more unique name? "Numpy-ECS" is potentially confusing (are we going to get bug reports filed on it because it says "numpy"?), and, well, it's kind of boring and generic, don't you think? Like naming your child "Human-legsarms"? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From yellowhat46 at gmail.com Sat May 21 01:47:38 2016 From: yellowhat46 at gmail.com (Vasco Gervasi) Date: Sat, 21 May 2016 07:47:38 +0200 Subject: [Numpy-discussion] FFT and reconstruct In-Reply-To: References: Message-ID: Maybe I found the problems; 1. t0=1.0, t1=3.0, y['1'] = cos(1.0*omega*t): I have to reconstruct the signal using > yRec += a * cos(omega*i*(t-t0) + f) not > yRec += a * cos(omega*i*t + f) 2. t0=2, t1=3, y['Signal'] = 1.0*cos(1.0*omega*t) + ... 
+ 5.0*cos(5.0*omega*t) + 1.0: starting point and end point must not be the same, so to generate the signal I have to use > t = linspace(t0, t1, 1000, endpoint=False) not > t = linspace(t0, t1, 1000) Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From gfyoung17 at gmail.com Sat May 21 21:05:23 2016 From: gfyoung17 at gmail.com (G Young) Date: Sun, 22 May 2016 02:05:23 +0100 Subject: [Numpy-discussion] axis parameter for count_nonzero Message-ID: Hi, I have had a PR open (first draft can be found here ) for quite some time now that adds an 'axis' parameter to *count_nonzero*. While the functionality is fully in-place, very robust, and actually higher-performing than the original *count_nonzero* function, the obstacle at this point is the implementation, as most of the functionality is now surfaced at the Python level instead of at the C level. I have made several attempts to move the code into C to no avail and have not received much feedback from maintainers unfortunately to move this forward, so I'm opening this up to the mailing list to see what you guys think of the changes and whether or not it should be merged in as is or be tabled until a more C-friendly solution can be found. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From gfyoung17 at gmail.com Sat May 21 21:15:00 2016 From: gfyoung17 at gmail.com (G Young) Date: Sun, 22 May 2016 02:15:00 +0100 Subject: [Numpy-discussion] broadcasting for randint Message-ID: Hi, I have had a PR open for quite some time now that allows arguments to broadcast in *randint*. While the functionality is fully in-place and very robust, the obstacle at this point is the implementation. When the *dtype* parameter was added to *randint* (see here ), a big issue with the implementation was that it created so much duplicate code that it would be a huge maintenance nightmare. However, this was dismissed in the original PR message because it was believed that template-ing would be trivial, which seemed reasonable at the time. When I added broadcasting, I introduced a template system to the code that dramatically cut down on the duplication. However, the obstacle has been whether or not this template system is too *ad hoc* to be merged into the library. Implementing a template in Cython was not considered sufficient and is in fact very tricky to do, and unfortunately, I have not received any constructive suggestions from maintainers about how to proceed, so I'm opening this up to the mailing to see whether or not there are better alternatives to what I did, whether this should be merged as it, or whether this should be tabled until a better template can be found. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun May 22 05:32:02 2016 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 22 May 2016 11:32:02 +0200 Subject: [Numpy-discussion] axis parameter for count_nonzero In-Reply-To: References: Message-ID: On Sun, May 22, 2016 at 3:05 AM, G Young wrote: > Hi, > > I have had a PR open (first > draft can be found here ) for > quite some time now that adds an 'axis' parameter to *count_nonzero*. > While the functionality is fully in-place, very robust, and actually > higher-performing than the original *count_nonzero* function, the > obstacle at this point is the implementation, as most of the functionality > is now surfaced at the Python level instead of at the C level. 
> > I have made several attempts to move the code into C to no avail and have > not received much feedback from maintainers unfortunately to move this > forward, so I'm opening this up to the mailing list to see what you guys > think of the changes and whether or not it should be merged in as is or be > tabled until a more C-friendly solution can be found. > The discussion is spread over several PRs/issues, so maybe a summary is useful: - adding an axis parameter was a feature request that was generally approved of [1] - writing the axis selection/validation code in C, like the rest of count_nonzero, was preferred by several core devs - Writing that C code turns out to be tricky. Jaime had a PR for doing this for bincount [2], but closed it with final conclusion "the proper approach seems to me to build some intermediate layer over nditer that abstracts the complexity away". - Julian pointed out that this adds a ufunc-like param, so why not add other params like out/keepdims [3] - Stephan points out that the current PR has quite a few branches, would benefit from reusing a helper function (like _validate_axis, but that may not do exactly the right thing), and that he doesn't want to merge it as is without further input from other devs [4]. Points previously not raised that I can think of: - count_nonzero is also in the C API [5], the axis parameter is now only added to the Python API. - Part of why the code in this PR is complex is to keep performance for small arrays OK, but there's no benchmarks added or result given for the existing benchmark [6]. A simple check with: x = np.arange(100) %timeit np.count_nonzero(x) shows that that gets about 30x slower (330 ns vs 10.5 us on my machine). It looks to me like performance is a concern, and if that can be resolved there's the broader discussion of whether it's a good idea to merge this PR at all. That's a trade-off of adding a useful feature vs. technical debt / maintenance burden plus divergence Python/C API. Also, what do we do when we merge this and then next week someone else sends a PR adding a keepdims or out keyword? For these kinds of additions it would feel better if we were sure that the new version is the final/desired one for the foreseeable future. Ralf [1] https://github.com/numpy/numpy/issues/391 [2] https://github.com/numpy/numpy/pull/4330#issuecomment-77791250 [3] https://github.com/numpy/numpy/pull/7138#issuecomment-177202894 [4] https://github.com/numpy/numpy/pull/7177 [5] http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_CountNonzero [6] https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_ufunc.py#L70 -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun May 22 05:35:50 2016 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 22 May 2016 10:35:50 +0100 Subject: [Numpy-discussion] Proposal: numpy.random.random_seed In-Reply-To: References: Message-ID: On Wed, May 18, 2016 at 7:56 PM, Nathaniel Smith wrote: > > On Wed, May 18, 2016 at 5:07 AM, Robert Kern wrote: > > On Wed, May 18, 2016 at 1:14 AM, Nathaniel Smith wrote: > >> ...anyway, the real reason I'm a bit grumpy is because there are solid > >> engineering reasons why users *want* this API, > > > > I remain unconvinced on this mark. Grumpily. > > Sorry for getting grumpy :-). And my apologies for some unwarranted hyperbole. I think we're both converging on a reasonable approach, though. > The engineering reasons seem pretty > obvious to me though? 
Well, I mean, engineers want lots of things. I suspect that most engineers *really* just want to call `numpy.random.seed(8675309)` at the start and never explicitly pass around separate streams. There's an upside to that in terms of code simplicity. There are also significant limitations and constraints. Ultimately, the upside against the alternative of passing around RandomState objects is usually overweighed by the limitations, so best practice is to pass around RandomState objects. I acknowledge that there exists an upside to the splitting API, but I don't think it's a groundbreaking improvement over the alternative current best practice. It's also unclear to me how often situations that really demonstrate the upside come into play; in my experience a lot of these situations are already structured such that preallocating N streams is the natural thing to do. The limitations and constraints are currently underexplored, IMO; and in this conservative field, pessimism is warranted. > If you have any use case for independent streams > at all, and you're writing code that's intended to live inside a > library's abstraction barrier, then you need some way to choose your > streams to avoid colliding with arbitrary other code that the end-user > might assemble alongside yours as part of their final program. So > AFAICT you have two options: either you need a "tree-style" API for > allocating these streams, or else you need to add some explicit API to > your library that lets the end-user control in detail which streams > you use. Both are possible, but the latter is obviously undesireable > if you can avoid it, since it breaks the abstraction barrier, making > your library more complicated to use and harder to evolve. ACK > >> so whether or not it > >> turns out to be possible I think we should at least be allowed to have > >> a discussion about whether there's some way to give it to them. > > > > I'm not shutting down discussion of the option. I *implemented* the option. > > I think that discussing whether it should be part of the main API is > > premature. There probably ought to be a paper or three out there supporting > > its safety and utility first. Let the utility function version flourish > > first. > > OK -- I guess this particularly makes sense given how > extra-tightly-constrained we currently are in fixing mistakes in > np.random. But I feel like in the end the right place for this really > is inside the RandomState interface, because the person implementing > RandomState is the one best placed to understand (a) the gnarly > technical details here, and (b) how those change depending on the > particular PRNG in use. I don't want to end up with a bunch of > subtly-buggy utility functions in non-specialist libraries like dask > -- so we should be trying to help downstream users figure out how to > actually get this into np.random? I think this is an open research area. An enterprising grad student could milk this for a couple of papers analyzing how to do this safely for a variety of PRNGs. I don't think we can hash this out in an email thread or PR. So yeah, eventually there might be an API on RandomState for this, but it's way too premature to do so right now, IMO. Maybe start with a specialized subclass of RandomState that adds this experimental API. In ng-numpy-randomstate. ;-) But if someone has spare time to work on numpy.random, for God's sake, use it to review @gfyoung's PRs instead. 
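To make the "pass around RandomState objects" practice concrete, here is a minimal sketch (the function and variable names are invented for illustration; note that seeding children from a master stream is exactly the convenience-vs-collision trade-off discussed here, not a guaranteed-independent scheme):

```
import numpy as np

def simulate(n, random_state=None):
    # library code takes an explicit stream instead of touching the global one
    rng = random_state if random_state is not None else np.random.RandomState()
    return rng.normal(size=n)

# the caller preallocates N streams up front and stays reproducible
master = np.random.RandomState(8675309)
streams = [np.random.RandomState(seed)
           for seed in master.randint(0, 2**31 - 1, size=4)]
results = [simulate(100, random_state=stream) for stream in streams]
```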
> >> It's > >> not even 100% out of the question that we conclude that existing PRNGs > >> are buggy because they don't take this use case into account -- it > >> would be far from the first time that numpy found itself going beyond > >> the limits of older numerical tools that weren't designed to build the > >> kind of large composable systems that numpy gets used for. > >> > >> MT19937's state space is large enough that you could explicitly encode > >> a "tree seed" into it, even if you don't trust the laws of probability > >> -- e.g., you start with a RandomState with id [], then its children > >> have id [0], [1], [2], ..., and their children have ids [0, 0], [0, > >> 1], ..., [1, 0], ..., and you write these into the state (probably > >> after sending them through some bit-mixing permutation), to guarantee > >> non-collision. > > > > Sure. Not entirely sure if that can be done without preallocating the > > branching factor or depth, but I'm sure there's some fancy combinatoric > > representation of an unbounded tree that could be exploited here. It seems > > likely to me that such a thing could be done with the stream IDs of PRNGs > > that support that. > > I'm pretty sure you do have to preallocate both the branching factor > and depth, since getting the collision-free guarantee requires that > across the universe of possible tree addresses, each state id gets > used at most once -- and there are finitely many state ids. But as a > practical matter, saying "you can only sprout up to 2**32 new states > out of any given state, and you can only nest them 600 deep, and > exceeding these bounds is an error" would still be "composable enough" > for ~all practical purposes. To be honest, I'd even be satisfied with taking the treepath breadcrumbs and hashing them with a suitable cryptographic hash to get a stream ID. For, say, PCG64 which has 2**127 settable streams, do sha256(treepath.asbytes()) % (2**127). I'm okay with spray-and-pray if it's using well-analyzed cryptographic primitives and enough settable streams. Subject to actual testing, of course. And that last point concerns me. PRNG quality testing is a dodgy subject as it is, and the state of the art for testing parallel streams is even worse. To be honest, making it more convenient to make new streams at a low library level makes me leery. I do think this is something that needs to be considered and designed at a high level, irrespective of the available APIs. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From gfyoung17 at gmail.com Sun May 22 08:35:40 2016 From: gfyoung17 at gmail.com (G Young) Date: Sun, 22 May 2016 13:35:40 +0100 Subject: [Numpy-discussion] axis parameter for count_nonzero In-Reply-To: References: Message-ID: 1) Correction: The PR was not written with small arrays in mind. I ran some new timing tests, and it does perform worse on smaller arrays but appears to scale better than the current implementation. 2) Let me put it out there that I am not opposed to moving it to C, but right now, there seems to be a large technical brick wall up against such an implementation. So suggestions about how to move the code into C would be welcome too! On Sun, May 22, 2016 at 10:32 AM, Ralf Gommers wrote: > > > On Sun, May 22, 2016 at 3:05 AM, G Young wrote: > >> Hi, >> >> I have had a PR open (first >> draft can be found here ) for >> quite some time now that adds an 'axis' parameter to *count_nonzero*. 
>> While the functionality is fully in-place, very robust, and actually >> higher-performing than the original *count_nonzero* function, the >> obstacle at this point is the implementation, as most of the functionality >> is now surfaced at the Python level instead of at the C level. >> >> I have made several attempts to move the code into C to no avail and have >> not received much feedback from maintainers unfortunately to move this >> forward, so I'm opening this up to the mailing list to see what you guys >> think of the changes and whether or not it should be merged in as is or be >> tabled until a more C-friendly solution can be found. >> > > The discussion is spread over several PRs/issues, so maybe a summary is > useful: > > - adding an axis parameter was a feature request that was generally > approved of [1] > - writing the axis selection/validation code in C, like the rest of > count_nonzero, was preferred by several core devs > - Writing that C code turns out to be tricky. Jaime had a PR for doing > this for bincount [2], but closed it with final conclusion "the proper > approach seems to me to build some intermediate layer over nditer that > abstracts the complexity away". > - Julian pointed out that this adds a ufunc-like param, so why not add > other params like out/keepdims [3] > - Stephan points out that the current PR has quite a few branches, would > benefit from reusing a helper function (like _validate_axis, but that may > not do exactly the right thing), and that he doesn't want to merge it as is > without further input from other devs [4]. > > Points previously not raised that I can think of: > - count_nonzero is also in the C API [5], the axis parameter is now only > added to the Python API. > - Part of why the code in this PR is complex is to keep performance for > small arrays OK, but there's no benchmarks added or result given for the > existing benchmark [6]. A simple check with: > x = np.arange(100) > %timeit np.count_nonzero(x) > shows that that gets about 30x slower (330 ns vs 10.5 us on my machine). > > It looks to me like performance is a concern, and if that can be resolved > there's the broader discussion of whether it's a good idea to merge this PR > at all. That's a trade-off of adding a useful feature vs. technical debt / > maintenance burden plus divergence Python/C API. Also, what do we do when > we merge this and then next week someone else sends a PR adding a keepdims > or out keyword? For these kinds of additions it would feel better if we > were sure that the new version is the final/desired one for the foreseeable > future. > > Ralf > > > [1] https://github.com/numpy/numpy/issues/391 > [2] https://github.com/numpy/numpy/pull/4330#issuecomment-77791250 > [3] https://github.com/numpy/numpy/pull/7138#issuecomment-177202894 > [4] https://github.com/numpy/numpy/pull/7177 > [5] > http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_CountNonzero > [6] > https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_ufunc.py#L70 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
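For anyone who needs the behavior before a C implementation is settled, the requested semantics reduce to a sum over a boolean mask; a sketch (`count_nonzero_axis` is a hypothetical name, not the code in the PR):

```
import numpy as np

def count_nonzero_axis(a, axis=None):
    # sketch of the semantics only; the PR adds fast paths per dtype
    a = np.asanyarray(a)
    if axis is None:
        return np.count_nonzero(a)  # existing fast C path
    return (a != 0).sum(axis=axis, dtype=np.intp)

x = np.array([[0, 1, 7, 0],
              [3, 0, 0, 2]])
print(count_nonzero_axis(x, axis=0))  # [1 1 1 1]
print(count_nonzero_axis(x, axis=1))  # [2 2]
```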
From gfyoung17 at gmail.com Sun May 22 13:36:34 2016
From: gfyoung17 at gmail.com (G Young)
Date: Sun, 22 May 2016 18:36:34 +0100
Subject: [Numpy-discussion] axis parameter for count_nonzero
In-Reply-To: References: Message-ID:

After some discussion with *@rgommers*, I have simplified the code as follows:

1) the path to the original count_nonzero in the C API is essentially unchanged, save some small overhead with Python calling and the if-statement to check the *axis* parameter

2) All of the complicated validation of the *axis* parameter and acrobatics for getting the count is handled *only* after we cannot fast-track via a numerical, boolean, or string *dtype*.

The question still remains whether or not leaving the *axis* parameter in the Python API for now (given how complicated it is to add in the C API) is acceptable. I will say that in response to the concern of adding parameters such as "out" and "keepdims" (should they be requested), we could avail ourselves of functions like median for help, as *@juliantaylor* pointed out. The *scipy* library has dealt with this problem as well in its *sparse* modules, so that is also a useful resource.

On Sun, May 22, 2016 at 1:35 PM, G Young wrote:

> 1) Correction: The PR was not written with small arrays in mind. I ran
> some new timing tests, and it does perform worse on smaller arrays but
> appears to scale better than the current implementation.
>
> 2) Let me put it out there that I am not opposed to moving it to C, but
> right now, there seems to be a large technical brick wall up against such
> an implementation. So suggestions about how to move the code into C would
> be welcome too!
>
> On Sun, May 22, 2016 at 10:32 AM, Ralf Gommers
> wrote:
>
>>
>>
>> On Sun, May 22, 2016 at 3:05 AM, G Young wrote:
>>
>>> Hi,
>>>
>>> I have had a PR open (first
>>> draft can be found here ) for
>>> quite some time now that adds an 'axis' parameter to *count_nonzero*.
>>> While the functionality is fully in-place, very robust, and actually
>>> higher-performing than the original *count_nonzero* function, the
>>> obstacle at this point is the implementation, as most of the functionality
>>> is now surfaced at the Python level instead of at the C level.
>>>
>>> I have made several attempts to move the code into C to no avail and
>>> have not received much feedback from maintainers unfortunately to move this
>>> forward, so I'm opening this up to the mailing list to see what you guys
>>> think of the changes and whether or not it should be merged in as is or be
>>> tabled until a more C-friendly solution can be found.
>>>
>>
>> The discussion is spread over several PRs/issues, so maybe a summary is
>> useful:
>>
>> - adding an axis parameter was a feature request that was generally
>> approved of [1]
>> - writing the axis selection/validation code in C, like the rest of
>> count_nonzero, was preferred by several core devs
>> - Writing that C code turns out to be tricky. Jaime had a PR for doing
>> this for bincount [2], but closed it with final conclusion "the proper
>> approach seems to me to build some intermediate layer over nditer that
>> abstracts the complexity away".
>> - Julian pointed out that this adds a ufunc-like param, so why not add >> other params like out/keepdims [3] >> - Stephan points out that the current PR has quite a few branches, would >> benefit from reusing a helper function (like _validate_axis, but that may >> not do exactly the right thing), and that he doesn't want to merge it as is >> without further input from other devs [4]. >> >> Points previously not raised that I can think of: >> - count_nonzero is also in the C API [5], the axis parameter is now only >> added to the Python API. >> - Part of why the code in this PR is complex is to keep performance for >> small arrays OK, but there's no benchmarks added or result given for the >> existing benchmark [6]. A simple check with: >> x = np.arange(100) >> %timeit np.count_nonzero(x) >> shows that that gets about 30x slower (330 ns vs 10.5 us on my machine). >> >> It looks to me like performance is a concern, and if that can be resolved >> there's the broader discussion of whether it's a good idea to merge this PR >> at all. That's a trade-off of adding a useful feature vs. technical debt / >> maintenance burden plus divergence Python/C API. Also, what do we do when >> we merge this and then next week someone else sends a PR adding a keepdims >> or out keyword? For these kinds of additions it would feel better if we >> were sure that the new version is the final/desired one for the foreseeable >> future. >> >> Ralf >> >> >> [1] https://github.com/numpy/numpy/issues/391 >> [2] https://github.com/numpy/numpy/pull/4330#issuecomment-77791250 >> [3] https://github.com/numpy/numpy/pull/7138#issuecomment-177202894 >> [4] https://github.com/numpy/numpy/pull/7177 >> [5] >> http://docs.scipy.org/doc/numpy/reference/c-api.array.html#c.PyArray_CountNonzero >> [6] >> https://github.com/numpy/numpy/blob/master/benchmarks/benchmarks/bench_ufunc.py#L70 >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> https://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Sun May 22 15:15:29 2016 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Sun, 22 May 2016 15:15:29 -0400 Subject: [Numpy-discussion] Behavior of .reduceat() In-Reply-To: References: Message-ID: Hi Jaime, Very belated reply, but only with the semester over I seem to have regained some time to think. The behaviour of reduceat always has seemed a bit odd to me, logical for dividing up an array into irregular but contiguous pieces, but illogical for more random ones (where one effectively passes in pairs of points, only to remove the unwanted calculations after the fact by slicing with [::2]; indeed, the very first example in the documentation does exactly this [1]). I'm not sure any of your proposals helps all that much for the latter case, while it risks breaking existing code in unexpected ways. For me, for irregular pieces, it would be much nicer to simply pass in pairs of points. I think this can be quite easily done in the current API, by expanding it to recognize multidimensional index arrays (with last dimension of 2; maybe 3 for step as well?). These numbers would just be the equivalent of start, end (and step?) of `slice`, so I think one can allow any integer with negative values having the usual meaning and clipping at 0 and length. 
So, specifically, the first example in the documentation would change from:

np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])[::2]

to

np.add.reduceat(np.arange(8),[(0, 4), (1, 5), (2, 6), (3,7)])

(Or an equivalent ndarray. Note how horrid the example is: really, you'd want 4,8 as a pair too, but in the current API, you'd get that by adding a 4.)

What do you think? Would this also be easy to implement?

All the best,

Marten

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduceat.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From rainwoodman at gmail.com Sun May 22 22:00:30 2016
From: rainwoodman at gmail.com (Feng Yu)
Date: Sun, 22 May 2016 19:00:30 -0700
Subject: [Numpy-discussion] Behavior of .reduceat()
In-Reply-To: References: Message-ID:

Hi Marten,

As a user of reduceat I seriously like your new proposal!

I notice that in your current proposal, each element in the 'at' list shall be interpreted as if they are parameters to `slice`.

I wonder if it is meaningful to define reduceat on other `fancy` indexing types?

Cheers,

- Yu

On Sun, May 22, 2016 at 12:15 PM, Marten van Kerkwijk wrote:
> Hi Jaime,
>
> Very belated reply, but only with the semester over I seem to have regained
> some time to think.
>
> The behaviour of reduceat always has seemed a bit odd to me, logical for
> dividing up an array into irregular but contiguous pieces, but illogical for
> more random ones (where one effectively passes in pairs of points, only to
> remove the unwanted calculations after the fact by slicing with [::2];
> indeed, the very first example in the documentation does exactly this [1]).
> I'm not sure any of your proposals helps all that much for the latter case,
> while it risks breaking existing code in unexpected ways.
>
> For me, for irregular pieces, it would be much nicer to simply pass in pairs
> of points. I think this can be quite easily done in the current API, by
> expanding it to recognize multidimensional index arrays (with last dimension
> of 2; maybe 3 for step as well?). These numbers would just be the equivalent
> of start, end (and step?) of `slice`, so I think one can allow any integer
> with negative values having the usual meaning and clipping at 0 and length.
> So, specifically, the first example in the documentation would change from:
>
> np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])[::2]
>
> to
>
> np.add.reduceat(np.arange(8),[(0, 4), (1, 5), (2, 6), (3,7)])
>
> (Or an equivalent ndarray. Note how horrid the example is: really, you'd
> want 4,8 as a pair too, but in the current API, you'd get that by adding a
> 4.)
>
> What do you think? Would this also be easy to implement?
>
> All the best,
>
> Marten
>
> [1]
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduceat.html
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
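Until something like this lands, the pair-style call can be emulated on top of the current API; a rough sketch (`reduceat_pairs` is a hypothetical helper, inheriting reduceat's own quirks):

```
import numpy as np

def reduceat_pairs(ufunc, a, pairs):
    # sketch: interleave the (start, stop) pairs and drop the unwanted
    # in-between reductions, as the documentation example does.
    # Caveats inherited from reduceat: each stop must be < len(a),
    # and start >= stop yields a[start] rather than an error.
    flat = np.asarray(pairs).ravel()      # [s0, e0, s1, e1, ...]
    return ufunc.reduceat(a, flat)[::2]

a = np.arange(8)
print(reduceat_pairs(np.add, a, [(0, 4), (1, 5), (2, 6), (3, 7)]))
# -> [ 6 10 14 18]
```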
From ben.v.root at gmail.com Mon May 23 11:17:29 2016
From: ben.v.root at gmail.com (Benjamin Root)
Date: Mon, 23 May 2016 11:17:29 -0400
Subject: [Numpy-discussion] A numpy based Entity-Component-System
In-Reply-To: References: Message-ID:

As a bit of a real-life example of where things can go wrong with naming: the "pylab" name was accidentally hijacked a couple years ago on pypi, and caused several bug reports to be filed against matplotlib for failing scripts. Some people thought that one should do "pip install pylab" to do "from pylab import *" -- crazy, I know, right? ;-) That was at least two years ago, and we are just now getting the person who uploaded that errant package to fix the problem. We also have not yet been able to find out the owner of the long defunct matplotlib twitter handle (we started the process to reclaim it, though).

There are a couple other situations that have caused confusion among users, but these have been mostly an issue of trademarks (which I believe NumFOCUS holds for matplotlib?). What you really have to watch out for is when someone creates a package that uses some open-sourced library, and then claims that it is "supported" by it. This can cause an undue burden on the original maintainers in fielding bug reports and other issues. It also creates a false sense of association/coordination -- which gets me to the issue at hand.

I highly suggest coming up with a unique name. It benefits you as it becomes something distinct from numpy (so search results are more relevant), and it benefits the numpy developers because support requests go exactly where they are supposed to go.

On Fri, May 20, 2016 at 9:40 PM, Nathaniel Smith wrote:

> Like naming your child "Human-legsarms"?

You owe me a new monitor!

Cheers!
Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.barker at noaa.gov Mon May 23 12:41:11 2016
From: chris.barker at noaa.gov (Chris Barker)
Date: Mon, 23 May 2016 09:41:11 -0700
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: References: Message-ID:

On Sun, May 22, 2016 at 2:35 AM, Robert Kern wrote:
>
> Well, I mean, engineers want lots of things. I suspect that most engineers
> *really* just want to call `numpy.random.seed(8675309)` at the start and
> never explicitly pass around separate streams. There's an upside to that in
> terms of code simplicity. There are also significant limitations and
> constraints. Ultimately, the upside against the alternative of passing
> around RandomState objects is usually overweighed by the limitations, so
> best practice is to pass around RandomState objects.

Could we do something like the logging module, and have numpy.random "manage" a bunch of stream objects for you -- so you could get the default single stream easily, and also get access to specific streams without needing to pass around the objects?

That would allow folks to start off writing code the "easy" way -- then make it easier to refactor when they realize they need multiple independent streams.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From robert.kern at gmail.com Mon May 23 12:54:02 2016
From: robert.kern at gmail.com (Robert Kern)
Date: Mon, 23 May 2016 17:54:02 +0100
Subject: [Numpy-discussion] Proposal: numpy.random.random_seed
In-Reply-To: References: Message-ID:

On Mon, May 23, 2016 at 5:41 PM, Chris Barker wrote:
>
> On Sun, May 22, 2016 at 2:35 AM, Robert Kern wrote:
>>
>> Well, I mean, engineers want lots of things. I suspect that most engineers *really* just want to call `numpy.random.seed(8675309)` at the start and never explicitly pass around separate streams. There's an upside to that in terms of code simplicity.
There are also significant limitations and constraints. Ultimately, the upside against the alternative of passing around RandomState objects is usually overweighed by the limitations, so best practice is to pass around RandomState objects. > > Could we do something like the logging module, and have numpy.random "manage" a bunch of stream objects for you -- so you could get the default single stream easily, and also get access to specific streams without needing to pass around the objects? No, I don't think so. The logging module's namespacing doesn't really have an equivalent use case for PRNGs. We would just be making a much more complicated global state to manage. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue May 24 07:22:51 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 24 May 2016 13:22:51 +0200 Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533) References: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org> <5733C086.9020403@gmail.com> <732428194484726233.271444sturla.molden-gmail.com@news.gmane.org> <20160512173810.79fb21b5@fsol> <115932182484786870.778106sturla.molden-gmail.com@news.gmane.org> Message-ID: <20160524132251.1968fcc4@fsol> On Thu, 12 May 2016 23:14:36 +0000 (UTC) Sturla Molden wrote: > Antoine Pitrou wrote: > > > Can you define "expensive"? > > Slow enough to cause complaints on the Cython mailing list. What kind of complaints? Please be specific. > > Buffer acquisition itself only calls a single C callback and uses a > > stack-allocated C structure. It shouldn't be "expensive". > > I don't know the reason, only that buffer acquisition from NumPy arrays > with typed memoryviews Again, what have memoryviews to do with it? "Acquiring a buffer" is just asking the buffer provider (the Numpy array) to fill a Py_buffer structure's contents. That has nothing to do with memoryviews. When writing C code to interact with buffer-providing objects, you usually don't bother with memoryviews at all. You just use a Py_buffer structure. Regards Antoine. From solipsis at pitrou.net Tue May 24 07:30:34 2016 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 24 May 2016 13:30:34 +0200 Subject: [Numpy-discussion] Integers to integer powers References: Message-ID: <20160524133034.642b60b1@fsol> On Thu, 19 May 2016 20:30:40 -0700 Nathaniel Smith wrote: > > Given these immutable and contradictory constraints, the last bad > option IMHO would be that we make int ** (negative int) an error in > all cases, and the error message can suggest that instead of writing > > np.array(2) ** -2 > > they should instead write > > np.array(2) ** -2.0 > > (And similarly for np.int64(2) ** -2 versus np.int64(2) ** -2.0.) > > Definitely annoying, but all the other options seem even more > inconsistent and confusing, and likely to encourage the writing of > subtly buggy code... That sounds like a good compromise, indeed. Regards Antoine. From m.h.vankerkwijk at gmail.com Tue May 24 10:03:59 2016 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 24 May 2016 10:03:59 -0400 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: <20160524133034.642b60b1@fsol> References: <20160524133034.642b60b1@fsol> Message-ID: The error on int ** (-int) is OK with me too (even though I prefer just returning floats). Having a `floor_pow` function may still be good with this solution too. 
-- Marten
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alan.isaac at gmail.com Tue May 24 12:41:21 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Tue, 24 May 2016 12:41:21 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To: References: <20160524133034.642b60b1@fsol> Message-ID: <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com>

On 5/24/2016 10:03 AM, Marten van Kerkwijk wrote:
> The error on int ** (-int) is OK with me too (even though I prefer just returning floats).

What exactly is the argument against *always* returning float (even for positive integer exponents)?

Relevant context is:
- int ** int will often ("usually") overflow
- a numpy function could meet specialized exponentiation needs

Thanks,
Alan Isaac

From shoyer at gmail.com Tue May 24 13:19:35 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 24 May 2016 10:19:35 -0700
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To: <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> Message-ID:

On Tue, May 24, 2016 at 9:41 AM, Alan Isaac wrote:

> What exactly is the argument against *always* returning float
> (even for positive integer exponents)?

If we were starting over from scratch, I would agree with you, but the int ** 2 example feels quite compelling to me. I would guess there's lots of code out there that expects the result to have integer dtype. As a contrived example, I might write np.arange(n) ** 2 to produce an indexer for the diagonal elements of an array.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alan.isaac at gmail.com Tue May 24 13:31:44 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Tue, 24 May 2016 13:31:44 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To: References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> Message-ID: <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com>

On 5/24/2016 1:19 PM, Stephan Hoyer wrote:
> the int ** 2 example feels quite compelling to me

Yes, but that one case is trivial: a*a

And at least as compelling is not having a**-2 fail and not being tricked by say np.arange(10)**10. The latter is a promise of hidden errors.

Alan

From ewm at redtetrahedron.org Tue May 24 13:36:33 2016
From: ewm at redtetrahedron.org (Eric Moore)
Date: Tue, 24 May 2016 13:36:33 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To: <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> Message-ID:

I'd say the most compelling case for it is that that is how it has always worked. How much code will break if we make that change? (Or if not break, at least have a change in dtype?) Is that worth it?

Eric

On Tue, May 24, 2016 at 1:31 PM, Alan Isaac wrote:

> On 5/24/2016 1:19 PM, Stephan Hoyer wrote:
>
>> the int ** 2 example feels quite compelling to me
>>
>
> Yes, but that one case is trivial: a*a
>
> And at least as compelling is not having a**-2 fail
> and not being tricked by say np.arange(10)**10.
> The latter is a promise of hidden errors.
>
> Alan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
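Whichever default wins, code that wants float results today can ask for them explicitly; a small sketch of the workaround suggested in this thread:

```
import numpy as np

a = np.arange(1, 10)

r1 = a ** -2.0              # a float exponent forces a float result
r2 = a.astype(float) ** -2  # or promote the base up front

print(np.allclose(r1, r2))  # expected: True
```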
From shoyer at gmail.com Tue May 24 13:41:54 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 24 May 2016 10:41:54 -0700
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To: <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> Message-ID:

On Tue, May 24, 2016 at 10:31 AM, Alan Isaac wrote:

> Yes, but that one case is trivial: a*a

an_explicit_name ** 2 is much better than an_explicit_name * an_explicit_name, though.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From matthew.brett at gmail.com Tue May 24 13:41:29 2016
From: matthew.brett at gmail.com (Matthew Brett)
Date: Tue, 24 May 2016 13:41:29 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To: <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> Message-ID:

On Tue, May 24, 2016 at 1:31 PM, Alan Isaac wrote:
> On 5/24/2016 1:19 PM, Stephan Hoyer wrote:
>>
>> the int ** 2 example feels quite compelling to me
>
> Yes, but that one case is trivial: a*a

Right, but you'd have to know to change your code when numpy makes this change. Your code will suddenly have a new datatype that you may be passing into other code, which may generate different results that will be hard to track down.

> And at least as compelling is not having a**-2 fail

I imagine that's about an order of magnitude less common in real code, but an error seems an acceptable result to me, so I know I have to do a ** -2.0

> and not being tricked by say np.arange(10)**10.
> The latter is a promise of hidden errors.

It's a well-understood promise though - you always have to be careful of integer overflow.

Best,

Matthew

From alan.isaac at gmail.com Tue May 24 13:49:21 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Tue, 24 May 2016 13:49:21 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To: References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> Message-ID:

On 5/24/2016 1:36 PM, Eric Moore wrote:
> How much code will break if we make that change?

Since even arange(10)**10 is already broken, there will probably not be much new breakage. But under any of the new proposals, *something* will break. So the question is, which shows the most foresight? Having (**) actually work seems worth quite a lot.

Alan Isaac

From alan.isaac at gmail.com Tue May 24 13:53:11 2016
From: alan.isaac at gmail.com (Alan Isaac)
Date: Tue, 24 May 2016 13:53:11 -0400
Subject: [Numpy-discussion] Integers to integer powers
In-Reply-To: References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> Message-ID:

On 5/24/2016 1:41 PM, Matthew Brett wrote:
> It's a well-understood promise though - you always have to be careful
> of integer overflow.

Of course. But **almost all** cases overflow. And "well understood" assumes a certain sophistication of the user, while `arange` will certainly be used by beginners.
Alan From njs at pobox.com Tue May 24 15:05:26 2016 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 24 May 2016 12:05:26 -0700 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> Message-ID: On Tue, May 24, 2016 at 10:36 AM, Eric Moore wrote: > I'd say the most compelling case for it is that is how it has always worked. > How much code will break if we make that change? (Or if not break, at least > have a change in dtype?) Is that worth it? The current behavior for arrays is: # Returns int In [2]: np.arange(10) ** 2 Out[2]: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]) # Returns nonsensical/useless results In [3]: np.arange(10) ** -1 /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning: divide by zero encountered in power #!/home/njs/.user-python3.5-64bit/bin/python3.5 /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning: invalid value encountered in power #!/home/njs/.user-python3.5-64bit/bin/python3.5 Out[3]: array([-9223372036854775808, 1, 0, 0, 0, 0, 0, 0, 0, 0]) -n -- Nathaniel J. Smith -- https://vorpus.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Tue May 24 15:57:53 2016 From: ewm at redtetrahedron.org (Eric Moore) Date: Tue, 24 May 2016 15:57:53 -0400 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> Message-ID: Yes, I'm fully aware of that. I'm speaking toward changing the default result dtype. Raising an error for negative exponents is a fine idea. Changing np.arange(10)**3 to have a non-integer dtype seems like a big change. Speaking of this, that some of the integer array operation errors can be controlled via the np.seterr and some cannot should also be addressed longer term. Eric On Tue, May 24, 2016 at 3:05 PM, Nathaniel Smith wrote: > On Tue, May 24, 2016 at 10:36 AM, Eric Moore > wrote: > > I'd say the most compelling case for it is that is how it has always > worked. > > How much code will break if we make that change? (Or if not break, at > least > > have a change in dtype?) Is that worth it? > > The current behavior for arrays is: > > # Returns int > In [2]: np.arange(10) ** 2 > Out[2]: array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81]) > > # Returns nonsensical/useless results > In [3]: np.arange(10) ** -1 > /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning: divide by > zero encountered in power > #!/home/njs/.user-python3.5-64bit/bin/python3.5 > /home/njs/.user-python3.5-64bit/bin/ipython:1: RuntimeWarning: invalid > value encountered in power > #!/home/njs/.user-python3.5-64bit/bin/python3.5 > Out[3]: > array([-9223372036854775808, 1, 0, > 0, 0, 0, > 0, 0, 0, > 0]) > > -n > > -- > Nathaniel J. Smith -- https://vorpus.org > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alan.isaac at gmail.com Tue May 24 16:15:27 2016 From: alan.isaac at gmail.com (Alan Isaac) Date: Tue, 24 May 2016 16:15:27 -0400 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> Message-ID: <020de0da-4ce2-0b0b-f2eb-bc77bbf589ab@gmail.com> On 5/24/2016 3:57 PM, Eric Moore wrote: > Changing np.arange(10)**3 to have a non-integer dtype seems like a big change. What about np.arange(100)**5? Alan Isaac From rays at blue-cove.com Tue May 24 16:33:20 2016 From: rays at blue-cove.com (R Schumacher) Date: Tue, 24 May 2016 13:33:20 -0700 Subject: [Numpy-discussion] Integers to integer powers In-Reply-To: <020de0da-4ce2-0b0b-f2eb-bc77bbf589ab@gmail.com> References: <20160524133034.642b60b1@fsol> <7c0663b6-9d2a-2a0f-e0bf-71e19e30c5b4@gmail.com> <8f0576d6-53fc-0fbd-cd2e-3daa72076731@gmail.com> <020de0da-4ce2-0b0b-f2eb-bc77bbf589ab@gmail.com> Message-ID: <201605242033.u4OKXRLf029822@blue-cove.com> At 01:15 PM 5/24/2016, you wrote: >On 5/24/2016 3:57 PM, Eric Moore wrote: >>Changing np.arange(10)**3 to have a non-integer dtype seems like a >>big change. > > >What about np.arange(100)**5? Interesting, one warning per instantiation (Py2.7): >>> import numpy >>> a=numpy.arange(100)**5 :1: RuntimeWarning: invalid value encountered in power >>> a=numpy.arange(100)**5. >>> b=numpy.arange(100.)**5 >>> a==b array([ True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True], dtype=bool) >>> numpy.arange(100)**5 array([ 0, 1, 32, 243, 1024, 3125, 7776, 16807, 32768, 59049, 100000, 161051, 248832, 371293, 537824, 759375, 1048576, 1419857, 1889568, 2476099, 3200000, 4084101, 5153632, 6436343, 7962624, 9765625, 11881376, 14348907, 17210368, 20511149, 24300000, 28629151, 33554432, 39135393, 45435424, 52521875, 60466176, 69343957, 79235168, 90224199, 102400000, 115856201, 130691232, 147008443, 164916224, 184528125, 205962976, 229345007, 254803968, 282475249, 312500000, 345025251, 380204032, 418195493, 459165024, 503284375, 550731776, 601692057, 656356768, 714924299, 777600000, 844596301, 916132832, 992436543, 1073741824, 1160290625, 1252332576, 1350125107, 1453933568, 1564031349, 1680700000, 1804229351, 1934917632, 2073071593, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648, -2147483648]) >>> >>> numpy.arange(100, dtype=numpy.int64)**5 array([ 0, 1, 32, 243, 1024, 3125, 7776, 16807, 32768, 59049, 100000, 161051, 248832, 371293, 537824, 759375, 1048576, 1419857, 1889568, 2476099, 3200000, 4084101, 5153632, 6436343, 7962624, 9765625, 11881376, 14348907, 17210368, 20511149, 24300000, 28629151, 33554432, 39135393, 45435424, 52521875, 60466176, 69343957, 
79235168, 90224199, 102400000, 115856201, 130691232, 147008443,
       164916224, 184528125, 205962976, 229345007, 254803968, 282475249,
       312500000, 345025251, 380204032, 418195493, 459165024, 503284375,
       550731776, 601692057, 656356768, 714924299, 777600000, 844596301,
       916132832, 992436543, 1073741824, 1160290625, 1252332576,
       1350125107, 1453933568, 1564031349, 1680700000, 1804229351,
       1934917632, 2073071593, 2219006624, 2373046875, 2535525376,
       2706784157, 2887174368, 3077056399, 3276800000, 3486784401,
       3707398432, 3939040643, 4182119424, 4437053125, 4704270176,
       4984209207, 5277319168, 5584059449, 5904900000, 6240321451,
       6590815232, 6956883693, 7339040224, 7737809375, 8153726976,
       8587340257, 9039207968, 9509900499], dtype=int64)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sturla.molden at gmail.com Tue May 24 17:03:44 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 24 May 2016 21:03:44 +0000 (UTC)
Subject: [Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)
References: <274831917484699028.356959sturla.molden-gmail.com@news.gmane.org> <5733C086.9020403@gmail.com> <732428194484726233.271444sturla.molden-gmail.com@news.gmane.org> <20160512173810.79fb21b5@fsol> <115932182484786870.778106sturla.molden-gmail.com@news.gmane.org> <20160524132251.1968fcc4@fsol> Message-ID: <666174225485816386.389704sturla.molden-gmail.com@news.gmane.org>

Antoine Pitrou wrote:

> When writing C code to interact with buffer-providing objects, you
> usually don't bother with memoryviews at all. You just use a Py_buffer
> structure.

I was talking about "typed memoryviews" which is a Cython abstraction for a Py_buffer struct. I was not talking about Python memoryviews, which is something else. When writing Cython code that interacts with a buffer we usually use typed memoryviews, not Py_buffer structs directly.

Sturla

From allen.welkie at gmail.com Wed May 25 15:35:02 2016
From: allen.welkie at gmail.com (Allen Welkie)
Date: Wed, 25 May 2016 15:35:02 -0400
Subject: [Numpy-discussion] Fwd: ifft padding
In-Reply-To: References: Message-ID:

I'd like to get some feedback on my [pull request](https://github.com/numpy/numpy/pull/7593).

This pull request adds a function `ifftpad` which pads a spectrum by inserting zeros where the highest frequencies would be. This is necessary because the padding that `ifft` does simply inserts zeros at the end of the array. But because of the way the spectrum is laid out, this changes which bins represent which frequencies and in general messes up the result of `ifft`. If you pad with the proposed `ifftpad` function, the zeros will be inserted in the middle of the spectrum and the time signal that results from `ifft` will be an interpolated version of the unpadded time signal. See the discussion in [issue #1346](https://github.com/numpy/numpy/issues/1346).

The following is a script to demonstrate what I mean:
""" spectrum = concatenate((a[:len(a) // 2], zeros(n - len(a)), a[len(a) // 2:])) if scale: spectrum *= (n / len(a)) return spectrum def plot_real(signal, label): time = numpy.linspace(0, 1, len(signal) + 1)[:-1] pyplot.plot(time, signal.real, label=label) def main(): spectrum = numpy.zeros(10, dtype=complex) spectrum[-1] = 1 + 1j signal = numpy.fft.ifft(spectrum) signal_bad_padding = numpy.fft.ifft(10 * spectrum, 100) signal_good_padding = numpy.fft.ifft(correct_padding(spectrum, 100)) plot_real(signal, 'No padding') plot_real(signal_bad_padding, 'Bad padding') plot_real(signal_good_padding, 'Good padding') pyplot.legend() pyplot.show() if __name__ == '__main__': main() ``` -------------- next part -------------- An HTML attachment was scrubbed... URL: From leon.woo at db.com Thu May 26 08:02:34 2016 From: leon.woo at db.com (Leon Woo) Date: Thu, 26 May 2016 14:02:34 +0200 Subject: [Numpy-discussion] AUTO: Leon Woo is out of the office (returning 06/06/2016) Message-ID: I am out of the office until 06/06/2016. For standard requests within the scope of EMG PWM Berlin, please write to EMG PWM Berlin at DBEMEA. For non-standard requests, please cc Hien Pham-Thu. Note: This is an automated response to your message "NumPy-Discussion Digest, Vol 116, Issue 39" sent on 26.05.2016 14:00:01. This is the only notification you will receive while this person is away.-- Informationen (einschlie?lich Pflichtangaben) zu einzelnen, innerhalb der EU t?tigen Gesellschaften und Zweigniederlassungen des Konzerns Deutsche Bank finden Sie unter http://www.deutsche-bank.de/de/content/pflichtangaben.htm. Diese E-Mail enth?lt vertrauliche und/ oder rechtlich gesch?tzte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrt?mlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese E-Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser E-Mail ist nicht gestattet. Please refer to http://www.db.com/en/content/eu_disclosures.htm for information (including mandatory corporate particulars) on selected Deutsche Bank branches and group companies registered or incorporated in the European Union. This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. -------------- next part -------------- An HTML attachment was scrubbed... URL: From robbmcleod at gmail.com Thu May 26 08:28:38 2016 From: robbmcleod at gmail.com (Robert McLeod) Date: Thu, 26 May 2016 14:28:38 +0200 Subject: [Numpy-discussion] Fwd: ifft padding In-Reply-To: References: Message-ID: Allen, Probably it needs to work in n-dimensions, like the existing np.fft.fftshift function does, with an optional axis=tuple parameter. I recall that fftshift is just an array indexing trick? It would be helpful to see what's faster, two fftshifts and a edge padding or your inter-padding. Probably it's faster to make a new zeros array of the appropriate padded size and do 2*ndim copies? Robert On Wed, May 25, 2016 at 9:35 PM, Allen Welkie wrote: > I'd like to get some feedback on my [pull request]( > https://github.com/numpy/numpy/pull/7593). > > This pull request adds a function `ifftpad` which pads a spectrum by > inserting zeros where the highest frequencies would be. 
From robbmcleod at gmail.com Thu May 26 08:28:38 2016
From: robbmcleod at gmail.com (Robert McLeod)
Date: Thu, 26 May 2016 14:28:38 +0200
Subject: [Numpy-discussion] Fwd: ifft padding
In-Reply-To: References: Message-ID:

Allen,

Probably it needs to work in n-dimensions, like the existing np.fft.fftshift function does, with an optional axis=tuple parameter. I recall that fftshift is just an array indexing trick? It would be helpful to see what's faster, two fftshifts and an edge padding or your inter-padding. Probably it's faster to make a new zeros array of the appropriate padded size and do 2*ndim copies?

Robert

On Wed, May 25, 2016 at 9:35 PM, Allen Welkie wrote:

> I'd like to get some feedback on my [pull request](
> https://github.com/numpy/numpy/pull/7593).
>
> This pull request adds a function `ifftpad` which pads a spectrum by
> inserting zeros where the highest frequencies would be. This is necessary
> because the padding that `ifft` does simply inserts zeros at the end of the
> array. But because of the way the spectrum is laid out, this changes which
> bins represent which frequencies and in general messes up the result of
> `ifft`. If you pad with the proposed `ifftpad` function, the zeros will be
> inserted in the middle of the spectrum and the time signal that results
> from `ifft` will be an interpolated version of the unpadded time signal.
> See the discussion in [issue #1346](
> https://github.com/numpy/numpy/issues/1346).
>
> The following is a script to demonstrate what I mean:
>
> ```
> import numpy
> from numpy import concatenate, zeros
> from matplotlib import pyplot
>
> def correct_padding(a, n, scale=True):
>     """ A copy of the proposed `ifftpad` function. """
>     spectrum = concatenate((a[:len(a) // 2],
>                             zeros(n - len(a)),
>                             a[len(a) // 2:]))
>     if scale:
>         spectrum *= (n / len(a))
>     return spectrum
>
> def plot_real(signal, label):
>     time = numpy.linspace(0, 1, len(signal) + 1)[:-1]
>     pyplot.plot(time, signal.real, label=label)
>
> def main():
>     spectrum = numpy.zeros(10, dtype=complex)
>     spectrum[-1] = 1 + 1j
>
>     signal = numpy.fft.ifft(spectrum)
>     signal_bad_padding = numpy.fft.ifft(10 * spectrum, 100)
>     signal_good_padding = numpy.fft.ifft(correct_padding(spectrum, 100))
>
>     plot_real(signal, 'No padding')
>     plot_real(signal_bad_padding, 'Bad padding')
>     plot_real(signal_good_padding, 'Good padding')
>
>     pyplot.legend()
>     pyplot.show()
>
>
> if __name__ == '__main__':
>     main()
> ```
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
Robert McLeod, Ph.D.
Center for Cellular Imaging and Nano Analytics (C-CINA)
Biozentrum der Universität Basel
Mattenstrasse 26, 4058 Basel
Work: +41.061.387.3225
robert.mcleod at unibas.ch
robert.mcleod at bsse.ethz.ch
robbmcleod at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com Thu May 26 22:42:02 2016
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu, 26 May 2016 20:30:00 -0600
Subject: [Numpy-discussion] Numpy 1.11.1rc1 release
Message-ID:

Hi All,

I am pleased to announce the release of Numpy 1.11.1rc1. The sources may be found on sourceforge and wheels for OS X, Windows, and Linux will be available on pypi sometime in the next few days. The pypi release is delayed due to the decision that the wheels should go up before the sources in order that people not get a source install when what they want are wheels. The Python versions supported are 2.6-2.7 and 3.2-3.5. This release has mostly small fixes and build enhancements and should be good out of the starting gate, but prudence requires a release candidate as there are a few bits not tested in master. The following fixes have been applied:

- #7506 BUG: Make sure numpy imports on python 2.6 when nose is unavailable.
- #7530 BUG: Floating exception with invalid axis in np.lexsort.
- #7535 BUG: Extend glibc complex trig functions blacklist to glibc < 2.18.
- #7551 BUG: Allow graceful recovery for no compiler.
- #7558 BUG: Constant padding expected wrong type in constant_values.
- #7578 BUG: Fix OverflowError in Python 3.x in swig interface.
- #7590 BLD: Fix configparser.InterpolationSyntaxError.
- #7597 BUG: Make np.ma.take work on scalars.
- #7608 BUG: linalg.norm(): Don't convert object arrays to float.
- #7638 BLD: Correct C compiler customization in system_info.py.
- #7654 BUG: ma.median of 1d array should return a scalar.
- #7656 BLD: Remove hardcoded Intel compiler flag -xSSE4.2.
- #7660 BUG: Temporary fix for str(mvoid) for object field types.
- #7665 BUG: Fix incorrect printing of 1D masked arrays.
- #7670 BUG: Correct initial index estimate in histogram.
- #7671 BUG: Boolean assignment no GIL release when transfer needs API.
- #7676 BUG: Fix handling of right edge of final histogram bin.
- #7680 BUG: Fix np.clip bug NaN handling for Visual Studio 2015.

The following people have contributed to this release

- Allan Haldane
- Amit Aronovitch
- Charles Harris
- Eric Wieser
- Evgeni Burovski
- Loïc Estève
- Mathieu Lamarre
- Matthew Brett
- Matthias Geier
- Nathaniel J. Smith
- Nikola Forró
- Ralf Gommers
- Robert Kern
- Sebastian Berg
- Simon Conseil
- Simon Gibbons
- Sorin Sbarnea
- chiffa

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ralf.gommers at gmail.com Fri May 27 03:48:51 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Fri, 27 May 2016 09:48:51 +0200
Subject: [Numpy-discussion] [SciPy-Dev] Numpy 1.11.1rc1 release
In-Reply-To: References: Message-ID:

On Fri, May 27, 2016 at 4:42 AM, Charles R Harris wrote:

> Hi All,
>
> I am pleased to announce the release of Numpy 1.11.1rc1.

Thanks Chuck!

> The sources may be found on sourceforge
> and
> wheels for OS X, Windows, and Linux will be available on pypi sometime in
> the next few days. The pypi release is delayed due to the decision that the
> wheels should go up before the sources in order that people not get a
> source install when what they want are wheels.

For pre-releases that delay is maybe not necessary, because the average user won't see those. Only if you add --pre to your pip install command will you get them.

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From Permafacture at gmail.com Fri May 27 15:51:22 2016
From: Permafacture at gmail.com (Elliot Hallmark)
Date: Fri, 27 May 2016 14:51:22 -0500
Subject: [Numpy-discussion] Cross-correlation PR stuck in limbo
In-Reply-To: <5729E60F.1040102@crans.org> References: <8D8A0029-A28D-4E4E-A0D5-437FDC85AA73@brandeis.edu> <5729E60F.1040102@crans.org> Message-ID:

+1

This would really help with large data sets in certain situations.

Is there still disagreement about whether this should be included? Or are there some minor details still? Or just lost in the shuffle?

Hopefully,
Elliot

On Wed, May 4, 2016 at 7:07 AM, Pierre Haessig wrote:

> Hi,
>
> I don't know how to push the PR forward, but all I can say is that this
> maxlag feature would be a major improvement for using Numpy in time
> series analysis! Immediate benefits downstream for Matplotlib and
> statsmodel.
>
> Thanks Honi for having taken the time to implement this!
>
> best,
> Pierre
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lion.krischer at gmail.com Fri May 27 16:51:25 2016
From: lion.krischer at gmail.com (Lion Krischer)
Date: Fri, 27 May 2016 22:51:25 +0200
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
Message-ID: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>

Hi all,

I was told to take this to the mailing list.
Relevant pull request: https://github.com/numpy/numpy/pull/7686

NumPy's FFT implementation caches some form of execution plan for each
encountered input data length. This is currently implemented as a simple
dictionary which can grow without bounds. Calculating lots of different
FFTs thus causes a memory leak from the users' perspective. We encountered
a real-world situation where this is an issue.

The PR linked above proposes to replace the simple dictionary with an LRU
(least recently used) cache. It will remove the least recently used pieces
of data if it grows beyond a specified size (currently an arbitrary limit
of 100 MB per cache). Thus almost all users will still benefit from the
caches, but their total memory size is now limited.
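To make this concrete, here is a minimal sketch of such a size-bounded LRU
cache (illustrative only -- this is not the code in the PR, and the class
and method names are made up):

```python
import collections
import threading


class BoundedFFTCache(object):
    """Keep the most recently used plan arrays; evict the least
    recently used ones once their combined size exceeds
    ``max_size_in_mb`` megabytes."""

    def __init__(self, max_size_in_mb=100):
        self._max_bytes = max_size_in_mb * 1024 ** 2
        self._data = collections.OrderedDict()
        self._lock = threading.Lock()

    def put(self, n, wsave):
        with self._lock:
            self._data.pop(n, None)      # re-inserting moves it to the end
            self._data[n] = wsave
            total = sum(a.nbytes for a in self._data.values())
            # Always keep at least the newest entry, even if oversized.
            while total > self._max_bytes and len(self._data) > 1:
                _, dropped = self._data.popitem(last=False)  # oldest first
                total -= dropped.nbytes

    def get(self, n):
        with self._lock:
            wsave = self._data.pop(n)    # raises KeyError on a cache miss
            self._data[n] = wsave        # mark as most recently used
            return wsave
```

The actual PR has to handle a few more wrinkles, but the eviction idea is
the same.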
Things to consider:

* This adds quite some additional complexity but I could not find a
  simple way to achieve the same result.
* What is a good limit on cache size? I used 100 MB because it works for
  my use cases.

Cheers!

Lion

From jaime.frio at gmail.com  Fri May 27 19:33:28 2016
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Fri, 27 May 2016 16:33:28 -0700
Subject: [Numpy-discussion] Cross-correlation PR stuck in limbo
In-Reply-To:
References: <8D8A0029-A28D-4E4E-A0D5-437FDC85AA73@brandeis.edu>
	<5729E60F.1040102@crans.org>
Message-ID:

I did an overall review of the code a couple of weeks ago (see the PR for
details), and there is quite some work to be done before we can merge
Honi's code. But if he can find the time to work on the coding, I'll try
to be more diligent about the reviewing.

Jaime

On Fri, May 27, 2016 at 12:51 PM, Elliot Hallmark wrote:

> +1
>
> This would really help with large data sets in certain situations.
>
> Is there still disagreement about whether this should be included? Or
> are there some minor details still? Or just lost in the shuffle?
>
> Hopefully,
> Elliot
>
> On Wed, May 4, 2016 at 7:07 AM, Pierre Haessig wrote:
>
>> Hi,
>>
>> I don't know how to push the PR forward, but all I can say is that this
>> maxlag feature would be a major improvement for using Numpy in time
>> series analysis! Immediate benefits downstream for Matplotlib and
>> statsmodel.
>>
>> Thanks Honi for having taken the time to implement this!
>>
>> best,
>> Pierre
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with
his plans for world domination.

From sebastian at sipsolutions.net  Sat May 28 14:19:27 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Sat, 28 May 2016 20:19:27 +0200
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
In-Reply-To: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
Message-ID: <1464459567.2690.12.camel@sipsolutions.net>

On Fr, 2016-05-27 at 22:51 +0200, Lion Krischer wrote:
> Hi all,
>
> I was told to take this to the mailing list. Relevant pull request:
> https://github.com/numpy/numpy/pull/7686
>
> NumPy's FFT implementation caches some form of execution plan for
> each encountered input data length. This is currently implemented as
> a simple dictionary which can grow without bounds. Calculating lots
> of different FFTs thus causes a memory leak from the users'
> perspective. We encountered a real-world situation where this is an
> issue.
>
> The PR linked above proposes to replace the simple dictionary with an
> LRU (least recently used) cache. It will remove the least recently
> used pieces of data if it grows beyond a specified size (currently an
> arbitrary limit of 100 MB per cache). Thus almost all users will
> still benefit from the caches, but their total memory size is now
> limited.
>
> Things to consider:
>
> * This adds quite some additional complexity but I could not find a
> simple way to achieve the same result.
> * What is a good limit on cache size? I used 100 MB because it works
> for my use cases.
>

I am +1 in principle, since I don't like that the cache might just grow
forever and I see that as a bug right now personally. One that rarely
hits maybe, but a bug.

I guess if you have the time, the perfect thing would be if you could
time how big the cache difference is anyway, etc.? The cache mostly
stores a working array, I think, so does it even help for large arrays,
or is the time spent on the allocation negligible anyway then?

We also have a small-array cache in numpy nowadays (not sure how small
"small" is here). Maybe this already achieves everything that the
fftcache was designed for, and we could even just disable it by default?

The complexity addition is a bit annoying, I must admit; on python 3
functools.lru_cache could be another option, but only there.

- Sebastian


> Cheers!
>
> Lion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

From shoyer at gmail.com  Sat May 28 22:35:22 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sat, 28 May 2016 19:35:22 -0700
Subject: [Numpy-discussion] NumPy 1.11 docs
Message-ID:

These still are missing from the SciPy.org page, several months after the
release.

What do we need to do to keep these updated? Is there someone at Enthought
we should ping? Or do we really just need to transition to different
infrastructure?

From sievert.scott at gmail.com  Sat May 28 23:53:33 2016
From: sievert.scott at gmail.com (Scott Sievert)
Date: Sat, 28 May 2016 22:53:33 -0500
Subject: [Numpy-discussion] ENH: compute many inner products quickly
Message-ID:

I recently ran into an application where I had to compute many inner
products quickly (roughly 50k inner products in less than a second). I
wanted a vector of inner products over the 50k vectors, or
`[x1.T @ A @ x1, ..., xn.T @ A @ xn]` with `A.shape = (1k, 1k)`.

My first instinct was to look for a NumPy function to quickly compute
this, such as np.inner. However, it looks like np.inner has some other
behavior, and I couldn't get tensordot/einsum to work for me. Then a
labmate pointed out that I can just do some slick matrix multiplication
to compute the same quantity, `(X.T * A @ X.T).sum(axis=0)`.
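A quick sanity check that the trick matches the naive loop; the einsum
spelling below expresses the same contraction, though it does not go
through BLAS, which is part of why the matrix-multiplication trick wins on
speed (sizes shrunk here so the loop finishes instantly):

```python
import numpy as np

n, d = 500, 100            # stand-ins for the real 50k and 1k
A = np.random.rand(d, d)
X = np.random.rand(n, d)   # one vector per row

# naive loop: [x1 @ A @ x1, ..., xn @ A @ xn]
loop = np.array([x.dot(A).dot(x) for x in X])

# the matrix-multiplication trick
trick = (X.T * A.dot(X.T)).sum(axis=0)

# the same contraction spelled out with einsum
ein = np.einsum('ij,jk,ik->i', X, A, X)

assert np.allclose(loop, trick) and np.allclose(loop, ein)
```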
I opened [a PR] with this, and proposed that we define a new function
called `inner_prods` for this. However, in the PR, @shoyer pointed out:

> The main challenge is to figure out how to transition the behavior of
> all these operations, while preserving backwards compatibility. Quite
> likely, we need to pick new names for these functions, though we should
> try to pick something that doesn't suggest that they are second class
> alternatives.

Do we choose new function names? Do we add a keyword arg that changes
what np.inner returns?

[a PR]: https://github.com/numpy/numpy/pull/7690

From ralf.gommers at gmail.com  Sun May 29 04:13:06 2016
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 29 May 2016 10:13:06 +0200
Subject: [Numpy-discussion] NumPy 1.11 docs
In-Reply-To:
References:
Message-ID:

On Sun, May 29, 2016 at 4:35 AM, Stephan Hoyer wrote:

> These still are missing from the SciPy.org page, several months after
> the release.
>

Thanks Stephan, that needs fixing.


> What do we need to do to keep these updated?
>

https://github.com/numpy/numpy/blob/master/doc/HOWTO_RELEASE.rst.txt#update-docsscipyorg


> Is there someone at Enthought we should ping? Or do we really just need
> to transition to different infrastructure?
>

No, we just need to not forget:) The release manager normally does this,
or he pings someone else to do it. At the moment Pauli, Julian, Evgeni
and me have access to the server. I'll fix it up today.

Ralf

From m.h.vankerkwijk at gmail.com  Sun May 29 13:12:55 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Sun, 29 May 2016 13:12:55 -0400
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
In-Reply-To: <1464459567.2690.12.camel@sipsolutions.net>
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
Message-ID:

Hi,

I did a few simple timing tests (see comment in the PR), which suggest it
is hardly worth having the cache. Indeed, if one really worries about
speed, one should probably use pyFFTW (scipy.fft is a bit faster too, but
at least for me the way real FFT values are stored is just too
inconvenient). So, my suggestion would be to do away with the cache
altogether.

If we do keep it, I think the approach in the PR is nice, but I would
advocate setting both a size and a number limit (e.g., by default no more
than 8 entries or so, which should cover most repetitive use cases).

All the best,

Marten

p.s. I do like having a quick fft routine in numpy. My main gripe is that
it always casts to float64/complex128 rather than sticking with the input
dtype. Hope to get around to making a PR for that...

From solipsis at pitrou.net  Mon May 30 04:06:41 2016
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 30 May 2016 10:06:41 +0200
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
Message-ID: <20160530100641.1ff8b898@fsol>

On Sat, 28 May 2016 20:19:27 +0200
Sebastian Berg wrote:
>
> The complexity addition is a bit annoying I must admit, on python 3
> functools.lru_cache could be another option, but only there.

You can backport the pure Python version of lru_cache for Python 2 (or
vendor the backport done here:
https://pypi.python.org/pypi/backports.functools_lru_cache/).
The advantage is that lru_cache is C-accelerated in Python 3.5 and
upwards...
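Usage would be the standard decorator pattern, along these lines
(`_roots_of_unity` is a made-up stand-in for numpy.fft's internal setup
step; note that lru_cache bounds the number of entries, not their size in
bytes):

```python
import numpy as np

try:
    from functools import lru_cache          # stdlib on Python 3.2+
except ImportError:
    from backports.functools_lru_cache import lru_cache

@lru_cache(maxsize=32)   # evicts least recently used entries beyond 32
def _roots_of_unity(n):
    # made-up stand-in for the real (internal) plan computation
    return np.exp(-2j * np.pi * np.arange(n) / n)
```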
Regards

Antoine.


From contrebasse at gmail.com  Mon May 30 04:07:41 2016
From: contrebasse at gmail.com (Joseph Martinot-Lagarde)
Date: Mon, 30 May 2016 08:07:41 +0000 (UTC)
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
Message-ID:

Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> writes:

> I did a few simple timing tests (see comment in the PR), which suggest
> it is hardly worth having the cache. Indeed, if one really worries
> about speed, one should probably use pyFFTW (scipy.fft is a bit faster
> too, but at least for me the way real FFT values are stored is just too
> inconvenient). So, my suggestion would be to do away with the cache
> altogether.

The problem with FFTW is that its license is more restrictive (GPL), and
because of this it may not be suitable everywhere numpy.fft is.

From lion.krischer at gmail.com  Mon May 30 05:23:56 2016
From: lion.krischer at gmail.com (Lion Krischer)
Date: Mon, 30 May 2016 11:23:56 +0200
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
In-Reply-To: <20160530100641.1ff8b898@fsol>
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
	<20160530100641.1ff8b898@fsol>
Message-ID: <74c91f5e-2323-7136-1773-750e01e1dc48@gmail.com>

> You can backport the pure Python version of lru_cache for Python 2 (or
> vendor the backport done here:
> https://pypi.python.org/pypi/backports.functools_lru_cache/).
> The advantage is that lru_cache is C-accelerated in Python 3.5 and
> upwards...

That's a pretty big back-port. The speed also does not matter for this
particular use case: time for the actual FFT will dominate by far. The
lru_cache decorator can furthermore only limit the cache size by item
count and not by size in memory, as the proposed solution does. I think
the downsides outweigh the advantages of being able to use functionality
from the stdlib.

From lion.krischer at gmail.com  Mon May 30 05:26:16 2016
From: lion.krischer at gmail.com (Lion Krischer)
Date: Mon, 30 May 2016 11:26:16 +0200
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
In-Reply-To:
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
Message-ID: <70cb39b4-7bef-3d2f-1388-c1f1f6967f10@gmail.com>

On 30/05/16 10:07, Joseph Martinot-Lagarde wrote:
> Marten van Kerkwijk <m.h.vankerkwijk at gmail.com> writes:
>
>> I did a few simple timing tests (see comment in the PR), which suggest
>> it is hardly worth having the cache. Indeed, if one really worries
>> about speed, one should probably use pyFFTW (scipy.fft is a bit faster
>> too, but at least for me the way real FFT values are stored is just
>> too inconvenient). So, my suggestion would be to do away with the
>> cache altogether.

I added a slightly more comprehensive benchmark to the PR. Please have a
look. It tests the total time for 100 FFTs with and without cache. It is
over 30 percent faster with cache, which is totally worth it in my
opinion, as repeated FFTs of the same size are a very common use case.
Also many people will not have enough knowledge to use FFTW or some other
FFT implementation.
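For reference, the benchmark is of roughly this flavor (not the exact
script from the PR; clearing `numpy.fft.fftpack._fft_cache` is an internal
detail of the current implementation and stands in for running without a
cache):

```python
import time

import numpy as np
from numpy.fft import fftpack   # internal module; subject to change

x = np.random.rand(2048)
np.fft.fft(x)                   # warm-up: the plan for n=2048 is now cached

t0 = time.time()
for _ in range(100):
    np.fft.fft(x)
with_cache = time.time() - t0

t0 = time.time()
for _ in range(100):
    fftpack._fft_cache.clear()  # force the plan to be recomputed each time
    np.fft.fft(x)
without_cache = time.time() - t0

print(with_cache, without_cache)
```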
From shoyer at gmail.com  Tue May 31 01:36:28 2016
From: shoyer at gmail.com (Stephan Hoyer)
Date: Tue, 31 May 2016 05:36:28 +0000
Subject: [Numpy-discussion] NumPy 1.11 docs
In-Reply-To:
References:
Message-ID:

Awesome, thanks Ralf!

On Sun, May 29, 2016 at 1:13 AM Ralf Gommers wrote:

> On Sun, May 29, 2016 at 4:35 AM, Stephan Hoyer wrote:
>
>> These still are missing from the SciPy.org page, several months after
>> the release.
>>
>
> Thanks Stephan, that needs fixing.
>
>
>> What do we need to do to keep these updated?
>>
>
> https://github.com/numpy/numpy/blob/master/doc/HOWTO_RELEASE.rst.txt#update-docsscipyorg
>
>
>> Is there someone at Enthought we should ping? Or do we really just
>> need to transition to different infrastructure?
>>
>
> No, we just need to not forget:) The release manager normally does
> this, or he pings someone else to do it. At the moment Pauli, Julian,
> Evgeni and me have access to the server. I'll fix it up today.
>
> Ralf
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From pierre.debuyl at chem.kuleuven.be  Tue May 31 09:05:23 2016
From: pierre.debuyl at chem.kuleuven.be (Pierre de Buyl)
Date: Tue, 31 May 2016 15:05:23 +0200
Subject: [Numpy-discussion] EuroSciPy 2016
Message-ID: <20160531130523.GN12938@pi-x230>

Dear NumPy and SciPy communities,

No announcement was made here for EuroSciPy 2016, I believe. The call for
contributions (talks, posters, sprints) is still open for a few days.

EuroSciPy 2016 takes place in Erlangen, Germany, from the 23rd to the
27th of August and consists of two days of tutorials (beginner and
advanced tracks) and two days of conference representing many fields of
science, with a focus on Python tools for science. A day of sprints
follows (sprints TBA).

The keynote speakers are Gaël Varoquaux and Abby Cabunoc Mayes, and we
can expect a rich tutorial and scientific program!

Videos from previous years are available at
https://www.youtube.com/playlist?list=PLYx7XA2nY5GeQCCugyvtnHMVLdhYlrRxH
and
https://www.youtube.com/playlist?list=PLYx7XA2nY5Gcpabmu61kKcToLz0FapmHu

Visit us, register and submit an abstract on our website!
https://www.euroscipy.org/2016/

SciPythonic regards,

The EuroSciPy 2016 team

From sebastian at sipsolutions.net  Tue May 31 11:37:12 2016
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 31 May 2016 17:37:12 +0200
Subject: [Numpy-discussion] Developer Meeting at EuroScipy?
Message-ID: <1464709032.8693.25.camel@sipsolutions.net>

Hi all,

since we had decided to do a regular developers meeting last year, how
would EuroScipy (Aug. 23-27, Erlangen, Germany) look as a possible place
and time to have one?

I believe EuroScipy would include a few people who were not able to come
to SciPy last year, and SciPy itself already seems to keep some of the
devs quite busy anyway.

Not having checked back for room availability at the conference or
anything, one option may be:

August 24 (parallel to the second/last day of tutorials)

Does this seem like a good plan, or do you have alternative suggestions
or scheduling difficulties with this?

Regards,

Sebastian
From sturla.molden at gmail.com  Tue May 31 17:36:44 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 31 May 2016 21:36:44 +0000 (UTC)
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
	<70cb39b4-7bef-3d2f-1388-c1f1f6967f10@gmail.com>
Message-ID: <1670833866486423240.853891sturla.molden-gmail.com@news.gmane.org>

Lion Krischer wrote:

> I added a slightly more comprehensive benchmark to the PR. Please have
> a look. It tests the total time for 100 FFTs with and without cache. It
> is over 30 percent faster with cache, which is totally worth it in my
> opinion, as repeated FFTs of the same size are a very common use case.

The results of all those calls to transcendental functions are what is
stored in the cache. Without a cache, we get excessive calls to sin(x)
and cos(x) whenever FFTs of the same size are repeated. This can indeed
matter a lot.
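This is easy to see by poking at the cache (an internal, version-specific
detail):

```python
import numpy as np
from numpy.fft import fftpack   # internal module; subject to change

np.fft.fft(np.ones(128))
print(list(fftpack._fft_cache.keys()))  # [128]
# The cached value is FFTPACK's work array of precomputed trig values,
# 4*n + 15 doubles on this numpy version:
print(fftpack._fft_cache[128].size)     # 527
```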
Sturla

From sturla.molden at gmail.com  Tue May 31 17:36:47 2016
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 31 May 2016 21:36:47 +0000 (UTC)
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
Message-ID: <1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org>

Joseph Martinot-Lagarde wrote:

> The problem with FFTW is that its license is more restrictive (GPL),
> and because of this it may not be suitable everywhere numpy.fft is.

A lot of us use NumPy linked with MKL or Accelerate, both of which have
some really nifty FFTs. And the license issue is hardly any worse than
linking with them for BLAS and LAPACK, which we do anyway. We could
extend numpy.fft to use MKL or Accelerate when they are available.

Sturla

From jaime.frio at gmail.com  Tue May 31 21:34:06 2016
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Tue, 31 May 2016 18:34:06 -0700
Subject: [Numpy-discussion] Developer Meeting at EuroScipy?
In-Reply-To: <1464709032.8693.25.camel@sipsolutions.net>
References: <1464709032.8693.25.camel@sipsolutions.net>
Message-ID:

On Tue, May 31, 2016 at 8:37 AM, Sebastian Berg wrote:

> Hi all,
>
> since we had decided to do a regular developers meeting last year, how
> would EuroScipy (Aug. 23-27, Erlangen, Germany) look as a possible
> place and time to have one?
> I believe EuroScipy would include a few people who were not able to
> come to SciPy last year, and SciPy itself already seems to keep some of
> the devs quite busy anyway.

I will not be able to attend the full conference, but it's only a 10 hour
train ride from what I now call home, so I am positive I can be there for
a full day meeting, no matter what day of that week it is.

Jaime

> Not having checked back for room availability at the conference or
> anything, one option may be:
>
> August 24 (parallel to the second/last day of tutorials)
>
> Does this seem like a good plan, or do you have alternative suggestions
> or scheduling difficulties with this?
>
> Regards,
>
> Sebastian
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with
his plans for world domination.

From m.h.vankerkwijk at gmail.com  Tue May 31 22:02:56 2016
From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk)
Date: Tue, 31 May 2016 22:02:56 -0400
Subject: [Numpy-discussion] Changing FFT cache to a bounded LRU cache
In-Reply-To: <1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org>
References: <4f5279db-43d3-4ae2-7a8b-67d1ec7bd802@gmail.com>
	<1464459567.2690.12.camel@sipsolutions.net>
	<1393163039486422808.824659sturla.molden-gmail.com@news.gmane.org>
Message-ID:

> A lot of us use NumPy linked with MKL or Accelerate, both of which have
> some really nifty FFTs. And the license issue is hardly any worse than
> linking with them for BLAS and LAPACK, which we do anyway. We could
> extend numpy.fft to use MKL or Accelerate when they are available.

That would be wonderful! Especially if one can also remove the automatic
cast to double (as I'm analysing 2-bit VLBI data, getting to 64 bit is
overkill...).

-- Marten
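The automatic cast Marten mentions is easy to demonstrate (numpy 1.11
behavior; single-precision input comes back in double precision):

```python
import numpy as np

x = np.ones(8, dtype=np.float32)
print(np.fft.fft(x).dtype)    # complex128, not complex64
print(np.fft.rfft(x).dtype)   # complex128 as well
```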