[Numpy-discussion] Using multiprocessing (shared memory) with numpy array multiplication

Brandt Belson bbelson at princeton.edu
Wed Jun 8 14:24:57 EDT 2011


I'm attaching my files this time since it looks like the formatting was
messed up when I copied.
Brandt

On Wed, Jun 8, 2011 at 2:20 PM, <numpy-discussion-request at scipy.org> wrote:

> Send NumPy-Discussion mailing list submissions to
>        numpy-discussion at scipy.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>        http://mail.scipy.org/mailman/listinfo/numpy-discussion
> or, via email, send a message with subject or body 'help' to
>        numpy-discussion-request at scipy.org
>
> You can reach the person managing the list at
>        numpy-discussion-owner at scipy.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of NumPy-Discussion digest..."
>
>
> Today's Topics:
>
>   1. Re: merging datetime progress (Bruce Southey)
>   2. Using multiprocessing (shared memory) with numpy  array
>      multiplication (Brandt Belson)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 08 Jun 2011 13:04:03 -0500
> From: Bruce Southey <bsouthey at gmail.com>
> Subject: Re: [Numpy-discussion] merging datetime progress
> To: numpy-discussion at scipy.org
> Message-ID: <4DEFB993.9050306 at gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On 06/08/2011 10:26 AM, Mark Wiebe wrote:
> > On Tue, Jun 7, 2011 at 11:52 PM, Fernando Perez <fperez.net
> > <http://fperez.net>@gmail.com <http://gmail.com>> wrote:
> >
> >     On Tue, Jun 7, 2011 at 4:35 PM, Mark Wiebe <mwwiebe at gmail.com
> >     <mailto:mwwiebe at gmail.com>> wrote:
> >     > I went ahead and did the merge today as I said I wanted to, that
> >     pull
> >     > request is some further development for someone to code-review
> >     if they have
> >     > time.
> >
> >     I'm curious as to why there was a need to push ahead with the merge
> >     right away, without giving the original pull request more time for
> >     feedback?  If I'm not mistaken, the big merge was this PR:
> >
> >     https://github.com/numpy/numpy/pull/83
> >
> >     and it was just opened a few days ago, containing a massive amount of
> >     work, and so far had only received some feedback from charris,
> >     explicitly requesting a little more breakdown to make digesting it
> >     easier.
> >
> >
> > This is all true, I'll try to explain what my thought process was in
> > doing the merge. This set of changes basically takes the codebase from
> > a basically unusable datetime to a good starting point for all the
> > design discussions that we've been having on the mailing list. I
> > opened the pull request with about half of the changes that were there
> > to try and get some feedback, and kept developing on a datetime-fixes2
> > branch. When I reached the point that I later merged in, nobody had
> > responded so I added the new commits to the same pull request.
> >
> > Perhaps I should have more patience with git history-editing and
> > revisiting the development history, but I've basically come to the
> > conclusion that it's not worth the effort except on relatively minor
> > things, so I want to rather focus on the code and its design instead
> > of the particular series of commits that produced it. In doing those
> > commits, I had to repeatedly double back and implement new missing
> > functionality before returning to and finishing up what I was working on.
> >
> > For the development I'm doing now, which is related to the multitude
> > of design discussions, I'm splitting it up into more topical branches
> > partially because the work I merged provides a solid foundation for
> > doing so, and because these are things diverging from the NEP instead
> > of being things I perceived as having already had a discussion process
> > during the NEP formation.
> >
> > I tried to see if github would let me do a "dependent" pull request,
> > but it just included the commits from the branch my later development
> > was sitting on, and that's probably the main reason I wanted to do a
> > post-commit style review for this set of changes instead of
> > pre-commit. I wrote this email to try and communicate my transition
> > from pre-commit to post-commit review, but I think my wording about
> > this probably wasn't clear.
> >
> >     I realize that I'm not really an active numpy contributor in any
> >     significant way, and I see that you've put a ton of work into this,
> >     including a very detailed and impressive discussion on the list on
> >     with multiple people.  So my opinion is just that of a user, not
> >     really a core numpy developer.
> >
> >
> > I think your opinion is much more than that, particularly since you're
> > actively working on closely related projects using the same community
> > infrastructure.
> >
> >     But it seems to me that part of having numpy be a better
> >     community-driven project is precisely achieved by having the patience
> >     to allow others to provide feedback and testing, even if they aren't
> >     100% experts.  And one thing that github really shines at, is making
> >     the review/feedback process about as painless as possible (I actually
> >     find it kind of fun).
> >
> >
> > Definitely true (except for not having dependent pull requests, unless
> > my search was too shallow...). That pull request also has nothing to
> > do with the discussion we're currently having, it's more of a
> > prerequisite, so anyone who is following the discussion and wants to
> > dig in and review the code I'm doing related to that discussion will
> > be lost in a swamp. By merging this prerequisite, and introducing
> > separated, cleaner pull requests on that base that are directly from
> > issues being discussed, this kind of community collaboration is much
> > more likely to happen. I've simply done a bad job of communicating
> > this, and as I'm doing the things we're discussing I'll try and tie
> > these different elements better to encourage the ideals you're
> describing.
> >
> >     For example, with this merge, numpy HEAD right now won't even compile
> >     on x86_64, something that would easily have been caught with a bit
> >     more review, especially since it's so easy to test (even I can do
> >     that).  It's been a long time since we had a situation where numpy
> >     didn't cleanly at least build from HEAD, so if nothing else, it's a
> >     (small) sign that this particular merge could have used a few more
> >     eyes...
> >
> >
> > I apologize for that, I've grown accustomed to having little to no
> > review to my pull requests, except from Chuck whose time and effort
> > I've greatly appreciated, and has significantly improved the
> > contributions I've made. The only active C-level NumPy development
> > currently appears to be what I'm doing, and the great
> > clean-up/extension proposals that Chuck has emailed about. I would
> > like it if the buildbot system worked better to let me automatically
> > trigger some build/tests on a variety of platforms before merging a
> > branch, but it is as it is.
> >
> >     I realize that it's sometimes frustrating to have a lot of code
> >     sitting in review, and I know that certain efforts are large and
> >     self-contained enough that it's impractical to expect a detailed
> >     line-by-line review.  We've had a few such monster branches in
> ipython
> >     in the past, but at least in those cases we've always tried to ensure
> >     several core people (over skype if needed) have a chance to go over
> >     the entire big picture, discuss the main details with the author so
> >     they can know what to focus on from the large merge, and run the
> tests
> >     in as many scenarios as is realistic.  And so far we haven't really
> >     had any problems with this approach, even if it does require a little
> >     more patience in seeing that (often very high quality) work make it
> to
> >     the mainline.
> >
> >
> > That approach sounds great, and NumPy needs more active core developers!
> >
> > Cheers,
> > Mark
> >
> >
> >     Regards,
> >
> >     f
> >     _______________________________________________
> >     NumPy-Discussion mailing list
> >     NumPy-Discussion at scipy.org <mailto:NumPy-Discussion at scipy.org>
> >     http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> I am sorry but github pull requests do not appear to be sent to the
> numpy dev list. So you are not going to get many people to respond to
> that type of 'closed' request. Further any discussion for things that
> get merged into the master really should be on the list especially as
> many people do extensive testing.
>
> Bug fixes probably do not need further notification but feature
> additions or API/ABI changes should have wider notification. So an email
> to the list would be greatly appreciated so that interested people can
> track the request and any discussions there. Then, depending on the
> nature of the request, a second email that notifies that the request
> will be merged.
>
> I can understand Windows failures because not that many people build
> under Windows but build failures under Linux are rather hard to
> understand. If you do not test Python 2.4, 2.5, 2.6, 2.7, 3.1 and 3.2
> with the supported operating systems (mainly 32-bit and 64-bit Linux,
> Mac and Windows) then you must let those people who can and give them
> time to build and test it. That is really true when you acknowledged
> that you broke one of the 'one of the datetime API functions'.
>
> Bruce
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110608/d3de4f1a/attachment-0001.html
>
> ------------------------------
>
> Message: 2
> Date: Wed, 8 Jun 2011 14:20:16 -0400
> From: Brandt Belson <bbelson at princeton.edu>
> Subject: [Numpy-discussion] Using multiprocessing (shared memory) with
>        numpy   array multiplication
> To: numpy-discussion at scipy.org
> Message-ID: <BANLkTim=3izLfxSTybEuA8KybehpK257JA at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello,
> I'm parallelizing some code I've written using the built in multiprocessing
> module. In my application, I need to multiply many large arrays together
> and
> sum the resulting product arrays (inner products). I noticed that when I
> parallelized this with myPool.map(...) with 8 processes (on an 8-core
> machine), the code was actually about an order of magnitude slower. I
> realized that this only happens when I use the numpy array multiplication.
> I
> defined my own array multiplication and summation. As expected, this is
> much
> slower than the numpy versions, but when I parallelized it I saw the
> roughly
> 8x speedup I expected. So something about numpy array multiplication
> prevented me from speeding it up with multiprocessing.
>
> Is there a way to speed up numpy array multiplication with multiprocessing?
> I noticed that numpy's SVD uses all of the available cores on its own, does
> numpy array multiplication do something similar already?
>
> I'm copying the code which can reproduce and summarize these results.
> Simply
> run: "python shared_mem.py" with myutil.py in the same directory.
>
> Thanks,
> Brandt
>
>
> -- myutil.py --
>
> # Utility functions
> import numpy as N
> def numpy_inner_product(snap1,snap2):
>    """ A default inner product for n-dimensional numpy arrays """
>    return N.sum(snap1*snap2.conj())
>
> def my_inner_product(a,b):
>    ip = 0
>    for r in range(a.shape[0]):
>        for c in range(a.shape[1]):
>            ip += a[r,c]*b[r,c]
>    return ip
>
> def my_random(args):
>    return N.random.random(args)
>
>
> def eval_func_tuple(f_args):
>    """Takes a tuple of a function and args, evaluates and returns result"""
>    return f_args[0](*f_args[1:])
>
>
> -- shared_mem.py --
>
>
> import myutil
> import numpy as N
> import copy
> import itertools
> import multiprocessing
> import time as T
> processes = multiprocessing.cpu_count()
> pool = multiprocessing.Pool(processes=processes)
>
> def IPs():
>    """Find inner products of all arrays with themselves"""
>    arraySize = (300,200)
>    numArrays = 50
>    arrayList = pool.map(myutil.eval_func_tuple, itertools.izip(
>
>
> itertools.repeat(myutil.my_random,numArrays),itertools.repeat(arraySize,numArrays)))
>
>    IPs = N.zeros(numArrays)
>
>    startTime = T.time()
>    for arrayIndex,arrayValue in enumerate(arrayList):
>        IPs[arrayIndex] = myutil.numpy_inner_product(arrayValue, arrayValue)
>    endTime = T.time() - startTime
>    print 'No shared memory, numpy array multiplication
> took',endTime,'seconds'
>
>    startTime = T.time()
>    innerProductList = pool.map(myutil.eval_func_tuple,
>         itertools.izip(itertools.repeat(myutil.numpy_inner_product),
>         arrayList, arrayList))
>    IPs = N.array(innerProductList)
>    endTime = T.time() - startTime
>    print 'Shared memory, numpy array multiplication took',endTime,'seconds'
>
>    startTime = T.time()
>    for arrayIndex,arrayValue in enumerate(arrayList):
>        IPs[arrayIndex] = myutil.my_inner_product(arrayValue, arrayValue)
>    endTime = T.time() - startTime
>    print 'No shared memory, my array multiplication took',endTime,'seconds'
>
>    startTime = T.time()
>    innerProductList = pool.map(myutil.eval_func_tuple,
>         itertools.izip(itertools.repeat(myutil.my_inner_product),
>         arrayList, arrayList))
>    IPs = N.array(innerProductList)
>    endTime = T.time() - startTime
>    print 'Shared memory, my array multiplication took',endTime,'seconds'
>
>
> if __name__ == '__main__':
>    print 'Using',processes,'processes'
>    IPs()
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL:
> http://mail.scipy.org/pipermail/numpy-discussion/attachments/20110608/19d0c6b0/attachment.html
>
> ------------------------------
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> End of NumPy-Discussion Digest, Vol 57, Issue 36
> ************************************************
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110608/2cb01057/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shared_mem.py
Type: application/octet-stream
Size: 1797 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110608/2cb01057/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: myutil.py
Type: application/octet-stream
Size: 534 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110608/2cb01057/attachment-0001.obj>


More information about the NumPy-Discussion mailing list