[Chicago] Kickstarter Fund to get rid of the GIL
Massimo Di Pierro
mdipierro at cs.depaul.edu
Mon Jul 25 20:37:24 CEST 2011
I can (that is what I have worked on for most of my life), but it would be a physics talk rather than a Python talk, so I am not sure ChiPy is the appropriate venue.
Massimo
On Jul 25, 2011, at 1:02 PM, Brian Herman wrote:
> Whoa, Massimo, can you give a talk on that stuff?
>
>
> Arigatou gozaimasu,
> (Thank you very much)
> Brian Herman
>
> brianjherman.com
> brianherman at acm.org
>
> On Mon, Jul 25, 2011 at 9:09 AM, Massimo Di Pierro <mdipierro at cs.depaul.edu> wrote:
> Probably the single largest community in the US using BlueGene machines is the lattice QCD community (in fact, the machine was designed in collaboration with the lattice QCD group at Columbia University). The typical data structure is a 4D array (called the lattice) of 4-vectors of SU(3) matrices (3x3 complex, double precision), plus a few other similar data structures that live on the sites of the 4D lattice. A typical lattice has 96x64^3 sites, for a total size of 96x64^3x4x9x2x8 bytes = ~14 GB; total memory usage is larger because of copies and other data structures. One of the four dimensions is stored locally; the other three are distributed in parallel.
> 
> Each iteration of the algorithm involves computing roughly 2000 elementary floating point operations per lattice site and communicating the site structure (4x9x2x8 bytes) to each of the 2^3 neighbor processors. The most efficient codes can use 100-1000 CPUs with 50-80% efficiency. So if one computing node stores 96 sites, it needs to perform ~200K FLOPs and 8 sends and 8 recvs of 96x4x9x2x8 bytes each. This type of computation is limited by latency more than by bandwidth. Communication is always next-neighbor (this is common to all algorithms that solve, or are equivalent to solving, differential equations numerically).
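As a quick sanity check on those figures, here is a minimal plain-Python sketch (the variable names are mine and purely illustrative):

    sites = 96 * 64 ** 3                 # lattice volume: 96 x 64^3 sites
    bytes_per_site = 4 * 9 * 2 * 8       # 4-vector of 3x3 complex doubles = 576 bytes
    print("gauge field: %.1f GB" % (sites * bytes_per_site / 1e9))            # ~14.5 GB

    sites_per_node = 96                  # sites stored on one computing node
    flops_per_site = 2000                # elementary FLOPs per site per iteration
    print("work per node per iteration: ~%d K FLOPs"
          % (sites_per_node * flops_per_site / 1e3))                          # ~192 K FLOPs

    msg_bytes = sites_per_node * bytes_per_site                               # one neighbor exchange
    print("per-neighbor message: ~%d KB (8 sends + 8 recvs)" % (msg_bytes / 1e3))  # ~55 KB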
>
> On Jul 25, 2011, at 8:51 AM, sheila miguez wrote:
>
> > On Sun, Jul 24, 2011 at 11:51 AM, Alex Gaynor <alex.gaynor at gmail.com> wrote:
> >
> >> I'll live :) Anyway, the point I was getting at is not that a message-passing
> >> system is not scalable; I've written code for Blue Gene/Ls, so I know that
> >> message passing scales. Rather, for problems for which shared-memory
> >> concurrency is appropriate (read: the valid cases to complain about the GIL),
> >> message passing will not be, because of the marshal/unmarshal overhead
> >> (plus data size/locality ones).
> >> Alex
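To put a rough number on that marshal/unmarshal cost, here is a tiny standard-library sketch (the 8 MB payload size is just an assumption of mine for illustration). With genuinely shared memory the receiving side reads the same buffer directly, so this step simply disappears, which is the point being made above:

    import pickle
    import time
    from array import array

    payload = array("d", [0.0]) * 1000000        # ~8 MB worth of doubles
    t0 = time.time()
    blob = pickle.dumps(payload)                  # marshal on the sending side
    pickle.loads(blob)                            # unmarshal on the receiving side
    dt = time.time() - t0
    print("pickle round trip: %.1f ms for %.1f MB" % (dt * 1e3, len(blob) / 1e6))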
> >
> > Are Blue Gene jobs that involve fairly sizable data packets a rare use
> > case? I don't know a lot about the typical use cases and am curious.
> > I'm guessing the common case is analysis on things that can be split
> > out, but I'm curious about the size of the chunks of information.
> >
> >
> >
> > --
> > sheila
> _______________________________________________
> Chicago mailing list
> Chicago at python.org
> http://mail.python.org/mailman/listinfo/chicago