
On Wed, Jun 24, 2015 at 9:59 PM, Trent Nelson <trent@snakebite.org> wrote:
> (Sturla commented on the "import-DDoS" that you can run into on POSIX systems, which is a good example. You're saturating your underlying hardware, sure, but you're not doing useful work -- it's important to distinguish the two.)
To be clear, AFAIU the "import-DDoS" that supercomputers classically run into has nothing to do with POSIX; it has to do with running systems that were designed for simulation workloads that go like: generate a bunch of data from scratch in memory, crunch on it for a while, and then spit out some summaries. So you end up with $1e11 spent on increasing the FLOP count, and the absolute minimum spent on the storage system -- basically just enough to let you load a single static binary into memory at the start of your computation, and there might even be some specific hacks in the linker to minimize the cost of distributing that single binary load. (These are really weird architectures; they usually don't even have shared library support.)

And the result is that when you try spinning up a Python program instead, the startup sequence produces (number of imports) * (number of entries in sys.path) * (hundreds of thousands of nodes) simultaneous stat calls hammering some poor NFS server somewhere, and it falls over and dies. (I think often the network connection to the NFS server isn't even using the ridiculously-fast interconnect mesh, but rather some plain-old-ethernet that gets saturated.) I could be wrong -- I don't actually work with these systems myself -- but that's what I've picked up.

Continuing my vague and uninformed impressions, I suspect that this would actually be relatively easy to fix by hooking the import system to do something more intelligent, like nominating one node as the leader and having it do the file lookups and then tell everyone else what it found (via the existing message-passing systems). Though there is an interesting problem of how you bootstrap the hook code itself. But as to whether the new import hook stuff actually helps with this... I'm pretty sure most HPC centers haven't noticed that Python 3 exists yet.
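For concreteness, the leader-node idea could be sketched as a meta path finder -- this is a speculative illustration, not anything deployed. The `broadcast` callable here is a stand-in for whatever message-passing primitive the cluster actually provides (e.g. mpi4py's `comm.bcast`); the no-op default just lets the sketch run on a single machine, and it deliberately ignores the bootstrapping problem of shipping the hook code itself:

```python
import importlib.abc
import importlib.machinery
import importlib.util
import sys


class LeaderImportFinder(importlib.abc.MetaPathFinder):
    """One node pays for the sys.path stat() scan; everyone else is told
    where the module lives and opens that file directly."""

    def __init__(self, rank=0, broadcast=lambda obj: obj):
        # `broadcast` stands in for the cluster's message-passing
        # primitive (e.g. mpi4py comm.bcast); the identity default
        # makes the sketch runnable on one machine.
        self.rank = rank
        self.broadcast = broadcast
        self.lookups = 0  # path searches actually performed by the leader

    def find_spec(self, name, path=None, target=None):
        if self.rank == 0:
            # Leader: do the real (expensive) sys.path scan once...
            spec = importlib.machinery.PathFinder.find_spec(name, path)
            self.lookups += 1
            # ...and share the resulting location with the other ranks.
            self.broadcast(spec.origin if spec else None)
            return spec
        # Followers: receive the answer instead of issuing their own
        # stat() storm against the shared filesystem.
        origin = self.broadcast(None)
        if origin is None:
            return None
        return importlib.util.spec_from_file_location(name, origin)


# Single-machine demo: install the finder and import a module through it.
finder = LeaderImportFinder(rank=0)
sys.meta_path.insert(0, finder)
try:
    sys.modules.pop("colorsys", None)  # ensure the finder is consulted
    import colorsys
finally:
    sys.meta_path.remove(finder)
```

In a real deployment the follower branch would block in the collective broadcast until the leader's lookup completed, so the per-import cost on the filesystem stays constant in the number of nodes rather than linear.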
See above re: extremely weird architectures -- many of us are familiar with "clinging to RHEL 5" levels of conservatism, but that's nothing compared to "look, there's only one person who ever knew how to get a working Python and numpy using our bespoke compiler toolchain on this architecture that doesn't support extension module loading (!!), and they haven't touched it in years either"...

There are lots of smart people working on this stuff right now, but they're starting from a pretty different place than those of us in the consumer computing world :-).

-n

--
Nathaniel J. Smith -- http://vorpus.org