14-hour load time for Enzo dataset with VR, vs. 30 minutes with ProjectionPlot?
Hello yt people,

We're trying to render imagery of a pretty large Enzo snapshot (~160GB, in 330,000 grids in 512 HDF5 domains) with yt-3.3dev.

On a reasonably fast Linux machine, we can do a ProjectionPlot of a few variables in about 30 minutes, running single-threaded while it scans the data (which is what takes most of the time). Data access pattern: we see it reading through each of the HDF5 files in numerical order (cpu0000, cpu0001, ...), taking a few seconds each, and opening each file exactly once.

On the same machine and same dataset, using the volume rendering API, the data-scanning process takes about *14 hours* (not counting any rendering time). (On Blue Waters, Kalina, using a similar dataset, couldn't get it to finish within a 24-hour wall-clock limit.) Data access pattern: it opens an HDF5 file many times in quick succession, then opens another, then opens the previous file a bunch more times. I'm guessing it grabs one AMR grid from each HDF5 open:

    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0074", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0075", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0235", O_RDONLY) = 3
    open("/fe0/deslsst/renaissance/normal/RD0074/RedshiftOutput0074.cpu0357", O_RDONLY) = 3

This is trouble. Is there anything we can do to make load times less extravagant when using VR on Enzo? What if we ran "ds.index" before?

I tried running cProfile on it, as in "python -m cProfile myscript.py" ... Happy to point anyone at the dataset on our systems or BW, but at this scale it's not a very portable problem.
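For concreteness, a minimal sketch of that profiling recipe, with ds.index forced up front so index construction is separated from the VR data scan (the script name, dataset path, and field are placeholders):

    # myscript.py -- illustrative driver; profile with:
    #   python -m cProfile -o vr.prof myscript.py
    import yt

    ds = yt.load("RD0074/RedshiftOutput0074")  # placeholder path
    ds.index                                   # build the grid index up front
    sc = yt.create_scene(ds, field=("gas", "density"))
    sc.save("render.png")                      # triggers the slow data scan + render

    # Afterwards, inspect the hot spots:
    #   import pstats
    #   pstats.Stats("vr.prof").sort_stats("cumulative").print_stats(20)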
I don't think too many people have done a volume rendering this big, so
you're likely hitting scaling issues that haven't been looked at closely.
Have you tried doing any sort of parallel volume rendering? yt supports
decomposing in the image plane in parallel using the MosaicCamera.
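For instance, a minimal MPI-parallel sketch with the Scene API, run under mpirun (dataset path, field, and resolution are placeholders, and how well 3.3dev decomposes this particular workload is an assumption; the MosaicCamera's image-plane interface itself differs):

    # render_parallel.py -- run as: mpirun -np 8 python render_parallel.py
    import yt

    yt.enable_parallelism()   # requires mpi4py; distributes yt's work over MPI ranks

    ds = yt.load("RD0074/RedshiftOutput0074")           # placeholder path
    sc = yt.create_scene(ds, field=("gas", "density"))  # default camera + volume source
    sc.camera.resolution = (1024, 1024)
    sc.save("volume_render.png")                        # root rank writes the image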
-Nathan
Hi Stuart,
My guess is that this is all related to ghost zones. This has been a really bothersome issue with yt 3 that hasn't been addressed yet; there is a non-linear, truly terrible slowdown in getting ghost zones in yt 3 compared to yt 2. So if you turn off ghost zones in your VR, I suspect it'd go a lot faster. Unfortunately, it'll look a lot worse. There is hope, but the timescale for fixing it is unknown.
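A hedged sketch of that switch, assuming the Scene API exposes it the way later yt releases do, via a use_ghost_zones flag on the VolumeSource (the older Camera interface took a no_ghost keyword for the same trade-off; dataset path and field are placeholders):

    import yt

    ds = yt.load("RD0074/RedshiftOutput0074")           # placeholder path
    sc = yt.create_scene(ds, field=("gas", "density"))  # placeholder field
    source = sc[0]                   # the VolumeSource built by create_scene
    source.use_ghost_zones = False   # assumption: skips ghost-zone interpolation;
                                     # much faster load, but seams at grid boundaries
    sc.save("render_no_ghost.png")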
We haven't tried parallelizing. We could do that. But the main problem is: why should it take ~30x longer using the volume-rendering pathway than using ProjectionPlot, when both should need to examine all the data, right?
When you make a projection plot, you don't use ghost zones. The ghost
zone algorithm is pretty bad in the new version, but it was the best I
could do at the time. I'd be happy to go over this in detail, or
write it up, if that'd help.
For reference, I've definitely noticed that doing VR or off_axis_projections with moderately sized datasets takes about 5-10x longer than ProjectionPlots of the same dataset. Last week I was playing with a non-cosmological Enzo dataset that is 1.6GB in total size, and it took about 3-4 minutes to do an off_axis_projection, whereas it took about 20 seconds to do a ProjectionPlot.
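For anyone reproducing that comparison, a minimal timing sketch along the lines of the yt 3 cookbook's off-axis projection recipe (dataset path, field, and viewing direction are placeholders):

    import time
    import numpy as np
    import yt

    ds = yt.load("DD0046/DD0046")     # placeholder Enzo dataset

    t0 = time.time()
    yt.ProjectionPlot(ds, "z", ("gas", "density")).save("axis_proj.png")
    print("ProjectionPlot: %.1f s" % (time.time() - t0))

    t0 = time.time()
    L = np.array([1.0, 1.0, 0.5])     # arbitrary line of sight
    image = yt.off_axis_projection(ds, ds.domain_center, L,
                                   ds.domain_width, 512, ("gas", "density"))
    yt.write_image(np.log10(image), "off_axis_proj.png")
    print("off_axis_projection: %.1f s" % (time.time() - t0))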
--
Cameron Hummels
NSF Postdoctoral Fellow
Department of Astronomy
California Institute of Technology
http://chummels.org
Hi Stuart,
I've started looking into this, and I've made some progress that may
help for your particular use case.
https://bitbucket.org/yt_analysis/yt/pull-requests/2025
-Matt
Thank you so much, Matt! I've been running some tests today with your new code on the full dataset - will let you know how they turn out.
participants (4)
- Cameron Hummels
- Matthew Turk
- Nathan Goldbaum
- Stuart Levy