On Thu, Dec 6, 2012 at 2:44 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
Pardon my ignorance, but is it the case that computations done in 64 bit mode in enzo are normally saved to disk as 32 bit floats? If so, is there a setting I can change to make sure that my enzo datasets are always written to disk with double precision?
I disabled that particular anti-feature some time ago, with the "New_Grid_WriteGrid.C" stuff. Now in Enzo, you write to disk exactly what you store in memory.
Since most enzo calculations are done in 64 bit anyway and this change allows some pretty significant speedups, I'm +1 on this change.
On Dec 6, 2012, at 11:30 AM, Matthew Turk wrote:
Hi all,
I've been doing some benchmarking of various operations in the Enzo frontend in yt 2.x. I don't believe other frontends suffer from this, for the main reason that they're all 64 bit everywhere.
The test dataset is about ten gigs, with a bunch of grids. I'm extracting a surface, which means from a practical standpoint that I'm filling ghost zones for every grid inside the region of interest. There are many places in yt where we either upcast to 64-bit floats or assume 64 bits. Basically, nearly all yt-defined Cython or C operations assume 64-bit floats.
There's a large quantity of Enzo data out there that is float32 on disk, which gets passed into yt, where it gets handed around until it is upcast. There are two problems here: 1) We have a tendency to use "astype" instead of "asarray", which means the data is *always* duplicated. 2) We often do this repeatedly for the same set of grid data; nowhere is this more true than when generating ghost zones.
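To make the first problem concrete, here is a small sketch (not taken from yt; the array below is a made-up stand-in for a grid field) showing how "astype" and "asarray" differ when the data is already float64:

```python
import numpy as np

# Hypothetical stand-in for a grid field that is already float64.
data = np.zeros((16, 16, 16), dtype="float64")

# astype copies by default, even though the dtype already matches.
copied = data.astype("float64")

# asarray hands back the input array itself when no conversion is needed.
same = np.asarray(data, dtype="float64")

print(np.shares_memory(data, copied))  # False: a new buffer was allocated
print(same is data)                    # True: no copy was made
```

For float32 input, both calls have to allocate a new float64 array, so the saving only applies to data that is already 64-bit.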
So for the dataset I've been working on, ghost zones are a really intense prospect. And the call to .astype("float64") actually completely dominated the operation. The cost comes from both copying the data and casting it. I found two different solutions.
The original code:
g_fields = [grid[field].astype("float64") for field in fields]
This is bad even if you're using float64 data types, since it will always copy. So it has to go. The total runtime for this dataset was 160s, and the most-expensive function was "astype" at 53 seconds.
So as a first step, I inserted a cast to "float64" if the dtype of an array inside the Enzo IO system was "float32". This way, all arrays were upcast automatically. This gave zero performance improvement, since the astype call above still copied every array. So I checked further and saw the "always copy" bit in astype, which I had been ignorant of. This option:
g_fields = [np.asarray(grid[field], "float64") for field in fields]
is much faster, and saves a bunch of time. But 7 seconds is still spent inside "np.array", and total runtime is 107.5 seconds. This option is the fastest:
g_fields = []
for field in fields:
    gf = grid[field]
    if gf.dtype != "float64":
        gf = gf.astype("float64")
    g_fields.append(gf)
and now total runtime is 95.6 seconds, with the dominant cost *still* in _get_data_from_grid. At this point I am much happier with the performance, although still quite disappointed, and I'll be doing line-by-line profiling next to find any more micro-optimizations.
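The dtype check can be wrapped in a small helper. This is only a sketch: the name "as_float64" is hypothetical and is not a function in yt. It also shows the "copy=False" argument to astype, which NumPy releases from 1.7 onward accept and which performs the same check internally.

```python
import numpy as np

def as_float64(arr):
    """Return arr itself if it is already float64, otherwise an upcast copy.

    Hypothetical helper for illustration; not part of yt.
    """
    if arr.dtype != np.float64:
        return arr.astype("float64")
    return arr

f32 = np.ones(8, dtype="float32")
f64 = np.ones(8, dtype="float64")

up = as_float64(f32)        # new float64 array
untouched = as_float64(f64) # same object, no copy

# Equivalent one-liner where astype supports copy=False (NumPy >= 1.7):
also_untouched = f64.astype("float64", copy=False)

print(up.dtype, untouched is f64, also_untouched is f64)
```

The explicit check avoids even the function-call overhead of np.asarray for data that is already 64-bit, which may matter when it runs once per field per grid.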
Now, the change to _get_data_from_grid *itself* will greatly improve performance for 64-bit datasets. But also updating io.py to upcast 32-bit datasets on read will help speed things up considerably for 32-bit datasets as well. The downside is that it will be difficult to get back raw, unmodified 32-bit data from the grids, rather than 32-bit data that has been cast to 64 bits.
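A rough sketch of the upcast-on-read idea, under stated assumptions: "fake_disk" and "read_field" are hypothetical stand-ins and do not reflect the actual code in io.py. The point is that the cast is paid once at read time, so later dtype checks find nothing to do.

```python
import numpy as np

# Hypothetical stand-in for float32 grid data stored on disk.
fake_disk = {"Density": np.arange(4, dtype="float32")}

def read_field(name):
    """Read a field and upcast float32 to float64 once, at read time."""
    data = fake_disk[name]
    if data.dtype == np.float32:
        data = data.astype("float64")
    return data

field = read_field("Density")

# Downstream code that asks for float64 now gets the same array back.
downstream = np.asarray(field, dtype="float64")

print(field.dtype, downstream is field)
```

The caller in this sketch never sees the original float32 array, which is the downside described above.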
Is this an okay change to make?
[+-1][01]
-Matt
_______________________________________________
yt-dev mailing list
yt-dev@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org