To confirm: you're saying that we should avoid using astype() in future code and use asarray instead, and that we don't need your list-comprehension step in general usage, just in this particular case?

Cameron


On Thu, Dec 6, 2012 at 1:12 PM, Matthew Turk <matthewturk@gmail.com> wrote:
Hi Cameron and everyone else,

Let me clarify one point -- the changes to _get_data_from_grid
preserve neutrality on the part of yt about the type of data that is
being fed to it, either 32-bit or 64-bit.  But the new version optimizes
considerably for the case where the data is not 64-bit.  In response
to Doug's reply, this particular change is designed to allow other
frontends to continue to supply 32-bit data.  A change to the
GridPatch object itself, making sure that when data gets sent into the
field_data dict is 64-bit rather than 32-bit, would eliminate the need
for neutrality in _get_data_from_grid in the SmoothedCoveringGrid
object.  So this still preserves neutrality.  Were we to move away
from neutrality, we'd see further speedup.  The change to the way
astype() was being called is going through regardless of the change to
the Enzo frontend.  The Enzo frontend change is just to ensure that
for a particular subset of Enzo data, we see maximum speedup.

Cameron, in response to your suggestion, yes, we absolutely should
avoid usage of astype() except where copies are explicitly needed.
The piece of information that this copies was in my head at some
point, but I had forgotten it.  The right function *does* exist, and
it is np.asarray, but that carried with it more overhead than my
(one-off) solution, which I've now also wrapped into a list
comprehension for another two seconds of savings.  It will be worth
going through the code base and looking at all of these usages of
astype and seeing which can be replaced.  One of the reasons this
particular usage was so problematic is that it was inside a tight
loop, which itself was in a tight loop, which was inside another loop
(not so tight), and astype() was being called on the *same* arrays over and
over.  So it was pathologically bad.  I've optimized a bit more and I
think at this point we're at an algorithmic impasse, which is a
subject for another day.
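For anyone following along, here's a minimal sketch of the copy
semantics at play (the array names are hypothetical stand-ins for
grid[field]; NumPy assumed):

```python
import numpy as np

# Hypothetical grid fields, standing in for grid[field].
gf32 = np.arange(8, dtype="float32")
gf64 = np.arange(8, dtype="float64")

# astype() always allocates and copies, even when no cast is needed.
a = gf64.astype("float64")
print(a is gf64, np.shares_memory(a, gf64))  # False False -- a fresh copy

# np.asarray() hands back the input unchanged when the dtype matches.
b = np.asarray(gf64, dtype="float64")
print(b is gf64)  # True -- no copy, no cast

# When a cast is genuinely needed, both paths must allocate a new array.
c = np.asarray(gf32, dtype="float64")
print(c.dtype, c is gf32)  # float64 False
```

(Note that ndarray.astype also accepts copy=False, which behaves like
asarray in the already-64-bit case.)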

If anybody is up for an optimization hangout/sprint, we could probably
come up with benchmarks and get some performance changes out pretty
quickly.
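Along the lines of Cameron's helper-function suggestion below, the
dtype-check pattern could be centralized in something like this (the
name ensure_float64 is hypothetical, not an existing yt function):

```python
import numpy as np

def ensure_float64(arr):
    """Return arr as a float64 array, copying only when a cast is required.

    Hypothetical helper -- sketched here to show how the dtype-check
    pattern could be shared across the code base instead of repeating
    the inline check at every call site.
    """
    arr = np.asarray(arr)
    if arr.dtype == np.float64:
        return arr  # already 64-bit: hand back the same buffer
    return arr.astype(np.float64)  # genuine cast: one unavoidable copy

# Usage mirroring the _get_data_from_grid loop:
fields32 = [np.arange(4, dtype="float32"), np.arange(4, dtype="float64")]
g_fields = [ensure_float64(f) for f in fields32]
```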

-Matt

On Thu, Dec 6, 2012 at 3:01 PM, Cameron Hummels <chummels@gmail.com> wrote:
> Hey Matt,
>
> Thanks for optimizing the code--I know it isn't a glorious job, but it will
> help us all in the end.  I think your change looks great, but I'm curious:
> is this the sort of operation that is going to consistently come up in the
> future, or a one-off change you're making in the Enzo frontend?  If it is
> something we should be using for future casting, perhaps we should make a
> helper function to do this, so that everyone can consistently use this
> optimized code instead of the more common .astype cast.  Other than that,
> I'm +1 on this change.
>
> Cameron
>
> On Thu, Dec 6, 2012 at 12:50 PM, Matthew Turk <matthewturk@gmail.com> wrote:
>>
>> On Thu, Dec 6, 2012 at 2:44 PM, Nathan Goldbaum <nathan12343@gmail.com>
>> wrote:
>> > Pardon my ignorance, but is it the case that computations done in 64-bit
>> > mode in enzo are normally saved to disk as 32 bit floats?  If so, is there a
>> > setting I can change to make sure that my enzo datasets are always written
>> > to disk with double precision?
>>
>> I disabled that particular anti-feature some time ago, with the
>> "New_Grid_WriteGrid.C" stuff.  Now in Enzo, you write to disk exactly
>> what you store in memory.
>>
>> >
>> > Since most enzo calculations are done in 64 bit anyway and this change
>> > allows some pretty significant speedups, I'm +1 on this change.
>> >
>> > On Dec 6, 2012, at 11:30 AM, Matthew Turk wrote:
>> >
>> >> Hi all,
>> >>
>> >> I've been doing some benchmarking of various operations in the Enzo
>> >> frontend in yt 2.x.  I don't believe other frontends suffer from this,
>> >> for the main reason that they're all 64 bit everywhere.
>> >>
>> >> The test dataset is about ten gigs, with a bunch of grids.  I'm
>> >> extracting a surface, which means from a practical standpoint that I'm
>> >> filling ghost zones for every grid inside the region of interest.
>> >> There are many places in yt that we either upcast to 64-bit floats or
>> >> that we assume 64-bits.  Basically, nearly all yt-defined Cython or C
>> >> operations assume 64-bit floats.
>> >>
>> >> There's a large quantity of Enzo data out there that is float32 on
>> >> disk, which gets passed into yt, where it gets handed around until it
>> >> is upcast.  There are two problems here: 1) We have a tendency to use
>> >> "astype" instead of "asarray", which means the data is *always*
>> >> duplicated.  2) We often do this repeatedly for the same set of grid
>> >> data; nowhere is this more true than when generating ghost zones.
>> >>
>> >> So for the dataset I've been working on, ghost zones are a really
>> >> intense prospect.  And the call to .astype("float64") actually
>> >> completely dominated the operation.  This comes from both copying the
>> >> data, as well as casting.  I found two different solutions.
>> >>
>> >> The original code:
>> >>
>> >> g_fields = [grid[field].astype("float64") for field in fields]
>> >>
>> >> This is bad even if you're using float64 data types, since it will
>> >> always copy.  So it has to go.  The total runtime for this dataset was
>> >> 160s, and the most-expensive function was "astype" at 53 seconds.
>> >>
>> >> So as a first step, I inserted a cast to "float64" if the dtype of an
>> >> array inside the Enzo IO system was "float32".  This way, all arrays
>> >> were upcast automatically.  This led me to see zero performance
>> >> improvement.  So I checked further and saw the "always copy" bit in
>> >> astype, which I was ignorant of.  This option:
>> >>
>> >> g_fields = [np.asarray(grid[field], "float64") for field in fields]
>> >>
>> >> is much faster, and saves a bunch of time.  But 7 seconds is still
>> >> spent inside "np.array", and total runtime is 107.5 seconds.  This
>> >> option is the fastest:
>> >>
>> >>        g_fields = []
>> >>        for field in fields:
>> >>            gf = grid[field]
>> >>            if gf.dtype != "float64": gf = gf.astype("float64")
>> >>            g_fields.append(gf)
>> >>
>> >> and now total runtime is 95.6 seconds, with the dominant cost *still*
>> >> in _get_data_from_grid.  At this point I am much more happy with the
>> >> performance, although still quite disappointed, and I'll be doing
>> >> line-by-line next to figure out any more micro-optimizations.
>> >>
>> >> Now, the change to _get_data_from_grid *itself* will greatly impact
>> >> performance for 64-bit datasets.  But also updating the io.py to
>> >> upcast-on-read datasets that are 32-bit will help speed things up
>> >> considerably for 32-bit datasets as well.  The downside is that it
>> >> will be difficult to get back raw, unmodified 32-bit data from the
>> >> grids, rather than 32-bit data that has been cast to 64-bits.
>> >>
>> >> Is this an okay change to make?
>> >>
>> >> [+-1][01]
>> >>
>> >> -Matt
>> >> _______________________________________________
>> >> yt-dev mailing list
>> >> yt-dev@lists.spacepope.org
>> >> http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
>> >
>
>
>
>