On Tue, Jul 23, 2013 at 3:27 PM, Nathan Goldbaum firstname.lastname@example.org wrote:
I've just issued a PR that will hopefully fix a whole class of buggy behavior that both new and experienced yt users commonly run into. Specifically, I'd like it if we could turn off data serialization by default. This changes a long-lived default value in yt's configuration, so I wanted to bring this change to the attention of both the yt user and developer community.
What is data serialization? Currently, yt will save the result of certain expensive calculations, including projections, the structure of the grid hierarchy, and the list of fields present in the data. While this does have the beneficial effect of saving time when a user needs to repeatedly calculate these quantities on the same dataset, it also has a number of side effects that lead to buggy, annoying behavior.
Specifically, if I am developing my simulation code, or repeatedly restarting it while searching for a way to grind past a crash, I will quite often regenerate the same simulation output file over and over, changing a line of code or switching out the value of a parameter each time.
If yt's data serialization is turned on, it's likely that yt's visualizations will correspond to old versions of the data file. Since only certain operations are serialized, it's also possible for yt to get into an inconsistent state - one operation will show the current data file, while another operation will show an old version.
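To make the failure mode above concrete, here is a minimal sketch of a cache that is keyed only on the file name. This is not yt's implementation: `expensive_sum`, `cached_sum`, and the `.cache` sidecar suffix are hypothetical names invented for illustration. The point is only that once a result is stored next to the data file, regenerating the data file does not invalidate it.

```python
import os
import pickle
import tempfile


def expensive_sum(path):
    """Stand-in for an expensive calculation such as a projection."""
    with open(path) as f:
        return sum(float(line) for line in f)


def cached_sum(path):
    """Naive cache keyed only on the file name (hypothetical, not yt's code)."""
    cache_path = path + ".cache"  # hypothetical sidecar file
    if os.path.exists(cache_path):
        # The cache never checks whether the data file has changed.
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = expensive_sum(path)
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


def demo():
    """Write a data file, cache a result, then regenerate the file."""
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "output_0001.txt")
        with open(data, "w") as f:
            f.write("1\n2\n3\n")
        first = cached_sum(data)

        # Regenerate the "simulation output" under the same name.
        with open(data, "w") as f:
            f.write("10\n20\n30\n")
        stale = cached_sum(data)     # served from the old sidecar file
        fresh = expensive_sum(data)  # uncached operation sees the new data
        return first, stale, fresh
```

In `demo`, the cached call keeps returning the result from the first version of the file while the uncached call reflects the regenerated one, which is the same kind of inconsistency described above.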
It's possible to fix a bug in your code, but because yt is still loading the old data, you won't be able to tell that your bug is fixed until you realize that you have .yt and .harrays files littering your filesystem.
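If you suspect stale cache files are the problem, a short script can list them. The sketch below assumes only what is stated above, namely that the leftover files end in .yt or .harrays; the helper name `find_sidecar_files` is made up for this example, and it only reports files rather than deleting them.

```python
from pathlib import Path

# Suffixes taken from the message above; anything else is out of scope.
SIDECAR_SUFFIXES = (".yt", ".harrays")


def find_sidecar_files(root):
    """Return a sorted list of files under root ending in .yt or .harrays."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix in SIDECAR_SUFFIXES
    )


if __name__ == "__main__":
    # Print candidates found under the current directory for manual review.
    for path in find_sidecar_files("."):
        print(path)
```

Reviewing the list before removing anything is deliberate, since some of these files may hold results you want to keep.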
I've personally wasted a lot of time due to yt's serialization 'feature', and denizens of our IRC channel and mailing list can attest to how often new users run into this behavior as well.
My pull request only turns off serialization by default; it doesn't disable the capability completely. Once the pull request is merged in, you can turn on serialization either by adding an entry to your config file:
$ cat ~/.yt/config
[yt]
serialize = True
Or on a per-script basis:
from yt.config import ytcfg
ytcfg['yt', 'serialize'] = 'True'
from yt.mods import *
The pull request is here: https://bitbucket.org/yt_analysis/yt/pull-request/558
I know several of you are big fans of this feature, so if you object to this change please leave a comment on the pull request so we can figure out a way forward.
I think this is long overdue, for all the reasons you list. Auto-serialization treated a lot of symptoms that we have since improved, or that we should address more directly -- speed of hierarchy construction, saving data that we want to retain, and detecting fields.
yt-dev mailing list email@example.com http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org