
Hi all,
I've just issued a PR that will hopefully fix a whole class of buggy behavior that both new and experienced yt users commonly run into. Specifically, I'd like it if we could turn off data serialization by default. This changes a long-lived default value in yt's configuration, so I wanted to bring this change to the attention of both the yt user and developer community.
What is data serialization? Currently, yt will save the result of certain expensive calculations, including projections, the structure of the grid hierarchy, and the list of fields present in the data. While this does have the beneficial effect of saving time when a user needs to repeatedly calculate these quantities on the same dataset, it has a number of features which lead to buggy, annoying behavior.
Specifically, if I am developing my simulation code or repeatedly restarting my code, searching for a way to grind past a code crash, I will quite often regenerate the same simulation output file over and over, changing a line of code or switching out the value of a parameter each time.
If yt's data serialization is turned on, it's likely that yt's visualizations will correspond to old versions of the data file. Since only certain operations are serialized, it's also possible for yt to get into an inconsistent state - one operation will show the current data file, while another operation will show an old version.
It's possible to fix a bug in your code, but because yt is still loading the old data, you won't be able to tell that your bug is fixed until you realize that you have .yt and .harrays files littering your filesystem.
I've personally wasted a lot of time due to yt's serialization 'feature', and denizens of our IRC channel and mailing list can attest to how often new users run into this behavior as well.
My pull request only turns off serialization by default; it doesn't disable the capability completely. Once the pull request is merged in, you can turn on serialization either by adding an entry to your config file:
    $ cat ~/.yt/config
    [yt]
    serialize = True
Or on a per-script basis:
    from yt.config import ytcfg
    ytcfg['yt', 'serialize'] = 'True'
    from yt.mods import *
The pull request is here: https://bitbucket.org/yt_analysis/yt/pull-request/558
I know several of you are big fans of this feature, so if you object to this change please leave a comment on the pull request so we can figure out a way forward.
-Nathan
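
The stale-result problem described in this message is not specific to yt. The following is a minimal sketch, assuming a made-up text data file and hypothetical helper names (`expensive_sum`, `naive_cached_sum`, `checked_cached_sum`); it is not how yt stores its .yt or .harrays files. It shows a sidecar cache that is reused whenever it exists, next to a variant that first compares the data file's size and modification time:

```python
import json
import os
import tempfile


def expensive_sum(path):
    """Stand-in for an expensive calculation over a data file."""
    with open(path) as f:
        return sum(float(line) for line in f if line.strip())


def naive_cached_sum(path):
    """Reuse a sidecar file whenever one exists, without checking freshness."""
    sidecar = path + ".cache"
    if os.path.exists(sidecar):
        with open(sidecar) as f:
            return json.load(f)["value"]
    value = expensive_sum(path)
    with open(sidecar, "w") as f:
        json.dump({"value": value}, f)
    return value


def checked_cached_sum(path):
    """Reuse the sidecar only if the data file's size and mtime still match."""
    sidecar = path + ".checked"
    stat = os.stat(path)
    stamp = [stat.st_size, stat.st_mtime_ns]
    if os.path.exists(sidecar):
        with open(sidecar) as f:
            saved = json.load(f)
        if saved["stamp"] == stamp:
            return saved["value"]
    value = expensive_sum(path)
    with open(sidecar, "w") as f:
        json.dump({"stamp": stamp, "value": value}, f)
    return value


workdir = tempfile.mkdtemp()
data_file = os.path.join(workdir, "output_0001.txt")

# First version of the "simulation output".
with open(data_file, "w") as f:
    f.write("1\n2\n3\n")
first_naive = naive_cached_sum(data_file)
first_checked = checked_cached_sum(data_file)

# Regenerate the output under the same file name, as in the workflow above.
with open(data_file, "w") as f:
    f.write("10\n20\n30\n")
second_naive = naive_cached_sum(data_file)      # served from the old sidecar
second_checked = checked_cached_sum(data_file)  # sidecar rejected, recomputed
```

Here `second_naive` keeps the value computed from the first version of the file, while `second_checked` reflects the regenerated file. A size-and-mtime check can still miss a same-size rewrite on a filesystem with coarse timestamps; hashing the file contents is stricter but requires reading the whole file.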

On Tue, Jul 23, 2013 at 3:27 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
I think this is long overdue, for all the reasons you list. Auto-serialization treated a lot of symptoms that we have since improved, or that we should address more directly -- speed of hierarchy construction, saving data that we want to retain, and detecting fields.
+1!
-Matt
_______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org

Oh god...+100000000000 <sound of coins dinging in 8 bit glory>
On Jul 23, 2013 7:08 PM, "Matthew Turk" <matthewturk@gmail.com> wrote:

I'm +1 on changing the default. Thanks for making an announcement about the change.
How hard would it be to make an individual routine get serialized on demand? For instance,
    proj = pf.h.proj( ... serialize = True)
Or, would it work to do
    ytcfg['yt', 'serialize'] = 'True'
    do stuff
    ytcfg['yt', 'serialize'] = 'False'
?
d.
On Tue, Jul 23, 2013 at 6:20 PM, j s oishi <jsoishi@gmail.com> wrote:
-- -- Sent from a computer.
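
David's first question, serializing a single call on demand, could look roughly like the sketch below. Everything in it is hypothetical: `project`, its `serialize` and `cache_file` arguments, and the use of a plain sum as the "expensive" step are stand-ins for illustration, not part of yt's API.

```python
import os
import pickle
import tempfile


def project(values, serialize=False, cache_file=None):
    """Hypothetical stand-in for an expensive projection (here: a sum).

    Nothing is read from or written to disk unless the caller opts in
    with serialize=True and supplies a cache_file path.
    """
    if serialize and cache_file and os.path.exists(cache_file):
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    result = sum(values)
    if serialize and cache_file:
        with open(cache_file, "wb") as f:
            pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    return result


cache = os.path.join(tempfile.mkdtemp(), "proj.pkl")

plain = project([1, 2, 3])                                    # no file is written
saved = project([1, 2, 3], serialize=True, cache_file=cache)  # computes, then saves
reloaded = project([], serialize=True, cache_file=cache)      # read back from disk
```

The last call returns the saved result even though its input is different, which is the same stale-data hazard the original message describes. The difference is that the caller asked for it explicitly.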

Hey David,
I don't think you can modify the ytcfg object after loading up yt, so your second example won't work.
As for your first example, I think that's possible via pickling:
    with open('data.pickle', 'wb') as pkl_file:
        s = cPickle.dumps(proj, pkl_file, protocol=-1)
You can then load it later like so:
    with open('data.pickle', 'rb') as pkl_file:
        proj = cPickle.load(pkl_file)
You can do similar things using pf.h.save_object() and load_object(), but in a bit of a chicken and egg situation, you'll need serialization turned on in your config parameters for that to work.
-Nathan
On Tue, Jul 23, 2013 at 8:03 PM, David Collins <dcollins4096@gmail.com> wrote:

Oops, that code snippet should read:
    import cPickle
    with open('data.pickle', 'wb') as pkl_file:
        cPickle.dump(proj, pkl_file, protocol=-1)
..
    with open('data.pickle', 'rb') as pkl_file:
        proj = cPickle.load(pkl_file)
On Wed, Jul 24, 2013 at 11:55 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
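
The correction above turns on the difference between `dump`, which writes to an open file, and `dumps`, which returns the pickled data instead of writing it anywhere. The following is a minimal round-trip sketch under two assumptions: it uses Python 3, where the `cPickle` module no longer exists and the standard `pickle` module takes its place, and it pickles a plain dictionary as a stand-in for a yt projection object.

```python
import os
import pickle
import tempfile

# Stand-in for a projection object; any picklable Python object works here.
proj = {"field": "Density", "axis": 0, "values": [1.0, 2.5, 4.0]}

path = os.path.join(tempfile.mkdtemp(), "data.pickle")

# dump() writes the pickled bytes to an open binary file.
# A negative protocol selects the highest protocol available.
with open(path, "wb") as pkl_file:
    pickle.dump(proj, pkl_file, protocol=-1)

# load() reads the object back from the file.
with open(path, "rb") as pkl_file:
    restored = pickle.load(pkl_file)

# dumps() returns the pickled bytes rather than writing to a file;
# loads() reverses it.
blob = pickle.dumps(proj, protocol=-1)
from_bytes = pickle.loads(blob)
```

Whether a particular yt object pickles cleanly depends on the object and the yt version, so this only demonstrates the file-handling pattern.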

Thanks for the examples.
I'm a little unclear about your last statement-- will pickling the objects directly work with serialization off?
Thanks! d.
On Thu, Jul 25, 2013 at 12:55 AM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:

On Thu, Jul 25, 2013 at 9:00 AM, David Collins <dcollins4096@gmail.com> wrote:
Thanks for the examples.
I'm a little unclear about your last statement-- will pickling the objects directly work with serialization off?
Pickling should, yes. Calling save_data I think will not.
From your original examples, I think we absolutely need to move to the place where we manually save projections -- I think that we all are in support of this being available, it's just the default that's a bummer. :)
-Matt

Hi, sorry I wasn't clear: pickling directly will work. pf.h.save_object and pf.h.load_object will exit after doing nothing.
On Thu, Jul 25, 2013 at 9:00 AM, David Collins <dcollins4096@gmail.com> wrote:
Thanks for the examples.
I'm a little unclear about your last statement-- will pickling the objects directly work with serialization off?
Thanks! d.
On Thu, Jul 25, 2013 at 12:55 AM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
Hey David,
I don't think you can modify the ytcfg object after loading up yt, so your second example won't work.
As for your first example, I think that's possible via pickling:
import cPickle
with open('data.pickle', 'wb') as pkl_file:
    cPickle.dump(proj, pkl_file, protocol=-1)
You can then load it later like so:
with open('data.pickle', 'rb') as pkl_file:
    proj = cPickle.load(pkl_file)
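For anyone wanting to try the round trip without a dataset at hand, here is a small self-contained sketch in Python 3, where the module is called `pickle` rather than `cPickle`. The helper names `dump_to_file` and `load_from_file` and the stand-in dictionary are invented for illustration; whether a particular yt object pickles cleanly is not something this sketch demonstrates.

```python
import pickle  # Python 3 module name; cPickle is the Python 2 equivalent


def dump_to_file(obj, filename):
    """Write `obj` to `filename` using the highest available pickle protocol."""
    with open(filename, "wb") as pkl_file:
        # dump() writes to an open file; dumps() would return bytes instead
        # and does not accept a file argument.
        pickle.dump(obj, pkl_file, protocol=pickle.HIGHEST_PROTOCOL)


def load_from_file(filename):
    """Read back whatever object was pickled into `filename`."""
    with open(filename, "rb") as pkl_file:
        return pickle.load(pkl_file)


# Stand-in for a projection object; a plain dict is used so the sketch
# runs without yt installed.
# dump_to_file({"field": "Density", "axis": 0}, "data.pickle")
# restored = load_from_file("data.pickle")
```

Note that files written this way go stale in exactly the same way as yt's automatic serialization if the underlying output is regenerated, so the file name is worth choosing with that in mind.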
You can do similar things using pf.h.save_object() and load_object(), but in a bit of a chicken and egg situation, you'll need serialization turned on in your config parameters for that to work.
-Nathan
On Tue, Jul 23, 2013 at 8:03 PM, David Collins <dcollins4096@gmail.com> wrote:
I'm +1 on changing the default. Thanks for making an announcement about the change.
How hard would it be to make an individual routine get serialized on demand? For instance, proj = pf.h.proj( ... serialize = True)
Or, would it work to do
ytcfg['yt', 'serialize'] = 'True'
do stuff
ytcfg['yt', 'serialize'] = 'False'
?
d.
On Tue, Jul 23, 2013 at 6:20 PM, j s oishi <jsoishi@gmail.com> wrote:
Oh god...+100000000000 <sound of coins dinging in 8 bit glory>
On Jul 23, 2013 7:08 PM, "Matthew Turk" <matthewturk@gmail.com> wrote:
I think this is long overdue, for all the reasons you list. Auto-serialization treated a lot of symptoms that we have since improved, or that we should address more directly -- speed of hierarchy construction, saving data that we want to retain, and detecting fields.
+1!
-Matt

+1. Thank you, Nathan.
On Thu, Jul 25, 2013 at 9:03 AM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
Hi, sorry I wasn't clear: pickling directly will work. pf.h.save_object and pf.h.load_object will exit after doing nothing.
--
Cameron Hummels
Postdoctoral Researcher
Steward Observatory
University of Arizona
http://chummels.org

To join the chorus of "+1"s, I think this is a great idea. I've been burned by the default behavior many a time...
-Andrew
On Tue, Jul 23, 2013 at 4:08 PM, Matthew Turk <matthewturk@gmail.com> wrote:
I think this is long overdue, for all the reasons you list. Auto-serialization treated a lot of symptoms that we have since improved, or that we should address more directly -- speed of hierarchy construction, saving data that we want to retain, and detecting fields.
+1!
-Matt

Hi all,
After a bit of time and discussion about this, I've accepted Nathan's pull request. If you want to restore the original behavior, you can turn it on in ~/.yt/config with the option:
[yt]
serialize = True
or you can place this at the very top of your yt scripts:
from yt.config import ytcfg; ytcfg["yt","serialize"] = "True"
-Matt
On Tue, Jul 23, 2013 at 3:27 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
Hi all,
I've just issued a PR that will hopefully fix a whole class of buggy behavior that both new and experienced yt users commonly run into. Specifically, I'd like it if we could turn off data serialization by default. This changes a long-lived default value in yt's configuration, so I wanted to bring this change to the attention of both the yt user and developer community.
participants (7)
- Andrew Myers
- Cameron Hummels
- David Collins
- j s oishi
- Matthew Turk
- Nathan Goldbaum
- Nathan Goldbaum