Hi all,
I've just issued a PR that will hopefully fix a whole class of buggy behavior that both new and experienced yt users commonly run into. Specifically, I'd like it if we could turn off data serialization by default. This changes a long-lived default value in yt's configuration, so I wanted to bring this change to the attention of both the yt user and developer community.
What is data serialization? Currently, yt saves the results of certain expensive calculations, including projections, the structure of the grid hierarchy, and the list of fields present in the data. While this saves time when a user needs to repeatedly calculate these quantities on the same dataset, it also has a number of side effects that lead to buggy, annoying behavior.
Specifically, if I am developing my simulation code or repeatedly restarting it while searching for a way to get past a crash, I will quite often regenerate the same simulation output file over and over, changing a line of code or the value of a parameter each time.
If yt's data serialization is turned on, it's likely that yt's visualizations will correspond to old versions of the data file. Since only certain operations are serialized, it's also possible for yt to get into an inconsistent state - one operation will show the current data file, while another operation will show an old version.
It's possible to fix a bug in your code, but because yt is still loading the old data, you won't be able to tell that your bug is fixed until you realize that you have .yt and .harrays files littering your filesystem.
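To make the failure mode concrete, here is a minimal sketch of a cache that is keyed only on a filename. This is not yt's actual implementation: `expensive_sum`, `cached_sum`, and the `.cache` sidecar file are hypothetical names chosen purely for illustration. It only shows how regenerating an output file can leave a stale result behind.

```python
import os
import pickle
import tempfile


def expensive_sum(path):
    """Stand-in for an expensive calculation on a data file."""
    with open(path) as f:
        return sum(float(line) for line in f)


def cached_sum(path):
    """Naive cache: reuse the stored result whenever a cache file exists."""
    cache_path = path + ".cache"  # hypothetical sidecar file, not a yt name
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = expensive_sum(path)
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


def demo():
    """Cache a result, regenerate the data file, then compare answers."""
    workdir = tempfile.mkdtemp()
    data_path = os.path.join(workdir, "output_0001.txt")
    with open(data_path, "w") as f:
        f.write("1.0\n2.0\n")
    first = cached_sum(data_path)      # computed, then written to the cache
    with open(data_path, "w") as f:    # "rerun the simulation"
        f.write("10.0\n20.0\n")
    stale = cached_sum(data_path)      # served from the cache: old answer
    fresh = expensive_sum(data_path)   # what the file actually contains now
    return first, stale, fresh
```

In this sketch the second call to `cached_sum` never looks at the regenerated file, which is the same kind of surprise described above.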
I've personally wasted a lot of time due to yt's serialization 'feature' and denizens of our IRC channel and mailing list can attest to how often new users run into this behavior as well.
My pull request only turns off serialization by default; it doesn't disable the capability completely. Once the pull request is merged, you can turn serialization on either by adding an entry to your config file:
$ cat ~/.yt/config
[yt]
serialize = True
Or on a per-script basis:
from yt.config import ytcfg
ytcfg['yt', 'serialize'] = 'True'
from yt.mods import *
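As a rough illustration of how an entry like the one in the config file is interpreted, the sketch below parses the same INI text with Python's standard `configparser` module. It assumes the config file is a standard INI file as shown above; whether yt reads it in exactly this way is an assumption, and `read_serialize_flag` is a hypothetical helper, not part of yt.

```python
import configparser

# Same contents as the config file entry shown above.
CONFIG_TEXT = """
[yt]
serialize = True
"""


def read_serialize_flag(text):
    """Return the [yt] serialize option as a bool, defaulting to False."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Values are stored as strings; getboolean() converts 'True'/'False'.
    return parser.getboolean("yt", "serialize", fallback=False)
```

Because INI values are stored as strings, this would also explain why the per-script form assigns the string 'True' rather than a Python boolean.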
The pull request is here: https://bitbucket.org/yt_analysis/yt/pull-request/558
I know several of you are big fans of this feature, so if you object to this change please leave a comment on the pull request so we can figure out a way forward.
-Nathan
On Tue, Jul 23, 2013 at 3:27 PM, Nathan Goldbaum nathan12343@gmail.com wrote:
I think this is long overdue, for all the reasons you list. Auto-serialization treated a lot of symptoms that we have since improved, or that we should address more directly -- speed of hierarchy construction, saving data that we want to retain, and detecting fields.
+1!
-Matt
yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Oh god...+100000000000 <sound of coins dinging in 8 bit glory>
On Jul 23, 2013 7:08 PM, "Matthew Turk" matthewturk@gmail.com wrote:
I'm +1 on changing the default. Thanks for making an announcement about the change.
How hard would it be to make an individual routine get serialized on demand? For instance:
proj = pf.h.proj( ... serialize = True)
Or, would it work to do
ytcfg['yt', 'serialize'] = 'True'
do stuff
ytcfg['yt', 'serialize'] = 'False'
?
d.
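For what it's worth, the kind of on-demand behavior asked about here can be sketched generically as follows. This is purely hypothetical and is not yt's API: `compute_or_load`, its `serialize` keyword, and the cache path are all made-up names for illustration.

```python
import os
import pickle


def compute_or_load(compute, cache_path, serialize=False):
    """Run compute(); read or write cache_path only when serialize=True."""
    if serialize and os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()
    if serialize:
        with open(cache_path, "wb") as f:
            pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
    return result
```

With `serialize=False` (the default in this sketch) nothing is read from or written to disk, so the caller opts in per call rather than through a global setting.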
On Tue, Jul 23, 2013 at 6:20 PM, j s oishi jsoishi@gmail.com wrote:
Hey David,
I don't think you can modify the ytcfg object after loading up yt, so your second example won't work.
As for your first example, I think that's possible via pickling:
with open('data.pickle', 'wb') as pkl_file:
    s = cPickle.dumps(proj, pkl_file, protocol=-1)
You can then load it later like so:
with open('data.pickle', 'rb') as pkl_file:
    proj = cPickle.load(pkl_file)
You can do similar things using pf.h.save_object() and load_object(), but in a bit of a chicken-and-egg situation, you'll need serialization turned on in your config parameters for that to work.
-Nathan
On Tue, Jul 23, 2013 at 8:03 PM, David Collins dcollins4096@gmail.com wrote:
Oops, that code snippet should read:
with open('data.pickle', 'wb') as pkl_file:
    cPickle.dump(proj, pkl_file, protocol=-1)
..
with open('data.pickle', 'rb') as pkl_file:
    proj = cPickle.load(pkl_file)
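For anyone trying this on Python 3, where cPickle has been folded into the standard `pickle` module, a self-contained version of the same round trip might look like the sketch below. The dictionary standing in for `proj` is an assumption made so the example runs without yt; whether a given yt object pickles cleanly is not something this sketch demonstrates.

```python
import os
import pickle
import tempfile

# Stand-in for a projection object; any picklable Python object works here.
proj = {"field": "Density", "axis": 0, "values": [1.0, 2.5, 4.0]}

path = os.path.join(tempfile.mkdtemp(), "data.pickle")

# dump() writes to an open binary file; protocol=-1 selects the highest
# protocol available to the running interpreter.
with open(path, "wb") as pkl_file:
    pickle.dump(proj, pkl_file, protocol=-1)

# load() reads the object back from a file opened in binary mode.
with open(path, "rb") as pkl_file:
    restored = pickle.load(pkl_file)

# dumps() is the in-memory variant: it returns bytes and takes no file.
payload = pickle.dumps(proj, protocol=-1)
```

The distinction that matters is that `dump` takes a file object while `dumps` returns the serialized bytes.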
On Wed, Jul 24, 2013 at 11:55 PM, Nathan Goldbaum goldbaum@ucolick.org wrote:
Thanks for the examples.
I'm a little unclear about your last statement-- will pickling the objects directly work with serialization off?
Thanks! d.
On Thu, Jul 25, 2013 at 12:55 AM, Nathan Goldbaum goldbaum@ucolick.org wrote:
On Thu, Jul 25, 2013 at 9:00 AM, David Collins dcollins4096@gmail.com wrote:
Thanks for the examples.
I'm a little unclear about your last statement-- will pickling the objects directly work with serialization off?
Pickling should, yes. Calling save_data I think will not.
From your original examples, I think we absolutely need to move to the place where we manually save projections -- I think that we all are in support of this being available, it's just the default that's a bummer. :)
-Matt
Hi, sorry I wasn't clear: pickling directly will work. With serialization turned off, pf.h.save_object and pf.h.load_object will exit after doing nothing.
On Thu, Jul 25, 2013 at 9:00 AM, David Collins dcollins4096@gmail.comwrote:
Thanks for the examples.
I'm a little unclear about your last statement-- will pickling the objects directly work with serialization off?
Thanks! d.
On Thu, Jul 25, 2013 at 12:55 AM, Nathan Goldbaum goldbaum@ucolick.orgwrote:
Hey David,
I don't think you can modify the ytcfg object after loading up yt, so your second example won't work.
As for your first example, I think that's possible via pickling:
with open('data.pickle', 'wb') as pkl_file: s = cPickle.dumps(proj, pkl_file, protocol=-1)
You can then load it later like so:
with open('data.pickle', 'rb') as pkl_file:
    proj = cPickle.load(pkl_file)
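For anyone reading this on Python 3, where cPickle has been folded into the standard pickle module, the same save-and-reload round trip can be sketched as below. The dict is only a stand-in for a projection object (an assumption; whether a given yt object pickles cleanly depends on the yt version), and the file goes into a temporary directory so nothing is left behind.

```python
import os
import pickle
import tempfile

# Stand-in for an expensive result such as a projection (hypothetical data).
proj = {"field": "Density", "axis": 0, "values": [1.0, 2.5, 4.0]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.pickle")

    # Save: dump() writes to the open file; dumps() would only return bytes.
    with open(path, "wb") as pkl_file:
        pickle.dump(proj, pkl_file, protocol=pickle.HIGHEST_PROTOCOL)

    # Load the object back in a later step.
    with open(path, "rb") as pkl_file:
        restored = pickle.load(pkl_file)

print(restored == proj)
```

Because the file is written and read explicitly, it is always clear which saved result is being used, which is the failure mode automatic serialization made hard to spot.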
You can do similar things using pf.h.save_object() and load_object(), but in a bit of a chicken and egg situation, you'll need serialization turned on in your config parameters for that to work.
-Nathan
On Tue, Jul 23, 2013 at 8:03 PM, David Collins dcollins4096@gmail.com wrote:
I'm +1 on changing the default. Thanks for making an announcement about the change.
How hard would it be to make an individual routine get serialized on demand? For instance, proj = pf.h.proj( ... serialize = True)
Or, would it work to do
ytcfg['yt', 'serialize'] = 'True'
do stuff
ytcfg['yt', 'serialize'] = 'False'
?
d.
On Tue, Jul 23, 2013 at 6:20 PM, j s oishi jsoishi@gmail.com wrote:
Oh god...+100000000000 <sound of coins dinging in 8 bit glory>
+1. Thank you, Nathan.
To join the chorus of "+1"s, I think this is a great idea. I've been burned by the default behavior many a time...
-Andrew
Hi all,
After a bit of time and discussion about this, I've accepted Nathan's pull request. If you want to restore the original behavior, you can turn it on in ~/.yt/config with the option:
[yt]
serialize = True
or you can place this at the very top of your yt scripts:
from yt.config import ytcfg; ytcfg["yt","serialize"] = "True"
-Matt
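As a quick way to check what a config file in this format actually sets, here is a minimal sketch using Python's standard configparser module. This is not yt's own config loader; the config text is inlined as a string rather than read from ~/.yt/config, and the fallback of False simply mirrors the new default described in this thread.

```python
import configparser

# Contents a ~/.yt/config file might have after opting back in (from the thread).
config_text = """
[yt]
serialize = True
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# getboolean() accepts true/false, yes/no, on/off, 1/0 (case-insensitive).
serialize_on = parser.getboolean("yt", "serialize", fallback=False)
print("serialization enabled:", serialize_on)
```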