Diff between object graphs?

Peter Otten __peter__ at web.de
Wed Apr 22 14:53:38 CEST 2015

Cem Karan wrote:

> Hi all, I need some help.  I'm working on a simple event-based simulator
> for my dissertation research. The simulator has state information that I
> want to analyze as a post-simulation step, so I currently save (pickle)
> the entire simulator every time an event occurs; this lets me analyze the
> simulation at any moment in time, and ask questions that I haven't thought
> of yet.  The problem is that pickling this amount of data is both
> time-consuming and a space hog.  This is true even when using bz2.open()
> to create a compressed file on the fly.
> This leaves me with two choices: first, pick the data I want to save;
> second, find a way of generating diffs between object graphs. Since I
> don't yet know all the questions I want to ask, I don't want to throw away
> information prematurely, which is why I would prefer to avoid option 1.
> That leaves option 2: generating diffs between object graphs.
> I've searched the standard library and PyPI, but I haven't
> yet found a library that does what I want. Does anyone know of something
> that does?
> Basically, I want something with the following ability:
> Object_graph_2 - Object_graph_1 = diff_2_1
> Object_graph_1 + diff_2_1 = Object_graph_2
> The object graphs are already pickleable, and the diffs must be, or this
> won't work.  I can use deepcopy to ensure the two object graphs are
> completely separate, so the diffing engine doesn't need to worry about
> that part.
> Anyone know of such a thing?

A poor man's approach:

Don't compress the pickled data; instead, check each new pickle into version
control as a new revision of the same file. Getting the n-th state then
becomes checking out the n-th revision of the file.

I have no idea how much space that will save, but it's simple enough to 
give it a try.
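Part of why this can work is that version control systems store revisions compactly (Git, for example, uses delta compression when it packs objects), and uncompressed pickles of two nearby states often share long runs of identical bytes, whereas compressed output tends to change throughout after even a small edit. As a rough illustration of the `diff_2_1` idea at the byte level, here is a minimal sketch using the standard library's `difflib`. It is my own illustration, not how Git or any particular tool works internally; `make_delta`, `apply_delta` and the sample states are hypothetical names.

```python
import difflib
import pickle


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as a list of operations against `old` (the diff_2_1 idea)."""
    # autojunk=False: bytes have only 256 possible values, so the default
    # "popular element" heuristic would discard many of them and miss matches.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))       # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("data", new[j1:j2]))   # literal bytes that only exist in new
        # "delete": the old bytes are simply not copied
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild `new` from `old` plus a delta (old + diff_2_1 = new)."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return b"".join(parts)


# Usage: two hypothetical simulator states that differ in a few fields.
state1 = {"time": 1, "agents": list(range(1000)), "log": ["start"]}
state2 = {"time": 2, "agents": list(range(1000)), "log": ["start", "move"]}
old, new = pickle.dumps(state1), pickle.dumps(state2)

delta = make_delta(old, new)                   # a list of tuples, itself picklable
restored = pickle.loads(apply_delta(old, delta))
```

`SequenceMatcher` is quadratic in the worst case, so for large pickles a dedicated binary diff tool, or simply the version control system itself, is the more practical choice; the sketch only shows why the uncompressed data diffs well.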

Another slightly more involved idea:

Make the events pickleable and keep all of them, but save (pickle) the
simulator itself only after every 100th event, for example. To restore the
7531st state, load pickle 7500 and apply events 7501 to 7531.
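A minimal sketch of this checkpoint-and-replay idea, assuming the simulator can be driven one event at a time through an `apply(event)` method and that applying an event is deterministic (otherwise replaying the events won't reproduce the original state). `Simulator`, `CheckpointLog`, `step` and `restore` are hypothetical names, not an existing library; the checkpoints are kept in memory here, but each one could just as well be written to its own file.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class Simulator:
    """Hypothetical stand-in for the real simulator; its state is a dict of counters."""
    counters: dict = field(default_factory=dict)

    def apply(self, event):
        # A hypothetical event is a (name, amount) pair. Replay only works if
        # applying an event is deterministic.
        name, amount = event
        self.counters[name] = self.counters.get(name, 0) + amount


class CheckpointLog:
    """Keeps every event, plus a pickled simulator after every `interval` events."""

    def __init__(self, sim, interval=100):
        self.interval = interval
        self.events = []                           # all events, in order; must be picklable
        self.checkpoints = {0: pickle.dumps(sim)}  # state number -> pickled simulator

    def step(self, sim, event):
        """Apply one event to the live simulator and record it."""
        sim.apply(event)
        self.events.append(event)
        n = len(self.events)
        if n % self.interval == 0:
            self.checkpoints[n] = pickle.dumps(sim)

    def restore(self, n):
        """Return a copy of the simulator as it was after the first n events."""
        if not 0 <= n <= len(self.events):
            raise IndexError(f"state {n} was never reached")
        base = n - n % self.interval               # nearest checkpoint at or before n
        sim = pickle.loads(self.checkpoints[base])
        for event in self.events[base:n]:          # events base+1 .. n, counting from 1
            sim.apply(event)
        return sim


# Usage: run 8000 events, then look at state 7531 again.
sim = Simulator()
log = CheckpointLog(sim, interval=100)
for i in range(8000):
    log.step(sim, (f"node{i % 7}", 1))
past = log.restore(7531)   # loads checkpoint 7500, replays events 7501..7531
```

The interval is the knob: a larger one means fewer pickles on disk but more events to replay per restore.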

More information about the Python-list mailing list