On Aug 4, 2014, at 10:07 PM, Daniel Sank <sank.daniel@gmail.com> wrote:
glyph
I would be happy to answer questions, but obviously I'm not super responsive :). Let me know what you need.
I am trying to understand jelly's serialization strategy:
1. In t.s.jelly._Jellier, what is the meaning of persistentStore?
From the perspective of PB, you can ignore this completely. It's effectively an unused feature. There are two entry-point call sites for jelly in PB: Broker.serialize and Broker.unserialize. Both explicitly pass "None" for the "persistent" argument, "persistentStore" and "persistentLoad" respectively.
Reaching back into my dim and distant memory of the ancient past, I believe that the purpose of these callables was to allow you to use Jelly (and perhaps PB) to refer to objects in some kind of pluggable long-term storage. The reason they're called "persistent" was that "ephemeral" storage was local to the connection, and therefore short-lived enough that we could trust that an in-memory Python dictionary would be both large enough and long-lived enough to serve it. But if you have your objects in a database, you might want a different database backend with an application-provided callable for loading objects by ID.
Again, this was never really used, so you can probably ignore it. (I think there might have been a 4X massively multiplayer video game which used it in 2002 or so, but nothing since then that I'm aware of, especially since PB doesn't even have a way to pass in your own without subclassing and overriding 'serialize'.)
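To make the idea of a pluggable "persistent" callable concrete, here is a minimal toy sketch. It is not Twisted's code: the functions toy_serialize and toy_unserialize, the ['persistent', id] expression shape, the Player class, and the in-memory "database" dictionary are all assumptions made up for illustration. The only point is the shape of the idea: one application-provided callable maps an object to a storage ID on the way out, and another loads the object by ID on the way in.

```python
class Player:
    """Hypothetical application object that lives in long-term storage."""
    def __init__(self, name):
        self.name = name


# Hypothetical "database": storage ID -> stored object.
alice = Player("alice")
database = {"player-7": alice}


def store(obj):
    """Hypothetical store callable: return a storage ID, or None if not stored."""
    for pid, stored in database.items():
        if stored is obj:
            return pid
    return None


def toy_serialize(obj, persistent_store=None):
    # Give the application a chance to replace the object with a storage ID.
    if persistent_store is not None:
        pid = persistent_store(obj)
        if pid is not None:
            return ['persistent', pid]
    if isinstance(obj, list):
        return ['list'] + [toy_serialize(item, persistent_store) for item in obj]
    return obj  # plain leaf value


def toy_unserialize(sexp, persistent_load=None):
    if not isinstance(sexp, list):
        return sexp  # plain leaf value
    if sexp[0] == 'persistent':
        if persistent_load is None:
            raise ValueError("no persistent_load callable supplied")
        return persistent_load(sexp[1])  # load the object by its ID
    if sexp[0] == 'list':
        return [toy_unserialize(item, persistent_load) for item in sexp[1:]]
    raise ValueError("unknown expression: %r" % (sexp[0],))


wire = toy_serialize([1, alice], store)
print(wire)  # ['list', 1, ['persistent', 'player-7']]
restored = toy_unserialize(wire, database.get)
print(restored[1] is alice)  # True
```

Passing None for both callables, as the two PB call sites do, means the hook is never consulted and every object is serialized by value.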
2. In t.s.jelly._Jellier, what is the meaning of cooked? The comment here doesn't make sense to me yet.
I just read the comment in _cook, and I hate my younger self right now. Seriously. Screw that guy.
When you make a jelly, you have to cook the fruit first. So part of the metaphor here is that you are "cooking" the objects as you serialize them. The "cooked" attribute maps object IDs (integers representing pointers, at least in CPython) to "dereference" jelly expressions. It is said to be "cooked" at that point because you no longer need to put in the energy (I guess heat, in this metaphor?) to serialize the internal state. A "dereference" expression is one that points at an object within the same Jelly, so this is not like something pointing at a remote reference.
It uses object IDs for keys and not the objects themselves because these objects are (since they can participate in circular references) implicitly mutable, and mutable objects often don't have a working __hash__ implementation, so we can't rely on that.
This happens in a weird order because an object may circularly refer to itself, so we prepare it and put it in the "preserved" map before actually beginning the serialization of its initial state. We also don't want to pollute the jelly output with reference IDs for every single object that _might_ be referenced more than once; we only want to add the ['reference'] expression if we actually refer to it twice. If you look at this example:
    >>> from twisted.spread.jelly import jelly
    >>> circular = [1, 2]
    >>> circular.append(circular)
    >>> jelly(circular)
    ['reference', 1, ['list', 1, 2, ['dereference', 1]]]
    >>> acyclic = [1, 2]
    >>> jelly(acyclic)
    ['list', 1, 2]
You can see that the circular list allocates a reference ID '1' for the circular list. The output list there would have been the thing that went into the _Jellier's "cooked" list, keyed by the 'id' for the serialized list, and then 'reference 1' would have been inserted into the beginning and its body appended. So the steps are:
1. Here's a mutable object. Let me remember that I've seen it, just in case I see it again.
2. Now I'm going to recursively serialize it.
3. Oh, here it is again, I know it's the same object because it has the same ID. Instead of serializing it, I'll change the ['list'] into a ['reference', 1] and stick in a ['dereference', 1] here.
If we never get to step 3, we never see the ['reference'] at all, and it's as if this functionality didn't exist.
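The three steps can be sketched as a toy serializer. This is an illustration under stated assumptions, not the real _Jellier: the ToyJellier class is hypothetical, it only handles lists and plain leaf values, and it uses plain strings for the 'list', 'reference' and 'dereference' atoms. The attribute names "preserved", "cooked" and "cooker" are borrowed from the discussion above, but how the real implementation fills them in may differ in detail.

```python
class ToyJellier:
    """Toy model of the remember / serialize / back-reference steps."""

    def __init__(self):
        self.preserved = {}  # id(obj) -> output list being built for obj
        self.cooked = {}     # id(obj) -> ['dereference', refid]
        self.cooker = {}     # id(obj) -> obj, keeps obj alive so its id stays valid
        self._ref_id = 1

    def jelly(self, obj):
        if not isinstance(obj, list):
            return obj  # leaf value, emitted as-is
        key = id(obj)
        if key in self.cooked:
            # Already has a reference ID: just point back at it.
            return list(self.cooked[key])
        if key in self.preserved:
            # Step 3: second sighting of the same object.
            return self._cook(obj)
        # Step 1: remember the object before serializing its contents.
        out = []
        self.preserved[key] = out
        self.cooker[key] = obj
        # Step 2: recursively serialize the contents.
        body = ['list'] + [self.jelly(item) for item in obj]
        if key in self.cooked:
            # obj referred to itself: out is now ['reference', n, []],
            # so fill in the empty body in place.
            out[2][:] = body
        else:
            out[:] = body
        return out

    def _cook(self, obj):
        key = id(obj)
        out = self.preserved[key]
        refid = self._ref_id
        self._ref_id += 1
        # Rewrite the existing output list in place, so that every holder
        # of that list sees the ['reference', ...] wrapper.
        out[:] = ['reference', refid, list(out)]
        self.cooked[key] = ['dereference', refid]
        return list(self.cooked[key])


circular = [1, 2]
circular.append(circular)
print(ToyJellier().jelly(circular))
# ['reference', 1, ['list', 1, 2, ['dereference', 1]]]

print(ToyJellier().jelly([1, 2]))
# ['list', 1, 2]

shared = [1]
print(ToyJellier().jelly([shared, shared]))
# ['list', ['reference', 1, ['list', 1]], ['dereference', 1]]
```

The last call shows that the same machinery covers an object that is merely shared rather than circular: the first occurrence is wrapped in a ['reference'] only once the second occurrence is found.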
3. In t.s.jelly._Jellier, what is the meaning of cooker?
The "cooker" attribute is a hack related to the use of "id" for the unique IDs. If we used the object itself as the key (which we shouldn't do, for reasons I mentioned above), then we could just rely on it sticking around until the end of the 'jelly' call. But instead, we use its 'id', which is its pointer address, so we need to make sure that it lives on until the end of the _Jellier's lifetime, so we just stick it into the "cooker" map as the value. You'll notice that there's no store of the object itself anywhere else: in "cooked" the key is the ID, and the value is the serialized output value that Jelly is going to write out. If we didn't make sure the object stuck around, a different object might get the same ID, and that would produce spurious back-references (like, we might get a ['dereference'] where something harmless like a string should go).
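The keep-alive role of "cooker" can be illustrated with a small sketch. The IdKeyedMemo class below is hypothetical and only mimics the two maps described above; it is not part of Twisted. It relies on one fact about Python: id() values are only guaranteed to be unique among objects that are alive at the same time.

```python
class IdKeyedMemo:
    """Hypothetical memo keyed by id(), with a keep-alive map alongside."""

    def __init__(self):
        self.cooked = {}  # id(obj) -> value remembered for obj
        self.cooker = {}  # id(obj) -> obj, held only to keep obj alive

    def remember(self, obj, value):
        self.cooked[id(obj)] = value
        self.cooker[id(obj)] = obj

    def lookup(self, obj):
        return self.cooked.get(id(obj))


memo = IdKeyedMemo()
for n in range(1000):
    temp = [n]  # a fresh, unhashable list on every iteration
    memo.remember(temp, n)

# Every remembered list is still referenced by memo.cooker, so all 1000
# objects are alive at once and their ids cannot collide.
print(len(memo.cooked))   # 1000
print(memo.lookup(temp))  # 999

# Without a keep-alive reference, each temporary list is freed right away
# and its id may be handed to the next one. In CPython this set is usually
# much smaller than 1000, but the exact size is not guaranteed.
reused = {id([n]) for n in range(1000)}
print(len(reused))
```

Dropping the "cooker" map from this sketch would let later objects silently overwrite earlier entries in "cooked", which is the spurious back-reference problem described above.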
A short, narrative explanation of what _Jellier does would be very useful, and if you provide it I will submit a patch to the documentation.
A _Jellier jellies objects of course, isn't it obvious ;-). Hopefully you can make sense out of the explanations above and your own existing knowledge. Are there any other phases of the process which are confusing?
-glyph