[Python-Dev] PEP 351, the freeze protocol

Sat Oct 29 22:42:49 CEST 2005

Hello,

I have thought about freezing for some time, and I think that it is a
fundamental need - the need to know, sometimes, that objects aren't
going to change.

This is mostly the need of containers. dicts need to know that the
objects that are used as keys aren't going to change, because if they
change, their hash value changes, and you end up with a data structure
in an inconsistent state. This is the need of sets too, and of heaps,
and binary trees, and so on.

I want to give another example: I and my colleges designed something
which can be described as an "electronic spreadsheet in Python". We
called it a "table". The values in the table are Python objects, and
the functions which relate them are written in Python. Then comes the
problem: the user has, of course, access to the objects stored in the
table. What would happen if he changes them? The answer is that the
table would be in an inconsistent state, since something which should
be the return value of a function is now something else, and there's
no way for the table to know about that.

The solution is to have a "freeze" protocol. It may be called "frozen"
(like frozen(set([1,2,3]))), so that it will be clear that it does not
change the object itself. The definition of a frozen object is that
its value can't change - that is, if you compare it with another
object, you should get the same result as long as the other object
hasn't changed. As a rule, only frozen objects should be hashable.

I want to give another, different, use case for freezing objects. I
once thought about writing a graph package in Python - I mean a graph
with vertices and edges. The most obvious way to store a directed
graph is as a mapping (dict) from a node to the set of nodes that it
points to. Since I want to be able to find also which nodes point to a
specific node, I will store another mapping, from a node to the set of
nodes that point to it. Now, I want a method of the graph which will
return the set of nodes that a given node points to, for example to
let me write "if y in graph.adjacent_nodes(x) then". The question is,
what will the adjacent_nodes method return? If it returns the set
which is a part of the data structure, there is nothing (even no
convention!) that will prevent the user from playing with it. This
will corrupt the data structure, since the change won't be recorded in
the inverse mapping. adjacent_nodes can return a copy of the set, it's
a waste if you only want to check whether an object is a member of the
set.

I gave this example to say that the "frozen" protocol should (when
possible) return an object which doesn't really contain a copy of the
data, but rather gives an "image" of the original object. If the
original object changes while there are frozen copies of it, the data
will be copied, and all the frozen objects will then reference a
version of the data that will never change again.

This will solve the graph problem nicely - adjacent_nodes would simply
return a frozen copy of the set, and a copy operation would happen
only in the rare cases when the returned set is being modified. This
would also help the container use cases: they may call the frozen()
method on objects that should be inserted into the container, and
usually the data won't be copied. Some objects can't be created in
their final form, but can only be constructed step after step. This
means that they must be non-frozen objects. Sometimes they are
constructed in order to get into a container. Unless the frozen()
method is copy-on-change the way I described, all the data would have
to be copied again, just for the commitment that it won't change.

I don't mean to frighten, but in principle, this may mean that
immutable strings might be introduced, which will allow us to get rid
of all the cStringIO workarounds. Immutable strings would be
constructed whenever they are needed, at a low performance cost
(remember that a frozen copy of a given object has to be constructed
only once - once it has been created, the same object can be returned
on additional frozen() calls.)

Copy-on-change of containers of non-frozen objects requires additional
complication: it requires frozen objects to have a way for setting a
callback that will be called when the original object was changed.
This is because the change makes the container of the original object
change, so it must drop its own frozen copy. This needs to happen only
once per frozen object, since after a change, all the containers drop
their frozen copies. I think this callback is conceptually similar to
the weakref callback.

Just an example that copy-on-change (at least of containers of frozen
objects) is needed: sets. It was decided that you can test whether a
non-frozen set is a member of a set. I understand that it is done by
"temporarily freezing" the set, and that it caused some threading
issues. A copy-on-change mechanism might solve it more elegantly.

What do you think?

Noam