[Python-Dev] PEP 351, the freeze protocol

Mon Oct 31 00:35:05 CET 2005

Hello,

It seems that we both agree that freezing is cool (-; . We disagree on
whether a copy-on-write behaviour is desired. Your arguments agains
copy-on-write are:
1. It's not needed.
2. It's complicated to implement.

But first of all, you didn't like my use cases. I want to argue with that.

> In reading your description of a 'table of values', I can't help but be
> reminded of the wxPython (and wxWidget) wx.Grid and its semantics.  It
> offers arbitrary tables of values (whose editors and viewers you can
> change at will), which offers a mechanism by which you can "listen" to
> changes that occur to the contents of a cell.  I can't help but think
> that if you offered a protocol by which a user can signal that a cell
> has been changed, perhaps by writing the value to the table itself
> (table.SetValue(row, col, value)), every read a deepcopy (or a PEP 351
> freeze), etc., that both you and the users of your table would be much
> happier.

Perhaps I didn't make it clear. The difference between wxPython's Grid
and my table is that in the table, most values are *computed*. This
means that there's no point in changing the values themselves. They
are also used frequently as set members (I can describe why, but it's
a bit complicated.)

I want to say that even if sets weren't used, the objects in the table
should have been frozen. The fact the sets (and dicts) only allow
immutable objects as members/keys is just for protecting the user.
They could have declared, "you shouldn't change anything you insert -
as long as you don't, we'll function properly." The only reason why
you can't compute hash values of mutable objects is that you don't
want your user to make mistakes, and make the data structure
inconsistent.

> As for the graph issue, you've got a bigger problem than users just
> being able to edit edge lists, users can clear the entire dictionary of
> vertices (outgoing.clear()).  It seems to me that a more reasonable
> method to handle this particular case is to tell your users "don't
> modify the dictionaries or the edge lists", and/or store your edge lists
> as tuples instead of lists or dictionaries, and/or use an immutable
> dictionary (as offered by Barry in the PEP).

As I wrote before, telling my users "don't modify the edge lists" is
just like making lists hashable, and telling all Python users, "dont
modify lists that are dictionary keys." There's no way to tell the
users that - there's no convention for objects which should not be
changed. You can write it in the documentation, but who'll bother
looking there?

I don't think that your other suggestions will work: the data
structure of the graph itself can't be made of immutable objects,
because of the fact that the graph is a mutable object - you can
change it. It can be made of immutable objects, but this means copying
all the data every time the graph changes.

Now, about copy-on-write:

> There's also this little issue of "copy on write" semantics with Python.
> Anyone who tells you that "copy on write" is easy, is probably hanging
> out with the same kind of people who say that "threading is easy".  Of
> course both are easy if you limit your uses to some small subset of
> interesting interactions, but "copy on write" gets far harder when you
> start thinking of dictionaries, lists, StringIOs, arrays, and all the
> possible user-defined classes, which may be mutated beyond obj[key] =
> value and/or obj.attr = value (some have obj.method() which mutates the
> object). As such, offering a callback mechanism similar to weak
> references is probably pretty close to impossible with CPython.

Let's limit ourselves to copy-on-write of objects which do not contain
nonfrozen objects. Perhaps it's enough - the table, the graph, and
strings, are perfect examples of these. Implementation doesn't seem to
complicated to me - whenever the object is about to change, and there
is a connected frozen copy, you make a shallow copy of the object,
point the frozen copy to it, release the reference to the frozen copy,
and continue as usual. That's all.

I really think that this kind of copy-on-write is "correct". The
temporary freezing of sets in order to check if they are members of
other sets is a not-very-nice way of implementing it. This kind of
copy-on-write would allow, in principle, for Python strings to become
mutable, with almost no speed penalty. It would allow my table, and
other containers, to automatically freeze the objects that get into
it, without having to trust the user on not changing the objects - and
remember that there's no way of *telling* him not to change the
objects.

Now, the computer scientist in me wants to explain (and think about)
freezing containers of nonfrozen objects. What I actually want is that
as long as an object doesn't change after it's freezed, the cost of
freezing would be nothing - that is, O(1). Think about a mutable
string object, which is used in the same way as the current, immutable
strings. It is constructed once, and then may be used as a key in a
dictionary many times. I want to claim that it's a common pattern -
create an object, it doesn't matter how, and then use it without
changing it. If that is the case, it's obvious that all the frozen()
calls would take O(1) each.

How can we accomplish this (freezing costs O(1) as long as the object
doesn't change) with containers of nonfrozen objects? It seems
impossible - no matter what, on the first time the container is
freezed, you would have to call frozen() for every object it contains!
The answer is that in an amortized analysis, it is still an O(1)
operation. The reason is that as long as frozen() takes O(1)
(amortized), all those calls to frozen() can be considered a part of
the object construction, since they are made only once - on the next
call to frozen(), the already-created frozen object would be returned.
This analysis is correct as long as the object doesn't change after
it's freezed.

The problem is that we have to keep the created frozen object as long
as the original object stays alive. So we have to know if it has
changed. This is where those callbacks get in. As long as what is done
with them is correct, there should be no problems. They are used only
to disengage the frozen copies from their original objects. The action
they should trigger is simply that:

def on_contained_object_change(self):
    self._frozen_copy = None
    while self._callbacks:
        self._callbacks.pop()()

What's also interesting is that this freezing mechanism can be
provided automatically for user-created classes, since those are
simply containers of other objects, which behave exactly like dicts,
for this matter.

It allows everything in Python to be both mutable and hashable,
without changing the O() complexity! Wow!

Ok, I'm going to sleep now. If you find something wrong with this
idea, please tell me.

Have a good day,
Noam