[Python-Dev] PEP 351, the freeze protocol

Josiah Carlson jcarlson at uci.edu
Mon Oct 31 06:09:17 CET 2005


Noam Raphael <noamraph at gmail.com> wrote:
> 
> Hello,
> 
> It seems that we both agree that freezing is cool (-; . We disagree on
> whether a copy-on-write behaviour is desired. Your arguments agains
> copy-on-write are:
> 1. It's not needed.
> 2. It's complicated to implement.
> 
> But first of all, you didn't like my use cases. I want to argue with that.
> 
> > In reading your description of a 'table of values', I can't help but be
> > reminded of the wxPython (and wxWidget) wx.Grid and its semantics.  It
> > offers arbitrary tables of values (whose editors and viewers you can
> > change at will), which offers a mechanism by which you can "listen" to
> > changes that occur to the contents of a cell.  I can't help but think
> > that if you offered a protocol by which a user can signal that a cell
> > has been changed, perhaps by writing the value to the table itself
> > (table.SetValue(row, col, value)), every read a deepcopy (or a PEP 351
> > freeze), etc., that both you and the users of your table would be much
> > happier.
> 
> Perhaps I didn't make it clear. The difference between wxPython's Grid
> and my table is that in the table, most values are *computed*. This
> means that there's no point in changing the values themselves. They
> are also used frequently as set members (I can describe why, but it's
> a bit complicated.)

Again, user semantics.  Tell your users not to modify entries, and/or
you can make copies of objects you return.  If your users are too daft
to read and/or follow directions, then they deserve to have their
software not work.

Also from the sounds of it, you are storing both source and destination
values in the same table...hrm, that sounds quite a bit like a
spreadsheet.  How does every spreadsheet handle that again?  Oh yeah,
they only ever store immutables (generally strings which are interpreted). 
But I suppose since you are (of course) storing mutable objects, you
need to work a bit harder...so store mutables, and return immutable
copies (which you can cache if you want, and invalidate when your
application updates the results...like a wx.Grid update on changed).


> > As for the graph issue, you've got a bigger problem than users just
> > being able to edit edge lists, users can clear the entire dictionary of
> > vertices (outgoing.clear()).  It seems to me that a more reasonable
> > method to handle this particular case is to tell your users "don't
> > modify the dictionaries or the edge lists", and/or store your edge lists
> > as tuples instead of lists or dictionaries, and/or use an immutable
> > dictionary (as offered by Barry in the PEP).
> 
> As I wrote before, telling my users "don't modify the edge lists" is
> just like making lists hashable, and telling all Python users, "dont
> modify lists that are dictionary keys." There's no way to tell the
> users that - there's no convention for objects which should not be
> changed. You can write it in the documentation, but who'll bother
> looking there?

When someone complains that something doesn't work, I tell them to read
the documentation.  If your users haven't been told to RTFM often enough
to actually make it happen, then you need a RTFM-bat. Want to know how
you make one?  You start wrapping the objects you return which segfaults
the process if they change things. When they start asking, tell them it
is documented quite clearly "do not to modify objects returned, or else". 
Then there's the other option, which I provide below.


> I don't think that your other suggestions will work: the data
> structure of the graph itself can't be made of immutable objects,
> because of the fact that the graph is a mutable object - you can
> change it. It can be made of immutable objects, but this means copying
> all the data every time the graph changes.

So let me get this straight: You've got a graph.  You want to be able to
change the graph, but you don't want your users to accidentally change
the graph. Sounds to me like an API problem, not a freeze()/mutable problem.
Want an API?

class graph:
    ...
    def IterOutgoing(self, node):
        ...
    def IterIncoming(self, node):
        ...
    def IsAdjacent(self, node1, node2):
        ...
    def IterNodes(self):
        ...
    def AddEdge(self, f_node, t_node):
        ...
    def RemEdge(self, node1, node2):
        ...
    def AddNode(self):
        ...

If you are reasonable in your implementation, all of the above
operations can be fast, and you will never have to worry about your
users accidentally mucking about with your internal data structures:
because you aren't exposing them.  If you are really paranoid, you can
take the next step and implement it in Pyrex or C, so that only a
malicous user can muck about with internal structures, at which point
you stop supporting them.


> Now, about copy-on-write:
> 
> > There's also this little issue of "copy on write" semantics with Python.
> > Anyone who tells you that "copy on write" is easy, is probably hanging
> > out with the same kind of people who say that "threading is easy".  Of
> > course both are easy if you limit your uses to some small subset of
> > interesting interactions, but "copy on write" gets far harder when you
> > start thinking of dictionaries, lists, StringIOs, arrays, and all the
> > possible user-defined classes, which may be mutated beyond obj[key] =
> > value and/or obj.attr = value (some have obj.method() which mutates the
> > object). As such, offering a callback mechanism similar to weak
> > references is probably pretty close to impossible with CPython.
> 
> Let's limit ourselves to copy-on-write of objects which do not contain
> nonfrozen objects. Perhaps it's enough - the table, the graph, and
> strings, are perfect examples of these. Implementation doesn't seem to
> complicated to me - whenever the object is about to change, and there
> is a connected frozen copy, you make a shallow copy of the object,
> point the frozen copy to it, release the reference to the frozen copy,
> and continue as usual. That's all.

What you have written here is fairly unintelligible, but thankfully you
clarify yourself...pity it still doesn't work, I explain below.

[snip]


> The problem is that we have to keep the created frozen object as long
> as the original object stays alive. So we have to know if it has
> changed. This is where those callbacks get in. As long as what is done
> with them is correct, there should be no problems. They are used only
> to disengage the frozen copies from their original objects. The action
> they should trigger is simply that:
> 
> def on_contained_object_change(self):
>     self._frozen_copy = None
>     while self._callbacks:
>         self._callbacks.pop()()
> 
> What's also interesting is that this freezing mechanism can be
> provided automatically for user-created classes, since those are
> simply containers of other objects, which behave exactly like dicts,
> for this matter.
> 
> It allows everything in Python to be both mutable and hashable,
> without changing the O() complexity! Wow!
> 
> Ok, I'm going to sleep now. If you find something wrong with this
> idea, please tell me.


Here is where you are wrong.

    x = []
    for i in xrange(k):
        x.append(range(k))

We now have a simple list of lists, k lists, each of length k.  Let's
freeze it.

    y = frozen(x)

Ok, now we have a recursively frozen list of lists y, implemented
however you want, and you've ammortized this ONE call to creation time.
We'll give y to someone who does whatever he wants to it, and we'll
continue on.

    z = frozen(x)

Your claim is that due to the cache, the above operation can be done in
constant time after you have already frozen x, this is wrong, but we'll
get to that.  Let us mutate one of the contained lists, and see if this
can continue...

    x[0][0] = 7

Oh hrm.  This invalidates x[0]'s cached frozen object, which would
suggest that x's cached frozen object was also invalidated, even though
Python objects tend to know nothing about the objects which point to
them.  Well, that's a rub. In order to /validate/ that an object's
cached object is valid, you must validate that the contents of your
cached frozen object points to the cached frozen objects of your
contents.  Or in code (for this two-level example)...

    def frozen(x):
        if x.frozen_cache:
            for i,j in zip(x.contents, x.frozen_cache):
                if i.frozen_cache is not j:
                    x.frozen_cache = None
                    x.frozen_cache = frozen(x)
                    return x.frozen_cache
        x.frozen_cache = tuple(map(frozen, x.contents))
        return x.frozen_cache

Ouch, for the list of lists example with a total size O(k**2), you need
to spend O(k) time.  We'll say that n == k**2, so really, for this
particular object of size O(n), you need to spend O(sqrt(n)) time
verifying.  Not quite constant.

But wait...in order to verify that every cached frozen object is
valid...we should have been checking the contents of x[i] to verify that
they were frozen too!  Wow, that would take us O(n) time just to verify.
And we would need to do that EVERY TIME WE CALLED frozen(x), REGARDLESS
OF WHETHER x WAS MUTATED!

Wait a second...isn't that just the same as just recursively calling
freeze?  Yes.

Are we actually saving any time?  No.

What has this idea resulted in? The incorrect belief that caching frozen
objects will reduce the computation necessary in freeze(x), and a proposed
new attribute on literally every object which points to an immutable
copy of itself, generally doubling the amount of memory used.


Like I said before, it's not so easy.  In order to actually implement
the system, every object in an object heirarchy would need to know about
its parent, but such cannot work because...

    a = range(10)
    b = [a]
    c = [a]
    bf = frozen(b)
    cf = frozen(c)
    a[0] = 10       #oops!

That last line is the killer.  Every object would necessarily need to
know about all containers which point to it, and would necessarily need
to notify them all that their contents had changed.  I personally don't
see Python doing that any time soon.

Hope you sleep/slept well,
 - Josiah



More information about the Python-Dev mailing list