[Python-Dev] PEP 351, the freeze protocol

Mon Oct 31 21:05:16 CET 2005

Noam Raphael <noamraph at gmail.com> wrote:
> Hello,
> 
> I have slept quite well, and talked about it with a few people, and I
> still think I'm right.

And I'm going to point out why you are wrong.

> About the users-changing-my-internal-data issue:
> 
> > Again, user semantics.  Tell your users not to modify entries, and/or
> > you can make copies of objects you return.  If your users are too daft
> > to read and/or follow directions, then they deserve to have their
> > software not work.
> ...
> > When someone complains that something doesn't work, I tell them to read
> > the documentation.  If your users haven't been told to RTFM often enough
> > to actually make it happen, then you need a RTFM-bat. Want to know how
> > you make one?  You start wrapping the objects you return which segfaults
> > the process if they change things. When they start asking, tell them it
> > is documented quite clearly "do not to modify objects returned, or else".
> > Then there's the other option, which I provide below.
> 
> I disagree. I read the manual when I don't know what something does.
> If I can guess what it does (and this is where conventions are good),
> I don't read the manual. And let's say I ought to read the complete
> manual for every method that I use, and that I deserve a death
> punishment (or a segmentation fault) if I don't. But let's say that I
> wrote a software, without reading the manual, and it worked. I have
> gone to work on other things, and suddenly a bug arises. When the poor
> guy who needs to fix it goes over the code, everything looks
> absolutely correct. Should he also read the complete manuals of every
> library that I used, in order to fix that bug? And remember that in
> this case, the object could have traveled between several places
> (including functions in other libraries), before it was changed, and
> the original data structure starts behaving weird.

You can have a printout before it dies:
"I'm crashing your program because something attempted to modify a data
structure (here's the traceback), and you were told not to."

Then again, you can even raise an exception when people try to change
the object, as imdict does, as tuples do, etc.

> You suggest two ways for solving the problem. The first is by copying
> my mutable objects to immutable copies:

And by caching those results, then invalidating them when they are
updated by your application.  This is the same as what you would like to
do, except that I do not rely on copy-on-write semantics, which aren't
any faster than freeze+cache by your application.

[snip graph API example]

> This will work. It's simply... well, not very beautiful. I have to
> invent a lot of names, and my users need to remember them all. If I
> give them a frozen set, with all the vertices than a vertex points to
> (which is an absolutely reasonable API), they will know how to deal
> with it without learning a lot of method names, thanks to the fact
> that they are already familiar with sets, and that a lot of thought
> has gone into the set interface.

I never claimed it was beautiful, I claimed it would work.  And it does. 
There are 7 methods, which you can reduce if you play the special method
game:

RemEdge -> __delitem__((node, node))
RemNode -> __delitem__(node) #forgot this method before
IterNodes -> __iter__()
IterOutgoing,IterIncoming -> IterAdjacent(node)

In any case, whether you choose to use freeze, or use a different API,
this particular problem is solvable without copy-on-write semantics.

> 
> > > Now, about copy-on-write:
> ...
> >
> > What you have written here is fairly unintelligible, but thankfully you
> > clarify yourself...pity it still doesn't work, I explain below.
> 
> I'm sorry if I am sometimes hard to understand. English is not my
> mother tongue, and it degrades as the hour gets later - and sometimes,
> things are hard to explain. If I don't explain myself, please say so
> and I'll try again. This is an excellent example - I wrote about
> callbacks, and went to sleep. Let me try to explain again how it
> *does* work.

Thank you for the clarification (btw, your english is far better than
any of the foreign languages I've been "taught" over the years).

> This is not so. When a list creates its frozen copy, it gives itself
> to all those frozen() calls. This means that it will be notified if
> one of its members is changed. In that case, it has to do two simple
> actions: 1. release the reference to its frozen copy, so that
> subsequent freezes of the list would create a new frozen copy, and: 2.
> notify about the change any object which froze the list and requested
> notification.
> 
> This frees us of any validation code. If we haven't been notified
> about a change, there was no change, and the frozen copy is valid.
> 
> In case you ask, the cost of notification is O(1), amortized. The
> reason is that every frozen() call can cause at most one callback in
> the future.

Even without validation, there are examples that force a high number of
calls, which are not O(1), ammortized or otherwise.

> > Like I said before, it's not so easy.  In order to actually implement
> > the system, every object in an object heirarchy would need to know about
> > its parent, but such cannot work because...
> >
> >    a = range(10)
> >    b = [a]
> >    c = [a]
> >    bf = frozen(b)
> >    cf = frozen(c)
> >    a[0] = 10       #oops!
> >
> > That last line is the killer.  Every object would necessarily need to
> > know about all containers which point to it, and would necessarily need
> > to notify them all that their contents had changed.  I personally don't
> > see Python doing that any time soon.
> >
> This is not the case. Every object has to know only on the objects
> which created frozen copies of it, and requested notifications.
> Actually, the object itself doesn't have to store anything. I thought
> about it, and you can create a module for handling those
> change-callbacks. It would store only weak references to objects. It
> would have two functions:
> 
> def register_reference(x, y):
>     '''Register the fact that if object x changes, it means that
> object y changes too.'''
> 
> def changed(x):
>     '''Notify all objects that change with x that they are changed.'''
> 
> When an object is changed, this module would call the __changed__
> method of all the objects that have a reference to it, and haven't
> changed since the connection was created.

Callbacks work, in that you can notify parents, but they still don't
allow O(1) ammortization.

    a = [[] for i in xrange(k)]
    b = [list(a) for i in xrange(k)]
    del a

    c = freeze(b)

    b[0][0].append(1)

That append call requires that b and every list in b be notified.  That
takes O(k) time, because you have to notify the k lists which point to
b[0][0]. Let's freeze it again! Oh wait, that takes O(k**2) time for
that second freeze because you have to recreate the tuple version of b,
as well as the tuple version of everything in b. Cycling through
modifications and freezing ends up taking time equivalent to if you were
to just re-create the entire frozen version to start with every time you
freeze.

Now, the actual time analysis on repeated freezings and such gets ugly. 
There are actually O(k) objects, which take up O(k**2) space.  When you
modify object b[i][j] (which has just been frozen), you get O(k)
callbacks, and when you call freeze(b), it actually results in O(k**2)
time to re-copy the O(k**2) pointers to the O(k) objects.  It should be
obvious that this IS NOT AMMORTIZABLE to original object creation time.

> I hope I have clarified my idea. Tell me if you still think I'm wrong.

You have clarified it, but it is still wrong.  I stand by 'it is not
easy to get right', and would further claim, "I doubt it is possible to
make it fast."

Good day,
 - Josiah