[Python-Dev] PEP 351, the freeze protocol

Noam Raphael noamraph at gmail.com
Mon Oct 31 23:25:53 CET 2005


On 10/31/05, Josiah Carlson <jcarlson at uci.edu> wrote:
...
> And I'm going to point out why you are wrong.

I still don't think so. I think that I need to reclarify what I mean.

> > About the users-changing-my-internal-data issue:
...
> You can have a printout before it dies:
> "I'm crashing your program because something attempted to modify a data
> structure (here's the traceback), and you were told not to."
>
> Then again, you can even raise an exception when people try to change
> the object, as imdict does, as tuples do, etc.

Both solutions would solve the problem, but would require me to wrap
the built-in set with something which doesn't allow changes. This is a
lot of work - but it's quite similiar to what my solution would
actually do, in a single built-in function.
>
> > You suggest two ways for solving the problem. The first is by copying
> > my mutable objects to immutable copies:
>
> And by caching those results, then invalidating them when they are
> updated by your application.  This is the same as what you would like to
> do, except that I do not rely on copy-on-write semantics, which aren't
> any faster than freeze+cache by your application.

This isn't correct - freezing a set won't require a single copy to be
performed, as long as the frozen copy isn't saved after the original
is changed. Copy+cache always requires one copy.

...
> I never claimed it was beautiful, I claimed it would work.  And it does.
> There are 7 methods, which you can reduce if you play the special method
> game:
>
> RemEdge -> __delitem__((node, node))
> RemNode -> __delitem__(node) #forgot this method before
> IterNodes -> __iter__()
> IterOutgoing,IterIncoming -> IterAdjacent(node)
>
I just wanted to say that this game is of course a lot of fun, but it
doesn't simplify the interface.

> In any case, whether you choose to use freeze, or use a different API,
> this particular problem is solvable without copy-on-write semantics.

Right. But I think that a significant simplification of the API is a
nice bonus for my solution. And about those copy-on-write semantics -
it should be proven how complex they are. Remember that we are talking
about frozen-copy-on-write, which I think would simplify matters
considerably - for example, there are at most two instances sharing
the same data, since the frozen copy can be returned again and again.

> > > > Now, about copy-on-write:
> > ...
> Thank you for the clarification (btw, your english is far better than
> any of the foreign languages I've been "taught" over the years).
Thanks! It seems that if you are forced to use a language from time to
time it improves a little...

...

> Even without validation, there are examples that force a high number of
> calls, which are not O(1), ammortized or otherwise.
>
[Snap - a very interesting example]
>
> Now, the actual time analysis on repeated freezings and such gets ugly.
> There are actually O(k) objects, which take up O(k**2) space.  When you
> modify object b[i][j] (which has just been frozen), you get O(k)
> callbacks, and when you call freeze(b), it actually results in O(k**2)
> time to re-copy the O(k**2) pointers to the O(k) objects.  It should be
> obvious that this IS NOT AMMORTIZABLE to original object creation time.
>
That's absolutely right. My ammortized analysis is correct only if you
limit yourself to cases in which the original object doesn't change
after a frozen() call was made. In that case, it's ok to count the
O(k**2) copy with the O(k**2) object creation, because it's made only
once.

Why it's ok to analyze only that limited case? I am suggesting a
change in Python: that every object you would like be mutable, and
would support the frozen() protocol. When you evaluate my suggestion,
you need to take a program, and measure its performance in the current
Python and in a Python which implements my suggestion. This means that
the program should work also on the current Python. In that case, my
assumption is true - you won't change objects after you have frozen
them, simply because these objects (strings which are used as dict
keys, for example) can't be changed at all in the current Python
implementation!

I will write it in another way: I am proposing a change that will make
Python objects, including strings, mutable, and gives you other
advantages as well. I claim that it won't make existing Python
programs run slower in O() terms. It would allow you to do many things
that you can't do today; some of them would be fast, like editing a
string, and some of them would be less fast - for example, repeatedly
changing an object and freezing it.

I think that the performance penalty may be rather small - remember
that in programs which do not change strings, there would never be a
need to copy the string data at all. And since I think that usually
most of the dict lookups are for method or function names, there would
almost never be a need to constuct a new object on dict lookup,
because you search for the same names again and again, and a new
object is created only on the first frozen() call. You might even gain
performance, because s += x would be faster.

...

> You have clarified it, but it is still wrong.  I stand by 'it is not
> easy to get right', and would further claim, "I doubt it is possible to
> make it fast."

It would not be very easy to implement, of course, but I hope that it
won't be very hard either, since the basic idea is quite simple. Do
you still doubt the possibility of making it fast, given my (correct)
definition of fast?

And if it's possible (which I think it is), it would allow us to get
rid of inconvinient immutable objects, and it would let us put
everything into a set. Isn't that nice?

>
> Good day,
>  - Josiah
>
The same to you,
Noam


More information about the Python-Dev mailing list