[Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)

Josiah Carlson jcarlson at uci.edu
Tue Nov 1 19:48:46 CET 2005


Noam Raphael <noamraph at gmail.com> wrote:
> On 10/31/05, Josiah Carlson <jcarlson at uci.edu> wrote:

> > > About the users-changing-my-internal-data issue:
> ...
> > You can have a printout before it dies:
> > "I'm crashing your program because something attempted to modify a data
> > structure (here's the traceback), and you were told not to."
> >
> > Then again, you can even raise an exception when people try to change
> > the object, as imdict does, as tuples do, etc.
> 
> Both solutions would solve the problem, but would require me to wrap
> the built-in set with something which doesn't allow changes. This is a
> lot of work - but it's quite similar to what my solution would
> actually do, in a single built-in function.

I am an advocate for PEP 351.  However, I am against your proposed
implementation/variant of PEP 351 because I don't believe it adds enough
to warrant the additional complication and overhead necessary for every
object (even tuples would need to get a .frozen_cache member).

Give me a recursive freeze from PEP 351 (which handles objects that are
duplicated, but errors out on circular references), and I'll be happy.
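
To be concrete, here is roughly what I have in mind - a minimal sketch,
not a proposed implementation; the helper name deep_freeze and its
dispatch over the builtin containers are mine, not PEP 351's:

    # Illustrative only: reuse results for objects that appear more than
    # once, but refuse circular references outright.
    def deep_freeze(obj, _stack=None, _done=None):
        if _stack is None:
            _stack, _done = set(), {}
        oid = id(obj)
        if oid in _done:              # duplicated object: reuse its result
            return _done[oid]
        if oid in _stack:             # still being frozen: a cycle
            raise ValueError("cannot freeze a circular structure")
        _stack.add(oid)
        if isinstance(obj, (list, tuple)):
            result = tuple(deep_freeze(x, _stack, _done) for x in obj)
        elif isinstance(obj, (set, frozenset)):
            result = frozenset(deep_freeze(x, _stack, _done) for x in obj)
        elif isinstance(obj, dict):
            result = tuple((deep_freeze(k, _stack, _done),
                            deep_freeze(v, _stack, _done))
                           for k, v in obj.items())
        else:
            result = obj              # assume it is already immutable
        _stack.discard(oid)
        _done[oid] = result
        return result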


> > > You suggest two ways for solving the problem. The first is by copying
> > > my mutable objects to immutable copies:
> >
> > And by caching those results, then invalidating them when they are
> > updated by your application.  This is the same as what you would like to
> > do, except that I do not rely on copy-on-write semantics, which aren't
> > any faster than freeze+cache by your application.
> 
> This isn't correct - freezing a set won't require a single copy to be
> performed, as long as the frozen copy isn't saved after the original
> is changed. Copy+cache always requires one copy.

You are wrong, and you even say you are wrong..."freezing a set doesn't
require a COPY, IF the frozen COPY isn't saved after the original is
CHANGED". Creating an immutable set IS CREATING A COPY, so it ALSO
copies, and you admit as much, but then say the equivalent of "copying
isn't copying because I say so".


> > In any case, whether you choose to use freeze, or use a different API,
> > this particular problem is solvable without copy-on-write semantics.
> 
> Right. But I think that a significant simplification of the API is a
> nice bonus for my solution. And about those copy-on-write semantics -
> it should be proven how complex they are. Remember that we are talking
> about frozen-copy-on-write, which I think would simplify matters
> considerably - for example, there are at most two instances sharing
> the same data, since the frozen copy can be returned again and again.

I think that adding an additional attribute to literally every single
object to handle the caching of 'frozen' objects, as well as a list to
every object to handle callbacks which should be called on object
mutation, along with a _call_stuff_when_mutated() method that handles
these callback calls, IN ADDITION TO the __freeze__ method which is
necessary to support this, is a little much, AND IS CERTAINLY NOT A
SIMPLIFICATION!

Let us pause for a second and consider:
The original PEP proposes 1 new method: __freeze__, which could be
implemented via a subclass of the original object for now, and
integrated into the original classes as time goes on.  One could
/register/ __freeze__ functions/methods a'la Pickle, at which point
objects wouldn't even need a native freeze method.

Your suggestion offers 2 new methods along with 2 new instance variables. 
Let's see, a callback handler, __freeze__, the cache, and the callback
list.  Doesn't that seem a little excessive to you to support freezing?
It does to me.  If Guido were to offer your implementation of freeze, or
no freeze at all, I would opt for no freeze, as implementing your freeze
on user-defined classes would be a pain in the ass, not to mention
implementing them in C code would be more than I would care to do, and
more than I would ask any of the core developers to work on.
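
To make the bookkeeping concrete, here is roughly what each proposal
asks of the author of a container type.  This is only a sketch of my
reading of the two APIs; the class names and details are illustrative,
not anyone's actual implementation:

    # PEP 351 as written: one method, returning an immutable snapshot.
    class SimpleBag(object):
        def __init__(self, items=()):
            self._items = list(items)
        def __freeze__(self):
            return tuple(self._items)

    # The copy-on-write variant, as I understand it: a cached frozen
    # copy, a callback list, and a hook run on *every* mutation.
    class CowBag(object):
        def __init__(self, items=()):
            self._items = list(items)
            self.frozen_cache = None          # cached frozen copy
            self._mutation_callbacks = []     # who to tell when we change
        def __freeze__(self):
            if self.frozen_cache is None:
                self.frozen_cache = tuple(self._items)
            return self.frozen_cache
        def _call_stuff_when_mutated(self):
            self.frozen_cache = None          # invalidate the cached copy
            for callback in self._mutation_callbacks:
                callback(self)
        def append(self, item):
            self._items.append(item)
            self._call_stuff_when_mutated()   # every mutator must do this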


> > Even without validation, there are examples that force a high number of
> > calls, which are not O(1), amortized or otherwise.
> >
> [Snap - a very interesting example]
> >
> > Now, the actual time analysis on repeated freezings and such gets ugly.
> > There are actually O(k) objects, which take up O(k**2) space.  When you
> > modify object b[i][j] (which has just been frozen), you get O(k)
> > callbacks, and when you call freeze(b), it actually results in O(k**2)
> > time to re-copy the O(k**2) pointers to the O(k) objects.  It should be
> > obvious that this IS NOT AMORTIZABLE to original object creation time.
> >
> That's absolutely right. My amortized analysis is correct only if you
> limit yourself to cases in which the original object doesn't change
> after a frozen() call was made. In that case, it's ok to count the
> O(k**2) copy with the O(k**2) object creation, because it's made only
> once.
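
A structure with the properties described above (O(k) objects, O(k**2)
space, O(k) callbacks on a single mutation, O(k**2) work to re-freeze)
can be built like this - my own illustration, not the snipped example:

    # O(k) objects occupying O(k**2) space: one shared inner list is
    # referenced k times by each of k rows.
    k = 100
    shared = list(range(k))                # one list, referenced k*k times
    b = [[shared] * k for _ in range(k)]   # k rows of k pointers each

    # Under copy-on-write:
    #   y = freeze(b)      caches a frozen copy of every row
    #   shared[0] = -1     fires O(k) callbacks, one per row holding shared
    #   z = freeze(b)      re-copies the O(k**2) pointers all over again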

But here's the crucial observation which you are missing.  You yourself
have stated that in both your table and graph examples you want your
application to continue to modify values while the user can't manipulate
them.  So even in your own use-cases, you are going to be modifying
objects after they have been frozen, and even then it won't be fast!

I believe that in general, people who are freezing things are going to
want to keep changing the original objects - hence the use of mutables
to begin with - maybe for situations like yours where you don't want
users mutating returns, whatever.  If, after they have frozen the
object, they don't want to change the original any more, then they are
probably going to toss out the original mutable and use the immutable
created with freeze anyway (mutate your object until you get it right,
then freeze it and use that so that no one can alter your data, not even
yourself).  So I think that caching is something that the /user/ should
be doing, NOT Python.

The simple implementation (not copy-on-write) leads us to a simple
matter of documenting, "Freeze is 'stateless'; every call to freeze
returns a new object, regardless of modifications (or lack thereof)
between freeze calls."
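
A sketch of those 'stateless' semantics, using frozenset since it is the
one freeze-like conversion we already have:

    s = set([1, 2, 3])
    f1 = frozenset(s)            # first snapshot: a full copy
    s.add(4)
    f2 = frozenset(s)            # second snapshot: another full copy
    assert f1 == frozenset([1, 2, 3])
    assert f2 == frozenset([1, 2, 3, 4])
    assert f1 is not f2          # every call returns a new object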

Remember: "Simple is better than complex."


> Why it's ok to analyze only that limited case? I am suggesting a
> change in Python: that every object you would like would be mutable, and
> would support the frozen() protocol. When you evaluate my suggestion,
> you need to take a program, and measure its performance in the current
> Python and in a Python which implements my suggestion. This means that
> the program should work also on the current Python. In that case, my
> assumption is true - you won't change objects after you have frozen
> them, simply because these objects (strings which are used as dict
> keys, for example) can't be changed at all in the current Python
> implementation!

Not everything can/should become mutable.  Integers should never become
mutable, as tuples should never become mutable, as strings/unicode
should never become mutable...wait, aren't we getting to the point that
everything which is currently immutable shouldn't become mutable? 
Indeed.  I don't believe that any currently immutable object should be
able to become mutable in order to satisfy /anyone's/ desire for mutable
/anything/.


In starting to bring up benchmarks you are falling into the trap of
needing to /have/ a benchmark (as have I), for which there are very few,
if any, current use-cases.

Without having or needing a benchmark, I'll state quite clearly where
your proposed copy-on-write would beat out the naive 'create a new copy
on every call to freeze':
1. If objects after they are frozen are never modified, copy on write
will be faster.
2. If original objects are modified after they are frozen, then the
naive implementation will be as fast if not faster in general, due to
far lower overhead, but may be slower in corner cases where some nested
structure is unchanged, and some shallow bit has changed:

    x = [[], NEVER_CHANGED_MUTABLE_NESTED_STRUCTURE]
    y = freeze(x)      # copies the large, never-changing nested structure
    x[0].append(1)     # only the shallow, empty list actually changes
    z = freeze(x)      # naive freeze copies the nested structure again;
                       # copy-on-write could reuse its cached frozen copy

Further, discussing benchmarks on use-cases, for which there are few (if
any) previously existing uses, is like saying "let's race cars" back in
1850; it's a bit premature.


Then there is this other example:

    x = [1,2,3]
    y = freeze(x)

The flat version of freeze in the PEP right now handles this case.  I
can change x all I want, yet I have a frozen y which stays unchanged. 
This is what I would want, and I would imagine it is what others would
want too.  In fact, this is precisely the use-case you offered for your
table and graph examples, so your claim that "users aren't going to be
changing the object after it has been frozen" is, by definition, wrong:
you do it yourself!
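
For reference, here is the flat-freeze behaviour just described, with
tuple() standing in for freeze() since lists already have an obvious
flat frozen form:

    x = [1, 2, 3]
    y = tuple(x)             # y = freeze(x), in PEP 351 terms
    x.append(4)              # mutate the original as much as you like
    assert x == [1, 2, 3, 4]
    assert y == (1, 2, 3)    # the frozen snapshot is unaffected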


> I will write it in another way: I am proposing a change that will make
> Python objects, including strings, mutable, and gives you other
> advantages as well. I claim that it won't make existing Python
> programs run slower in O() terms. It would allow you to do many things
> that you can't do today; some of them would be fast, like editing a
> string, and some of them would be less fast - for example, repeatedly
> changing an object and freezing it.

Your claim about running time only holds if the original isn't changed
after it is frozen.

And I don't like making everything mutable, it's a "solution looking for
a problem", or a "tail wagging the dog" idea.  There is no good reason
to make everything mutable, and I challenge you to come up with a valid
one that isn't already covered by the existing standard library or
extension modules.

There is no need to bring strings into this conversation as there are
string variants which are already mutable: array.array('c', ...),
StringIO, mmap, take your pick!  And some future Python (perhaps 2.5)
will support a 'bytes' object, which is essentially an mmap which
doesn't need to be backed by a file.
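
For reference, a couple of those in action (Python 2.4-era spellings, to
match the rest of this thread; mmap needs a real file behind it, so it
is omitted here):

    import array
    from StringIO import StringIO

    buf = array.array('c', 'hello')    # a mutable array of characters
    buf[0] = 'H'
    assert buf.tostring() == 'Hello'

    sio = StringIO()
    sio.write('hello world')
    sio.seek(0)
    sio.write('H')                     # overwrite in place
    assert sio.getvalue() == 'Hello world'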


> I think that the performance penalty may be rather small - remember
> that in programs which do not change strings, there would never be a
> need to copy the string data at all. And since I think that usually
> most of the dict lookups are for method or function names, there would
> almost never be a need to construct a new object on dict lookup,
> because you search for the same names again and again, and a new
> object is created only on the first frozen() call. You might even gain
> performance, because s += x would be faster.

You really don't know how Python internals work.

The slow part of s += x on strings in Python 2.4 is the memory
reallocation and occasional data copy (it has been tuned like crazy by
Raymond in 2.4, see _PyString_Resize in stringobject.c). Unless you
severely over-allocated your strings, this WOULD NOT BE SPED UP BY
MUTABLE STRINGS.

Further, identifiers/names (obj, obj.attr, obj.attr1.attr2, ...) are
already created at compile time and are 'interned'.  That is, if you
have an object that you call 'foo', a single "foo" string gets created,
and any code in that module which references the name 'foo' points to
that single, always unchanging "foo" string.  And because the string has
already been hashed, it has a cached hash value, so lookups in
dictionaries are already fast due to a check for pointer equivalency
before comparing contents.  Mutable strings CANNOT be faster than this
method.
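
You can watch the interning at work from Python itself (a CPython
implementation detail rather than a language guarantee, but it is
exactly what the name-lookup fast path relies on):

    a = 'foo'
    b = 'foo'
    assert a is b        # identifier-like literals are interned:
                         # one object, not just equal text

    namespace = {'foo': 1}
    # The common case in a name lookup: the hash is already cached and
    # the key pointers are identical, so the characters themselves are
    # never compared.
    assert namespace[a] == 1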


> > You have clarified it, but it is still wrong.  I stand by 'it is not
> > easy to get right', and would further claim, "I doubt it is possible to
> > make it fast."
> 
> It would not be very easy to implement, of course, but I hope that it
> won't be very hard either, since the basic idea is quite simple. Do
> you still doubt the possibility of making it fast, given my (correct)
> definition of fast?

I would claim that your definition is limited.  Yours would be fast if
objects never changed after they are frozen, which is counter to your
own use-cases.  This suggests that your definition is in fact incorrect,
and you fail to see your own inconsistency.


> And if it's possible (which I think it is), it would allow us to get
> rid of inconvenient immutable objects, and it would let us put
> everything into a set. Isn't that nice?

No, it sounds like a solution looking for a problem.  I see no need to
make strings, floats, ints, tuples, etc. mutable, and I think that you
will have very little luck in getting core Python developer support for
any attempt to make them mutable.

If you make such a suggestion, I would offer that you create a new PEP,
because this discussion has gone beyond PEP 351, and has wandered into
the realm of "What other kinds of objects would be interesting to have
in a Python-like system?"



I'll summarize your claims:
1. copy-on-write is a simplification
2. everything being mutable would add to Python
3. copy-on-write is fast
4. people won't be mutating objects after they are frozen

I'll counter your claims:
1. 2 methods and 2 instance variables on ALL OBJECTS is not a
simplification.
2. a = b = 1; a += 1.  If all objects were to become mutable, a and b
would still be the same object afterwards, so a == b, despite what
Python and every other sane language would tell you, and dct[a] would
stop working (you would have to use c = freeze(a); dct[c], or dct[x]
would need to automatically call freeze and only ever reference the
result, significantly slowing down ALL dictionary references).  See the
sketch after this list.
3. only if you NEVER MUTATE an object after it has been frozen
4. /you/ mutate original objects after they are frozen

ALSO:
5. You fail to realize that if all objects were to become mutable, then
one COULDN'T implement frozen, because the frozen objects THEMSELVES
would be mutable.
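
Point 2 in code.  Python already refuses mutable keys for exactly this
reason; the MutableInt class below is a purely hypothetical stand-in for
a world where ints could change in place:

    d = {}
    try:
        d[[1, 2]] = 'x'            # lists are mutable, hence unhashable
    except TypeError:
        pass

    class MutableInt(object):      # hypothetical, for illustration only
        def __init__(self, value):
            self.value = value
        def __hash__(self):
            return hash(self.value)
        def __eq__(self, other):
            return self.value == getattr(other, 'value', other)

    a = b = MutableInt(1)          # a and b are one shared object...
    d[a] = 'stored under 1'
    a.value += 1                   # ...so 'a += 1' changes b as well,
    assert b.value == 2
    assert a not in d              # and the stored hash is now stale:
                                   # the dict cannot find its own key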


I'm going to bow out of this discussion for a few reasons, not the least
of which being that I've spent too much time on this subject, and that I
think it is quite clear that your proposal is dead, whether I had
anything to do with it or not.

 - Josiah


