[Python-Dev] apparent ruminations on mutable immutables (was: PEP 351, the freeze protocol)

Tue Nov 1 21:49:59 CET 2005

On 11/1/05, Josiah Carlson <jcarlson at uci.edu> wrote:
...
>
> I am an advocate for PEP 351.  However, I am against your proposed
> implementation/variant of PEP 351 because I don't believe it ads enough
> to warrant the additional complication and overhead necessary for every
> object (even tuples would need to get a .frozen_cache member).
>
> Give me a recursive freeze from PEP 351 (which handles objects that are
> duplicated, but errors out on circular references), and I'll be happy.
>
That's fine - but it doesn't mean that I must be happy with it.
>
...
> >
> > This isn't correct - freezing a set won't require a single copy to be
> > performed, as long as the frozen copy isn't saved after the original
> > is changed. Copy+cache always requires one copy.
>
> You are wrong, and you even say you are wrong..."freezing a set doesn't
> require a COPY, IF the frozen COPY isn't saved after the original is
> CHANGED". Creating an immutable set IS CREATING A COPY, so it ALSO
> copies, and you admit as much, but then say the equivalent of "copying
> isn't copying because I say so".

No, I am not wrong. I am just using misleading terms. I will call a
"frozen copy" a "frozen image". Here it goes: "freezing a set doesn't
require a COPY, IF the frozen IMAGE isn't saved after the original is
CHANGED". I suggest that there would be a way to create a frozenset
without COPYING an O(n) amount of MEMORY. When a frozen set is created
by a call frozen(x), it would not copy all the data, but would rather
reference the existing data, which was created by the non-frozen set.
Only if the original set changes, when there's a frozen set
referencing the data, the MEMORY would be actually copied.

I call it a "frozen copy" because it behaves as a frozen copy, even
though not all the memory is being copied. When you call the COPY
function in the COPY module with a string, it doesn't really copy
memory - the same string is returned. When you copy a file inside
subversion, it doesn't actually copy all the data associated with it,
but does something smarter, which takes O(1). The point is, for the
user, it's a copy. Whether or not memory is actually being copied, is
an implementation detail.
>
...
>
> I think that adding an additional attribute to literally every single
> object to handle the caching of 'frozen' objects, as well as a list to
> every object to handle callbacks which should be called on object
> mutation, along with a _call_stuff_when_mutated() method that handles
> these callback calls, IN ADDITION TO the __freeze__ method which is
> necessary to support this, is a little much, AND IS CERTAINLY NOT A
> SIMPLIFICATION!

I don't agree. You don't need to add a list to every object, since you
can store all those relations in one place, with a standard function
for registering them. Anyway, code written in Python (which is the
language we are discussing) WON'T BE COMPLICATED! The frozen
mechanism, along with two new protocols (__frozen__ and __changed__),
would be added automatically! The internal state of a class written in
Python can be automatically frozen, since it's basically a dict. Now
let's see if it's a simplification:

1. No Python code would have to be made more complicated because of the change.
2. There would be no need to find workarounds, like cStringIO, for the
fact that strings and tuples are immutable.
3. You would be able to put any kind of object into a set, or use it
as a dict key.
4. Classes (like the graph example) would be able to give users things
without having to make a choice between risking their users with
strange bugs, making a complicated interface, making very inefficient
methods, and writing complicated wrapper classes.

I will ask you: Is this a complication?
The answer is: it requires a significent change of the CPython
implementation. But about the Python language: it's definitely a
simplification.
>
> Let us pause for a second and consider:
> Original PEP proposed 1 new method: __freeze__, which could be
> implemented as a subclass of the original object (now), and integrated
> into the original classes as time goes on.  One could /register/
> __freeze__ functions/methods a'la Pickle, at which point objects
> wouldn't even need a native freeze method.
>
> Your suggestion offers 2 new methods along with 2 new instance variables.
> Let's see, a callback handler, __freeze__, the cache, and the callback
> list.  Doesn't that seem a little excessive to you to support freezing?
> It does to me.  If Guido were to offer your implementation of freeze, or
> no freeze at all, I would opt for no freeze, as implementing your freeze
> on user-defined classes would be a pain in the ass, not to mention
> implementing them in C code would be more than I would care to do, and
> more than I would ask any of the core developers to work on.
>
As I said above: this suggestion would certainly require more change
in the Python implementation than your suggestion. But the Python
language would gain a lot more. Implementing my frozen on user-defined
classes would not be a pain in the ass, because it will require no
work at all - the Python implementation would provide it
automatically. The fact that it can be done automatically for
user-defined classes raises a hope in me that it can be made not too
complicated for classes written in C.
>
...
>
> But here's the crucial observation which you are missing.  You yourself
> have stated that in both your table and graph examples you want your
> application to continue to modify values while the user can't manipulate
> them.  So even in your own use-cases, you are going to be modifying
> objects after they have been frozen, and even then it won't be fast!

No. In the table example, the table would never change the object
themselves - it may only calculate new values, and drop the references
to the old ones. This is definitely a case of not changing the value
after it has been frozen.

In the graph example, it is true that the set would be changed after
it's frozen, but it is expected that the frozen copy would not exist
by the time the change happens - think about the x is
graph.neighbours(y) example. There is actually no reason for keeping
them, besides for tracking the history of the graph - which would
require a copy anyway. The frozen() implementation of objects which do
not reference non-frozen objects, such as sets, really doesn't copy
any memory when it's called, and will never cause a memory copy if
there are no living frozen copies of the object while the object
changes.
>
> I believe that in general, people who are freezing things are going to
> want to be changing the original objects - hence the use of mutables to
> begin with - maybe for situations like yours where you don't want users
> mutating returns, whatever.  If after they have frozen the object, they
> don't want to be changing the original objects, then they are probably
> going to be tossing out the original mutable and using the immutable
> created with freeze anyways (mutate your object until you get it right,
> then freeze it and use that so that no one can alter your data, not even
> yourself), so I think that caching is something that the /user/ should
> be doing, NOT Python.

I don't agree. The table and the graph are examples. The common use
patterns I see regarding frozen() are:
1. Don't use frozen() at all. Think about strings becoming mutable.
Most strings which are changed would never be frozen. When you are
using a list, how many times do you make a frozen copy of it? (The
answer is zero, of course, you can't. You can't use it as a dict key,
or as a member of a set. This is just to show you that not freezing
mutable objects is a common thing.)
2. Create the object using more operations than constructing it, and
then don't change it, possibly making a frozen copy of it. The table
is an example: functions given by the user create objects, in whatever
way they choose, and then the table doesn't need to change them, and
needs to create a frozen copy.
It's a very reasonable use case: I would say that the less common case
is that you can create an object using only the constructor. Many
times you make a tuple out of a list that you've created just for that
purpose. It's not intuitive!
>
> The simple implementation (not copy-on-write) leads us to a simple
> matter of documenting, "Freeze is 'stateless'; every call to freeze
> returns a new object, regardless of modifications (or lack thereof)
> between freeze calls."
>
> Remember: "Simple is better than complex."

Firstly, you are talking about implementation. Secondly, sometimes
things are too simple, and lead to complex workarounds.
>
...
>
> Not everything can/should become mutable.  Integers should never become
> mutable, as tuples should never become mutable, as strings/unicode
> should never become mutable...wait, aren't we getting to the point that
> everything which is currently immutable shouldn't become mutable?
> Indeed.  I don't believe that any currently immutable object should be
> able to become mutable in order to satisfy /anyone's/ desire for mutable
> /anything/.

Integers should never become mutable - right. There should be no
mutable ints in Python. Tuples and strings should never become mutable
- wrong. Strings created by the user should be mutable - those
immutable strings are a Python anomaly. All I was saying was that
sometimes, the Python implementation would want to use immutable
strings. So will users, sometimes. There is a need for a mutable
string, and a need for an immutable string, and a need for an
efficient conversion between the two. That's all.
>
>
> In starting to bring up benchmarks you are falling into the trap of
> needing to /have/ a benchmark (I have too), for which there are very few,
> if any, current use-cases.
>
No, I don't. There are a lot of use cases. As I said, I suggest a
change to the Python language, which would give you many benefits.
When suggesting such a change, it should be verified that the
performance of existing Python programs won't be harmed, which I did.

What might be done as well, is to compare my suggestion to yours:

> Without having or needing a benchmark, I'll state quite clearly where
> your proposed copy-on-write would beat out the naive 'create a new copy
> on every call to freeze':
> 1. If objects after they are frozen are never modified, copy on write
> will be faster.
> 2. If original objects are modified after they are frozen, then the
> naive implementation will be as fast if not faster in general, due to
> far lower overhead, but may be slower in corner cases where some nested
> structure is unchanged, and some shallow bit has changed:

As I said, many objects are never modified after they are frozen.
***This includes all the strings which are used in current Python
programs as dict keys*** - I suggest that strings would become mutable
by default. This means that whenever you use a string as a dict key, a
call to frozen() is done by the dict. It's obvious that the string
won't change after it is frozen.

Now, my suggestion is faster in its order of complexity than yours. In
some cases, yours is faster by a constant, which I claim that would be
quite small in real use cases.
>
>    x = [[], NEVER_CHANGED_MUTABLE_NESTED_STRUCTURE]
>    y = freeze(x)
>    x[0].append(1)
>    z = freeze(x)
>
This is one of the cases in which the change in order of complexity is
significant.

> Further, discussing benchmarks on use-cases, for which there are few (if
> any) previously existing uses, is like saying "let's race cars" back in
> 1850; it's a bit premature.

I don't agree. That's why we're discussing it.
>
>
> Then there is this other example:
>
>    x = [1,2,3]
>    y = freeze(x)
>
> The flat version of freeze in the PEP right now handles this case.  I
> can change x all I want, yet I have a frozen y which stays unchanged.
> This is what I would want, and I would imagine it is what others would
> want too.  In fact, this is precisely the use-case you offered for your
> table and graph examples, so your expression of a sentiment of "users
> aren't going to be changing the object after it has been frozen" is, by
> definition, wrong: you do it yourself!

Okay, so may I add another use case in which frozen() is fast: If an
object which only holds references to frozen object is changed after a
frozen copy of it has been made, and the frozen copy is discarded
before the change is made, frozen() would still take O(1). This is the
case with the graph.
>
>
> > I will write it in another way: I am proposing a change that will make
> > Python objects, including strings, mutable, and gives you other
> > advantages as well. I claim that it won't make existing Python
> > programs run slower in O() terms. It would allow you to do many things
> > that you can't do today; some of them would be fast, like editing a
> > string, and some of them would be less fast - for example, repeatedly
> > changing an object and freezing it.
>
> Your claim on running time only works if the original isn't changed
> after it is frozen

But they won't, in existing Python programs, so my claim: "it won't
make existing Python programs run slower in O() terms" is absolutely
correct!
>
> And I don't like making everything mutable, it's a "solution looking for
> a problem", or a "tail wagging the dog" idea.  There is no good reason
> to make everything mutable, and I challenge you to come up with a valid
> one that isn't already covered by the existing standard library or
> extension modules.
>
> There is no need to bring strings into this conversation as there are
> string variants which are already mutable: array.array('c', ...),
> StringIO, mmap, take your pick!  And some future Python (perhaps 2.5)
> will support a 'bytes' object, which is essentially an mmap which
> doesn't need to be backed by a file.

My two examples don't have a satisfactory solution currently.
All this variety acutally proves my point: There is "more than one way
to do it" because these are all workarounds! If strings were mutable,
I won't have to learn about all these nice modules. And if that's not
enough, here's a simple use case which isn't covered by all those
modules. Say I store my byte arrays using array.array, or mmap. What
if I want to make a set of those, in order to check if a certain byte
sequence was already encountered? I CAN'T. I have to do another
workaround, which will probably be complicated and unefficient, to
convert my byte array into a string.

Everything is possible, if you are willing to work hard enough. I am
suggesting to simplify things.

...
>
> You really don't know how Python internals work.
>
> The slow part of s += x on strings in Python 2.4 is the memory
> reallocation and occasional data copy (it has been tuned like crazy by
> Raymond in 2.4, see _PyString_Resize in stringobject.c). Unless you
> severely over-allocated your strings, this WOULD NOT BE SPED UP BY
> MUTABLE STRINGS.

The fact that it has been tuned like crazy, and that it had to wait
for Python 2.4, is just showing us that we talking on *another*
workaround. And please remember that this optimization was announced
not to be counted on, in case you want your code to work efficiently
on other Python implementations. In that case (which would just grow
more common in the future), you would have to get back to the other
workarounds, like cStringIO.
>
> Further, identifiers/names (obj, obj.attr, obj.attr1.attr2, ...) are
> already created during compile-time, and are 'interned'.  That is, if
> you have an object that you call 'foo', there gets to be a single "foo"
> string, which is referenced by pointer by any code in that module which
> references the 'foo' object to the single, always unchanging "foo"
> string.  And because the string has already been hashed, it has a cached
> hash value, and lookups in dictionaries are already fast due to a check
> for pointer equivalency before comparing contents.  Mutable strings
> CANNOT be faster than this method.

Right - they can't be faster than this method. But they can be
virtually AS FAST. Store frozen strings as identifiers/names, and you
can continue to use exactly the same method you described.
>
>
...
>
> I would claim that your definition is limited.  Yours would be fast if
> objects never changed after they are frozen, which is counter to your
> own use-cases.  This suggests that your definition is in fact incorrect,
> and you fail to see your own inconsistancy.
>
I have answered this above: It is not counter to my use cases, and
it's a very good assumption, as it is true in many examples, including
all current Python programs.
>
> > And if it's possible (which I think it is), it would allow us to get
> > rid of inconvinient immutable objects, and it would let us put
> > everything into a set. Isn't that nice?
>
> No, it sounds like a solution looking for a problem.  I see no need to
> make strings, floats, ints, tuples, etc. mutable, and I think that you
> will have very little luck in getting core Python developer support for
> any attempt to make them mutable.

Concerning ints, floats, complexes, and any other object with a
constant memory use, I agree. Concerning other objects, I disagree. I
think that it would simplify things considerably, and that many things
that we are used to are actually workarounds.
>
> If you make such a suggestion, I would offer that you create a new PEP,
> because this discussion has gone beyond PEP 351, and has wandered into
> the realm of "What other kinds of objects would be interesting to have
> in a Python-like system?"
>
That is a good suggestion, and I have already started to write one. It
takes me a long time, but I hope I will manage.
>
>
> I'll summarize your claims:
> 1. copy-on-write is a simplification
> 2. everything being mutable would add to Python
> 3. copy-on-write is fast
> 4. people won't be mutating objects after they are frozen
>
> I'll counter your claims:

I'll counter-counter them:

> 1. 2 methods and 2 instance variables on ALL OBJECTS is not a
> simplification.

It is. This is basically an implementation detail, Python code would
never be complicated.

> 2. a = b = 1; a += 1;  If all objects were to become mutable, then a ==
> b, despite what Python and every other sane language would tell you, and
> dct[a] would stop working (you would have to use c = freeze(a);dct[c],
> or dct[x] would need to automatically call freeze and only ever
> reference the result, significantly slowing down ALL dictionary
> references).

This might be the point that I didn't stress enough. Dict *would* call
freeze, and this is why more work is needed to make sure it is a quick
operation. I have proven that it is quick in O() terms, and I claimed
that it can be made quick in actual terms.

> 3. only if you NEVER MUTATE an object after it has been frozen

...or if the frozen copy is killed before the change, for many types of objects.

> 4. /you/ mutate original objects after they are frozen

Yes I do, but see 3.

>
> ALSO:
> 5. You fail to realize that if all objects were to become mutable, then
> one COULDN'T implement frozen, because the frozen objects THEMSELVES
> would be mutable.

Really, you take me by the word. All objects COULD become mutable, if
we supply a frozen version of it. This doesn't include any object
which you don't want, including ints, and including frozen objects.
>
> I'm going to bow out of this discussion for a few reasons, not the least
> of which being that I've spent too much time on this subject, and that I
> think it is quite clear that your proposal is dead, whether I had
> anything to do with it or not.
>
>  - Josiah

That's fine. I wish that you read my answer, think about it a little,
and just tell me in a yes or a no if you still consider it dead. I
think that I have answered all your questions, and I hope that at
least others would be convinced by them, and that at the end my
suggestion would be accepted.

Others who read this - please respond if you think there's something
to my suggestion!

Thanks for your involvement. I hope it would at least help me better
explain my idea.
Noam