PEP 351, the freeze protocol
I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects. I've finally finished the PEP, uploaded a sample patch (albeit a bit incomplete), and I'm posting it here to see if there is any interest.

http://www.python.org/peps/pep-0351.html

Cheers,
-Barry
On 10/23/05, Barry Warsaw
I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects. I've finally finished the PEP, uploaded a sample patch (albeit a bit incomplete), and I'm posting it here to see if there is any interest.
My sandboxes need freezing for some stuff, and ultimately freezable user classes will be desirable, but for performance reasons I prefer freezing in place. Not much overlap with PEP 351, really. -- Adam Olsen, aka Rhamphoryncus
Barry Warsaw
I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects. I've finally finished the PEP, uploaded a sample patch (albeit a bit incomplete), and I'm posting it here to see if there is any interest.
    class xlist(list):
        def __freeze__(self):
            return tuple(self)

Shouldn't that be:

    class xlist(list):
        def __freeze__(self):
            return tuple(map(freeze, self))

"Should dicts and sets automatically freeze their mutable keys?"

Dictionaries don't have mutable keys, but it is of my opinion that a container which is frozen should have its contents frozen as well.

 - Josiah
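Josiah's recursive variant can be fleshed out as a self-contained sketch. The module-level freeze() below is hypothetical (the PEP proposes it as a builtin), and the dict handling, returning a sorted tuple of pairs, is purely an assumption since Python has no builtin immutable dict:

```python
def freeze(obj):
    """Return a hashable (frozen) version of obj, freezing contents too."""
    if isinstance(obj, list):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, set):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, dict):
        # No builtin immutable dict exists, so a sorted tuple of frozen
        # (key, value) pairs stands in; assumes keys are orderable.
        return tuple(sorted((freeze(k), freeze(v)) for k, v in obj.items()))
    hash(obj)  # raises TypeError if obj is not already hashable
    return obj
```

Note that the recursion is what makes the result genuinely hashable: tuple(self) alone would still contain any nested lists.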
On 10/24/05, Josiah Carlson
"Should dicts and sets automatically freeze their mutable keys?"
Dictionaries don't have mutable keys,
Since when?

    class Foo:
        def __init__(self):
            self.x = 1

    f = Foo()
    d = {f: 1}
    f.x = 2

Maybe you meant something else? I can't think of any way in which "dictionaries don't have mutable keys" is true. The only rule about dictionary keys that I know of is that they need to be hashable and need to be comparable with the equality operator.

--
  Twisted   |  Christopher Armstrong: International Man of Twistery
  Radix     |  -- http://radix.twistedmatrix.com
            |  Release Manager, Twisted Project
  \\\V///   |  -- http://twistedmatrix.com
   |o O|    |
 w----v----w-+
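To see why the hashability rule exists at all, contrast the default identity-based hash in Christopher's example with a content-dependent one. The Bar class here is an invented illustration, not from the thread:

```python
class Bar:
    def __init__(self):
        self.x = 1
    def __hash__(self):
        return hash(self.x)          # hash depends on mutable state
    def __eq__(self, other):
        return isinstance(other, Bar) and self.x == other.x

b = Bar()
d = {b: 1}
b.x = 2          # hash(b) changes with the contents...
try:
    d[b]         # ...so the lookup probes the wrong bucket
except KeyError:
    pass
else:
    raise AssertionError("entry should be unreachable")
```

The entry is now stranded: b is still stored in d, but no longer findable through it, which is exactly the inconsistent state dicts try to rule out by demanding hashable (effectively immutable) keys.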
Christopher Armstrong
On 10/24/05, Josiah Carlson
wrote: "Should dicts and sets automatically freeze their mutable keys?"
Dictionaries don't have mutable keys,
Since when?
Maybe you meant something else? I can't think of any way in which "dictionaries don't have mutable keys" is true. The only rule about dictionary keys that I know of is that they need to be hashable and need to be comparable with the equality operator.
Good point, I forgot about user-defined classes (I rarely use them as keys myself; it's all too easy to make a mutable object whose hash depends on its mutable contents, and an object which you can only look up if you have the exact object in hand is not quite as useful as what I generally need). I will, however, stand by, "a container which is frozen should have its contents frozen as well." - Josiah
Barry Warsaw wrote:
I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects. I've finally finished the PEP, uploaded a sample patch (albeit a bit incomplete), and I'm posting it here to see if there is any interest.
I think it's definitely worth considering. It may also reduce the need for "x" and "frozenx" builtin pairs. We already have "set" and "frozenset", and the various "bytes" ideas that have been kicked around have generally considered the need for a "frozenbytes" as well. If freeze was available, then "freeze(x(*args))" might serve as a replacement for any builtin "frozen" variants.

I think having dicts and sets automatically invoke freeze would be a mistake, because at least one of the following two cases would behave unexpectedly:

    d = {}
    l = []
    d[l] = "Oops!"
    d[l]           # Raises KeyError if freeze() isn't also invoked in __getitem__

    d = {}
    l = []
    d[l] = "Oops!"
    l.append(1)
    d[l]           # Raises KeyError regardless

Oh, and the PEP's xdict example is even more broken than the PEP implies, because two imdicts which compare equal (same contents) may not hash equal (different id's).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://boredomandlaziness.blogspot.com
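Nick's last point can be reproduced in a few lines, assuming an imdict-style class whose __hash__ is based on id() (as the PEP's example apparently is; the real recipe also disables the mutating methods, omitted here):

```python
class imdict(dict):
    """Stand-in for the PEP's imdict: hashes by identity."""
    def __hash__(self):
        return id(self)

a = imdict({1: 2})
b = imdict({1: 2})
assert a == b               # dict equality compares contents
assert hash(a) != hash(b)   # but the hashes differ: the invariant
                            # "equal objects hash equal" is violated
s = {a}
assert b not in s           # so lookup by an equal key silently misses
```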
Nick Coghlan
I think having dicts and sets automatically invoke freeze would be a mistake, because at least one of the following two cases would behave unexpectedly:
I'm pretty sure that the PEP was only asking whether one would freeze the contents of dicts IF the dict was being frozen. That is, which of the following should be the case:

    freeze({1:[2,3,4]}) -> {1:[2,3,4]}
    freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))

 - Josiah
Josiah Carlson wrote:
Nick Coghlan
wrote: I think having dicts and sets automatically invoke freeze would be a mistake, because at least one of the following two cases would behave unexpectedly:
I'm pretty sure that the PEP was only asking whether one would freeze the contents of dicts IF the dict was being frozen.
That is, which of the following should be the case:

    freeze({1:[2,3,4]}) -> {1:[2,3,4]}
    freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))
I believe the choices you intended are:

    freeze({1:[2,3,4]}) -> imdict(1=[2,3,4])
    freeze({1:[2,3,4]}) -> imdict(1=(2,3,4))

Regardless, that question makes a lot more sense (and looking at the PEP again, I realised I simply read it wrong the first time).

For containers where equality depends on the contents of the container (i.e., all the builtin ones), I don't see how it is possible to implement a sensible hash function without freezing the contents as well - otherwise your immutable isn't particularly immutable. Consider what would happen if list "__freeze__" simply returned a tuple version of itself - you have a __freeze__ method which returns a potentially unhashable object!

Cheers,
Nick.
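Nick's closing point is directly observable in the interpreter: a tuple is only hashable if its contents are, so a shallow list __freeze__ would hand back an unhashable "frozen" object:

```python
# What a naive list.__freeze__ returning tuple(self) would produce:
t = tuple([[2, 3, 4]])            # i.e. ([2, 3, 4],)
try:
    hash(t)
except TypeError:
    pass                          # unhashable: the tuple still holds a list
else:
    raise AssertionError("expected TypeError")

# Freezing the contents as well gives a genuinely hashable result:
t2 = tuple(tuple(x) for x in [[2, 3, 4]])
assert hash(t2) == hash(((2, 3, 4),))
```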
Nick Coghlan
Josiah Carlson wrote:
Nick Coghlan
wrote: I think having dicts and sets automatically invoke freeze would be a mistake, because at least one of the following two cases would behave unexpectedly:
I'm pretty sure that the PEP was only asking whether one would freeze the contents of dicts IF the dict was being frozen.
That is, which of the following should be the case:

    freeze({1:[2,3,4]}) -> {1:[2,3,4]}
    freeze({1:[2,3,4]}) -> xdict(1=(2,3,4))
I believe the choices you intended are:

    freeze({1:[2,3,4]}) -> imdict(1=[2,3,4])
    freeze({1:[2,3,4]}) -> imdict(1=(2,3,4))
Regardless, that question makes a lot more sense (and looking at the PEP again, I realised I simply read it wrong the first time).
For containers where equality depends on the contents of the container (i.e., all the builtin ones), I don't see how it is possible to implement a sensible hash function without freezing the contents as well - otherwise your immutable isn't particularly immutable.
Consider what would happen if list "__freeze__" simply returned a tuple version of itself - you have a __freeze__ method which returns a potentially unhashable object!
I agree completely, hence my original statement on 10/23: "it is of my opinion that a container which is frozen should have its contents frozen as well." - Josiah
On Oct 23, 2005, at 6:43 PM, Barry Warsaw wrote:
I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects. I've finally finished the PEP, uploaded a sample patch (albeit a bit incomplete), and I'm posting it here to see if there is any interest.
I like this. I'd like it better if it integrated with the adapter PEP, so that the freezing mechanism for a given type could be pluggable, and could be provided even if the original object did not contemplate it. I don't know where the adapter PEP stands: skimming through the (most recent?) thread in January didn't give me a clear idea. As another poster mentioned, in-place freezing is also of interest to me (and why I read the PEP initially), but as also mentioned, that's probably unrelated to your PEP. Gary
I'm not sure I completely understood the idea, but deriving the freeze function from hash gives hash a wider importance. Is __hash__ = id inside a class enough to use a set (sets.Set before 2.5) derived class instance as a key to a mapping? Surely I missed the point. Regards Paolino
Paolino wrote:

Is __hash__ = id inside a class enough to use a set (sets.Set before 2.5) derived class instance as a key to a mapping?

It is __hash__ = lambda self: id(self) that is terribly slow; it needs a faster way to state that, to let such instances be usable as mapping keys, since all set operations will pipe into that mechanism. In my application that function is eating time like hell, and it will keep on doing so even with the PEP proposed. OT, probably.

Regards Paolino
[Barry Warsaw]
I've had this PEP lying around for quite a few months. It was inspired by some code we'd written which wanted to be able to get immutable versions of arbitrary objects.
* FWIW, the _as_immutable() protocol was dropped from sets.py for a reason. User reports indicated that it was never helpful in practice. It added complexity and confusion without producing offsetting benefits.

* AFAICT, there are no use cases for freezing arbitrary objects when the object types are restricted to just lists and sets but not dicts, arrays, or other containers. Even if the range of supported types were expanded, what applications could use this? Most apps cannot support generic substitution of lists and sets -- they have too few methods in common -- they are almost never interchangeable.

* I'm concerned that generic freezing leads to poor design and hard-to-find bugs. One class of bugs results from conflating ordered and unordered collections as lookup keys. It is difficult to assess program correctness when the ordered/unordered distinction has been abstracted away. A second class of errors can arise when the original object mutates and gets out-of-sync with its frozen counterpart.

* For a rare app needing mutable lookup keys, a simple recipe would suffice:

      freeze_pairs = [(list, tuple), (set, frozenset)]

      def freeze(obj):
          try:
              hash(obj)
          except TypeError:
              for sourcetype, desttype in freeze_pairs:
                  if isinstance(obj, sourcetype):
                      return desttype(obj)
              raise
          else:
              return obj

  Unlike the PEP, the recipe works with older pythons and is trivially easy to extend to include other containers.

* The name "freeze" is problematic because it suggests an in-place change. Instead, the proposed mechanism creates a new object. In contrast, explicit conversions like tuple(l) or frozenset(s) are obvious about their running time, space consumed, and new object identity.

Overall, I'm -1 on the PEP. Like a bad C macro, the proposed abstraction hides too much. We lose critical distinctions of ordered vs unordered, mutable vs immutable, new objects vs in-place change, etc. Without compelling use cases, the mechanism smells like a hyper-generalization.

Raymond
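Raymond's recipe can be exercised directly; the definition is repeated here verbatim so the demonstration is self-contained, and the dict extension at the end is an invented illustration of the "trivially easy to extend" claim:

```python
freeze_pairs = [(list, tuple), (set, frozenset)]

def freeze(obj):
    try:
        hash(obj)
    except TypeError:
        for sourcetype, desttype in freeze_pairs:
            if isinstance(obj, sourcetype):
                return desttype(obj)
        raise
    else:
        return obj

assert freeze([1, 2]) == (1, 2)
assert freeze({1, 2}) == frozenset([1, 2])
assert freeze("abc") == "abc"   # already hashable: returned unchanged

# Extending the table to handle dicts (hypothetical; note the recipe is
# shallow -- it does not freeze nested contents):
freeze_pairs.append((dict, lambda d: tuple(sorted(d.items()))))
assert freeze({"a": 1}) == (("a", 1),)
```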
Hello, I have thought about freezing for some time, and I think that it is a fundamental need - the need to know, sometimes, that objects aren't going to change.

This is mostly the need of containers. dicts need to know that the objects that are used as keys aren't going to change, because if they change, their hash value changes, and you end up with a data structure in an inconsistent state. This is the need of sets too, and of heaps, and binary trees, and so on.

I want to give another example: I and my colleagues designed something which can be described as an "electronic spreadsheet in Python". We called it a "table". The values in the table are Python objects, and the functions which relate them are written in Python. Then comes the problem: the user has, of course, access to the objects stored in the table. What would happen if he changes them? The answer is that the table would be in an inconsistent state, since something which should be the return value of a function is now something else, and there's no way for the table to know about that.

The solution is to have a "freeze" protocol. It may be called "frozen" (like frozen(set([1,2,3]))), so that it will be clear that it does not change the object itself. The definition of a frozen object is that its value can't change - that is, if you compare it with another object, you should get the same result as long as the other object hasn't changed. As a rule, only frozen objects should be hashable.

I want to give another, different, use case for freezing objects. I once thought about writing a graph package in Python - I mean a graph with vertices and edges. The most obvious way to store a directed graph is as a mapping (dict) from a node to the set of nodes that it points to. Since I want to be able to find also which nodes point to a specific node, I will store another mapping, from a node to the set of nodes that point to it.
Now, I want a method of the graph which will return the set of nodes that a given node points to, for example to let me write "if y in graph.adjacent_nodes(x):". The question is, what will the adjacent_nodes method return? If it returns the set which is a part of the data structure, there is nothing (even no convention!) that will prevent the user from playing with it. This will corrupt the data structure, since the change won't be recorded in the inverse mapping. adjacent_nodes could return a copy of the set, but that's a waste if you only want to check whether an object is a member of the set.

I gave this example to say that the "frozen" protocol should (when possible) return an object which doesn't really contain a copy of the data, but rather gives an "image" of the original object. If the original object changes while there are frozen copies of it, the data will be copied, and all the frozen objects will then reference a version of the data that will never change again.

This will solve the graph problem nicely - adjacent_nodes would simply return a frozen copy of the set, and a copy operation would happen only in the rare cases when the returned set is being modified. This would also help the container use cases: they may call the frozen() method on objects that should be inserted into the container, and usually the data won't be copied.

Some objects can't be created in their final form, but can only be constructed step after step. This means that they must be non-frozen objects. Sometimes they are constructed in order to get into a container. Unless the frozen() method is copy-on-change the way I described, all the data would have to be copied again, just for the commitment that it won't change. I don't mean to frighten, but in principle, this may mean that mutable strings might be introduced, which will allow us to get rid of all the cStringIO workarounds.
Immutable strings would be constructed whenever they are needed, at a low performance cost (remember that a frozen copy of a given object has to be constructed only once - once it has been created, the same object can be returned on additional frozen() calls.)

Copy-on-change of containers of non-frozen objects requires an additional complication: it requires frozen objects to have a way of setting a callback that will be called when the original object is changed. This is because the change makes the container of the original object change, so it must drop its own frozen copy. This needs to happen only once per frozen object, since after a change, all the containers drop their frozen copies. I think this callback is conceptually similar to the weakref callback.

Just an example that copy-on-change (at least of containers of frozen objects) is needed: sets. It was decided that you can test whether a non-frozen set is a member of a set. I understand that it is done by "temporarily freezing" the set, and that it caused some threading issues. A copy-on-change mechanism might solve it more elegantly.

What do you think? Noam
Noam Raphael
Hello,
I have thought about freezing for some time, and I think that it is a fundamental need - the need to know, sometimes, that objects aren't going to change.
I agree with this point.
This is mostly the need of containers. dicts need to know that the objects that are used as keys aren't going to change, because if they change, their hash value changes, and you end up with a data structure in an inconsistent state. This is the need of sets too, and of heaps, and binary trees, and so on.
You are exactly mirroring the sentiments of the PEP.
I want to give another example: I and my colleagues designed something which can be described as an "electronic spreadsheet in Python". We called it a "table". The values in the table are Python objects, and the functions which relate them are written in Python. Then comes the problem: the user has, of course, access to the objects stored in the table. What would happen if he changes them? The answer is that the table would be in an inconsistent state, since something which should be the return value of a function is now something else, and there's no way for the table to know about that.
I respectfully disagree with this point and the rest of your email. Why? For two use-cases, you offer 'tables of values' and 'graphs', as well as a possible solution to the 'problem': copy on write.

In reading your description of a 'table of values', I can't help but be reminded of the wxPython (and wxWidgets) wx.Grid and its semantics. It offers arbitrary tables of values (whose editors and viewers you can change at will), which offers a mechanism by which you can "listen" to changes that occur to the contents of a cell. I can't help but think that if you offered a protocol by which a user can signal that a cell has been changed, perhaps by writing the value to the table itself (table.SetValue(row, col, value)), making every read a deepcopy (or a PEP 351 freeze), etc., that both you and the users of your table would be much happier.

As for the graph issue, you've got a bigger problem than users just being able to edit edge lists; users can clear the entire dictionary of vertices (outgoing.clear()). It seems to me that a more reasonable method to handle this particular case is to tell your users "don't modify the dictionaries or the edge lists", and/or store your edge lists as tuples instead of lists or dictionaries, and/or use an immutable dictionary (as offered by Barry in the PEP).

There's also this little issue of "copy on write" semantics with Python. Anyone who tells you that "copy on write" is easy is probably hanging out with the same kind of people who say that "threading is easy". Of course both are easy if you limit your uses to some small subset of interesting interactions, but "copy on write" gets far harder when you start thinking of dictionaries, lists, StringIOs, arrays, and all the possible user-defined classes, which may be mutated beyond obj[key] = value and/or obj.attr = value (some have obj.method() which mutates the object). As such, offering a callback mechanism similar to weak references is probably pretty close to impossible with CPython.
One of the reasons why I liked the freeze protocol is that it offered a simple mechanism by which Python could easily offer support, for both new and old objects alike. Want an example? Here's the implementation for array freezing: tuple(a). What about lists? tuple(map(freeze, lst)). Freezing may not ultimately be the right solution for everything, but it is a simple solution which handles the majority of cases. Copy on write in Python, on the other hand, is significantly harder to implement and support, and is probably not the right solution for many problems. - Josiah

P.S. To reiterate to Barry: map freeze onto the contents of containers, otherwise the object can still be modified, and hence is not frozen.
Hello, It seems that we both agree that freezing is cool (-; . We disagree on whether a copy-on-write behaviour is desired. Your arguments against copy-on-write are: 1. It's not needed. 2. It's complicated to implement. But first of all, you didn't like my use cases. I want to argue with that.
In reading your description of a 'table of values', I can't help but be reminded of the wxPython (and wxWidget) wx.Grid and its semantics. It offers arbitrary tables of values (whose editors and viewers you can change at will), which offers a mechanism by which you can "listen" to changes that occur to the contents of a cell. I can't help but think that if you offered a protocol by which a user can signal that a cell has been changed, perhaps by writing the value to the table itself (table.SetValue(row, col, value)), every read a deepcopy (or a PEP 351 freeze), etc., that both you and the users of your table would be much happier.
Perhaps I didn't make it clear. The difference between wxPython's Grid and my table is that in the table, most values are *computed*. This means that there's no point in changing the values themselves. They are also used frequently as set members (I can describe why, but it's a bit complicated.)

I want to say that even if sets weren't used, the objects in the table should have been frozen. The fact that sets (and dicts) only allow immutable objects as members/keys is just for protecting the user. They could have declared, "you shouldn't change anything you insert - as long as you don't, we'll function properly." The only reason why you can't compute hash values of mutable objects is that you don't want your users to make mistakes, and make the data structure inconsistent.
As for the graph issue, you've got a bigger problem than users just being able to edit edge lists, users can clear the entire dictionary of vertices (outgoing.clear()). It seems to me that a more reasonable method to handle this particular case is to tell your users "don't modify the dictionaries or the edge lists", and/or store your edge lists as tuples instead of lists or dictionaries, and/or use an immutable dictionary (as offered by Barry in the PEP).
As I wrote before, telling my users "don't modify the edge lists" is just like making lists hashable, and telling all Python users, "don't modify lists that are dictionary keys." There's no way to tell the users that - there's no convention for objects which should not be changed. You can write it in the documentation, but who'll bother looking there? I don't think that your other suggestions will work: the data structure of the graph itself can't be made of immutable objects, because of the fact that the graph is a mutable object - you can change it. It can be made of immutable objects, but this means copying all the data every time the graph changes. Now, about copy-on-write:
There's also this little issue of "copy on write" semantics with Python. Anyone who tells you that "copy on write" is easy, is probably hanging out with the same kind of people who say that "threading is easy". Of course both are easy if you limit your uses to some small subset of interesting interactions, but "copy on write" gets far harder when you start thinking of dictionaries, lists, StringIOs, arrays, and all the possible user-defined classes, which may be mutated beyond obj[key] = value and/or obj.attr = value (some have obj.method() which mutates the object). As such, offering a callback mechanism similar to weak references is probably pretty close to impossible with CPython.
Let's limit ourselves to copy-on-write of objects which do not contain nonfrozen objects. Perhaps it's enough - the table, the graph, and strings are perfect examples of these. Implementation doesn't seem too complicated to me - whenever the object is about to change, and there is a connected frozen copy, you make a shallow copy of the object, point the frozen copy to it, release the reference to the frozen copy, and continue as usual. That's all.

I really think that this kind of copy-on-write is "correct". The temporary freezing of sets in order to check if they are members of other sets is a not-very-nice way of implementing it. This kind of copy-on-write would allow, in principle, for Python strings to become mutable, with almost no speed penalty. It would allow my table, and other containers, to automatically freeze the objects that get into them, without having to trust the user not to change the objects - and remember that there's no way of *telling* him not to change the objects.

Now, the computer scientist in me wants to explain (and think about) freezing containers of nonfrozen objects. What I actually want is that as long as an object doesn't change after it's frozen, the cost of freezing would be nothing - that is, O(1). Think about a mutable string object, which is used in the same way as the current, immutable strings. It is constructed once, and then may be used as a key in a dictionary many times. I want to claim that it's a common pattern - create an object, it doesn't matter how, and then use it without changing it. If that is the case, it's obvious that all the frozen() calls would take O(1) each.

How can we accomplish this (freezing costs O(1) as long as the object doesn't change) with containers of nonfrozen objects? It seems impossible - no matter what, the first time the container is frozen, you have to call frozen() for every object it contains! The answer is that in an amortized analysis, it is still an O(1) operation.
The reason is that as long as frozen() takes O(1) (amortized), all those calls to frozen() can be considered a part of the object construction, since they are made only once - on the next call to frozen(), the already-created frozen object would be returned. This analysis is correct as long as the object doesn't change after it's frozen.

The problem is that we have to keep the created frozen object as long as the original object stays alive. So we have to know if it has changed. This is where those callbacks come in. As long as what is done with them is correct, there should be no problems. They are used only to disengage the frozen copies from their original objects. The action they should trigger is simply this:

    def on_contained_object_change(self):
        self._frozen_copy = None
        while self._callbacks:
            self._callbacks.pop()()

What's also interesting is that this freezing mechanism can be provided automatically for user-created classes, since those are simply containers of other objects, which behave exactly like dicts, for this matter. It allows everything in Python to be both mutable and hashable, without changing the O() complexity! Wow!

Ok, I'm going to sleep now. If you find something wrong with this idea, please tell me. Have a good day, Noam
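The copy-on-write scheme Noam describes can be sketched for a simple list-like container. All names below are invented for illustration; a real version would also need __hash__/__eq__ on the view, and the callback machinery for containers of nonfrozen objects:

```python
class FrozenView:
    """Read-only view that shares the owner's data until the owner mutates."""
    def __init__(self, data):
        self._data = data          # shared with the owner, not copied

    def __getitem__(self, i):
        return self._data[i]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class COWList:
    """Mutable list whose frozen() costs O(1) until the next mutation."""
    def __init__(self, items=()):
        self._data = list(items)
        self._view = None          # cached frozen view, if any

    def frozen(self):
        if self._view is None:
            self._view = FrozenView(self._data)   # no copying here
        return self._view

    def append(self, item):
        if self._view is not None:
            # Copy-on-write: the old data stays with the frozen view
            # forever; the mutable object continues with a private copy.
            self._data = list(self._data)
            self._view = None
        self._data.append(item)
```

For example, after f = c.frozen() and then c.append(3), list(f) still shows the pre-mutation contents, and only that one mutation paid the copying cost.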
It allows everything in Python to be both mutable and hashable,
I don't understand, since it's already the case. Any user-defined object is at the same time mutable and hashable. And if you want the hash value to follow the changes in attribute values, just define an appropriate __hash__ method. Regards Antoine.
On 10/31/05, Antoine Pitrou
It allows everything in Python to be both mutable and hashable,
I don't understand, since it's already the case. Any user-defined object is at the same time mutable and hashable.
By default, user-defined objects are equal iff they are the same object, regardless of their content. This makes mutability a non-issue. If you want to allow different objects be equal you need to implement a consistent equality operator (commutative, etc), a consistent hash function and ensure that any attributes affecting equality or hash value are immutable. If you fail to meet any of these requirements and put such objects in dictionaries or sets it will result in undefined behavior that may change between Python versions and implementations. Oren
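Oren's point in executable form: default equality and hashing are identity-based, so mutation cannot disturb dict lookups (the Point class is an invented illustration):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
q = Point(1, 2)
assert p != q                 # same contents, but distinct objects
d = {p: "here"}
p.x = 99
assert d[p] == "here"         # identity hash is unaffected by mutation
```

It is only once you add a content-based __eq__ and __hash__ that the immutability requirements Oren lists come into play.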
Noam Raphael
Hello,
It seems that we both agree that freezing is cool (-; . We disagree on whether a copy-on-write behaviour is desired. Your arguments against copy-on-write are: 1. It's not needed. 2. It's complicated to implement.
But first of all, you didn't like my use cases. I want to argue with that.
In reading your description of a 'table of values', I can't help but be reminded of the wxPython (and wxWidget) wx.Grid and its semantics. It offers arbitrary tables of values (whose editors and viewers you can change at will), which offers a mechanism by which you can "listen" to changes that occur to the contents of a cell. I can't help but think that if you offered a protocol by which a user can signal that a cell has been changed, perhaps by writing the value to the table itself (table.SetValue(row, col, value)), every read a deepcopy (or a PEP 351 freeze), etc., that both you and the users of your table would be much happier.
Perhaps I didn't make it clear. The difference between wxPython's Grid and my table is that in the table, most values are *computed*. This means that there's no point in changing the values themselves. They are also used frequently as set members (I can describe why, but it's a bit complicated.)
Again, user semantics. Tell your users not to modify entries, and/or you can make copies of objects you return. If your users are too daft to read and/or follow directions, then they deserve to have their software not work. Also from the sounds of it, you are storing both source and destination values in the same table...hrm, that sounds quite a bit like a spreadsheet. How does every spreadsheet handle that again? Oh yeah, they only ever store immutables (generally strings which are interpreted). But I suppose since you are (of course) storing mutable objects, you need to work a bit harder...so store mutables, and return immutable copies (which you can cache if you want, and invalidate when your application updates the results...like a wx.Grid update on changed).
As for the graph issue, you've got a bigger problem than users just being able to edit edge lists, users can clear the entire dictionary of vertices (outgoing.clear()). It seems to me that a more reasonable method to handle this particular case is to tell your users "don't modify the dictionaries or the edge lists", and/or store your edge lists as tuples instead of lists or dictionaries, and/or use an immutable dictionary (as offered by Barry in the PEP).
As I wrote before, telling my users "don't modify the edge lists" is just like making lists hashable, and telling all Python users, "dont modify lists that are dictionary keys." There's no way to tell the users that - there's no convention for objects which should not be changed. You can write it in the documentation, but who'll bother looking there?
When someone complains that something doesn't work, I tell them to read the documentation. If your users haven't been told to RTFM often enough to actually make it happen, then you need a RTFM-bat. Want to know how you make one? You start wrapping the objects you return which segfaults the process if they change things. When they start asking, tell them it is documented quite clearly "do not to modify objects returned, or else". Then there's the other option, which I provide below.
I don't think that your other suggestions will work: the data structure of the graph itself can't be made of immutable objects, because of the fact that the graph is a mutable object - you can change it. It can be made of immutable objects, but this means copying all the data every time the graph changes.
So let me get this straight: You've got a graph. You want to be able to change the graph, but you don't want your users to accidentally change the graph. Sounds to me like an API problem, not a freeze()/mutable problem. Want an API?

    class graph:
        ...
        def IterOutgoing(self, node):
            ...
        def IterIncoming(self, node):
            ...
        def IsAdjacent(self, node1, node2):
            ...
        def IterNodes(self):
            ...
        def AddEdge(self, f_node, t_node):
            ...
        def RemEdge(self, node1, node2):
            ...
        def AddNode(self):
            ...

If you are reasonable in your implementation, all of the above operations can be fast, and you will never have to worry about your users accidentally mucking about with your internal data structures: because you aren't exposing them. If you are really paranoid, you can take the next step and implement it in Pyrex or C, so that only a malicious user can muck about with internal structures, at which point you stop supporting them.
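For concreteness, here is one way Josiah's outline might be filled in. The dict-of-sets storage and integer node handles are assumptions, not part of his message:

```python
class graph:
    def __init__(self):
        self._out = {}       # node -> set of successor nodes
        self._in = {}        # node -> set of predecessor nodes
        self._next_id = 0

    def AddNode(self):
        node = self._next_id
        self._next_id += 1
        self._out[node] = set()
        self._in[node] = set()
        return node

    def AddEdge(self, f_node, t_node):
        self._out[f_node].add(t_node)
        self._in[t_node].add(f_node)

    def RemEdge(self, node1, node2):
        self._out[node1].discard(node2)
        self._in[node2].discard(node1)

    def IsAdjacent(self, node1, node2):
        return node2 in self._out[node1]

    def IterOutgoing(self, node):
        # Hand out an iterator, never the set itself, so callers
        # cannot mutate the internal structures.
        return iter(self._out[node])

    def IterIncoming(self, node):
        return iter(self._in[node])

    def IterNodes(self):
        return iter(self._out)
```

Every operation here is O(1) or O(output size), and both forward and inverse mappings stay consistent because all edge changes go through AddEdge/RemEdge.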
Now, about copy-on-write:
There's also this little issue of "copy on write" semantics with Python. Anyone who tells you that "copy on write" is easy is probably hanging out with the same kind of people who say that "threading is easy". Of course both are easy if you limit your uses to some small subset of interesting interactions, but "copy on write" gets far harder when you start thinking of dictionaries, lists, StringIOs, arrays, and all the possible user-defined classes, which may be mutated beyond obj[key] = value and/or obj.attr = value (some have obj.method() which mutates the object). As such, offering a callback mechanism similar to weak references is probably pretty close to impossible with CPython.
Let's limit ourselves to copy-on-write of objects which do not contain nonfrozen objects. Perhaps it's enough - the table, the graph, and strings are perfect examples of these. Implementation doesn't seem too complicated to me - whenever the object is about to change and there is a connected frozen copy, you make a shallow copy of the object, point the frozen copy to it, release the reference to the frozen copy, and continue as usual. That's all.
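As an illustration of that scheme for such "leaf" containers, here is a minimal Python-level sketch of a set with a detachable frozen copy (my own; the real proposal would live in the interpreter, and the class and method names are assumptions). The first freeze pays for one copy; while the set stays unmodified, further freezes are O(1), and a mutation simply detaches the cached frozen copy.

```python
class CowSet:
    """Sketch: a mutable set that caches its frozen copy until the
    first mutation after freezing (at most two instances share data)."""

    def __init__(self, items=()):
        self._data = set(items)
        self._frozen = None      # cached frozenset, if still valid

    def freeze(self):
        # O(n) only on the first call after a change; O(1) otherwise
        if self._frozen is None:
            self._frozen = frozenset(self._data)
        return self._frozen

    def add(self, item):
        # first mutation after a freeze: detach the frozen copy
        self._frozen = None
        self._data.add(item)

    def __contains__(self, item):
        return item in self._data
```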
What you have written here is fairly unintelligible, but thankfully you clarify yourself...pity it still doesn't work, I explain below. [snip]
The problem is that we have to keep the created frozen object as long as the original object stays alive. So we have to know if it has changed. This is where those callbacks get in. As long as what is done with them is correct, there should be no problems. They are used only to disengage the frozen copies from their original objects. The action they should trigger is simply that:
    def on_contained_object_change(self):
        self._frozen_copy = None
        while self._callbacks:
            self._callbacks.pop()()
What's also interesting is that this freezing mechanism can be provided automatically for user-created classes, since those are simply containers of other objects which, for this purpose, behave exactly like dicts.
It allows everything in Python to be both mutable and hashable, without changing the O() complexity! Wow!
Ok, I'm going to sleep now. If you find something wrong with this idea, please tell me.
Here is where you are wrong.

    x = []
    for i in xrange(k):
        x.append(range(k))

We now have a simple list of lists: k lists, each of length k. Let's freeze it.

    y = frozen(x)

Ok, now we have a recursively frozen list of lists y, implemented however you want, and you've amortized this ONE call to creation time. We'll give y to someone who does whatever he wants with it, and we'll continue on.

    z = frozen(x)

Your claim is that, due to the cache, the above operation can be done in constant time after you have already frozen x. This is wrong, but we'll get to that. Let us mutate one of the contained lists, and see if this can continue...

    x[0][0] = 7

Oh hrm. This invalidates x[0]'s cached frozen object, which would suggest that x's cached frozen object was also invalidated, even though Python objects tend to know nothing about the objects which point to them. Well, that's a rub. In order to /validate/ that an object's cached frozen object is still valid, you must verify that the contents of your cached frozen object point to the cached frozen objects of your contents. Or in code (for this two-level example)...

    def frozen(x):
        if x.frozen_cache:
            for i, j in zip(x.contents, x.frozen_cache):
                if i.frozen_cache is not j:
                    x.frozen_cache = None
                    x.frozen_cache = frozen(x)
                    return x.frozen_cache
            return x.frozen_cache
        x.frozen_cache = tuple(map(frozen, x.contents))
        return x.frozen_cache

Ouch: for the list-of-lists example with a total size of O(k**2), you need to spend O(k) time. We'll say that n == k**2, so really, for this particular object of size O(n), you need to spend O(sqrt(n)) time verifying. Not quite constant. But wait... in order to verify that every cached frozen object is valid, we should have been checking the contents of x[i] to verify that they were frozen too! Wow, that would take us O(n) time just to verify. And we would need to do that EVERY TIME WE CALLED frozen(x), REGARDLESS OF WHETHER x WAS MUTATED! Wait a second... isn't that just the same as recursively calling freeze? Yes.
Are we actually saving any time? No. What has this idea resulted in? The incorrect belief that caching frozen objects will reduce the computation necessary in freeze(x), and a proposed new attribute on literally every object which points to an immutable copy of itself, generally doubling the amount of memory used. Like I said before, it's not so easy. In order to actually implement the system, every object in an object hierarchy would need to know about its parent, but such cannot work because...

    a = range(10)
    b = [a]
    c = [a]
    bf = frozen(b)
    cf = frozen(c)
    a[0] = 10   # oops!

That last line is the killer. Every object would necessarily need to know about all containers which point to it, and would necessarily need to notify them all that their contents had changed. I personally don't see Python doing that any time soon. Hope you sleep/slept well, - Josiah
Josiah Carlson wrote: [...]
Perhaps I didn't make it clear. The difference between wxPython's Grid and my table is that in the table, most values are *computed*. This means that there's no point in changing the values themselves. They are also used frequently as set members (I can describe why, but it's a bit complicated.)
Again, user semantics. Tell your users not to modify entries, and/or you can make copies of objects you return. If your users are too daft to read and/or follow directions, then they deserve to have their software not work.
That sounds like a "get out of jail free" card for Microsoft and many other software vendors ... regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC www.holdenweb.com PyCon TX 2006 www.python.org/pycon/
Steve Holden
Josiah Carlson wrote: [...]
Perhaps I didn't make it clear. The difference between wxPython's Grid and my table is that in the table, most values are *computed*. This means that there's no point in changing the values themselves. They are also used frequently as set members (I can describe why, but it's a bit complicated.)
Again, user semantics. Tell your users not to modify entries, and/or you can make copies of objects you return. If your users are too daft to read and/or follow directions, then they deserve to have their software not work.
That sounds like a "get out of jail free" card for Microsoft and many other software vendors ...
If/when vendors are COMPLETE in their specifications and documentation, they can have that card, but being that even when they specify such behaviors they are woefully incomplete, there is not a card to be found, and I stand by my opinion. - Josiah
Hello, I have slept quite well, and talked about it with a few people, and I still think I'm right. About the users-changing-my-internal-data issue:
Again, user semantics. Tell your users not to modify entries, and/or you can make copies of objects you return. If your users are too daft to read and/or follow directions, then they deserve to have their software not work. ... When someone complains that something doesn't work, I tell them to read the documentation. If your users haven't been told to RTFM often enough to actually make it happen, then you need a RTFM-bat. Want to know how you make one? You start wrapping the objects you return with something which segfaults the process if they change things. When they start asking, tell them it is documented quite clearly "do not modify objects returned, or else". Then there's the other option, which I provide below.
I disagree. I read the manual when I don't know what something does. If I can guess what it does (and this is where conventions are good), I don't read the manual. And let's say I ought to read the complete manual for every method that I use, and that I deserve a death punishment (or a segmentation fault) if I don't. But let's say that I wrote some software without reading the manual, and it worked. I have gone on to work on other things, and suddenly a bug arises. When the poor guy who needs to fix it goes over the code, everything looks absolutely correct. Should he also read the complete manuals of every library that I used, in order to fix that bug? And remember that in this case, the object could have traveled between several places (including functions in other libraries) before it was changed, and the original data structure starts behaving weirdly.

You suggest two ways for solving the problem. The first is by copying my mutable objects to immutable copies:
Also from the sounds of it, you are storing both source and destination values in the same table...hrm, that sounds quite a bit like a spreadsheet. How does every spreadsheet handle that again? Oh yeah, they only ever store immutables (generally strings which are interpreted). But I suppose since you are (of course) storing mutable objects, you need to work a bit harder...so store mutables, and return immutable copies (which you can cache if you want, and invalidate when your application updates the results...like a wx.Grid update on changed).
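A rough sketch of that store-mutables/return-cached-immutables pattern (the class and method names here are illustrative, not from the thread):

```python
class Table:
    """Sketch: rows are stored as mutable lists internally; users only
    ever receive cached tuple copies, invalidated when a row changes."""

    def __init__(self):
        self._rows = {}
        self._cache = {}

    def set_row(self, key, values):
        # the application updates rows; the stale copy is invalidated,
        # like a wx.Grid refresh on changed data
        self._rows[key] = list(values)
        self._cache.pop(key, None)

    def get_row(self, key):
        # users always get an immutable tuple they cannot mutate
        if key not in self._cache:
            self._cache[key] = tuple(self._rows[key])
        return self._cache[key]
```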
This is basically ok. It's just that in my solution, for many objects it's not necessary to make a complete copy just to prevent changing the value: Making frozen copies of objects which can't reference nonfrozen objects (sets, for example), costs O(1), thanks to the copy-on-write. Now, about the graph:
So let me get this straight: You've got a graph. You want to be able to change the graph, but you don't want your users to accidentally change the graph. Sounds to me like an API problem, not a freeze()/mutable problem. Want an API?
class graph: ... def IterOutgoing(self, node): ... def IterIncoming(self, node): ... def IsAdjacent(self, node1, node2): ... def IterNodes(self): ... def AddEdge(self, f_node, t_node): ... def RemEdge(self, node1, node2): ... def AddNode(self): ...
If you are reasonable in your implementation, all of the above operations can be fast, and you will never have to worry about your users accidentally mucking about with your internal data structures: because you aren't exposing them. If you are really paranoid, you can take the next step and implement it in Pyrex or C, so that only a malicous user can muck about with internal structures, at which point you stop supporting them.
This will work. It's simply... well, not very beautiful. I have to invent a lot of names, and my users need to remember them all. If I give them a frozen set, with all the vertices that a vertex points to (which is an absolutely reasonable API), they will know how to deal with it without learning a lot of method names, thanks to the fact that they are already familiar with sets, and that a lot of thought has gone into the set interface.
Now, about copy-on-write: ...
What you have written here is fairly unintelligible, but thankfully you clarify yourself...pity it still doesn't work, I explain below.
I'm sorry if I am sometimes hard to understand. English is not my mother tongue, and it degrades as the hour gets later - and sometimes, things are hard to explain. If I don't explain myself, please say so and I'll try again. This is an excellent example - I wrote about callbacks, and went to sleep. Let me try to explain again how it *does* work. The frozen() function, and the __frozen__ protocol, would get another optional argument - an object to be notified when the *nonfrozen* object has changed. It may be called at most once - only on the first change to the object, and only if the object which requested to be notified is still alive. I now introduce a second protocol, which I will call __changed__. Objects would be notified by calling their __changed__ method. You say that every frozen() call takes O(n), because it needs to verify that objects hadn't changed since the last call:
Oh hrm. This invalidates x[0]'s cached frozen object, which would suggest that x's cached frozen object was also invalidated, even though Python objects tend to know nothing about the objects which point to them. Well, that's a rub. In order to /validate/ that an object's cached object is valid, you must validate that the contents of your cached frozen object points to the cached frozen objects of your contents. Or in code (for this two-level example)...
    def frozen(x):
        if x.frozen_cache:
            for i, j in zip(x.contents, x.frozen_cache):
                if i.frozen_cache is not j:
                    x.frozen_cache = None
                    x.frozen_cache = frozen(x)
                    return x.frozen_cache
            return x.frozen_cache
        x.frozen_cache = tuple(map(frozen, x.contents))
        return x.frozen_cache
This is not so. When a list creates its frozen copy, it gives itself to all those frozen() calls. This means that it will be notified if one of its members is changed. In that case, it has to do two simple actions:

1. release the reference to its frozen copy, so that subsequent freezes of the list would create a new frozen copy, and
2. notify about the change any object which froze the list and requested notification.

This frees us of any validation code. If we haven't been notified about a change, there was no change, and the frozen copy is valid. In case you ask, the cost of notification is O(1), amortized. The reason is that every frozen() call can cause at most one callback in the future.
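A toy version of that scheme, sketched as a Python-level list subclass (the real proposal would be implemented in the interpreter, and only append is wired up here; names are mine):

```python
class NList(list):
    """Sketch: a list that caches its frozen copy and, on the first
    change, drops the cache and fires each registered callback once."""

    def __init__(self, *args):
        list.__init__(self, *args)
        self._frozen = None
        self._callbacks = []

    def freeze(self, callback=None):
        if self._frozen is None:
            self._frozen = tuple(self)
        if callback is not None:
            self._callbacks.append(callback)
        return self._frozen

    def _changed(self):
        # the two actions from the message: drop the frozen copy,
        # then notify; each callback fires at most once
        self._frozen = None
        while self._callbacks:
            self._callbacks.pop()()

    def append(self, item):
        self._changed()
        list.append(self, item)
```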
Like I said before, it's not so easy. In order to actually implement the system, every object in an object hierarchy would need to know about its parent, but such cannot work because...
    a = range(10)
    b = [a]
    c = [a]
    bf = frozen(b)
    cf = frozen(c)
    a[0] = 10   # oops!
That last line is the killer. Every object would necessarily need to know about all containers which point to it, and would necessarily need to notify them all that their contents had changed. I personally don't see Python doing that any time soon.
This is not the case. Every object has to know only about the objects which created frozen copies of it and requested notifications. Actually, the object itself doesn't have to store anything. I thought about it, and you can create a module for handling those change-callbacks. It would store only weak references to objects. It would have two functions:

    def register_reference(x, y):
        '''Register the fact that if object x changes, it means
        that object y changes too.'''

    def changed(x):
        '''Notify all objects that change with x that they are changed.'''

When an object is changed, this module would call the __changed__ method of all the objects that have a reference to it, and haven't changed since the connection was created. I hope I have clarified my idea. Tell me if you still think I'm wrong.
Hope you sleep/slept well, - Josiah
Thanks! indeed, a good sleep is something very important. Sleep well too (when the time comes, of course), Noam
Noam Raphael
Hello,
I have slept quite well, and talked about it with a few people, and I still think I'm right.
And I'm going to point out why you are wrong.
About the users-changing-my-internal-data issue:
Again, user semantics. Tell your users not to modify entries, and/or you can make copies of objects you return. If your users are too daft to read and/or follow directions, then they deserve to have their software not work. ... When someone complains that something doesn't work, I tell them to read the documentation. If your users haven't been told to RTFM often enough to actually make it happen, then you need a RTFM-bat. Want to know how you make one? You start wrapping the objects you return with something which segfaults the process if they change things. When they start asking, tell them it is documented quite clearly "do not modify objects returned, or else". Then there's the other option, which I provide below.
I disagree. I read the manual when I don't know what something does. If I can guess what it does (and this is where conventions are good), I don't read the manual. And let's say I ought to read the complete manual for every method that I use, and that I deserve a death punishment (or a segmentation fault) if I don't. But let's say that I wrote some software without reading the manual, and it worked. I have gone on to work on other things, and suddenly a bug arises. When the poor guy who needs to fix it goes over the code, everything looks absolutely correct. Should he also read the complete manuals of every library that I used, in order to fix that bug? And remember that in this case, the object could have traveled between several places (including functions in other libraries) before it was changed, and the original data structure starts behaving weirdly.
You can have a printout before it dies: "I'm crashing your program because something attempted to modify a data structure (here's the traceback), and you were told not to." Then again, you can even raise an exception when people try to change the object, as imdict does, as tuples do, etc.
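For illustration, a minimal read-only wrapper along those lines (a sketch, not the imdict recipe itself; the list of blocked method names is an ad-hoc assumption, and a real version would be per-type):

```python
class FrozenView(object):
    """Sketch of a read-only proxy in the spirit of tuples/imdict:
    reads are forwarded to the wrapped object, mutations raise."""

    _blocked = frozenset(['append', 'extend', 'insert', 'add', 'pop',
                          'remove', 'discard', 'clear', 'update',
                          'sort', 'reverse', 'setdefault', 'popitem'])

    def __init__(self, obj):
        object.__setattr__(self, '_obj', obj)

    def __getattr__(self, name):
        if name in FrozenView._blocked:
            raise TypeError('read-only view: cannot call %s()' % name)
        return getattr(object.__getattribute__(self, '_obj'), name)

    def __getitem__(self, key):
        return object.__getattribute__(self, '_obj')[key]

    def __setitem__(self, key, value):
        raise TypeError('read-only view: item assignment not allowed')

    def __delitem__(self, key):
        raise TypeError('read-only view: item deletion not allowed')

    def __setattr__(self, name, value):
        raise TypeError('read-only view: attribute assignment not allowed')

    def __len__(self):
        return len(object.__getattribute__(self, '_obj'))
```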
You suggest two ways for solving the problem. The first is by copying my mutable objects to immutable copies:
And by caching those results, then invalidating them when they are updated by your application. This is the same as what you would like to do, except that I do not rely on copy-on-write semantics, which aren't any faster than freeze+cache by your application. [snip graph API example]
This will work. It's simply... well, not very beautiful. I have to invent a lot of names, and my users need to remember them all. If I give them a frozen set, with all the vertices that a vertex points to (which is an absolutely reasonable API), they will know how to deal with it without learning a lot of method names, thanks to the fact that they are already familiar with sets, and that a lot of thought has gone into the set interface.
I never claimed it was beautiful, I claimed it would work. And it does. There are 7 methods, which you can reduce if you play the special method game:

    RemEdge -> __delitem__((node, node))
    RemNode -> __delitem__(node)   # forgot this method before
    IterNodes -> __iter__()
    IterOutgoing, IterIncoming -> IterAdjacent(node)

In any case, whether you choose to use freeze or a different API, this particular problem is solvable without copy-on-write semantics.
Now, about copy-on-write: ...
What you have written here is fairly unintelligible, but thankfully you clarify yourself...pity it still doesn't work, I explain below.
I'm sorry if I am sometimes hard to understand. English is not my mother tongue, and it degrades as the hour gets later - and sometimes, things are hard to explain. If I don't explain myself, please say so and I'll try again. This is an excellent example - I wrote about callbacks, and went to sleep. Let me try to explain again how it *does* work.
Thank you for the clarification (btw, your english is far better than any of the foreign languages I've been "taught" over the years).
This is not so. When a list creates its frozen copy, it gives itself to all those frozen() calls. This means that it will be notified if one of its members is changed. In that case, it has to do two simple actions:

1. release the reference to its frozen copy, so that subsequent freezes of the list would create a new frozen copy, and
2. notify about the change any object which froze the list and requested notification.
This frees us of any validation code. If we haven't been notified about a change, there was no change, and the frozen copy is valid.
In case you ask, the cost of notification is O(1), amortized. The reason is that every frozen() call can cause at most one callback in the future.
Even without validation, there are examples that force a high number of calls, which are not O(1), amortized or otherwise.
Like I said before, it's not so easy. In order to actually implement the system, every object in an object hierarchy would need to know about its parent, but such cannot work because...
    a = range(10)
    b = [a]
    c = [a]
    bf = frozen(b)
    cf = frozen(c)
    a[0] = 10   # oops!
That last line is the killer. Every object would necessarily need to know about all containers which point to it, and would necessarily need to notify them all that their contents had changed. I personally don't see Python doing that any time soon.
This is not the case. Every object has to know only about the objects which created frozen copies of it and requested notifications. Actually, the object itself doesn't have to store anything. I thought about it, and you can create a module for handling those change-callbacks. It would store only weak references to objects. It would have two functions:
    def register_reference(x, y):
        '''Register the fact that if object x changes, it means
        that object y changes too.'''

    def changed(x):
        '''Notify all objects that change with x that they are changed.'''
When an object is changed, this module would call the __changed__ method of all the objects that have a reference to it, and haven't changed since the connection was created.
Callbacks work, in that you can notify parents, but they still don't allow O(1) amortization.

    a = [[] for i in xrange(k)]
    b = [list(a) for i in xrange(k)]
    del a
    c = freeze(b)
    b[0][0].append(1)

That append call requires that b and every list in b be notified. That takes O(k) time, because you have to notify the k lists which point to b[0][0]. Let's freeze it again! Oh wait, that takes O(k**2) time for the second freeze, because you have to recreate the tuple version of b, as well as the tuple version of everything in b. Cycling through modifications and freezing ends up taking as much time as if you were to re-create the entire frozen version from scratch on every freeze.

Now, the actual time analysis on repeated freezings and such gets ugly. There are actually O(k) objects, which take up O(k**2) space. When you modify object b[i][j] (which has just been frozen), you get O(k) callbacks, and when you call freeze(b), it actually results in O(k**2) time to re-copy the O(k**2) pointers to the O(k) objects. It should be obvious that this IS NOT AMORTIZABLE to original object creation time.
I hope I have clarified my idea. Tell me if you still think I'm wrong.
You have clarified it, but it is still wrong. I stand by 'it is not easy to get right', and would further claim, "I doubt it is possible to make it fast." Good day, - Josiah
On 10/31/05, Josiah Carlson
And I'm going to point out why you are wrong.
I still don't think so. I think that I need to reclarify what I mean.
About the users-changing-my-internal-data issue: ... You can have a printout before it dies: "I'm crashing your program because something attempted to modify a data structure (here's the traceback), and you were told not to."
Then again, you can even raise an exception when people try to change the object, as imdict does, as tuples do, etc.
Both solutions would solve the problem, but would require me to wrap the built-in set with something which doesn't allow changes. This is a lot of work - but it's quite similar to what my solution would actually do, in a single built-in function.
You suggest two ways for solving the problem. The first is by copying my mutable objects to immutable copies:
And by caching those results, then invalidating them when they are updated by your application. This is the same as what you would like to do, except that I do not rely on copy-on-write semantics, which aren't any faster than freeze+cache by your application.
This isn't correct - freezing a set won't require a single copy to be performed, as long as the frozen copy isn't saved after the original is changed. Copy+cache always requires one copy. ...
I never claimed it was beautiful, I claimed it would work. And it does. There are 7 methods, which you can reduce if you play the special method game:
    RemEdge -> __delitem__((node, node))
    RemNode -> __delitem__(node)   # forgot this method before
    IterNodes -> __iter__()
    IterOutgoing, IterIncoming -> IterAdjacent(node)
I just wanted to say that this game is of course a lot of fun, but it doesn't simplify the interface.
In any case, whether you choose to use freeze, or use a different API, this particular problem is solvable without copy-on-write semantics.
Right. But I think that a significant simplification of the API is a nice bonus for my solution. And about those copy-on-write semantics - it should be proven how complex they are. Remember that we are talking about frozen-copy-on-write, which I think would simplify matters considerably - for example, there are at most two instances sharing the same data, since the frozen copy can be returned again and again.
Now, about copy-on-write: ...

Thank you for the clarification (btw, your English is far better than any of the foreign languages I've been "taught" over the years).

Thanks! It seems that if you are forced to use a language from time to time, it improves a little...
...
Even without validation, there are examples that force a high number of calls, which are not O(1), amortized or otherwise.
[Snap - a very interesting example]
Now, the actual time analysis on repeated freezings and such gets ugly. There are actually O(k) objects, which take up O(k**2) space. When you modify object b[i][j] (which has just been frozen), you get O(k) callbacks, and when you call freeze(b), it actually results in O(k**2) time to re-copy the O(k**2) pointers to the O(k) objects. It should be obvious that this IS NOT AMORTIZABLE to original object creation time.
That's absolutely right. My amortized analysis is correct only if you limit yourself to cases in which the original object doesn't change after a frozen() call was made. In that case, it's ok to count the O(k**2) copy with the O(k**2) object creation, because it's made only once.

Why is it ok to analyze only that limited case? I am suggesting a change in Python: that every object you would like be mutable, and would support the frozen() protocol. When you evaluate my suggestion, you need to take a program and measure its performance in the current Python and in a Python which implements my suggestion. This means that the program should work also on the current Python. In that case, my assumption is true: you won't change objects after you have frozen them, simply because these objects (strings which are used as dict keys, for example) can't be changed at all in the current Python implementation!

I will write it in another way: I am proposing a change that will make Python objects, including strings, mutable, and gives you other advantages as well. I claim that it won't make existing Python programs run slower in O() terms. It would allow you to do many things that you can't do today; some of them would be fast, like editing a string, and some of them would be less fast, for example, repeatedly changing an object and freezing it.

I think that the performance penalty may be rather small: remember that in programs which do not change strings, there would never be a need to copy the string data at all. And since I think that usually most of the dict lookups are for method or function names, there would almost never be a need to construct a new object on dict lookup, because you search for the same names again and again, and a new object is created only on the first frozen() call. You might even gain performance, because s += x would be faster. ...
You have clarified it, but it is still wrong. I stand by 'it is not easy to get right', and would further claim, "I doubt it is possible to make it fast."
It would not be very easy to implement, of course, but I hope that it won't be very hard either, since the basic idea is quite simple. Do you still doubt the possibility of making it fast, given my (correct) definition of fast? And if it's possible (which I think it is), it would allow us to get rid of inconvenient immutable objects, and it would let us put everything into a set. Isn't that nice?
Good day, - Josiah
The same to you, Noam
I thought about something -
I think that the performance penalty may be rather small - remember that in programs which do not change strings, there would never be a need to copy the string data at all. And since I think that usually most of the dict lookups are for method or function names, there would almost never be a need to construct a new object on dict lookup, because you search for the same names again and again, and a new object is created only on the first frozen() call. You might even gain performance, because s += x would be faster.
Name lookups can take virtually the same time they take now - method names can be saved from the beginning as frozen strings, so finding them in a dict will take just another bit test - is the object frozen - before doing exactly what is done now. Remember, the strings we are familiar with are simply frozen strings...
Noam Raphael
On 10/31/05, Josiah Carlson
wrote:
About the users-changing-my-internal-data issue: ... You can have a printout before it dies: "I'm crashing your program because something attempted to modify a data structure (here's the traceback), and you were told not to."
Then again, you can even raise an exception when people try to change the object, as imdict does, as tuples do, etc.
Both solutions would solve the problem, but would require me to wrap the built-in set with something which doesn't allow changes. This is a lot of work - but it's quite similar to what my solution would actually do, in a single built-in function.
I am an advocate for PEP 351. However, I am against your proposed implementation/variant of PEP 351 because I don't believe it adds enough to warrant the additional complication and overhead necessary for every object (even tuples would need to get a .frozen_cache member). Give me a recursive freeze from PEP 351 (which handles objects that are duplicated, but errors out on circular references), and I'll be happy.
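A sketch of such a recursive freeze (my own illustration, not from PEP 351 itself): shared objects are memoized so duplicates freeze to the same result, circular references raise, and the chosen frozen form of a dict (a tuple of key/value pairs) is an assumption.

```python
def freeze(obj, _memo=None, _active=None):
    """Recursively freeze obj; memoize duplicates, error on cycles."""
    if _memo is None:
        _memo, _active = {}, set()
    key = id(obj)
    if key in _active:
        raise TypeError('cannot freeze a circular reference')
    if key in _memo:
        return _memo[key]            # duplicated object: freeze once
    if isinstance(obj, (int, float, str, bytes, frozenset, type(None))):
        result = obj                 # already immutable
    elif isinstance(obj, (list, tuple)):
        _active.add(key)
        result = tuple(freeze(v, _memo, _active) for v in obj)
        _active.discard(key)
    elif isinstance(obj, set):
        _active.add(key)
        result = frozenset(freeze(v, _memo, _active) for v in obj)
        _active.discard(key)
    elif isinstance(obj, dict):
        _active.add(key)
        result = tuple((freeze(k, _memo, _active),
                        freeze(v, _memo, _active))
                       for k, v in obj.items())
        _active.discard(key)
    else:
        raise TypeError('cannot freeze %r' % type(obj).__name__)
    _memo[key] = result
    return result
```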
You suggest two ways for solving the problem. The first is by copying my mutable objects to immutable copies:
And by caching those results, then invalidating them when they are updated by your application. This is the same as what you would like to do, except that I do not rely on copy-on-write semantics, which aren't any faster than freeze+cache by your application.
This isn't correct - freezing a set won't require a single copy to be performed, as long as the frozen copy isn't saved after the original is changed. Copy+cache always requires one copy.
You are wrong, and you even say you are wrong..."freezing a set doesn't require a COPY, IF the frozen COPY isn't saved after the original is CHANGED". Creating an immutable set IS CREATING A COPY, so it ALSO copies, and you admit as much, but then say the equivalent of "copying isn't copying because I say so".
In any case, whether you choose to use freeze, or use a different API, this particular problem is solvable without copy-on-write semantics.
Right. But I think that a significant simplification of the API is a nice bonus for my solution. And about those copy-on-write semantics - it should be proven how complex they are. Remember that we are talking about frozen-copy-on-write, which I think would simplify matters considerably - for example, there are at most two instances sharing the same data, since the frozen copy can be returned again and again.
I think that adding an additional attribute to literally every single object to handle the caching of 'frozen' objects, as well as a list on every object to handle callbacks which should be called on object mutation, along with a _call_stuff_when_mutated() method that handles these callback calls, IN ADDITION TO the __freeze__ method which is necessary to support this, is a little much, AND IS CERTAINLY NOT A SIMPLIFICATION!

Let us pause for a second and consider: the original PEP proposed 1 new method, __freeze__, which could be implemented as a subclass of the original object (now), and integrated into the original classes as time goes on. One could /register/ __freeze__ functions/methods a la Pickle, at which point objects wouldn't even need a native freeze method. Your suggestion offers 2 new methods along with 2 new instance variables: a callback handler, __freeze__, the cache, and the callback list. Doesn't that seem a little excessive to you to support freezing? It does to me.

If Guido were to offer your implementation of freeze, or no freeze at all, I would opt for no freeze, as implementing your freeze on user-defined classes would be a pain in the ass, not to mention that implementing them in C code would be more than I would care to do, and more than I would ask any of the core developers to work on.
Even without validation, there are examples that force a high number of calls, which are not O(1), amortized or otherwise.
[Snap - a very interesting example]
Now, the actual time analysis on repeated freezings and such gets ugly. There are actually O(k) objects, which take up O(k**2) space. When you modify object b[i][j] (which has just been frozen), you get O(k) callbacks, and when you call freeze(b), it actually results in O(k**2) time to re-copy the O(k**2) pointers to the O(k) objects. It should be obvious that this IS NOT AMORTIZABLE to original object creation time.
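The cost claim can be made concrete with a toy recursive freeze that counts pointer copies. This `freeze` is an illustrative stand-in for the PEP's function, instrumented only to count work:

```python
# Count how many element pointers a naive recursive freeze copies.
def freeze(obj, counter):
    if isinstance(obj, list):
        counter[0] += len(obj)  # O(len) pointer copies at this level
        return tuple(freeze(x, counter) for x in obj)
    return obj

k = 10
b = [[0] * k for _ in range(k)]   # O(k) objects, O(k**2) pointers

ops = [0]
freeze(b, ops)
first = ops[0]                    # k outer + k*k inner pointer copies

b[0][0] = 1                       # one tiny mutation ...
ops = [0]
freeze(b, ops)
assert ops[0] == first            # ... still forces a full O(k**2) re-copy
```

Every mutate-then-freeze cycle repeats the full copy, which is why the cost cannot be amortized against object creation.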
That's absolutely right. My amortized analysis is correct only if you limit yourself to cases in which the original object doesn't change after a frozen() call was made. In that case, it's ok to count the O(k**2) copy with the O(k**2) object creation, because it's made only once.
But here's the crucial observation which you are missing. You yourself have stated that in both your table and graph examples you want your application to continue to modify values while the user can't manipulate them. So even in your own use-cases, you are going to be modifying objects after they have been frozen, and even then it won't be fast!

I believe that in general, people who are freezing things are going to want to be changing the original objects - hence the use of mutables to begin with - maybe for situations like yours where you don't want users mutating returns, whatever. If after they have frozen the object, they don't want to be changing the original objects, then they are probably going to be tossing out the original mutable and using the immutable created with freeze anyways (mutate your object until you get it right, then freeze it and use that so that no one can alter your data, not even yourself), so I think that caching is something that the /user/ should be doing, NOT Python.

The simple implementation (not copy-on-write) leads us to a simple matter of documenting, "Freeze is 'stateless'; every call to freeze returns a new object, regardless of modifications (or lack thereof) between freeze calls."

Remember: "Simple is better than complex."
Why is it ok to analyze only that limited case? I am suggesting a change to Python: that every object you would like to be mutable would be mutable, and would support the frozen() protocol. When you evaluate my suggestion, you need to take a program and measure its performance on the current Python and on a Python which implements my suggestion. This means that the program should also work on the current Python. In that case, my assumption is true - you won't change objects after you have frozen them, simply because these objects (strings which are used as dict keys, for example) can't be changed at all in the current Python implementation!
Not everything can/should become mutable. Integers should never become mutable, as tuples should never become mutable, as strings/unicode should never become mutable...wait, aren't we getting to the point that everything which is currently immutable shouldn't become mutable? Indeed. I don't believe that any currently immutable object should be able to become mutable in order to satisfy /anyone's/ desire for mutable /anything/.

In starting to bring up benchmarks you are falling into the trap of needing to /have/ a benchmark (I have too), for which there are very few, if any, current use-cases. Without having or needing a benchmark, I'll state quite clearly where your proposed copy-on-write would beat out the naive 'create a new copy on every call to freeze':
1. If objects after they are frozen are never modified, copy on write will be faster.
2. If original objects are modified after they are frozen, then the naive implementation will be as fast if not faster in general, due to far lower overhead, but may be slower in corner cases where some nested structure is unchanged, and some shallow bit has changed:

    x = [[], NEVER_CHANGED_MUTABLE_NESTED_STRUCTURE]
    y = freeze(x)
    x[0].append(1)
    z = freeze(x)

Further, discussing benchmarks on use-cases, for which there are few (if any) previously existing uses, is like saying "let's race cars" back in 1850; it's a bit premature.

Then there is this other example:

    x = [1,2,3]
    y = freeze(x)

The flat version of freeze in the PEP right now handles this case. I can change x all I want, yet I have a frozen y which stays unchanged. This is what I would want, and I would imagine it is what others would want too. In fact, this is precisely the use-case you offered for your table and graph examples, so your expression of a sentiment of "users aren't going to be changing the object after it has been frozen" is, by definition, wrong: you do it yourself!
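The flat behaviour being described can be tried today, with tuple() standing in for the PEP's freeze() (an illustrative substitution, since flat freezing of a list of immutables is exactly what tuple() does):

```python
# tuple() plays the role of a flat freeze(): a snapshot of the list.
x = [1, 2, 3]
y = tuple(x)              # frozen snapshot
x.append(4)               # mutate the original all we want ...
assert y == (1, 2, 3)     # ... the snapshot is unaffected
assert x == [1, 2, 3, 4]  # while the original carries the change
```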
I will write it in another way: I am proposing a change that will make Python objects, including strings, mutable, and give you other advantages as well. I claim that it won't make existing Python programs run slower in O() terms. It would allow you to do many things that you can't do today; some of them would be fast, like editing a string, and some of them would be less fast - for example, repeatedly changing an object and freezing it.
Your claim on running time only works if the original isn't changed after it is frozen.

And I don't like making everything mutable; it's a "solution looking for a problem", or a "tail wagging the dog" idea. There is no good reason to make everything mutable, and I challenge you to come up with a valid one that isn't already covered by the existing standard library or extension modules.

There is no need to bring strings into this conversation, as there are string variants which are already mutable: array.array('c', ...), StringIO, mmap, take your pick! And some future Python (perhaps 2.5) will support a 'bytes' object, which is essentially an mmap which doesn't need to be backed by a file.
I think that the performance penalty may be rather small - remember that in programs which do not change strings, there would never be a need to copy the string data at all. And since I think that usually most of the dict lookups are for method or function names, there would almost never be a need to construct a new object on dict lookup, because you search for the same names again and again, and a new object is created only on the first frozen() call. You might even gain performance, because s += x would be faster.
You really don't know how Python internals work.

The slow part of s += x on strings in Python 2.4 is the memory reallocation and occasional data copy (it has been tuned like crazy by Raymond in 2.4, see _PyString_Resize in stringobject.c). Unless you severely over-allocated your strings, this WOULD NOT BE SPED UP BY MUTABLE STRINGS.

Further, identifiers/names (obj, obj.attr, obj.attr1.attr2, ...) are already created during compile-time, and are 'interned'. That is, if you have an object that you call 'foo', there gets to be a single "foo" string, which is referenced by pointer by any code in that module which references the 'foo' object to the single, always unchanging "foo" string. And because the string has already been hashed, it has a cached hash value, and lookups in dictionaries are already fast due to a check for pointer equivalency before comparing contents. Mutable strings CANNOT be faster than this method.
You have clarified it, but it is still wrong. I stand by 'it is not easy to get right', and would further claim, "I doubt it is possible to make it fast."
It would not be very easy to implement, of course, but I hope that it won't be very hard either, since the basic idea is quite simple. Do you still doubt the possibility of making it fast, given my (correct) definition of fast?
I would claim that your definition is limited. Yours would be fast if objects never changed after they are frozen, which is counter to your own use-cases. This suggests that your definition is in fact incorrect, and you fail to see your own inconsistency.
And if it's possible (which I think it is), it would allow us to get rid of inconvenient immutable objects, and it would let us put everything into a set. Isn't that nice?
No, it sounds like a solution looking for a problem. I see no need to make strings, floats, ints, tuples, etc. mutable, and I think that you will have very little luck in getting core Python developer support for any attempt to make them mutable. If you make such a suggestion, I would offer that you create a new PEP, because this discussion has gone beyond PEP 351, and has wandered into the realm of "What other kinds of objects would be interesting to have in a Python-like system?"

I'll summarize your claims:
1. copy-on-write is a simplification
2. everything being mutable would add to Python
3. copy-on-write is fast
4. people won't be mutating objects after they are frozen

I'll counter your claims:
1. 2 methods and 2 instance variables on ALL OBJECTS is not a simplification.
2. a = b = 1; a += 1; If all objects were to become mutable, then a == b, despite what Python and every other sane language would tell you, and dct[a] would stop working (you would have to use c = freeze(a); dct[c], or dct[x] would need to automatically call freeze and only ever reference the result, significantly slowing down ALL dictionary references).
3. only if you NEVER MUTATE an object after it has been frozen
4. /you/ mutate original objects after they are frozen

ALSO:
5. You fail to realize that if all objects were to become mutable, then one COULDN'T implement frozen, because the frozen objects THEMSELVES would be mutable.

I'm going to bow out of this discussion for a few reasons, not the least of which being that I've spent too much time on this subject, and that I think it is quite clear that your proposal is dead, whether I had anything to do with it or not.

- Josiah
On 11/1/05, Josiah Carlson
I am an advocate for PEP 351. However, I am against your proposed implementation/variant of PEP 351 because I don't believe it adds enough to warrant the additional complication and overhead necessary for every object (even tuples would need to get a .frozen_cache member).
Give me a recursive freeze from PEP 351 (which handles objects that are duplicated, but errors out on circular references), and I'll be happy.
That's fine - but it doesn't mean that I must be happy with it.
...
This isn't correct - freezing a set won't require a single copy to be performed, as long as the frozen copy isn't saved after the original is changed. Copy+cache always requires one copy.
You are wrong, and you even say you are wrong..."freezing a set doesn't require a COPY, IF the frozen COPY isn't saved after the original is CHANGED". Creating an immutable set IS CREATING A COPY, so it ALSO copies, and you admit as much, but then say the equivalent of "copying isn't copying because I say so".
No, I am not wrong. I am just using misleading terms. I will call a "frozen copy" a "frozen image". Here it goes: "freezing a set doesn't require a COPY, IF the frozen IMAGE isn't saved after the original is CHANGED". I suggest that there would be a way to create a frozenset without COPYING an O(n) amount of MEMORY. When a frozen set is created by a call frozen(x), it would not copy all the data, but would rather reference the existing data, which was created by the non-frozen set. Only if the original set changes, when there's a frozen set referencing the data, the MEMORY would be actually copied. I call it a "frozen copy" because it behaves as a frozen copy, even though not all the memory is being copied. When you call the COPY function in the COPY module with a string, it doesn't really copy memory - the same string is returned. When you copy a file inside subversion, it doesn't actually copy all the data associated with it, but does something smarter, which takes O(1). The point is, for the user, it's a copy. Whether or not memory is actually being copied, is an implementation detail.
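The frozen-image idea might be sketched like this in pure Python. Note that the frozenset() call below still copies in today's CPython; the point of the proposal is that a C-level implementation could share the underlying hash table instead. COWSet is a hypothetical name for illustration:

```python
# Sketch of copy-on-write freezing: the frozen image is cached and
# shared until the next mutation invalidates it.
class COWSet:
    def __init__(self, items=()):
        self._data = set(items)
        self._frozen = None          # cached frozen image, if any

    def frozen(self):
        if self._frozen is None:
            # In this sketch frozenset() copies; the proposal is that a
            # C implementation would share storage here, making it O(1).
            self._frozen = frozenset(self._data)
        return self._frozen

    def add(self, item):
        if self._frozen is not None:
            self._frozen = None      # detach the old image before mutating
        self._data.add(item)

s = COWSet([1, 2])
f1 = s.frozen()
assert s.frozen() is f1      # repeated calls return the same image
s.add(3)
assert 3 not in f1           # the old image is untouched
assert 3 in s.frozen()       # a new image reflects the change
```

If the image is discarded before the next mutation, no per-mutation work is ever needed, which is the case Noam's analysis relies on.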
...
I think that adding an additional attribute to literally every single object to handle the caching of 'frozen' objects, as well as a list to every object to handle callbacks which should be called on object mutation, along with a _call_stuff_when_mutated() method that handles these callback calls, IN ADDITION TO the __freeze__ method which is necessary to support this, is a little much, AND IS CERTAINLY NOT A SIMPLIFICATION!
I don't agree. You don't need to add a list to every object, since you can store all those relations in one place, with a standard function for registering them. Anyway, code written in Python (which is the language we are discussing) WON'T BE COMPLICATED! The frozen mechanism, along with two new protocols (__frozen__ and __changed__), would be added automatically! The internal state of a class written in Python can be automatically frozen, since it's basically a dict.

Now let's see if it's a simplification:
1. No Python code would have to be made more complicated because of the change.
2. There would be no need to find workarounds, like cStringIO, for the fact that strings and tuples are immutable.
3. You would be able to put any kind of object into a set, or use it as a dict key.
4. Classes (like the graph example) would be able to give users things without having to make a choice between risking their users with strange bugs, making a complicated interface, making very inefficient methods, and writing complicated wrapper classes.

I will ask you: Is this a complication? The answer is: it requires a significant change to the CPython implementation. But as for the Python language: it's definitely a simplification.
Let us pause for a second and consider: the original PEP proposed 1 new method: __freeze__, which could be implemented as a subclass of the original object (now), and integrated into the original classes as time goes on. One could /register/ __freeze__ functions/methods à la pickle, at which point objects wouldn't even need a native freeze method.
Your suggestion offers 2 new methods along with 2 new instance variables. Let's see, a callback handler, __freeze__, the cache, and the callback list. Doesn't that seem a little excessive to you to support freezing? It does to me. If Guido were to offer your implementation of freeze, or no freeze at all, I would opt for no freeze, as implementing your freeze on user-defined classes would be a pain in the ass, not to mention implementing them in C code would be more than I would care to do, and more than I would ask any of the core developers to work on.
As I said above: this suggestion would certainly require more change in the Python implementation than your suggestion. But the Python language would gain a lot more. Implementing my frozen on user-defined classes would not be a pain in the ass, because it will require no work at all - the Python implementation would provide it automatically. The fact that it can be done automatically for user-defined classes raises a hope in me that it can be made not too complicated for classes written in C.
...
But here's the crucial observation which you are missing. You yourself have stated that in both your table and graph examples you want your application to continue to modify values while the user can't manipulate them. So even in your own use-cases, you are going to be modifying objects after they have been frozen, and even then it won't be fast!
No. In the table example, the table would never change the objects themselves - it may only calculate new values, and drop the references to the old ones. This is definitely a case of not changing the value after it has been frozen. In the graph example, it is true that the set would be changed after it's frozen, but it is expected that the frozen copy would not exist by the time the change happens - think about the x is graph.neighbours(y) example. There is actually no reason for keeping them, besides tracking the history of the graph - which would require a copy anyway. The frozen() implementation for objects which do not reference non-frozen objects, such as sets, really doesn't copy any memory when it's called, and will never cause a memory copy if there are no living frozen copies of the object while the object changes.
I believe that in general, people who are freezing things are going to want to be changing the original objects - hence the use of mutables to begin with - maybe for situations like yours where you don't want users mutating returns, whatever. If after they have frozen the object, they don't want to be changing the original objects, then they are probably going to be tossing out the original mutable and using the immutable created with freeze anyways (mutate your object until you get it right, then freeze it and use that so that no one can alter your data, not even yourself), so I think that caching is something that the /user/ should be doing, NOT Python.
I don't agree. The table and the graph are examples. The common use patterns I see regarding frozen() are:
1. Don't use frozen() at all. Think about strings becoming mutable. Most strings which are changed would never be frozen. When you are using a list, how many times do you make a frozen copy of it? (The answer is zero, of course - you can't. You can't use it as a dict key, or as a member of a set. This is just to show you that not freezing mutable objects is a common thing.)
2. Create the object using more operations than constructing it, and then don't change it, possibly making a frozen copy of it. The table is an example: functions given by the user create objects, in whatever way they choose, and then the table doesn't need to change them, and needs to create a frozen copy. It's a very reasonable use case: I would say that the less common case is that you can create an object using only the constructor. Many times you make a tuple out of a list that you've created just for that purpose. It's not intuitive!
The simple implementation (not copy-on-write) leads us to a simple matter of documenting, "Freeze is 'stateless'; every call to freeze returns a new object, regardless of modifications (or lack thereof) between freeze calls."
Remember: "Simple is better than complex."
Firstly, you are talking about implementation. Secondly, sometimes things are too simple, and lead to complex workarounds.
...
Not everything can/should become mutable. Integers should never become mutable, as tuples should never become mutable, as strings/unicode should never become mutable...wait, aren't we getting to the point that everything which is currently immutable shouldn't become mutable? Indeed. I don't believe that any currently immutable object should be able to become mutable in order to satisfy /anyone's/ desire for mutable /anything/.
Integers should never become mutable - right. There should be no mutable ints in Python. Tuples and strings should never become mutable - wrong. Strings created by the user should be mutable - those immutable strings are a Python anomaly. All I was saying was that sometimes, the Python implementation would want to use immutable strings. So will users, sometimes. There is a need for a mutable string, and a need for an immutable string, and a need for an efficient conversion between the two. That's all.
In starting to bring up benchmarks you are falling into the trap of needing to /have/ a benchmark (I have too), for which there are very few, if any, current use-cases.
No, I don't. There are a lot of use cases. As I said, I suggest a change to the Python language, which would give you many benefits. When suggesting such a change, it should be verified that the performance of existing Python programs won't be harmed, which I did. What might be done as well, is to compare my suggestion to yours:
Without having or needing a benchmark, I'll state quite clearly where your proposed copy-on-write would beat out the naive 'create a new copy on every call to freeze':
1. If objects after they are frozen are never modified, copy on write will be faster.
2. If original objects are modified after they are frozen, then the naive implementation will be as fast if not faster in general, due to far lower overhead, but may be slower in corner cases where some nested structure is unchanged, and some shallow bit has changed:
As I said, many objects are never modified after they are frozen. ***This includes all the strings which are used in current Python programs as dict keys*** - I suggest that strings would become mutable by default. This means that whenever you use a string as a dict key, a call to frozen() is done by the dict. It's obvious that the string won't change after it is frozen. Now, my suggestion is faster in its order of complexity than yours. In some cases, yours is faster by a constant, which I claim that would be quite small in real use cases.
    x = [[], NEVER_CHANGED_MUTABLE_NESTED_STRUCTURE]
    y = freeze(x)
    x[0].append(1)
    z = freeze(x)
This is one of the cases in which the change in order of complexity is significant.
Further, discussing benchmarks on use-cases, for which there are few (if any) previously existing uses, is like saying "let's race cars" back in 1850; it's a bit premature.
I don't agree. That's why we're discussing it.
Then there is this other example:
    x = [1,2,3]
    y = freeze(x)
The flat version of freeze in the PEP right now handles this case. I can change x all I want, yet I have a frozen y which stays unchanged. This is what I would want, and I would imagine it is what others would want too. In fact, this is precisely the use-case you offered for your table and graph examples, so your expression of a sentiment of "users aren't going to be changing the object after it has been frozen" is, by definition, wrong: you do it yourself!
Okay, so may I add another use case in which frozen() is fast: If an object which only holds references to frozen object is changed after a frozen copy of it has been made, and the frozen copy is discarded before the change is made, frozen() would still take O(1). This is the case with the graph.
I will write it in another way: I am proposing a change that will make Python objects, including strings, mutable, and give you other advantages as well. I claim that it won't make existing Python programs run slower in O() terms. It would allow you to do many things that you can't do today; some of them would be fast, like editing a string, and some of them would be less fast - for example, repeatedly changing an object and freezing it.
Your claim on running time only works if the original isn't changed after it is frozen.
But they won't, in existing Python programs, so my claim: "it won't make existing Python programs run slower in O() terms" is absolutely correct!
And I don't like making everything mutable, it's a "solution looking for a problem", or a "tail wagging the dog" idea. There is no good reason to make everything mutable, and I challenge you to come up with a valid one that isn't already covered by the existing standard library or extension modules.
There is no need to bring strings into this conversation as there are string variants which are already mutable: array.array('c', ...), StringIO, mmap, take your pick! And some future Python (perhaps 2.5) will support a 'bytes' object, which is essentially an mmap which doesn't need to be backed by a file.
My two examples don't have a satisfactory solution currently. All this variety actually proves my point: there is "more than one way to do it" because these are all workarounds! If strings were mutable, I wouldn't have to learn about all these nice modules. And if that's not enough, here's a simple use case which isn't covered by all those modules. Say I store my byte arrays using array.array, or mmap. What if I want to make a set of those, in order to check if a certain byte sequence was already encountered? I CAN'T. I have to do another workaround, which will probably be complicated and inefficient, to convert my byte array into a string. Everything is possible, if you are willing to work hard enough. I am suggesting to simplify things. ...
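The workaround being objected to fits in a few lines. This sketch uses the modern array.tobytes() spelling (in the Python 2.x of this thread it would be array.array('c', ...) and tostring()):

```python
import array

# Membership testing on mutable byte buffers requires freezing them by
# hand into immutable bytes objects first -- the pattern under discussion.
seen = set()

def already_seen(buf):
    key = buf.tobytes()      # O(n) copy to an immutable snapshot
    if key in seen:
        return True
    seen.add(key)
    return False

buf = array.array('b', [1, 2, 3])
assert already_seen(buf) is False   # first sighting is recorded
assert already_seen(buf) is True    # same contents are found again
buf[0] = 9
assert already_seen(buf) is False   # mutated contents are a new entry
```

Under the proposal, the set itself would perform this conversion implicitly via frozen().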
You really don't know how Python internals work.
The slow part of s += x on strings in Python 2.4 is the memory reallocation and occasional data copy (it has been tuned like crazy by Raymond in 2.4, see _PyString_Resize in stringobject.c). Unless you severely over-allocated your strings, this WOULD NOT BE SPED UP BY MUTABLE STRINGS.
The fact that it has been tuned like crazy, and that it had to wait for Python 2.4, just shows us that we are talking about *another* workaround. And please remember that this optimization was announced not to be counted on, in case you want your code to work efficiently on other Python implementations. In that case (which will only grow more common in the future), you would have to go back to the other workarounds, like cStringIO.
Further, identifiers/names (obj, obj.attr, obj.attr1.attr2, ...) are already created during compile-time, and are 'interned'. That is, if you have an object that you call 'foo', there gets to be a single "foo" string, which is referenced by pointer by any code in that module which references the 'foo' object to the single, always unchanging "foo" string. And because the string has already been hashed, it has a cached hash value, and lookups in dictionaries are already fast due to a check for pointer equivalency before comparing contents. Mutable strings CANNOT be faster than this method.
Right - they can't be faster than this method. But they can be virtually AS FAST. Store frozen strings as identifiers/names, and you can continue to use exactly the same method you described.
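The interning behaviour under discussion is easy to observe. sys.intern is the modern spelling; in the Python 2.x of this thread it was the builtin intern():

```python
import sys

# Interned strings are shared, so name lookups can compare by pointer
# identity first, before falling back to content comparison.
a = sys.intern('freeze_protocol')
b = sys.intern('freeze_protocol')
assert a is b                 # one shared, immutable string object
assert hash(a) == hash(b)     # its hash is cached on that one object
```

A frozen string stored as the identifier, as suggested above, would preserve exactly this fast path.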
...
I would claim that your definition is limited. Yours would be fast if objects never changed after they are frozen, which is counter to your own use-cases. This suggests that your definition is in fact incorrect, and you fail to see your own inconsistency.
I have answered this above: It is not counter to my use cases, and it's a very good assumption, as it is true in many examples, including all current Python programs.
And if it's possible (which I think it is), it would allow us to get rid of inconvenient immutable objects, and it would let us put everything into a set. Isn't that nice?
No, it sounds like a solution looking for a problem. I see no need to make strings, floats, ints, tuples, etc. mutable, and I think that you will have very little luck in getting core Python developer support for any attempt to make them mutable.
Concerning ints, floats, complexes, and any other object with a constant memory use, I agree. Concerning other objects, I disagree. I think that it would simplify things considerably, and that many things that we are used to are actually workarounds.
If you make such a suggestion, I would offer that you create a new PEP, because this discussion has gone beyond PEP 351, and has wandered into the realm of "What other kinds of objects would be interesting to have in a Python-like system?"
That is a good suggestion, and I have already started to write one. It takes me a long time, but I hope I will manage.
I'll summarize your claims:
1. copy-on-write is a simplification
2. everything being mutable would add to Python
3. copy-on-write is fast
4. people won't be mutating objects after they are frozen
I'll counter your claims:
I'll counter-counter them:
1. 2 methods and 2 instance variables on ALL OBJECTS is not a simplification.
It is. This is basically an implementation detail, Python code would never be complicated.
2. a = b = 1; a += 1; If all objects were to become mutable, then a == b, despite what Python and every other sane language would tell you, and dct[a] would stop working (you would have to use c = freeze(a);dct[c], or dct[x] would need to automatically call freeze and only ever reference the result, significantly slowing down ALL dictionary references).
This might be the point that I didn't stress enough. Dict *would* call freeze, and this is why more work is needed to make sure it is a quick operation. I have proven that it is quick in O() terms, and I claimed that it can be made quick in actual terms.
3. only if you NEVER MUTATE an object after it has been frozen
...or if the frozen copy is killed before the change, for many types of objects.
4. /you/ mutate original objects after they are frozen
Yes I do, but see 3.
ALSO: 5. You fail to realize that if all objects were to become mutable, then one COULDN'T implement frozen, because the frozen objects THEMSELVES would be mutable.
Really, you take me by the word. All objects COULD become mutable, if we supply a frozen version of it. This doesn't include any object which you don't want, including ints, and including frozen objects.
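The "dict would call freeze on its keys" idea from point 2 can be sketched as a wrapper. FreezingDict and its helper are hypothetical names; tuple/frozenset stand in for the proposed frozen():

```python
# A dict that freezes mutable keys on the way in, as the proposal would
# have the built-in dict do implicitly.
class FreezingDict(dict):
    @staticmethod
    def _freeze(key):
        if isinstance(key, list):
            return tuple(FreezingDict._freeze(x) for x in key)
        if isinstance(key, set):
            return frozenset(key)
        return key

    def __setitem__(self, key, value):
        super().__setitem__(self._freeze(key), value)

    def __getitem__(self, key):
        return super().__getitem__(self._freeze(key))

d = FreezingDict()
k = [1, 2]
d[k] = 'value'               # a mutable list is accepted as a key
assert d[[1, 2]] == 'value'  # lookup freezes the key the same way
k.append(3)                  # mutating k does not disturb the stored key
assert d[[1, 2]] == 'value'
```

The debate above is about whether this per-lookup freeze can be made cheap enough; the sketch only shows the semantics, not the cost.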
I'm going to bow out of this discussion for a few reasons, not the least of which being that I've spent too much time on this subject, and that I think it is quite clear that your proposal is dead, whether I had anything to do with it or not.
- Josiah
That's fine. I wish that you read my answer, think about it a little, and just tell me in a yes or a no if you still consider it dead. I think that I have answered all your questions, and I hope that at least others would be convinced by them, and that at the end my suggestion would be accepted. Others who read this - please respond if you think there's something to my suggestion! Thanks for your involvement. I hope it would at least help me better explain my idea. Noam
That's fine. I wish that you read my answer, think about it a little, and just tell me in a yes or a no if you still consider it dead. I think that I have answered all your questions, and I hope that at least others would be convinced by them, and that at the end my suggestion would be accepted.
I still consider it dead. "If the implementation is hard to explain, it's a bad idea." Also, not all user-defined classes have a __dict__, and not all user-defined classes can have arbitrary attributes added to them.

>>> class foo(object):
...     __slots__ = ['lst']
...     def __init__(self):
...         self.lst = []
...
>>> a = foo()
>>> a.bar = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'foo' object has no attribute 'bar'
- Josiah
On 11/1/05, Josiah Carlson
I still consider it dead. "If the implementation is hard to explain, it's a bad idea."
It is sometimes true, but not always. It may mean two other things:
1. The one trying to explain is not talented enough.
2. The implementation is really not very simple.
A hash table, used so widely in Python, is really not a simple idea, and it's not that easy to explain.
Also, not all user-defined classes have a __dict__, and not all user-defined classes can have arbitrary attributes added to them.
>>> class foo(object):
...     __slots__ = ['lst']
...     def __init__(self):
...         self.lst = []
...
>>> a = foo()
>>> a.bar = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'foo' object has no attribute 'bar'
It doesn't matter. It only means that the implementation would have to make frozen copies also of __slots__ items, when freezing a user-defined class. I am afraid that this question proves that I didn't convey my idea to you. If you like, please forgive my inability to explain it clearly, and try again to understand my idea, by going over what I wrote again, and thinking on it. You can also wait for the PEP that I intend to write. And you can also forget about it, if you don't want to bother with it - you've already helped a lot. Noam
participants (13)
- Adam Olsen
- Antoine Pitrou
- Barry Warsaw
- Christopher Armstrong
- Gary Poster
- Josiah Carlson
- Nick Coghlan
- Noam Raphael
- Oren Tirosh
- Paolino
- Paolino
- Raymond Hettinger
- Steve Holden