Guido van Rossum wrote:
No. The idea is to have "association" objects. We can create these directly if we want:
    a=Association('limit',100)
    print a.key, a.value # whatever
The association value is mutable, but the key is not.
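As a rough illustration only, an association with a fixed key and a rebindable value might be sketched like this in present-day Python. The property-based read-only key and the use of __slots__ are choices made for this sketch, not part of the proposal.

```python
class Association:
    """A key/value pair whose key is fixed and whose value can be rebound."""

    __slots__ = ("_key", "value")

    def __init__(self, key, value):
        self._key = key      # set once, exposed read-only through the property
        self.value = value   # freely reassignable

    @property
    def key(self):
        return self._key


a = Association("limit", 100)
a.value = 200                # allowed: the value is mutable

try:
    a.key = "other"          # rejected: the property has no setter
    key_is_read_only = False
except AttributeError:
    key_is_read_only = True
```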
A namespace object is a collection of association objects such that no two items have the same key. Internally, this would be very much like the current dictionary except that instead of an array of dictentries, you'd have an array of association object pointers. Effectively, associations are exposed dictentries.
Externally, a namespace acts more or less like any mapping object. For example, when someone does a getitem, the namespace object will find the association with the desired key and return its value. In addition, a namespace object would provide methods along the lines of:
associations()
    Return a sequence of the associations in the namespace.
addAssociation(assoc)
    Add the given association to the namespace. This creates another reference to the association. Changing the association's value also changes the value in the namespace.
getAssociation(key)
    Get the association associated with the key.
A setitem on a namespace modifies an existing association if there is already an association for the given key.
I presume __setitem__() creates a new association if there isn't one.
Yes.
I also presume that if an association's value is NULL, it doesn't show up in keys(), values() and items() and it doesn't exist for has_key() or __getitem__().
Right.
What does a delitem do? Delete the association or set the value to NULL? I suppose the latter.
Good question. I'm inclined to think the former. That is, deleting an item from the namespace would delete the name association. I can see arguments both ways.
For example:
    n1=namespace()
    n1['limit']=100
    n2=namespace()
    n2.addAssociation(n1.getAssociation('limit'))
    print n2['limit'] # prints 100
    n1['limit']=200
    print n2['limit'] # prints 200
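To make the example above concrete, here is a minimal sketch of such a namespace in present-day Python. It is built on an ordinary dict rather than a custom hash table, it does not model NULL values, and its delitem removes the association (one of the two options discussed above). The class names and the internal _assocs attribute are assumptions of the sketch.

```python
class Association:
    def __init__(self, key, value):
        self.key = key
        self.value = value


class Namespace:
    """Mapping built from shareable Association objects (hypothetical sketch)."""

    def __init__(self):
        self._assocs = {}   # key -> Association

    def associations(self):
        return list(self._assocs.values())

    def addAssociation(self, assoc):
        # Stores another reference to the same object; nothing is copied.
        self._assocs[assoc.key] = assoc

    def getAssociation(self, key):
        return self._assocs[key]

    def __getitem__(self, key):
        return self._assocs[key].value

    def __setitem__(self, key, value):
        assoc = self._assocs.get(key)
        if assoc is None:
            self._assocs[key] = Association(key, value)  # new binding
        else:
            assoc.value = value                          # seen by every sharer

    def __delitem__(self, key):
        # Removes the association from this namespace only.
        del self._assocs[key]


n1 = Namespace()
n1["limit"] = 100
n2 = Namespace()
n2.addAssociation(n1.getAssociation("limit"))
before = n2["limit"]        # value seen through the shared association
n1["limit"] = 200
after = n2["limit"]         # n2 sees the change made through n1
del n1["limit"]
still_there = n2["limit"]   # n2 keeps its own reference to the association
```

With this choice of delitem, deleting the name from n1 leaves n2's binding intact, which is one reason the question of what delitem should do matters.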
When a function is compiled that refers to a global variable, we get the association from the global namespace and store it. The function doesn't need to store the global namespace itself, so we don't create a circular reference.
For this to work we would have to change the division of labor between the function object and the code object. The code object is immutable and contains no references to mutable objects; this means that it can easily be marshalled and unmarshalled. (Also, when a code object is compiled or unmarshalled, the globals in which its function will be defined may not exist yet.) The function object currently contains a pointer to the code object and a pointer to the dictionary with the globals. (It also contains the default arg values.)
It seems that for associations to work, they need to be placed in the function object, and the code object somehow needs to reference them through the function object. To make this concrete: if a function references globals a, b, and c, these need to be numbered, and the bytecodes should look like this:
    LOAD_GLOBAL 0 # a
    STORE_GLOBAL 1 # b
    DEL_GLOBAL 2 # c
(This could be compiled from ``b = a; del c''.)
The code object should also contain a list of global names, ordered by their ordinals, e.g. ("a", "b", "c").
Then when the function object is created, it looks in that list and creates a corresponding list of associations, e.g.:
    L = []
    for name in code.co_global_names:
        L.append(globals.getAssociation(name))
The VM then sticks a pointer to this list into the frame, whenever the function is called (instead of the globals dict which it sticks there now), and the LOAD/STORE/DEL_GLOBAL opcodes reference the associations through this list.
Looks good to me. :)
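The scheme described above can be sketched as a toy stack machine in present-day Python. Everything here is illustrative: the module namespace is a plain dict of name to association, UNBOUND stands in for a NULL value, and DEL_GLOBAL is modelled as unbinding the value (since the function still holds the association). None of this is the real VM.

```python
UNBOUND = object()   # stands in for a NULL value pointer


class Association:
    def __init__(self, key, value=UNBOUND):
        self.key = key
        self.value = value


def bind_globals(global_names, namespace):
    """Build the per-function list of associations, creating missing ones."""
    return [namespace.setdefault(name, Association(name)) for name in global_names]


def run(bytecode, assocs):
    """Execute the three global opcodes by index, with no name lookups."""
    stack = []
    for op, index in bytecode:
        assoc = assocs[index]
        if op == "LOAD_GLOBAL":
            if assoc.value is UNBOUND:
                raise NameError(assoc.key)
            stack.append(assoc.value)
        elif op == "STORE_GLOBAL":
            assoc.value = stack.pop()
        elif op == "DEL_GLOBAL":
            if assoc.value is UNBOUND:
                raise NameError(assoc.key)
            assoc.value = UNBOUND
        else:
            raise ValueError(op)


# Module namespace modelled as a plain dict of name -> Association.
module_globals = {"a": Association("a", 42), "c": Association("c", "doomed")}

co_global_names = ("a", "b", "c")   # ordinals 0, 1, 2
bytecode = [("LOAD_GLOBAL", 0), ("STORE_GLOBAL", 1), ("DEL_GLOBAL", 2)]  # b = a; del c

assocs = bind_globals(co_global_names, module_globals)   # once, at definition time
run(bytecode, assocs)                                     # once per call
```

After the run, the module namespace holds an association for b with the value loaded from a, and the association for c is unbound. The function never needed a reference to module_globals itself after binding.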
Some complications left as exercises:
- The built-in functions (and exceptions, etc.) should also be referenced via associations; the loop above would become a bit trickier since it needs to look in two dicts. (We're assuming that the code generator doesn't know which names are globals and which are built-ins.)
- If the association for a name doesn't yet exist, it should be created.
Yup.
Note that the semantics are slightly different than currently: the decision whether a name refers to a global or to a built-in is made when the function is defined rather than each time when the name is referenced. This is a bit cleaner -- in the type-sig we're making similar assumptions but the decision is made even earlier.
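A small sketch of what "decided when the function is defined" would mean in practice, again in present-day Python with namespaces modelled as plain dicts of name to association. The resolve helper and the UNBOUND sentinel are hypothetical names for this illustration.

```python
UNBOUND = object()   # placeholder for "no value yet"


class Association:
    def __init__(self, key, value=UNBOUND):
        self.key = key
        self.value = value


def resolve(names, module_ns, builtin_ns):
    """Pick one association per name, once, when the function is defined."""
    assocs = []
    for name in names:
        if name in module_ns:
            assoc = module_ns[name]
        elif name in builtin_ns:
            assoc = builtin_ns[name]
        else:
            # Not defined anywhere yet: create an unbound global association.
            assoc = module_ns[name] = Association(name)
        assocs.append(assoc)
    return assocs


builtin_ns = {"len": Association("len", len)}
module_ns = {}

# "def" time: len resolves to the built-in, later is created unbound.
bound_len, bound_later = resolve(["len", "later"], module_ns, builtin_ns)

# A module-level len added afterwards does not affect the bound function.
module_ns["len"] = Association("len", lambda seq: -1)
result = bound_len.value("abc")

# A global assigned after definition is visible through the created association.
module_ns["later"].value = 5
late_value = bound_later.value
```

Under today's rules the later module-level len would shadow the built-in on the next reference; under the association scheme the function keeps whatever it bound at definition time.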
But, overall the necessary changes to the implementation and to the semantics (e.g. of the 'for' statement) seem prohibitive to me.
Really? Even for Py3K?
I also think that the namespace implementation will be quite a bit less efficient than a regular dictionary:
Spacewise yes. They'd be much faster in use. This is a space/speed tradeoff.
currently, a dictionary entry is a struct of 12 bytes, and the dictionary has an array of these tightly packed. Your association objects will be "real" objects, which means they have a reference count, a type pointer, a key, and a value, i.e. 16 bytes, without counting the malloc overhead; this probably comes in addition to the 12 bytes in the dict entry.
Why not replace the key and value pointers with the association pointer? Then you'd get back a little of the space.
(If you want to have the association objects directly in the hash table, they can't be shared between namespaces, and a namespace couldn't grow -- when a dict grows its hash table is reallocated.)
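For a sense of scale, the per-name cost can be worked out from the figures quoted above. This sketch assumes that the 12-byte entry consists of a cached hash plus key and value pointers at 4 bytes each, and it ignores malloc overhead entirely; both are assumptions of the sketch.

```python
DICT_ENTRY = 12      # bytes per dict entry, figure from the message above
ASSOCIATION = 16     # refcount + type pointer + key + value, from the message above
WORD = 4             # assumed size of a pointer or hash on a 32-bit build

# Case 1: the association comes in addition to a full 12-byte entry.
per_name_extra = DICT_ENTRY + ASSOCIATION

# Case 2: key and value pointers replaced by one association pointer,
# assuming the entry keeps its cached hash.
slim_entry = DICT_ENTRY - 2 * WORD + WORD
per_name_slim = slim_entry + ASSOCIATION

print(per_name_extra, slim_entry, per_name_slim)
```

Under these assumptions the suggested layout saves one word per name compared with keeping the full entry, but it still costs twice as much per name as the plain dictionary.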
Note that circular references are bad even if we have a more powerful gc.
I don't understand or believe this statement.
This was discussed at length a year or two ago. You added code to print to stderr when an error occurred in a destructor. People noticed that they were getting errors when Python exited. The problem occurred when a destructor was called after its globals had been deallocated. You subsequently added a lot of extra rules on shutdown to make this much less likely. I don't think you made the problem go away completely. I find circular references to be bad in other ways. For example, they are a pain with deep copy. You can make deep copy do something in the presence of circular references, but the things it does can be quite surprising.
For example, by not storing the global namespace in a function, we don't have to worry about the global namespace being blown away before a destructor is run during process exit.
If we had more powerful gc the global namespace wouldn't have to be blown away at all (it would gently dissolve when __main__ was deleted from the interpreter).
Uh, OK, then we wouldn't have to worry about the global namespace being gently dissolved before a destructor is run during process exit.
When we use the global variable in the function, we simply get the current value from the association. We don't have to look it up.
Namespaces would have other benefits:
- improve the semantics of:
from spam import foo
in that you'd be importing a name binding, not a value
But its semantics will be harder to explain, because they will no longer be equivalent to
    import spam # assume there's no spam already
    foo = spam.foo
    del spam
Will they really be harder to explain? Why not explain them a different way?
"The statement:
    from spam import foo
copies a name binding for foo from module spam to the current module."
Eh, I guess I can see why someone would find this harder....
Also, we currently *explain* that only objects are shared and name bindings are unique per namespace; this would no longer be true so we would have to explain a much harder rule. ("If you got your foo through an import from another module, assigning to it will affect foo in that other module too; but if you got it through a local assignment, the effect will be local.")
Good point. Perhaps assigning in the client module should break the connection to the other module. This would require some extra magic.
All in all, I think these semantics are messy and unacceptable. True, object sharing is hard to explain too (see diagram on Learning Python page 60), but you'll still have to explain that anyway because it still exists within a namespace; but now in addition we'd have to explain that there is an exception to object sharing... Messy, messy.
Well, I don't have a problem with object sharing, so the notion of sharing namespaces doesn't bother me. I understand that some folks have a problem with object sharing and I agree that they'd have problems with name sharing. OTOH, I don't think you'd consider the fact that some people have difficulty with object sharing to be sufficient justification for removing the feature from the language.
- Be useful in any application where it's desirable to share a name binding.
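The three behaviours under discussion can be compared side by side in a small sketch, written in present-day Python with module namespaces modelled as plain dicts of name to association. The third case is only one guess at what "breaking the connection" might look like; the message does not specify the mechanism.

```python
class Association:
    def __init__(self, key, value):
        self.key = key
        self.value = value


spam = {"foo": Association("foo", 1)}

# Current semantics: the import copies the value into a fresh binding.
client_copy = {"foo": Association("foo", spam["foo"].value)}
client_copy["foo"].value = 2
spam_after_copy = spam["foo"].value       # spam is unaffected

# Proposed semantics: the client shares spam's association.
client_shared = {"foo": spam["foo"]}
client_shared["foo"].value = 3
spam_after_shared = spam["foo"].value     # spam sees the assignment

# Hypothetical "break the connection" variant: assignment in the client
# replaces the shared association with a new private one.
client_shared["foo"] = Association("foo", 4)
spam_after_break = spam["foo"].value      # spam keeps the earlier value
```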
I think it's better to explicitly share the namespace -- "foo.bar = 1" makes it clear that whoever else has a reference to foo will see bar similarly changed.
Again, it would also make function global variable access faster and cleaner in some ways.
But I have other plans for that (if the optional static typing stuff ever gets implemented).
Well, OK, but I argue that the namespace idea is much simpler and more foolproof.
I claim that it's not foolproof at all -- on the contrary, it creates something that hides in the dark and will bite us in the behind by surprise,
How so?
long after we thought we knew there were no monsters under the bed. (Yes, I've been re-reading Calvin and Hobbes. :-)
however it would break a considerable amount of old code, I think.
Really? I wonder. I bet it would break a lot less old code than other recent changes.
Oh? Name some changes that broke a lot of code?
The move to class-based exceptions broke a lot of our code.
It must have been very traumatic that you're still sore over that; it was introduced in 1.5, over two years ago.
I'm not sore. But it was a bigger (IMO) backward incompatibility.

Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (888) 344-4332              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org

Under US Code Title 47, Sec.227(b)(1)(C), Sec.227(a)(2)(B) This email address may not be added to any commercial mail list with out my permission. Violation of my privacy with advertising or SPAM will result in a suit for a MINIMUM of $500 damages/incident, $1500 for repeats.