Python 2 namespace change? (was Re: [Python-Dev] Changing existing class instances)

Jim Fulton jim@digicool.com
Thu, 03 Feb 2000 15:21:45 -0500


Guido van Rossum wrote:
> 
> > No.  The idea is to have "association" objects. We can create
> > these directly if we want:
> >
> >   a=Association('limit',100)
> >   print a.key, a.value # whatever
> >
> > The association value is mutable, but the key is not.
> >
> > A namespace object is a collection of association objects
> > such that no two items have the same key. Internally, this
> > would be very much like the current dictionary except that
> > instead of an array of dictentries, you'd have an array of
> > association object pointers.  Effectively, associations
> > are exposed dictentries.
> >
> > Externally, a namespace acts more or less like any
> > mapping object. For example, when someone does a getitem,
> > the namespace object will find the association with the
> > desired key and return its value.  In addition, a namespace
> > object would provide methods along the lines of:
> >
> >   associations()
> >
> >     Return a sequence of the associations in the namespace
> >
> >   addAssociation(assoc)
> >
> >     Add the given association to the namespace.  This
> >     creates another reference to the association.
> >     Changing the association's value also changes the value
> >     in the namespace.
> >
> >   getAssociation(key)
> >
> >     Get the association associated with the key.
> >
> > A setitem on a namespace modifies an existing association
> > if there is already an association for the given key.
> 
> I presume __setitem__() creates a new association if there isn't one.

Yes.

> I also presume that if an association's value is NULL, it doesn't show
> up in keys(), values() and items() and it doesn't exist for has_key()
> or __getitem__().

Right.
 
> What does a delitem do?  Delete the association or set the value to
> NULL?  I suppose the latter.

Good question.  I'm inclined to think the former.
That is, deleting an item from the namespace would
delete the name association.  I can see arguments both ways.
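
To pin down the semantics we've been going back and forth on, here's
a rough sketch in Python (the class names and the _marker sentinel are
just for illustration; the real thing would be in C, with the hash
table holding the association pointers directly):

  _marker = []  # stands in for a NULL value

  class Association:
      def __init__(self, key, value=_marker):
          self.key = key        # fixed for the life of the association
          self.value = value    # mutable

  class Namespace:
      def __init__(self):
          self._assocs = {}     # key -> Association

      def getAssociation(self, key):
          return self._assocs[key]     # KeyError if absent

      def addAssociation(self, assoc):
          # Another reference to the *same* association; changing
          # assoc.value elsewhere changes the value seen here too.
          self._assocs[assoc.key] = assoc

      def associations(self):
          return self._assocs.values()

      def __getitem__(self, key):
          assoc = self._assocs.get(key)
          if assoc is None or assoc.value is _marker:
              raise KeyError(key)
          return assoc.value

      def __setitem__(self, key, value):
          assoc = self._assocs.get(key)
          if assoc is None:
              self._assocs[key] = Association(key, value)
          else:
              assoc.value = value

      def __delitem__(self, key):
          del self._assocs[key]  # the "delete the association" reading

      def has_key(self, key):
          assoc = self._assocs.get(key)
          return assoc is not None and assoc.value is not _marker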

> > For example:
> >
> >   n1=namespace()
> >   n1['limit']=100
> >   n2=namespace()
> >   n2.addAssociation(n1.getAssociation('limit'))
> >   print n2['limit'] # prints 100
> >   n1['limit']=200
> >   print n2['limit'] # prints 200
> >
> > When a function is compiled that refers to a global
> > variable, we get the association from the global namespace
> > and store it. The function doesn't need to store the global
> > namespace itself, so we don't create a circular reference.
> 
> For this to work we would have to change the division of labor
> between the function object and the code object.  The code object is
> immutable and contains no references to mutable objects; this means
> that it can easily be marshalled and unmarshalled.  (Also, when a code
> object is compiled or unmarshalled, the globals in which its function
> will be defined may not exist yet.)  The function object currently
> contains a pointer to the code object and a pointer to the dictionary
> with the globals.  (It also contains the default arg values.)
> 
> It seems that for associations to work, they need to be placed in the
> function object, and the code object somehow needs to reference them
> through the function object.  To make this concrete: if a function
> references globals a, b, and c, these need to be numbered, and the
> bytecodes should look like this:
> 
>         LOAD_GLOBAL     0       # a
>         STORE_GLOBAL    1       # b
>         DEL_GLOBAL      2       # c
> 
> (This could be compiled from ``b = a; del c''.)
> 
> The code object should also contain a list of global names, ordered
> by their ordinals, e.g. ("a", "b", "c").
> 
> Then when the function object is created, it looks in that list and
> creates a corresponding list of associations, e.g.:
> 
>         L = []
>         for name in code.co_global_names:
>             L.append(globals.getAssociation(name))
> 
> The VM then sticks a pointer to this list into the frame, whenever the
> function is called (instead of the globals dict which it sticks there
> now), and the LOAD/STORE/DEL_GLOBAL opcodes reference the associations
> through this list.

Looks good to me. :)
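
In pure-Python terms the definition step would amount to something
like this (the Function class, func_code, and co_global_names spellings
are just this sketch's names for the machinery you describe):

  class Function:
      def __init__(self, code, globals):
          self.func_code = code
          # Resolve each referenced global name to its association
          # once, when the function is defined.
          self.global_assocs = []
          for name in code.co_global_names:
              self.global_assocs.append(globals.getAssociation(name))

  # At call time the VM sticks global_assocs into the frame, and the
  # opcodes become simple indexing instead of hash lookups:
  #
  #   LOAD_GLOBAL  i   -->  push frame.global_assocs[i].value
  #   STORE_GLOBAL i   -->  frame.global_assocs[i].value = pop()
  #   DEL_GLOBAL   i   -->  clear frame.global_assocs[i].value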
 
> Some complications left as exercises:
> 
> - The built-in functions (and exceptions, etc.) should also be
> referenced via associations; the loop above would become a bit
> trickier since it needs to look in two dicts.  (We're assuming that
> the code generator doesn't know which names are globals and which are
> built-ins.)
> 
> - If the association for a name doesn't yet exist, it should be
> created.

Yup.
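
Building on the sketches above, the definition-time loop might become
(the separate builtins namespace argument is this sketch's assumption):

  def bind_globals(code, globals, builtins):
      L = []
      for name in code.co_global_names:
          try:
              assoc = globals.getAssociation(name)
          except KeyError:
              try:
                  assoc = builtins.getAssociation(name)
              except KeyError:
                  # Not defined anywhere yet: create an empty
                  # association in globals so that a later
                  # assignment shows up through it.
                  assoc = Association(name)
                  globals.addAssociation(assoc)
          L.append(assoc)
      return L

A name found only in the builtins at definition time stays bound to
the builtin association, which is the semantic change you describe
below.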
 
> Note that the semantics are slightly different than currently: the
> decision whether a name refers to a global or to a built-in is made
> when the function is defined rather than each time when the name is
> referenced.  This is a bit cleaner -- in the type-sig we're making
> similar assumptions but the decision is made even earlier.
> 
> But, overall the necessary changes to the implementation and to the
> semantics (e.g. of the 'for' statement) seem prohibitive to me.

Really? Even for Py3K?

> I also think that the namespace implementation will be quite a bit
> less efficient than a regular dictionary:

Space-wise, yes.  They'd be much faster in use. This is a space/speed
tradeoff.

> currently, a dictionary
> entry is a struct of 12 bytes, and the dictionary has an array of
> these tightly packed.  Your association objects will be "real"
> objects, which means they have a reference count, a type pointer, a
> key, and a value, i.e. 16 bytes, without counting the malloc overhead;
> this probably comes in addition to the 12 bytes in the dict entry.

Why not replace the key and value pointers with the association pointer?
Then you'd get back a little of the space (the entry would hold a hash
and one pointer, 8 bytes, instead of 12).

> (If you want to have the association objects directly in the hash
> table, they can't be shared between namespaces, and a namespace
> couldn't grow -- when a dict grows its hash table is reallocated.)
> 
> > Note that circular references are bad even if we have
> > a more powerful gc.
> 
> I don't understand or believe this statement.

This was discussed at length a year or two ago. You added code
to print to stderr when an error occurred in a destructor.
People noticed that they were getting errors when Python
exited. The problem occurred when a destructor was called after
its globals had been deallocated.

You subsequently added a lot of extra rules on shutdown
to make this much less likely. I don't think you made the problem
go away completely.

I find circular references to be bad in other ways.
For example, they are a pain with deep copy. You can make
deep copy do something in the presence of circular references,
but the things it does can be quite surprising.
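
For instance, deepcopy quietly handles a self-referential list through
its memo dict, but whether the result is what the caller wanted is
another question:

  import copy

  l = [1, 2]
  l.append(l)            # l's third element is l itself
  c = copy.deepcopy(l)   # no infinite recursion, thanks to the memo
  print c is c[2]        # 1: the copy refers to the copy, not to l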
 
> > For example, by not storing the global
> > namespace in a function, we don't have to worry about the
> > global namespace being blown away before a destructor is run
> > during process exit.
> 
> If we had more powerful gc the global namespace wouldn't have to be
> blown away at all (it would gently dissolve when __main__ was deleted
> from the interpreter).

Uh, OK, then we wouldn't have to worry about the
global namespace being gently dissolved before a destructor is run
during process exit.

> > When we use the global variable
> > in the function, we simply get the current value from the
> > association. We don't have to look it up.
> >
> > Namespaces would have other benefits:
> >
> >   - improve the semantics of:
> >
> >       from spam import foo
> >
> >     in that you'd be importing a name binding, not a value
> 
> But its semantics will be harder to explain, because they will no
> longer be equivalent to
> 
>         import spam     # assume there's no spam already
>         foo = spam.foo
>         del spam

Will they really be harder to explain?  Why not explain them 
a different way?

  "The statement:

     from spam import foo

   copies a name binding for foo from module spam to the current
   module."

Eh, I guess I can see why someone would find this 
harder....
 
> Also, we currently *explain* that only objects are shared and name
> bindings are unique per namespace; this would no longer be true so we
> would have to explain a much harder rule.  ("If you got your foo
> through an import from another module, assigning to it will affect foo
> in that other module too; but if you got it through a local
> assignment, the effect will be local.")

Good point. Perhaps assigning in the client module
should break the connection to the other module. This would
require some extra magic.
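
To spell out the difference (spam here is just a stand-in module):

  # Current semantics: the assignment rebinds foo in this module only;
  # spam.foo is untouched.
  import spam
  from spam import foo
  foo = 2
  print spam.foo          # still the original value

  # With shared name bindings, "from spam import foo" would copy the
  # binding itself, so "foo = 2" would also change spam.foo -- unless
  # assignment in the client module breaks the connection, as
  # suggested above.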

> All in all, I think these semantics are messy and unacceptable.  True,
> object sharing is hard to explain too (see diagram on Learning Python
> page 60), but you'll still have to explain that anyway because it
> still exists within a namespace; but now in addition we'd have to
> explain that there is an exception to object sharing...  Messy, messy.

Well, I don't have a problem with object sharing, so the notion
of sharing namespaces doesn't bother me. I understand that
some folks have a problem with object sharing and I agree
that they'd have problems with name sharing. OTOH, I don't
think you'd consider the fact that some people have difficulty
with object sharing to be sufficient justification for removing
the feature from the language.
 
> >   - Be useful in any application where it's desirable to
> >     share a name binding.
> 
> I think it's better to explicitly share the namespace -- "foo.bar = 1"
> makes it clear that whoever else has a reference to foo will see bar
> similarly changed.
> 
> > > > Again, it would also make function global variable access
> > > > faster and cleaner in some ways.
> > >
> > > But I have other plans for that (if the optional static typing stuff
> > > ever gets implemented).
> >
> > Well, OK, but I argue that the namespace idea is much simpler
> > and more foolproof.
> 
> I claim that it's not foolproof at all -- on the contrary, it creates
> something that hides in the dark and will bite us in the behind by
> surprise,

How so?

> long after we thought we knew there were no monsters under
> the bed.  (Yes, I've been re-reading Calvin and Hobbes. :-)
> 
> > > > > however it would break a considerable amount of old code,
> > > > > I think.
> > > >
> > > > Really? I wonder. I bet it would break a lot less old
> > > > code than other recent changes.
> > >
> > > Oh?  Name some changes that broke a lot of code?
> >
> > The move to class-based exceptions broke a lot of our code.
> 
> It must have been very traumatic that you're still sore over that;
> it was introduced in 1.5, over two years ago.

I'm not sore. But it was (IMO) a bigger backward incompatibility
than the namespace change would be.

Jim

--
Jim Fulton           mailto:jim@digicool.com
Technical Director   (888) 344-4332              Python Powered!
Digital Creations    http://www.digicool.com     http://www.python.org
