A modest proposal (continued)

Thu Nov 15 21:06:56 EST 2001

In article <9t1ml4$vhi$1 at nntp1-cm.news.eni.net>, "Stuart D. Gathman"
<stuart at bmsi.com> wrote:

> I have been putting my thoughts toward the elusive goal of making python
> as fast as some LISP dialects.

> The basic idea is to split namespace dictionaries into two parts, a
> simple C array of PyObject pointers, and a SymbolTable which maps names
> to indexes into the C array.  A SymbolTable can grow, but never shrink.
> Deleting a name from the namespace is represented by storing a 0 into
> the C PyObject array.  The C array is automatically enlarged as required
> when adding names to the dictionary (call it a SymbolDictionary).
> 
> In the general case, access is slightly slower due to the extra index
> into the C array, and memory could be wasted due to deleted names, which
> still occupy a (0 value) slot in the array and SymbolTable.  However,
> all python semantics are preserved.

Forget all that stuff about caching optimized copies of function code.

Instead, an adaptation of call site caching will work wonders.  The
bytecode to access a global contains a namespace id, and an index.  The
namespace id can be a direct pointer, or an index into a table of
namespaces.

   E.g. 	LOAD nid idx

When accessing a global, the desired namespace is compared against the
namespace id in the bytecode.  If they match, the global is fetched
directly using the index (giving a Name Error if the slot is 0).  If they
don't match, the name is retrieved using the index and old namespace id,
and the index in the new namespace is looked up/added.  The new namespace
id and new index are stored into the bytecode.

This scheme will work even for compound globals (e.g. os.argv) - the
global access for each component has its own bytecode and is cached
independently.

The namespace id for each global access bytecode is initialized to the
namespace of the module or class the function is defined in (and the
identifier preloaded in that module or classes namespace SymbolTable).  If the
function is assigned to multiple modules or classes, a byte code may
switch namespaces frequently - and will be somewhat slower than it is
currently.  But in the common case where a function stays with the module
or class it is defined in, there will be a significant speedup.  

This strategy should work with a python compiler - preserving the dynamic
semantics.  Each global access would look like this:

def execGlobal(nid,bc):
    curNID = getNID(bc)
    if curNID == nid:
       return nid.getByIndex(getIDX(bc))	# name error if slot is 0
    name = nid.getNameByIndex(getIDX(bc))
    idx = nid.getIndexByName(name)	# add name if not in new namespace
    setNID(bc,nid)
    setIDX(bc,idx)
    return nid.getByIndex(idx)		# name error if slot is 0

The cache hit path would have only a few machine instructions (for some
random load/store machine):
   # a2 has bytecode pointer, a0 has namespace pointer
   load a1,a2[bcnid]
   compare a1,a0
   jeq hit
   call fixupbc
hit:
   load d0,a2[bcidx]
   load a0,a0[d0]
   compare a0,0
   jeq name_err
   # a0 has address of PyObject

-- 
	      Stuart D. Gathman <stuart at bmsi.com>
Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.