PEP 267 improvement idea (fwd)

Beni Cherniavsky cben at techunix.technion.ac.il
Mon Jan 13 11:25:56 EST 2003


Since the PEP's author never replied (and the PEP wasn't updated), I
thought I'd float the idea.  Perhaps somebody will also answer my
questions below on how the cross-module part was intended to work...

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>

---------- Forwarded message ----------
Date: Tue, 10 Dec 2002 23:42:07 +0200 (IST)
From: Beni Cherniavsky <cben at techunix.technion.ac.il>
To: Jeremy Hylton <jeremy at zope.com>
Subject: PEP 267 improvement idea

[Wasn't sure what lists to mail so mailed just you; forward as you feel
appropriate...]

The PEP currently says:

    The compiler currently collects the names of all global variables
    in a module.  These are names bound at the module level or bound
    in a class or function body that declares them to be global.

    The compiler would assign indices for each global name and add the
    names and indices of the globals to the module's code object.

My idea is to collect the global names *read from* instead of the global
names assigned to.  The benefit is that, unlike the names that the module
will actually contain, these are known statically; names that are assigned
through __dict__ won't be missed.  They are also precisely the names for
which LOAD_GLOBAL_BY_NUM (or whatever you call it) will be emitted.  The
implementation of the name collecting might be a little more difficult -
I'm not familiar with the internals so I can't tell...
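
As a rough illustration of "collecting the names read from", the same
information can be approximated from Python itself by scanning compiled
bytecode with the standard dis module.  This is only a sketch under
assumptions: the helper name globals_read is made up here, it inspects
the existing LOAD_GLOBAL/LOAD_NAME opcodes rather than anything the PEP
defines, and a real implementation would live in the compiler's symbol
table pass.

```python
import dis
import types

def globals_read(code):
    """Collect names that `code` and its nested code objects may read as globals.

    Hypothetical helper: LOAD_GLOBAL covers function bodies, LOAD_NAME covers
    module-level code (and class bodies, where it may over-approximate).
    """
    names = set()
    for ins in dis.get_instructions(code):
        if ins.opname in ("LOAD_GLOBAL", "LOAD_NAME"):
            names.add(ins.argval)
    # Recurse into functions and classes defined inside this code object.
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= globals_read(const)
    return names

source = """
import bar
def f():
    return bar.baz(quux)
"""
found = globals_read(compile(source, "<demo>", "exec"))
```

Here found holds bar and quux (both read as globals inside f), while baz
is absent because it is only ever an attribute access.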

    Each code object would then be bound irrevocably to the module it
    was defined in.  (Not sure if there are some subtle problems with
    this.)

A question (independent of my idea): how does this interact with
compile(), eval() and exec?  Could a code object compiled under one module
be executed under another globals dict?  Probably eval() and exec should
check whether the native dlict of the code object is the one that's
passed in and, if it is not, enter a fallback VM mode where
LOAD_GLOBAL_BY_NUM indexes the active dlict by the name at
co_native_dlict.name_list[num]...
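
The fallback described above can be modelled in a few lines of Python.
This is a toy model, not the PEP's design: the Dlict class, its slots
attribute and the function load_global_by_num are all hypothetical
stand-ins, and only name_list comes from the text.

```python
class Dlict:
    """Toy stand-in for a dlict: a name list plus a parallel array of values."""
    def __init__(self, **values):
        self.name_list = list(values)          # slot number -> name
        self.slots = list(values.values())     # slot number -> value

def load_global_by_num(native, active_globals, num):
    """Model of LOAD_GLOBAL_BY_NUM with a by-name fallback."""
    if active_globals is native:
        # Fast path: the code runs under the dlict it was compiled for.
        return native.slots[num]
    # Fallback: translate the number back to a name and do a normal lookup
    # in whatever mapping was passed to eval()/exec.
    return active_globals[native.name_list[num]]

native = Dlict(x=1, y=2)
fast = load_global_by_num(native, native, 1)
slow = load_global_by_num(native, {"y": 99}, 1)
```

The fast path returns native's own y, while the fallback finds y in the
foreign dict, at the cost of an ordinary dict lookup per access.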

    For attributes of imported modules, the module will store an
    indirection record.  Internally, the module will store a pointer
    to the defining module and the offset of the attribute in the
    defining module's global variable array.  The offset would be
    initialized the first time the name is looked up.

I don't quite understand how this part of the PEP is going to work anyway.
Assume I do in module foo:

import bar
for i in range(100):
    bar.baz(bar.quux)

If baz was never read in bar, my method will miss it (i.e. it will not be
numbered in bar's dlict); if it was, it will be numbered and reading baz
in bar will be translated as LOAD_GLOBAL_BY_NUM.  Also, fetching bar in
foo will surely become a LOAD_GLOBAL_BY_NUM.  However, I don't understand
how the attribute accesses .baz and .quux on it can be optimized beyond a
dict lookup.  To go through any "indirection record" for the numbers of
'baz' and 'quux' in bar, one must first select the right indirection
record - which needs another dict lookup!  What am I missing?


Another idea: this PEP could also benefit from "aggressive builtins
caching" along the lines suggested in PEP 280.  Actually it seems to be an
optimization that is orthogonal to all three globals access optimization
PEPs.

Maybe even the fallback to searching in the builtins can be eliminated
completely.  Like in PEP 280, the dlict will have a builtin flag for each
name.  Upon creation of any dlict, the builtins are copied to the dlict
before the module starts executing.  When builtins are
modified/added/removed, every dlict must be updated, except when shadowed
(builtin flag cleared).  When a global is deleted, if it has an index in
the dlict and a builtin of the same name exists, the builtin is copied
into the dlict.  It may even be necessary to give indices to all builtin
names, even if not referenced as free variables in the module - this will
ensure that builtins can only be seen in numbered dlict entries; this
implies growing the numbered part of dlicts when new builtins are added.
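
To make the bookkeeping concrete, here is a small Python model of the
scheme.  Everything in it is an assumption made for illustration: the
class names Builtins and Dlict, the method names, and the choice to
number every builtin up front.  Removal of builtins is left out for
brevity.

```python
_MISSING = object()   # marks a numbered slot that currently has no value

class Builtins:
    """Hypothetical builtins table that pushes changes to every dlict."""
    def __init__(self, values):
        self.values = dict(values)
        self.dlicts = []

    def set(self, name, value):
        self.values[name] = value
        for d in self.dlicts:
            d.builtin_changed(name, value)

class Dlict:
    def __init__(self, builtins):
        self.builtins = builtins
        self.index, self.slots, self.is_builtin = {}, [], []
        # Copy all builtins in before the module starts executing.
        for name, value in builtins.values.items():
            self._add(name, value, True)
        builtins.dlicts.append(self)

    def _add(self, name, value, flag):
        self.index[name] = len(self.slots)
        self.slots.append(value)
        self.is_builtin.append(flag)

    def store_global(self, name, value):
        if name in self.index:
            i = self.index[name]
            self.slots[i], self.is_builtin[i] = value, False  # shadow it
        else:
            self._add(name, value, False)

    def delete_global(self, name):
        i = self.index.get(name)
        if i is None or self.is_builtin[i] or self.slots[i] is _MISSING:
            raise NameError(name)
        if name in self.builtins.values:
            # Restore the builtin of the same name into the slot.
            self.slots[i], self.is_builtin[i] = self.builtins.values[name], True
        else:
            self.slots[i] = _MISSING

    def builtin_changed(self, name, value):
        if name not in self.index:
            self._add(name, value, True)      # grow the numbered part
            return
        i = self.index[name]
        if self.is_builtin[i] or self.slots[i] is _MISSING:
            self.slots[i], self.is_builtin[i] = value, True
        # Otherwise the name is shadowed by a global: leave it alone.

    def load_by_num(self, i):
        if self.slots[i] is _MISSING:
            raise NameError(i)
        return self.slots[i]                  # no builtins fallback needed

b = Builtins({"len": len})
d = Dlict(b)
n = d.index["len"]
d.store_global("len", "shadow")
b.set("len", "new-len")          # d shadows len, so its slot is untouched
shadowed = d.load_by_num(n)
d.delete_global("len")           # the current builtin is copied back in
restored = d.load_by_num(n)
b.set("abs", abs)                # a new builtin gets a new number in d
```

In this model a load is always a single indexed read; the price is that
every builtin change walks all live dlicts.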

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>

What's lower level than machine code?  A spreadsheet: not only addresses
are numeric and hand-allocated but also all loops are hand-unrolled and
all calls hand-inlined... (and macros are unheard of, of course).




