[Python-Dev] How to interpret get_code from PEP 302?

Tue Aug 21 17:01:11 CEST 2007

Brett Cannon wrote:
> PEP 302 ("New Import Hooks") has an optional extensions section so
> that tools like py2exe and py2app have an easier time.  Part of the
> optional extensions is the method get_code that is to return the code
> object for the specified module (if the loader can handle it).
> 
> But there is a lack in the definition of how get_code is supposed to
> be implemented.  The definition says that the "method should return
> the code object associated with the module", which is fine.  But then
> it goes on to state that "If the loader doesn't have the code object
> but it _does_ have the source code, it should return the compiled
> source code".  This throws me as this makes it sound like bytecode
> does not need to be used if the loader does not already have a code
> object and there is no source to be had; any bytecode can be ignored.

It also goes on to say *why* this requirement is there:
"(This is so that our caller doesn't also need to check get_source()
if all it needs is the code object.)"

In terms of *what* get_code() returns, all it needs to do is accurately 
reflect what load_module() actually runs, but it may not be the exact 
same code object (e.g. if the code object isn't cached anywhere by the 
loader, then it will be recreated anew for each call to get_code() or 
load_module()).
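
As a rough sketch of that point (written for a current Python 3, and
not taken from the PEP): DictLoader and its sources dict are made-up
names for a loader that keeps only source text in memory. It caches
no code objects, so every get_code() call compiles a new one, and
load_module() goes through the same method so the two cannot diverge.

```python
import sys
import types


class DictLoader:
    """Hypothetical PEP 302-style loader backed by a dict of source strings."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def get_source(self, fullname):
        try:
            return self.sources[fullname]
        except KeyError:
            raise ImportError(fullname)

    def get_code(self, fullname):
        # Nothing is cached, so a fresh code object is built on every call.
        filename = "<dict:%s>" % fullname
        return compile(self.get_source(fullname), filename, "exec")

    def load_module(self, fullname):
        # Run exactly what get_code() would hand to a caller.
        code = self.get_code(fullname)
        module = sys.modules.setdefault(fullname, types.ModuleType(fullname))
        module.__loader__ = self
        exec(code, module.__dict__)
        return module
```

Two get_code() calls give distinct objects compiled from the same
source, which is all that "accurately reflect" asks for here.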

> Now I doubt this is how it is supposed to be read.  Does anyone
> disagree with that?  If not, I will change the wording to mention that
> bytecode must be used if no source is available (and that the magic
> number must be verified).

The magic number belongs to the pyc file, not the code object. It's up 
to the loader to do whatever checking is needed to decide whether or not 
the code object is sane for the current environment (and it makes no 
difference whether this is done for get_code() or for an actual 
load_module() call).
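
To illustrate the split between the container and the code object,
here is a minimal sketch for a current Python 3. The cache layout
(magic number followed directly by the marshalled code) is an
assumption made for the example; a real .pyc header carries more
fields, which are left out. dump_cached and load_cached are
hypothetical helper names.

```python
import importlib.util
import marshal

# Identifies the bytecode format of the running interpreter.
MAGIC = importlib.util.MAGIC_NUMBER


def dump_cached(code):
    """Hypothetical cache format: magic number, then the marshalled code."""
    return MAGIC + marshal.dumps(code)


def load_cached(data):
    """Return the cached code object, or None if the magic number differs."""
    if data[:len(MAGIC)] != MAGIC:
        # Written by an incompatible interpreter; the caller must recompile.
        return None
    return marshal.loads(data[len(MAGIC):])
```

The check lives in the code that reads the cache, so it applies the
same way whether the result feeds get_code() or load_module().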

If load_module() would use some cached bytecode (e.g. a pyc file) in 
preference to recompiling from source (e.g. a py file), then get_code() 
should do the same thing. On the other hand, if load_module() has a 
means for detecting stale bytecode and will recompile in that case, then 
the same check is also needed in get_code().
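
One way to keep the two in step is to put the staleness decision in a
single helper that both methods call. The sketch below assumes a
current Python 3 and uses an in-memory cache keyed on the source
file's modification time instead of a pyc file; CachingFileLoader and
_fresh_code are hypothetical names.

```python
import os
import sys
import types


class CachingFileLoader:
    """Hypothetical loader for one source file with an in-memory code cache."""

    def __init__(self, fullname, path):
        self.fullname = fullname
        self.path = path
        self._cached = None  # (mtime_ns, code object) from the last compile

    def _fresh_code(self):
        # The only place where staleness is decided.
        mtime = os.stat(self.path).st_mtime_ns
        if self._cached is None or self._cached[0] != mtime:
            with open(self.path, "rb") as f:
                source = f.read()
            self._cached = (mtime, compile(source, self.path, "exec"))
        return self._cached[1]

    def get_code(self, fullname):
        return self._fresh_code()

    def load_module(self, fullname):
        module = sys.modules.setdefault(fullname, types.ModuleType(fullname))
        module.__file__ = self.path
        module.__loader__ = self
        exec(self._fresh_code(), module.__dict__)
        return module
```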

The important thing is for load_module() and get_code() to be consistent. 
Beyond that, there are no requirements regarding how the source and 
bytecode are stored externally (if they're stored at all). For example, 
you could write a perfectly valid loader which ignored .pyo and .pyc 
files entirely, and always recompiled from the source file (it would be 
slower than the normal one, but it would work).
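
A source-only loader of that kind might look like the following on a
current Python 3. It is built on today's importlib rather than the
hooks as they stood when PEP 302 was written. AlwaysRecompileLoader
and load_without_bytecode are hypothetical names.

```python
import importlib.machinery
import importlib.util


class AlwaysRecompileLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader that never reads or writes cached bytecode."""

    def get_code(self, fullname):
        # Skip the bytecode cache: read the source and compile it each time.
        path = self.get_filename(fullname)
        return self.source_to_code(self.get_data(path), path)


def load_without_bytecode(name, path):
    """Load the source file at path as module name, recompiling on every call."""
    loader = AlwaysRecompileLoader(name, path)
    spec = importlib.util.spec_from_file_location(name, path, loader=loader)
    module = importlib.util.module_from_spec(spec)
    # exec_module() obtains its code via get_code(), so both stay consistent.
    spec.loader.exec_module(module)
    return module
```

The module is returned without being added to sys.modules; a real
import hook would also need to take care of that.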

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org