[Python-Dev] PEP 338 issue finalisation (was Re: 2.5 PEP)

Nick Coghlan ncoghlan at gmail.com
Thu Feb 16 11:57:14 CET 2006


Guido van Rossum wrote:
> On 2/15/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> PEP 338 is pretty much ready to go, too - just waiting on Guido's review and
>> pronouncement on the specific API used in the latest update (his last PEP
>> parade said he was OK with the general concept, but I only posted the PEP 302
>> compliant version after that).
> 
> I  like the PEP and the implementation (which I downloaded from SF).
> Here are some comments in the form of diffs (attached).
> 
> Do you have unit tests for everything? I believe I fixed a bug in the
> code that reads a bytecode file (it wasn't skipping the timestamp).

I haven't worked the filesystem based tests into the unit tests yet, and even 
the manual tests I was using managed to leave out compiled bytecode files (as 
you noticed). I'll fix that.

Given I do my testing on Linux, I probably still would have forgotten the 'rb' 
mode definitions on the relevant calls to open() though. . .

> +++ pep-0338.txt	(working copy)
> -    The optional argument ``init_globals`` may be used to pre-populate
> +    The optional argument ``init_globals`` may be a dictionary used to pre-populate
>      the globals dictionary before the code is executed. The supplied
>      dictionary will not be modified.

I just realised that anything that's a legal argument to "dict.update" will 
work. I'll fix the function description in the PEP (and the docs patch as well).

> --- runpy.py	Wed Feb 15 15:56:07 2006
>           def get_data(self, pathname):
> !             # XXX Unfortunately PEP 302 assumes text data :-(
> !             return open(pathname).read()

Hmm.

The PEP itself requests that a string be returned from get_data(), but doesn't 
require that the file be opened in text mode. Perhaps the PEP 302 emulation 
should use binary mode here? Otherwise there could be strange data corruption 
bugs on Windows.


> --- 337,349 ----
>   
>   # This helper is needed as both the PEP 302 emulation and the
>   # main file execution functions want to read compiled files
> + # XXX marshal can also raise EOFError; perhaps that should be
> + # turned into ValueError?  Some callers expect ValueError.
>   def _read_compiled_file(compiled_file):
>       magic = compiled_file.read(4)
>       if magic != imp.get_magic():
>           raise ValueError("File not compiled for this Python version")
> +     compiled_file.read(4) # Throw away timestamp
>       return marshal.load(compiled_file)

I'm happy to convert EOFError to ValueError here if you'd prefer (using the 
string representation of the EOFError as the message in the ValueError).

Or did you mean changing the behaviour in marshal itself?

> --- 392,407 ----
>       loader = _get_loader(mod_name)
>       if loader is None:
>           raise ImportError("No module named " + mod_name)
> +     # XXX get_code() is an *optional* loader feature. Is that okay?
>       code = loader.get_code(mod_name)

If the loader doesn't provide access to the code object or the source code, 
then runpy can't really do anything useful with that module (e.g. if its a C 
builtin module). Given that PEP 302 states that if you provide get_source() 
you should also provide get_code(), this check should be sufficient to let 
runpy.run_module get to everything it can handle.

A case could be made for converting the attribute error to an ImportError, I 
guess. . .

>       filename = _get_filename(loader, mod_name)
>       if run_name is None:
>           run_name = mod_name
> +     # else:
> +         # XXX Should we also set sys.modules[run_name] = sys.modules[mod_name]?
> +         #     I know of code that does "import __main__".  It should probably
> +         #     get the substitute __main__ rather than the original __main__,
> +         #     if run_name != mod_name
>       return run_module_code(code, init_globals, run_name, 
>                              filename, loader, as_script)

Hmm, good point. How about a different solution, though: in run_module_code, I 
could create a new module object and put it temporarily in sys.modules, and 
then remove it when done (restoring the original module if necessary).

That would mean any module with code that looks up "sys.modules[__name__]" 
would still work when run via runpy.run_module or runpy.run_file.

I also realised that sys.argv[0] should be restored to its original value, too.

I'd then change the "as_script" flag to "alter_sys" and have it affect both of 
the above operations (and grab the import lock to avoid other import or 
run_module_code operations seeing the altered version of sys.modules).

> --- 439,457 ----
>       
>          Returns the resulting top level namespace dictionary
>          First tries to run as a compiled file, then as a source file
> +        XXX That's not the same algorithm as used by regular import;
> +            if the timestamp in the compiled file is not equal to the
> +            source file's mtime, the compiled file is ignored
> +            (unless there is no source file -- then the timestamp
> +            is ignored)

They're doing different things though - the import process uses that algorithm 
to decide which filename to use (.pyo, .pyc or .py). This code in run_file is 
trying to decide whether the supplied filename points to a compiled file or a 
source file without a tight coupling to the specific file extension used (e.g. 
so it works for Unix Python scripts that rely on the shebang line to identify 
which interpreter to use to run them).

I'll add a comment to that effect.

Another problem that occurred to me is that the module isn't thread safe at 
the moment. The PEP 302 emulation isn't protected by the import lock, and the 
changes to sys.argv in run_module_code will be visible across threads (and may 
clobber each other or the original if multiple threads invoke the function).

On that front, I'll change _get_path_loader to acquire and release the import 
lock, and the same for run_module_code when "alter_sys" is set to True.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org


More information about the Python-Dev mailing list